Comparing two Excel columns with Pandas and NumPy

Having been asked several times whether I could quickly compare two numeric columns from an Excel file, I set up a small Jupyter notebook (and an R script) to show the intersection, the union and the set differences of two columns. You can find the notebook on GitHub or read the code below. I hope it is useful.

    from pandas import read_excel
    import numpy as np

    df = read_excel('excel_data.xlsx', names=['A','B'], header=None)
    df

    A   B
    1  10
    2  20
    3  30
    4   4
    5  40
    6   1
    7   2

    # intersection: items in both list A and list B
    np.intersect1d(df['A'], df['B'])
    # => array([1, 2, 4])

    # union of two lists
    np.union1d(df['A'], df['B'])
    # => array([ 1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40])

    # only in list A
    np.setdiff1d(df['A'], df['B'])
    # => array([3, 5, 6, 7])

    # only in list B
    np.setdiff1d(df['B'], df['A'])
    # => array([10, 20, 30, 40])

If you don’t have NumPy handy, take a look at Python sets. ...
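As a rough sketch of the set-based alternative mentioned above, the same four comparisons can be done with built-in Python sets. The column values are copied from the example table; the names `col_a`, `col_b` and `compare_columns` are made up for illustration and are not part of the original notebook.

```python
# Example columns from the post, written out as plain lists
# (in practice they could come from df['A'].tolist() and df['B'].tolist()).
col_a = [1, 2, 3, 4, 5, 6, 7]
col_b = [10, 20, 30, 4, 40, 1, 2]


def compare_columns(a, b):
    """Return intersection, union and both set differences as sorted lists."""
    set_a, set_b = set(a), set(b)
    return {
        "intersection": sorted(set_a & set_b),  # items in both A and B
        "union": sorted(set_a | set_b),         # items in A or B
        "only_in_a": sorted(set_a - set_b),     # items in A but not in B
        "only_in_b": sorted(set_b - set_a),     # items in B but not in A
    }


result = compare_columns(col_a, col_b)
print(result["intersection"])  # [1, 2, 4]
print(result["only_in_a"])     # [3, 5, 6, 7]
```

Sets drop duplicates and ordering, which is why the results are sorted before being returned; the NumPy functions above also return sorted, unique values.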

May 10, 2017

Display inline images in a Jupyter notebook with Matplotlib

Today I was working with the MNIST handwritten digits data and wanted to display a few images in a Jupyter notebook. After looking at PIL, then Pillow, I found the easiest way is to just use Matplotlib. Here’s a code snippet that lets you do it.

    from matplotlib.pyplot import imshow
    %matplotlib inline

    w, h = 20, 20
    image = X[0].reshape(w,h).T  # assuming X[0] is of shape (400,)
    imshow(image, cmap='gray')

Note: You may still need Pillow when working with Matplotlib. See explanation and requirements here: Image tutorial.
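To see what the reshape and the trailing .T in the snippet above do to a flat vector, here is a small NumPy-only sketch. The array `flat` is a made-up stand-in for X[0], not real image data. Whether the transpose is needed depends on how the images were flattened in your dataset; that is an assumption you will have to check against your own data.

```python
import numpy as np

w, h = 20, 20

# Hypothetical stand-in for one flattened image: the values 0..399.
flat = np.arange(w * h)

# reshape fills the 2-D array row by row (C order) ...
row_major = flat.reshape(w, h)

# ... and .T swaps rows and columns, so the values now run down the columns.
transposed = flat.reshape(w, h).T

first_row_plain = row_major[0, :3].tolist()        # 0, 1, 2
first_row_transposed = transposed[0, :3].tolist()  # 0, 20, 40
print(first_row_plain, first_row_transposed)
```

If a digit shows up rotated and mirrored in the notebook, toggling the .T is usually the first thing to try.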

May 7, 2017

Debugging Jupyter notebooks

While searching for ways to debug code in a Jupyter notebook, I found a lot of outdated posts. So I decided to quickly write up my findings. (Just show me the answer…)

Let’s say we have this piece of code and we want to set a breakpoint between the original answer to the ultimate question of life, the universe and everything, stored in answer, and the addition to that answer:

    def add_to_life_universe_everything(x):
        answer = 42
        # we want a breakpoint here
        answer += x
        return answer

    add_to_life_universe_everything(12)

pdb

The built-in Python debugger pdb works just fine in a Jupyter notebook. With import pdb; pdb.set_trace() we can enter the debugger and get a little interactive prompt. ...
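Placed into the example function, the pdb.set_trace() call could look roughly like the sketch below. The `debug` flag is a hypothetical addition so that the cell also runs without stopping; it is not part of the original post.

```python
import pdb


def add_to_life_universe_everything(x, debug=False):
    answer = 42
    if debug:
        # Execution pauses here and an interactive (Pdb) prompt appears
        # below the notebook cell.
        pdb.set_trace()
    answer += x
    return answer


# With debug=False (the default) the function runs straight through.
result = add_to_life_universe_everything(12)
print(result)  # 54
```

Calling the function with debug=True drops you into the prompt, where standard pdb commands such as `p answer` (print a variable), `n` (execute the next line) and `c` (continue) are available.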

April 22, 2017

Resolving import issues when deploying Python code to AWS Lambda

AWS Lambda is Amazon’s “serverless” compute platform that basically lets you run code without thinking (too much) of servers. I used Lambda in the past, though only in the Node.js environment. Wanting to deploy my first Python function, I ran into a couple of problems.

Deployment scenarios

There are two deployment scenarios:

- Simple scenario
- Advanced scenario

The simple scenario applies to you when your function code only requires the AWS SDK library (Boto 3) and no other external resources. Just use Lambda’s inline code editor and you are good to go. No need to read the rest of this article :-) ...

January 27, 2017

Indexing: A few handy ways to access NumPy arrays

The following code snippets should serve as an (incomplete) cheat sheet for accessing NumPy arrays. All examples expect an import numpy as np.

Basic access

NumPy arrays can be accessed just like lists with array[start:stop:step]

    a = np.array([1,2,3,4], int)  # => array([1, 2, 3, 4])
    a[2]    # => 3
    a[:2]   # => array([1, 2])
    a[::2]  # => array([1, 3])

When working with multidimensional arrays, a comma can be used to access values for the different axes: ...
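As an illustration of the comma syntax for multidimensional arrays, here is a minimal sketch with a made-up 2x3 array `m`. These are not the examples from the full post, just one way the idea can look.

```python
import numpy as np

# Hypothetical 2x3 example array: two rows, three columns.
m = np.array([[1, 2, 3],
              [4, 5, 6]], int)

single = m[1, 2]          # row 1, column 2            => 6
first_row = m[0, :]       # all columns of row 0       => array([1, 2, 3])
second_col = m[:, 1]      # all rows of column 1       => array([2, 5])
every_other = m[:, ::2]   # every second column        => array([[1, 3], [4, 6]])

print(single, first_row, second_col, every_other)
```

Each position before or after the comma takes the same start:stop:step slice notation as in the one-dimensional case, one slice per axis.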

January 16, 2017