Pandas: Select rows that match a string

Micro tutorial: Select rows of a Pandas DataFrame that match a (partial) string.

    import pandas as pd

    # create sample data
    data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
            'launched': [1983, 1984, 1984, 1984],
            'discontinued': [1986, 1985, 1984, 1986]}

    df = pd.DataFrame(data, columns=['model', 'launched', 'discontinued'])
    df

                model  launched  discontinued
    0            Lisa      1983          1986
    1          Lisa 2      1984          1985
    2  Macintosh 128K      1984          1984
    3  Macintosh 512K      1984          1986

We want to select all rows where the column ‘model’ starts with the string ‘Mac’.

    df[df['model'].str.match('Mac')]

                model  launched  discontinued
    2  Macintosh 128K      1984          1984
    3  Macintosh 512K      1984          1986

We can also search less strictly for all rows where the column ‘model’ contains the string ‘ac’ (note the difference: contains vs. match). ...
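The difference between match and contains mentioned above can be sketched as follows. This is a minimal illustration that rebuilds the sample data from the excerpt; the variable names `starts_with_mac`, `contains_ac` and `match_ac` are made up for this example and do not come from the original post.

```python
import pandas as pd

# Same sample data as in the excerpt
df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

# str.match anchors the pattern at the start of each string
starts_with_mac = df[df['model'].str.match('Mac')]

# str.contains finds the pattern anywhere in the string
contains_ac = df[df['model'].str.contains('ac')]

# 'ac' never appears at the start of a model name, so match selects nothing
match_ac = df[df['model'].str.match('ac')]

print(starts_with_mac)
print(contains_ac)
print(len(match_ac))
```

Both methods treat the pattern as a regular expression by default, so special characters such as `.` or `(` would need escaping in a real search.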

June 26, 2017

LaTeX mathematics cheat sheet

LaTeX is the de facto standard typesetting system for scientific writing. Many of the nice-looking equations you see in books and around the web are written using LaTeX commands. Knowing a few of the mathematics commands is not only helpful if you want to write a book or an article (or do some extreme stuff), but can come in handy in a lot of places, as many systems support LaTeX. You can use LaTeX in MathJax to display expressions on the web (like here), you can make yourself good-looking mathematics flashcards in Anki, and you can even nerd out and send formulas built with LaTeX commands to your friends via an iMessage app. Also, Apple’s latest Pages release now supports LaTeX equations. ...
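To give a feel for the kind of mathematics commands meant here, a few common ones are sketched below. These examples were picked for illustration and are not taken from the cheat sheet itself.

```latex
% Fraction and square root: the quadratic formula
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

% Sum with lower and upper limits
\sum_{i=1}^{n} i = \frac{n(n+1)}{2}

% Definite integral with a thin space before the differential
\int_{0}^{1} x^2 \, dx = \frac{1}{3}
```

In a LaTeX document these lines go inside a math environment such as `\[ ... \]`; MathJax and similar systems usually accept them between their own math delimiters.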

June 12, 2017

Comparing two Excel columns with Pandas and Numpy

Having been asked multiple times if I can quickly compare two numeric columns from an Excel file, I set up a small Jupyter notebook (and an R script) to show the intersection, the union and set differences of two columns. You can find the notebook on GitHub or read the code below. I hope it is useful.

    from pandas import read_excel
    import numpy as np

    df = read_excel('excel_data.xlsx', names=['A', 'B'], header=None)
    df

    A   B
    1  10
    2  20
    3  30
    4   4
    5  40
    6   1
    7   2

    # intersection: items in both list A and list B
    np.intersect1d(df['A'], df['B'])
    # => array([1, 2, 4])

    # union of two lists
    np.union1d(df['A'], df['B'])
    # => array([ 1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40])

    # only in list A
    np.setdiff1d(df['A'], df['B'])
    # => array([3, 5, 6, 7])

    # only in list B
    np.setdiff1d(df['B'], df['A'])
    # => array([10, 20, 30, 40])

If you don’t have Numpy handy, take a look at Python sets. ...
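The Python sets alternative mentioned above might look like the sketch below. It uses only the standard library; the two lists are typed in by hand from the example columns rather than read from the Excel file, and the variable names are chosen for this illustration.

```python
# Column values copied from the example above
a = [1, 2, 3, 4, 5, 6, 7]
b = [10, 20, 30, 4, 40, 1, 2]

set_a, set_b = set(a), set(b)

# Sets are unordered, so sort the results for a stable, readable output
intersection = sorted(set_a & set_b)   # items in both A and B
union = sorted(set_a | set_b)          # items in A or B
only_in_a = sorted(set_a - set_b)      # items in A but not in B
only_in_b = sorted(set_b - set_a)      # items in B but not in A

print(intersection)  # [1, 2, 4]
print(union)         # [1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40]
print(only_in_a)     # [3, 5, 6, 7]
print(only_in_b)     # [10, 20, 30, 40]
```

Like the Numpy functions, sets drop duplicate values, so neither approach tells you how often a value occurs in a column.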

May 10, 2017

Display inline images in a Jupyter notebook with Matplotlib

Today I was working with the MNIST handwritten digits data and wanted to display a few images in a Jupyter notebook. After looking at PIL, then Pillow, I found the easiest way is to just use Matplotlib. Here’s a code snippet that lets you do it.

    from matplotlib.pyplot import imshow
    %matplotlib inline

    w, h = 20, 20
    image = X[0].reshape(w, h).T  # assuming X[0] is of shape (400,)

    imshow(image, cmap='gray')

Note: You may still need Pillow when working with Matplotlib. See explanation and requirements here: Image tutorial.

May 7, 2017

Debugging Jupyter notebooks

While searching for ways to debug code in a Jupyter notebook, I found a lot of outdated posts. So I decided to quickly write up my findings. (Just show me the answer…)

Let’s say we have this piece of code and we want to set a breakpoint between the original answer to the ultimate question of life, the universe and everything stored in answer and the addition to that answer:

    def add_to_life_universe_everything(x):
        answer = 42
        # we want a breakpoint here
        answer += x
        return answer

    add_to_life_universe_everything(12)

pdb

The built-in Python debugger pdb works just fine in a Jupyter notebook. With import pdb; pdb.set_trace() we can enter the debugger and get a little interactive prompt. ...
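One way to place the pdb breakpoint described above is sketched here. The `debug` parameter is a hypothetical addition for this example and is not part of the original function; it keeps the debugger from opening unless you ask for it.

```python
import pdb


def add_to_life_universe_everything(x, debug=False):
    answer = 42
    if debug:
        # Hypothetical switch: pause here and open the interactive (Pdb) prompt
        pdb.set_trace()
    answer += x
    return answer


# With debug left at False the function runs straight through
result = add_to_life_universe_everything(12)
print(result)  # 54
```

Calling `add_to_life_universe_everything(12, debug=True)` in a notebook cell stops at the breakpoint. At the prompt, `p answer` prints the current value, `n` steps to the next line and `c` continues execution.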

April 22, 2017