Comparing two Excel columns with Pandas and Numpy

3 minute read

Having been asked multiple times if I can quickly compare two numeric columns from an excel file, I set up a small Jupyter notebook (and an R script) to show the intersection, the union and set differences of two columns.

You can find the notebook on GitHub or read the code below. I hope it is useful.

workflow

from pandas import read_excel
import numpy as np

df = read_excel('excel_data.xlsx', names=['A','B'], header=None)
df
A B
1 10
2 20
3 30
4 4
5 40
6 1
7 2
#intersection: items in both list A and list B
np.intersect1d(df['A'], df['B'])

# => array([1, 2, 4])

#union of two lists
np.union1d(df['A'], df['B'])

# => array([ 1,  2,  3,  4,  5,  6,  7, 10, 20, 30, 40])

#only in list A
np.setdiff1d(df['A'], df['B'])

# => array([3, 5, 6, 7])

#only in list B
np.setdiff1d(df['B'], df['A'])

# => array([10, 20, 30, 40])

If you don’t have Numpy handy, take a look at Python sets.

I have also uploaded an equivalent version for R.

Leave a Comment