Comparing two Excel columns with Pandas and Numpy

3 minute read

Having been asked multiple times if I can quickly compare two numeric columns from an excel file, I set up a small Jupyter notebook (and an R script) to show the intersection, the union and set differences of two columns.

You can find the notebook on GitHub or read the code below. I hope it is useful.


from pandas import read_excel
import numpy as np

df = read_excel('excel_data.xlsx', names=['A','B'], header=None)
1 10
2 20
3 30
4 4
5 40
6 1
7 2
#intersection: items in both list A and list B
np.intersect1d(df['A'], df['B'])

# => array([1, 2, 4])

#union of two lists
np.union1d(df['A'], df['B'])

# => array([ 1,  2,  3,  4,  5,  6,  7, 10, 20, 30, 40])

#only in list A
np.setdiff1d(df['A'], df['B'])

# => array([3, 5, 6, 7])

#only in list B
np.setdiff1d(df['B'], df['A'])

# => array([10, 20, 30, 40])

If you don’t have Numpy handy, take a look at Python sets.

I have also uploaded an equivalent version for R.

Like to comment? Feel free to send me an email or reach out on Twitter.

Did this or another article help you? If you like and can afford it, you can buy me a coffee (3 EUR) ☕️ to support me in writing more posts. In case you would like to contribute more or I helped you directly via email or coding/troubleshooting session, you can opt to give a higher amount through the following links or adjust the quantity: 50 EUR, 100 EUR, 500 EUR. All links redirect to Stripe.