Comparing two Excel columns with Pandas and Numpy
Having been asked multiple times if I can quickly compare two numeric columns from an excel file, I set up a small Jupyter notebook (and an R script) to show the intersection, the union and set differences of two columns. You can find the notebook on GitHub or read the code below. I hope it is useful. from pandas import read_excel import numpy as np df = read_excel('excel_data.xlsx', names=['A','B'], header=None) df A B 1 10 2 20 3 30 4 4 5 40 6 1 7 2 #intersection: items in both list A and list B np.intersect1d(df['A'], df['B']) # => array([1, 2, 4]) #union of two lists np.union1d(df['A'], df['B']) # => array([ 1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40]) #only in list A np.setdiff1d(df['A'], df['B']) # => array([3, 5, 6, 7]) #only in list B np.setdiff1d(df['B'], df['A']) # => array([10, 20, 30, 40]) If you don’t have Numpy handy, take a look at Python sets. ...