Pandas: Select rows that match a string
Micro tutorial:
Select rows of a Pandas DataFrame that match a (partial) string.
import pandas as pd
# Create sample data
data = {'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
        'launched': [1983, 1984, 1984, 1984],
        'discontinued': [1986, 1985, 1984, 1986]}
df = pd.DataFrame(data, columns=['model', 'launched', 'discontinued'])
df
|   | model | launched | discontinued |
|---|---|---|---|
| 0 | Lisa | 1983 | 1986 |
| 1 | Lisa 2 | 1984 | 1985 |
| 2 | Macintosh 128K | 1984 | 1984 |
| 3 | Macintosh 512K | 1984 | 1986 |
We want to select all rows where the column ‘model’ starts with the string ‘Mac’.
df[df['model'].str.match('Mac')]
|   | model | launched | discontinued |
|---|---|---|---|
| 2 | Macintosh 128K | 1984 | 1984 |
| 3 | Macintosh 512K | 1984 | 1986 |
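A detail worth knowing: `str.match` treats its argument as a regular expression and only matches at the start of each string. For a plain literal prefix, `str.startswith` is an alternative that involves no regex. A minimal sketch comparing the two on the sample data (the variable names `by_match` and `by_prefix` are only for illustration):

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K'],
    'launched': [1983, 1984, 1984, 1984],
    'discontinued': [1986, 1985, 1984, 1986],
})

# str.match: 'Mac' is a regular expression, matched from the start of the string
by_match = df[df['model'].str.match('Mac')]

# str.startswith: 'Mac' is a literal prefix, no regex involved
by_prefix = df[df['model'].str.startswith('Mac')]

# For this pattern both approaches select the same rows
print(by_match.equals(by_prefix))
```

If the prefix contains regex metacharacters such as `.` or `(`, `str.startswith` is probably the safer choice, since `str.match` would interpret them as part of a pattern.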
We can also search less strictly for all rows where the column ‘model’ contains the string ‘ac’ anywhere in the value (note the difference: contains vs. match).
df[df['model'].str.contains('ac')]
|   | model | launched | discontinued |
|---|---|---|---|
| 2 | Macintosh 128K | 1984 | 1984 |
| 3 | Macintosh 512K | 1984 | 1986 |
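`str.contains` also takes a few optional parameters that can help with real-world data: `case=False` for case-insensitive search, `regex=False` to treat the pattern as a literal string, and `na=False` to count missing values as non-matches. A minimal sketch, assuming a made-up fifth row with a missing model name (that row and its years are hypothetical and only there to illustrate `na`):

```python
import pandas as pd

# Sample data plus one hypothetical row whose model name is missing
df = pd.DataFrame({
    'model': ['Lisa', 'Lisa 2', 'Macintosh 128K', 'Macintosh 512K', None],
    'launched': [1983, 1984, 1984, 1984, 1985],        # last value is made up
    'discontinued': [1986, 1985, 1984, 1986, 1990],    # last value is made up
})

# Case-insensitive search; rows with a missing model count as "no match"
mask = df['model'].str.contains('MAC', case=False, na=False)
result = df[mask]

# Literal (non-regex) search for an exact substring
literal = df[df['model'].str.contains('128K', regex=False, na=False)]

print(result)
print(literal)
```

Without `na=False`, a missing value can end up as a missing entry in the boolean mask, which pandas may refuse to use for row selection.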
More info about working with text data: https://pandas.pydata.org/pandas-docs/stable/text.html