Are You In?

Sub-setting DataFrame rows.

Eric Gitonga

Oct 04, 2021

A question came up the other day. How do you filter rows in a dataframe based on specific values in a column?

My original inclination was to perform a series of tests on the values needed, then use that as the mask.

First, a recap of row selection from a Pandas DataFrame...

To select all rows, just call the dataframe...

df

To return a single column, specify that in brackets.

df["day"]

That returns a Series.

To see whether certain values are in that Series, you can test for it using a specific value like shown below:

df["day"] == "Friday"

This returns a Series of True/False values depending on whether a particular entry matches the test or not.

Let us assign that to a mask variable. We will use Monday instead of Friday.

mask = df["day"] == "Monday"

We can now use this to subset rows that meet this requirement.

df[mask]

This gives us just the rows that meet this requirement.

If we need to return rows for several masks, we need to use the bitwise "or" operator "|"

First, let us create the different masks

mask_Saturday = df["day"] == "Saturday"

mask_Sunday = df["day"] == "Sunday"

mask_Monday = df["day"] == "Monday"

Now let's subset the rows with these masks.

df[mask_Saturday | mask_Sunday | mask_Monday]

This returns the following rows.

That is a lot of typing.

That got me thinking...

Was there a simpler way to do this?

I ran the following command:

dir(pd.Series)

to get a list of attributes/methods belonging to it.

On scrolling down, I came across the isin() method. That looked promising. On checking the documentation, I realised it was exactly what I needed. You just pass a list of words to it, and if it finds a match in the DataFrame, it will create that mask and return the values needed.

df[df["day"].isin(["Saturday", "Sunday", "Monday"])]

Exact same results as before. With much less typing.

What's not to like about pd.Series.isin()?

Tech & Art with Eric

Discussion about this post