Tech & Art with Eric

Are You In?

Sub-setting DataFrame rows.

Eric Gitonga
Oct 4, 2021
Jewel Beetles. Photograph by Eric Gitonga

A question came up the other day: how do you filter rows in a DataFrame based on specific values in a column?

My original inclination was to perform a series of tests on the values needed, then use that as the mask.

First, a recap of row selection from a Pandas DataFrame...

To select all rows, just refer to the DataFrame by name...

df

To return a single column, put its name in square brackets.

df["day"]

That returns a Series.

To see which entries in that Series hold a specific value, compare the Series against that value, as shown below:

df["day"] == "Friday"

This returns a Series of True/False values depending on whether a particular entry matches the test or not.

Let us assign that to a mask variable. We will use Monday instead of Friday.

mask = df["day"] == "Monday"

We can now use this to subset rows that meet this requirement.

df[mask]

This gives us just the rows where the mask is True, that is, the rows where "day" is Monday.
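To make the recap easy to try out, here is a minimal sketch of the same steps on a small made-up DataFrame. The "visitors" column and all the values are my own assumptions for illustration; only the "day" column name comes from the snippets above.

```python
import pandas as pd

# Hypothetical sample data, made up purely for illustration.
df = pd.DataFrame(
    {
        "day": ["Friday", "Saturday", "Sunday", "Monday", "Monday"],
        "visitors": [12, 30, 25, 8, 11],
    }
)

# The comparison is element-wise, so it yields a boolean Series
# that shares its index with df.
mask = df["day"] == "Monday"

# Indexing with that boolean Series keeps only the rows where it is True.
monday_rows = df[mask]
print(monday_rows)
```

Note that the rows keep their original index labels, which is handy when you need to trace them back to the full DataFrame.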

If we need to return rows that match any one of several masks, we combine the masks with the bitwise "or" operator "|", which pandas applies element by element.

First, let us create the different masks.

mask_Saturday = df["day"] == "Saturday"

mask_Sunday = df["day"] == "Sunday"

mask_Monday = df["day"] == "Monday"

Now let's subset the rows with these masks.

df[mask_Saturday | mask_Sunday | mask_Monday]

This returns every row where "day" is Saturday, Sunday, or Monday.
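One thing to watch for if you skip the named masks and write the comparisons inline: in Python, "|" binds more tightly than "==", so each comparison needs its own parentheses. A small sketch, again on made-up data (the "visitors" column and the values are assumptions):

```python
import pandas as pd

# Hypothetical sample data, made up purely for illustration.
df = pd.DataFrame(
    {
        "day": ["Friday", "Saturday", "Sunday", "Monday", "Monday"],
        "visitors": [12, 30, 25, 8, 11],
    }
)

# Each comparison is wrapped in parentheses so that it is evaluated
# before the masks are combined with |.
weekend = df[(df["day"] == "Saturday") | (df["day"] == "Sunday")]
print(weekend)
```

Storing each mask in a variable first, as done above, sidesteps the precedence issue entirely, at the cost of even more typing.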

That is a lot of typing.

That got me thinking...

Was there a simpler way to do this?

I ran the following command:

dir(pd.Series)

to get a list of attributes/methods belonging to it.
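That list is long, so it can help to narrow it down. As a rough sketch, a list comprehension can keep only the names that start with a given prefix; the "is" prefix here is just my guess at where a membership test might be hiding.

```python
import pandas as pd

# Keep only the Series attribute names that start with "is".
candidates = [name for name in dir(pd.Series) if name.startswith("is")]
print(candidates)
```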

On scrolling down, I came across the isin() method. That looked promising. On checking the documentation, I realised it was exactly what I needed. You pass it a list of values, and it returns a Series of True/False values that is True wherever an entry matches any value in the list. That Series is exactly the mask we need, so we can use it directly to subset the rows.

df[df["day"].isin(["Saturday", "Sunday", "Monday"])]

Exact same results as before. With much less typing.
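If you would rather check that claim than take my word for it, the sketch below compares the two approaches on a small made-up DataFrame (the "visitors" column and the values are assumptions). It also shows the "~" operator, which inverts a mask and so selects the rows whose day is not in the list.

```python
import pandas as pd

# Hypothetical sample data, made up purely for illustration.
df = pd.DataFrame(
    {
        "day": ["Friday", "Saturday", "Sunday", "Monday", "Monday"],
        "visitors": [12, 30, 25, 8, 11],
    }
)

days = ["Saturday", "Sunday", "Monday"]

# The long way: one comparison per value, combined with |.
long_way = df[
    (df["day"] == "Saturday") | (df["day"] == "Sunday") | (df["day"] == "Monday")
]

# The short way: a single membership test.
short_way = df[df["day"].isin(days)]

# equals() is True only when both results hold the same rows and values.
print(long_way.equals(short_way))

# ~ flips True and False, giving the rows whose day is NOT in the list.
other_days = df[~df["day"].isin(days)]
print(other_days)
```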

What's not to like about pd.Series.isin()?

© 2023 Eric Gitonga