Filter Pandas Rows Containing A Given Substring
Aug 17, 2021
The aim of this playbook🏁 is to provide steps for quickly filtering Pandas dataframa on values contained within rows of a selected column.
SQL way is easy with
SELECT *
FROM test_table
WHERE col1 LIKE "%substring%"
1. instructions
- I have a csv with with soon-to-expire TLS certificates
- the following filters pandas dataframe to records where in a column
issuer
there is a substringSectigo
- important: do not forget about
na = False
otherwise a ValueError is raised in empty rows
import pandas as pd
df = pd.read_csv(".\\ssl.csv")
df[df['issuer'].str.contains('Sectigo', na = False)]
- this is called boolean indexing:
You may select rows from a DataFrame using a boolean vector the same length as the DataFrame’s index (for example, something derived from one of the columns of the DataFrame)
— https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing
2. sources
- https://stackoverflow.com/a/27975789/11082684 a great rubberducking
- Boolean Indexing