Filter Pandas Rows Containing A Given Substring

Pavol Kutaj
Aug 17, 2021

--

The aim of this playbook🏁 is to provide steps for quickly filtering Pandas dataframa on values contained within rows of a selected column.

SQL way is easy with

SELECT * 
FROM test_table
WHERE col1 LIKE "%substring%"

1. instructions

  • I have a csv with with soon-to-expire TLS certificates
  • the following filters pandas dataframe to records where in a column issuer there is a substring Sectigo
  • important: do not forget about na = False otherwise a ValueError is raised in empty rows
import pandas as pd
df = pd.read_csv(".\\ssl.csv")
df[df['issuer'].str.contains('Sectigo', na = False)]
  • this is called boolean indexing:

You may select rows from a DataFrame using a boolean vector the same length as the DataFrame’s index (for example, something derived from one of the columns of the DataFrame)

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

2. sources

--

--

No responses yet