How to Extract All Regex Matches from a File With Python

The aim of this playbook🏁 is to list the steps for extracting all regex matches from a file into a list, using the re.findall() function from Python’s re module.

Pavol Kutaj
Sep 13, 2022
  • NOTE: this is not about extracting matches from capture groups.

1. RE.FINDALL()

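  1. import the re module
  2. bind a variable to a compiled pattern object with p = re.compile(<pattern>)
  3. open the file with a with open(...) as <alias>: statement
  4. assign the content of the file with f = <alias>.read()
  5. assign the list of matches with m = re.findall(p, f) (pattern first, then the string to search); p.findall(f) is equivalent
  • re.findall() returns a list of strings when the pattern has no capture group or exactly one (with one group, each string is that group's content); it returns a list of tuples only when the pattern has two or more capture groups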
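
The steps above can be sketched as follows. This is only a minimal sketch: the file name notes.txt, the sample text, the date patterns, and the helper find_all are all assumptions made up for illustration and do not come from the original example. The sample file is written to a temporary directory so the snippet runs on its own, and the two patterns show how the return type depends on the number of capture groups.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical sample text and patterns, for illustration only.
SAMPLE = "released 2022-09-13, patched 2022-10-01\nno date here\n"
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")            # no capture groups
DATE_PARTS = re.compile(r"(\d{4})-(\d{2})-\d{2}")  # two capture groups


def find_all(path, pattern):
    """Return every non-overlapping match of `pattern` in the file at `path`."""
    with open(path, mode="rt", encoding="utf-8") as handle:
        text = handle.read()
    # Module-level call: pattern first, text second.
    return re.findall(pattern, text)


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "notes.txt"
    sample_path.write_text(SAMPLE, encoding="utf-8")
    whole_matches = find_all(sample_path, DATE)
    group_matches = find_all(sample_path, DATE_PARTS)

print(whole_matches)  # ['2022-09-13', '2022-10-01']
print(group_matches)  # [('2022', '09'), ('2022', '10')]
```

Reading the whole file with read() keeps the sketch short; for very large files, matching line by line may be a better fit.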

2. EXAMPLE

import re


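def extract_images(filename):
    # Raw string with escaped dots, so "../" and the extension match literally;
    # the lazy group stops at the first ".jpg" or ".png".
    p = re.compile(r"\.\./assets/(.*?\.(?:jpg|png))")
    with open(filename, mode="rt", encoding="utf-8") as docFile:
        doc = docFile.read()
    # One capture group, so findall() returns a list of strings (file names).
    images = re.findall(p, doc)
    return ["./assets/" + img for img in images]


# later used in e.g. [os.remove(img) for img in extract_images(filename)]
# above deletes all images located in ./assets/<filename>.jpg|png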
  • I have a file containing
|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
| ![example_description](../assets/81.03_2.png) | ![example_description](../assets/81.03_3.png) | ![example_description](../assets/81.03_4.png) | ![example_description](../assets/81.03_5.png) |
|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|-----------------------------------------------|
  • Running the function returns
extract_images("Test-Ignore.md")
['./assets/81.03_2.png', './assets/81.03_3.png', './assets/81.03_4.png', './assets/81.03_5.png', './assets/81.03_1.png']
  • These are paths, so I can, for example, commit and push those files as part of an automated git workflow
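
To try the example without touching real files, a sample row can be written to a temporary file first. This is a minimal sketch under stated assumptions: the function name extract_image_paths, the shortened two-image sample row, the temporary file, and the escaped pattern are illustrative choices, not the author's exact code.

```python
import re
import tempfile
from pathlib import Path

# Assumed variant of the example pattern: literal dots are escaped so that
# "../" and the file extension match only those exact characters.
IMAGE = re.compile(r"\.\./assets/(.*?\.(?:jpg|png))")

# Shortened, hypothetical version of the table row from the example.
SAMPLE_ROW = (
    "| ![example_description](../assets/81.03_2.png) "
    "| ![example_description](../assets/81.03_3.png) |\n"
)


def extract_image_paths(filename):
    """Return './assets/<name>' for every image reference found in the file."""
    with open(filename, mode="rt", encoding="utf-8") as doc_file:
        doc = doc_file.read()
    # One capture group, so findall() yields plain strings (the group text).
    return ["./assets/" + name for name in IMAGE.findall(doc)]


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Test-Ignore.md"
    sample.write_text(SAMPLE_ROW, encoding="utf-8")
    paths = extract_image_paths(sample)

print(paths)  # ['./assets/81.03_2.png', './assets/81.03_3.png']
```

Because the returned values are plain strings, they can be passed directly to whatever comes next, such as a cleanup step or a git command.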
