Extract All Regex Matches With Python
May 22, 2021
The aim of this playbook is to list steps for extracting all regex matches with Python's `re` module.
```python
import re

def extractImages(filename):
    # the capture group keeps only the image filename, not the ../assets/ prefix
    imgReg = re.compile("../assets/(.*jpg|.*png)")
    with open(filename, mode="rt", encoding="utf-8") as docFile:
        doc = docFile.read()
    images = re.findall(imgReg, doc)
    return ["./assets/" + img for img in images]

# later used e.g. as:
#     for img in extractImages(filename):
#         os.remove(img)
# which deletes every .jpg/.png under ./assets/ that the file references
```
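One caveat worth checking with the pattern above: `.*` is greedy, so if a single line references two images with the same extension, `.*jpg` runs to the last `jpg` on that line and both filenames come back as one match. Because `.` does not match a newline by default, this only matters when two images share a line. The sketch below demonstrates it on a made-up markdown line and compares it with a lazy variant, `\.\./assets/(.*?\.(?:jpg|png))`, which is a suggested alternative and not the playbook's original pattern.

```python
import re

# a made-up markdown line that embeds two .jpg images
line = "![a](../assets/one.jpg) ![b](../assets/two.jpg)"

# the playbook's pattern: greedy .* runs to the LAST "jpg" on the line
greedy = re.compile("../assets/(.*jpg|.*png)")

# suggested alternative (an assumption, not the original pattern):
# escaped dots, and .*? stops at the FIRST ".jpg" or ".png"
lazy = re.compile(r"\.\./assets/(.*?\.(?:jpg|png))")

greedy_matches = re.findall(greedy, line)
lazy_matches = re.findall(lazy, line)

print(greedy_matches)  # ['one.jpg) ![b](../assets/two.jpg']
print(lazy_matches)    # ['one.jpg', 'two.jpg']
```

The `(?:jpg|png)` part is non-capturing, so the lazy pattern still has a single group and `re.findall()` keeps returning plain strings.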
- import the `re` module
- define the regex and compile it into a regex object with `reg = re.compile(<regex>)`
You of course need to define a capture group for the part to be extracted, e.g. with `../assets/(.*jpg|.*png)` only the image filename is extracted, not the `../assets/` location
- open the file with the `with open(...) as alias:` statement
- assign the content of the file with `inputObj = alias.read()`
- assign the list of matches with `matches = re.findall(reg, inputObj)` (the pattern comes first, then the string to search)
- if the regex contains more than one capture group, `re.findall()` returns a list of tuples (one element per group), not a list of strings
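The steps above can be sketched end to end as follows. The markdown sample and the temporary file are hypothetical scaffolding so that the example runs on its own; `reg`, `alias` and `inputObj` follow the names used in the steps, and the lazy `\S+?` in the pattern is an assumption chosen so that each filename stops at its own extension.

```python
import os
import re
import tempfile

# hypothetical markdown content, written to a throwaway file for the demo
sample = "# Notes\n![diagram](../assets/diagram.png)\n![photo](../assets/photo.jpg)\n"
with tempfile.NamedTemporaryFile("w", suffix=".md", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# compile the regex; the single capture group holds only the filename
reg = re.compile(r"\.\./assets/(\S+?\.(?:jpg|png))")

# open the file and read its whole content into one string
try:
    with open(path, mode="rt", encoding="utf-8") as alias:
        inputObj = alias.read()
finally:
    os.remove(path)  # clean up the demo file

# collect every match: pattern first, then the string to search
matches = re.findall(reg, inputObj)
print(matches)  # ['diagram.png', 'photo.jpg']

# calling the method on the compiled pattern gives the same list
print(reg.findall(inputObj) == matches)  # True

# the same pattern without the group returns the whole match, prefix included
whole = re.findall(r"\.\./assets/\S+?\.(?:jpg|png)", inputObj)
print(whole)  # ['../assets/diagram.png', '../assets/photo.jpg']

# a text with no matches gives an empty list, not None
print(re.findall(reg, "no images here"))  # []
```

Since an empty result is `[]`, the list can be iterated (for example to delete files) without a separate "no match" check.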
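To illustrate the tuple behavior, the sketch below (hypothetical filenames) splits the name and the extension into two groups, then shows the usual fix of making the inner alternation non-capturing with `(?:...)`. The playbook's `(.*jpg|.*png)` is a single group with the alternation inside it, so it already returns plain strings.

```python
import re

doc = "![a](../assets/cat.png) ![b](../assets/dog.jpg)"

# two capture groups (name, extension): findall returns one tuple per match
two_groups = re.findall(r"\.\./assets/(\w+)\.(jpg|png)", doc)
print(two_groups)  # [('cat', 'png'), ('dog', 'jpg')]

# one capture group, extension alternation made non-capturing with (?:...)
one_group = re.findall(r"\.\./assets/(\w+\.(?:jpg|png))", doc)
print(one_group)  # ['cat.png', 'dog.jpg']
```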