The first argument of re.match() is the regular expression, the second is the string to match:
import re
pattern = r"123"
string = "123zzb"
re.match(pattern, string)
# Out: <re.Match object; span=(0, 3), match='123'>  (Python 3.7+; older versions show <_sre.SRE_Match object; ...>)
match = re.match(pattern, string)
match.group()
# Out: '123'
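As a small sketch of what else the returned match object exposes, the span shown in its repr is also available through span(), start(), and end(). The variable names start, end, and matched_text are illustrative and not part of the original example.

```python
import re

pattern = r"123"
match = re.match(pattern, "123zzb")

# The match object records where in the string the match was found.
start, end = match.span()
matched_text = match.group()

print(start, end)      # 0 3
print(matched_text)    # 123
```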
You may notice that the pattern variable is a string prefixed with r, which indicates that the string is a raw string literal.
A raw string literal has a slightly different syntax than a normal string literal: a backslash \ in a raw string literal means "just a backslash", so it is never read as the start of an escape sequence such as a newline (\n), tab (\t), backspace (\b), form feed (\f), and so on. In normal string literals, each backslash must be doubled up to avoid being taken as the start of an escape sequence.
Hence, r"\n" is a string of 2 characters: \ and n. Regex patterns also use backslashes, e.g. \d refers to any digit character. We can avoid having to double escape our strings ("\\d") by using raw strings (r"\d").
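The difference between the two kinds of literal can be checked directly. The following is a minimal sketch; the variable names raw, normal, same_pattern, and digit_match are made up for illustration.

```python
import re

raw = r"\n"       # two characters: a backslash and the letter n
normal = "\n"     # one character: a newline

print(len(raw), len(normal))   # 2 1

# A doubled backslash in a normal literal and a single backslash in a raw
# literal produce the same pattern string.
same_pattern = "\\d" == r"\d"
print(same_pattern)            # True

digit_match = re.match(r"\d", "7up")
print(digit_match.group())     # 7
```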
For instance:
string = "\\t123zzb"  # here the backslash is escaped, so there's no tab, just '\' and 't'
pattern = "\\t123"  # the regex engine sees \t (a tab) followed by 123
re.match(pattern, string)  # no match: returns None, so calling .group() here would raise AttributeError
re.match(pattern, "\t123zzb").group()  # matches '\t123'
pattern = r"\\t123"  # the regex engine sees \\t: a literal backslash followed by 't'
re.match(pattern, string).group()  # matches '\\t123'
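Because re.match() returns None when the pattern does not match, calling .group() directly on the result raises AttributeError in that case. A minimal sketch of guarding against this follows; the helper name first_match is hypothetical and not part of the re module.

```python
import re

def first_match(pattern, string):
    """Return the matched text, or None if the pattern does not match at the start."""
    match = re.match(pattern, string)
    return match.group() if match is not None else None

string = "\\t123zzb"   # a backslash, 't', then '123zzb' (no tab character)

no_tab = first_match("\\t123", string)     # regex \t123 expects a real tab
literal = first_match(r"\\t123", string)   # regex \\t123 expects a backslash and 't'

print(no_tab)          # None
print(repr(literal))   # '\\t123'
```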
Matching is done from the start of the string only. If you want to match anywhere, use [re.search](<https://stackoverflow.com/documentation/python/632/regular-expressions-regex/2065/searching>) instead:
match = re.match(r"(123)", "a123zzb")
match is None
# Out: True
match = re.search(r"(123)", "a123zzb")
match.group()
# Out: '123'
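To make the contrast concrete, here is a short sketch comparing the two functions on the same input. It also shows that, without the re.MULTILINE flag, anchoring a search pattern with ^ restricts it to the start of the string, much like re.match(). The variable names are illustrative.

```python
import re

text = "a123zzb"

at_start = re.match(r"123", text)     # only tries position 0
anywhere = re.search(r"123", text)    # scans forward for the first match
anchored = re.search(r"^123", text)   # ^ pins the search to the start of the string

print(at_start)          # None
print(anywhere.span())   # (1, 4)
print(anchored)          # None
```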