Python Regular Expressions
wangzf / 2023-01-09
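Introduction to the re library
re is the Python standard library module for processing text with regular expressions.
The re library mainly defines:
- 9 constants
- 12 functions
- 1 exception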
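The counts above can be checked against a local interpreter. The following is a minimal sketch that groups the names in re.__all__ by kind; the grouping rules and variable names are illustrative assumptions, and the exact output depends on the Python version.

```python
import re

# re.__all__ lists the module's public names; its contents vary by Python version.
public = [(name, getattr(re, name)) for name in re.__all__]

# Flag constants are instances of the re.RegexFlag enum.
flags = sorted(name for name, obj in public if isinstance(obj, re.RegexFlag))
# Classes include the exception type and the Pattern/Match types.
classes = sorted(name for name, obj in public if isinstance(obj, type))
# Everything else that is callable is a module-level function.
functions = sorted(name for name, obj in public
                   if callable(obj) and not isinstance(obj, type))

print("flags:", flags)
print("classes:", classes)
print("functions:", functions)
```

Short and long spellings such as I and IGNORECASE appear as separate names in the flags list, so that list is longer than the number of distinct constants.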
Using the re library
import re
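re library constants
Constants in the re library are values that are not meant to be changed; they are generally used as flags.
The re module has 9 constants, and every constant's value is an int: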
- re.ASCII or re.A
- re.IGNORECASE or re.I
- re.LOCALE or re.L
- re.UNICODE or re.U
- re.MULTILINE or re.M
- re.DOTALL or re.S
- re.VERBOSE or re.X
- re.TEMPLATE or re.T
- re.DEBUG
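Because the flags are int values, they can be combined with bitwise OR and passed as a single argument. A minimal sketch, with sample text chosen purely for illustration:

```python
import re

# Each flag is an int, and the short name is an alias of the long one.
is_int = isinstance(re.IGNORECASE, int)
is_alias = re.I is re.IGNORECASE
print(is_int, is_alias)  # True True

# Combine flags with | so that ^ anchors at every line and case is ignored.
combined = re.IGNORECASE | re.MULTILINE
matches = re.findall(r"^hello", "Hello\nhello\nHELLO", combined)
print(matches)  # ['Hello', 'hello', 'HELLO']
```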
re library source code
class RegexFlag(enum.IntFlag):
    ASCII = A = sre_compile.SRE_FLAG_ASCII # assume ascii "locale"
    IGNORECASE = I = sre_compile.SRE_FLAG_IGNORECASE # ignore case
    LOCALE = L = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
    UNICODE = U = sre_compile.SRE_FLAG_UNICODE # assume unicode "locale"
    MULTILINE = M = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
    DOTALL = S = sre_compile.SRE_FLAG_DOTALL # make dot match newline
    VERBOSE = X = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
    # sre extensions (experimental, don't rely on these)
    TEMPLATE = T = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking
    DEBUG = sre_compile.SRE_FLAG_DEBUG # dump pattern after compilation

    def __repr__(self):
        if self._name_ is not None:
            return f're.{self._name_}'
        value = self._value_
        members = []
        negative = value < 0
        if negative:
            value = ~value
        for m in self.__class__:
            if value & m._value_:
                value &= ~m._value_
                members.append(f're.{m._name_}')
        if value:
            members.append(hex(value))
        res = '|'.join(members)
        if negative:
            if len(members) > 1:
                res = f'~({res})'
            else:
                res = f'~{res}'
        return res
    __str__ = object.__str__
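The effect of the __repr__ method can be observed directly. This is a small sketch; the exact strings may differ between Python versions, so no fixed output is shown.

```python
import re

# A single named flag has a name of its own, so its repr is based on that name.
single = repr(re.IGNORECASE)

# A combination has no single name, so the member names are joined with "|".
combined = repr(re.IGNORECASE | re.DOTALL)

print(single)
print(combined)
```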
Using re.IGNORECASE
- Syntax:
    - re.IGNORECASE or re.I
- Purpose:
    - Match without regard to letter case
- Code:
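text = "Hello World."
# The pattern is lowercase, so it matches only when case is ignored; \. matches a literal dot
pattern = r"hello world\."
print("Default mode:", re.findall(pattern, text))  # []
print("Ignore-case mode:", re.findall(pattern, text, re.I))  # ['Hello World.']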
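The same flag can also be supplied when compiling a pattern, or embedded in the pattern with the inline form (?i). A minimal sketch with made-up sample text:

```python
import re

text = "Hello World. HELLO WORLD. hello world."

# Option 1: pass the flag to re.compile and reuse the compiled pattern.
compiled = re.compile(r"hello world\.", flags=re.IGNORECASE)
via_compile = compiled.findall(text)

# Option 2: embed the flag at the start of the pattern with (?i).
via_inline = re.findall(r"(?i)hello world\.", text)

print(via_compile)  # ['Hello World.', 'HELLO WORLD.', 'hello world.']
print(via_inline)   # ['Hello World.', 'HELLO WORLD.', 'hello world.']
```

Compiling is convenient when one pattern is applied many times; the inline form keeps the flag next to the pattern text.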
Using re.ASCII
- Syntax:
    - re.ASCII or re.A
- Purpose:
    - Makes \w, \W, \b, \B, \d, \D, \s, and \S match only ASCII characters instead of the full Unicode range
- Code:
text = "a测试b测试c"
pattern = r"\w+"
print("Unicode:", re.findall(pattern, text))  # ['a测试b测试c']
print("ASCII:", re.findall(pattern, text, re.A))  # ['a', 'b', 'c']
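The flag affects \d in the same way. The sketch below uses full-width digits (U+FF11 to U+FF13) as an assumed example of non-ASCII digits:

```python
import re

# "\uff11\uff12\uff13" are full-width digits, which lie outside ASCII.
text = "123 \uff11\uff12\uff13"

unicode_digits = re.findall(r"\d+", text)
ascii_digits = re.findall(r"\d+", text, re.ASCII)

print(unicode_digits)  # ['123', '１２３']
print(ascii_digits)    # ['123']
```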
Using re.DOTALL
- Syntax:
    - re.DOTALL or re.S
- Purpose:
    - Makes . match every character, including the newline (by default . matches anything except a newline)
- Code:
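text = "测试\n测试"
pattern = r".*"
# By default . stops at "\n", so each line is matched separately
print("Default mode:", re.findall(pattern, text))
# With re.S the first match spans the newline and covers the whole text
print("Dot-matches-all mode:", re.findall(pattern, text, re.S))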
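A common use of this flag is capturing content that spans several lines. A minimal sketch, using a made-up tag-like snippet as input:

```python
import re

text = "<p>line one\nline two</p>"
pattern = r"<p>(.*)</p>"

# Without the flag, . cannot cross the newline, so there is no match.
default_match = re.search(pattern, text)

# With re.DOTALL, . also consumes the newline and the group spans both lines.
dotall_match = re.search(pattern, text, re.DOTALL)

print(default_match)                # None
print(repr(dotall_match.group(1)))  # 'line one\nline two'
```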