Python3-RE模块

主要搬运官方文档，记录一些重点。

Introduction

无

Simple Patterns

无

Matching Characters

反斜杠 \ 不仅能使原本的元字符可用，此外，反斜杠后接某些字符还能代表字符组，比如：

\d 与 [0-9] 的功能相同

Repeating Things

重复匹配比如 * 通常来说是贪心的，就是会匹配尽可能长字符串

Using Regular Expressions

Compiling Regular Expressions

使用如下语句能把所需的正则表达式编译为模式串：

p = re.compile('ab*', re.IGNORECASE)

语句的第二个参数通常可以被省略。

The Backslash Plague

由于反斜杠 \ 的问题，在python中，表达一个正则表达式最好用生字符串来表示。在生字符串中，反斜杠 \ 一定不会被特殊处理。比如：

r”\n” 表示两个字符，一个是\，一个是n
“\n”表示换行符

Performing Matches

如果tkinter是可用的话，就可以使用Tools/demo/redemo.py来调试RE模块。

可用的函数（不全）：

match() 该正则表达式是否匹配字符串的开头
search() 该正则表达式是否匹配字符串中的一段
findall() 返回字符串中所有匹配的子串

前两个函数返回的都是match object。如果啥也没匹配到，match object为None。match object有几个可用的函数：

group() 返回字符串
start() 返回匹配的起点
end() 返回匹配的终点
span() 返回一个tuple，包含(start, end)

Module-Level Functions

re模块支持一种简单的形式来进行匹配，比如

re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998')

预编译的形式会更加快一点

Compilation Flags

介绍了一些特殊的flag

More Pattern Power

More Metacharacters

\b可以匹配非字母数字，比如空白符之类的

Grouping

有时候会希望一次同时匹配多个字符串，这个时候就可以用 ’(‘ 和 ’)’ 将希望匹配的子字符串扩起来。在匹配完以后，使用数字查询匹配到的内容。

>>> p = re.compile('(a(b)c)d')
>>> m = p.match('abcd')
>>> m.group(0)
'abcd'
>>> m.group(1)
'abc'
>>> m.group(2)
'b'

使用括号括起来的组还可以在后续被反向引用，比如下面这个例子：

>>> p = re.compile(r'\b(\w+)\s+\1\b')
>>> p.search('Paris in the the spring').group()
'the the'

\1 指代了 (\w+)，这两个组要匹配相同的内容。

Non-capturing and Named Groups

可以使用名字来对组进行命名

也可以选择不捕获组内的内容（这有啥意义？）

Lookahead Assertions

向前的断言，有点类似于条件判断句的作用，只有当断言满足时，才会继续向前匹配。

(?=…) 正向的向前断言
(?!…) 反向的向前断言

举个例子，如果要找出不以 .bat为结尾的文件名，则使用

.*[.](?!bat$)[^.]*$

Modifying Strings

Splitting Strings

可以使用re模块对字符串进行切割，例子如下：

>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p.split('This is a test, short and sweet, of split().', 3)
['This', 'is', 'a', 'test, short and sweet, of split().']

Search and Replace

sub是替换，和split差不多，例子如下：

>>> p = re.compile('(blue|white|red)')
>>> p.sub('colour', 'blue socks and red shoes')
'colour socks and colour shoes'
>>> p.sub('colour', 'blue socks and red shoes', count=1)
'colour socks and red shoes'

python3-RE模块

python3