Python RegEx through the ages, Appendix B: Other third-party regular expression modules
B.1. The Wahlig PCRE binding
PyPI package: python-pcre
Arkadiusz Wahlig’s third-party (second) pcre module binding interfaces with the current version of the PCRE 1 library (not PCRE 2; it uses pcre.h, not pcre2.h). It accordingly supports the pattern syntax of the current version of PCRE 1. For details of this syntax, its readme refers to the documentation of PCRE’s PHP binding.
B.1.1. API of the second pcre module
Despite the module name, it does not match the API of the older, low-level module of the same name (it actually uses its own low-level _pcre support module). Rather, its API is substantially the same as that of re, with a handful of important differences:
- By default, the substitution syntax matches that of str.format, i.e. sub and friends behave like the second regex module’s subf and friends.
- The global enable_re_template_mode() function changes this so that the substitution syntax works like that of re. This recalls the syntax flag system from the first regex module in that it is done globally, but it furthermore offers no straightforward way of undoing it (technically, you can undo it with the decidedly kludgy pcre.Match = pcre.REMatch.__bases__[0]).
- Its readme describes it as lacking the “scanner APIs” of the re module. I am not sure what this refers to.
- The available flags are somewhat different, but with a large overlap.
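The binding itself is not needed to see what the default str.format substitution syntax means in practice. The sketch below emulates it on top of the standard re module with a hypothetical format_sub helper; it is not the binding's own implementation. It assumes that {0} is the whole match and {1}, {2}, ... are the numbered groups (as with the second regex module's subf), that {name} refers to a named group, and that unmatched groups expand to empty strings.

```python
import re


def format_sub(pattern, template, string, count=0, flags=0):
    """Substitute using a str.format-style template (hypothetical helper).

    {0} is the whole match, {1}, {2}, ... are numbered groups and {name}
    is a named group. Unmatched groups become "" (an assumption).
    """
    def expand(m):
        # Positional index == group number; named groups go in as keywords.
        return template.format(m.group(0), *m.groups(default=""),
                               **m.groupdict(default=""))

    # The stdlib does the matching; only the template expansion differs.
    return re.sub(pattern, expand, string, count=count, flags=flags)


# str.format-style template versus the equivalent re-style template:
print(format_sub(r"(\w+) (\w+)", "{2} {1}", "hello world"))
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world"))
```

Both calls print "world hello". One practical consequence of the str.format style, at least in this sketch, is that literal braces in the replacement must be doubled ({{ and }}), while backslash escapes such as \1 have no special meaning; in re-style templates it is the other way around.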
B.1.2. Flags of the second pcre module
Below is the list of flags as they currently appear in PCRE 1, along with the names under which the Wahlig binding exports them and their numerical values. I call your attention to the following:
- The LOCALE flag is absent, as are (less surprisingly) the DEBUG flag and (utterly unsurprisingly) the TEMPLATE flag.
- The VERBOSE, ANCHORED, MULTILINE and DOTALL flags have different numerical values in the current PCRE 1 library than in the old, customized version used by the pre module.
- A few of the flags which were supported but not exported by the pre (and first pcre) modules are now exported.
- Many, many more flags are supported by the current version of PCRE 1; only a few of them are exported, though more could theoretically be passed as magic numbers.
- Because the PCRE_NEWLINE_* flags are not supposed to coëxist with each other, they are encoded as combinations of three bits (bits 20 to 22) rather than each being assigned an individual bit. Consequently, PCRE_NEWLINE_CRLF (CR | LF) and PCRE_NEWLINE_ANYCRLF (ANY | CR) are not powers of two.
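The bit layout described in the last point can be checked directly against the values in the table that follows. The sketch below copies those values into plain Python constants and decodes the newline convention from an options word. The names NEWLINE_FIELD, is_power_of_two and newline_convention are hypothetical illustrations, not part of the binding or of PCRE.

```python
# PCRE 1 newline-convention option values, copied from the table below.
PCRE_NEWLINE_CR = 0x00100000       # 1048576, bit 20
PCRE_NEWLINE_LF = 0x00200000       # 2097152, bit 21
PCRE_NEWLINE_CRLF = 0x00300000     # 3145728 == CR | LF
PCRE_NEWLINE_ANY = 0x00400000      # 4194304, bit 22
PCRE_NEWLINE_ANYCRLF = 0x00500000  # 5242880 == ANY | CR

# Hypothetical name: the three bits that together hold the newline convention.
NEWLINE_FIELD = PCRE_NEWLINE_CR | PCRE_NEWLINE_LF | PCRE_NEWLINE_ANY

_NEWLINE_NAMES = {
    0: None,  # no explicit convention: the default chosen at PCRE build time
    PCRE_NEWLINE_CR: "CR",
    PCRE_NEWLINE_LF: "LF",
    PCRE_NEWLINE_CRLF: "CRLF",
    PCRE_NEWLINE_ANY: "ANY",
    PCRE_NEWLINE_ANYCRLF: "ANYCRLF",
}


def is_power_of_two(n):
    """True if n has exactly one bit set."""
    return n > 0 and (n & (n - 1)) == 0


def newline_convention(options):
    """Return the newline convention encoded in an options word (hypothetical)."""
    field = options & NEWLINE_FIELD  # ignore every bit outside bits 20-22
    try:
        return _NEWLINE_NAMES[field]
    except KeyError:
        # LF | ANY and CR | LF | ANY are treated as invalid by this sketch.
        raise ValueError(f"invalid newline combination: {field:#x}") from None
```

For example, newline_convention(PCRE_NEWLINE_CRLF | 1) returns "CRLF", the CASELESS bit (1) being ignored, while is_power_of_two(PCRE_NEWLINE_CRLF) is False.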
Name from Python | Name from C | Value |
---|---|---|
IGNORECASE | PCRE_CASELESS | 1 |
MULTILINE | PCRE_MULTILINE | 2 |
DOTALL | PCRE_DOTALL | 4 |
VERBOSE | PCRE_EXTENDED | 8 |
ANCHORED | PCRE_ANCHORED | 16 |
(not exported) | PCRE_DOLLAR_ENDONLY | 32 |
(not exported) | PCRE_EXTRA | 64 |
NOTBOL | PCRE_NOTBOL | 128 |
NOTEOL | PCRE_NOTEOL | 256 |
NOTEMPTY | PCRE_NOTEMPTY | 1024 |
UTF8 | PCRE_UTF8 | 2048 |
(not exported) | PCRE_NO_AUTO_CAPTURE | 4096 |
NO_UTF8_CHECK | PCRE_NO_UTF8_CHECK | 8192 |
(not exported) | PCRE_AUTO_CALLOUT | 16384 |
(not exported) | PCRE_PARTIAL | 32768 |
(not exported) | PCRE_NEVER_UTF | 65536 |
(not exported) | PCRE_NO_AUTO_POSSESS | 131072 |
(not exported) | PCRE_FIRSTLINE | 262144 |
(not exported) | PCRE_DUPNAMES | 524288 |
(not exported) | PCRE_NEWLINE_CR | 1048576 |
(not exported) | PCRE_NEWLINE_LF | 2097152 |
(not exported) | PCRE_NEWLINE_CRLF | 3145728 |
(not exported) | PCRE_NEWLINE_ANY | 4194304 |
(not exported) | PCRE_NEWLINE_ANYCRLF | 5242880 |
(not exported) | PCRE_BSR_ANYCRLF | 8388608 |
(not exported) | PCRE_BSR_UNICODE | 16777216 |
(not exported) | PCRE_JAVASCRIPT_COMPAT | 33554432 |
(not exported) | PCRE_NO_START_OPTIMIZE | 67108864 |
(not exported) | PCRE_PARTIAL_HARD | 134217728 |
NOTEMPTY_ATSTART | PCRE_NOTEMPTY_ATSTART | 268435456 |
UNICODE | PCRE_UCP | 536870912 |
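Because these values differ from those of the standard re module (where, for instance, re.IGNORECASE is 2 and re.MULTILINE is 8), flags cannot simply be passed from one module to the other by value. The following sketch, using a hypothetical re_flags_to_pcre helper and the values from the table, translates stdlib re flags into PCRE 1 option bits and rejects flags with no counterpart, such as re.LOCALE. Mapping re.UNICODE to PCRE_UCP follows the table's pairing of the binding's UNICODE name with PCRE_UCP; treating the two as equivalent is an assumption, not a statement about their exact semantics.

```python
import re

# PCRE 1 compile options, values copied from the table above.
PCRE_CASELESS = 1
PCRE_MULTILINE = 2
PCRE_DOTALL = 4
PCRE_EXTENDED = 8
PCRE_UCP = 536870912

# Stdlib re flag -> corresponding PCRE 1 option (an assumed correspondence).
_RE_TO_PCRE = {
    re.IGNORECASE: PCRE_CASELESS,
    re.MULTILINE: PCRE_MULTILINE,
    re.DOTALL: PCRE_DOTALL,
    re.VERBOSE: PCRE_EXTENDED,
    re.UNICODE: PCRE_UCP,
}


def re_flags_to_pcre(flags):
    """Translate stdlib re flags into PCRE 1 option bits (hypothetical helper)."""
    flags = int(flags)
    if flags & re.LOCALE:
        raise ValueError("PCRE 1 has no equivalent of re.LOCALE")
    options = 0
    for re_flag, pcre_bit in _RE_TO_PCRE.items():
        if flags & re_flag:
            options |= pcre_bit
            flags &= ~int(re_flag)  # mark this flag as handled
    if flags:
        # Anything left over (e.g. re.ASCII, re.DEBUG) has no mapping here.
        raise ValueError(f"untranslatable re flags: {flags:#x}")
    return options
```

For instance, re.IGNORECASE | re.MULTILINE has the value 10 in re but translates to 3 (PCRE_CASELESS | PCRE_MULTILINE).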
B.1.3. The _pcre C support module
TODO