Perl Regular Expression Quick Reference Card
Revision 0.1 (draft) for Perl 5.8.5
Iain Truskett (formatting by Andrew Ford)
refcards.com™

This is a quick reference to Perl's regular expressions. For full information see the perlre and perlop manual pages.

Operators

=~  determines to which variable the regex is applied. In its absence, $_ is used.
    $var =~ /foo/;

!~  determines to which variable the regex is applied, and negates the result of the match; it returns false if the match succeeds, and true if it fails.
    $var !~ /foo/;

m/pattern/igmsoxc
    searches a string for a pattern match, applying the given options:
    i  case-insensitive
    g  global: all occurrences
    m  multiline mode: ^ and $ match internal lines
    s  match as a single line: . matches \n
    o  compile pattern once
    x  extended legibility: free whitespace and comments
    c  don't reset pos on failed matches when using /g
    If pattern is an empty string, the last successfully matched regex is used. Delimiters other than '/' may be used for both this operator and the following ones.

qr/pattern/imsox
    lets you store a regex in a variable, or pass one around. Modifiers are as for m// and are stored within the regex.

s/pattern/replacement/igmsoxe
    substitutes matches of pattern with replacement. Modifiers are as for m//, with one addition:
    e  evaluate replacement as an expression
    'e' may be specified multiple times. The replacement is interpreted as a double-quoted string unless a single quote (') is the delimiter.

?pattern?
    is like m/pattern/ but matches only once. No alternate delimiters can be used. Must be reset with reset.

Syntax

\           Escapes the character immediately following it
.           Matches any single character except a newline (unless /s is used)
^           Matches at the beginning of the string (or line, if /m is used)
$           Matches at the end of the string (or line, if /m is used)
*           Matches the preceding element 0 or more times
+           Matches the preceding element 1 or more times
?           Matches the preceding element 0 or 1 times
{...}       Specifies a range of occurrences for the element preceding it
[...]       Matches any one of the characters contained within the brackets
(...)       Groups subexpressions for capturing to $1, $2...
(?:...)     Groups subexpressions without capturing (cluster)
|           Matches either the subexpression preceding or following it
\1, \2 ...  The text from the Nth group

Escape sequences

These work as in normal strings.

\a        Alarm (beep)
\e        Escape
\f        Formfeed
\n        Newline
\r        Carriage return
\t        Tab
\037      Any octal ASCII value
\x7f      Any hexadecimal ASCII value
\x{263a}  A wide hexadecimal value
\cx       Control-x
\N{name}  A named character
\l        Lowercase next character
\u        Titlecase next character
\L        Lowercase until \E
\U        Uppercase until \E
\Q        Disable pattern metacharacters until \E
\E        End case modification

This one works differently from normal strings:

\b        An assertion, not backspace, except in a character class

Character classes

[amy]     Match 'a', 'm' or 'y'
[f-j]     Dash specifies range
[f-j-]    Dash escaped or at start or end means 'dash'
[^f-j]    Caret indicates "match any character except these"

The following sequences work within or without a character class. The first six are locale aware; all are Unicode aware. The default character class equivalents are given. See the perllocale and perlunicode man pages for details.

\d        A digit                      [0-9]
\D        A nondigit                   [^0-9]
\w        A word character             [a-zA-Z0-9_]
\W        A non-word character         [^a-zA-Z0-9_]
\s        A whitespace character       [ \t\n\r\f]
\S        A non-whitespace character   [^ \t\n\r\f]
\C        Match a byte (with Unicode, '.' matches a character)
\pP       Match P-named (Unicode) property
\p{...}   Match Unicode property with long name
\PP       Match non-P
\P{...}   Match lack of Unicode property with long name
\X        Match extended Unicode sequence

POSIX character classes and their Unicode and Perl equivalents:

alnum     IsAlnum
alpha     IsAlpha
ascii     IsASCII
blank     IsSpace
cntrl     IsCntrl
digit     IsDigit
graph     IsGraph
lower     IsLower
print     IsPrint
punct     IsPunct
space     IsSpace
          IsSpacePerl
upper     IsUpper
word      IsWord
xdigit    IsXDigit

Within a character class:

POSIX       traditional   Unicode
[:digit:]   \d            \p{IsDigit}
[:^digit:]  \D            \P{IsDigit}

Anchors

All are zero-width assertions.

^     Match string start (or line, if /m is used)
$     Match string end (or line, if /m is used) or before newline
\b    Match word boundary (between \w and \W)
\B    Match except at word boundary (between \w and \w or \W and \W)
\A    Match string start (regardless of /m)
\Z    Match string end (before optional newline)
\z    Match absolute string end
\G    Match where previous m//g left off
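The following is a short Perl sketch of the operators above. It runs against a made-up two-line log string, and all variable names are illustrative only. It binds matches with =~ and !~, collects captures with /mg, reuses a qr// object, and rewrites numbers with s///e:

```perl
use strict;
use warnings;

# Hypothetical sample data: two request lines, each ending in a status code.
my $log = "GET /index.html 200\nPOST /login 403\n";

# =~ applies the match to $log instead of the default $_.
my $is_get = ($log =~ /^GET/) ? 1 : 0;

# !~ negates the result: true here because "DELETE" never occurs.
my $no_delete = ($log !~ /DELETE/) ? 1 : 0;

# /m lets $ match before each internal newline; /g in list context
# returns the capture from every match.
my @codes = $log =~ /(\d{3})$/mg;
print "@codes\n";    # prints "200 403"

# qr// compiles a pattern, with its /i modifier, for later reuse.
my $method  = qr/\b(?:get|post)\b/i;
my @methods = $log =~ /($method)/g;    # ("GET", "POST")

# s///e evaluates the replacement as Perl code: add 1 to each number.
(my $bumped = $log) =~ s/(\d+)/$1 + 1/ge;
```

In list context, m//g returns all captures, which is why @codes receives both status codes. In scalar context, the same match would instead advance pos($log) one match at a time.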
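The next sketch exercises the character classes, escape sequences and anchors from this page. It uses invented sample strings, and every name is hypothetical. Each flag variable is 1 if its match succeeds and 0 otherwise:

```perl
use strict;
use warnings;

my $text = "price: 42 USD, qty: 7";

# \d is the shortcut class; [[:digit:]] is the POSIX form, valid only
# inside a bracketed character class.
my @nums_short = $text =~ /(\d+)/g;            # ("42", "7")
my @nums_posix = $text =~ /([[:digit:]]+)/g;   # ("42", "7")

# \b is a zero-width word boundary: "qty" stands alone as a word,
# but "ice" only occurs inside "price".
my $has_qty = ($text =~ /\bqty\b/) ? 1 : 0;    # 1
my $has_ice = ($text =~ /\bice\b/) ? 1 : 0;    # 0

# \z is the absolute end; \Z also matches before a final newline.
my $strict_end = ("abc\n" =~ /\Aabc\z/) ? 1 : 0;   # 0
my $loose_end  = ("abc\n" =~ /\Aabc\Z/) ? 1 : 0;   # 1

# \Q...\E disables metacharacters, so "+" in $expr is matched literally.
my $expr    = "1+1";
my $literal = ("is 1+1 = 2?" =~ /\Q$expr\E/) ? 1 : 0;  # 1
my $raw     = ("is 1+1 = 2?" =~ /$expr/)     ? 1 : 0;  # 0: "1+" means "one or more 1s"

# \G anchors each match where the previous m//g left off; /c keeps
# pos() at the last success when the final attempt fails.
my $src = "ab12cd";
my @tokens;
while ($src =~ /\G([a-z]+|\d+)/gc) {
    push @tokens, $1;                          # "ab", "12", "cd"
}
my $stopped_at = pos($src);                    # 6
```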
Quantifiers

Quantifiers are greedy by default: each one matches as many times as it can while the overall match still succeeds.

Maximal   Minimal   Allowed range
{n,m}     {n,m}?    Must occur at least n times but no more than m times
{n,}      {n,}?     Must occur at least n times
{n}       {n}?      Must occur exactly n times
*         *?        0 or more times (same as {0,})
+         +?        1 or more times (same as {1,})
?         ??        0 or 1 time (same as {0,1})

There is no quantifier {,n}; it is understood as a literal string.

Functions

lc          Lowercase a string
lcfirst     Lowercase first char of a string
uc          Uppercase a string
ucfirst     Titlecase first char of a string
pos         Return or set current match position
quotemeta   Quote metacharacters
reset       Reset ?pattern? status
study       Analyze string for optimizing matching
split       Use regex to split a string into parts

The first four of these are like the escape sequences \L, \l, \U, and \u. For Titlecase, see below.
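Here is a sketch contrasting maximal and minimal quantifiers and trying a few of the listed functions. The HTML-like fragment and the variable names are sample data only:

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy .* grabs as much as it can, up to the last ">".
my ($greedy)  = $html =~ /(<.*>)/;     # the whole string
# Minimal .*? stops at the first ">" that lets the match succeed.
my ($minimal) = $html =~ /(<.*?>)/;    # "<b>"

# {n,m} and its minimal form {n,m}?
my ($most)  = "12345" =~ /(\d{2,3})/;    # "123"
my ($least) = "12345" =~ /(\d{2,3}?)/;   # "12"

# split takes a regex as its separator.
my @fields = split /\s*,\s*/, "a , b,c";   # ("a", "b", "c")

# quotemeta is the function form of \Q: it backslashes non-word characters.
my $quoted = quotemeta("1+1=2");           # '1\+1\=2'

# lc and ucfirst correspond to \L and \u.
my $name = ucfirst(lc("PERL"));            # "Perl"
```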
Extended constructs

(?#text)            A comment
(?imxs-imsx:...)    Enable/disable option (as per m// modifiers)
(?=...)             Zero-width positive lookahead assertion
(?!...)             Zero-width negative lookahead assertion
(?<=...)            Zero-width positive lookbehind assertion
(?<!...)            Zero-width negative lookbehind assertion
(?>...)             Grab what we can, prohibit backtracking
(?{ code })         Embedded code, return value becomes $^R
(??{ code })        Dynamic regex, return value used as regex
(?(cond)yes|no)     cond being integer corresponding to capturing parens
(?(cond)yes)          or a lookaround/eval zero-width assertion

Variables

$_          Default variable for operators to use
$*          Enable multiline matching (deprecated; not in 5.9.0 or later)
$&          Entire matched string
$`          Everything prior to matched string
$'          Everything after the matched string

The use of those last three will slow down all regex use within your program. Consult the perlvar man page for @LAST_MATCH_START to see equivalent expressions that won't cause slowdown. See also Devel::SawAmpersand.

$1, $2 ...  Hold the Nth captured expression
$+          Last parenthesized pattern match
$^N         Holds the most recently closed capture
$^R         Holds the result of the last (?{...}) expr
@-          Offsets of starts of groups. $-[0] holds start of whole match
@+          Offsets of ends of groups. $+[0] holds end of whole match

Captured groups are numbered according to their opening paren.

Terminology

Titlecase
    Unicode concept which most often is equal to uppercase, but for certain characters like the German 'sharp s' (ß) there is a difference.

See also

• perlretut for a tutorial on regular expressions.
• perlrequick for a rapid tutorial.
• perlre for more details.
• perlvar for details on the variables.
• perlop for details on the operators.
• perlfunc for details on the functions.
• perlfaq6 for FAQs on regular expressions.
• The re module to alter behaviour and aid debugging.
• "Debugging regular expressions" in perldebug
• perluniintro, perlunicode, charnames and locale for details on regexes and internationalisation.
• Mastering Regular Expressions by Jeffrey Friedl (http://regex.info/) for a thorough grounding and reference on the topic.

Authors

This card was created by Andrew Ford. The original document (perlreref.pod) is part of the standard Perl distribution. It was written by Iain Truskett, with thanks to David P.C. Wollmann, Richard Soderberg, Sean M. Burke, Tom Christiansen, Jim Cromie, and Jeffrey Goff for useful advice.

Revision 0.1 (draft) for Perl 5.8.5 [July 2005]
A refcards.com™ quick reference card. refcards.com is a trademark of Ford & Mason Ltd. Published by Ford & Mason Ltd.
© Iain Truskett. This document may be distributed under the same terms as Perl itself. Download from refcards.com.
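The extended constructs can be tried on a small made-up string. This sketch uses hypothetical data and variable names. It shows lookbehind, lookahead, a scoped (?i:...) modifier, an atomic (?>...) group, and a (?(1)...) conditional:

```perl
use strict;
use warnings;

my $s = "USD100 EUR200 300USD";

# Lookbehind: digits preceded by "USD"; "USD" itself is not captured.
my @after_usd  = $s =~ /(?<=USD)(\d+)/g;           # ("100")
# Lookahead: digits followed by "USD".
my @before_usd = $s =~ /(\d+)(?=USD)/g;            # ("300")
# Negative lookahead: three-letter codes at a word start, except "EUR".
my @not_eur    = $s =~ /\b(?!EUR)([A-Z]{3})\d/g;   # ("USD")

# (?i:...) turns on case-insensitivity for the enclosed part only.
my $mixed = ("HELLO world" =~ /(?i:hello) world/) ? 1 : 0;   # 1
my $upper = ("HELLO WORLD" =~ /(?i:hello) world/) ? 1 : 0;   # 0

# (?>...) keeps everything it grabbed, so a+ cannot give back an "a".
my $atomic = ("aaa" =~ /(?>a+)a/) ? 1 : 0;   # 0
my $plain  = ("aaa" =~ /a+a/)     ? 1 : 0;   # 1

# (?(1)\)) requires a closing paren only if group 1 matched "(".
my @ok = grep { /^(\()?\d+(?(1)\))$/ } ("(42)", "42", "(42", "42)");
```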

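Finally, here is a sketch of the match variables, assuming an invented date string. It follows the card's advice about @LAST_MATCH_START: instead of reading $&, $` and $' directly, it rebuilds their values from @- and @+ with substr:

```perl
use strict;
use warnings;

my $date = "Due: 2005-07-14.";
my ($year, $month, $day, $start, $end, $whole, $before, $after, $last);

if ($date =~ /(\d{4})-(\d\d)-(\d\d)/) {
    # $1, $2, $3 are numbered by the position of their opening parens.
    ($year, $month, $day) = ($1, $2, $3);

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    ($start, $end) = ($-[0], $+[0]);                 # (5, 15)

    # substr over those offsets gives what $&, $` and $' would hold.
    $whole  = substr($date, $-[0], $+[0] - $-[0]);   # "2005-07-14"
    $before = substr($date, 0, $-[0]);               # "Due: "
    $after  = substr($date, $+[0]);                  # "."

    # $+ is the highest-numbered group that took part in the match.
    $last = $+;                                      # "14"
}

# With nested groups, $+ and $^N can differ: group 2 is the
# highest-numbered, but group 1 has the rightmost closing paren.
my ($plus, $recent);
if ("ab" =~ /((a)b)/) {
    ($plus, $recent) = ($+, $^N);                    # ("a", "ab")
}
```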
