Using Regular Expressions with PHP

Regular expressions are a powerful tool for examining and modifying text. Regular expressions
themselves, with a general pattern notation almost like a mini programming language, allow you to
describe and parse text. They enable you to search for patterns within a string, extracting matches
flexibly and precisely. However, you should note that because regular expressions are more powerful,
they are also slower than the more basic string functions. You should only use regular expressions if you
have a particular need.

This tutorial gives a brief overview of basic regular expression syntax and then considers the functions
that PHP provides for working with regular expressions.

The Basics
Matching Patterns
Replacing Patterns
Array Processing

PHP has historically supported two types of regular expressions: POSIX-extended and Perl-Compatible
Regular Expressions (PCRE). The PCRE functions are more powerful than the POSIX ones, and faster too, so
we will concentrate on them. (The POSIX ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.)

The Basics

In a regular expression, most characters match only themselves. For instance, if you search for the
regular expression "foo" in the string "John plays football," you get a match because "foo" occurs in that
string. Some characters have special meanings in regular expressions. For instance, a dollar sign ($) is
used to match strings that end with the given pattern. Similarly, a caret (^) character at the beginning of a
regular expression indicates that it must match the beginning of the string. The characters that match
themselves are called literals. The characters that have special meanings are called metacharacters.

The dot (.) metacharacter matches any single character except newline (\n). So, the pattern h.t matches
hat, hot, hit, hut, h7t, etc. The vertical pipe (|) metacharacter is used for alternatives in a regular
expression. It behaves much like a logical OR operator and you should use it if you want to construct a
pattern that matches more than one set of characters. For instance, the pattern Utah|Idaho|Nevada
matches strings that contain "Utah" or "Idaho" or "Nevada". Parentheses give us a way to group
sequences. For example, (Nant|b)ucket matches "Nantucket" or "bucket". Using parentheses to group
together characters for alternation is called grouping.
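
As a rough illustration of the metacharacters described so far, the sketch below runs a few patterns through preg_match(), which is covered later in this tutorial. The sample subjects are made up for the example; a result of 1 means "matched" and 0 means "no match".

```php
<?php
// Each row: pattern, subject.
$tests = array(
    array('/foo/',             'John plays football'), // literal text anywhere
    array('/^foo/',            'John plays football'), // ^ anchors to the start
    array('/ball$/',           'John plays football'), // $ anchors to the end
    array('/h.t/',             'a hot day'),           // . matches any one character
    array('/Utah|Idaho/',      'I live in Idaho'),     // | separates alternatives
    array('/^(Nant|b)ucket$/', 'bucket'),              // () groups the alternation
);

$results = array();
foreach ($tests as $test) {
    list($pattern, $subject) = $test;
    $results[] = preg_match($pattern, $subject);
}

print_r($results); // 1, 0, 1, 1, 1, 1
?>
```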

If you want to match a literal metacharacter in a pattern, you have to escape it with a backslash.
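
A minimal sketch of escaping, assuming a made-up price string as the text to search for. It compares an unescaped dot with an escaped one and then uses PHP's preg_quote() function to escape a whole string at once.

```php
<?php
// Unescaped, the dot matches any character, so "3x14" also matches.
$loose = preg_match('/3.14/', '3x14');   // 1

// Escaped with a backslash, the dot only matches a literal ".".
$strict = preg_match('/3\.14/', '3x14'); // 0

// preg_quote() escapes the metacharacters in arbitrary text; its second
// argument names the delimiter so that it is escaped as well.
$needle  = 'cost: $5.00 (approx.)';
$pattern = '/' . preg_quote($needle, '/') . '/';
$quoted  = preg_match($pattern, 'Total cost: $5.00 (approx.) per unit'); // 1
?>
```

preg_quote() is mainly useful when the text to match comes from user input and should be treated literally.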

To specify a set of acceptable characters in your pattern, you can either build a character class yourself
or use a predefined one. A character class lets you represent a bunch of characters as a single item in a
regular expression. You can build your own character class by enclosing the acceptable characters in
square brackets. A character class matches any one of the characters in the class. For example a
character class [abc] matches a, b or c. To define a range of characters, just put the first and last
characters in, separated by hyphen. For example, to match all alphanumeric characters: [a-zA-Z0-9].
You can also create a negated character class, which matches any character that is not in the class. To
create a negated character class, begin the character class with ^: [^0-9].
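
The following sketch tries the character classes above on a few made-up strings. The last line uses \d, \w and \s, which are PCRE's predefined shorthands for a digit, a word character and a whitespace character.

```php
<?php
$alnum     = preg_match('/^[a-zA-Z0-9]+$/', 'abc123');   // 1: only letters and digits
$notAlnum  = preg_match('/^[a-zA-Z0-9]+$/', 'abc 123');  // 0: the space is not in the class
$nonDigit  = preg_match('/[^0-9]/', '2007');             // 0: every character is a digit
$hasOther  = preg_match('/[^0-9]/', '2007-01');          // 1: "-" is not a digit
$shorthand = preg_match('/^\d\d\s\w+$/', '25 January');  // 1: two digits, a space, a word
?>
```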

The metacharacters +, *, ?, and {} affect the number of times a pattern should be matched. + means
"Match one or more of the preceding expression", * means "Match zero or more of the preceding
expression", and ? means "Match zero or one of the preceding expression". Curly braces {} can be used
differently. With a single integer, {n} means "match exactly n occurrences of the preceding expression",
with one integer and a comma, {n,} means "match n or more occurrences of the preceding expression",
and with two comma-separated integers, {n,m} means "match at least n but no more than m occurrences
of the preceding expression".
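
To make the quantifiers concrete, here is a small sketch that checks each one against sample strings chosen for this illustration. The third column is the result we expect from preg_match().

```php
<?php
// Each row: pattern, subject, expected result of preg_match().
$cases = array(
    array('/^ab+c$/',    'ac',      0), // + needs at least one "b"
    array('/^ab+c$/',    'abbbc',   1),
    array('/^ab*c$/',    'ac',      1), // * also accepts zero "b"s
    array('/^colou?r$/', 'color',   1), // ? makes the "u" optional
    array('/^colou?r$/', 'colouur', 0),
    array('/^\d{4}$/',   '207',     0), // {4}: exactly four digits
    array('/^\d{2,}$/',  '2007',    1), // {2,}: two or more digits
    array('/^\d{2,3}$/', '2007',    0), // {2,3}: two or three digits
);

$allAsExpected = true;
foreach ($cases as $case) {
    list($pattern, $subject, $expected) = $case;
    $actual = preg_match($pattern, $subject);
    echo "$pattern on '$subject': $actual\n";
    if ($actual !== $expected) {
        $allAsExpected = false;
    }
}
?>
```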

Now, have a look at the examples:

Regular Expression      Will match...

foo                     The string "foo"
^foo                    "foo" at the start of a string
foo$                    "foo" at the end of a string
^foo$                   A string that consists of "foo" alone
[abc]                   a, b, or c
[a-z]                   Any lowercase letter
[^A-Z]                  Any character that is not an uppercase letter
(gif|jpg)               Either "gif" or "jpg"
[a-z]+                  One or more lowercase letters
[0-9\.\-]               Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$      Any word of at least one letter, number or _
([wx])([yz])            wy, wz, xy, or xz
[^A-Za-z0-9]            Any symbol (not a number or a letter)
([A-Z]{3}|[0-9]{4})     Three uppercase letters or four digits

Perl-Compatible Regular Expressions emulate the Perl syntax for patterns, which means that each
pattern must be enclosed in a pair of delimiters. Usually, the slash (/) character is used. For instance,
/pattern/.
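
The delimiter does not have to be a slash. As a small sketch (the URL is only a sample value), choosing another delimiter such as # avoids having to escape every slash in the pattern:

```php
<?php
$url = 'http://www.php.net/index.html';

// With "/" as the delimiter, each literal slash must be escaped.
$withSlashes = preg_match('/^http:\/\/www\.php\.net\//', $url); // 1

// With "#" as the delimiter, the slashes can be written as they are.
$withHashes = preg_match('#^http://www\.php\.net/#', $url);     // 1
?>
```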

The PCRE functions can be divided in several classes: matching, replacing, splitting and filtering.

Matching Patterns

The preg_match() function performs Perl-style pattern matching on a string. preg_match() takes two
basic and three optional parameters. These parameters are, in order, a regular expression string, a source
string, an array variable which stores matches, a flag argument and an offset parameter that can be used
to specify the alternate place from which to start the search:

preg_match ( pattern, subject [, matches [, flags [, offset]]])


The preg_match() function returns 1 if a match is found and 0 otherwise. Let's search the string "Hello
World!" for the letters "ell":

<?php
if (preg_match("/ell/", "Hello World!", $matches)) {
    echo "Match was found <br />";
    echo $matches[0];
}
?>

The letters "ell" exist in "Hello", so preg_match() returns 1 and the first element of the $matches variable
is filled with the string that matched the pattern. The regular expression in the next example looks
for the letters "ll" together with all the characters that follow them, so $matches[0] contains "lloween":

<?php
if (preg_match("/ll.*/", "The History of Halloween", $matches)) {
    echo "Match was found <br />";
    echo $matches[0];
}
?>

Now let's consider a more complicated example. The most popular use of regular expressions is
validation. The example below checks whether a password is "strong", i.e. the password must be at least 8
characters long and must contain at least one lower case letter, one upper case letter and one digit:

<?php
$password = "Fyfjk34sdfjfsjq7";

if (preg_match("/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/", $password)) {
    echo "Your password is strong.";
} else {
    echo "Your password is weak.";
}
?>

The ^ and $ anchor the pattern to the start and the end of the string. The ".*" combination is used
at both the start and the end. As mentioned above, the dot (.) metacharacter matches any character
except newline, and the * metacharacter means "zero or more". Between them are groupings in parentheses.
The "?=" combination is a lookahead: it means "the text that follows must look like this". This construct
doesn't capture or consume the text. In this example, instead of specifying the order in which things
should appear, the lookaheads say that each thing must appear somewhere, in any order.

The first grouping is (?=.{8,}). This checks that there are at least 8 characters in the string. The next
grouping, (?=.*\d), means "any character can occur zero or more times, followed by a digit". So this
checks that there is at least one digit in the string. But since the lookahead doesn't consume the text,
that digit can appear anywhere in the string. The next groupings, (?=.*[a-z]) and (?=.*[A-Z]), look for
a lower case and an upper case letter, respectively, anywhere in the string.
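
To see each lookahead accept or reject input, the sketch below wraps the pattern from the example above in a helper. The function name is_strong_password() and the sample passwords are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper around the password pattern discussed above.
function is_strong_password($password)
{
    $pattern = '/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/';
    return preg_match($pattern, $password) === 1;
}

$samples = array(
    'Fyfjk34sdfjfsjq7', // long enough, has a digit, lower and upper case
    'short1A',          // only 7 characters
    'alllowercase1',    // no upper case letter
    'NoDigitsHere',     // no digit
);

foreach ($samples as $sample) {
    echo $sample . ': ' . (is_strong_password($sample) ? 'strong' : 'weak') . "\n";
}
?>
```

Only the first sample is reported as strong; each of the others fails exactly one lookahead.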

Finally, we will consider a regular expression that validates an email address:

<?php
$email = "firstname.lastname@aaa.bbb.com";
// [A-Za-z] is used rather than [A-z], which would also accept [ \ ] ^ _ and `
$regexp = "/^[^0-9][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[@][A-Za-z0-9_]+([.][A-Za-z0-9_]+)*[.][A-Za-z]{2,4}$/";

if (preg_match($regexp, $email)) {
    echo "Email address is valid.";
} else {
    echo "Email address is <u>not</u> valid.";
}
?>

This regular expression rejects addresses that begin with a digit and allows multiple period-separated
parts in both the user name and the domain name of the email address. Try to work through the rest of
this regular expression yourself.

For speed, the preg_match() function stops at the first match it finds in a string. This
means it is very quick to check whether a pattern exists in a string. An alternative function,
preg_match_all(), matches a pattern against a string as many times as the pattern allows, and returns the
number of times it matched.
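
A small sketch of the difference, reusing the sample sentence from earlier. The pattern /o/ is chosen only because the letter occurs several times.

```php
<?php
$text = 'The History of Halloween';

$first = preg_match('/o/', $text, $one);     // stops at the first "o"
$total = preg_match_all('/o/', $text, $all); // keeps going to the end of the string

echo "preg_match: $first, preg_match_all: $total\n"; // preg_match: 1, preg_match_all: 3
?>
```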

Replacing Patterns

In the above examples, we have searched for patterns in a string, leaving the search string untouched.
The preg_replace() function looks for substrings that match a pattern and then replaces them with new
text. preg_replace() takes three basic parameters and an additional one. These parameters are, in order, a
regular expression, the text with which to replace a found pattern, the string to modify, and the last
optional argument which specifies how many matches will be replaced.

preg_replace( pattern, replacement, subject [, limit ])

The function returns the changed string if a match was found or an unchanged copy of the original string
otherwise. In the following example we search for the copyright phrase and replace the year with
2007.

<?php
echo preg_replace("/([Cc]opyright) 200(3|4|5|6)/", "$1 2007", "Copyright 2005");
?>

In the above example we use back references in the replacement string. Back references make it possible
for you to use part of a matched pattern in the replacement string. To use this feature, you should use
parentheses to wrap any elements of your regular expression that you might want to use. You can refer to
the text matched by subpattern with a dollar sign ($) and the number of the subpattern. For instance, if
you are using subpatterns, $0 is set to the whole match, then $1, $2, and so on are set to the individual
matches for each subpattern.
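
As a quick sketch of how the numbered references line up with the parentheses, the following replacement prints $0, $1 and $2 side by side. The labels in the replacement string are only for illustration.

```php
<?php
$subject = 'Copyright 2005';
$pattern = '/([Cc]opyright) (\d{4})/';

// $0 is the whole match; $1 and $2 are the first and second subpatterns.
$annotated = preg_replace($pattern, '[$0] word=$1 year=$2', $subject);

echo $annotated; // [Copyright 2005] word=Copyright year=2005
?>
```

Single quotes are used for the replacement so that PHP itself never tries to interpret the dollar signs.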

In the following example we will change the date format from "yyyy-mm-dd" to "mm/dd/yyyy":

<?php
echo preg_replace("/(\d+)-(\d+)-(\d+)/", "$2/$3/$1", "2007-01-25");
?>

We also can pass an array of strings as subject to make the substitution on all of them. To perform
multiple substitutions on the same string or array of strings with one call to preg_replace(), we should
pass arrays of patterns and replacements. Have a look at the example:

<?php
// Note: the e modifier in the first pattern only works in PHP versions before 7.0.
$search = array("/(\w{6}\s\w{2})\s(\w+)/e",
                "/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/");

$replace = array('"$1 ".strtoupper("$2")',
                 "$3/$2/$1 $4");

$string = "Posted by John | 2007-02-15 02:43:41";

echo preg_replace($search, $replace, $string);
?>

The above example also uses another interesting feature: you can tell PHP that the replacement text
should be executed as PHP code once the back references have been substituted. Since we have appended
an "e" to the end of the regular expression, PHP will evaluate the replacement it makes. That is, it will
take strtoupper(name) and replace it with the result of the strtoupper() function, which is NAME. Note
that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, where preg_replace_callback() is
the supported way to compute a replacement.
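
On PHP versions without the e modifier, the same result can be sketched with preg_replace_callback(), which passes the array of matches to a function and uses its return value as the replacement. This is one possible rewrite of the example above, not the original author's code; it assumes PHP 5.3 or later for the anonymous function.

```php
<?php
$string = "Posted by John | 2007-02-15 02:43:41";

// First pass: upper-case the name with a callback instead of the e modifier.
$string = preg_replace_callback(
    '/(\w{6}\s\w{2})\s(\w+)/',
    function ($m) {
        // $m[1] is "Posted by", $m[2] is the name
        return $m[1] . ' ' . strtoupper($m[2]);
    },
    $string
);

// Second pass: a plain preg_replace() is enough to reorder the date.
$string = preg_replace(
    '/(\d{4})-(\d{2})-(\d{2})\s(\d{2}:\d{2}:\d{2})/',
    '$3/$2/$1 $4',
    $string
);

echo $string; // Posted by JOHN | 15/02/2007 02:43:41
?>
```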

Array Processing

PHP's preg_split() function enables you to break a string apart based on something more complicated
than a literal sequence of characters. When it's necessary to split a string with a dynamic expression
rather than a fixed one, this function comes to the rescue. The basic idea is the same as preg_match_all()
except that, instead of returning matched pieces of the subject string, it returns an array of pieces that
didn't match the specified pattern. The following example uses a regular expression to split the string by
any number of commas or space characters:

<?php
$keywords = preg_split("/[\s,]+/", "php, regular expressions");
print_r( $keywords );
?>

Another useful PHP function is the preg_grep() function which returns those elements of an array that
match a given pattern. This function traverses the input array, testing all elements against the supplied
pattern. If a match is found, the matching element is returned as part of the array containing all matches.
The following example searches through an array and returns all the names starting with the letters A-M:

<?php
$names = array('Andrew','John','Peter','Nastin','Bill');
$output = preg_grep('/^[a-m]/i', $names);
print_r( $output );
?>
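
Two details are easy to miss here: preg_grep() keeps the original array keys, and the PREG_GREP_INVERT flag returns the elements that do not match instead. A short sketch using the same list of names:

```php
<?php
$names = array('Andrew', 'John', 'Peter', 'Nastin', 'Bill');

$matching = preg_grep('/^[a-m]/i', $names);
$others   = preg_grep('/^[a-m]/i', $names, PREG_GREP_INVERT);

print_r($matching); // keys 0, 1 and 4: Andrew, John, Bill
print_r($others);   // keys 2 and 3: Peter, Nastin

// array_values() renumbers the result from 0 if the gaps are unwanted.
$renumbered = array_values($others);
?>
```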

preg_match
preg_match - Perform a regular expression match

Description

int preg_match ( string $pattern, string $subject [, array &$matches [, int $flags [, int $offset]]] )

Searches subject for a match to the regular expression given in pattern.

Parameters
pattern

The pattern to search for, as a string.

subject

The input string.

matches

If matches is provided, then it is filled with the results of search. $matches[0] will contain the
text that matched the full pattern, $matches[1] will have the text that matched the first captured
parenthesized subpattern, and so on.

flags

flags can be the following flag:

PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant string offset will also be returned.
Note that this changes the value of matches into an array where every element is an array consisting of
the matched string at index 0 and its string offset into subject at index 1.

offset

Normally, the search starts from the beginning of the subject string. The optional parameter
offset can be used to specify the alternate place from which to start the search (in bytes).

Note: Using offset is not equivalent to passing substr($subject, $offset) to
preg_match() in place of the subject string, because pattern can contain assertions such
as ^, $ or (?<=x). Compare:

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>

The above example will output:

Array
(
)

while this example

<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>

will produce

Array
(
[0] => Array
(
[0] => def
[1] => 0
)
)

Return Values

preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1
time because preg_match() will stop searching after the first match. preg_match_all() on the contrary
will continue until it reaches the end of subject. preg_match() returns FALSE if an error occurred.
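
Because 0 and FALSE are both "falsy", a plain if cannot tell "no match" from "error". The sketch below shows one way to separate the two cases with strict comparison. The wrapper name matches_or_fail() is hypothetical, and the malformed pattern is there only to provoke a failure.

```php
<?php
// Hypothetical wrapper: returns true or false for match or no match,
// and throws if preg_match() itself failed.
function matches_or_fail($pattern, $subject)
{
    // @ hides the warning PHP raises for a malformed pattern
    $result = @preg_match($pattern, $subject);
    if ($result === false) {
        throw new RuntimeException("preg_match() failed for pattern: $pattern");
    }
    return $result === 1;
}

var_dump(matches_or_fail('/php/i', 'PHP is fun'));  // bool(true)
var_dump(matches_or_fail('/perl/i', 'PHP is fun')); // bool(false)

try {
    matches_or_fail('/[unclosed/', 'PHP is fun');   // the "]" is missing
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
?>
```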

Examples

Example 1728. Find the string of text "php"

<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example 1729. Find the word "web"

<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
 * word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}

if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
?>

Example 1730. Getting the domain name out of a URL

<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
    "http://www.php.net/index.html", $matches);
$host = $matches[1];

// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>

The above example will output:

domain name is: php.net

Notes

Do not use preg_match() if you only want to check if one string is contained in another string. Use
strpos() or strstr() instead as they will be faster.
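
A minimal sketch of the strpos() alternative, using the sentence from the examples above. The point to watch is that strpos() returns position 0 for a match at the very start, so the result has to be compared with !== false rather than tested as a boolean.

```php
<?php
$haystack = 'PHP is the web scripting language of choice.';

// strpos() returns the position of the first occurrence, or false if there is none.
$containsWeb   = strpos($haystack, 'web') !== false;  // true
$startsWithPhp = strpos($haystack, 'PHP') === 0;      // true: found at position 0
$containsPerl  = strpos($haystack, 'Perl') !== false; // false
?>
```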

preg_match_all
The preg_match_all() function allows you to perform multiple matches on a given string
based on a single regular expression. For example:

<?php
$string = "a1bb b2cc c2dd";
$regex = "#([abc])\d#";
$matches = array();
if (preg_match_all($regex, $string, $matches)) {
    var_dump($matches);
}
?>

This script outputs the following:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "a1"
    [1]=>
    string(2) "b2"
    [2]=>
    string(2) "c2"
  }
  [1]=>
  array(3) {
    [0]=>
    string(1) "a"
    [1]=>
    string(1) "b"
    [2]=>
    string(1) "c"
  }
}

As you can see, all the whole-pattern matches are stored in the first sub-array of the
result, while the first captured subpattern of every match is stored in the corresponding
slot of the second sub-array.
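
If it is more convenient to have one sub-array per match instead of one per subpattern, preg_match_all() accepts the PREG_SET_ORDER flag as its fourth argument. A short sketch with the same input (the variable name $sets is just for this example):

```php
<?php
$string = "a1bb b2cc c2dd";
$regex  = "#([abc])\d#";

// With PREG_SET_ORDER, $sets[0] holds everything about the first match,
// $sets[1] everything about the second match, and so on.
preg_match_all($regex, $string, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    echo "full match: {$set[0]}, letter: {$set[1]}\n";
}
?>
```

This prints one line each for "a1", "b2" and "c2", which is often easier to loop over than the default layout.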

preg_replace
(PHP 4, PHP 5)

preg_replace - Perform a regular expression search and replace

Description
mixed preg_replace ( mixed $pattern, mixed $replacement, mixed $subject [, int $limit [, int
&$count]] )

Searches subject for matches to pattern and replaces them with replacement.

Parameters
pattern

The pattern to search for. It can be either a string or an array with strings.

The e modifier makes preg_replace() treat the replacement parameter as PHP code after the
appropriate references substitution is done. Tip: make sure that replacement constitutes a valid
PHP code string, otherwise PHP will complain about a parse error at the line containing
preg_replace(). Note that this modifier was deprecated in PHP 5.5 and removed in PHP 7.0.

replacement

The string or an array with strings to replace. If this parameter is a string and the pattern
parameter is an array, all patterns will be replaced by that string. If both pattern and
replacement parameters are arrays, each pattern will be replaced by the replacement
counterpart. If there are fewer elements in the replacement array than in the pattern array, any
extra patterns will be replaced by an empty string.

replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form
being the preferred one. Every such reference will be replaced by the text captured by the n'th
parenthesized pattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the
whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the
number of the capturing subpattern.

When working with a replacement pattern where a backreference is immediately followed by
another number (i.e.: placing a literal number immediately after a matched pattern), you cannot
use the familiar \\1 notation for your backreference. \\11, for example, would confuse
preg_replace() since it does not know whether you want the \\1 backreference followed by a
literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1.
This creates an isolated $1 backreference, leaving the 1 as a literal.

When using the e modifier, this function escapes some characters (namely ', ", \ and NULL) in
the strings that replace the backreferences. This is done to ensure that no syntax errors arise from
backreference usage with either single or double quotes (e.g. 'strlen(\'$1\')+strlen("$2")'). Make
sure you are aware of PHP's string syntax to know exactly what the interpreted string will look
like.

subject

The string or an array with strings to search and replace.

If subject is an array, then the search and replace is performed on every entry of subject, and
the return value is an array as well.

limit

The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no
limit).

count

If specified, this variable will be filled with the number of replacements done.
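
As a brief sketch of how limit and count work together (the sample sentence is made up), the call below replaces only the first two whitespace characters and then reports how many replacements were made:

```php
<?php
$text = 'one two three four';

// limit = 2: stop after two replacements; $count receives the actual number.
$result = preg_replace('/\s/', '_', $text, 2, $count);

echo $result, "\n"; // one_two_three four
echo $count, "\n";  // 2
?>
```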

Return Values
preg_replace() returns an array if the subject parameter is an array, or a string otherwise.

If matches are found, the new subject will be returned, otherwise subject will be returned unchanged.

Examples
Example 1736. Using backreferences followed by numeric literals

<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>

The above example will output:

April1,2003

Example 1737. Using indexed arrays with preg_replace()

<?php
$string = 'The quick brown fox jumped over the lazy dog.';
$patterns[0] = '/quick/';
$patterns[1] = '/brown/';
$patterns[2] = '/fox/';
$replacements[2] = 'bear';
$replacements[1] = 'black';
$replacements[0] = 'slow';
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The bear black slow jumped over the lazy dog.

By ksorting patterns and replacements, we should get what we wanted.

<?php
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>

The above example will output:

The slow black bear jumped over the lazy dog.


Example 1738. Replacing several values

<?php
$patterns = array ('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
'/^\s*{(\w+)}\s*=/');
$replace = array ('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
?>

The above example will output:

$startDate = 5/27/1999

Example 1739. Using the 'e' modifier

<?php
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
?>

This would capitalize all HTML tags in the input text.

Example 1740. Strip whitespace

This example strips excess whitespace from a string.

<?php
$str = 'foo   o';
$str = preg_replace('/\s\s+/', ' ', $str);
// This will be 'foo o' now
echo $str;
?>

Example 1741. Using the count parameter

<?php
$count = 0;

echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1 , $count);
echo $count; //3
?>

The above example will output:

xp***to
3

Notes
Note: When using arrays with pattern and replacement, the keys are processed in the order they
appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to
identify which pattern should be replaced by which replacement, you should perform a ksort() on
each array prior to calling preg_replace().

preg_split
(PHP 4, PHP 5)

preg_split - Split string by a regular expression

Description
array preg_split ( string $pattern, string $subject [, int $limit [, int $flags]] )

Split the given string by a regular expression.

Parameters
pattern

The pattern to search for, as a string.

subject

The input string.

limit

If specified, then only substrings up to limit are returned, and if limit is -1, it actually means
"no limit", which is useful for specifying the flags.

flags
flags can be any combination of the following flags (combined with bitwise | operator):

PREG_SPLIT_NO_EMPTY
If this flag is set, only non-empty pieces will be returned by preg_split().
PREG_SPLIT_DELIM_CAPTURE
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned
as well.
PREG_SPLIT_OFFSET_CAPTURE
If this flag is set, for every occurring match the appendant string offset will also be returned.
Note that this changes the return value into an array where every element is an array consisting of
the matched string at offset 0 and its string offset into subject at offset 1.
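
To illustrate combining flags, the sketch below splits a made-up arithmetic expression on its operators. PREG_SPLIT_DELIM_CAPTURE keeps the operators, because they are wrapped in parentheses in the pattern, and PREG_SPLIT_NO_EMPTY drops any empty pieces.

```php
<?php
$expr = '10+20 - 3';

$tokens = preg_split(
    '/\s*([+-])\s*/',  // an operator with optional surrounding whitespace
    $expr,
    -1,                // no limit, so that flags can be passed
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($tokens); // 10, +, 20, -, 3
?>
```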

Return Values
Returns an array containing substrings of subject split along boundaries matched by pattern.

ChangeLog

Version   Description
4.3.0     The PREG_SPLIT_OFFSET_CAPTURE flag was added
4.0.5     The PREG_SPLIT_DELIM_CAPTURE flag was added
4.0.0     The flags parameter was added

Examples
Example 1742. preg_split() example : Get the parts of a search string

<?php
// split the phrase by any number of commas or space characters,
// which include " ", \r, \t, \n and \f
$keywords = preg_split("/[\s,]+/", "hypertext language, programming");
?>

Example 1743. Splitting a string into component characters

<?php
$str = 'string';
$chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($chars);
?>

Example 1744. Splitting a string into matches and their offsets

<?php
$str = 'hypertext language programming';
$chars = preg_split('/ /', $str, -1, PREG_SPLIT_OFFSET_CAPTURE);
print_r($chars);
?>

The above example will output:

Array
(
    [0] => Array
        (
            [0] => hypertext
            [1] => 0
        )

    [1] => Array
        (
            [0] => language
            [1] => 10
        )

    [2] => Array
        (
            [0] => programming
            [1] => 19
        )

)

Notes
Tip

If you don't need the power of regular expressions, you can choose faster (albeit simpler) alternatives
like explode() or str_split().
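
For a fixed delimiter the simpler functions give the same result. A minimal sketch with a made-up comma-separated string:

```php
<?php
$csv = 'php,regular,expressions';

$viaExplode = explode(',', $csv);      // plain string delimiter
$viaRegex   = preg_split('/,/', $csv); // same split, but through the regex engine

var_dump($viaExplode === $viaRegex);   // bool(true)

// str_split() breaks a string into single characters (or fixed-size chunks).
$letters = str_split('string');        // s, t, r, i, n, g
?>
```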
