
Oracle and Regular Expressions - James Koopmann

UNIX comes to Oracle in the form of regular expressions to increase the power of searching.
In my last article, on case insensitivity, I made the point that the only way to make an
application completely case insensitive was to get away from the LIKE comparison and
begin using regular expression matching with the REGEXP_LIKE function. If you missed that
article, you can find it at the following URL:
http://www.databasejournal.com/features/oracle/article.php/3494646. I thought it would be
beneficial to dedicate an article to regular expression matching so that we, myself included, get
acquainted or re-acquainted with the power these expressions can give us.
I was introduced to regular expressions a long time ago when programming in Perl or UNIX
shell scripting with sed and awk commands. In addition, if you use the vi editor you may be
familiar with regular expressions and pattern matching. Regular expressions are in effect a
small programming language that is specific to matching character patterns in a text string. To
be honest, when I started using pattern matching in my Perl days it was quite
confusing, mostly because there seemed to be so many variations and options that I never knew
where to start. After a while, though, I did get the hang of it, and you will too. Take some time
to go through this article and learn how powerful this pattern matching can be.
As stated, regular expressions are designed to match character strings. Stated another way,
and possibly a bit more clearly, regular expressions are templates against which character
strings are compared to see whether the string matches, or fits, the template. The closest
comparison I can give is the LIKE operator, which
does some forms of pattern or template matching. For example, if we wanted to search a
character string for the existence of 'Deborah' we could issue the following SQL.
SELECT text FROM my_text WHERE text LIKE '%Deborah%';
This in effect will return any row where 'Deborah' is within the character string text. To get a
quick comparison on how to do this with regular expression matching you would issue the
following SQL that contains the new function REGEXP_LIKE.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, 'Deborah');
So why would we want to use regular expressions instead of just using the LIKE comparison
that we are all too familiar with? Mostly because as soon as you start asking higher level
questions about the "likeness" of a string to a pattern you will most likely end up with a
WHERE condition that will have multiple 'OR' and 'LIKE' matches to weed through to find what
you are truly looking for. With regular expressions, you can typically just provide one pattern to
the comparison. Consider the previous example again. Suppose now that we need to search
a novel and the main character Deborah is also referred to as 'Debbie'. We could modify
our condition to look for text where it is LIKE '%Deb%' and we might be able to solve the
problem. However, what happens if this particular novel also discusses the finances of our
high-class society girl? We might end up finding sentences that contain words such as
'Debt' or 'Debutante'. We now have a more complex issue. Therefore, to make sure that we
find only the text that actually refers to Deborah, we should use a regular
expression. Here is the SQL you should issue. It now finds all of the sentences where the novel
actually talks about Deborah.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, 'Deb(bie|orah)');

You can see that within the pattern matching we basically programmed the option for the text
to contain 'Debbie' or the alternative 'Deborah' by giving the option (bie|orah) in the
expression. This is a basic form of providing a template to match a string against.
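To see how the alternation Deb(bie|orah) separates the names from look-alike words, the pattern can be tried outside the database. The following is a minimal sketch in Python rather than Oracle SQL; it assumes that Python's re module treats this simple pattern the same way REGEXP_LIKE does, and the sample sentences are invented for illustration.

```python
import re

# The alternation from the text: 'Deb' followed by either 'bie' or 'orah'.
pattern = re.compile(r"Deb(bie|orah)")

# Hypothetical sample sentences, made up for this illustration.
sentences = [
    "Deborah arrived late.",
    "Everyone called her Debbie.",
    "The Debt was never repaid.",
    "She was the Debutante of the season.",
]

# re.search looks for the pattern anywhere in the string, which mirrors
# how an unanchored pattern is applied by REGEXP_LIKE.
matches = [s for s in sentences if pattern.search(s)]
print(matches)
```

Only the sentences that mention 'Deborah' or 'Debbie' are kept; 'Debt' and 'Debutante' fail because neither 'bie' nor 'orah' follows 'Deb'.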
Another example where only regular expressions will do is finding email
addresses. You could quickly do this by providing the following expression. Basically, you are
building the portions of the email address before and after the '@' sign. Do take a close look at this
pattern, as it requires the three distinct pieces of an email address (the name, the domain, and
the top-level domain) and allows for periods '.' as well as underscores '_' in the name, all of
which are quite valid. Because the character classes list only uppercase letters, the 'i' match
parameter is supplied so that the comparison ignores case.
SELECT text FROM my_text
WHERE REGEXP_LIKE (text, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}', 'i');
Therefore, if you now only wanted to search for '.com' email addresses you could change the
SQL to the following, passing the 'i' match parameter so that case is ignored.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, '[A-Z0-9._%-]+@[A-Z0-9._%-]+\.com', 'i');
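The two email patterns can be checked against a few strings before they go into a WHERE clause. This is a rough sketch in Python, not Oracle SQL: re.IGNORECASE stands in for case-insensitive matching (an assumption about how you would run the query, since the character classes list only uppercase letters), and the sample strings are hypothetical.

```python
import re

# General pattern: name, '@', domain, '.', then 2 to 4 letters.
EMAIL = re.compile(r"[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z]{2,4}", re.IGNORECASE)

# Narrower pattern: the same name and domain, but ending in '.com'.
DOT_COM = re.compile(r"[A-Z0-9._%-]+@[A-Z0-9._%-]+\.com", re.IGNORECASE)

# Hypothetical sample rows, made up for this illustration.
samples = [
    "Write to john_smith@example.com today.",
    "Contact: Jane.Doe@Example.ORG",
    "No address here, just an @ sign.",
]

# One boolean per sample: does the text contain a match anywhere?
any_email = [bool(EMAIL.search(s)) for s in samples]
dot_com = [bool(DOT_COM.search(s)) for s in samples]

print(any_email)
print(dot_com)
```

Note that neither pattern is anchored, so the '.com' version would also accept a domain that merely begins with '.com'; tightening that is left to the reader's requirements.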
Another often-difficult task is to find whether there is a number in a string of text or to validate that a
character string is numeric. To find whether a string contains a number with a
decimal point, you could use this method. Keep in mind that a number with a
decimal point should have at least one digit after the point, so I have specified a '+' to
indicate one or more digits, while the '?' makes the digit before the point optional.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, '[0-9]?\.[0-9]+');
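The difference between finding a number inside text and validating that a whole string is numeric comes down to anchoring. The sketch below uses Python rather than Oracle SQL. CONTAINS_DECIMAL is the pattern from the query above; IS_NUMERIC is my own assumption of what an anchored validation pattern might look like and does not come from the article. All sample strings are invented.

```python
import re

# Pattern from the text: optional digit, a literal '.', one or more digits.
CONTAINS_DECIMAL = re.compile(r"[0-9]?\.[0-9]+")

# Assumption, not from the article: '^' and '$' force the whole string to be
# digits with an optional fractional part.
IS_NUMERIC = re.compile(r"^[0-9]+(\.[0-9]+)?$")

# Hypothetical text rows: does a decimal number appear anywhere?
texts = ["Total due: 3.75", "Version 10", "Ends with a period.", ".5 is fine"]
contains = [bool(CONTAINS_DECIMAL.search(t)) for t in texts]

# Hypothetical candidate values: is the entire string a number?
values = ["42", "3.14", "3.", "abc", "12 apples"]
numeric = [bool(IS_NUMERIC.search(v)) for v in values]

print(contains)
print(numeric)
```

'Ends with a period.' is rejected because no digit follows the point, and '12 apples' fails validation because the anchors do not allow trailing text.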
Regular expressions are quite powerful when looking for text patterns or validating a specific
pattern such as a phone number, email address, IP address, or just about any pattern that
needs to be checked. On the flip side of this is checking for invalid formats and patterns of
characters. In our previous example, we could easily have searched for text where 'Debbie' is
misspelled as 'Debbbie' or 'Debie'.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, '(Debbbie|Debie)');
On the other hand, to catch those with really heavy fingers, where the 'b' may have been keyed
in three or more times, we could use the following.
SELECT text FROM my_text WHERE REGEXP_LIKE (text, '(Deb)(b){2,}');
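A quick check of the misspelling patterns shows exactly which spellings each one catches. This is a sketch in Python rather than Oracle SQL, on the assumption that alternation and the {2,} quantifier behave the same in both; the alternation Debbbie|Debie is one illustrative way to cover the two misspellings named above.

```python
import re

# Either of the two specific misspellings.
TWO_MISSPELLINGS = re.compile(r"Debbbie|Debie")

# 'Deb' followed by two or more further 'b' characters,
# which means three or more 'b's in total.
HEAVY_FINGERS = re.compile(r"(Deb)(b){2,}")

# The correct spelling plus three misspellings.
names = ["Debbie", "Debie", "Debbbie", "Debbbbbie"]

typo_hits = [n for n in names if TWO_MISSPELLINGS.search(n)]
heavy_hits = [n for n in names if HEAVY_FINGERS.search(n)]

print(typo_hits)
print(heavy_hits)
```

The correctly spelled 'Debbie' is matched by neither pattern, so both are safe to use for hunting down only the bad rows.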
Regular expressions may seem very convoluted at first. It even took me a bit of effort to
relearn these patterns, as I have not used them for a bit. However, after taking the time to
learn how to construct them you will be very thankful you did when confronted with difficult
search criteria.
So where do you learn how to construct regular expressions? This really is not as difficult as
you might think. Regular expressions have been around for a long time in the UNIX world and
thus there is a lot of information on the net. The Oracle manuals have a few pages, about 5,
that are dedicated to regular expressions so your best bet is to start there and supplement
heavily from the net. You can easily search on Yahoo, Google, or your favorite search engine
for tutorials and web sites strictly dedicated to regular expressions.
