
Regular Languages, Regular Operations
September 11, 2001

Agenda
Today

Regular languages
Finite languages are regular

Regular operations on languages


Union (∪)
Concatenation (∘)
Kleene star (*)

For next time:

Read 1.3 and handout on minimization

Thursday, 9/20 (revised): HW1 collected



Definition of Regular Language
Recall the definition of a regular language:
DEF: The language accepted by an FA M
is the set of all strings which are accepted
by M and is denoted by L(M). A language is
regular if it is accepted by some FA.
We would like to understand what types of
languages are regular. Languages of this
type are amenable to super-fast
recognition of their elements: an FA reads
each input symbol once and needs no extra memory.
It would be nice to know, for example, which
of the following are regular:

Language Examples

Unary prime numbers (1^n denotes the string of n ones):
{ 11, 111, 11111, 1111111, 11111111111, … }
= { 1^2, 1^3, 1^5, 1^7, 1^11, 1^13, … }
= { 1^p | p is a prime number }

Unary squares:
{ ε, 1, 1^4, 1^9, 1^16, 1^25, 1^36, … }
= { 1^n | n is a perfect square }

Palindromic bit strings (x^R denotes the reversal of x):
{ ε, 0, 1, 00, 11, 000, 010, 101, 111, … }
= { x ∈ {0,1}* | x = x^R }

Will explore whether or not these are regular
in future.
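
The three example languages can be made concrete as membership tests. The sketch below uses ordinary Python predicates (the function names are illustrative, not from the lecture). These are general-purpose programs, not finite automata, so they say nothing about whether the languages are regular; they only pin down which strings belong.

```python
import math


def is_unary(s: str) -> bool:
    """True if s uses only the symbol '1' (the empty string qualifies)."""
    return all(ch == "1" for ch in s)


def is_prime(n: int) -> bool:
    """Trial division; adequate for short example strings."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def in_unary_primes(s: str) -> bool:
    """Membership in { 1^p | p is a prime number }."""
    return is_unary(s) and is_prime(len(s))


def in_unary_squares(s: str) -> bool:
    """Membership in { 1^n | n is a perfect square }; n = 0 gives the empty string."""
    n = len(s)
    return is_unary(s) and math.isqrt(n) ** 2 == n


def in_palindromes(s: str) -> bool:
    """Membership in { x in {0,1}* | x equals its own reversal }."""
    return set(s) <= {"0", "1"} and s == s[::-1]
```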

Finite Languages
All the previous examples had the following
property in common: infinite cardinality
NOTE: The strings which made up the
language were finite (as they always will
be in this course); however, the collection
of such strings was infinite.
Before looking at infinite languages, should
definitely look at finite languages.

Languages of Cardinality 1
Q: Is the singleton language
containing one string regular? For
example, is
{ banana }
regular?

Languages of Cardinality 1
A: Yes.

Q: What's wrong with this example?

Languages of Cardinality 1
A: Nothing, really. This is an example of a
nondeterministic FA. This turns out to be
the most concise way to encapsulate the
language { banana }.

But we will deal with nondeterminism in
coming lectures. So:
Q: Is there a way of fixing this and making it
deterministic?

Languages of Cardinality 1
A: Yes, just add a fail state q7; i.e., add
a state that sucks in, for all eternity, every
string that is not a prefix of banana. The only
strings that stay out of q7 are banana itself and
its proper prefixes {ε, b, ba, ban, bana, banan}.
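
The fail-state fix can be sketched as a small simulation. The numbering of the non-fail states as 0 through 6 (one per prefix of banana) is an assumption; the slide only names the fail state q7.

```python
WORD = "banana"
ACCEPT = len(WORD)      # state 6: the whole word has been read
FAIL = len(WORD) + 1    # state 7: the fail (sink) state q7


def step(state: int, ch: str) -> int:
    """Deterministic transition: advance along the word, otherwise fall into FAIL."""
    if state < len(WORD) and ch == WORD[state]:
        return state + 1
    return FAIL  # covers wrong symbols, extra symbols after the word, and FAIL itself


def accepts(s: str) -> bool:
    """Run the DFA on s starting from state 0."""
    state = 0
    for ch in s:
        state = step(state, ch)
    return state == ACCEPT
```

Every state has exactly one successor for every input symbol, which is what makes the machine deterministic.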

Two Strings
Q: How about two strings? For
example
{ banana, nab } ?


Two Strings
A: Just add another route:


Arbitrary Finite Number of Strings
Q1: How about more? For example
{ banana, nab, ban, babba } ?
Q2: Or less (the empty set):
∅ = {} ?


Arbitrary Finite Number of Strings
A1:


Arbitrary Finite Number of Strings: Empty Language
A2: Build a 1-state automaton
whose set of accept states F is
empty!


Arbitrary Finite Number of Strings
THM: All finite languages are regular.
Proof: Can always construct a tree whose leaves
are word endings. In our example the tree is:
[Figure: tree with branches b-a-n-a-n-a, b-a-b-b-a and n-a-b]
Now make word endings accept states, add a
fail sink-state and add links to the fail state to
finish the construction.
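
The tree construction in the proof can be sketched in code. This is a minimal illustration, not the lecture's own implementation: states are integers with 0 as the root, and the fail sink-state is represented by the hypothetical constant FAIL, reached whenever no tree edge exists.

```python
FAIL = -1  # hypothetical label for the fail sink-state


def build_tree_dfa(words):
    """Return (transitions, accept_states) for a finite set of words.

    transitions maps (state, symbol) -> state; state 0 is the root.
    """
    transitions = {}
    accept = set()
    next_state = 1
    for word in words:
        state = 0
        for ch in word:
            if (state, ch) not in transitions:
                transitions[(state, ch)] = next_state  # grow a new branch
                next_state += 1
            state = transitions[(state, ch)]
        accept.add(state)  # word ending becomes an accept state
    return transitions, accept


def accepts(dfa, s):
    """Simulate the DFA; any missing edge leads to FAIL, which has no way out."""
    transitions, accept = dfa
    state = 0
    for ch in s:
        state = transitions.get((state, ch), FAIL)
    return state in accept


dfa = build_tree_dfa(["banana", "nab", "ban", "babba"])
```

Calling build_tree_dfa with no words gives a single state and no accept states, which matches the automaton for the empty language.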


Infinite Cardinality
Q: Are all regular languages finite?


Infinite Cardinality
A: No! Many infinite languages are regular.
Common Mistake 1: The strings of regular
languages are finite, therefore the regular
languages must be finite.
Common Mistake 2: Regular languages are
by definition accepted by finite
automata, therefore regular languages are
finite.
Q: Give an example of an infinite but regular
language.

Infinite Cardinality

bit strings with an even number of b's

Simplest example is Σ*

many, many more
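
The even-number-of-b's example shows how a finite machine can accept infinitely many strings. A minimal sketch, assuming that any symbol other than b leaves the state unchanged; the state names "even" and "odd" are illustrative.

```python
def even_number_of_bs(s: str) -> bool:
    """Two-state DFA: the state records the parity of the b's read so far."""
    state = "even"  # start state, also the only accept state
    for ch in s:
        if ch == "b":
            state = "odd" if state == "even" else "even"
    return state == "even"
```

The machine has two states, yet it accepts strings of every length, so a finite automaton does not imply a finite language.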


Home exercise: think of a criterion for
non-finiteness


Regular Operations
You may have come across the regular
operations when doing advanced searches
utilizing programs such as emacs, egrep,
perl, python, etc. There are three basic
operations we will work with:
1. Union
2. Concatenation
3. Kleene-star
And a fourth definable in terms of the previous:
4. Kleene-plus


Regular Operations
Summarizing Table
Operation      Symbol  UNIX version      Meaning
Union          ∪       |                 match one of the patterns
Concatenation  ∘       implicit in UNIX  match patterns in sequence
Kleene-star    *       *                 match pattern 0 or more times
Kleene-plus    +       +                 match pattern 1 or more times

Regular operations - Union


UNIX: to search for all lines containing
vowels in a text one could use the
command
egrep -i 'a|e|i|o|u'
Here the pattern "vowel" is matched
by any line containing one of a, e, i,
o or u.
Q: What is a string pattern?

String Patterns
A: A good way to define a pattern is
as a set of strings, i.e. a language.
The language for a given pattern is
the set of all strings satisfying the
predicate of the pattern.
EG: vowel-pattern =
{ the set of strings which
contain at least one of: a e i o u }

UNIX patterns vs. Computability patterns
In UNIX, a pattern is implicitly
assumed to occur as a substring of
the matched strings.
In our course, however, a pattern
needs to specify the whole string,
and not just a substring.
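
The substring versus whole-string distinction can be tried out with Python's re module, used here as a stand-in for egrep (an assumption: the two tools share the syntax used below but are not identical in general). re.search looks for the pattern anywhere in the string, re.fullmatch requires the pattern to describe the entire string.

```python
import re

VOWEL = r"a|e|i|o|u"


def unix_style_match(pattern: str, line: str) -> bool:
    """UNIX convention: the pattern may occur anywhere inside the line."""
    return re.search(pattern, line, flags=re.IGNORECASE) is not None


def whole_string_match(pattern: str, s: str) -> bool:
    """Course convention: the pattern must specify the whole string."""
    return re.fullmatch(pattern, s, flags=re.IGNORECASE) is not None


def anchored_unix_match(pattern: str, line: str) -> bool:
    """Anchoring with ^ and $ turns a substring search into a whole-line match."""
    return re.search(r"^(" + pattern + r")$", line, flags=re.IGNORECASE) is not None
```

For example, "banana" matches the vowel pattern in the UNIX sense because it contains an a, but not in the whole-string sense, since the only whole strings the pattern describes are the single vowels.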


Regular operations - Union


Computability: union is exactly what
we expect. If you have patterns
A = {aardvark}, B = {bobcat},
C = {chimpanzee}
union the patterns together to get
A ∪ B ∪ C = {aardvark, bobcat,
chimpanzee}

Regular operations - Concatenation
UNIX: to search for all consecutive
double occurrences of vowels, use:
egrep -i '(a|e|i|o|u)(a|e|i|o|u)'

Here the pattern "vowel" has been
repeated. Parentheses have been
introduced to specify where exactly
in the pattern the concatenation is
occurring.

Regular operations - Concatenation


Computability. Consider the
previous result:
L = {aardvark, bobcat, chimpanzee}
Q: What language results when we
concatenate L with itself obtaining
LL ?

Regular operations - Concatenation


A: LL =
{aardvark, bobcat, chimpanzee}{aardvark, bobcat, chimpanzee}

=
{aardvarkaardvark, aardvarkbobcat, aardvarkchimpanzee,
bobcataardvark, bobcatbobcat, bobcatchimpanzee,
chimpanzeeaardvark, chimpanzeebobcat,
chimpanzeechimpanzee}
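
Concatenation of languages can be checked mechanically by representing each finite language as a Python set of strings. A minimal sketch; the helper name concat is illustrative.

```python
def concat(A, B):
    """All strings xy with x taken from A and y taken from B."""
    return {x + y for x in A for y in B}


L = {"aardvark", "bobcat", "chimpanzee"}
LL = concat(L, L)

for word in sorted(LL):
    print(word)
```

With 3 choices for the first half and 3 for the second, and no two pairs producing the same string here, LL has 9 elements.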

Q1: What is L{ε} ?
Q2: What is L∅ ?

Algebra of Languages
A1: L{ε} = L. In general, {ε} is the identity in
the algebra of languages. I.e., if we think of
concatenation as being like multiplication,
{ε} acts like the number 1.
A2: L∅ = ∅. Opposite to {ε}, ∅ acts like the
number zero, obliterating everything it is
concatenated with.
Note: We can carry on the analogy between
numbers and languages. Addition becomes
union, multiplication becomes concatenation.
This forms a so-called algebra.
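
The number analogy can be tested on small finite languages. In the sketch below, the language containing only the empty string plays the role of 1 and the empty language plays the role of 0. The distributivity check goes slightly beyond what the slide states and is included only to show how far the analogy with arithmetic carries; all names are illustrative.

```python
def union(A, B):
    """Plays the role of addition."""
    return A | B


def concat(A, B):
    """Plays the role of multiplication."""
    return {x + y for x in A for y in B}


EPSILON = {""}   # language containing only the empty string: acts like 1
EMPTY = set()    # the empty language: acts like 0

L = {"aardvark", "bobcat", "chimpanzee"}
A = {"a", "ab"}
B = {"b", ""}

identity_holds = concat(L, EPSILON) == L and concat(EPSILON, L) == L
zero_holds = concat(L, EMPTY) == EMPTY and concat(EMPTY, L) == EMPTY
additive_zero_holds = union(L, EMPTY) == L
distributes = concat(L, union(A, B)) == union(concat(L, A), concat(L, B))
```

One place where the analogy stops: concatenation is not commutative, so concat(A, B) and concat(B, A) generally differ.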


Regular operations - Kleene-*
UNIX: search for lines consisting purely of
vowels (including the empty line):
egrep -i '^(a|e|i|o|u)*$'

NOTE: ^ and $ are special symbols in UNIX
regular expressions which respectively
anchor the pattern at the beginning and
end of a line. The trick above can be used
to convert any Computability regular
expression into an equivalent UNIX form.

Regular operations - Kleene-*
Computability: Suppose we have a
language
B = { ba, na }
Q: What is the language B * ?


Regular operations - Kleene-*
A:
B* = { ba, na }* =
{ ε,
ba, na,
baba, bana, naba, nana,
bababa, babana, banaba, banana,
nababa, nabana, nanaba, nanana,
babababa, bababana, … }
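
Since B* is infinite, a program can only list it up to some length bound. The sketch below enumerates every string of B* whose length is at most max_len; the function name and the bound parameter are illustrative.

```python
def star_up_to(B, max_len):
    """All strings of B* with length <= max_len.

    Starts from the empty string (zero repetitions) and repeatedly
    appends one more element of B until nothing new fits the bound.
    """
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {
            x + y for x in frontier for y in B if len(x + y) <= max_len
        } - result
        result |= frontier
    return result


B = {"ba", "na"}
for word in sorted(star_up_to(B, 6), key=lambda w: (len(w), w)):
    print(repr(word))
```

For B = { ba, na } the bound 6 yields 1 + 2 + 4 + 8 = 15 strings, including the empty string and banana.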

Regular operations - Kleene-+
Kleene-+ is just like Kleene-* except that the
pattern is forced to occur at least once.
UNIX: search for lines consisting purely of vowels
(not including the empty line):
egrep -i '^(a|e|i|o|u)+$'

Computability: B+ = { ba, na }+ =
{ ba, na,
baba, bana, naba, nana,
bababa, babana, banaba, banana,
nababa, nabana, nanaba, nanana,
babababa, bababana, … }
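
Kleene-plus can be expressed through the earlier operations as B+ = B∘B*, i.e. one forced occurrence followed by zero or more further ones. A bounded sketch of that identity; the helper names are illustrative, and star_up_to is repeated here so the block runs on its own.

```python
def star_up_to(B, max_len):
    """All strings of B* with length <= max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {
            x + y for x in frontier for y in B if len(x + y) <= max_len
        } - result
        result |= frontier
    return result


def plus_up_to(B, max_len):
    """All strings of B+ with length <= max_len, computed as B followed by B*."""
    star = star_up_to(B, max_len)
    return {x + y for x in B for y in star if len(x + y) <= max_len}


B = {"ba", "na"}
print(sorted(plus_up_to(B, 4), key=lambda w: (len(w), w)))
```

For this B the result is B* without the empty string. That is not true in general: if B itself contains the empty string, then so does B+.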


Generating the Regular Languages
The real reason that regular languages are
called regular is the following:
THM: The regular languages are all those
languages which can be generated
starting from the finite languages by
applying the regular operations.
This will be proved in the coming lectures.
Q: Can we start with even more basic
languages than arbitrary finite
languages?

Generating the Regular Languages
A: Yes. We can start with languages consisting
of single strings which are themselves just a
single character. These are the atomic
regular languages.
EG: To generate the finite language
L = { banana, nab }
we can start with the atomic languages
A = {a}, B = {b}, N = {n}.
Then we can express L as:
L = (B∘A∘N∘A∘N∘A) ∪ (N∘A∘B)
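
The decomposition into atomic languages can be verified directly with sets of strings. A minimal sketch; concat and concat_all are illustrative helper names.

```python
from functools import reduce


def concat(X, Y):
    """Concatenation of two languages."""
    return {x + y for x in X for y in Y}


def concat_all(*languages):
    """Concatenate several languages left to right, starting from {empty string}."""
    return reduce(concat, languages, {""})


# Atomic languages: one string each, one character per string.
A, B, N = {"a"}, {"b"}, {"n"}

# Concatenation builds each word, union collects the words.
L = concat_all(B, A, N, A, N, A) | concat_all(N, A, B)
print(L == {"banana", "nab"})
```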

Blackboard Exercises
Express the DFA patterns from the
previous board-exercises using
regular operations in both UNIX-style and Computability-style.

