
2. Regular Sets and Regular Grammars

2.1 Pumping Lemma for Regular Sets


Theorem: Let L be a regular language. Then there is a constant n such that for every string
z in L with |z| ≥ n, we can break z into three strings, z = uvw, such that
1. v ≠ ϵ
2. |uv| ≤ n
3. For all i ≥ 0, the string $uv^iw$ is also in L.
Proof:
Suppose L is regular. Then L = L(A) for some DFA A. Suppose A has n states. Now,
consider any string z in L of length n or more, say z = a1a2a3…am, where m ≥ n and each ai is an
input symbol. For i = 0, 1, …, n define state qi to be $\hat{\delta}(q_0, a_1a_2\cdots a_i)$, where δ is the transition function
of A, $\hat{\delta}$ is its extension to strings, and q0 is the start state of A (for i = 0 this gives q0 itself). That is, qi is the state A is in after reading the first i symbols
of z.
It is not possible for the n+1 states qi, for i = 0, 1, …, n, to be distinct, since there
are only n different states. Thus, we can find two different integers i and j with 0 ≤ i < j ≤ n, such
that qi = qj. We can break z = uvw as follows:
1. u = a1 … ai
2. v = ai+1 … aj
3. w = aj+1 … am
That is, u takes us to qi once; v takes us from qi back to qi (since qi = qj); and w is the balance of z.
The relationships among the strings and states are suggested by the figure below. Note that u
and w may be empty. However, v cannot be empty, since i is strictly less than j. Moreover, |uv| = j ≤ n.
[Figure: starting at q0, A reads u = a1 … ai and reaches qi = qj; v = ai+1 … aj loops from qi back to itself; w = aj+1 … am then leads to qm.]

Now, consider what happens if the automaton A receives the input $uv^kw$ for any k ≥ 0 (we write
the exponent as k to avoid a clash with the state index i). Since z ∈ L, the state qm reached on z is accepting.
If k = 0, then A goes from the start state q0 to qi on input u. Since qi is also qj, and A goes from qj
to the accepting state qm on input w, A also goes from qi to qm on input w. Thus, A accepts uw.
If k > 0, then A goes from q0 to qi on input u, circles from qi to qi k times on input $v^k$, and
then goes to the accepting state qm on input w. Thus, for any k ≥ 0, the string $uv^kw$ is in L.
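The pigeonhole step in the proof is constructive, so it can be traced on a concrete DFA. The following is a minimal Python sketch, not part of the original notes: it assumes a DFA encoded as a dictionary `delta[(state, symbol)]`, and the helper names `run` and `pump_split` are hypothetical. It walks through q0, q1, …, qn and cuts z at the first repeated state.

```python
from typing import Dict, Tuple

# Assumed encoding for this sketch: delta[(state, symbol)] -> next state.
Delta = Dict[Tuple[str, str], str]


def run(delta: Delta, q: str, s: str) -> str:
    """Extended transition function: the state reached from q after reading s."""
    for a in s:
        q = delta[(q, a)]
    return q


def pump_split(delta: Delta, q0: str, n: int, z: str) -> Tuple[str, str, str]:
    """Return (u, v, w) with z = uvw, v nonempty and |uv| <= n.

    Assumes n is the number of states and len(z) >= n. The states
    q_0, ..., q_n cannot all differ, so the loop finds q_i == q_j with i < j.
    """
    first_seen: Dict[str, int] = {}  # state -> smallest index at which it occurred
    q = q0
    for j in range(n + 1):
        if q in first_seen:
            i = first_seen[q]
            return z[:i], z[i:j], z[j:]
        first_seen[q] = j
        if j < n:
            q = delta[(q, z[j])]  # q becomes q_{j+1}
    raise ValueError("no repeated state: n must be at least the number of states")


# Example DFA: strings over {a, b} with an even number of a's (accepting state "E").
delta: Delta = {("E", "a"): "O", ("O", "a"): "E", ("E", "b"): "E", ("O", "b"): "O"}
u, v, w = pump_split(delta, "E", 2, "aab")
print(repr(u), repr(v), repr(w))  # '' 'aa' 'b'
# Pumping v any number of times keeps the string in the language.
print(all(run(delta, "E", u + v * k + w) == "E" for k in range(6)))  # True
```

For this two-state DFA, z = aab splits as u = ϵ, v = aa, w = b, and every pumped string still has an even number of a's.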

2.2 Closure Properties of Regular Sets


Let L1 and L2 be two regular sets, and apply some operation to L1 and L2 to get the resultant
language L3. If L3 is always regular, we say that regular sets are closed under that operation.
We call this a closure property.
The following are some closure properties of regular sets:
1) Regular sets are closed under union, concatenation and Kleene closure.
2) Regular sets are closed under complementation and intersection.
3) Regular sets are closed under difference.
4) Regular sets are closed under substitution and homomorphism.
5) Regular sets are closed under quotient.
6) Regular sets are closed under reversal.

2.2.1 Union, Concatenation and Kleene Closure
Theorem: Prove that regular sets are closed under union, concatenation and Kleene closure.
Proof: Let L1, L2 be two regular sets.
Let M1, M2 be finite automata accepting L1 and L2.
Let r1, r2 be regular expressions denoting L1, L2, i.e.
L1 = L(r1), L2 = L(r2).
By the definition of regular expressions, r1+r2, r1r2 and r1* are regular expressions
denoting the regular sets L1 ∪ L2, L1L2 and L1* respectively. So, regular sets are closed under
union, concatenation and Kleene closure.
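The three operators carry over directly to practical regex libraries. As a sketch, assuming Python's `re` syntax (the text's r1 + r2 is written with `|`, and `(?:…)` is only a grouping construct), the example expressions r1 and r2 below are hypothetical choices, not taken from the notes.

```python
import re

# Assumed translation into Python's re syntax: the text's r1 + r2 is written
# with "|", and (?:...) is a non-capturing group used only for grouping.
r1 = "0(?:0|1)*"  # hypothetical r1: strings over {0, 1} that start with 0
r2 = "(?:0|1)*1"  # hypothetical r2: strings over {0, 1} that end with 1

union = f"(?:{r1})|(?:{r2})"   # r1 + r2, denoting L1 ∪ L2
concat = f"(?:{r1})(?:{r2})"   # r1 r2, denoting L1 L2
star = f"(?:{r1})*"            # r1*, denoting L1*


def accepts(pattern: str, s: str) -> bool:
    """True if the whole of s belongs to the language denoted by pattern."""
    return re.fullmatch(pattern, s) is not None


print(accepts(union, "11"), accepts(union, "10"))   # True False
print(accepts(concat, "01"), accepts(concat, "1"))  # True False
print(accepts(star, ""), accepts(star, "0010"))     # True True
```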
2.2.2 Complementation and Intersection
Theorem: Prove that regular sets are closed under complementation.
Proof: Let L be a regular set accepted by a DFA M = (Q, ∑, ẟ, q0, F), i.e. L = L(M), where ẟ is defined for every state and input symbol.
We construct another DFA M' = (Q, ∑, ẟ, q0, Q – F), i.e. M and M' differ only in their
final states. A final state of M' is a non-final state of M and vice versa.
Then w ∈ L(M') iff $\hat{\delta}(q_0, w) \in Q - F$, i.e. iff w ∉ L.
Thus M' accepts no string in L, and it accepts every string in ∑* – L = $\overline{L}$. Therefore $\overline{L}$
is regular since it is accepted by M'. Hence regular sets are closed under complementation.
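The construction only swaps the final states, but it relies on ẟ being total: a missing transition would reject a string in both M and M'. A minimal Python sketch, assuming a hypothetical tuple encoding (Q, ∑, ẟ, q0, F) with ẟ as a dictionary, checks this precondition and then compares M and M' on all short strings.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Assumed encoding of a DFA: (Q, sigma, delta, q0, F) with delta[(state, symbol)] -> state.
DFA = Tuple[Set[str], Set[str], Dict[Tuple[str, str], str], str, Set[str]]


def accepts(m: DFA, w: str) -> bool:
    """Run m on w and report whether it stops in a final state."""
    _, _, delta, q, final = m
    for a in w:
        q = delta[(q, a)]
    return q in final


def complement(m: DFA) -> DFA:
    """Swap final and non-final states; requires a total transition function."""
    states, sigma, delta, q0, final = m
    assert all((q, a) in delta for q in states for a in sigma), "delta must be total"
    return states, sigma, delta, q0, states - final


# Example: strings over {0, 1} that end in 1 (state B means "last symbol was 1").
M: DFA = ({"A", "B"}, {"0", "1"},
          {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"},
          "A", {"B"})
M_bar = complement(M)

# Every string of length <= 4 is accepted by exactly one of M and M_bar.
words = ["".join(t) for n in range(5) for t in product("01", repeat=n)]
print(all(accepts(M, w) != accepts(M_bar, w) for w in words))  # True
print(accepts(M_bar, ""), accepts(M_bar, "10"))                 # True True
```

If a DFA has missing transitions, a common fix is to add a non-final dead state that receives them before complementing.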
Theorem: Prove that regular sets are closed under intersection.
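Proof: Let L1, L2 be two regular sets accepted by DFAs M1 = (Q1, ∑, ẟ1, q1, F1) and M2 = (Q2, ∑,
ẟ2, q2, F2) such that L1 = L(M1) and L2 = L(M2).
By De Morgan's law,
$$L_1 \cap L_2 = \Sigma^* - [(\Sigma^* - L_1) \cup (\Sigma^* - L_2)] = \Sigma^* - [\overline{L_1} \cup \overline{L_2}] = \Sigma^* - L_3 = \overline{L_3},$$
where $L_3 = \overline{L_1} \cup \overline{L_2}$. Since regular sets are closed under complementation and union, $\overline{L_1}$, $\overline{L_2}$ and hence L3 are regular, and so is $\overline{L_3}$.
So L1 ∩ L2 is regular, and regular sets are closed under intersection.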
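The notes derive this result from De Morgan's law. A different, direct argument is the product construction, which runs M1 and M2 in parallel on pairs of states and accepts when both do; it is sketched here in Python as an alternative, not as the method of the text. The tuple encoding and the two example machines are assumptions of this sketch.

```python
from itertools import product

# Assumed encoding of a DFA: (states, sigma, delta, start, finals),
# with delta[(state, symbol)] -> state and a total transition function.


def accepts(m, w: str) -> bool:
    """Run DFA m on w and report whether it ends in a final state."""
    _, _, delta, q, finals = m
    for a in w:
        q = delta[(q, a)]
    return q in finals


def intersect(m1, m2):
    """Product construction: run m1 and m2 in parallel, accept iff both accept."""
    states1, sigma, d1, s1, f1 = m1
    states2, _, d2, s2, f2 = m2  # both machines are assumed to share sigma
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)]) for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if p in f1 and q in f2}
    return states, sigma, delta, (s1, s2), finals


# M1: even number of 0s.  M2: ends in 1.
M1 = ({"e", "o"}, {"0", "1"},
      {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}, "e", {"e"})
M2 = ({"A", "B"}, {"0", "1"},
      {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}, "A", {"B"})
M = intersect(M1, M2)

words = ["".join(t) for n in range(6) for t in product("01", repeat=n)]
print(all(accepts(M, w) == (w.count("0") % 2 == 0 and w.endswith("1")) for w in words))  # True
```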
2.2.3 Difference
Theorem: Prove that regular sets are closed under difference.
Proof: Let L1, L2 be two regular sets accepted by DFAs M1, M2 such that
L1 = L(M1) and L2 = L(M2).
By the definition of set difference,
$$L_1 - L_2 = L_1 \cap \overline{L_2}.$$
Regular sets are closed under complementation and intersection, so $\overline{L_2}$ is regular and
$L_1 \cap \overline{L_2}$ is regular. Hence L1 – L2 is regular, so regular sets are closed under difference.
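Combining the two steps of the proof, intersecting M1 with the complement of M2 amounts to a product automaton whose final pairs are (p, q) with p ∈ F1 and q ∉ F2. The Python sketch below assumes a hypothetical tuple encoding of complete DFAs; it is an illustration, not a construction given in the notes.

```python
from itertools import product

# Assumed encoding of a DFA: (states, sigma, delta, start, finals), delta total.


def accepts(m, w: str) -> bool:
    """Run DFA m on w and report whether it ends in a final state."""
    _, _, delta, q, finals = m
    for a in w:
        q = delta[(q, a)]
    return q in finals


def difference(m1, m2):
    """Product DFA for L(m1) - L(m2) = L(m1) ∩ complement of L(m2)."""
    states1, sigma, d1, s1, f1 = m1
    states2, _, d2, s2, f2 = m2  # both machines are assumed to share sigma
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)]) for (p, q) in states for a in sigma}
    # Accept when m1 accepts and m2 (complete, so it always has a state) rejects.
    finals = {(p, q) for (p, q) in states if p in f1 and q not in f2}
    return states, sigma, delta, (s1, s2), finals


# M1: even number of 0s.  M2: ends in 1.
M1 = ({"e", "o"}, {"0", "1"},
      {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}, "e", {"e"})
M2 = ({"A", "B"}, {"0", "1"},
      {("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"}, "A", {"B"})
D = difference(M1, M2)

words = ["".join(t) for n in range(6) for t in product("01", repeat=n)]
print(all(accepts(D, w) == (w.count("0") % 2 == 0 and not w.endswith("1")) for w in words))  # True
```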
2.2.4 Substitutions and Homomorphism
Theorem: The class of regular sets is closed under substitution.
Proof: Let R ⊆ ∑* be a regular set. For each a in ∑, let Ra ⊆ ∆* be a
regular set, and define the substitution f: ∑ → $2^{\Delta^*}$ by f(a) = Ra.
Now select a regular expression r denoting R and, for each symbol ai in ∑, a regular expression rai denoting Rai. Replace
every occurrence of a1, a2, …, an in r by (ra1), (ra2), …, (ran) respectively.
The result is a regular expression denoting f(R); this can be proved by
mathematical induction on the number of operators in r.
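The replacement step can be written as a recursion over the structure of r, which mirrors the induction on operators: symbols are replaced and every operator is kept. This Python sketch assumes a hypothetical tuple-based syntax tree for regular expressions; `to_pattern` only translates the result into Python `re` syntax so it can be tested.

```python
import re

# Hypothetical AST for regular expressions (an assumption of this sketch):
# ("eps",), ("empty",), ("sym", a), ("union", e1, e2), ("cat", e1, e2), ("star", e1)


def substitute(e, f):
    """Replace every symbol a in e by the expression f[a]; operators are kept."""
    tag = e[0]
    if tag == "sym":
        return f[e[1]]
    if tag in ("eps", "empty"):
        return e
    if tag == "star":
        return ("star", substitute(e[1], f))
    return (tag, substitute(e[1], f), substitute(e[2], f))  # union or cat


def to_pattern(e) -> str:
    """Translate the AST into Python re syntax, for testing only."""
    tag = e[0]
    if tag == "eps":
        return ""
    if tag == "empty":
        return r"[^\s\S]"  # a character class that matches nothing
    if tag == "sym":
        return re.escape(e[1])
    if tag == "union":
        return f"(?:{to_pattern(e[1])}|{to_pattern(e[2])})"
    if tag == "cat":
        return f"(?:{to_pattern(e[1])}{to_pattern(e[2])})"
    if tag == "star":
        return f"(?:{to_pattern(e[1])})*"
    raise ValueError(f"unknown node {tag!r}")


# R denoted by 0 1*, with R_0 = {a} and R_1 = {b, cc}.
r = ("cat", ("sym", "0"), ("star", ("sym", "1")))
f = {"0": ("sym", "a"), "1": ("union", ("sym", "b"), ("cat", ("sym", "c"), ("sym", "c")))}
pattern = to_pattern(substitute(r, f))  # denotes a(b + cc)*
print(pattern)
print([s for s in ["a", "abcc", "acc", "ac", ""] if re.fullmatch(pattern, s)])  # ['a', 'abcc', 'acc']
```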
Theorem: The class of regular sets is closed under homomorphism and inverse
homomorphism.
Proof: We know that regular sets are closed under substitution. A homomorphism h is a special
kind of substitution in which each h(a) is a single string. So regular sets are closed under homomorphism.
To show closure under inverse homomorphism, let M = (Q, ∑, ẟ, q0, F) be a DFA
accepting L, and let h be a homomorphism from ∆ to ∑*.

We construct a DFA M' that accepts $h^{-1}(L)$ by reading a symbol a in ∆ and simulating
M on h(a). Formally, let M' = (Q, ∆, ẟ', q0, F) and define δ'(q, a) for q in Q and a in ∆ to be $\hat{\delta}(q, h(a))$, where $\hat{\delta}$ is the extension of δ to strings.
It is easy to show by induction on |x| that $\hat{\delta}'(q_0, x) = \hat{\delta}(q_0, h(x))$. Therefore M' accepts
x if and only if M accepts h(x), i.e. $L(M') = h^{-1}(L(M))$.
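The definition δ'(q, a) = δ̂(q, h(a)) can be tabulated directly. Below is a minimal Python sketch, assuming a hypothetical tuple encoding of a complete DFA and a hypothetical homomorphism h given as a dictionary; it checks that M' accepts x exactly when M accepts h(x) on all short strings.

```python
from itertools import product

# Assumed DFA encoding: (states, sigma, delta, start, finals), delta total.


def run(delta, q, s: str):
    """Extended transition function: state reached from q after reading s."""
    for c in s:
        q = delta[(q, c)]
    return q


def accepts(m, w: str) -> bool:
    """Run DFA m on w and report whether it ends in a final state."""
    return run(m[2], m[3], w) in m[4]


def inverse_hom(m, delta_alphabet: str, h: dict):
    """DFA for h^{-1}(L(m)): on symbol a, simulate m on the string h(a)."""
    states, _, delta, q0, finals = m
    new_delta = {(q, a): run(delta, q, h[a]) for q in states for a in delta_alphabet}
    return states, set(delta_alphabet), new_delta, q0, finals


# M: strings over {0, 1} with an even number of 1s.  h is a hypothetical homomorphism.
M = ({"e", "o"}, {"0", "1"},
     {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}, "e", {"e"})
h = {"a": "01", "b": "0"}
M_inv = inverse_hom(M, "ab", h)

xs = ["".join(t) for n in range(6) for t in product("ab", repeat=n)]
print(all(accepts(M_inv, x) == accepts(M, "".join(h[c] for c in x)) for x in xs))  # True
# Each a contributes one 1 and each b none, so M_inv accepts an even number of a's.
print(all(accepts(M_inv, x) == (x.count("a") % 2 == 0) for x in xs))  # True
```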
2.2.5 Quotients of Languages
Theorem: Prove that regular sets are closed under quotient.
Proof: Let L1, L2 be two regular languages, where L1/L2 = {x | xy ∈ L1 for some y ∈ L2}.
Let M1, M2 be finite automata accepting L1, L2.
Let M1 = (Q, ∑, ẟ, q0, F).
We define a new finite automaton M' = (Q, ∑, ẟ, q0, F') to accept L1/L2.
We observe that the only difference between M1 and M' is the set of final states.
The final states of M' are chosen as
F' = {q ∈ Q | $\hat{\delta}(q, y) \in F$ for some y ∈ L2}.
Then $\hat{\delta}(q_0, x) \in F'$ iff there is a y ∈ L2 with $\hat{\delta}(q_0, xy) \in F$, i.e. iff xy ∈ L1 for some y ∈ L2.
(F' can be computed: for each q, test whether the set of strings taking M1 from q into F, which is regular, intersects L2.)
In this way M' is constructed to accept L1/L2. So, L1/L2 is regular, and regular sets are
closed under quotient.
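Computing F' = {q | δ̂(q, y) ∈ F for some y ∈ L2} reduces to a reachability question when L2 is given by a DFA M2: from the pair (q, start of M2), can M1 and M2 running in parallel reach a pair of final states? The Python sketch below does this with a breadth-first search; the tuple encoding, the helper names, and the example languages are assumptions of the sketch.

```python
from collections import deque
from itertools import product

# Assumed DFA encoding: (states, sigma, delta, start, finals), delta total.


def accepts(m, w: str) -> bool:
    """Run DFA m on w and report whether it ends in a final state."""
    _, _, delta, q, finals = m
    for a in w:
        q = delta[(q, a)]
    return q in finals


def quotient(m1, m2):
    """DFA for L(m1)/L(m2): same as m1, with F' = {q : some y in L(m2) leads q into F}."""
    states1, sigma, d1, s1, f1 = m1
    _, _, d2, s2, f2 = m2

    def has_witness(q) -> bool:
        # Breadth-first search over pairs (state of m1 from q, state of m2 from its start):
        # a reachable pair in f1 x f2 means some y in L(m2) takes m1 from q into f1.
        start = (q, s2)
        seen, queue = {start}, deque([start])
        while queue:
            p, r = queue.popleft()
            if p in f1 and r in f2:
                return True
            for a in sigma:
                nxt = (d1[(p, a)], d2[(r, a)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    return states1, sigma, d1, s1, {q for q in states1 if has_witness(q)}


# L1: even number of 0s.  L2 = {"0"} (state "d" is a dead state).
M1 = ({"e", "o"}, {"0", "1"},
      {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}, "e", {"e"})
M2 = ({"s", "t", "d"}, {"0", "1"},
      {("s", "0"): "t", ("s", "1"): "d", ("t", "0"): "d", ("t", "1"): "d",
       ("d", "0"): "d", ("d", "1"): "d"}, "s", {"t"})
Q = quotient(M1, M2)

# x is in L1/L2 iff x0 has an even number of 0s, i.e. x has an odd number of 0s.
words = ["".join(t) for n in range(6) for t in product("01", repeat=n)]
print(sorted(Q[4]))  # ['o']
print(all(accepts(Q, x) == (x.count("0") % 2 == 1) for x in words))  # True
```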
2.2.6 Reversal
Theorem: If L is a regular language, so is $L^R$.
Proof: Assume L is defined by regular expression E. The proof is a structural
induction on the size of E. We show that there is another regular expression $E^R$ such that
$L(E^R) = (L(E))^R$, that is, the language of $E^R$ is the reversal of the language of E.
BASIS: If E is ϵ, Ф, or a, for some symbol a, then $E^R$ is the same as E. That is, we know that $\{\epsilon\}^R = \{\epsilon\}$,
$\emptyset^R = \emptyset$, and $\{a\}^R = \{a\}$.
INDUCTION: There are three cases, depending on the form of E.
1. E = E1 + E2. Then $E^R = E_1^R + E_2^R$. The justification is that the reversal of the union of
two languages is obtained by computing the reversals of the two languages and
taking the union of those languages.
2. E = E1E2. Then $E^R = E_2^R E_1^R$. Note that we reverse the order of the two languages, as
well as reversing the languages themselves. For instance, if L(E1) = {01, 111} and
L(E2) = {00, 10}, then L(E1E2) = {0100, 0110, 11100, 11110}. The reversal of the latter
language is {0010, 0110, 00111, 01111}. If we concatenate the reversals of L(E2) and
L(E1) in that order, we get {00, 01}{10, 111} = {0010, 00111, 0110, 01111}, which is
the same language as $(L(E_1E_2))^R$. In general, if a word w in L(E) is the concatenation
of w1 from L(E1) and w2 from L(E2), then $w^R = w_2^R w_1^R$.
3. E = E1*. Then $E^R = (E_1^R)^*$. The justification is that any string w in L(E) can be written
as w1w2…wn, where each wi is in L(E1). But $w^R = w_n^R w_{n-1}^R \cdots w_1^R$. Each $w_i^R$ is in $L(E_1^R)$, so
$w^R$ is in $L((E_1^R)^*)$. Conversely, any string in $L((E_1^R)^*)$ is of the form w1w2…wn, where
each wi is the reversal of a string in L(E1). The reversal of this string, $w_n^R w_{n-1}^R \cdots w_1^R$,
is therefore a string in $L(E_1)^*$, which is L(E). We have thus shown that a string is in
L(E) if and only if its reversal is in $L((E_1^R)^*)$.
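The three inductive cases translate into a short recursive function on expressions. This Python sketch assumes a hypothetical tuple-based syntax tree; `words` computes the finite language of a star-free expression so that the worked example from case 2 can be reproduced.

```python
# Hypothetical AST: ("eps",), ("empty",), ("sym", a), ("union", e1, e2),
# ("cat", e1, e2), ("star", e1).


def reverse(e):
    """Build E^R by the structural induction: swap the operands of every cat."""
    tag = e[0]
    if tag in ("eps", "empty", "sym"):
        return e                                        # basis: E^R = E
    if tag == "union":
        return ("union", reverse(e[1]), reverse(e[2]))  # E1^R + E2^R
    if tag == "cat":
        return ("cat", reverse(e[2]), reverse(e[1]))    # E2^R E1^R
    if tag == "star":
        return ("star", reverse(e[1]))                  # (E1^R)*
    raise ValueError(f"unknown node {tag!r}")


def words(e) -> set:
    """Language of a star-free expression, as a finite set of strings."""
    tag = e[0]
    if tag == "eps":
        return {""}
    if tag == "empty":
        return set()
    if tag == "sym":
        return {e[1]}
    if tag == "union":
        return words(e[1]) | words(e[2])
    if tag == "cat":
        return {x + y for x in words(e[1]) for y in words(e[2])}
    raise ValueError("words() handles star-free expressions only")


def string(s: str):
    """Expression for one nonempty string, as a chain of concatenations."""
    e = ("sym", s[0])
    for c in s[1:]:
        e = ("cat", e, ("sym", c))
    return e


E1 = ("union", string("01"), string("111"))
E2 = ("union", string("00"), string("10"))
E = ("cat", E1, E2)
print(sorted(words(E)))           # ['0100', '0110', '11100', '11110']
print(sorted(words(reverse(E))))  # ['0010', '00111', '0110', '01111']
```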

2.3 Applications of Regular Expressions


A regular expression gives a "picture" of the pattern we want to recognize in
text. Regular expressions are compiled into deterministic or nondeterministic
automata, which are then simulated to produce a program that recognizes patterns in text.

2.3.1 Regular Expressions in UNIX
Most real applications deal with the ASCII character set, while our examples have
typically used a small alphabet such as {0, 1}. The existence of only two symbols allowed us
to write succinct expressions like 0+1 for "any character". However, if there were, say, 128
characters, the same expression would involve listing them all and would be highly
inconvenient to write. Thus, UNIX regular expressions allow us to write character classes to
represent large sets of characters as succinctly as possible.
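To see the saving, compare spelling out every letter as an alternative with the bracketed class `[A-Za-z]`, a notation that Python's `re` module also accepts. This small sketch uses Python only as a convenient stand-in for a UNIX regex tool.

```python
import re
import string

# The "0+1" style: spell out every letter as an alternative.
listed = "|".join(string.ascii_letters)  # "a|b|...|z|A|...|Z"
# A character class denotes the same set of 52 one-character strings.
char_class = "[A-Za-z]"

for s in ["q", "Q", "7", "qq"]:
    same = (re.fullmatch(listed, s) is None) == (re.fullmatch(char_class, s) is None)
    print(s, same)  # each line ends in True
print(len(listed), len(char_class))  # 103 8
```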
2.3.2 Lexical Analysis
One of the oldest applications of regular expressions was in specifying the
component of a compiler called a lexical analyzer. This component scans the source program
and recognizes all tokens, those substrings of consecutive characters that belong together
logically. Keywords and identifiers are common examples of tokens.
The UNIX command lex and its more recent version flex accept as input a list of regular expressions, each
followed by a bracketed section of code that indicates what the lexical analyzer is to do when
it finds an instance of that token. These commands have been found extremely useful
because the regular expression notation is exactly as powerful as we need to describe
tokens. These commands can use the regular-expression-to-DFA conversion process to
generate an efficient function that breaks source programs into tokens.
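lex and flex have their own input format, which this sketch does not use. It only imitates the idea of "a list of regular expressions, each with an action" with Python's `re` module, in the named-group style used by the tokenizer example in the `re` documentation. The token names and the keyword set are hypothetical.

```python
import re
from typing import Iterator, Tuple

# Hypothetical token specification: (token name, regular expression), tried in order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"[ \t\n]+"),
    ("MISMATCH", r"."),
]
KEYWORDS = {"if", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(src: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs; each branch below plays the role of a lex action."""
    for m in MASTER.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # action: discard whitespace
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"  # action: reclassify reserved words
        yield kind, text


print(list(tokenize("while x1 = 42")))
# [('KEYWORD', 'while'), ('IDENT', 'x1'), ('OP', '='), ('NUMBER', '42')]
```

One design difference matters here: lex picks the longest match and breaks ties by rule order, whereas Python's alternation takes the first alternative that matches, so the order of `TOKEN_SPEC` is significant. Python's `re` is also a backtracking matcher rather than a compiled DFA.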
2.3.3 Finding Patterns in Text
An automaton could be used to search efficiently for a set of words in a large
repository such as the Web. The regular expression notation is valuable for describing
searches for interesting patterns.
The general problem for which regular expression technology has been found useful
is the description of a vaguely defined class of patterns in text. The vagueness of the
description virtually guarantees that we shall not describe the pattern correctly at first;
perhaps we can never get exactly the right description. By
using regular expression notation, it becomes easy to describe the patterns at a high level,
with little effort, and to modify the description quickly when things go wrong. A compiler
for regular expressions is useful to turn the expressions we write into executable code.
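The "describe, then fix" cycle can be illustrated on a hypothetical pattern for phone-number-like strings, again using Python's `re` as the regular expression compiler. The first description is too loose and matches inside a longer digit run; adding word boundaries `\b` is a one-token fix.

```python
import re

text = "Call 555-1234 or 555-12345 for details."

# First attempt: three digits, a hyphen, four digits.
first = re.findall(r"\d{3}-\d{4}", text)
# Refinement: require word boundaries so longer digit runs are not split.
refined = re.findall(r"\b\d{3}-\d{4}\b", text)

print(first)    # ['555-1234', '555-1234']
print(refined)  # ['555-1234']
```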
