Sunteți pe pagina 1din 5

There is a pumping lemma for context free

languages analogous to the pumping lemma


for regular languages. The goal of this
lecture is to [inaudible] improve this
lemma, which is naturally more complicated
than the pumping lemma for regular
languages. And then use it to show certain
languages are not context free. Let's
review the pumping lemma for regular
languages. It said that any sufficiently
long string W in a regular language has
some short non empty piece near the
beginning that we could pump. That is, we
could repeat it as many times as we liked,
including zero. And the resulting stream
would also be in the language. We found
the strength of pump by looking for the
first state to repeat as a finite
automaton process that's input W. The
pumping level of a context free language
says that we can find two pieces in any
long string Z and pump them in tandem.
That is, we can repeat each of them I
times or any I equal to the greater than
zero, and the result will be in the same
language. We'll also see that these two
strings are close together in Z, and one
can be empty but not both. Here's the
statement of the ? For context free
languages. We can see it as the same sort
game with an adversary that we talked
about in connection with regular
languages. For every context free language
L, that is you get to pick L presumably
the one you want to show isn't really
context free, there's an integer N. This
is something the adversary gets to pick
but once it's picked it's fixed for the
rest of the game. Such that for every
string Z and L of length at least N And
here, you get to pick the Z to focus on.
You can break Z into five pieces - U, V,
W, X, Y, such that three things are true.
The adversary gets to pick how Z is
broken, but subject to two constraints
we'll see in just a moment. And
incidentally, V and X are the substrings
that get pumped. The first constraint is
that the middle three component, V, W, and
X are short, no longer than the length of
them put together. Remember that V and X
will get pumped, so that's a. Not only are
they short but they appear within a
bounded distance from each other within Z.
The second condition that the adversary
has to respect is that V and X cannot both
be empty, although one can. And if all the
above is satisfied and L really is context
free then for all integers I equal to or
greater than zero: U, V to the I, W, X to
the Y, Y is also an L. We win the game and
prove is not context free by allowing the
adversary's choices of N and. Breakup of
Z, subject to the constraints one and two.
And then picking an I, such that UV to the
IW, X the IY is not an L. The proof the,
the pumping lemma for context free
languages starts with a [inaudible] normal
form grammar for L. Technically, it is for
L minus epsilon, but the empty string will
never meet the condition of being of
length at least N. So its presence of
absence doesn't matter. The [inaudible]
grammar has M variables for sub M. So
we're going to let M be two to the M. And
now lets consider any string, z and l, of
length at least n. That's two to the m
again. We're going to prove first that any
[inaudible] tree in the c and f grammar,
for string z of length at least n equals
two to the m, must have a path of length m
plus two or more from the root to a leaf.
Well, actually. Proof the contra-positive.
That is, suppose there's a parse tree Z
with no path longer than M plus one. Such
a path has M nodes labeled by variables at
the top, or the beginning, and one node
labeled by a terminal at the end. That is,
here is a typical path. That is, here is a
typical path. It will have variables here,
here, and here and then a terminal there.
Okay. If we forget about the terminals at
the leaves. Part strain at CNF grammar is
a binary tree. Let's say at each level the
number of notes is almost gonna double.
Thus there is only one note at the top
level, the root, there can be only two
notes at the next level four at the third
and eight at the fourth and so on. In
general there will be at most two to the
power M -one at the end level. But this
tree has it most m levels with variables
and then along each. Variables. The last
variable has one trial with a terminal as
its label. The largest number of leaves
occur if all the paths on the tree have
exactly N variables. If some paths
terminate before level N, there will be
fewer leaves. Thus there are at most two
to the N minus one nodes that are labeled
by variables and have a single trial with
a terminal label. And thus there are at
most two to the N minus one leaves.
Finally, we therefore can conclude that
the length of the yield is at most. Most
of the two to the M minus one power M
minus one. [inaudible] Power of m minus
one is n over two. So z is of length n
that cannot be the yield of any three.
That has passed limited to length m plus
one or less. Therefore, we conclude that
somewhere on the [inaudible] is the path
of this m plus two. Now we're ready to
prove the pumping level. We just proved
that Z's parsed tree has a path of length
at least M+2. Only the last [inaudible] on
any path can be labeled by a terminal. So
there are at least M+1 [inaudible] with
variables along this path. Let's focus on
one of the longest paths on the. Surely
there are at least ten plus one variables
along this longest path. Remember that M
is the number of variables of the grammar.
So, along this path, there are two low
nodes labeled by the same variable, call
it A. In fact, to make sure we pump short
pieces, let's look only at the bottom most
M+1 variables along this path, which could
be much longer than M+1. And we know that
two of them must be the same. On the next
slide we'll see what the pause stream must
look like. Here's the picture of the
[inaudible] tree for z. We've showed the
path we've focused on and the lowest
repeating variables along that path. The
purple tree is rooted at the lower a. And
the yellow tree with the purple tree
within it is rooted up at upper a. Let W
be the yield of the purple tree. V and X
the portions of the yield with the yellow
tree that precede and follow W
respectively. And let U and V be the
portions of Z that precede V and follow X
respectively. Let's look at the yellow.
Since the path shown is as long as any
other, and that path has at most N+1
variables, we know by the lemma one that
we just proved that the yield of the
yellow plus purple is no longer then two
to the power M or N. That is, the length
of VWX is no more than N. But v and x both
can't be empty. Why? That's a useful
property of [inaudible] normal form of
grammars. Since the two A's shown in the
tree are different nodes, the upper A must
have a child to the left or right of the
path shown. That's a consequence of the
fact that we eliminated unit production so
all bodies that have variables have at
least two of them. Moreover, once we have
a variable not on the path, there are no
epsilon productions so we must generate
from this variable at least one terminal.
That is all we need to conclude that
either V or X or both have length at least
one. Now we can take advantage of the fact
that we have two As along the path. We can
get rid of V and X by pumping zero times.
That is, we know the purple tree can
substitute for the yellow because both
trees have the same variable A at the
root. If the original on the left
satisfies the conditions of a parse tree,
that is, every interior node is the head
of a production whose body is the labels
of his children, then the same will be
true of the smaller parse tree on the
right. We conclude that UWY is also in the
language. Well, we could pump twice, that
is, replace the purple tree by the yellow
which has the purple within it and you get
a par [inaudible] whose yield is uvvwxxy.
And in the previous tree representing
pumping twice, we could again replace the
purple tree by the yellow, and get a
bigger tree, whose yield is U, three V's,
W, three X'es, and then Y. In the same
manner, we could get parse trees for all
strings [inaudible] form. U, V to the I,
W, X, the IVY, for any integer I equal to
or greater than zero. These strings are
therefore all in the language L. That
proves the pumping [inaudible] for context
free languages. Let's look at an example
of how the pumping [inaudible] can be used
to show a language not to be context free.
This language which involves matching the
counts of two blocks of zeros is context
free. A grammar of PDA4 is easy to
construct. The ideas are very much like
what we saw for the language O to the N,
one to the N. But give the language three
blocks of zeros, all of which must be the
same length and we're suddenly outside
what context free grammars can do. We can
prove that using the problem for
context-free languages. So we'll pick this
language L. And the adversary now gets to
pick N. We don't know N, but we do know
it's fixed for the rest of the game. We
get to pick Z, so let's pick zero to the
N, one, zero to the N, one, zero to the N.
That is, each block of zeroes is of length
equal to whatever N the adversary picked.
Now the adversary gets to break I, G up
into Z equals U, V, W, X, Y. But he must
chose these substrings, such that DWX
together are no longer than N and D and X
can not both be picked to be the empty
string. There two cases depending upon
whether the advisory picks D and X to have
zeros or not. The first case, suppose
there are no zeros among D and X and since
they can not both be empty. There must me
at least one, one among them. But then if
we pump zero times to get the string UWY.
We know that there is at most one, one in
this string. The pumping number claims it
as in the language l. But it can't be,
because all strings in l have exactly two
1s. In the second and last case, v and x
have exactly one zero among them. D W X
has length at most N. So these three sub
strings cannot extend from the first block
of 0s to the last because N plus two
positions separate those blocks. So again
consider U, W, I which if L is context
free it must be an L. Removing V and X
must leave at lease one of the three
blocks of N 0's intact, so it still has N
0's but V and X have at least one zero so
when U, W, Y at least one of the blocks of
0's is fewer than N 0's. We conclude that
in this case too U, W, Y cannot be an L
and those L cannot be context free.

S-ar putea să vă placă și