Automata and Formal Languages: CS138, Winter 2006

CS138, Wim van Dam, UCSB
Room 5109, Engr. I
vandam@cs.ucsb.edu
http://www.cs.ucsb.edu/~vandam/
Formalities
This Friday the Midterm will be returned.

The next Homework 3 will be announced on Friday,
and will be due on the next Friday (instead of Monday).

Questions?
Context Free Languages
Having dealt with regular languages, in the coming weeks
we will discuss the power of context-free languages and
the pushdown automata that accept them.

This week: Context Free Languages, Chapter 5 in
An Introduction to Formal Languages and Automata by
Peter Linz [Reader, pp. 21-70] and Section 6.1, Methods
for Transforming Grammars, ibidem [Reader, pp. 71-80].

The notation of Linz is somewhat different from Sipser
Sipser-Linz Dictionary
                       Sipser:                  Linz:

Empty string           $\varepsilon$            $\lambda$
Conditional in sets    $\{ x \mid x \in N \}$   $\{ x : x \in N \}$
Letters                alphabet $\Sigma$        terminals $T$
Grammars
Linz defines grammars as follows:
A grammar G is defined by (V,T,S,P), where
V is a finite set of variables
T is a finite set of terminal symbols (think alphabet $\Sigma$)
$S \in V$ is the special start variable
P is a finite set of productions

Each grammar G defines a language L(G), which is the
set of strings in $T^*$ ($= \Sigma^*$) that G can generate from S.
It is all about the production rules.
Context Free Grammars
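A Context Free Grammar (V,T,S,P) is a grammar
where all production rules are of the form:
$A \to x$, with $A \in V$ and $x \in (V \cup T)^*$

Example 5.1: Let G = ({S}, {a,b}, S, P) with for P:
$S \to aSa$, and $S \to bSb$, and $S \to \lambda$.
Some derivations from this grammar:
$S \Rightarrow aSa \Rightarrow aaSaa \Rightarrow aabSbaa \Rightarrow aabbaa$
$S \Rightarrow bSb \Rightarrow baSab \Rightarrow baab$, and so on.

In general $S \Rightarrow \cdots \Rightarrow ww^R$ for $w \in \{a,b\}^*$.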
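
To see the pattern $ww^R$ concretely, the following is a minimal sketch that enumerates the short strings generated by the grammar of Example 5.1. The dictionary representation, the use of `""` for $\lambda$, and the names `P` and `generate` are assumptions made for illustration. The pruning step is enough to guarantee termination here because every sentential form of this grammar contains at most one variable; it is not a general-purpose enumerator.

```python
from collections import deque

# Productions of Example 5.1; the empty string "" stands for lambda.
P = {"S": ["aSa", "bSb", ""]}

def generate(productions, start="S", max_len=4):
    """Return all terminal strings of length <= max_len derivable from start."""
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable (a symbol that has productions).
        pos = next((i for i, s in enumerate(form) if s in productions), None)
        if pos is None:
            found.add(form)          # only terminals left: a generated string
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so too many terminals means too long.
            terminals = sum(1 for s in new if s not in productions)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

words = generate(P)
print(sorted(words, key=lambda w: (len(w), w)))
```

Every string found this way is an even-length palindrome, in line with the form $ww^R$.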

Context Free Languages
A single step derivation consists of the substitution of a
variable by a string according to a substitution rule in P.
(Note that the rules are described using single arrows $\to$,
while the derivations themselves use double arrows $\Rightarrow$.)

A sequence of several derivation steps (or none) is indicated
by $\Rightarrow^*$. Previous example: $S \Rightarrow^* aabbaa$.

L is a Context Free Language if and only if there is a
context free grammar G=(V,T,S,P) such that:
$L = L(G) = \{ w \in T^* : S \Rightarrow^* w \}$
Why Context Free Languages?
Context-free languages allow us to describe languages
that are nonregular, like $\{ 0^n 1^n : n \geq 0 \}$.

CFLs are complex enough to give us a model for natural
languages (cf. Noam Chomsky) and programming languages.
The theory of CFLs is very closely related to the problem
of parsing a computer program.
Later we will see that CFLs are the languages that can
be recognized by automata that have one single stack:
$\{ 0^n 1^n : n \geq 0 \}$ is a CFL
$\{ 0^n 1^n 0^n : n \geq 0 \}$ is not a CFL
Some Remarks
The language $L(G) = \{ w \in T^* : S \Rightarrow^* w \}$ contains
only strings of terminals, not variables.
Notation: We summarize several rules for one variable, such as
$A \to B$, $A \to 01$, and $A \to AA$, by $A \to B \mid 01 \mid AA$.
Question: What is the CFG ({S},{(,)},S,P) that produces
the language of correct parentheses like (), (()), or ()(())?
Answer: $S \to (S) \mid SS \mid \lambda$ [see Example 5.4]
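
The language of this grammar is the set of balanced parenthesis strings, which can also be recognized with a single counter. The sketch below is a direct membership test rather than a parser for the grammar; the function name `is_balanced` is an assumption for illustration.

```python
def is_balanced(w):
    """True iff w is a correctly nested string over the alphabet { (, ) }."""
    depth = 0
    for ch in w:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:        # a ")" with no matching "(" before it
                return False
        else:
            return False         # symbol outside the terminal alphabet
    return depth == 0            # every "(" has been closed

for w in ["()", "(())", "()(())", ")(", "(()"]:
    print(repr(w), is_balanced(w))
```

The counter plays the role that the stack will play once pushdown automata are introduced.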
Another CFG Example
Consider the CFG G=({S,Z},{0,1},S,P) with
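P: $S \to 0S1 \mid 0Z1$
   $Z \to 0Z \mid \lambda$
What is the language generated by this G?

Answer: $L(G) = \{ 0^i 1^j \mid i \geq j \geq 1 \}$

Specifically, S yields the $0^{j+k} 1^j$ (with $j \geq 1$ and $k \geq 0$) according to:
$S \Rightarrow^* 0^{j-1} S 1^{j-1} \Rightarrow 0^j Z 1^j \Rightarrow^* 0^{j+k} Z 1^j \Rightarrow 0^{j+k} 1^j$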
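
As a sanity check of this answer, read as $0^i 1^j$ with $i \geq j \geq 1$, the sketch below replays the derivation for given j and k by plain string rewriting and compares the result with a direct membership test. The names `in_language` and `derive` are assumptions for illustration.

```python
import re

def in_language(w):
    """True iff w = 0^i 1^j with i >= j >= 1."""
    m = re.fullmatch(r"(0+)(1+)", w)
    return bool(m) and len(m.group(1)) >= len(m.group(2))

def derive(j, k):
    """Sentential forms of a derivation of 0^(j+k) 1^j, for j >= 1 and k >= 0."""
    steps = ["S"]
    for _ in range(j - 1):                        # S -> 0S1, applied j-1 times
        steps.append(steps[-1].replace("S", "0S1"))
    steps.append(steps[-1].replace("S", "0Z1"))   # S -> 0Z1
    for _ in range(k):                            # Z -> 0Z, applied k times
        steps.append(steps[-1].replace("Z", "0Z"))
    steps.append(steps[-1].replace("Z", ""))      # Z -> lambda
    return steps

steps = derive(2, 1)
print(" => ".join(s or "lambda" for s in steps))
```

Each sentential form contains exactly one variable, so `str.replace` rewrites exactly the occurrence that the production is applied to.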
Last Monday
A Context Free Grammar (V,T,S,P) is a grammar
where all production rules are of the form:
$A \to x$, with $A \in V$ and $x \in (V \cup T)^*$

Example 5.1: Let G = ({S}, {a,b}, S, P) with for P:
$S \to aSa$, and $S \to bSb$, and $S \to \lambda$.
In general we have $S \Rightarrow^* ww^R$ for $w \in \{a,b\}^*$,
hence $L(G) = \{ ww^R : w \in \{a,b\}^* \}$.

Questions
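Can you make Context Free Grammars for the following?
a) $\{ 0^n 1^n : n \geq 0 \}$
b) $\{ 0^n 1^m : n,m \geq 0 \}$
c) Arithmetic formulas over a, b, c like $a+b \times c+a$ (without parentheses)

Answers:
a) $S \to 0S1 \mid \lambda$
b) $S \to 0S \mid R$ and $R \to 1R \mid \lambda$
c) $S \to a \mid b \mid c \mid S+S \mid S \times S$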
Linear Grammars
A grammar is linear if and only if in every production rule at
most one variable occurs in the right hand side.
Example: $S \to (S) \mid SS \mid \lambda$ is not linear, but $S \to 0S1 \mid \lambda$ is.

A grammar (V,T,S,P) is right-linear if all production rules
are of the form $A \to xB$ or $A \to x$ with $A,B \in V$ and $x \in T^*$.
A grammar (V,T,S,P) is left-linear if all production rules are
of the form $A \to Bx$ or $A \to x$ with $A,B \in V$ and $x \in T^*$.

Note: All regular languages can be described by a right-linear
grammar (or a left-linear one), and vice versa.
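
The three definitions can be checked mechanically. The following is a minimal sketch under the assumption that a grammar is given as a dictionary from variables to lists of right-hand sides, with one character per symbol and `""` for $\lambda$; the name `classify` is hypothetical.

```python
def classify(productions):
    """Report whether a grammar is linear, right-linear, and/or left-linear."""
    variables = set(productions)
    rules = [rhs for alternatives in productions.values() for rhs in alternatives]

    def var_positions(rhs):
        return [i for i, symbol in enumerate(rhs) if symbol in variables]

    # Linear: at most one variable on every right-hand side.
    linear = all(len(var_positions(rhs)) <= 1 for rhs in rules)
    # Right-linear: the variable, if present, is the last symbol (A -> xB).
    right = linear and all(var_positions(rhs) in ([], [len(rhs) - 1]) for rhs in rules)
    # Left-linear: the variable, if present, is the first symbol (A -> Bx).
    left = linear and all(var_positions(rhs) in ([], [0]) for rhs in rules)
    return {"linear": linear, "right_linear": right, "left_linear": left}

print(classify({"S": ["(S)", "SS", ""]}))               # parentheses grammar
print(classify({"S": ["0S1", ""]}))                     # 0^n 1^n
print(classify({"S": ["0S", "R"], "R": ["1R", ""]}))    # 0^n 1^m
```

Of the three grammars only the last one is right-linear, which fits the fact that $\{0^n 1^m\}$ is regular.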
Non Linear Grammars
Most CFGs will not be linear, which means that in the
derivation of a word we will often have more than one
variable in the sentential forms (example: $S \Rightarrow^* xAyBz$).
Note: in a derivation $S \Rightarrow w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_n \Rightarrow w$, all
strings $S, w_1, \ldots, w_n \in (V \cup T)^*$ are called sentential forms.

A derivation is leftmost (rightmost) if in each derivation
step $x \Rightarrow y$ the leftmost (rightmost) variable is replaced.

Requiring leftmost derivations does not limit the power
of a CFG but creates some order in the many ways one
can derive a single word. See, for example, $S \to (S) \mid SS \mid \lambda$.
Order is Unimportant
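Take the CFG $S \to 0 \mid 1 \mid \neg(S) \mid (S) \lor (S) \mid (S) \land (S)$, which
generates all proper Boolean formulas that use 0, 1,
$\neg$, $\lor$, $\land$, ( and ).
Then $(0) \lor ((0) \land (1))$ can be derived in the following ways:
[leftmost] $S \Rightarrow (S) \lor (S) \Rightarrow (0) \lor (S) \Rightarrow (0) \lor ((S) \land (S))$
$\Rightarrow (0) \lor ((0) \land (S)) \Rightarrow (0) \lor ((0) \land (1))$

[rightmost] $S \Rightarrow (S) \lor (S) \Rightarrow (S) \lor ((S) \land (S)) \Rightarrow (S) \lor ((S) \land (1))$
$\Rightarrow (S) \lor ((0) \land (1)) \Rightarrow (0) \lor ((0) \land (1))$

[something else] $S \Rightarrow (S) \lor (S) \Rightarrow (0) \lor (S) \Rightarrow (0) \lor ((S) \land (S))$
$\Rightarrow (0) \lor ((S) \land (1)) \Rightarrow (0) \lor ((0) \land (1))$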

The fact that it is irrelevant in which order we use the
production rules is expressed by the derivation tree.
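
The leftmost and rightmost derivations can be replayed mechanically. In the sketch below the ASCII characters `~`, `v` and `&` stand in for negation, disjunction and conjunction; that encoding and the names `RULES`, `step` and `derive` are assumptions for illustration.

```python
# Right-hand sides of the Boolean-formula grammar (ASCII stand-ins for the operators).
RULES = ["0", "1", "~(S)", "(S)v(S)", "(S)&(S)"]

def step(form, rhs, leftmost=True):
    """Replace the leftmost (or rightmost) variable S in form by rhs."""
    assert rhs in RULES and "S" in form
    i = form.find("S") if leftmost else form.rfind("S")
    return form[:i] + rhs + form[i + 1:]

def derive(rhs_sequence, leftmost=True):
    """Apply the given right-hand sides in order, starting from S."""
    form = "S"
    for rhs in rhs_sequence:
        form = step(form, rhs, leftmost)
    return form

left = derive(["(S)v(S)", "0", "(S)&(S)", "0", "1"], leftmost=True)
right = derive(["(S)v(S)", "(S)&(S)", "1", "0", "0"], leftmost=False)
print(left, right)
```

Both runs use the same multiset of rules and end in the same string; only the order of application differs.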
Derivation Trees
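The derivation $S \Rightarrow^* (0) \lor ((0) \land (1))$ can be expressed by
the following derivation tree:
[Figure: derivation tree. The root S has children ( S ) $\lor$ ( S ). The left S has the single child 0. The right S has children ( S ) $\land$ ( S ), whose two S nodes have the children 0 and 1.]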
Reading Tree Leaves
Application of a production rule $A \to x$ is
represented by node A with children x.
(Note that the tree is ordered:
the ordering of the nodes matters.)

The root has variable S.

The yield of S is
expressed by the
leaves of the tree.
Defining a Tree
Definition 5.3: For a CFG G=(V,T,S,P) a derivation
tree has the following properties:

1) The root is labeled S
2) Each leaf is from $T \cup \{\lambda\}$
3) Each interior node is from V
4) If a node has label $A \in V$ and its children are
   $a_1, \ldots, a_n$ (from left to right), then P must have
   the rule $A \to a_1 \cdots a_n$ (with $a_j \in V \cup T \cup \{\lambda\}$)
5) A leaf labeled $\lambda$ is a single
   child (has no siblings).

For partial derivation trees we have:
2a) Each leaf is from $V \cup T \cup \{\lambda\}$
Purpose of Trees
Looking at a tree you see the derivation without the
unnecessary information about its order.

Theorem 5.1: Let G be a CFG. We have $w \in L(G)$ if and
only if there exists a derivation tree of G with yield w.
Also, y is a sentential form of G if and only if there exists
a partial derivation tree for G with yield y.
Remember: the root always has to be S.
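
A derivation tree and its yield are easy to model in code. The following is a minimal sketch; the `Node` class, the helper names, and the ASCII stand-ins `v` and `&` for the Boolean operators are assumptions for illustration. The validity check covers only condition 4 of Definition 5.3 and accepts variable-labeled leaves, so it also passes partial derivation trees.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def tree_yield(node):
    """Concatenate the leaf labels from left to right ("" plays the role of lambda)."""
    if not node.children:
        return node.label
    return "".join(tree_yield(child) for child in node.children)

def is_valid(node, productions):
    """Check that every interior node A with children a_1..a_n matches a rule A -> a_1...a_n."""
    if not node.children:
        return True
    rhs = "".join(child.label for child in node.children)
    return (rhs in productions.get(node.label, [])
            and all(is_valid(child, productions) for child in node.children))

def leaf(symbol):
    return Node(symbol)

def s_node(*children):
    return Node("S", list(children))

PRODUCTIONS = {"S": ["0", "1", "~(S)", "(S)v(S)", "(S)&(S)"]}

inner = s_node(leaf("("), s_node(leaf("0")), leaf(")"), leaf("&"),
               leaf("("), s_node(leaf("1")), leaf(")"))
tree = s_node(leaf("("), s_node(leaf("0")), leaf(")"), leaf("v"),
              leaf("("), inner, leaf(")"))
print(tree_yield(tree), is_valid(tree, PRODUCTIONS))
```

The tree stores which rule was applied at which node, but not the order in which the nodes were expanded.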
Formalities
Homework will be announced later today.
This homework will be due Friday afternoon.

The Midterm has been graded; ask Yen Ting for it.

The scores are as follows
The Midterm
[Figure: scatter plot of HW 1+2 scores (vertical axis, 0 to 70) against Midterm scores (horizontal axis, 0 to 60)]

Midterm scores:
average: 42.9
median: 43
min-max: 23-55

Correlation between HWs 1+2 and the Midterm: 0.60
Sipser-Linz Dictionary
                       Sipser:                  Linz:

Empty string           $\varepsilon$            $\lambda$
Conditional in sets    $\{ x \mid x \in N \}$   $\{ x : x \in N \}$
Letters                alphabet $\Sigma$        terminals $T$
Union in RE            $a \cup b$               $a+b$
Parsing
Generative aspect of CFG: By now it should be clear how,
from a CFG G, you can derive strings $w \in L(G)$.

Analytical aspect: Given a CFG G and a string w, how do
you decide if $w \in L(G)$ and, if so, how do you determine
the derivation tree or the sequence of production rules
that produce w? This is called the problem of parsing.
Exhaustive Parsing
Exhaustive parsing is a form of top-down parsing where
you start with S and systematically go through all possible
(say leftmost) derivations until you produce the string w.
(You can remove sentential forms that will not work.)

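Example 5.7: Can the CFG $S \to SS \mid aSb \mid bSa \mid \lambda$
produce the string w = aabb, and how?
After one step: $S \Rightarrow SS$ or $aSb$ or $bSa$ or $\lambda$.
After two steps: $S \Rightarrow^* SSS$ or $aSbS$ or $bSaS$ or $S$,
or $S \Rightarrow^* aSSb$ or $aaSbb$ or $abSab$ or $ab$.
After three steps we see that: $S \Rightarrow aSb \Rightarrow aaSbb \Rightarrow aabb$.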
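
The search of Example 5.7 can be sketched as a breadth-first exploration of leftmost derivations. The code below is an illustration under assumptions, not a definitive implementation: grammars are dictionaries with `""` for $\lambda$, forms whose terminal prefix disagrees with w are discarded (one way to "remove sentential forms that will not work"), and the cut-off `max_rounds` is added only so that the search always stops.

```python
from collections import deque

def exhaustive_parse(productions, w, start="S", max_rounds=10):
    """Breadth-first search over leftmost derivations; returns a derivation of w or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        derivation = queue.popleft()
        form = derivation[-1]
        if form == w:
            return derivation
        if len(derivation) > max_rounds:
            continue                      # give up on this branch
        pos = next((i for i, s in enumerate(form) if s in productions), None)
        if pos is None:
            continue                      # only terminals, but not equal to w
        if not w.startswith(form[:pos]):
            continue                      # terminal prefix already disagrees with w
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if new not in seen:
                seen.add(new)
                queue.append(derivation + [new])
    return None

P = {"S": ["SS", "aSb", "bSa", ""]}
result = exhaustive_parse(P, "aabb")
print(" => ".join(result))
```

Because the search proceeds round by round, the first derivation found is a shortest one, here the three-step derivation from the example.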
Flaws of Exhaustive Parsing
Obvious flaw: it will take a long time and a lot of memory
for moderately long strings w: It is inefficient.

For cases $w \notin L(G)$ exhaustive parsing may never end.
This will especially happen if we have rules like $A \to \lambda$ that
make the sentential forms shrink so that we will never
know if we went too far with our parsing attempts.
Similar problems occur if the parsing can get in a loop
according to $A \Rightarrow B \Rightarrow A \Rightarrow B \Rightarrow \cdots$
Fortunately, it is always possible to remove problematic
rules like $A \to \lambda$ and $A \to B$ from a CFG G.

Exhaustive yet Finite Parsing
Theorem 5.2: Let G be a CFG without rules of the form
$A \to \lambda$ and $A \to B$ (with $A,B \in V$); then on any string w,
the exhaustive parsing method either produces w or halts
eventually such that we can conclude $w \notin L(G)$.
This derivation will require no more than $2|w|$ rounds.

The complexity of this algorithm is still exponential in the
length |w| of the string. We can do much better though:
Theorem 5.3: For every CFG G there exists a parsing
algorithm that runs in time $O(|w|^3)$.
(This algorithm uses dynamic programming.)
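
The text does not name the algorithm behind Theorem 5.3. One standard dynamic-programming parser with cubic running time is the CYK algorithm, which assumes that the grammar is in Chomsky normal form (only rules of the form A → BC and A → a). The sketch below uses a hand-made normal-form grammar for $\{0^n 1^n : n \geq 1\}$; this grammar and the names `cyk`, `UNIT` and `BINARY` are assumptions for illustration.

```python
def cyk(w, unit, binary, start="S"):
    """CYK membership test.

    unit:   dict mapping a terminal a to the set of variables A with A -> a
    binary: list of triples (A, B, C), one for each rule A -> BC
    """
    n = len(w)
    if n == 0:
        return False   # lambda has to be handled separately in Chomsky normal form
    # table[i][l] holds the variables that derive the substring w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = set(unit.get(ch, ()))
    for length in range(2, n + 1):              # substring length
        for i in range(n - length + 1):         # start position
            for split in range(1, length):      # length of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for a, b, c in binary:
                    if b in left and c in right:
                        table[i][length].add(a)
    return start in table[0][n]

# S -> AB | AC,  C -> SB,  A -> 0,  B -> 1
UNIT = {"0": {"A"}, "1": {"B"}}
BINARY = [("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")]

for w in ["01", "0011", "001", "10"]:
    print(w, cyk(w, UNIT, BINARY))
```

The three nested loops over length, start position and split point give the $|w|^3$ factor; the grammar size contributes only a constant for a fixed G.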
Simple Grammars
Definition 5.4: A CFG (V,T,S,P) is a simple grammar
(s-grammar) if and only if all its productions are of the form
$A \to ax$ with
$A \in V$, $a \in T$, $x \in V^*$ and any pair (A,a) occurs at most once.

Note: for simple grammars a leftmost derivation of a
string $w \in L(G)$ is straightforward and requires time |w|.

Example: Take the s-grammar $S \to aS \mid bSS \mid c$ with aabcc:
$S \Rightarrow aS \Rightarrow aaS \Rightarrow aabSS \Rightarrow aabcS \Rightarrow aabcc$.
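
Because each pair (A,a) determines at most one rule, the leftmost derivation can be found by reading w once from left to right. The following is a minimal sketch; the rule table keyed by (variable, terminal) and the name `s_parse` are assumptions for illustration.

```python
def s_parse(rules, w, start="S"):
    """Leftmost derivation of w in an s-grammar, or None if w is not generated.

    rules maps a pair (A, a) to the string of variables x of the rule A -> ax.
    """
    stack = [start]            # variables still to be expanded, leftmost on top
    derivation = [start]
    for i, a in enumerate(w):
        if not stack:
            return None        # input left over but no variable to expand
        variable = stack.pop()
        x = rules.get((variable, a))
        if x is None:
            return None        # no rule for this (variable, terminal) pair
        stack.extend(reversed(x))
        # Sentential form: the terminals read so far, then the pending variables.
        derivation.append(w[:i + 1] + "".join(reversed(stack)))
    return derivation if not stack else None

RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
derivation = s_parse(RULES, "aabcc")
print(" => ".join(derivation))
```

Each input symbol triggers exactly one rule application, which is where the running time proportional to |w| comes from.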
Ambiguity
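A string $w \in L(G)$ is derived ambiguously if it has
more than one derivation tree (or equivalently: if it has
more than one leftmost derivation (or rightmost)).

A grammar is ambiguous if some strings are derived
ambiguously.
Typical example: rule $S \to 0 \mid 1 \mid S+S \mid S \times S$

$S \Rightarrow S+S \Rightarrow S \times S+S \Rightarrow 0 \times S+S \Rightarrow 0 \times 1+S \Rightarrow 0 \times 1+1$
versus
$S \Rightarrow S \times S \Rightarrow 0 \times S \Rightarrow 0 \times S+S \Rightarrow 0 \times 1+S \Rightarrow 0 \times 1+1$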
Ambiguity and Parse Trees
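The ambiguity of $0 \times 1+1$ is shown by the two
different parse trees:
[Figure: two parse trees. In the first, the root applies $S \to S+S$; its left S is expanded by $S \to S \times S$ into 0 and 1, and its right S into 1. In the second, the root applies $S \to S \times S$; its left S produces 0, and its right S is expanded by $S \to S+S$ into 1 and 1.]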
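
The number of parse trees of a string under this grammar can be counted with a short recursion over substrings. In the sketch below `*` stands in for the multiplication sign, and the name `count_trees` is an assumption for illustration.

```python
from functools import lru_cache

def count_trees(w):
    """Number of derivation trees of w for S -> 0 | 1 | S+S | S*S."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees with root S and yield w[i:j].
        total = 1 if w[i:j] in ("0", "1") else 0
        for k in range(i + 1, j - 1):
            if w[k] in "+*":
                # The root applies S -> S+S or S -> S*S with the operator at position k.
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(w))

for w in ["0+1", "0*1+1", "0+1*1+0"]:
    print(w, count_trees(w))
```

A count above 1 means that the string is derived ambiguously; a count of 0 means that the string is not in the language.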
More on Ambiguity
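Note that the two different derivations:
$S \Rightarrow S+S \Rightarrow 0+S \Rightarrow 0+1$
and
$S \Rightarrow S+S \Rightarrow S+1 \Rightarrow 0+1$
do not make the string 0+1 ambiguous,
as they have the same parse tree:
[Figure: the single parse tree of 0+1, with + between the subtrees for 0 and 1.]
Ambiguity causes trouble when trying to interpret strings
like: She likes men who love women who don't smoke.

Solutions: Use parentheses, or use precedence rules
such as $a+(b \times c) = a+b \times c \neq (a+b) \times c$.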
Inherently Ambiguous
Languages that can only be generated by ambiguous
grammars are inherently ambiguous.

Example 5.13: $L = \{ a^n b^n c^m \} \cup \{ a^n b^m c^m \}$.

The way to make a CFG for this L somehow has to
involve the step $S \to S_1 \mid S_2$, where $S_1$ produces the
strings $a^n b^n c^m$ and $S_2$ the strings $a^n b^m c^m$.
This will be ambiguous on strings $a^n b^n c^n$.

Proving this rigorously is hard though.
Programming Languages
Programming languages are often defined as Context
Free Grammars in Backus-Naur Form (BNF).

Example:
<if_statement> ::= IF <expression><then_clause><else_clause>
<expression> ::= <term> | <expression>+<term>
<term> ::= <factor>|<term>*<factor>

The variables are indicated by <a variable name>.
The arrow $\to$ is replaced by ::=
Here, IF, + and * are terminals.

Syntax checking is checking whether a program is an
element of the language generated by the CFG of the
programming language.
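
As a small illustration of syntax checking, the sketch below tests whether a string is an `<expression>` according to the two expression rules above. It rests on two assumptions that are not in the text: `<factor>` is taken to be a single letter (the example leaves it undefined), and the left-recursive rules are handled by loops, a common rewriting for recursive-descent checkers that accepts the same strings. The name `check_expression` is hypothetical.

```python
def check_expression(s):
    """True iff s matches <expression>, assuming <factor> is a single letter."""
    pos = 0

    def factor():
        nonlocal pos
        if pos < len(s) and s[pos].isalpha():
            pos += 1
            return True
        return False

    def term():
        # <term> ::= <factor> | <term>*<factor>, read as factor (* factor)*
        nonlocal pos
        if not factor():
            return False
        while pos < len(s) and s[pos] == "*":
            pos += 1
            if not factor():
                return False
        return True

    def expression():
        # <expression> ::= <term> | <expression>+<term>, read as term (+ term)*
        nonlocal pos
        if not term():
            return False
        while pos < len(s) and s[pos] == "+":
            pos += 1
            if not term():
                return False
        return True

    return expression() and pos == len(s)

for s in ["a+b*c", "a*b+c*d", "a+*b", "a+"]:
    print(s, check_expression(s))
```

The nesting of `expression`, `term` and `factor` mirrors the grammar, which is also how the precedence of * over + is encoded.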
