Feb 09, 2013

© Attribution Non-Commercial (BY-NC)

Automata and Formal Languages

CS138, Winter 2006

Wim van Dam

Room 5109, Engr. I

vandam@cs.ucsb.edu

http://www.cs.ucsb.edu/~vandam/

CS138, Wim van Dam, UCSB

Formalities

This Friday the Midterm will be returned.

The next Homework 3 will be announced on Friday,

and will be due on the next Friday (instead of Monday).

Questions?


Context Free Languages

Having dealt with Regular Languages, in the coming weeks we will discuss the power of context free languages and the pushdown automata that accept them.

This week: Context Free Languages, Chapter 5 in An Introduction to Formal Languages and Automata by Peter Linz [Reader, pp. 21-70], and Section 6.1, Methods for Transforming Grammars, ibidem [Reader, pp. 71-80].

The notation of Linz is somewhat different from that of Sipser.


Sipser-Linz Dictionary

                       Sipser:          Linz:
Empty string           ε                λ
Conditional in sets    { x | x ∈ N }    { x : x ∈ N }
Letters                alphabet Σ       terminals T


Grammars

Linz defines grammars as follows:

A grammar G is defined by (V,T,S,P), where

  V is a finite set of variables
  T is a finite set of terminal symbols (think alphabet Σ)
  S ∈ V is the special start variable
  P is a finite set of productions

Each grammar G defines a language L(G), which is the set of strings in T* (= Σ*) that G can generate from S.

It is all about the production rules.


Context Free Grammars

A Context Free Grammar (V,T,S,P) is a grammar where all production rules are of the form:

  A → x, with A ∈ V and x ∈ (V ∪ T)*

Example 5.1: Let G = ({S}, {a,b}, S, P) with for P:

  S → aSa, and S → bSb, and S → λ.

Some derivations from this grammar:

  S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabbaa
  S ⇒ bSb ⇒ baSab ⇒ baab, and so on.

In general S ⇒* ww^R for w ∈ {a,b}*, where w^R denotes the reverse of w.


Context Free Languages

A single-step derivation consists of the substitution of a variable by a string according to a production rule in P. (Note that the rules are described using single arrows →, while the derivations themselves use double arrows ⇒.)

A sequence of several derivation steps (or none) is indicated by ⇒*. Previous example: S ⇒* aabbaa.

L is a Context Free Language if and only if there is a context free grammar G=(V,T,S,P) such that:

  L = L(G) = { w ∈ T* : S ⇒* w }
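The definition of L(G) can be explored by brute force. The following is a minimal sketch, not part of the slides: it assumes that variables are written as uppercase letters, that every other character is a terminal, and that the empty Python string "" stands for the empty-string production. The function name `generate` and the length bound `max_len` are illustrative choices. The search only terminates for grammars such as Example 5.1, where every rule that keeps a variable also adds a terminal.

```python
from collections import deque


def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumptions of this sketch: uppercase letters are variables, all other
    characters are terminals, and "" encodes the empty-string production.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if there is one.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            found.add(form)  # only terminals left: a string of L(G)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals never disappear, so forms with too many are pruned.
            terminals = sum(1 for c in new if not c.isupper())
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


# Example 5.1: S -> aSa | bSb | (empty string)
palindromes = generate({"S": ["aSa", "bSb", ""]}, "S", 4)
```

Every string in `palindromes` has the form w followed by its reverse, which matches the claim made for Example 5.1.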


Why Context Free Languages?

Context-free languages allow us to describe languages that are nonregular, like { 0^n 1^n : n>0 }.

CFLs are complex enough to give us a model for natural languages (cf. Noam Chomsky) and programming languages.

The theory of CFLs is very closely related to the problem of parsing a computer program.

Later we will see that CFLs are the languages that can be recognized by automata that have one single stack:

  { 0^n 1^n : n>0 } is a CFL
  { 0^n 1^n 0^n : n>0 } is not a CFL


Some Remarks

The language L(G) = { w ∈ T* : S ⇒* w } contains only strings of terminals, not variables.

Notation: We summarize several rules for one variable:

  A → B
  A → 01     by     A → B | 01 | AA
  A → AA

Question: What is the CFG ({S},{(,)},S,P) that produces the language of correct parentheses like (), (()), or ()(())?

Answer: S → (S) | SS | λ [see Example 5.4]


Another CFG Example

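Consider the CFG G=({S,Z},{0,1},S,P) with

  P: S → 0S1 | 0Z1
     Z → 0Z | λ

What is the language generated by this G?

Answer: L(G) = { 0^i 1^j : i ≥ j ≥ 1 }

Specifically, S yields the string 0^(j+k) 1^j (with j ≥ 1 and k ≥ 0) according to:

  S ⇒* 0^(j-1) S 1^(j-1) ⇒ 0^j Z 1^j ⇒* 0^(j+k) Z 1^j ⇒ 0^(j+k) 1^j

Here the rule S → 0S1 is used j-1 times, then S → 0Z1 once, then Z → 0Z k times, and finally Z → λ. (With k = 0 this gives the strings 0^j 1^j, so i = j is possible.)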
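The derivation scheme can be replayed mechanically. The sketch below is an illustration only, assuming the rules S → 0S1 | 0Z1 and Z → 0Z | λ as read from the slide; the function name `derive` and its parameters `j` and `k` are hypothetical. It applies S → 0S1 exactly j-1 times, then S → 0Z1, then Z → 0Z exactly k times, and finally the empty-string rule for Z.

```python
def derive(j, k):
    """Sentential forms of the derivation of 0^(j+k) 1^j, for j >= 1, k >= 0."""
    if j < 1 or k < 0:
        raise ValueError("need j >= 1 and k >= 0")
    form = "S"
    steps = [form]
    # Right-hand sides in the order in which they are applied.
    rules = ["0S1"] * (j - 1) + ["0Z1"] + ["0Z"] * k + [""]
    for rhs in rules:
        # Every sentential form here contains exactly one variable: S or Z.
        var = "S" if "S" in form else "Z"
        form = form.replace(var, rhs, 1)
        steps.append(form)
    return steps


example = derive(2, 1)
```

For j = 2 and k = 1 the list `example` ends in 00011, and for j = 1 and k = 0 the derivation ends in 01, a string with equally many 0s and 1s.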

Last Monday

A Context Free Grammar (V,T,S,P) is a grammar where all production rules are of the form:

  A → x, with A ∈ V and x ∈ (V ∪ T)*

Example 5.1: Let G = ({S}, {a,b}, S, P) with for P:

  S → aSa, and S → bSb, and S → λ.

In general we have S ⇒* ww^R for w ∈ {a,b}*, hence L(G) = { ww^R : w ∈ {a,b}* }.


Questions

Can you make Context Free Grammars for the following?

  a) { 0^n 1^n : n ≥ 0 }
  b) { 0^n 1^m : n,m ≥ 0 }
  c) Arithmetic a,b,c formulas like a+b×c+a (without ())

Answers:

  a) S → 0S1 | λ
  b) S → 0S | R and R → 1R | λ
  c) S → a | b | c | S+S | S×S


Linear Grammars

A grammar is linear if and only if in every production rule at most one variable occurs in the right hand side.

Example: S → (S) | SS | λ is not linear, but S → 0S1 | λ is.

A grammar (V,T,S,P) is right-linear if all production rules are of the form A → xB or A → x with A,B ∈ V and x ∈ T*.

A grammar (V,T,S,P) is left-linear if all production rules are of the form A → Bx or A → x with A,B ∈ V and x ∈ T*.

Note: All regular languages can be described by a right-linear grammar (or a left-linear one), and vice versa.
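The link between right-linear grammars and regular languages can be made concrete by treating each variable as a state of a nondeterministic automaton. The following is a rough sketch under assumptions of my own: a rule A → xB is encoded as the pair (x, "B"), a rule A → x as (x, None), and the names `accepts` and `grammar_b` are illustrative. The example grammar is S → 0S | R, R → 1R | λ, which describes the strings of 0s followed by 1s.

```python
def accepts(productions, start, w):
    """Decide whether w is generated by a right-linear grammar.

    productions maps a variable to a list of (x, B) pairs encoding A -> xB,
    with B = None for a rule A -> x; x is a possibly empty terminal string.
    """
    # A configuration (A, i) means: w[:i] has been produced, A is pending.
    seen = {(start, 0)}
    stack = [(start, 0)]
    while stack:
        var, i = stack.pop()
        for x, nxt in productions[var]:
            if not w.startswith(x, i):
                continue  # the terminals of this rule do not match w here
            j = i + len(x)
            if nxt is None:
                if j == len(w):
                    return True  # derivation finished exactly at the end of w
            elif (nxt, j) not in seen:
                seen.add((nxt, j))
                stack.append((nxt, j))
    return False


grammar_b = {
    "S": [("0", "S"), ("", "R")],
    "R": [("1", "R"), ("", None)],
}
```

Since there are only finitely many configurations (variable, position), the search always halts, which is the same reason a finite automaton suffices for these grammars.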


Non Linear Grammars

Most CFGs will not be linear, which means that in the derivation of a word we will often have more than one variable in the sentential forms (example: S ⇒* xAyBz).

Note: in a derivation S ⇒ w_1 ⇒ w_2 ⇒ … ⇒ w_n ⇒ w, all strings S, w_1, …, w_n ∈ (V ∪ T)* are called sentential forms.

A derivation is leftmost (rightmost) if in each derivation step x ⇒ y the leftmost (rightmost) variable is replaced.

Requiring leftmost derivations does not limit the power of a CFG but creates some order in the many ways one can derive a single word. See, for example, S → (S) | SS | λ.


Order is Unimportant

Take the CFG S → 0 | 1 | ¬(S) | (S)∨(S) | (S)∧(S), which generates all proper Boolean formulas that use 0, 1, ¬, ∨, ∧, ( and ).

Then (0)∨((0)∧(1)) can be derived in the following ways:

  [leftmost]       S ⇒ (S)∨(S) ⇒ (0)∨(S) ⇒ (0)∨((S)∧(S))
                     ⇒ (0)∨((0)∧(S)) ⇒ (0)∨((0)∧(1))

  [rightmost]      S ⇒ (S)∨(S) ⇒ (S)∨((S)∧(S)) ⇒ (S)∨((S)∧(1))
                     ⇒ (S)∨((0)∧(1)) ⇒ (0)∨((0)∧(1))

  [something else] S ⇒ (S)∨(S) ⇒ (0)∨(S) ⇒ (0)∨((S)∧(S))
                     ⇒ (0)∨((S)∧(1)) ⇒ (0)∨((0)∧(1))

The fact that it is irrelevant in which order we use the production rules is expressed by the derivation tree.


Derivation Trees

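The derivation S ⇒* (0)∨((0)∧(1)) can be expressed by the following derivation tree (children listed from left to right):

  S
  ├─ (
  ├─ S
  │  └─ 0
  ├─ )
  ├─ ∨
  ├─ (
  ├─ S
  │  ├─ (
  │  ├─ S
  │  │  └─ 0
  │  ├─ )
  │  ├─ ∧
  │  ├─ (
  │  ├─ S
  │  │  └─ 1
  │  └─ )
  └─ )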


Reading Tree Leaves

Application of a production rule A → x is represented by node A with children x. (Note that the tree is ordered: the ordering of the nodes matters.)

The root has variable S. The yield of S is expressed by the leaves of the tree, read from left to right.

[Figure: the derivation tree of (0)∨((0)∧(1)) from the previous slide.]
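Reading the yield off a tree is a short recursion. The sketch below is only an illustration: the encoding of a node as a pair (label, children), the helper `leaf`, and the function name `tree_yield` are assumptions, and the two binary connectives of the Boolean grammar are written as ∨ and ∧.

```python
def tree_yield(node):
    """Concatenate the leaf labels of a derivation tree from left to right.

    A node is a pair (label, children); a leaf has an empty children list.
    The label "" stands for the empty string.
    """
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


def leaf(symbol):
    """Build a leaf node for a terminal symbol."""
    return (symbol, [])


# Derivation tree of (0)∨((0)∧(1)), using S -> 0 | 1 | (S)∨(S) | (S)∧(S).
inner = ("S", [leaf("("), ("S", [leaf("0")]), leaf(")"), leaf("∧"),
               leaf("("), ("S", [leaf("1")]), leaf(")")])
tree = ("S", [leaf("("), ("S", [leaf("0")]), leaf(")"), leaf("∨"),
              leaf("("), inner, leaf(")")])
```

Because the children are stored in order, the recursion visits the leaves in exactly the left-to-right order that the slide asks for.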


Defining a Tree

Definition 5.3: For a CFG G=(V,T,S,P) a derivation tree has the following properties:

  1) The root is labeled S
  2) Each leaf is from T ∪ {λ}
  3) Each interior node is from V
  4) If a node has label A ∈ V and its children are a_1, …, a_n (from L to R), then P must have the rule A → a_1…a_n (with a_j ∈ V ∪ T ∪ {λ})
  5) A leaf labeled λ is a single child (has no siblings).

For partial derivation trees we have:

  2a) Each leaf is from V ∪ T ∪ {λ}

[Figure: the derivation tree of (0)∨((0)∧(1)) from the earlier slide.]
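The five properties translate almost line by line into a checker. This is a sketch under assumptions that are not in the slides: nodes are pairs (label, children), all symbols are single characters, "" encodes the empty string, and `productions` maps a variable to a list of right-hand-side strings. The function name `is_derivation_tree` and the flag `partial` are illustrative.

```python
def is_derivation_tree(node, variables, terminals, productions, start,
                       partial=False):
    """Check properties 1 to 5 of Definition 5.3 (2a replaces 2 if partial)."""

    def check(n):
        label, children = n
        if not children:
            # Property 2 (or 2a): leaves are terminals or the empty string,
            # and for partial trees also variables.
            if label == "" or label in terminals:
                return True
            return partial and label in variables
        if label not in variables:            # property 3
            return False
        labels = [child[0] for child in children]
        if "" in labels and len(labels) > 1:  # property 5
            return False
        if "".join(labels) not in productions.get(label, []):  # property 4
            return False
        return all(check(child) for child in children)

    return node[0] == start and check(node)   # property 1


# Grammar of correct parentheses: S -> (S) | SS | (empty string)
paren_rules = {"S": ["(S)", "SS", ""]}
good = ("S", [("(", []), ("S", [("", [])]), (")", [])])   # yield: ()
open_tree = ("S", [("(", []), ("S", []), (")", [])])      # yield: (S)
```

The tree `open_tree` still has a variable as a leaf, so it is only accepted when `partial=True`, matching the distinction between properties 2 and 2a.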


Purpose of Trees

Looking at a tree you see the derivation without the unnecessary information about its order.

Theorem 5.1: Let G be a CFG. We have w ∈ L(G) if and only if there exists a derivation tree of G with yield w.

Also, y is a sentential form of G if and only if there exists a partial derivation tree for G with yield y.

Remember: the root always has to be S.

Formalities

Homework will be announced later today.

This homework will be due Friday afternoon.

The Midterm has been graded; ask Yen Ting for it.

The scores are as follows:


The Midterm

[Figure: plot of the HW 1+2 scores (vertical axis, 0 to 70) against the Midterm scores (horizontal axis, 0 to 60).]

Midterm scores:
  average: 42.9
  median: 43
  min-max: 23-55

Correlation between HWs 1+2 and the Midterm: 0.60


Sipser-Linz Dictionary

                       Sipser:          Linz:
Empty string           ε                λ
Conditional in sets    { x | x ∈ N }    { x : x ∈ N }
Letters                alphabet Σ       terminals T
Union in RE            a ∪ b            a + b


Parsing

Generative aspect of CFG: By now it should be clear how, from a CFG G, you can derive strings w ∈ L(G).

Analytical aspect: Given a CFG G and a string w, how do you decide if w ∈ L(G), and if so, how do you determine the derivation tree or the sequence of production rules that produces w? This is called the problem of parsing.


Exhaustive Parsing

Exhaustive parsing is a form of top-down parsing where you start with S and systematically go through all possible (say leftmost) derivations until you produce the string w. (You can remove sentential forms that will not work.)

Example 5.7: Can the CFG S → SS | aSb | bSa | λ produce the string w = aabb, and how?

After one step: S ⇒ SS or aSb or bSa or λ.

After two steps: S ⇒* SSS or aSbS or bSaS or S,
or S ⇒* aSSb or aaSbb or abSab or ab.

After three steps we see that: S ⇒ aSb ⇒ aaSbb ⇒ aabb.
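The round-by-round search of Example 5.7 can be sketched as a breadth-first search over leftmost derivations. The code is an illustration under assumptions of my own: uppercase letters are variables, "" encodes the empty-string production, and the names `exhaustive_parse` and `max_rounds` are hypothetical. The bound `max_rounds` is needed because, with rules that erase variables, the search is not guaranteed to stop by itself.

```python
def exhaustive_parse(productions, start, w, max_rounds):
    """Breadth-first search over leftmost derivations of w.

    Returns the sentential forms of one derivation of w, or None if none
    is found within max_rounds rounds.
    """
    frontier = [[start]]  # each entry is a derivation: a list of forms
    for _ in range(max_rounds):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            # Leftmost variable; every form kept in the frontier has one.
            pos = next(i for i, c in enumerate(form) if c.isupper())
            for rhs in productions[form[pos]]:
                new = form[:pos] + rhs + form[pos + 1:]
                if new == w:
                    return derivation + [new]
                first_var = next(
                    (i for i, c in enumerate(new) if c.isupper()), None)
                if first_var is None:
                    continue  # only terminals, but not w: dead end
                # Prune: terminals left of the first variable are final,
                # so they must form a prefix of w.
                if w.startswith(new[:first_var]):
                    next_frontier.append(derivation + [new])
        frontier = next_frontier
    return None


example_5_7 = {"S": ["SS", "aSb", "bSa", ""]}
found = exhaustive_parse(example_5_7, "S", "aabb", 3)
```

The pruning step is one way to "remove sentential forms that will not work": the form bSa, for instance, is discarded in the first round because aabb does not start with b.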


Flaws of Exhaustive Parsing

Obvious flaw: it will take a long time and a lot of memory for moderately long strings w: it is inefficient.

For cases w ∉ L(G) exhaustive parsing may never end. This will especially happen if we have rules like A → λ that make the sentential forms shrink, so that we will never know if we went too far with our parsing attempts.

Similar problems occur if the parsing can get in a loop according to A ⇒ B ⇒ A ⇒ B ⇒ …

Fortunately, it is always possible to remove problematic rules like A → λ and A → B from a CFG G.


Exhaustive yet Finite Parsing

Theorem 5.2: Let G be a CFG without rules of the form A → λ and A → B (with A,B ∈ V). Then on any string w, the exhaustive parsing method either produces w or halts eventually such that we can conclude w ∉ L(G). This derivation will require no more than 2|w| rounds, because every step increases either the length of the sentential form or the number of terminals in it.

The complexity of this algorithm is still exponential in the length |w| of the string. We can do much better though:

Theorem 5.3: For every CFG G there exists a parsing algorithm that runs in time O(|w|^3). (This algorithm uses dynamic programming.)
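The slides do not name the dynamic programming algorithm. One standard choice is the CYK algorithm, sketched below under the assumption that the grammar has already been brought into Chomsky normal form (only rules A → BC and A → a). The encoding of the rules as tuples, the function name `cyk`, and the example grammar for { 0^n 1^n : n ≥ 1 } are illustrative choices, not taken from the lecture.

```python
def cyk(unit_rules, pair_rules, start, w):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: set of (A, a) for rules A -> a with a terminal a.
    pair_rules: set of (A, B, C) for rules A -> BC.
    """
    n = len(w)
    if n == 0:
        return False  # this sketch ignores a possible empty-string rule
    # table[i][l] holds the variables that derive the substring w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {A for (A, a) in unit_rules if a == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for (A, B, C) in pair_rules:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]


# { 0^n 1^n : n >= 1 } in Chomsky normal form:
# S -> AB | AC, C -> SB, A -> 0, B -> 1
units = {("A", "0"), ("B", "1")}
pairs = {("S", "A", "B"), ("S", "A", "C"), ("C", "S", "B")}
```

The three nested loops over `length`, `i` and `split` account for the cubic dependence on |w|; the grammar size only contributes a constant factor for a fixed G.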


Simple Grammars

Definition 5.4: A CFG (V,T,S,P) is a simple grammar (s-grammar) if and only if all its productions are of the form

  A → ax with A ∈ V, a ∈ T, x ∈ V*,

and any pair (A,a) occurs at most once.

Note: for simple grammars a leftmost derivation of a string w ∈ L(G) is straightforward and requires time |w|.

Example: Take the s-grammar S → aS | bSS | c with aabcc:

  S ⇒ aS ⇒ aaS ⇒ aabSS ⇒ aabcS ⇒ aabcc.
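Because each pair (A,a) fixes at most one rule, parsing an s-grammar needs no search at all. A minimal sketch, assuming the rules are stored in a dictionary keyed by (A, a) whose values are the variable strings x of the rules A → ax; the function name `s_grammar_parse` is hypothetical.

```python
def s_grammar_parse(rules, start, w):
    """Leftmost derivation of w in an s-grammar, or None if w is not derivable.

    rules maps a pair (A, a) to the variable string x of the rule A -> ax.
    """
    stack = [start]          # pending variables, leftmost one on top
    derivation = [start]
    for i, a in enumerate(w):
        if not stack:
            return None      # input left over, but nothing left to expand
        A = stack.pop()
        x = rules.get((A, a))
        if x is None:
            return None      # no rule of the form A -> a...
        stack.extend(reversed(x))
        # Sentential form: consumed terminals followed by pending variables.
        derivation.append(w[:i + 1] + "".join(reversed(stack)))
    return derivation if not stack else None


# The s-grammar S -> aS | bSS | c from the example.
s_rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
steps = s_grammar_parse(s_rules, "S", "aabcc")
```

Each input symbol triggers exactly one dictionary lookup and one rule application, which is where the |w| bound on the number of derivation steps comes from.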


Ambiguity

A string w ∈ L(G) is derived ambiguously if it has more than one derivation tree (or equivalently: if it has more than one leftmost derivation (or rightmost)).

A grammar is ambiguous if some strings are derived ambiguously.

Typical example: rule S → 0 | 1 | S+S | S×S

  S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1

versus

  S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1


Ambiguity and Parse Trees

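The ambiguity of 0×1+1 is shown by the two different parse trees.

First tree (root rule S → S+S):

  S
  ├─ S
  │  ├─ S
  │  │  └─ 0
  │  ├─ ×
  │  └─ S
  │     └─ 1
  ├─ +
  └─ S
     └─ 1

Second tree (root rule S → S×S):

  S
  ├─ S
  │  └─ 0
  ├─ ×
  └─ S
     ├─ S
     │  └─ 1
     ├─ +
     └─ S
        └─ 1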
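The number of parse trees of a string can be counted directly, which gives a quick test for ambiguous derivations. The sketch below assumes the grammar S → 0 | 1 | S+S | S*S, writing * for the multiplication sign; the function name `count_parse_trees` is illustrative. It uses the fact that every tree for a longer string is determined by the operator chosen at the root together with the trees of the two sides.

```python
from functools import lru_cache


def count_parse_trees(w, operators="+*"):
    """Number of derivation trees of w for S -> 0 | 1 | S+S | S*S."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees with root S and yield w[i:j] (non-empty).
        if j - i == 1:
            return 1 if w[i] in "01" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if w[k] in operators:
                # Root rule S -> S op S with the operator at position k.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w)) if w else 0


trees_for_example = count_parse_trees("0*1+1")
```

A count of 2 or more means the string is derived ambiguously, while a count of 1 (as for 0+1) means that all its derivations share one tree.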


More on Ambiguity

Note that the two different derivations

  S ⇒ S+S ⇒ 0+S ⇒ 0+1

and

  S ⇒ S+S ⇒ S+1 ⇒ 0+1

do not make 0+1 an ambiguous string, as they have the same parse tree:

  S
  ├─ S
  │  └─ 0
  ├─ +
  └─ S
     └─ 1

Ambiguity causes trouble when trying to interpret strings like: She likes men who love women who don't smoke.

Solutions: Use parentheses, or use precedence rules such as a+(b×c) = a+b×c ≠ (a+b)×c.


Inherently Ambiguous

Languages that can only be generated by ambiguous grammars are inherently ambiguous.

Example 5.13: L = { a^n b^n c^m } ∪ { a^n b^m c^m }.

The way to make a CFG for this L somehow has to involve the step S → S_1 | S_2, where S_1 produces the strings a^n b^n c^m and S_2 the strings a^n b^m c^m.

This will be ambiguous on strings a^n b^n c^n.

Proving this rigorously is hard though.


Programming Languages

Programming languages are often defined as Context

Free Grammars in Backus-Naur Form (BNF).

Example:

<if_statement> ::= IF <expression><then_clause><else_clause>

<expression> ::= <term> | <expression>+<term>

<term> ::= <factor>|<term>*<factor>

The variables are indicated by <a variable name>.

The arrow → is replaced by ::=

Here, IF, + and * are terminals.

Syntax checking is checking if a program is an element of the language generated by the CFG of the programming language.
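Syntax checking for the <expression> and <term> rules of the BNF example can be sketched as a small recognizer. Two things here are assumptions and not part of the slides: a <factor>, which the example leaves undefined, is taken to be a single lowercase letter, and the left-recursive rules are read as "terms joined by +" and "factors joined by *", which describes the same strings. The function name `is_expression` is hypothetical.

```python
def is_expression(s):
    """Recognize <expression> for the two BNF rules of the example.

    Assumption (not in the slides): a <factor> is one lowercase letter.
    Each helper returns the position after the part it matched, or None.
    """

    def factor(i):
        return i + 1 if i < len(s) and s[i].islower() else None

    def term(i):
        # <term> ::= <factor> | <term>*<factor>: factors joined by *.
        i = factor(i)
        while i is not None and i < len(s) and s[i] == "*":
            i = factor(i + 1)
        return i

    def expression(i):
        # <expression> ::= <term> | <expression>+<term>: terms joined by +.
        i = term(i)
        while i is not None and i < len(s) and s[i] == "+":
            i = term(i + 1)
        return i

    return expression(0) == len(s)


ok = is_expression("a+b*c")
```

Replacing the left recursion by loops keeps the recognizer from calling itself forever on the first symbol, at the price of no longer mirroring the BNF rules one to one.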
