Outline
Grammars. Parsing. Expression grammars. Evaluation order and syntax. Options for expression syntax and semantics.
CS 454 Copyright 1990-2014 James M. Bieman
Basic Concepts
Grammar: rules that define the set of sentences in a language.
Sentence: finite sequence of symbols from the vocabulary, meeting grammar rules.
Vocabulary or alphabet: finite set of symbols.
Symbol: atomic entity (+, -, a letter, a digit).
Example Languages
A Grammar
sentence ::= subject predicate
subject ::= article noun
predicate ::= verb direct-object
direct-object ::= article noun
article ::= the
noun ::= man
noun ::= apple
verb ::= eats

Parse Tree
The man eats the apple.

               sentence
              /        \
       subject          predicate
       /     \          /       \
  article    noun    verb    direct-object
     |        |        |       /       \
    The      man     eats   article    noun
                               |         |
                              the      apple
Grammar Uses
The man verb direct-object
The man eats direct-object
The man eats article noun
The man eats the noun
The man eats the apple
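The derivation steps above can be reproduced mechanically. The following is a minimal sketch, not part of the original slides: the dictionary encoding of the grammar and the helper name `leftmost_derivation` are assumptions made for illustration, and terminals are kept in lower case as the productions spell them.

```python
# Toy grammar from the slides; each non-terminal maps to its alternatives.
GRAMMAR = {
    "sentence": [["subject", "predicate"]],
    "subject": [["article", "noun"]],
    "predicate": [["verb", "direct-object"]],
    "direct-object": [["article", "noun"]],
    "article": [["the"]],
    "noun": [["man"], ["apple"]],
    "verb": [["eats"]],
}


def leftmost_derivation(start, choices):
    """Return the sentential forms of a leftmost derivation.

    `choices` lists, in order, which alternative to use each time the
    leftmost non-terminal is expanded.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost symbol that is still a non-terminal.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Only `noun` has two alternatives: pick "man" first, "apple" last.
STEPS = leftmost_derivation("sentence", [0, 0, 0, 0, 0, 0, 0, 0, 1])
for line in STEPS:
    print(line)
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a sentence of the language.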
Example Grammar G0
S ::= B a D
B ::= c | b
D ::= r | t
"|" is read as "or".
"::=" is read as "produces" or "generates".
BNF Notation
<S> ::= <B> a <D>
<B> ::= c | b
<D> ::= r | t
Non-terminals inside of < >.
4. A start symbol.
Generating Language G0
Language: set of strings (sentences) consisting only of terminals derived from the start symbol using the productions.
Using grammar G0: L(G0) = {car, cat, bar, bat}
Programming languages are not finite. Infinite languages, like programming languages, use a recursive set of productions.
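Because G0 has no recursive productions, its language is finite and can be listed exhaustively. A small sketch of that enumeration follows; the dictionary encoding and the function name `language` are illustrative assumptions, and the function would not terminate on a recursive grammar.

```python
from itertools import product

# Grammar G0: S ::= B a D, B ::= c | b, D ::= r | t
G0 = {
    "S": [["B", "a", "D"]],
    "B": [["c"], ["b"]],
    "D": [["r"], ["t"]],
}


def language(symbol):
    """Return every terminal string derivable from `symbol`.

    Terminates only because G0 has no recursive productions.
    """
    if symbol not in G0:          # a terminal derives only itself
        return {symbol}
    strings = set()
    for alternative in G0[symbol]:
        parts = [language(s) for s in alternative]
        for combo in product(*parts):
            strings.add("".join(combo))
    return strings


L_G0 = language("S")
print(sorted(L_G0))  # ['bar', 'bat', 'car', 'cat']
```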
Triples
Language G1
Grammar G1:
E ::= T | E + T
T ::= F | T * F
F ::= <letter> | ( E )
<letter> stands for any lower case letter.
Is a + b * c ∈ L(G1)?
Parse to see.

      E
    / | \
   E  +  T
   |   / | \
   T  T  *  F
   |  |     |
   F  F     c
   |  |
   a  b

Parse tree for (a + b) * c:

        E
        |
        T
      / | \
     T  *  F
     |     |
     F     c
   / | \
  (  E  )
   / | \
  E  +  T
  |     |
  T     F
  |     |
  F     b
  |
  a
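Membership in L(G1) can be checked by parsing. Below is a hedged sketch of a hand-written parser for G1 that returns the parse tree as nested tuples. The names `parse_g1` and `in_g1` are made up for this example, and because a naive recursive-descent parser loops forever on left-recursive rules, the sketch assumes the usual workaround of unrolling `E ::= E + T` and `T ::= T * F` into loops that still build left-heavy trees.

```python
def parse_g1(text):
    """Return a G1 parse tree for `text` as nested tuples, or raise SyntaxError."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # F ::= <letter> | ( E )
        tok = peek()
        if tok is not None and "a" <= tok <= "z":
            advance()
            return ("F", tok)
        if tok == "(":
            advance()
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return ("F", "(", inner, ")")
        raise SyntaxError(f"unexpected {tok!r}")

    def term():  # T ::= F | T * F, left recursion unrolled into a loop
        node = ("T", factor())
        while peek() == "*":
            advance()
            node = ("T", node, "*", factor())
        return node

    def expr():  # E ::= T | E + T, same treatment
        node = ("E", term())
        while peek() == "+":
            advance()
            node = ("E", node, "+", term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


def in_g1(text):
    """True when `text` is a sentence of L(G1)."""
    try:
        parse_g1(text)
        return True
    except SyntaxError:
        return False


print(parse_g1("a + b * c"))
print(in_g1("a + b * c"), in_g1("a + * c"))
```

The tuple printed for a + b * c has the shape E(E(T(F a)) + T(T(F b) * F c)), so b * c is grouped below the +.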
Is a + b + c ∈ L(G1)?
Parse to see.

         E
       / | \
      E  +  T
    / | \   |
   E  +  T  F
   |     |  |
   T     F  c
   |     |
   F     b
   |
   a

Notice
Left recursion builds derivation trees heavy on the left.
Right recursion builds derivation trees heavy on the right.
If a grammar yields more than one parse tree for the same sentence, the grammar is ambiguous.
Different grammars may generate the same language.
Language G2
Grammar G2:
E ::= T | T + E
T ::= F | F * T
F ::= <letter> | ( E )
Is a + b * c ∈ L(G2)?
Parse to see.

      E
    / | \
   T  +  E
   |     |
   F     T
   |   / | \
   a  F  *  T
      |     |
      b     F
            |
            c

Is a + b + c ∈ L(G2)?
Parse to see.

      E
    / | \
   T  +  E
   |   / | \
   F  T  +  E
   |  |     |
   a  F     T
      |     |
      b     F
            |
            c

G2 is right recursive. The parse tree is heavy on the right.
Right associativity implies right to left evaluation.
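Right-recursive rules map directly onto recursive functions. The sketch below is an illustration rather than the course's own code: `parse_g2` is a hypothetical name, and instead of the full parse tree it returns the grouping as (operator, left, right) tuples, which makes the right-heavy shape easy to see.

```python
def parse_g2(text):
    """Return the grouping G2 gives `text` as (operator, left, right) tuples."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # F ::= <letter> | ( E )
        tok = peek()
        if tok is not None and "a" <= tok <= "z":
            advance()
            return tok
        if tok == "(":
            advance()
            inner = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return inner
        raise SyntaxError(f"unexpected {tok!r}")

    def term():  # T ::= F | F * T
        left = factor()
        if peek() == "*":
            advance()
            return ("*", left, term())
        return left

    def expr():  # E ::= T | T + E
        left = term()
        if peek() == "+":
            advance()
            return ("+", left, expr())
        return left

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parse_g2("a + b + c"))  # ('+', 'a', ('+', 'b', 'c'))
print(parse_g2("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
```

For a + b + c the second + is nested inside the first, so b + c would be evaluated before a is added: right associativity.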
An evaluation tree is constructed by rearranging a parse tree.
The evaluation tree determines which operands are associated with a particular operator.
Ex: an expression involving one binary operator (+, -, *, /) is a triple. One operator is associated with two operands.

a + b * c

    +
   / \
  a   *
     / \
    b   c

(a + b) * c

      *
     / \
   ( )  c
    |
    +
   / \
  a   b

Or more simply:

      *
     / \
    +   c
   / \
  a   b
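An evaluation tree built from triples can be evaluated with a short recursive function. This is a minimal sketch under assumptions of its own: triples are written as (operator, left, right) tuples, operands are variable names looked up in a dictionary, and the names `OPS`, `evaluate`, and `ENV` are invented for the example.

```python
import operator

# Each binary operator symbol maps to the function that applies it.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(tree, env):
    """Evaluate a tree of (operator, left, right) triples.

    A leaf is a variable name whose value is looked up in `env`.
    """
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


ENV = {"a": 2, "b": 3, "c": 4}
print(evaluate(("+", "a", ("*", "b", "c")), ENV))  # a + (b * c) -> 14
print(evaluate(("*", ("+", "a", "b"), "c"), ENV))  # (a + b) * c -> 20
```

The two trees use the same operands and operators but associate them differently, which is why the results differ.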
Is a + b * c ∈ L(G3)?
Parse to see:
[Figure: parse tree for a + b * c under G3]
Evaluation tree:

      *
     / \
    +   c
   / \
  a   b
G4:
BNF
Print statement:
<printstmt> ::= print(<printlist>)
<printlist> ::= | <printlist> ,

Extended BNF
Non-terminals: start with uppercase bold.
Terminals: symbols are quoted, others bold.
| represents an alternative.
() for grouping.
{} zero or more repetitions.
[] for optional constructs.
Ex: Printstmt ::= print( , {, })
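The EBNF repetition braces translate naturally into a loop. The recognizer below is only a sketch: it assumes, hypothetically, that each item in the print list is a single lower-case letter, and the names `TOKEN`, `tokenize`, and `is_print_stmt` are invented for the example.

```python
import re

# Hypothetical token set: the keyword, single lower-case letters as items,
# and the punctuation used by the print statement.
TOKEN = re.compile(r"\s*(print|[a-z]|[(),])")


def tokenize(text):
    """Split `text` into tokens, raising SyntaxError on anything else."""
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_print_stmt(text):
    """Recognize the EBNF shape: print( item {, item} )."""
    try:
        tokens = tokenize(text)
    except SyntaxError:
        return False

    def is_item(i):
        return i < len(tokens) and len(tokens[i]) == 1 and tokens[i].isalpha()

    if tokens[:2] != ["print", "("] or not is_item(2):
        return False
    pos = 3
    while pos < len(tokens) and tokens[pos] == ",":  # the {, item} part
        if not is_item(pos + 1):
            return False
        pos += 2
    return tokens[pos:] == [")"]


print(is_print_stmt("print(a, b, c)"), is_print_stmt("print(a,)"))
```

The `while` loop plays the role of the braces: it runs zero or more times, once per ", item" pair.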
Ambiguity
A grammar is ambiguous if there are 2 or more parse trees for the same sentence. Consider the following grammar:
<S> ::= <S> <T> <S> | a | b | c
<T> ::= + | - | *
Parse: a b c
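One way to see the ambiguity is to count parse trees. The sketch below assumes the sentence is written with explicit operators and spaces between symbols, for example a + b * c; the function name `count_parse_trees` is made up for illustration. It tries every operator as the root <T> of a <S> <T> <S> production and multiplies the counts for the two sides.

```python
from functools import lru_cache

OPERANDS = {"a", "b", "c"}
OPERATORS = {"+", "-", "*"}


def count_parse_trees(sentence):
    """Count parse trees for `sentence` under <S> ::= <S> <T> <S> | a | b | c."""
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <S>.
        if hi - lo == 1:
            return 1 if tokens[lo] in OPERANDS else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in OPERATORS:
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("a + b * c"))  # 2
```

A count above 1 means the grammar is ambiguous for that sentence: here either + or * can sit at the root.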
Which then is the else associated with? This anomaly (the dangling else) was in the original grammar for Algol 60.
Writing Grammars
Guidelines
An identifier is a non-empty string of lower case letters and digits. The 1st character in each string is a letter.
<id> ::= <letter><char> | <letter>
<char> ::= <letter> | <digit> | <letter><char> | <digit><char>
<letter> ::= a | b | ... | z
<digit> ::= 0 | 1 | 2 | ... | 9
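The identifier rule can be checked directly in code. This is a small sketch, not taken from the slides; `is_identifier` is a hypothetical helper, shown next to an equivalent regular expression.

```python
import re
import string

LETTERS = string.ascii_lowercase
DIGITS = string.digits


def is_identifier(text):
    """True when `text` is a letter followed by zero or more letters or digits."""
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])


# The same language written as a regular expression.
ID_PATTERN = re.compile(r"[a-z][a-z0-9]*")

for candidate in ["x", "x1", "1x", "", "aB"]:
    same = is_identifier(candidate) == bool(ID_PATTERN.fullmatch(candidate))
    print(repr(candidate), is_identifier(candidate), same)
```

That a single regular expression suffices is a hint that identifiers do not need the full power of a context-free grammar.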
Specify the desired language in English.
Construct the grammar from the English description.
Suppose we want to write a grammar for a language of print statements:
L(G) = {print(), print(,), print(,,), }
Each sentence has terminals: print, (, and ).
One possibility:
Another possibility:
Let's parse: print (, , )
Which grammar do we select and why?
How can we modify the grammar to allow for arithmetic expressions in place of ?

Consider a Specification
1. A program is a list of statements separated by ; and terminated by a period.
2. A statement is an identifier, followed by :=, followed by an expression.
3. An expression is an identifier, an integer, or a complex expression.
4. A complex expression is a ( followed by a + or a -, followed by an expression, followed by a right ).
A BNF Grammar
<program> ::= <stmt> {; <stmt>} .
<stmt> ::= <var> := <expr>
<var> ::= <letter>{<letter>}
<expr> ::= <simple> | <complex>
<simple> ::= <var> | <number>
<complex> ::= ( <expr> <op> <expr> )
<number> ::= <digit>{<digit>}
<op> ::= + | -
Comment:
Tokens such as variable names and integers can be recognized during lexical analysis, which needs only regular expressions rather than a context free grammar.
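The grammar above is small enough to parse by hand. The following sketch follows the BNF productions as written, with a regular-expression tokenizer standing in for the lexical analysis mentioned in the comment. All names (`TOKEN`, `tokenize`, `parse_program`) and the tuple output format are assumptions made for the example.

```python
import re

# Lexical analysis: variables, numbers, := and the single-character symbols.
TOKEN = re.compile(r"\s*(:=|[a-z]+|[0-9]+|[;.()+\-])")


def tokenize(source):
    tokens, pos = [], 0
    source = source.strip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"bad character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_program(source):
    """Parse <program> ::= <stmt> {; <stmt>} . into a list of tuples."""
    tokens = tokenize(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(accepts, what):
        nonlocal pos
        tok = peek()
        if tok is None or not accepts(tok):
            raise SyntaxError(f"expected {what}, found {tok!r}")
        pos += 1
        return tok

    def expr():  # <expr> ::= <simple> | <complex>
        if peek() == "(":  # <complex> ::= ( <expr> <op> <expr> )
            expect(lambda t: t == "(", "(")
            left = expr()
            op = expect(lambda t: t in ("+", "-"), "+ or -")
            right = expr()
            expect(lambda t: t == ")", ")")
            return (op, left, right)
        # <simple> ::= <var> | <number>
        return expect(lambda t: t.isalpha() or t.isdigit(), "variable or number")

    def stmt():  # <stmt> ::= <var> := <expr>
        name = expect(str.isalpha, "variable")
        expect(lambda t: t == ":=", ":=")
        return (":=", name, expr())

    statements = [stmt()]
    while peek() == ";":  # {; <stmt>}
        expect(lambda t: t == ";", ";")
        statements.append(stmt())
    expect(lambda t: t == ".", ".")
    if peek() is not None:
        raise SyntaxError("text after the final period")
    return statements


print(parse_program("x := (a + 1); y := x."))
```

Each nested function corresponds to one non-terminal, and the repetition braces again become a `while` loop.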
Infix:
(a + b) * (c - a)
Only used for binary operations; combined with prefix for unary operations.
Ambiguities: a + b * c
Operator precedence.
Associativity (left/right):
Left: a / b / c = (a / b) / c
Right: a^b^c = a^(b^c)
note: ^ is exponentiation
APL:
infix, right associative, no precedence rules.
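The associativity rules above can be observed in an existing language. Python is used here only as an example: it writes exponentiation as `**` rather than ^, groups `/` left to right, and groups `**` right to left. The variable names are arbitrary.

```python
a, b, c = 8.0, 4.0, 2.0

# Division is left associative: a / b / c means (a / b) / c.
LEFT = a / b / c
print(LEFT, (a / b) / c, a / (b / c))  # 1.0 1.0 4.0

x, y, z = 2, 3, 2

# Exponentiation is right associative: x ** y ** z means x ** (y ** z).
RIGHT = x ** y ** z
print(RIGHT, x ** (y ** z), (x ** y) ** z)  # 512 512 64
```

In both cases the alternative grouping gives a different value, so the associativity rule is part of the meaning of the expression and not just a parsing detail.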
Prefix (Polish): * + a b - c a
Ordinary prefix (normal function application): *(+(a,b), -(c,a))
Cambridge Polish (Lisp and Scheme): (* (+ a b) (- c a))
Postfix (reverse Polish): a b + c a - *
The HP calculator language.
The same tree represents the expression whether in prefix, infix, or postfix format:

       *
     /   \
    +     -
   / \   / \
  a   b c   a
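Since one tree underlies every format, each notation is just a different traversal of it. The sketch below is illustrative only: the tree is encoded as (operator, left, right) tuples, and the function names are invented for the example. It also evaluates the postfix form with a stack, the way a reverse Polish calculator would.

```python
import operator

TREE = ("*", ("+", "a", "b"), ("-", "c", "a"))
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def prefix(t):
    if isinstance(t, str):
        return t
    op, left, right = t
    return f"{op} {prefix(left)} {prefix(right)}"


def postfix(t):
    if isinstance(t, str):
        return t
    op, left, right = t
    return f"{postfix(left)} {postfix(right)} {op}"


def infix(t):
    # Fully parenthesized, so no precedence rules are needed.
    if isinstance(t, str):
        return t
    op, left, right = t
    return f"({infix(left)} {op} {infix(right)})"


def cambridge(t):
    if isinstance(t, str):
        return t
    op, left, right = t
    return f"({op} {cambridge(left)} {cambridge(right)})"


def eval_postfix(text, env):
    """Evaluate a space-separated postfix expression with a stack."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])
    return stack.pop()


print(prefix(TREE))     # * + a b - c a
print(postfix(TREE))    # a b + c a - *
print(infix(TREE))      # ((a + b) * (c - a))
print(cambridge(TREE))  # (* (+ a b) (- c a))
print(eval_postfix(postfix(TREE), {"a": 2, "b": 3, "c": 7}))  # 25
```

Prefix and postfix need no parentheses because the position of each operator already fixes its operands.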
Summary
Languages (formally). Grammars. Derivations. Context Free Grammars. Expression Grammars. Order of evaluation. Expression formats.