
CS454 Principles of Programming Languages Lecture Notes 2: Syntax

Spring 2014 James M. Bieman

CS454 Principles of Programming Languages

Outline
Grammars.
Parsing.
Expression grammars.
Evaluation order and syntax.
Options for expression syntax and semantics.


Lecture Notes 2: Syntax

James M. Bieman Computer Science Department Colorado State University


CS 454 Copyright 1990-2014 James M. Bieman

A Program's Syntactic Hierarchy

Complete program.
Declarations and statements.
Expressions.
Lexical items or tokens: primitive symbols or vocabulary.

Basic Concepts

Grammar: rules that define the set of sentences in a language.
Sentence: finite sequence of symbols from the vocabulary, meeting grammar rules.
Vocabulary or alphabet: finite set of symbols.
Symbol: atomic entity (+, -, a letter, a digit).

Example Languages

L1: all strings composed of the symbols 0 and 1:
  Vocabulary: 0 and 1.

L2: the set of strings with n occurrences of A followed by n occurrences of B:
  Vocabulary: A and B.
  Example sentences: AAAABBBB, AB

Example Languages (continued)

L3: the set of grammatical English sentences:
  Vocabulary: English words.
  Grammar: English grammar.
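
Membership in L1 and L2 can be checked mechanically. The following is a minimal sketch, not part of the original notes; the function names are hypothetical, and it assumes n >= 1 for L2, since the slide's examples do not say whether the empty string belongs to the language.

```python
def in_l1(s: str) -> bool:
    """L1: strings built only from the symbols 0 and 1."""
    return all(ch in "01" for ch in s)


def in_l2(s: str) -> bool:
    """L2: n occurrences of A followed by n occurrences of B (assumes n >= 1)."""
    n = len(s) // 2
    return n >= 1 and s == "A" * n + "B" * n


print(in_l1("0110"), in_l2("AAAABBBB"), in_l2("AAB"))  # True True False
```

Note that L2 requires counting, which is why it needs more than a simple symbol-by-symbol check.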


Parse Tree

The man eats the apple.

sentence
  subject
    article: The
    noun: man
  predicate
    verb: eats
    direct-object
      article: the
      noun: apple

A Grammar

sentence ::= subject predicate
subject ::= article noun
predicate ::= verb direct-object
direct-object ::= article noun
article ::= the
noun ::= man
noun ::= apple
verb ::= eats

Grammar Uses

To generate sentences in a language.
  Construct all possible sentences.

To recognize sentences in a language.
  Determine if a string is in a language.
  Syntax checking.

Derivation: The man eats the apple

sentence => subject predicate
         => article noun predicate
         => The noun predicate
         => The man predicate
         => The man verb direct-object
         => The man eats direct-object
         => The man eats article noun
         => The man eats the noun
         => The man eats the apple
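
The derivation of "the man eats the apple" can be reproduced by a short program. This is only a sketch under stated assumptions: the dictionary encoding of the grammar and the name leftmost_derivation are illustrative, and the search always expands the leftmost non-terminal, backtracking when a choice (man or apple) does not match the target sentence.

```python
GRAMMAR = {
    "sentence": [["subject", "predicate"]],
    "subject": [["article", "noun"]],
    "predicate": [["verb", "direct-object"]],
    "direct-object": [["article", "noun"]],
    "article": [["the"]],
    "noun": [["man"], ["apple"]],
    "verb": [["eats"]],
}


def leftmost_derivation(target):
    """Return the list of sentential forms deriving target, or None."""

    def expand(form):
        # Locate the leftmost non-terminal, if any.
        index = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if index is None:
            return [form] if form == target else None
        # Prune: terminals to the left must match the target, and forms never
        # shrink because no production has an empty right-hand side.
        if form[:index] != target[:index] or len(form) > len(target):
            return None
        for rhs in GRAMMAR[form[index]]:
            rest = expand(form[:index] + rhs + form[index + 1:])
            if rest is not None:
                return [form] + rest
        return None

    return expand(["sentence"])


for form in leftmost_derivation("the man eats the apple".split()):
    print(" ".join(form))
```

The same function doubles as a recognizer: it returns None for a word sequence that is not a sentence of the grammar.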

A Context Free Grammar (CFG) Consists Of:

1. A set of terminals.
2. A set of non-terminals.
3. A set of procedures/productions:
     LHS ::= RHS
     LHS is a non-terminal.
     RHS contains terminals and/or non-terminals.
4. A start symbol.

Example Grammar G0

S ::= B a D
B ::= c | b
D ::= r | t

| is read as "or".
::= is read as "produces" or "generates".

BNF Notation

<S> ::= <B> a <D>
<B> ::= c | b
<D> ::= r | t

Non-terminals are written inside < >.

Generating Language G0

Language: the set of strings (sentences) consisting only of terminals derived from the start symbol using the productions.

Using grammar G0:

S => BaD => baD => bat

So bat ∈ L(G0).

Derivation tree or parse tree:

      S
    / | \
   B  a  D
   |     |
   b     t

L(G0) Has Only 4 Sentences

L(G0) = {bat, bar, cat, car}
  It is a finite language.

Programming languages are not finite. Infinite languages, like programming languages, use a recursive set of productions.
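
Because G0 has no recursive productions, its whole language can be generated by exhaustive expansion. The sketch below is illustrative only (the dictionary encoding and the function name language are assumptions, not part of the notes); it would not terminate on a recursive grammar.

```python
from itertools import product

G0 = {
    "S": [["B", "a", "D"]],
    "B": [["c"], ["b"]],
    "D": [["r"], ["t"]],
}


def language(symbol):
    """All terminal strings derivable from symbol in the non-recursive grammar G0."""
    if symbol not in G0:
        return {symbol}  # a terminal derives only itself
    sentences = set()
    for rhs in G0[symbol]:
        # Combine every choice for each symbol on the right-hand side.
        for parts in product(*(language(s) for s in rhs)):
            sentences.add("".join(parts))
    return sentences


print(sorted(language("S")))  # ['bar', 'bat', 'car', 'cat']
```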

CS 454

Copyright 1990-2014 James M. Bieman

2-13

CS 454

Copyright 1990-2014 James M. Bieman

2-14

Expression Grammars (arithmetic)


Structure dominated: expression structure determines semantics.
Consider: a + b * c + d
Let a = 1, b = 2, c = 3, d = 4.
The result of evaluating the expression depends on the order of evaluation.
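
To make the dependence on evaluation order concrete, the three groupings below are written out with explicit parentheses. The choice of these three orders is an illustration, not something the slide enumerates.

```python
a, b, c, d = 1, 2, 3, 4

conventional = a + (b * c) + d      # * binds tighter than +
left_to_right = ((a + b) * c) + d   # no precedence, strictly left to right
right_to_left = a + (b * (c + d))   # no precedence, strictly right to left

print(conventional, left_to_right, right_to_left)  # 11 13 15
```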


Triples

Triple: 2 operands, 1 operator.
Examples (assume Java/C evaluation order):
  a + b * c
  a * b * c - d
  a + b + c * d


Language G1

Grammar G1:
  E ::= T | E + T
  T ::= F | T * F
  F ::= letter | ( E )
  letter stands for any lower case letter.

Is a + b * c ∈ L(G1)?
Parse to see.

        E
      / | \
     E  +  T
     |   / | \
     T  T  *  F
     |  |     |
     F  F     c
     |  |
     a  b

Is (a + b) * c ∈ L(G1)?
Parse to see.

          E
          |
          T
        / | \
       T  *  F
       |     |
       F     c
     / | \
    (  E  )
     / | \
    E  +  T
    |     |
    T     F
    |     |
    F     b
    |
    a
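
A recognizer for L(G1) can be sketched as a small hand-written parser. Two assumptions are worth stating loudly: the function name parse_g1 and the tuple representation of trees are hypothetical, and the left-recursive productions E ::= E + T and T ::= T * F are handled with loops rather than direct recursion, because a naive recursive-descent routine would call itself forever on a left-recursive rule. The loops still build left-heavy trees, matching the grammar.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def parse_g1(text):
    """Parse text with grammar G1 and return a nested (op, left, right) tuple."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_e():
        # E ::= T | E + T, written as T followed by zero or more "+ T".
        node = parse_t()
        while peek() == "+":
            advance()
            node = ("+", node, parse_t())
        return node

    def parse_t():
        # T ::= F | T * F, written as F followed by zero or more "* F".
        node = parse_f()
        while peek() == "*":
            advance()
            node = ("*", node, parse_f())
        return node

    def parse_f():
        # F ::= letter | ( E )
        tok = peek()
        if tok is not None and tok in LETTERS:
            advance()
            return tok
        if tok == "(":
            advance()
            node = parse_e()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_g1("a + b * c"))    # ('+', 'a', ('*', 'b', 'c'))
print(parse_g1("(a + b) * c"))  # ('*', ('+', 'a', 'b'), 'c')
```

A string is in L(G1) exactly when parse_g1 returns without raising SyntaxError.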


Operators Determine the Production to Use.

Grammar G1:
  E ::= T | E + T
  T ::= F | T * F
  F ::= letter | ( E )
  letter stands for any lower case letter.

Is a + b + c ∈ L(G1)?
Parse to see.

            E
          / | \
         E  +  T
       / | \   |
      E  +  T  F
      |     |  |
      T     F  c
      |     |
      F     b
      |
      a

Notice

Left recursion builds derivation trees heavy on the left.
Right recursion builds derivation trees heavy on the right.
If we have more than one way to generate the same sentence in a language, the grammar is ambiguous.
Different grammars may generate the same language.

Language G2

Grammar G2:
  E ::= T | T + E
  T ::= F | F * T
  F ::= letter | ( E )

Is a + b * c ∈ L(G2)?
Parse to see.

        E
      / | \
     T  +  E
     |     |
     F     T
     |   / | \
     a  F  *  T
        |     |
        b     F
              |
              c

Generating a * requires the use of an additional production:
  T ::= F * T

* has higher precedence than +.

Addition in G2 is Right Associative

Is a + b + c ∈ L(G2)?
Parse to see.

        E
      / | \
     T  +  E
     |   / | \
     F  T  +  E
     |  |     |
     a  F     T
        |     |
        b     F
              |
              c

G2 is right recursive.
The parse tree is heavy on the right.
Right associativity implies right to left evaluation.
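
Right-recursive productions such as those of G2 translate directly into recursive functions, one per non-terminal. The sketch below is illustrative (the name parse_g2 and the tuple trees are assumptions); each function returns the tree it built together with the position of the next unread token.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def parse_g2(text):
    """Parse text with grammar G2 and return a nested (op, left, right) tuple."""
    tokens = [ch for ch in text if not ch.isspace()]

    def parse_e(i):
        # E ::= T | T + E
        left, i = parse_t(i)
        if i < len(tokens) and tokens[i] == "+":
            right, i = parse_e(i + 1)
            return ("+", left, right), i
        return left, i

    def parse_t(i):
        # T ::= F | F * T
        left, i = parse_f(i)
        if i < len(tokens) and tokens[i] == "*":
            right, i = parse_t(i + 1)
            return ("*", left, right), i
        return left, i

    def parse_f(i):
        # F ::= letter | ( E )
        if i < len(tokens) and tokens[i] in LETTERS:
            return tokens[i], i + 1
        if i < len(tokens) and tokens[i] == "(":
            node, i = parse_e(i + 1)
            if i < len(tokens) and tokens[i] == ")":
                return node, i + 1
        raise SyntaxError(f"cannot parse at position {i}")

    tree, end = parse_e(0)
    if end != len(tokens):
        raise SyntaxError(f"unexpected input at position {end}")
    return tree


print(parse_g2("a + b + c"))  # ('+', 'a', ('+', 'b', 'c'))
```

The nesting on the right of the result corresponds to the right-heavy parse tree: b + c is grouped first.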

Expression Evaluation and Syntax

Rules for evaluating expressions depend on the language.
Evaluation rules are specified, in part, by the structure (grammar) of the expressions.
Evaluation trees describe the evaluation order.
  Ex: an expression involving one binary operator (+, -, *, /) is a triple. One operator is associated with two operands.

Evaluation Tree Examples

An evaluation tree is constructed by rearranging a parse tree.
The evaluation tree determines which operands are associated with a particular operator.

a + b * c

    +
   / \
  a   *
     / \
    b   c

(a + b) * c

      *
     / \
   ( )  c
    |
    +
   / \
  a   b

Or more simply:

      *
     / \
    +   c
   / \
  a   b

Operator Precedence and Grammar Structure

How does a grammar affect operator precedence rules?
Let's change G1 again. Grammar G3:

  E ::= T | T * E
  T ::= F | F + T
  F ::= letter | ( E )

Is a + b * c ∈ L(G3)?

Parse to see:

          E
        / | \
       T  *  E
     / | \   |
    F  +  T  T
    |     |  |
    a     F  F
          |  |
          b  c

Evaluation tree:

      *
     / \
    +   c
   / \
  a   b

Higher precedence operators use more grammar productions to parse.



Consider a More Complex Grammar G4.

G4:
  E ::= T | E + T | E - T
  T ::= F | T * F | T / F | T mod F
  F ::= P | F ^ P
  P ::= letter | ( E )
  note: ^ means exponentiation.

Parse: a mod b + c
Parse: (a + b) mod c

Expression Grammars Can Enforce Evaluation Rules

Force operator associativity:
  Left recursion -> left associativity.
  Right recursion -> right associativity.

Force operator precedence:
  More productions -> higher precedence.

Allow specified use of parentheses.


BNF

Backus Naur Form (BNF) or Backus Normal Form:
  < non-terminal >

Print statement:
  <printstmt> ::= print(<printlist>)
  <printlist> ::= letter | <printlist> , letter

Extended BNF

Non-terminals: start with uppercase bold.
Terminals: symbols are quoted, others bold.
| represents an alternative.
() for grouping.
{} zero or more repetitions.
[] for optional constructs.
Ex: Printstmt ::= print( letter {, letter} )

Ambiguity

A grammar is ambiguous if there are 2 or more parse trees for the same sentence.
Consider the following grammar:

<S> ::= <S> <T> <S> | a | b | c
<T> ::= + | - | *

The first production is both left and right recursive.

Parse: a * b * c
  It has 2 parse trees, so the grammar is ambiguous. But do we care?

Parse: a - b - c
  But (a - b) - c ≠ a - (b - c)

The Classic Example

S ::= Ifstmt | Assignment | ...
Ifstmt ::= if E then S
Ifstmt ::= if E then S else S

Parse:
  if E1 then if E2 then S1 else S2

Which then is the else associated with?
This anomaly was in the original grammar for Algol60.
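
Returning to the ambiguous <S> grammar from the Ambiguity slide, the two parse trees of a - b - c can be enumerated and evaluated to show that the ambiguity matters for subtraction but not for multiplication. This is a sketch only: the function names and the sample values a = 5, b = 3, c = 1 are assumptions chosen for illustration, and the input is assumed to be a well-formed alternation of operands and operators.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def parse_trees(tokens):
    """Every parse tree for tokens under <S> ::= <S> <T> <S> | a | b | c."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i in range(1, len(tokens), 2):  # each operator may be the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees


def evaluate(tree, env):
    """Evaluate a tree bottom-up using the operand values in env."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


env = {"a": 5, "b": 3, "c": 1}  # hypothetical operand values
for sentence in ("a - b - c", "a * b * c"):
    trees = parse_trees(sentence.split())
    print(sentence, [evaluate(t, env) for t in trees])
```

With these values the two trees for a - b - c give different results, while the two trees for a * b * c agree.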


Writing Grammars

Say we want to specify the language of identifiers:
  An identifier is a non-empty string of lower case letters and digits.
  The 1st character in each string is a letter.

One possible grammar:
  <id> ::= <letter><char> | <letter>
  <char> ::= <letter> | <digit> | <letter><char> | <digit><char>
  <letter> ::= a | b | ... | z
  <digit> ::= 0 | 1 | 2 | ... | 9

Guidelines

Specify the desired language in English.
Construct the grammar from the English description.
Suppose we want to write a grammar for a language of print statements:
  L(G) = {print(letter), print(letter,letter), print(letter,letter,letter), ...}
  Each sentence has terminals: print, (, and ).


Grammar for print Language

One possibility:
  S ::= print(T)
  T ::= letter | letter , T

Another possibility:
  S ::= print(T)
  T ::= letter | T , letter

Let's parse: print(letter, letter, letter)
Which grammar do we select and why?
How can we modify the grammar to allow for arithmetic expressions in place of letter?

Consider a Specification

1. A program is a list of statements separated by ; and terminated by a period.
2. A statement is an identifier, followed by :=, followed by an expression.
3. An expression is an identifier, an integer, or a complex expression.
4. A complex expression is a ( followed by an expression, followed by a + or a -, followed by an expression, followed by a ).

A BNF Grammar

<program> ::= <stmt> {; <stmt>} .
<stmt> ::= <var> := <expr>
<var> ::= <letter>{<letter>}
<expr> ::= <simple> | <complex>
<simple> ::= <var> | <number>
<complex> ::= ( <expr> <op> <expr> )
<number> ::= <digit>{<digit>}
<op> ::= + | -

Comment:
Actually, identifiers such as variable names and integers can be identified via lexical analysis, which does not need a context free grammar.
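
As the comment suggests, variable names and integers can be picked out by regular expressions before any parsing takes place. The tokenizer below is a minimal sketch using Python's standard re module; the token names (VAR, NUMBER, ASSIGN, OP, PUNCT) are hypothetical, and <letter> is assumed to mean a lower case letter.

```python
import re

TOKEN_SPEC = [
    ("VAR", r"[a-z]+"),      # <var> ::= <letter>{<letter>}
    ("NUMBER", r"[0-9]+"),   # <number> ::= <digit>{<digit>}
    ("ASSIGN", r":="),
    ("OP", r"[+-]"),         # <op> ::= + | -
    ("PUNCT", r"[();.]"),
    ("SKIP", r"\s+"),        # whitespace separates tokens and is discarded
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Split source into (kind, text) pairs, raising SyntaxError on bad input."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("x := (y + 12); z := x."))
```

A parser for <program> would then work on these tokens rather than on individual characters, so the <var> and <number> productions drop out of the context free grammar.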

Some Pragmatic (and theoretical) Issues


A parser for a context-free grammar can be built using a pushdown automaton.
A lexical analyzer can be built using a finite state machine.
These mechanisms are part of CS programming language theory.


The Syntax for Expressions Varies Between Languages

C, C++, Java, Pascal, PL/1, Fortran, ...:
  Infix expressions, left associative (except exponentiation), precedence hierarchy, Polish prefix for unary operators, ordinary prefix for programmer defined functions.

APL:
  Infix, right associative, no precedence rules.

Lisp, Scheme, Racket:
  Cambridge Polish.

Infix Expression Notation

Operator between operands:
  (a + b) * (c - a)
  Only used for binary operations; combined with prefix for unary operations.

Ambiguities: a + b * c
  Operator precedence.
  Associativity (left/right):
    Left: a / b / c = (a / b) / c
    Right: a^b^c = a^(b^c)
    note: ^ is exponentiation

Polish Expression Notation

Prefix (Polish): * + a b - c a
Postfix (reverse Polish): a b + c a - *
  The HP calculator language.
Ordinary prefix (normal function application): *(+(a,b), -(c,a))
Cambridge Polish (Lisp and Scheme): (* (+ a b) (- c a))

You must know the number of arguments for operators.

Tree Representation (abstract syntax tree)

The same tree represents the expression whether in prefix, infix, or postfix format:

        *
      /   \
     +     -
    / \   / \
   a   b c   a

Postfix format is simple to evaluate.
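
The relationship between the tree and the notations can be sketched in a few lines: each notation is a different traversal of the same tree, and a postfix string can be evaluated with a stack. The tuple encoding of the tree, the function names, and the operand values a = 2, b = 3, c = 7 are assumptions made for illustration.

```python
import operator

TREE = ("*", ("+", "a", "b"), ("-", "c", "a"))
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def prefix(t):
    """Operator first, then both operands (Polish)."""
    return t if isinstance(t, str) else f"{t[0]} {prefix(t[1])} {prefix(t[2])}"


def postfix(t):
    """Both operands first, then the operator (reverse Polish)."""
    return t if isinstance(t, str) else f"{postfix(t[1])} {postfix(t[2])} {t[0]}"


def infix(t):
    """Operator between operands, fully parenthesized."""
    return t if isinstance(t, str) else f"({infix(t[1])} {t[0]} {infix(t[2])})"


def cambridge(t):
    """Prefix with parentheses around each application (Lisp style)."""
    return t if isinstance(t, str) else f"({t[0]} {cambridge(t[1])} {cambridge(t[2])})"


def eval_postfix(text, env):
    """Evaluate a postfix string with a stack; every operator is binary."""
    stack = []
    for token in text.split():
        if token in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(env[token])
    return stack.pop()


print(prefix(TREE))     # * + a b - c a
print(postfix(TREE))    # a b + c a - *
print(cambridge(TREE))  # (* (+ a b) (- c a))
print(eval_postfix(postfix(TREE), {"a": 2, "b": 3, "c": 7}))  # 25
```

The evaluator never needs parentheses or precedence rules, but it does rely on knowing that every operator takes exactly two arguments.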



Evaluating Expressions (but this is semantics!)

Infix:
  5 * 5 - 2 * 3 = 25 - 2 * 3 = 25 - 6 = 19

Postfix:
  5 5 * 2 3 * - = 25 2 3 * - = 25 6 - = 19
  Postfix expressions can be readily evaluated using a stack.

The semantics are clear from the tree format:

        -
      /   \
     *     *
    / \   / \
   5   5 2   3

Evaluation can be done via tree traversal, or tree rewriting.

Summary

Languages (formally).
Grammars.
Derivations.
Context Free Grammars.
Expression Grammars.
Order of evaluation.
Expression formats.

