CS143
Summer 2011
Handout 08
June 24th, 2011

Formal Grammars
Handout written by Maggie Johnson and Julie Zelenski.

What is a grammar?
A grammar is a powerful tool for describing and analyzing languages. It is a set of rules
by which valid sentences in a language are constructed. Here's a trivial example of
English grammar:
    sentence     -> <subject> <verb-phrase> <object>
    subject      -> This | Computers | I
    verb-phrase  -> <adverb> <verb> | <verb>
    adverb       -> never
    verb         -> is | run | am | tell
    object       -> the <noun> | a <noun> | <noun>
    noun         -> university | world | cheese | lies

Using the above rules or productions, we can derive simple sentences such as these:
This is a university.
Computers run the world.
I am the cheese.
I never tell lies.

Here is a leftmost derivation of the first sentence using these productions.

    sentence  -> <subject> <verb-phrase> <object>
              -> This <verb-phrase> <object>
              -> This <verb> <object>
              -> This is <object>
              -> This is a <noun>
              -> This is a university
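
The derivation above can be replayed mechanically. The following is a minimal Python sketch, assuming the grammar is stored as a dictionary from each nonterminal to its list of alternatives. The names GRAMMAR and leftmost_derivation are our own, chosen for illustration, and the caller supplies which alternative to use at each step.

```python
# Toy English grammar from the handout: nonterminal -> list of alternatives,
# each alternative being a list of symbols. Bracketed names are nonterminals.
GRAMMAR = {
    "<sentence>": [["<subject>", "<verb-phrase>", "<object>"]],
    "<subject>": [["This"], ["Computers"], ["I"]],
    "<verb-phrase>": [["<adverb>", "<verb>"], ["<verb>"]],
    "<adverb>": [["never"]],
    "<verb>": [["is"], ["run"], ["am"], ["tell"]],
    "<object>": [["the", "<noun>"], ["a", "<noun>"], ["<noun>"]],
    "<noun>": [["university"], ["world"], ["cheese"], ["lies"]],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in `choices`.

    choices[i] is the index of the alternative applied at step i.
    Returns every sentential form visited, beginning with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that is still a nonterminal.
        pos = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


steps = leftmost_derivation("<sentence>", [0, 0, 1, 0, 1, 0])
for step in steps:
    print(step)
```

The list of choices plays the role of the derivation history: changing one index (for example, picking a different noun) yields a different sentence from the same grammar.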

In addition to several reasonable sentences, we can also derive nonsense like "Computers
run cheese" and "This am a lies". These sentences don't make semantic sense, but they
are syntactically correct because they follow the sequence of subject, verb phrase, and
object. Formal grammars are a tool for syntax, not semantics. We worry about semantics
at a later point in the compiling process. In the syntax analysis phase, we verify
structure, not meaning.

Vocabulary
We need to review some definitions before we can proceed:

grammar
    a set of rules by which valid sentences in a language are constructed.

nonterminal
    a grammar symbol that can be replaced/expanded to a sequence of
    symbols.

terminal
    an actual word in a language; these are the symbols in a grammar that
    cannot be replaced by anything else. "terminal" is supposed to conjure
    up the idea that it is a dead end: no further expansion is possible.

production
    a grammar rule that describes how to replace/exchange symbols. The
    general form of a production for a nonterminal is:

        X -> Y1 Y2 Y3 ... Yn

    The nonterminal X is declared equivalent to the concatenation of the
    symbols Y1 Y2 Y3 ... Yn. The production means that anywhere we
    encounter X, we may replace it by the string Y1 Y2 Y3 ... Yn. Eventually we
    will have a string containing nothing that can be expanded further, i.e., it
    will consist of only terminals. Such a string is called a sentence. In the
    context of programming languages, a sentence is a syntactically correct
    and complete program.

derivation
    a sequence of applications of the rules of a grammar that produces a
    finished string of terminals. A leftmost derivation is where we always
    substitute for the leftmost nonterminal as we apply the rules (we can
    similarly define a rightmost derivation). A derivation is also called a
    parse.

start symbol
    a grammar has a single nonterminal (the start symbol) from which all
    sentences derive:

        S -> X1 X2 X3 ... Xn

    All sentences are derived from S by successive replacement using the
    productions of the grammar.

null symbol ε
    it is sometimes useful to specify that a symbol can be replaced by
    nothing at all. To indicate this, we use the null symbol ε, e.g., A -> B | ε.

BNF
    a way of specifying programming languages using formal grammars
    and production rules with a particular form of notation (Backus-Naur
    form).

A few grammar exercises to try on your own (the alphabet in each case is {a, b}):
o Define a grammar for the language of strings with one or more a's followed by
  zero or more b's.
o Define a grammar for even-length palindromes.
o Define a grammar for strings where the number of a's is equal to the number of b's.
o Define a grammar where the number of a's is not equal to the number of b's. (Hint:
  think about it as two separate cases...)
(Can you write regular expressions for these languages? Why or why not?)

Parse Representation
In working with grammars, we can represent the application of the rules to derive a
sentence in two ways. The first is a derivation as shown earlier for "This is a university"
where the rules are applied step by step and we substitute for one nonterminal at a time.
Think of a derivation as a history of how the sentence was parsed because it not only
includes which productions were applied, but also the order they were applied (i.e.,
which nonterminal was chosen for expansion at each step). There can be many different
derivations for the same sentence (the leftmost, the rightmost, and so on).

A parse tree is the second method for representation. It diagrams how each symbol
derives from other symbols in a hierarchical manner. Here is a parse tree for "This is a
university":

    s
    +-- subject
    |   +-- This
    +-- v-p
    |   +-- verb
    |       +-- is
    +-- object
        +-- a
        +-- noun
            +-- university

Although the parse tree includes all of the productions that were applied, it does not
encode the order they were applied. For an unambiguous grammar (we'll define
ambiguity in a minute), there is exactly one parse tree for a particular sentence.
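
To make the contrast with a derivation concrete, here is a small Python sketch of the parse tree for "This is a university". The nested-tuple encoding and the helper names tree_yield and productions_used are assumptions made for illustration. The tree records which productions were used, but nothing in it says in what order they were applied.

```python
# A parse tree as nested tuples: (symbol, child, child, ...).
# Leaves (terminals) are plain strings.
TREE = ("s",
        ("subject", "This"),
        ("v-p", ("verb", "is")),
        ("object", "a", ("noun", "university")))


def tree_yield(node):
    """Return the terminals at the leaves, read from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(tree_yield(child))
    return words


def productions_used(node):
    """Return the set of (lhs, rhs) productions recorded in the tree."""
    if isinstance(node, str):
        return set()
    rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
    used = {(node[0], rhs)}
    for child in node[1:]:
        used |= productions_used(child)
    return used


print(" ".join(tree_yield(TREE)))
print(sorted(productions_used(TREE)))
```

Because productions_used returns a set, the leftmost and rightmost derivations of this sentence both map to the same result, which is the sense in which the tree discards ordering.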

More Definitions
Here are some other definitions we will need, described in reference to this example
grammar:

    S -> AB
    A -> Ax | y
    B -> z

alphabet
    The alphabet is {S, A, B, x, y, z}. It is divided into two disjoint sets. The terminal
    alphabet consists of terminals, which appear in the sentences of the language:
    {x, y, z}. The remaining symbols are the nonterminal alphabet; these are the
    symbols that appear on the left side of productions and can be replaced during
    the course of a derivation: {S, A, B}. Formally, we use V for the alphabet, T for
    the terminal alphabet and N for the nonterminal alphabet, giving us: V = T ∪ N,
    and T ∩ N = ∅.

    The convention used in our lecture notes is a sans-serif font for grammar
    elements, lowercase for terminals, uppercase for nonterminals, and underlined
    lowercase (e.g., u, v) to denote arbitrary strings of terminal and nonterminal
    symbols (possibly null). In some textbooks, Greek letters are used for arbitrary
    strings of terminal and nonterminal symbols (e.g., α, β).

context-free grammar
    To define a language, we need a set of productions, of the general form: u -> v. In
    a context-free grammar, u is a single nonterminal and v is an arbitrary string of
    terminal and nonterminal symbols. When parsing, we can replace u by v
    wherever it occurs. We shall refer to this set of productions symbolically as P.

formal grammar
    We formally define a grammar as a 4-tuple {S, P, N, T}. S is the start symbol (with
    S ∈ N), P is the set of productions, and N and T are the nonterminal and terminal
    alphabets. A sentence is a string of symbols in T derived from S using one or
    more applications of productions in P. A string of symbols derived from S but
    possibly including nonterminals is called a sentential form or a working string.

    A production u -> v is used to replace an occurrence of u by v. Formally, if we
    apply a production p ∈ P to a string of symbols w in V to yield a new string of
    symbols z in V, we say that z derives from w using p, written as follows: w =>p z.
    We also use:

        w => z      z derives from w (production unspecified)
        w =>* z     z derives from w using zero or more productions
        w =>+ z     z derives from w using one or more productions

equivalence
    The language L(G) defined by grammar G is the set of sentences derivable using
    G. Two grammars G and G' are said to be equivalent if the languages they
    generate, L(G) and L(G'), are the same.
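
As an illustration of these definitions, the sketch below writes the example grammar as a 4-tuple in Python and enumerates the short sentences of L(G) by searching over sentential forms. The function name sentences and the length bound are our own assumptions. The pruning step relies on the fact that no production in this particular grammar shortens a sentential form, so it would need rethinking for a grammar with null productions.

```python
from collections import deque

# The example grammar as a 4-tuple (S, P, N, T). Every symbol is one character,
# so a sentential form can be stored as a plain string.
S = "S"
P = {"S": ["AB"], "A": ["Ax", "y"], "B": ["z"]}
N = {"S", "A", "B"}
T = {"x", "y", "z"}


def sentences(max_len):
    """Collect every sentence of at most max_len symbols derivable from S."""
    found = set()
    seen = {S}
    queue = deque([S])
    while queue:
        form = queue.popleft()
        # Leftmost nonterminal in the current sentential form, if any.
        pos = next((i for i, sym in enumerate(form) if sym in N), None)
        if pos is None:
            found.add(form)  # only terminals remain, so this is a sentence
            continue
        for rhs in P[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # Forms never shrink in this grammar, so long ones can be dropped.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=lambda s: (len(s), s))


print(sentences(4))
```

Comparing the output of such an enumeration for two grammars, up to some bound, is a cheap sanity check for equivalence, though it can only ever refute equivalence, not prove it.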

Grammar Hierarchy
We owe a lot of our understanding of grammars to the work of the American linguist
Noam Chomsky (yes, the Noam Chomsky known for his politics). There are four
categories of formal grammars in the Chomsky Hierarchy; they span from Type 0, the
most general, to Type 3, the most restrictive. More restrictions on the grammar make it
easier to describe and efficiently parse, but reduce the expressive power.

Type 0: free or unrestricted grammars
    These are the most general. Productions are of the form u -> v where both u
    and v are arbitrary strings of symbols in V, with u non-null. There are no
    restrictions on what appears on the left or right-hand side other than the left-
    hand side must be non-empty.

Type 1: context-sensitive grammars
    Productions are of the form uXw -> uvw where u, v and w are arbitrary strings
    of symbols in V, with v non-null, and X a single nonterminal. In other words, X
    may be replaced by v but only when it is surrounded by u and w (i.e., in a
    particular context).

Type 2: context-free grammars
    Productions are of the form X -> v where v is an arbitrary string of symbols in
    V, and X is a single nonterminal. Wherever you find X, you can replace it with v
    (regardless of context).

Type 3: regular grammars
    Productions are of the form X -> a, X -> aY, or X -> ε where X and Y are
    nonterminals and a is a terminal. That is, the left-hand side must be a single
    nonterminal and the right-hand side can be either empty, a single terminal by
    itself, or a single terminal followed by a single nonterminal. These grammars are
    the most limited in terms of expressive power.

Every type 3 grammar is a type 2 grammar, and every type 2 is a type 1 and so on. Type
3 grammars are particularly easy to parse because of the lack of recursive constructs.
Efficient parsers exist for many classes of Type 2 grammars. Although Type 1 and Type 0
grammars are more powerful than Type 2 and 3, they are far less useful since we cannot
create efficient parsers for them. In designing programming languages using formal
grammars, we will use Type 2 or context-free grammars, often just abbreviated as CFG.
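
The shape restrictions for Type 2 and Type 3 are simple enough to check mechanically. The following Python sketch assumes each production is a pair (lhs, rhs) of tuples of symbols. The function names are hypothetical, and the sketch makes no attempt to tell Type 1 from Type 0.

```python
def is_context_free(productions, nonterminals):
    """Type 2: every left-hand side is exactly one nonterminal."""
    return all(len(lhs) == 1 and lhs[0] in nonterminals
               for lhs, _ in productions)


def is_regular(productions, nonterminals):
    """Type 3 as defined above: X -> a, X -> aY, or X -> (empty)."""
    if not is_context_free(productions, nonterminals):
        return False
    for _, rhs in productions:
        if len(rhs) == 0:
            continue  # X -> empty
        if rhs[0] in nonterminals:
            return False  # the right side must start with a terminal
        if len(rhs) == 1:
            continue  # X -> a
        if len(rhs) == 2 and rhs[1] in nonterminals:
            continue  # X -> aY
        return False
    return True


def classify(productions, nonterminals):
    """Return 3, 2, or None (Type 1 or Type 0, not distinguished here)."""
    if is_regular(productions, nonterminals):
        return 3
    if is_context_free(productions, nonterminals):
        return 2
    return None


# A -> aA | b | (empty)
REGULAR = [(("A",), ("a", "A")), (("A",), ("b",)), (("A",), ())]
# E -> E op E | int, with op -> +
EXPRESSION = [(("E",), ("E", "op", "E")), (("E",), ("int",)), (("op",), ("+",))]
# aX -> ab: the left-hand side is more than a single nonterminal
CONTEXTUAL = [(("a", "X"), ("a", "b"))]

print(classify(REGULAR, {"A"}))
print(classify(EXPRESSION, {"E", "op"}))
print(classify(CONTEXTUAL, {"X"}))
```

Note that classify reports the most restrictive type whose shape the productions satisfy, which reflects the containment described above: anything that passes the Type 3 check would also pass the Type 2 check.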

Issues in parsing context-free grammars
There are several efficient approaches to parsing most Type 2 grammars and we will talk
through them over the next few lectures. However, there are some issues that can
interfere with parsing that we must take into consideration when designing the
grammar. Let's take a look at three of them: ambiguity, recursive rules, and left
factoring.

Ambiguity
If a grammar permits more than one parse tree for some sentences, it is said to be
ambiguous. For example, consider the following classic arithmetic expression grammar:

    E  -> E op E | ( E ) | int
    op -> + | - | * | /

This grammar denotes expressions that consist of integers joined by binary operators
and possibly including parentheses. As defined above, this grammar is ambiguous
because for certain sentences we can construct more than one parse tree. For example,
consider the expression 10 - 2 * 5. We parse by first applying the production E -> E op E.
The parse tree on the left chooses to expand that first op to *, the one on the right to -. We
have two completely different parse trees. Which one is correct?
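
Left tree (first op expands to *):

    E
    +-- E
    |   +-- E
    |   |   +-- int
    |   |       +-- 10
    |   +-- op
    |   |   +-- -
    |   +-- E
    |       +-- int
    |           +-- 2
    +-- op
    |   +-- *
    +-- E
        +-- int
            +-- 5

Right tree (first op expands to -):

    E
    +-- E
    |   +-- int
    |       +-- 10
    +-- op
    |   +-- -
    +-- E
        +-- E
        |   +-- int
        |       +-- 2
        +-- op
        |   +-- *
        +-- E
            +-- int
                +-- 5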

Both trees are legal in the grammar as stated and thus either interpretation is valid.
Although natural languages can tolerate some kind of ambiguity (e.g., puns, plays on
words, etc.), it is not acceptable in computer languages. We don't want the compiler just
haphazardly deciding which way to interpret our expressions! Given our expectations
from algebra concerning precedence, only one of the trees seems right. The right-hand
tree fits our expectation that * "binds tighter": its result is computed first and then
integrated into the outer expression, which has a lower-precedence operator.
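
The two readings can be enumerated by brute force. The sketch below is a simplified Python illustration, assuming a flat token list with no parentheses. It tries every operator as the top-level op of E -> E op E and evaluates each resulting tree. The names parse_trees and evaluate are ours.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """All parse trees of E -> E op E | int for a flat list of tokens.

    Integers are leaves; an interior node is a tuple (left, op, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions; try each one as the top-level op.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Compute the value a tree denotes, bottom-up."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


trees = parse_trees([10, "-", 2, "*", 5])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to different numbers (0 and 40), which is why a compiler cannot be left to pick one arbitrarily.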
It's fairly easy for a grammar to become ambiguous if you are not careful in its
construction. Unfortunately, there is no magical technique that can be used to resolve all
varieties of ambiguity. It is an undecidable problem to determine whether an arbitrary
grammar is ambiguous, much less to attempt to mechanically remove all ambiguity.
However, that doesn't mean in practice that we cannot detect ambiguity or do something
about it. For programming language grammars, we usually take pains to construct an
unambiguous grammar or introduce additional disambiguating rules to throw away the
undesirable parse trees, leaving only one for each sentence.

Using the above ambiguous expression grammar, one technique would leave the
grammar as is, but add disambiguating rules into the parser implementation. We could
code into the parser knowledge of precedence and associativity to break the tie and force
the parser to build the tree on the right rather than the left. The advantage of this is that
the grammar remains simple. But as a downside, the syntactic structure of the language
is no longer given by the grammar alone.

Another approach is to change the grammar to only allow the one tree that correctly
reflects our intention and eliminate the others. For the expression grammar, we can
separate expressions into multiplicative and additive subgroups and force them to be
expanded in the desired order.

    E    -> E t_op E | T
    t_op -> + | -
    T    -> T f_op T | F
    f_op -> * | /
    F    -> (E) | int

Terms are the operands of addition and subtraction, and factors are the operands of
multiplication and division. Since the base case for expression is a term, addition and
subtraction will appear higher in the parse tree, and thus receive lower precedence.

After verifying that the above rewritten grammar has only one parse tree for the earlier
ambiguous expression, you might think we were home free, but now consider the
expression 10 - 2 - 5. The recursion on both sides of the binary operator allows either
side to match repetitions. The arithmetic operators usually associate to the left, so
replacing the right-hand operand with the base case forces the repetitive matches onto the
left side. The final result is:

    E    -> E t_op T | T
    t_op -> + | -
    T    -> T f_op F | F
    f_op -> * | /
    F    -> (E) | int

Whew! The obvious disadvantage of changing the grammar to remove ambiguity is that
it may complicate and obscure the original grammar definitions. There is no mechanical
means to change any ambiguous grammar into an unambiguous one (undecidable,
remember?). However, most programming languages have only limited issues with
ambiguity that can be resolved using ad hoc techniques.
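
To see the precedence and associativity that the final expression grammar encodes, here is a hand-written Python sketch that builds the tree it assigns to an expression. It is only one possible realization: each left-recursive rule (E -> E t_op T, T -> T f_op F) is implemented as a loop that keeps folding operands into the left side, which is a substitute technique rather than a direct transcription of the productions. The tokenizer and all function names are assumptions for illustration.

```python
import re


def tokenize(text):
    """Split input into integer, operator, and parenthesis tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


def parse_expression(tokens):
    """Return the tree for a complete expression, as nested (left, op, right)."""
    tree, pos = _expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected token: " + tokens[pos])
    return tree


def _expr(tokens, pos):
    left, pos = _term(tokens, pos)
    # Each iteration corresponds to one application of E -> E t_op T.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        right, pos = _term(tokens, pos + 1)
        left = (left, op, right)
    return left, pos


def _term(tokens, pos):
    left, pos = _factor(tokens, pos)
    # Each iteration corresponds to one application of T -> T f_op F.
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = _factor(tokens, pos + 1)
        left = (left, op, right)
    return left, pos


def _factor(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = _expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError("unexpected token: " + tok)


print(parse_expression(tokenize("10 - 2 - 5")))
print(parse_expression(tokenize("10 - 2 * 5")))
```

In the first result the subtraction nests on the left (left associativity), and in the second the multiplication sits below the subtraction (higher precedence), matching the single tree the grammar allows for each input.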

Recursive productions
Productions are often defined in terms of themselves. For example, a list of variables in a
programming language grammar could be specified by this production:

    variable_list -> variable | variable_list , variable

Such productions are said to be recursive. If the recursive nonterminal is at the left of the
right side of the production, e.g. A -> u | Av, we call the production left-recursive.
Similarly, we can define a right-recursive production: A -> u | vA. Some parsing
techniques have trouble with one or the other variants of recursive productions and so
sometimes we have to massage the grammar into a different but equivalent form. Left-
recursive productions can be especially troublesome in the top-down parsers (and we'll
see why a bit later). Handily, there is a simple technique for rewriting the grammar to
move the recursion to the other side. For example, consider this left-recursive rule:

    X -> Xa | Xb | AB | C | DEF

To convert the rule, we introduce a new nonterminal X' that we append to the end of all
non-left-recursive productions for X. The expansion for the new nonterminal is basically
the reverse of the original left-recursive rule. The rewritten productions are:

    X  -> ABX' | CX' | DEFX'
    X' -> aX' | bX' | ε

It appears we just exchanged the left-recursive rules for an equivalent right-recursive
version. This might seem pointless, but some parsing algorithms prefer or even require
only left or right recursion.
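
The rewriting of the left-recursive rule for X can be sketched as a small Python function. It handles only immediate left recursion in a single nonterminal. It also assumes that the primed name is not already in use and that no alternative consists of the nonterminal alone. The function name and the list-of-symbols encoding (with an empty list standing for the null symbol) are illustrative choices.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A v1 | A v2 | u1 | u2 as
           A  -> u1 A' | u2 A'
           A' -> v1 A' | v2 A' | (empty)

    Each alternative is a list of symbols; [] stands for the null symbol.
    """
    # Tails v_i of the left-recursive alternatives A v_i.
    recursive = [alt[1:] for alt in alternatives
                 if alt and alt[0] == nonterminal]
    # Alternatives u_i that do not start with A.
    others = [alt for alt in alternatives
              if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_name = nonterminal + "'"  # assumed not to clash with existing names
    return {
        nonterminal: [alt + [new_name] for alt in others],
        new_name: [tail + [new_name] for tail in recursive] + [[]],
    }


rules = eliminate_left_recursion(
    "X", [["X", "a"], ["X", "b"], ["A", "B"], ["C"], ["D", "E", "F"]])
for lhs, alts in rules.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "(empty)" for alt in alts))
```

Applied to the rule for X, this produces the same two rewritten productions shown in the text.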

Left factoring
The parser usually reads tokens from left to right and it is convenient if, upon reading a
token, it can make an immediate decision about which production from the grammar to
expand. However, this can be trouble if there are productions that have common first
symbol(s) on the right side of the productions. Here is an example we often see in
programming language grammars:

    Stmt -> if Cond then Stmt else Stmt | if Cond then Stmt | Other | ....

The common prefix is if Cond then Stmt. This causes problems because when a parser
encounters an if, it does not know which production to use. A useful technique called
left factoring allows us to restructure the grammar to avoid this situation. We rewrite the
productions to defer the decision about which of the options to choose until we have
seen enough of the input to make the appropriate choice. We factor out the common
part of the two options into a shared rule that both will use and then add a new rule that
picks up where the tokens diverge.

    Stmt    -> if Cond then Stmt OptElse | Other | ...
    OptElse -> else Stmt | ε

In the rewritten grammar, upon reading an if we expand the first production and wait
until if Cond then Stmt has been seen to decide whether to expand OptElse to else or ε.
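
A single left-factoring step can be sketched in Python as follows. The sketch factors the longest common prefix out of the first group of alternatives that share a leading symbol. The caller supplies the name of the new nonterminal, and repeated application would be needed for grammars with several overlapping groups. The function names are our own.

```python
def common_prefix(alternatives):
    """Longest sequence of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nonterminal, alternatives, new_name):
    """Perform one left-factoring step on the productions of `nonterminal`.

    Each alternative is a list of symbols; [] stands for the null symbol.
    """
    # Group the alternatives by their first symbol.
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    for first, group in groups.items():
        if first is not None and len(group) > 1:
            prefix = common_prefix(group)
            rest = [alt for alt in alternatives if alt not in group]
            return {
                nonterminal: [prefix + [new_name]] + rest,
                # Whatever followed the shared prefix moves to the new rule.
                new_name: [alt[len(prefix):] for alt in group],
            }
    return {nonterminal: alternatives}


rules = left_factor(
    "Stmt",
    [["if", "Cond", "then", "Stmt", "else", "Stmt"],
     ["if", "Cond", "then", "Stmt"],
     ["Other"]],
    "OptElse")
for lhs, alts in rules.items():
    print(lhs, "->", " | ".join(" ".join(alt) or "(empty)" for alt in alts))
```

The empty alternative for OptElse arises because the shorter if-statement is exactly the common prefix, so nothing remains after factoring.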

Hidden left factors and hidden left recursion
A grammar may not appear to have left recursion or left factors, yet still have issues that
will interfere with parsing. This may be because the issues are hidden and need to be
first exposed via substitution.

For example, consider this grammar:

    A -> da | acB
    B -> abB | daA | Af

A cursory examination of the grammar may not detect that the first and second
productions of B overlap with the third. We substitute the expansions for A into the
third production to expose this:

    A -> da | acB
    B -> abB | daA | daf | acBf

This exchanges the original third production of B for several new productions, one for
each of the productions for A. These directly show the overlap, and we can then left
factor:

    A -> da | acB
    B -> aM | daN
    M -> bB | cBf
    N -> A | f
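
The substitution step used above can be sketched in Python. The helper below replaces a leading nonterminal in one rule's alternatives by each of that nonterminal's own expansions. Its name and the dictionary encoding of the grammar are assumptions for illustration, and every symbol in this example is a single character.

```python
def substitute_leading(grammar, target, source):
    """Expand a leading `source` in the alternatives of `target`.

    grammar maps each nonterminal to a list of alternatives, and each
    alternative is a list of symbols. Returns the new alternatives for target.
    """
    result = []
    for alt in grammar[target]:
        if alt and alt[0] == source:
            # One new alternative per production of the substituted symbol.
            result.extend(expansion + alt[1:] for expansion in grammar[source])
        else:
            result.append(alt)
    return result


grammar = {
    "A": [list("da"), list("acB")],
    "B": [list("abB"), list("daA"), list("Af")],
}
exposed = substitute_leading(grammar, "B", "A")
print(" | ".join("".join(alt) for alt in exposed))
```

After the substitution, alternatives beginning with the same terminal sit side by side, so the common prefixes a and da that were hidden behind A become visible to a left-factoring pass.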


Similarly, the following grammar does not appear to have any left recursion:

    S -> Tu | wx
    T -> Sq | vvS

Yet after substitution of S into T, the left recursion comes to light:

    S -> Tu | wx
    T -> Tuq | wxq | vvS

If we then eliminate left recursion, we get:

    S  -> Tu | wx
    T  -> wxqT' | vvST'
    T' -> uqT' | ε

Programming language case study: ALGOL
ALGOL is of interest to us because it was the first programming language to be defined
using a grammar. It grew out of an international effort in the late 1950s to create a
"universal programming language" that would run on all machines. At that time,
FORTRAN and COBOL were the prominent languages, with new languages sprouting
up all around. Programmers became increasingly concerned about portability of
programs and being able to communicate with one another on programming topics.

Consequently the ACM and GAMM (Gesellschaft für Angewandte Mathematik und
Mechanik) decided to come up with a single programming language that all could use
on their computers, and in whose terms programs could be communicated between the
users of all machines. Their first decision was not to use FORTRAN as their universal
language. This may seem surprising to us today, since it was the most commonly used
language back then. However, as Alan J. Perlis, one of the original committee members,
puts it:

    "Today, FORTRAN is the property of the computing world, but in 1957, it
    was an IBM creation and closely tied to IBM hardware. For these reasons,
    FORTRAN was unacceptable as a universal language."

ALGOL 58 was the first version of the language, followed up very soon after by
ALGOL 60, which is the version that had the most impact. As a language, it introduced
the following features:
o block structure and nested structures
o strong typing
o scoping
o procedures and functions
o call by value, call by reference
o side effects (is this good or bad?)
o recursion

It may seem surprising that recursion was not present in the original FORTRAN or
COBOL. You probably know that to implement recursion we need a runtime stack to
store the activation records as functions are called. In FORTRAN and COBOL,
activation records were created at compile time, not runtime. Thus, only one activation
record per subroutine was created. No stack was used. The parameters for the
subroutine were copied into the activation record and that data area was used for
subroutine processing.

The ALGOL report was the first use of BNF to describe a programming language.
Both John Backus and Peter Naur were on the ALGOL committees. They derived this
description technique from an earlier paper written by Backus. The technique was
adopted because they needed a machine-independent method of description. If one
looks at the early definitions of FORTRAN, one can see the links to the IBM hardware.
With ALGOL, the machine was not relevant. BNF had a huge impact on programming
language design and compiler construction. First, it stimulated a large number of
studies on the formal structure of programming languages, laying the groundwork for a
theoretical approach to language design. Second, a formal syntactic description could be
used to drive a compiler directly (as we shall see).

ALGOL had a tremendous impact on programming language design, compiler
construction, and language theory, but the language itself was a commercial failure.
Partly this was due to design decisions (overly complex features, no I/O) along with the
politics of the time (popularity of FORTRAN, lack of support from the all-powerful IBM,
resistance to BNF).

Bibliography
A. Aho, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools. Reading, MA:
Addison-Wesley, 1986.
J. Backus, The Syntax and Semantics of the Proposed International Algebraic Language of
the Zurich ACM-GAMM Conference, Proceedings of the International Conference on
Information Processing, 1959, pp. 125-132.
N. Chomsky, On Certain Formal Properties of Grammars, Information and Control, Vol. 2,
1959, pp. 137-167.
J.P. Bennett, Introduction to Compiling Techniques. Berkshire, England: McGraw-Hill, 1990.
D. Cohen, Introduction to Computer Theory. New York: Wiley, 1986.
J.C. Martin, Introduction to Languages and the Theory of Computation. New York, NY:
McGraw-Hill, 1991.
P. Naur, Programming Languages, Natural Languages, and Mathematics, Communications
of the ACM, Vol 18, No. 12, 1975, pp. 676-683.
J. Sammet, Programming Languages: History and Fundamentals. Englewood-Cliffs, NJ:
Prentice-Hall, 1969.
R.L. Wexelblat, History of Programming Languages. London: Academic Press, 1981.
