Authors: Charles N. Fischer and Richard J. LeBlanc, Jr.
The Benjamin/Cummings Publishing Company, Inc.

Grading
  Homework: 10%
  Project: 40% (two members per team)
    Lexical analysis: 10%
    Syntax analysis: 20%
    Code generation: 10%
  Midterm exam: 25%
  Final exam: 25%

Contents
  Introduction
  A Simple Compiler
  Scanning Theory and Practice
  Grammars and Parsing
  LL(1) Parsing
  LR Parsing
  Semantic Processing
  Symbol Tables
  Run-time Storage Organization

Contents (Contd.)
  Processing Declarations
  Processing Expressions and Data Structure References
  Translating Control Structure
  Translating Procedures and Functions
  Attribute Grammars and Multipass Translation
  Code Generation and Local Code Optimization
  Global Optimization
  Parsing in the Real World

Chapter 1 Introduction

Contents
  Overview and History
  What Do Compilers Do?
  The Structure of a Compiler
  The Syntax and Semantics of Programming Languages
  Compiler Design and Programming Language Design
  Compiler Classifications
  Influences on Computer Design

Overview and History
  Compilers are fundamental to modern computing.
  They act as translators, transforming human-oriented programming languages into computer-oriented machine languages.
  [Figure: Programming Language (Source) -> Compiler -> Machine Language (Target)]

Overview and History (Contd.)
  The first real compilers were the FORTRAN compilers of the late 1950s.
    They took 18 person-years to build.
  Today, we can build a simple compiler in a few months.
  Crafting an efficient and reliable compiler is still challenging.

Overview and History (Contd.)
  Compiler technology is broadly applicable and has been employed in rather unexpected areas:
    Text-formatting languages, like nroff and troff; preprocessor packages like eqn, tbl, and pic
    Silicon compilers for the creation of VLSI circuits
    Command languages of operating systems
    Query languages of database systems

What Do Compilers Do?
  Compilers may be distinguished according to the kind of target code they generate:
  Pure Machine Code
    Assumes there is no run-time OS support.
    Used for systems implementation or embedded systems.
    Runs on bare machines.
  Augmented Machine Code
    Targets hardware + OS + language-specific support routines, e.g., I/O, math functions, storage allocation, and data transfer.
  Virtual Machine Code
    Examples: JVM, P-code
    Portable
    Roughly 4 times slower
    The code is interpreted.

What Do Compilers Do? (Contd.)
  Another way that compilers differ from one another is in the format of the target machine code they generate:
  Assembly Language Format
    Simplifies compilation
    Uses symbolic labels rather than calculating addresses
    Pro: good for smaller machines
    Con: needs an additional pass (the assembler)

What Do Compilers Do? (Contd.)
  Relocatable Binary Format
    Similar to the output of an assembler
    A linking step is required before execution.
    Good for modular compilation, cross-language references, and libraries
  Memory-Image (Load-and-Go) Format
    Fast
    Very limited linking capabilities
    Good for debugging (frequent changes)
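
The Assembly Language Format slide notes that emitting symbolic labels spares the compiler from calculating addresses, at the cost of an additional pass. The Python sketch below illustrates what such a pass might do. The instruction names (DEC, JNZ, JMP, NOP, HALT), the label syntax, and the assumption that every instruction occupies exactly one address are all made up for this example and are not taken from the textbook.

```python
def resolve_labels(lines):
    """Two-pass sketch: pass 1 records label addresses, pass 2 substitutes them."""
    labels = {}
    instructions = []
    # Pass 1: give every instruction an address and record where each label points.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A label names the address of the next instruction.
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    # Pass 2: replace symbolic operands with the recorded numeric addresses.
    resolved = []
    for opcode, *operands in instructions:
        resolved.append((opcode, *[labels.get(arg, arg) for arg in operands]))
    return resolved


# Hypothetical assembly text; "done" is used before it is defined.
source = """
loop:
    DEC r1
    JNZ loop
    JMP done
    NOP
done:
    HALT
"""
program = resolve_labels(source.splitlines())
```

The forward reference to `done` is the reason a single pass is not enough in this sketch: the address of `done` is unknown when `JMP done` is first read.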
Another kind of language processor, called an interpreter, differs from a compiler in that it executes programs without explicitly performing a translation.
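
To make the contrast concrete, here is a minimal sketch of an interpreter in Python. It executes an encoded program directly and never produces target code. The nested-tuple program encoding and the node names (`num`, `var`, `add`, `mul`, `assign`, `seq`) are assumptions chosen for this illustration only.

```python
def interpret(node, env):
    """Execute a nested-tuple program directly; no target code is generated."""
    op = node[0]
    if op == "num":
        return node[1]
    if op == "var":
        return env[node[1]]
    if op == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    if op == "mul":
        return interpret(node[1], env) * interpret(node[2], env)
    if op == "assign":
        # Store the value in the environment and also return it.
        env[node[1]] = interpret(node[2], env)
        return env[node[1]]
    if op == "seq":
        # Run statements in order; the value of the last one is the result.
        result = None
        for stmt in node[1:]:
            result = interpret(stmt, env)
        return result
    raise ValueError(f"unknown node {op!r}")


# Hypothetical encoded program: x = 3; y = x + 4; x * y
program = ("seq",
           ("assign", "x", ("num", 3)),
           ("assign", "y", ("add", ("var", "x"), ("num", 4))),
           ("mul", ("var", "x"), ("var", "y")))
env = {}
result = interpret(program, env)  # 3 * 7 = 21
```

Every node is re-examined each time it is executed, which is one way to picture the repeated-examination cost that interpreters pay.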

Advantages and Disadvantages of an Interpreter
  See pages 6 and 7.

What Do Compilers Do? (Contd.)
  [Figure: Interpreter, with labels Source Program Encoding, Data, and Output]

What Do Compilers Do? (Contd.)
  Advantages
    Programs can be modified during execution
      Interactive debugging
      Not for every language, e.g., Basic, Pascal
    Dynamically typed languages
      Variable types may change at run time, e.g., LISP.
      Difficult to compile
    Better diagnostics
      Source code is available.
    Machine independence
      However, the interpreter itself must be portable.

What Do Compilers Do? (Contd.)
  Disadvantages
    Slower execution due to repeated examination
      Dynamic languages (LISP): 100:1 slowdown
      Static languages (BASIC): 10:1 slowdown
    Substantial space overhead

The Structure of a Compiler
  Modern compilers are syntax-directed.
    Compilation is driven by the syntactic structure of programs; i.e., actions are associated with the structures.
  Any compiler must perform two major tasks:
    Analysis of the source program
    Synthesis of a machine-language program

The Structure of a Compiler (Contd.)
  [Figure: The structure of a syntax-directed compiler. Source Program (Character Stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target Machine Code. Symbol and Attribute Tables are used by all phases of the compiler.]

The Structure of a Compiler (Contd.)
  Scanner
    The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
    The tokens are encoded and then fed to the parser for syntactic analysis.
    For details, see the bottom of page 8.
    Scanner generators (lex or scangen)
      Regular expressions for tokens -> finite automata as programs

The Structure of a Compiler (Contd.)
  Parser
    Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
    While parsing, the parser verifies correct syntax, and if a syntax error is found, it issues a suitable diagnostic.
    As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree.
    Parser generators (yacc or llgen)
      Grammar -> parser

The Structure of a Compiler (Contd.)
  Semantic Routines
    Perform two functions:
      Check the static semantics of each construct
      Do the actual translation for generating IR
    The heart of a compiler
  Optimizer
    The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
    This phase can be very complex and slow.
    Peephole optimization

The Structure of a Compiler (Contd.)
  One-pass compiler
    Used when no optimization is required
    Merges code generation with the semantic routines and eliminates the use of an IR
  Retargetable compiler
    Many machine description files, e.g., gcc
    Matches IR against target machine patterns
  Compiler writing tools
    Compiler generators or compiler-compilers
    E.g., scanner and parser generators such as Lex and Yacc

Compiler Design and Programming Language Design
  An interesting aspect is how programming language design and compiler design influence one another.
  Programming languages that are easy to compile have many advantages.
    See the 2nd paragraph of page 16.

Compiler Design and Programming Language Design (Contd.)
  Languages such as Snobol and APL are usually considered noncompilable.
  What attributes must be found in a programming language to allow compilation?
    Can the scope and binding of each identifier reference be determined before execution begins?
    Can the type of each object be determined before execution begins?
    Can existing program text be changed or added to during execution?

Compiler Classifications
  Diagnostic compilers
    Report and repair compile-time errors.
    Add run-time checks, e.g., array subscripts.
    Should be used in the real world.
    Contrast with production compilers.
  Optimizing compilers
  Retargetable compilers
    Localize machine dependence.
    Difficult to implement
    Less efficient object code
  Integrated programming environments
    Integrated edit-compile-debug (E-C-D) cycle
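
The scanner, parser, and semantic-routine phases described under The Structure of a Compiler can be sketched in a few lines of Python. This is an illustration only, not the textbook's implementation: the token names (NUM, OP, EOF), the small arithmetic grammar, the hand-written recursive-descent style, and the stack-style IR opcodes (PUSH, ADD, SUB, MUL, DIV) are all assumptions made for this example.

```python
import re

# Scanner: regular expressions describe the tokens (hypothetical token set).
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<OP>[-+*/()])|(?P<WS>\s+)")


def scan(text):
    """Group characters into (kind, value) tokens, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    """Recursive-descent parser whose semantic actions emit stack-style IR.

    Grammar assumed for this sketch:
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NUM | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []  # IR produced by the semantic actions

    def peek(self):
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.advance()[1]
            self.term()
            self.code.append(("ADD",) if op == "+" else ("SUB",))

    def term(self):
        self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.advance()[1]
            self.factor()
            self.code.append(("MUL",) if op == "*" else ("DIV",))

    def factor(self):
        kind, value = self.advance()
        if kind == "NUM":
            self.code.append(("PUSH", int(value)))
        elif (kind, value) == ("OP", "("):
            self.expr()
            if self.advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
        else:
            raise SyntaxError(f"unexpected token {value!r}")


def compile_expr(text):
    """Run scanner and parser; return the IR, or raise on a syntax error."""
    parser = Parser(scan(text))
    parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {parser.peek()[1]!r}")
    return parser.code


ir = compile_expr("1 + 2 * 3")
# ir == [("PUSH", 1), ("PUSH", 2), ("PUSH", 3), ("MUL",), ("ADD",)]
```

Here the parser calls the semantic actions directly as each production is recognized, which corresponds to the one-pass, syntax-directed style; building a syntax tree first would be the alternative the Parser slide mentions.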