
Compiler

A compiler is a program that reads a program written in one language (the source language) and translates it into an
equivalent program in another language (the target language).

The source language is typically a high-level language.


The target language is typically the machine code of a particular machine.

https://www.geeksforgeeks.org/introduction-compiler-design/

Major Parts of Compiler


 Analysis Phase
 Synthesis Phase

ANALYSIS PHASE
In this phase the source program is broken up into its constituent pieces and an intermediate representation of
the source program is created.

In other words, it creates the information (such as the symbol table) on which the synthesis phase is based.

There are three parts in the Analysis Phase:

1. Lexical Analyzer

2. Syntax Analyzer

3. Semantic Analyzer
Lexical Analyzer
Lexical Analyzer reads the source program character by character and returns the tokens of the source
program. A (Deterministic) Finite State Automaton can be used in the implementation of a lexical analyzer.
The lexical analyzer reads the stream of characters making up the source program and groups the characters
into meaningful sequences called lexemes.

For each lexeme, the lexical analyzer produces as output a token of the form

<Token-name, attribute-value>
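As an illustration (the statement and token names here are hypothetical, in the style of common textbook examples), the lexemes of a simple assignment might be turned into the following token stream:

```python
# Hypothetical token stream for the statement: position = initial + rate * 60
# Each token is a (token-name, attribute-value) pair; for identifiers the
# attribute would point at a symbol-table entry, modeled here by the lexeme.
tokens = [
    ("id", "position"),
    ("assign_op", "="),
    ("id", "initial"),
    ("add_op", "+"),
    ("id", "rate"),
    ("mul_op", "*"),
    ("number", 60),
]
for name, attr in tokens:
    print(f"<{name}, {attr}>")
```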

Features of Lexical Analyzer


• Scan input
• Remove whitespace and newlines
• Identify tokens
• Create the symbol table
• Insert tokens into the symbol table
• Generate errors
• Send tokens to the parser

https://engineersview.wordpress.com/2013/11/06/lexical-analysis
A token describes a pattern of characters having the same meaning in the source program, such as identifiers,
operators, keywords, numbers, and delimiters. The lexical analyzer puts information about identifiers into the
symbol table.

http://www2.dcs.hull.ac.uk/people/bct/08348/08208-9697
 Specifications of Tokens in Compiler Design
Alphabets: An alphabet is any finite set of symbols. {0,1} is the binary alphabet, {0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}
is the hexadecimal alphabet, and {a-z, A-Z} is the set of English-language letters.

Strings: A string is a finite sequence of symbols drawn from some alphabet. The total number of symbol
occurrences is the length of the string.
Language: A set of strings over some finite alphabet is known as a language.
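These definitions can be illustrated with a short sketch (the binary alphabet and the example language are illustrative choices):

```python
# Alphabet: a finite set of symbols (here, the binary alphabet).
alphabet = {"0", "1"}

# A string is a finite sequence of symbols from the alphabet;
# its length is the number of symbol occurrences.
s = "10110"
assert all(ch in alphabet for ch in s)
print(len(s))           # -> 5

# A language is a set of strings over the alphabet,
# e.g. all binary strings of length 2.
language = {a + b for a in alphabet for b in alphabet}
print(sorted(language))  # -> ['00', '01', '10', '11']
```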

 Design of Lexical Analyzers:


There are two ways of designing a lexical analyzer:
1. Hand coding
2. Lexical analyzer generator

1. Hand Coding:

The programmer has to perform the following tasks:
1. Specify the tokens by writing regular expressions.
2. Construct a finite automaton equivalent to the regular expressions.
3. Recognize the tokens with the constructed finite automaton.

With a lexical analyzer generator, the programmer does only step 1; the remaining two steps are performed by the
generator automatically.

2. Lexical Analyzer Generator

A lexical analyzer generator is a tool, such as Lex, which allows one to specify a lexical analyzer by giving
regular expressions that describe the patterns for tokens.
The input to the Lex tool is written in the Lex language.
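The following is not Lex itself, but a minimal Python sketch of the kind of rule-driven scanner a generator like Lex produces from a specification; the rule set and token names are assumptions for illustration:

```python
import re

# A Lex-style specification: an ordered list of (pattern, token-name) rules.
# A rule with token-name None discards the matched text (e.g. whitespace).
RULES = [
    (r"[ \t\n]+",     None),
    (r"\d+",          "NUMBER"),
    (r"[A-Za-z_]\w*", "ID"),
    (r"[+\-*/]",      "OP"),
    (r"=",            "ASSIGN"),
]
SCANNER = [(re.compile(p), name) for p, name in RULES]

def tokenize(text):
    pos, out = 0, []
    while pos < len(text):
        for pattern, name in SCANNER:
            m = pattern.match(text, pos)
            if m:
                if name is not None:
                    out.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"illegal character at {pos}: {text[pos]!r}")
    return out

print(tokenize("x = y + 42"))
```

Real Lex also applies the longest-match rule and attaches user actions to each pattern; this sketch keeps only the rule-table idea.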

 Implementation of lexical analyzer:


Since the lexical structure of more or less every programming language can be specified by a regular language, a
common way to implement a lexical analyzer is to

1. Specify regular expressions for all of the kinds of tokens in the language. The disjunction of all
of the regular expressions thus describes any possible token in the language.
2. Convert the overall regular expression specifying all possible tokens into a deterministic finite
automaton (DFA).
3. Translate the DFA into a program that simulates the DFA. This program is the lexical analyzer.
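The three steps above can be sketched in Python, with the re module standing in for the DFA that a real compiler would construct and simulate; the token set is an illustrative assumption:

```python
import re

# Step 1: one regular expression per kind of token.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

# Steps 2-3: the disjunction of all token patterns describes any possible
# token; Python's re engine plays the role of the constructed automaton.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    tokens = []
    for m in MASTER.finditer(text):
        if m.lastgroup != "SKIP":          # discard whitespace
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(lex("rate * 60"))
```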
Syntax Analyzers:
A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program. A syntax
analyzer is also called a parser. A parse tree describes a syntactic structure.

http://electrofriends.com/projects/computer-programming/Syntax-analyzer/

The parser analyzes the source code (token stream) against the production rules to detect any errors in the
code. This way, the parser accomplishes two tasks: parsing the code while looking for errors, and generating a
parse tree as the output of the phase. Parsers are expected to parse the whole code even if some errors exist
in the program.

https://cs.stackexchange.com/questions/51295
Parsers use error recovering strategies.
 The syntax of a language is specified by a context free grammar (CFG).
 The rules in a CFG are mostly recursive.
 A syntax analyzer checks whether a given program satisfies the rules implied by a CFG or not.
 The syntax analyzer deals with recursive constructs of the language.
 The lexical analyzer simplifies the job of the syntax analyzer.
 The syntax analyzer works on the smallest meaningful units (tokens) in a source program to
recognize meaningful structures in our programming language.
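As a sketch of how a parser recognizes recursive structure in the token stream, here is a minimal recursive-descent parser for a hypothetical two-rule expression grammar (not any particular compiler's grammar):

```python
# Grammar (illustrative):
#   expr -> term (('+'|'-') term)*
#   term -> NUMBER | ID
# Input is a token stream as produced by a lexical analyzer;
# output is a nested-tuple parse tree.
def parse_expr(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def term():
        nonlocal pos
        kind, value = peek()
        if kind in ("NUMBER", "ID"):
            pos += 1
            return (kind, value)
        raise SyntaxError(f"expected NUMBER or ID, got {kind}")

    node = term()
    while peek()[0] == "OP" and peek()[1] in "+-":
        op = tokens[pos][1]
        pos += 1
        node = ("binop", op, node, term())   # left-associative tree
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return node

tree = parse_expr([("ID", "a"), ("OP", "+"), ("NUMBER", "1")])
print(tree)  # -> ('binop', '+', ('ID', 'a'), ('NUMBER', '1'))
```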

Role of DFA and NFA


 A DFA is an NFA with these restrictions: no state has an ε-transition, and each state has at most one
outgoing transition per input symbol. The lexical analysis process starts with a
definition of what it means to be a token in the language, given by regular expressions or grammars.
This is translated to an abstract computational model for recognizing tokens (a non-deterministic
finite state automaton), which is then translated to an implementable model for recognizing the
defined tokens (a deterministic finite state automaton), to which optimizations can be made (a
minimized DFA).
 Design of a Lexical Analyzer Generator: translate the regular expressions to an NFA, and either simulate
the NFA directly to recognize tokens or, optionally, translate the NFA to an efficient DFA and simulate
the DFA to recognize tokens.
 Nondeterministic Finite Automata: an NFA is a 5-tuple (S, Σ, δ, s0, F) where S is a finite set of
states, Σ is a finite set of symbols (the alphabet), δ is a mapping from S × Σ to sets of states, s0 ∈ S
is the start state, and F ⊆ S is the set of accepting (or final) states.
 In syntax analysis, parse trees are used to show the structure of the sentence, but they often contain
redundant information due to implicit definitions (e.g., an assignment always has an assignment
operator in it, so it can be left implicit), so syntax trees, which are compact representations, are used
instead. Trees are recursive structures, which complement CFGs nicely, as these are also recursive
(unlike regular expressions).
 There are many techniques for parsing algorithms (vs. FSA-centered lexical analysis), and the two
main classes of algorithm are top-down and bottom-up parsing. Context-free grammars can be
represented using Backus-Naur Form (BNF). BNF uses three classes of symbols: non-terminal
symbols (phrases) enclosed by brackets <>, terminal symbols (tokens) that stand for themselves,
and the metasymbol ::=, read as "is defined to be". For example: <expr> ::= <expr> + <term> | <term>.
 Transition Table: the mapping δ of an NFA can be represented in a transition table. For the NFA with
δ(0, a) = {0, 1}, δ(0, b) = {0}, δ(1, b) = {2}, and δ(2, b) = {3}:

State   Input a   Input b
0       {0, 1}    {0}
1       --        {2}
2       --        {3}
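This transition mapping corresponds to the well-known textbook NFA recognizing strings of a's and b's ending in abb; a direct simulation of the table can be sketched as:

```python
# delta maps (state, symbol) to a set of successor states; missing entries
# mean no transition. State 0 is the start state, state 3 is accepting.
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START, ACCEPT = 0, {3}

def accepts(word):
    current = {START}                      # set of states the NFA may be in
    for symbol in word:
        current = set().union(*(delta.get((s, symbol), set()) for s in current))
    return bool(current & ACCEPT)

print(accepts("aabb"))  # -> True
print(accepts("ab"))    # -> False
```

Tracking a set of current states is exactly the "simulate the NFA" strategy mentioned above; the subset construction turns these state sets into single DFA states.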
References

https://slideplayer.com/slide/7975656/

https://cs.stackexchange.com/questions/51295

http://electrofriends.com/projects/computer-programming/Syntax-analyzer/

https://www.slideshare.net/appasami/cs6660-compiler-design-notes?from_action=save
