
Language Processors

Instructor: JayaKrishna
Cabin: CSE/FC-36
Mob:9035669570
Mail: jayakrishnaa.r@gmail.com
jayakrishna.r@manipal.edu

By JayaKrishna, Dept. of CSE, MIT


What is a Language?

Language is a purely human and non-instinctive method
of communicating ideas, emotions, and desires
by means of a system of voluntarily produced symbols.

Do You Know How Many Languages There Are?

6909 known living languages

5.7 billion speakers


Why Are Languages Required?

To provide a means of communicating numerical
methods and other procedures between
people.

To provide a means for realising a stated process
on a variety of machines ...


Software Languages

8512 software languages

languages to engineer software

15.2 million developers worldwide



Study of Language

linguistics
• lexicology
• grammar
• morphology
• syntax
• phonology
• semantics

interdisciplinary
• applied linguistics
• computational linguistics
• historical linguistics
• neurolinguistics
• psycholinguistics
• sociolinguistics

computer science
• grammar
• semantics

Prerequisites
Thorough knowledge of
– Programming Languages
– Machine Architecture
– Formal Languages and Automata
– Algorithms
– Software Engineering



Why Language Processors?

Language processing activities arise from
the differences between the manner in
which a software designer describes
the ideas concerning the behavior of
software and the manner in which these
ideas are implemented in a computer
system.



Language Processors
• A language processor creates a virtual machine for which
the application developer can design programs.

• It appears to the programmer that he/she is working with
a machine that can understand languages like
PASCAL, C, Java, C#...

• A language processor makes it possible to write programs
in high-level languages without any knowledge of
machine code and machine architecture.



Language Processors
Who does Language Processing?

What is a Compiler?
A program that takes a source program in one language
and translates it into a target language.

[Figure: Human (Artificial Language) -> Compiler -> Machine (Machine Language)]



Language Processors
beyond Compilation
syntax
• scanners
• parsers
• pretty-printers
• syntax-directed editors

semantics
• type checker
• analysis tools
• optimisers
• refactoring tools
• renovation tools
• interpreters
• code generators
• debuggers



History about Compilers
• With the advent of the stored-program computer,
pioneered by John von Neumann in the late 1940s, it
became necessary to write sequences of codes, or
programs, that would cause these computers to
perform the desired computations.

• These programs were initially written in machine language.
For example, C7 06 0000 0002 represents the instruction
to move the number 2 to location 0000 (in hex) on the Intel
8x86 processors used in IBM PCs.



History about Compilers
• Writing code in machine language is extremely difficult,
so it was replaced by assembly language, in which
instructions and memory locations are given symbolic
forms.

MOV x,2 is equivalent to the previous machine instruction
(assuming the symbolic location x is address 0000).

• An assembler translates the symbolic codes and
memory locations of assembly language into the
corresponding numeric codes of machine language.



History about Compilers
• Assembly language is still not easy to write, and it
is difficult to read and understand.

• It is extremely dependent on the particular machine
for which it was written, so code written for one computer
must be completely rewritten for another machine.

• The next major step in programming technology was to write
the operations of a program in a concise form more nearly
resembling mathematical notation or natural
language.

History about Compilers

Noam Chomsky's study of the structure of natural
language led to a classification of
languages according to the complexity of their
grammars (the rules specifying their structure)
and the power of the algorithms needed to
recognize them.



Chomsky Hierarchy/Theory of
Languages
formal languages

vocabulary (alphabet) Σ
finite, nonempty set of elements (words, letters)

string over Σ
finite sequence of elements chosen from Σ: word,
sentence, utterance

formal language λ
set of strings over a vocabulary Σ
λ ⊆ Σ*

Chomsky Hierarchy/Theory of
Languages
formal grammar G = (N, Σ, P, S)
nonterminal symbols N
terminal symbols Σ
production rules P ⊆ (N ∪ Σ)* N (N ∪ Σ)* × (N ∪ Σ)*
start symbol S ∈ N
Grammar classes
type-0, unrestricted
type-1, context-sensitive: (aAc, abc)
type-2, context-free: P ⊆ N × (N ∪ Σ)*
type-3, regular: (A, x) or (A, xB)
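
The grammar classes above can be told apart by inspecting the shape of each production. The following is a minimal sketch, not a definitive implementation: the function name `classify`, the tuple encoding of productions, and the decision to lump type-1 and type-0 together are all assumptions made for illustration.

```python
def classify(nonterminals, productions):
    """Classify a grammar given as (lhs, rhs) pairs of symbol tuples.

    Illustrative only: type-1 and type-0 grammars are not told apart here.
    """
    def is_context_free(lhs):
        # Type-2 shape: the left-hand side is exactly one nonterminal.
        return len(lhs) == 1 and lhs[0] in nonterminals

    def is_regular(lhs, rhs):
        # Type-3 shapes from the slide: (A, x) or (A, xB).
        if not is_context_free(lhs):
            return False
        if len(rhs) == 1:
            return rhs[0] not in nonterminals
        if len(rhs) == 2:
            return rhs[0] not in nonterminals and rhs[1] in nonterminals
        return False

    if all(is_regular(lhs, rhs) for lhs, rhs in productions):
        return "type-3"
    if all(is_context_free(lhs) for lhs, _ in productions):
        return "type-2"
    return "type-1 or type-0"


# Hypothetical example grammars over the nonterminals S and A.
REGULAR = [(("S",), ("a", "S")), (("S",), ("b",))]
CONTEXT_FREE = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
CONTEXT_SENSITIVE = [(("a", "A", "c"), ("a", "b", "c"))]

print(classify({"S"}, REGULAR))
print(classify({"S"}, CONTEXT_FREE))
print(classify({"S", "A"}, CONTEXT_SENSITIVE))
```

The check is purely syntactic: it looks at the form of the rules, not at the language they generate, so a language that is regular may still be reported as type-2 if it happens to be written with context-free rules.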
Chomsky Hierarchy/Theory of
Languages
• Finite automata and regular expressions correspond to
Chomsky's type-3 (regular) grammars; context-free
grammars (CFGs) are type-2 grammars.

• The syntax of a language is specified by a CFG.

• The rules in a CFG are mostly recursive.

• A syntax analyzer checks whether a given program satisfies
the rules implied by a CFG or not.
– If it does, the syntax analyzer creates a parse tree for
the given program.

Traditional Compilers
architecture

[Figure: Artificial Language (human) -> Parse -> AST -> Generate -> Machine Language (machine); a Check step reports Errors]



Modern Compilers IDE
syntactic editor services
• syntax checking
• syntax highlighting
• outline view
• code folding
• bracket matching

semantic editor services
• error checking
• reference resolving
• hover help
• code completion
• refactoring



Modern Compilers
architecture

[Figure: Artificial Language (human) -> Parse -> AST -> Generate -> Machine Language (machine), with an analyse step]



Programs related to Compilers
Interpreters
• An interpreter is another common kind of language
processor.

• Instead of producing a target program as a translation,
an interpreter appears to directly execute the
operations specified in the source program on inputs
supplied by the user.

Example: Java (the compiler produces bytecode, which the
Java Virtual Machine then interprets)



Programs related to Compilers
Assemblers
An assembler is a translator for assembly language.

Linker
• Both compilers and assemblers often rely on a program
called a linker.
• A linker collects code separately compiled or assembled in
different object files into a file that is directly executable.
• It also connects an object program to the code for
standard library functions and to resources supplied by
the OS of the computer, such as memory allocators and
input and output devices.



Programs related to Compilers
Loaders
Code produced by a compiler, assembler, or linker may not
yet be completely fixed and ready to execute; instead, its
principal memory references are all made relative to an
undetermined starting location that can be anywhere in
memory.

Such code is known as relocatable, and a loader will
resolve all relocatable addresses relative to a given base, or
starting address.
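
Relocation as described above amounts to adding the chosen base address to every word that holds a relative address. The sketch below is only an illustration: the object layout (a flat list of words plus a list of offsets needing relocation) and the name `load` are hypothetical and do not correspond to any real object-file format.

```python
def load(relocatable_words, relocation_offsets, base_address):
    """Return a copy of the code with relative addresses made absolute.

    relocatable_words: words of code/data, addresses relative to 0 (assumed layout)
    relocation_offsets: positions of the words that hold addresses
    base_address: where the loader decided to place the program
    """
    image = list(relocatable_words)
    for offset in relocation_offsets:
        # Only words flagged as addresses are adjusted; other words are untouched.
        image[offset] += base_address
    return image


# Hypothetical program: words 1 and 3 are addresses relative to location 0.
program = [10, 3, 99, 0]
print(load(program, [1, 3], 1000))
```

Loading the same program at a different base only changes the flagged words, which is what lets relocatable code be placed anywhere in memory.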
Programs related to Compilers
Preprocessor
• It is a separate program that is called by the compiler
before actual translation begins.
• It can delete comments, include other files, and
perform macro substitutions.

Editors
• Compilers, editors, and other programs are often merged into
an IDE (Interactive Development Environment).
• Such editors are called structure-based and include
some of the operations of a compiler.



Programs related to Compilers
Debuggers
• A debugger is a program that can be used to determine
execution errors in a compiled program.
• It can halt execution at prespecified locations called
breakpoints, as well as provide information on what
functions have been called and the current values of
variables.

Profilers
• A profiler is a program that collects statistics on the behavior of
an object program during execution.
• Typical statistics are the number of times each procedure is
called and the percentage of execution time spent in each procedure.

Programs related to Compilers
Project Managers
• Large projects are developed by teams of software engineers,
and the modules each member writes need to be integrated.

• In such cases, a project manager program
coordinates tasks such as merging files and
maintaining a history of changes in versions.



The Translation Process
The Scanner
• It performs lexical analysis: it collects sequences of characters into
meaningful units called tokens.
• It may also perform other operations along with recognizing tokens:
– It enters identifiers into the symbol table and literals into the literal table.

Ex: a[index]=4+2 contains 12 nonblank characters but only 8 tokens:

a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment
4        number
+        plus sign
2        number
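
The scanner's job on this example can be sketched with regular expressions. This is a minimal illustration under stated assumptions, not the course's scanner: the token names, the `scan` function, and the use of plain lists for the symbol and literal tables are all hypothetical choices.

```python
import re

# Assumed token classes, just enough for the example a[index]=4+2.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("LEFT_BRACKET", r"\["),
    ("RIGHT_BRACKET", r"\]"),
    ("ASSIGNMENT", r"="),
    ("PLUS_SIGN", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table, literal_table) for the source string."""
    tokens, symbol_table, literal_table = [], [], []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # blanks separate tokens but are not tokens themselves
        if kind == "IDENTIFIER" and text not in symbol_table:
            symbol_table.append(text)
        if kind == "NUMBER" and text not in literal_table:
            literal_table.append(text)
        tokens.append((kind, text))
    return tokens, symbol_table, literal_table


tokens, symbols, literals = scan("a[index]=4+2")
for kind, text in tokens:
    print(text, kind)
```

Running it on `a[index]=4+2` yields the eight tokens listed above, with `a` and `index` entered in the symbol table and `4` and `2` in the literal table.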
The Translation Process
The Parser
It receives the source code in the form of tokens from the scanner
and performs syntax analysis, which determines the structure of
the program.

The results of syntax analysis are usually represented as a parse
tree or a syntax tree.



The Translation Process
The Parser
Parse tree for a[index]=4+2

expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2



The Translation Process
The Parser

Parsers tend to generate a syntax tree instead, which is a
condensation of the information contained in the
parse tree (also called an abstract syntax tree).

Abstract syntax tree for a[index]=4+2

assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2
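
One way to obtain such a syntax tree is a small recursive-descent parser. The sketch below is an assumption-laden illustration: the tiny grammar (assignment, `+`, subscripts, identifiers, numbers), the tuple encoding of tree nodes, and the names `Parser` and `parse` are all chosen for this example and are not taken from the slides.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")
NUMBER = re.compile(r"[0-9]+")


def tokenize(source):
    # Identifiers, numbers, or any single non-blank character.
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|[0-9]+|\S", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def primary(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if NUMBER.fullmatch(token):
            return ("number", int(token))
        if IDENT.fullmatch(token):
            node = ("identifier", token)
            if self.peek() == "[":
                self.expect("[")
                index = self.additive()
                self.expect("]")
                node = ("subscript-expression", node, index)
            return node
        raise SyntaxError(f"unexpected token {token!r}")

    def additive(self):
        node = self.primary()
        while self.peek() == "+":
            self.expect("+")
            node = ("additive-expression", node, self.primary())
        return node

    def assignment(self):
        node = self.additive()
        if self.peek() == "=":
            self.expect("=")
            node = ("assign-expression", node, self.additive())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node


def parse(source):
    return Parser(tokenize(source)).assignment()


print(parse("a[index]=4+2"))
```

The brackets and the `=` sign guide the parse but do not appear in the result, which is what makes the tree abstract rather than a full parse tree.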



The Translation Process
The Semantic Analyzer
• The semantics of a program are its meaning, as opposed
to its syntax, or structure.

• The semantics of a program determine its runtime
behavior.

• The task of the semantic analyzer is to check static
semantics: properties such as declarations
and types that can be checked prior to execution of the program.

• The extra pieces of information (such as data types)
computed by the semantic analyzer are
called attributes.

The Translation Process
The Semantic Analyzer

Annotated syntax tree for a[index]=4+2

assign-expression
  subscript-expression
    identifier a        array of integers
    identifier index    integer
  additive-expression
    number 4            integer
    number 2            integer
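
The type attributes in the annotated tree can be computed by a bottom-up walk. The following sketch assumes a tuple encoding of the tree and a symbol table mapping names to type strings; both, along with the function name `type_of`, are hypothetical choices for illustration only.

```python
# Assumed declarations: a is an array of integers, index is an integer.
SYMBOLS = {"a": "array of integers", "index": "integer"}

# The syntax tree for a[index]=4+2, encoded as nested tuples (an assumption).
AST = ("assign-expression",
       ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
       ("additive-expression", ("number", 4), ("number", 2)))


def type_of(node, symbols):
    """Return the type attribute of a node, raising TypeError on a static error."""
    kind = node[0]
    if kind == "number":
        return "integer"
    if kind == "identifier":
        name = node[1]
        if name not in symbols:
            raise TypeError(f"undeclared identifier {name!r}")
        return symbols[name]
    if kind == "subscript-expression":
        if type_of(node[1], symbols) != "array of integers":
            raise TypeError("subscripted value is not an array")
        if type_of(node[2], symbols) != "integer":
            raise TypeError("array index must be an integer")
        return "integer"
    if kind == "additive-expression":
        if type_of(node[1], symbols) != "integer" or type_of(node[2], symbols) != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign-expression":
        left = type_of(node[1], symbols)
        right = type_of(node[2], symbols)
        if left != right:
            raise TypeError(f"cannot assign {right} to {left}")
        return left
    raise ValueError(f"unknown node kind {kind!r}")


print(type_of(AST, SYMBOLS))
```

Both checks named on the slide appear here: the lookup in `symbols` is the declaration check, and the comparisons of type strings are the type check.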



The Translation Process
The source code optimizer
Compilers often include a number of code improvement, or
optimization, steps.
Ex:
The expression 4+2 can be precomputed by the compiler to the
result 6 (constant folding).

assign-expression
  subscript-expression
    identifier a        array of integers
    identifier index    integer
  number 6              integer
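
Constant folding on the tree can be sketched as a recursive rewrite. This is a minimal illustration that handles only addition of two numbers; the tuple encoding of nodes and the name `fold` are assumptions, not part of the slides.

```python
# The syntax tree for a[index]=4+2, encoded as nested tuples (an assumption).
AST = ("assign-expression",
       ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
       ("additive-expression", ("number", 4), ("number", 2)))


def fold(node):
    """Return a copy of the tree with constant additions precomputed."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node  # leaves have nothing to fold
    children = tuple(fold(child) for child in node[1:])
    if kind == "additive-expression" and all(child[0] == "number" for child in children):
        # Both operands are known at compile time, so compute the sum now.
        return ("number", children[0][1] + children[1][1])
    return (kind,) + children


print(fold(AST))
```

Because children are folded before their parent, nested constant sums such as (1+2)+3 also collapse to a single number.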



The Translation Process
The source code optimizer
Many optimizations can be performed on the tree, but it is
easier to optimize a linearized form of the tree that is
closer to assembly code.

A standard choice is three-address code.

Ex:
t=4+2
a[index]=t

Optimization can be done as

t=6
a[index]=t

or, after propagating the constant,

a[index]=6

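
Linearizing the tree into three-address code can be sketched as a walk that introduces a temporary for each intermediate result. The code below is an illustration under assumptions: the tuple encoding of the tree, the temporary names `t1`, `t2`, ... (the slide's `t` corresponds to `t1`), and the function name `to_three_address` are all hypothetical.

```python
def to_three_address(node):
    """Translate an assign-expression tree into a list of three-address lines."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def operand(n):
        kind = n[0]
        if kind == "number":
            return str(n[1])
        if kind == "identifier":
            return n[1]
        if kind == "additive-expression":
            left, right = operand(n[1]), operand(n[2])
            temp = new_temp()
            code.append(f"{temp} = {left} + {right}")
            return temp
        if kind == "subscript-expression":
            base, index = operand(n[1]), operand(n[2])
            temp = new_temp()
            code.append(f"{temp} = {base}[{index}]")  # load an array element
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    if node[0] != "assign-expression":
        raise ValueError("only assignments are handled in this sketch")
    value = operand(node[2])
    target = node[1]
    if target[0] == "subscript-expression":
        dest = f"{operand(target[1])}[{operand(target[2])}]"
    elif target[0] == "identifier":
        dest = target[1]
    else:
        raise ValueError("invalid assignment target")
    code.append(f"{dest} = {value}")
    return code


TARGET = ("subscript-expression", ("identifier", "a"), ("identifier", "index"))
print(to_three_address(("assign-expression", TARGET,
                        ("additive-expression", ("number", 4), ("number", 2)))))
print(to_three_address(("assign-expression", TARGET, ("number", 6))))
```

The second call shows the optimized case: once 4+2 has been folded to 6 in the tree, no temporary is needed and a single instruction remains.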
The Translation Process
The source code optimizer
Because the source code optimizer may use three-address code,
its output is referred to as intermediate code.

More broadly, intermediate code can be any internal representation
of the source code used by the compiler.

In general, intermediate code is also known as
intermediate representation, or IR.



The Translation Process
The code Generator
The code generator takes the intermediate code and generates code for
the target machine.

The properties of the target machine must be considered before
conversion.

The representation of data plays a major role, such as how many
bytes or words variables of integer and floating-point data types occupy in
memory.

Ex (assuming each integer occupies two bytes):
MOV R0,index  //value of index -> R0
MUL R0,2      //double value in R0
MOV R1,&a     //address of a -> R1
ADD R1,R0     //add R0 to R1
MOV *R1,6     //constant 6 -> address in R1
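
The role of data representation shows up as the scale factor applied to the index. The sketch below emits the slide's hypothetical assembly for an indexed store; the function name `generate_indexed_store`, the fixed choice of registers R0 and R1, and the default element size of 2 bytes are assumptions made for illustration.

```python
def generate_indexed_store(array, index, constant, element_size=2):
    """Emit instructions for array[index] = constant on the slide's hypothetical machine.

    element_size is the number of bytes one array element occupies (assumed 2).
    """
    return [
        f"MOV R0,{index}",         # value of index -> R0
        f"MUL R0,{element_size}",  # scale the index by the element size
        f"MOV R1,&{array}",        # address of the array -> R1
        "ADD R1,R0",               # R1 now holds the element's address
        f"MOV *R1,{constant}",     # store the constant at that address
    ]


for instruction in generate_indexed_store("a", "index", 6):
    print(instruction)
```

On a target where integers occupy four bytes, only the scale factor changes: `generate_indexed_store("a", "index", 6, element_size=4)` emits `MUL R0,4` instead.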
The Translation Process
The target code optimizer
The compiler attempts to improve the target code
generated by the code generator.

These improvements include choosing addressing modes
to improve performance, replacing slow instructions with
faster ones, and eliminating redundant or unnecessary
operations.

Ex:
MOV R0,index  //value of index -> R0
SHL R0        //double value in R0 (shift left)
MOV &a[R0],6  //constant 6 -> address a + R0
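
Both improvements in this example can be sketched as a peephole pass over the instruction list. This is an illustration only: the instruction syntax follows the slide's hypothetical machine, the function name `peephole` is made up, and the fusion step assumes the address register is not used again afterwards.

```python
import re


def peephole(instructions):
    """Apply two local rewrites: MUL by 2 -> SHL, and indexed-addressing fusion."""
    # Pass 1: replace a slow multiply by 2 with a faster shift left.
    reduced = []
    for instruction in instructions:
        match = re.fullmatch(r"MUL (R\d+),2", instruction)
        reduced.append(f"SHL {match.group(1)}" if match else instruction)

    # Pass 2: fuse "load address, add index, store" into one indexed store.
    # Assumption: the address register is dead after the store.
    result = []
    i = 0
    while i < len(reduced):
        window = reduced[i:i + 3]
        if len(window) == 3:
            load = re.fullmatch(r"MOV (R\d+),&(\w+)", window[0])
            add = re.fullmatch(r"ADD (R\d+),(R\d+)", window[1])
            store = re.fullmatch(r"MOV \*(R\d+),(\w+)", window[2])
            if (load and add and store
                    and load.group(1) == add.group(1) == store.group(1)
                    and add.group(2) != add.group(1)):
                result.append(f"MOV &{load.group(2)}[{add.group(2)}],{store.group(2)}")
                i += 3
                continue
        result.append(reduced[i])
        i += 1
    return result


generated = ["MOV R0,index", "MUL R0,2", "MOV R1,&a", "ADD R1,R0", "MOV *R1,6"]
for instruction in peephole(generated):
    print(instruction)
```

Applied to the five instructions produced by the code generator, the pass yields the three-instruction sequence shown above.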