
SYSTEM SOFTWARE

ASSIGNMENT

Submitted By

MD. KAMRAN KAUSAR
(Roll No: 27) B.E. Computer (1st year) Monday, November 28, 2011

Submitted To

Mr. DANISH RAZA, Computer Department, Faculty of Engineering & Technology, JAMIA MILLIA ISLAMIA

Q No 1. What are the fundamentals of language processing?

Answer: Language processing = Analysis of SP (source program) + Synthesis of TP (target program)


A specification of the source language forms the basis of source program analysis. The specification consists of three components:
1. Lexical rules, which govern the formation of valid lexical units in the source language.
2. Syntax rules, which govern the formation of valid statements in the source language.
3. Semantic rules, which associate meaning with valid statements of the language.

A toy compiler:
We briefly describe the front end and back end of a toy compiler for a Pascal-like language.

The front end
The front end performs lexical, syntax and semantic analysis of the source program. Each kind of analysis involves the following functions:
1. Determine the validity of a source statement from the viewpoint of the analysis.
2. Determine the content of a source statement.
3. Construct a suitable representation of the source statement for use by subsequent analysis functions.

In lexical analysis, the content is the lexical class to which each lexical unit belongs, while in syntax analysis it is the syntactic structure of a source statement. In semantic analysis the content is the meaning of a statement; for a declaration statement, it is the set of attributes of a declared variable.

Each analysis represents the content of a source statement in the form of (1) tables of information and (2) a description of the source program.

Output of the front end:
The IR produced by the front end consists of two components:
1. Tables of information.
2. An intermediate code (IC), which is a description of the source program.

Tables contain the information obtained during the different analyses of the SP. The most important table is the symbol table, which contains information concerning all identifiers used in the SP. The symbol table is built during lexical analysis.

Intermediate code (IC)
The IC is a sequence of IC units, each IC unit representing the meaning of one action in the SP.

Example:
    i : integer;
    a, b : real;
    a := b + i;

Symbol table:
    No.  Symbol  Type  Length  Address
    1    i       int
    2    a       real
    3    b       real
    4    i*      real
    5    temp    real

Intermediate code:
1. Convert (Id, #1) to real, giving (Id, #4)
2. Add (Id, #4) to (Id, #3), giving (Id, #5)
3. Store (Id, #5) in (Id, #2)

Lexical analysis (scanning):
Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. identifiers, constants, reserved identifiers, etc., and enters them into different tables. Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code and number in class. Class code identifies the class to which a lexical unit belongs; number in class is the entry number of the lexical unit in the relevant table. We depict a token as Code #no, e.g. Id #10. The example statement a := b + i; is represented as the string of tokens

    Id#2 Op#5 Id#3 Op#3 Id#1 Op#10

where Id#2 stands for the identifier occupying entry #2 in the symbol table; similarly, Op#5 stands for the operator :=, etc.

Syntax analysis (parsing):
Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g. assignment statement, etc. It then builds an IC which represents the structure of the statement. The IC is passed to semantic analysis to determine the meaning of the statement.
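The token-building step of lexical analysis can be sketched in a few lines of Python. This is only an illustration, not the toy compiler's actual scanner: the function names scan and show are hypothetical, and the operator numbers (3 for +, 5 for :=, 10 for ;) are assumptions taken from the token example in the text.

```python
import re

# Hypothetical operator table; the entry numbers are assumptions that follow
# the token example in the text (Op#3 is '+', Op#5 is ':=', Op#10 is ';').
OPERATORS = {"+": 3, ":=": 5, ";": 10}

# One lexical unit: optional blanks, then an identifier or an operator.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z][A-Za-z0-9]*)|(?P<op>:=|[+;]))")

def scan(statement, symbol_table):
    """Return the statement as a list of (class code, number in class) tokens."""
    tokens = []
    text = statement.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"invalid lexical unit at position {pos}")
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:       # enter new identifiers in the table
                symbol_table.append(name)
            tokens.append(("Id", symbol_table.index(name) + 1))
        else:
            tokens.append(("Op", OPERATORS[match.group("op")]))
        pos = match.end()
    return tokens

def show(tokens):
    """Format tokens in the Code#no notation used in the text."""
    return " ".join(f"{cls}#{num}" for cls, num in tokens)

symbols = ["i", "a", "b"]   # entries 1..3, entered while scanning the declarations
print(show(scan("a := b + i;", symbols)))
```

For the example statement this prints Id#2 Op#5 Id#3 Op#3 Id#1 Op#10, i.e. one token per lexical unit in source order.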

Fig: IC (in tree form) for the statements a, b : real; a := b + i;

Semantic analysis:
Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former results in the addition of information to the symbol table, e.g. type and length. When semantic analysis determines the meaning of a subtree in the IC, it adds information to a table or adds an action to the sequence of actions. It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has been completely processed.

Fig: Steps in the semantic analysis of a := b + i. The tree (:= (a, real) (+ (b, real) (i, int))) becomes (:= (a, real) (+ (b, real) (i*, real))) and finally (:= (a, real) (temp, real)).

Fig: Schematic of the front end, where arrows indicate the flow of data.

Back end:
The back end performs memory allocation and code generation.

Memory allocation:
Memory allocation is a simple task given the presence of the symbol table. The memory requirement of an identifier is computed from its type, length and dimensionality, and memory is allocated to it. The address of the memory area is entered in the symbol table.

    No.  Symbol  Type  Length  Address
    1    i       int           2000
    2    a       real          2001
    3    b       real          2002

Fig: Symbol table after memory allocation

Code generation:
Code generation uses knowledge of the target architecture, viz. knowledge of the instructions and addressing modes of the target computer, to select the appropriate instructions. The important issues in code generation are:
1. Determine the places where the intermediate results should be kept, i.e. whether they should be kept in memory locations or held in machine registers. This is a preparatory step for code generation.
2. Determine which instructions should be used for type conversion operations.
3. Determine which addressing modes should be used for accessing variables.

Fig: Back end of the toy compiler (the IR, consisting of the symbol table, constants table and other tables, passes through memory allocation and then code generation).
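The memory allocation step can be illustrated with a short Python sketch. The word sizes below are an assumption (one word per variable, chosen so that the result agrees with the addresses 2000 to 2002 shown in the symbol table above), and the function name allocate and the dictionary layout are hypothetical.

```python
# Size of each type in memory words. This is an assumption chosen so that the
# result matches the addresses in the symbol table above (one word per variable).
WORDS_PER_TYPE = {"int": 1, "real": 1}

def allocate(symbol_table, start_address):
    """Fill in the length and address of every entry, in table order.

    Returns the first free address after the last allocated identifier.
    """
    next_free = start_address
    for entry in symbol_table:
        # Memory requirement = size of the type times the number of elements.
        entry["length"] = WORDS_PER_TYPE[entry["type"]] * entry.get("dimension", 1)
        entry["address"] = next_free
        next_free += entry["length"]
    return next_free

table = [
    {"symbol": "i", "type": "int"},
    {"symbol": "a", "type": "real"},
    {"symbol": "b", "type": "real"},
]
first_free = allocate(table, 2000)
for entry in table:
    print(entry["symbol"], entry["type"], entry["length"], entry["address"])
```

An array would be handled by giving its entry a "dimension" value, so that its length, and the address of every identifier after it, grows accordingly.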

Q No 2. Define language processing activities, along with the program generation activity and the program execution activity.

Answer:

A program generator is software which accepts the specification of a program to be generated and produces a program in the target language. Initially, the semantic gap lies between the source language domain and the target language domain, as shown in Fig. 1.4. With program generation, the semantic gap lies instead between the source language domain and the program generator domain, as shown in Fig. 1.5. Because the generator domain is close to the source language domain, it is easy for the designer or programmer to write the specification of the program to be generated. This arrangement also reduces the testing effort: to test an application generated by the generator, it is only necessary to verify the correctness of the specification that is input to the generator.

Program Execution Activities:
The execution of a program is divided into two activities:
1. Program translation activities.
2. Program interpretation activities.

Fig: Models of execution activities (program translation and program interpretation)

Program Translation Activities: By translation, we mean the mapping of sentences of a given language to sentences of another language; the former is termed the source language and the latter the target language.

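Program Interpretation Activities: An interpreter reads the source program and executes it directly, statement by statement, without producing a target program. Because the process that executes the source code is the interpreter itself, it can be modified (instrumented) to incorporate monitoring activities.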

Q No 3. What are the fundamentals of language specification?

Answer:

1. Programming language grammars:
The lexical and syntactic features of a programming language are specified by its grammar. A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language. A formal language grammar is a set of rules which precisely specify the sentences of L. Natural languages are not formal languages, due to their rich vocabulary. However, programming languages (PLs) are formal languages.

Terminal symbols, alphabet and strings:
The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g.
    Σ = {a, b, ..., z, 0, 1, ..., 9}
Here the symbols '{', ',' and '}' are part of the notation and not of the alphabet; they are known as metasymbols.
A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc. Thus α = axy is a string over Σ.

Productions:
A production is a rule of the grammar. A production has the form
    A nonterminal symbol (NT) ::= String of Ts and NTs
When an NT can be written as one of many different strings, the symbol '|' (standing for 'or') is used to separate the strings on the RHS, e.g.
    <Article> ::= a | an | the
The string on the RHS of a production can be a concatenation of component strings, e.g.
    <Noun Phrase> ::= <Article> <Noun>

Each grammar G defines a language L_G. G contains an NT called the distinguished symbol, S. A valid string of L_G is obtained by using the following procedure:
1. Let α = 'S'.
2. While α is not a string of terminal symbols:
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.
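The derivation procedure can be traced with a small Python sketch. It uses the <Noun Phrase> and <Article> productions given above; the alternatives for <Noun> are illustrative assumptions, since the text does not list them, and the names derive and pick_last are hypothetical.

```python
# Grammar based on the productions above. The <Noun> alternatives are
# illustrative assumptions, since the text does not list them.
GRAMMAR = {
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Article>": [["a"], ["an"], ["the"]],
    "<Noun>": [["boy"], ["apple"]],
}

def derive(start, choose):
    """Derive a string of terminal symbols from the symbol `start`.

    `choose(nt, alternatives)` decides which RHS replaces the nonterminal.
    Returns the final string and the list of intermediate strings.
    """
    alpha = [start]                                   # step 1: alpha = 'S'
    steps = [list(alpha)]
    while any(symbol in GRAMMAR for symbol in alpha):  # step 2: NTs remain
        # (a) select an NT appearing in alpha; here always the leftmost one
        index = next(i for i, symbol in enumerate(alpha) if symbol in GRAMMAR)
        nt = alpha[index]
        # (b) replace it by a string on the RHS of one of its productions
        alpha[index:index + 1] = choose(nt, GRAMMAR[nt])
        steps.append(list(alpha))
    return alpha, steps

def pick_last(nt, alternatives):
    """Deterministic choice used for the demonstration: the last alternative."""
    return alternatives[-1]

sentence, steps = derive("<Noun Phrase>", pick_last)
for step in steps:
    print(" ".join(step))
```

Choosing a different alternative at step (b) yields a different valid string, so the set of all strings reachable this way is the language L_G.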

Q No 4. Define language processor development tools, i.e. LEX and YACC.


Answer: The analysis phase of a language processor has a standard form irrespective of its purpose: the source text is subjected to lexical, syntax and semantic analysis, and the results of analysis are represented in an IR. A language processor development tool (LPDT) requires the following two inputs:
1. Specification of a grammar of language L.
2. Specification of the semantic actions to be performed in the analysis phase.
It generates programs that perform lexical, syntax and semantic analysis of the source program and construct the IR.

Fig: A language processor development tool. The front end generator takes the grammar of L and the semantic actions, and produces a front end that performs scanning, parsing and semantic analysis of the source program.

Fig: Using LEX and YACC. LEX turns the lexical specification of language L into a scanner, and YACC turns the syntax specification into a parser; together they process a source program in L.

Lex: A Lexical Analyzer Generator
Lex is a program generator designed for lexical processing of character input streams. It accepts a high-level, problem-oriented specification for character string matching, and produces a program in a general-purpose language which recognizes regular expressions. The regular expressions are specified by the user in the source specifications given to Lex. The code written by Lex recognizes these expressions in an input stream and partitions the input stream into strings matching the expressions. At the boundaries between strings, program sections provided by the user are executed. The Lex source file associates the regular expressions and the program fragments. As each expression appears in the input to the program written by Lex, the corresponding fragment is executed.

The purpose of the lexical analyzer is to partition the input text, delivering a sequence of comments and basic symbols. Comments are character sequences to be ignored, while basic symbols are character sequences that correspond to terminal symbols of the grammar defining the phrase structure of the input.

LEX accepts an input specification which consists of two components. The first component is a specification of strings representing the lexical units in L, e.g. ids and constants. The second component is a specification of semantic actions aimed at building an IR. Accordingly, the semantic actions make new entries in the tables and build tokens for the lexical units.

    letter  [A-Za-z]
    digit   [0-9]
    %%
    begin                         { return(BEGIN); }
    end                           { return(END); }
    ":="                          { return(ASGOP); }
    {letter}({letter}|{digit})*   { yylval = enter_id(); return(ID); }
    {digit}+                      { yylval = enter_num(); return(NUM); }
    %%
    enter_id()
    { /* enters the id in the symbol table and returns the entry number */ }
    enter_num()
    { /* enters the number in the constants table and returns the entry number */ }

Fig: A sample LEX specification

The specification defines the strings begin, end and := (the assignment operator), together with the identifier and constant strings of L. When an identifier is found, it is entered in the symbol table (if not already present) using the routine enter_id. The pair (ID, entry #) forms the token for the identifier string. By convention, entry # is put in the global variable yylval, and the class code ID is returned as the value of the call on the scanner. Similar actions are taken on finding a constant.

Yacc: Yet Another Compiler-Compiler
Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process; this includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process. This function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer) to pick up the basic items (called tokens) from the input stream. These tokens are organized according to the input structure rules, called grammar rules; when one of these rules has been recognized, the user code supplied for this rule, an action, is invoked. Actions have the ability to return values and make use of the values of other actions. Yacc is written in a portable dialect of C[1], and the actions and output subroutine are in C as well. Moreover, many of the syntactic conventions of Yacc follow C.

Each string specification in the input to YACC resembles a grammar production. The parser generated by YACC performs reductions according to this grammar. The actions associated with a string specification are executed when a reduction is made according to the specification. An attribute is associated with every nonterminal symbol. The value of this attribute can be manipulated during parsing. The attribute can be given any user-defined structure. A symbol '$n' in the action part of a translation rule refers to the attribute of the nth symbol in the RHS of the string specification. '$$' represents the attribute of the LHS symbol of the string specification.

    %%
    E : E '+' T   { $$ = gencode('+', $1, $3); }
      | T         { $$ = $1; }
      ;
    T : T '*' v   { $$ = gencode('*', $1, $3); }
      | v         { $$ = $1; }
      ;
    v : id        { $$ = gendesc($1); }
      ;
    %%
    gencode(operator, operand_1, operand_2)
    { /* Generates code using operand descriptors. Returns descriptor for result */ }
    gendesc(symbol)
    { /* Refers to symbol/constant table entry. Builds and returns descriptor for the symbol */ }

Q No 5. What is a linker and a loader?

Answer:

Linker

In computer science, a linker or link editor is a program that takes one or more objects generated by a compiler and combines them into a single executable program. On Unix variants the term loader is often used as a synonym for linker. Computer programs typically comprise several parts or modules; all these parts/modules need not be contained within a single object file, and in such case refer to each other by means of symbols. Typically, an object file can contain three kinds of symbols: defined symbols, which allow it to be called by other modules,

undefined symbols, which call the other modules where these symbols are defined, and local symbols, used internally within the object file to facilitate relocation. When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along. Linkers can take objects from a collection called a library. Some linkers do not include the whole library in the output; they only include its symbols that are referenced from other object files or libraries. Libraries exist for diverse purposes, and one or more system libraries are usually linked in by default. The linker also takes care of arranging the objects in a program's address space. This may involve relocating code that assumes a specific base address to another base. Since a compiler seldom knows where an object will reside, it often assumes a fixed base location (for example, zero). Relocating machine code may involve re-targeting of absolute jumps, loads and stores. Many operating system environments allow dynamic linking, that is the postponing of the resolving of some undefined symbols until a program is run. That means that the executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these. Loading the program will load these objects/libraries as well, and perform a final linking.

Loader

In computing, a loader is the part of an operating system that is responsible for loading programs. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves reading the contents of the executable file (the file containing the program text) into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

All operating systems that support program loading have loaders, apart from systems where code executes directly from ROM and highly specialized computer systems that only have a fixed set of specialized programs. In many operating systems the loader is permanently resident in memory, although some operating systems that support virtual memory may allow the loader to be located in a region of memory that is pageable.

In the case of operating systems that support virtual memory, the loader may not actually copy the contents of executable files into memory, but rather may simply declare to the virtual memory subsystem that there is a mapping between a region of memory allocated to contain the running program's code and the contents of the associated executable file. (See memory-mapped file.) The virtual memory subsystem is then made aware that pages within that region of memory need to be filled on demand if and when program execution actually hits those areas of unfilled memory. This may mean that parts of a program's code are not actually copied into memory until they are used, and unused code may never be loaded into memory at all.

In Unix, the loader is the handler for the system call execve().[1] The Unix loader's tasks include:
1. validation (permissions, memory requirements, etc.);
2. copying the program image from the disk into main memory;
3. copying the command-line arguments onto the stack;
4. initializing registers (e.g., the stack pointer);
5. jumping to the program entry point (_start).

Q No 6. What are translated, linked and load time addresses?

Answer:

Load modules and address binding:
A load module is a compiled and possibly linked (see below) version of the source code. The source code is processed with no knowledge of where the resulting load module will be placed in memory. Thus the process, when loaded into memory, must ultimately have all addresses mapped into the process address space in real memory before it can execute. This is called address binding. Address binding can be done at:
- Compile time
- Load time
- Execution (run) time

Loading and linking

A program (load module) may be made up from a number of object module files. To be executable in memory, a program must map all relative internal addresses (see below) in the load module(s) to memory addresses (binding), and in addition any references made to other modules must also be resolved (linking). Thus to create an active process in memory, the modules must be both linked (references among modules resolved) and loaded into memory. We distinguish between loading a single load module, which requires only binding of the relative addresses (it is already linked), and loading multiple modules, which also requires linking (at load or run time). The next three items assume a single load module.

Absolute loading
The references in the program load module are already resolved into specific physical memory addresses.
- The programmer must know where the process will be loaded in memory (or the strategy for assigning processes to memory).
- Modifications made to the process may require changing all addresses in the module (recompiling).
Absolute address assignment can be done either by the programmer or by the compiler/assembler. The latter means that the programmer uses symbolic references, to be filled in later by the compiler or assembler.

Relocatable loading (bind to real addresses at load time)
Assume a single linked load module that can be located anywhere in memory. It is not desirable to decide in advance into which region of memory a load module must be loaded, so we defer that decision to load time. The module does not have absolute addresses, but addresses that are relative to some known point, such as the start of the program module.
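A minimal sketch of load-time binding for such a module is given below in Python. It assumes, purely for illustration, that the module is a list of integer words and that the words holding relative addresses are identified by a list of offsets (a relocation dictionary); the function name load_relocatable and the sample values are hypothetical.

```python
def load_relocatable(module_words, relocation_dictionary, load_address):
    """Return the memory image of a module loaded at `load_address`.

    `module_words` holds the words of the module, with every address expressed
    relative to the start of the module. `relocation_dictionary` lists the
    offsets of the words that contain such an address.
    """
    image = list(module_words)              # copy; the module on disk is unchanged
    for offset in relocation_dictionary:
        image[offset] += load_address       # relative address -> absolute address
    return image

# Toy module: words 1 and 3 hold the relative addresses 4 and 5;
# the remaining words stand for opcodes or data and must not be changed.
module = [10, 4, 20, 5, 111, 222]
relocation = [1, 3]
image = load_relocatable(module, relocation, load_address=5000)
print(image)
```

Loading the same module at a different address only changes the two relocated words, which is why the module on disk can stay in relative form.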

The loader places the module at location x by adding x to each memory reference in the module as it loads the module into memory. To do this, the module needs a relocation dictionary which tells where the addresses are and how to translate them at load time. If a process image is swapped out to disk and then brought back into memory, it has to be loaded into the same location as before (x), since its memory references were already resolved to absolute values at load time. The process image swapped to disk generally retains the resolved absolute addresses in the memory image that were created at initial load time (Stallings, p. 329). Comment: resetting the module addresses to their pre-load relative (relocatable) addresses would probably involve too much overhead for swapping.

Dynamic run-time real address binding (bind to real addresses at run time)
Assume a single linked load module. This scheme allows the swapping mechanism to restore a swapped process to an arbitrary location in memory, not necessarily its old location, which maximizes memory utilization. The calculation of absolute addresses is deferred until it is actually needed at run time. Thus, the load module is loaded into main memory with all memory references in relative form (unresolved to real); for example, the references would be offsets from the beginning of the process. It is not until an instruction is actually executed that the absolute address is calculated. Because these actions occur at run time, hardware assistance is needed for the dynamic address translation in order to have good performance. The hardware adds a base address value to each address at run time.

Linking
The function of the linker is to take a collection of object modules and produce a load module.

Address references to other modules must be expressed symbolically in an unlinked object module. The linker may produce a single load module in which all references are resolved relative to some point in the module, if linking is done before loading. If linking is deferred until later (load or run time), then only the references to other modules are left symbolic, with local references being resolved to relative addresses.

Linkage editor
The nature of the address linkage depends on the type of load module to be created and on when the linkage occurs. If, as is usual, a relocatable load module is needed, then linking is done as follows:
- Each compiled object module is created with references relative to the beginning of the object module.
- All linked modules are put together in a single load module, with all references relative to the origin of the load module.
A linker that produces a relocatable load module is often referred to as a linkage editor.
NOTE: distinguish between linking and logical-to-physical address binding.

Dynamic linking
Meaning of dynamic: defer linkage of external modules until after the load module has been created, i.e. until load time or run time. The load module contains unresolved (symbolic) references to other modules; these references can be resolved either at load time or at run time.

Load-time dynamic linking:
Like relocatable loading above, except that multiple modules are involved. This is the DLL (Dynamic Link Library) concept in Windows or OS/2.

The load module is read into main memory. Any reference to an external module causes the loader to find this module, load it, and alter the reference to a relative address in memory from the beginning of the application. This approach has several advantages over static loading:
- It is easier to incorporate changes: there is no need to relink the entire application.
- Code sharing is automatic: the OS loads a single copy of a routine for multiple applications referencing it.
- It is easier for independent software developers to extend the function of a widely used software package such as an operating system, by supplying extension packages as dynamic link modules.

Run-time dynamic linking (dynamic/run-time loading):
Like dynamic run-time address binding for a single load module above, except that multiple modules are involved, which must be loaded at run time when referenced. This is the DLL (Dynamic Link Library) concept in Windows or OS/2. Some of the linking is postponed until execution time. External references to target modules remain in the loaded module (in memory). When a call is made to the absent module, the OS locates the module, loads it, and links it to the calling module.

Comparison: dynamic address binding (single load module) vs dynamic linking (multiple modules):
Dynamic address binding (single load module) allows an entire module to be moved around; however, the structure of the module is static, being unchanged throughout the execution of the process and from one execution to the next. In some cases it is not possible to determine prior to execution which modules will be required, so dynamic linking is needed. The advantage of dynamic linking is that it is not necessary to allocate memory for program units unless those units are actually referenced.

Implicit load-time linking or explicit run-time linking
Once a DLL's file image is mapped into the calling process's address space, the DLL's functions are available to all threads in the process. After this mapping, the DLL code becomes completely integrated into the process, and the threads cannot distinguish the DLL code and data from other code and data in the process address space.

Mapping a DLL into a process's address space


Implicit linking (load time): When the OS loads an exe file, the system examines the contents of the exe file image to see which DLLs must be loaded for the application to run. The linker that builds the exe file embeds information in the exe module which tells the loader which additional DLLs must be linked at load time. The loader/linker then links the specified DLLs to the exe file during load time.

Explicit linking (run time): A DLL's file image can be explicitly mapped into a process's address space when one of the process's threads calls a function in one of the DLLs associated with the exe application. The called DLL file images are then located, loaded, and mapped into the calling process's address space (dynamically at run time).

Q No 7. What are the fundamentals of the relocation and linking concepts?

Answer:


Program relocation: Program relocation is the process of modifying the addresses used in the address sensitive instructions of a program such that the program can execute correctly from the designated area of memory.

Let AA be the set of absolute addresses (instruction or data addresses) used in the instructions of a program P. A non-empty AA implies that program P assumes its instructions and data to occupy memory words with specific addresses. Such a program, called an address sensitive program, contains one or more of the following:
1. An address sensitive instruction: an instruction which uses an address a ∈ AA.
2. An address constant: a data word which contains an address a ∈ AA.

In the following, we discuss relocation of programs containing address sensitive instructions. Address constants are handled analogously. An address sensitive program P can execute correctly only if the start address of the memory area allocated to it is the same as its translated origin. To execute correctly from any other memory area, the address used in each address sensitive instruction of P must be 'corrected'.

If the linked origin is not equal to the translated origin, relocation must be performed by the linker. If the load origin is not equal to the linked origin, relocation must be performed by the loader. In general, a linker always performs relocation, whereas some loaders do not. Relocation can thus occur both when a linker combines object files and when a loader copies an executable file into memory in preparation for execution.

Linking: Linking is the process of binding an external reference to the correct link time address. Consider an application program AP consisting of a set of program units SP = {Pi}. A program unit Pi interacts with another program unit Pj by using addresses of Pj's instructions and data in its own instructions. To realize such interactions, Pj and Pi must contain public definitions and external references, as defined in the following.

Public definition: A symbol pub_symb defined in a program unit which may be referenced in other program units.
External reference: A reference to a symbol ext_symb which is not defined in the program unit containing the reference.
The handling of public definitions and external references is described in the following.

EXTERN and ENTRY statements
The ENTRY statement lists the public definitions of a program unit, i.e. it lists those symbols defined in the program unit which may be referenced in other program units. The EXTERN statement lists the symbols to which external references are made in the program unit.
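How a linker might combine these ideas can be sketched in Python. The sketch is an illustration under stated assumptions, not a real object file format: each program unit is a dictionary with its size, its translated origin, its ENTRY symbols and its EXTERN references, and the unit names P and Q, the symbols TOTAL and ALPHA and all addresses are invented for the example.

```python
def link(units, linked_origin):
    """Place program units one after another and bind external references.

    Each unit is a dict with:
      size   - number of words in the unit
      origin - translated origin assumed by the translator
      entry  - {public symbol: translated address}     (ENTRY statement)
      extern - {offset of referencing word: symbol}    (EXTERN references)
    Returns (relocation factor per unit, link time address per public symbol,
    {link time address of referencing word: address bound to it}).
    """
    factors, symbols = {}, {}
    next_origin = linked_origin
    for name, unit in units.items():
        # Relocation factor = linked origin - translated origin.
        factors[name] = next_origin - unit["origin"]
        for symbol, translated_address in unit["entry"].items():
            symbols[symbol] = translated_address + factors[name]
        next_origin += unit["size"]

    bound = {}
    for name, unit in units.items():
        for offset, symbol in unit["extern"].items():
            if symbol not in symbols:
                raise KeyError(f"unresolved external reference: {symbol}")
            word_address = unit["origin"] + factors[name] + offset
            bound[word_address] = symbols[symbol]    # bind to link time address
    return factors, symbols, bound

# Hypothetical program units: P refers to ALPHA, which Q defines publicly.
units = {
    "P": {"size": 40, "origin": 500, "entry": {"TOTAL": 530}, "extern": {12: "ALPHA"}},
    "Q": {"size": 20, "origin": 200, "entry": {"ALPHA": 210}, "extern": {}},
}
factors, symbols, bound = link(units, linked_origin=900)
print(factors, symbols, bound)
```

Here P is relocated by 400 and Q by 740, so ALPHA receives the link time address 950, and the word at address 912 in P is bound to it. A loader that places the result at a load origin different from 900 would apply one more relocation factor in the same way.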

Fig: Compile, link and execute stages for running a program (a process)
