
Symbol Table Organization & Techniques

Definition:- A symbol table is an abstract data structure for storing information. It
differs from other data structures in its method of access: other data structures are
accessed by index or pointer, whereas a table is accessed by content (by key).
A table entry has the form:

Key | Associated Data

We can access the associated data once the key is specified.


Operations on a Table:- There are two main kinds of operations.
1. Primary
- Searching
Here the key is specified and we have to find the corresponding data.
K (key is given) -> V (value)
2. Secondary
- Insertion and deletion
The operations can be specified as follows:
Domain: (key, value)
Insert: (table, key, value) -> table
Delete: (table, key) -> table
New table: ( ) -> table
Select: (table, key) -> (value, boolean)
Is empty: (table) -> boolean
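The operation signatures above can be sketched in Python as follows. This is only an illustration of the interface, not a prescribed implementation: the function names mirror the list above, and the use of a Python dict as the underlying table is an assumption made for brevity.

```python
from typing import Any, Dict, Tuple

# ASSUMPTION: a plain dict stands in for the table; the notes do not fix a representation.
Table = Dict[str, Any]

def new_table() -> Table:
    """New table: ( ) -> table"""
    return {}

def insert(table: Table, key: str, value: Any) -> Table:
    """Insert: (table, key, value) -> table (returns a new table)."""
    updated = dict(table)
    updated[key] = value
    return updated

def delete(table: Table, key: str) -> Table:
    """Delete: (table, key) -> table (returns a new table without key)."""
    return {k: v for k, v in table.items() if k != key}

def select(table: Table, key: str) -> Tuple[Any, bool]:
    """Select: (table, key) -> (value, boolean); the boolean reports success."""
    if key in table:
        return table[key], True
    return None, False

def is_empty(table: Table) -> bool:
    """Is empty: (table) -> boolean"""
    return len(table) == 0
```

Each function returns a new table instead of modifying its argument, which matches the "-> table" form of the signatures.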
Symbol table organization for non-block-structured languages:-
By a non-block-structured language, we mean a language in which each separately
compiled unit is a single module that has no submodules. All variables declared in a module
are known throughout the module.
There are four methods of organization:
- Unordered
- Ordered
- Tree
- Hash
1. Unordered
The simplest method of organizing a symbol table is to add the attribute entries to the
table in the order in which the variables are declared, so there is no particular
ordering.
- Insertion is very easy, as no comparisons are required.
- Searching is more costly, because the table has to be scanned linearly.
- For a delete operation, a search length of (n+1)/2 is required on average, assuming
there are n records.

Time to insert a key: 1
Unsuccessful search, time required: (n+1)
An unordered table organization should be used only if the expected size of the
symbol table is small, since the average time for searching and deletion is directly
proportional to the table size.
The algorithm below selects the data value for a key that is known to us.
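{ The type names TableType, KeyType and ValueType are assumed here; }
{ the table must have room for one extra entry at position n+1.     }
procedure select(var table: TableType; key: KeyType;
                 var data_value: ValueType; var found: boolean);
var i: integer;
begin
  with table do
  begin
    entry[n+1].key := key;          { sentinel: the loop always stops }
    i := 1;
    while entry[i].key <> key do
      i := i + 1;
    if i <= n then
    begin
      data_value := entry[i].value; { key found among the n real entries }
      found := TRUE
    end
    else
      found := FALSE                { only the sentinel matched }
  end
end;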
So the average search length is (n+1)/2, that is, the complexity is O(n). .....
Now let us look at various practical aspects (actual use):
i) Fixed or variable-length entries.
In the fixed case the number of entries possible in a table is fixed, whereas in the
variable-length case it can change.
ii) Size of the key (in bytes).
iii) Method of access.
iv) How frequent insertions and deletions are.
Language-specific considerations:
i) BASIC
Variable names have the form
<letter> or <letter> <digit>
So with this format at most 286 entries are possible (26 single-letter names plus
26 * 10 letter-digit names). To find the address of a given key, we have the following formulas:
<Address> = <key> - 65 OR
<Address> = (<key letter> - 65) * 10 + <digit> - 48
Here 65 and 48 are the ASCII codes of 'A' and '0'.
ii) FORTRAN
In this language a fixed-length field is used for each variable name.
The disadvantage is that memory is wasted.

A | Padding (these bytes are wasted)

iii) C or PASCAL
In such high-level languages, variable-length key strings are used.
With this type of key string we get better memory utilization, but the
method is not as fast.

ptr | length
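The BASIC address formulas above can be tried out with a small Python sketch. The function name is hypothetical, and one detail is an assumption that the notes do not state: used as written, the two formulas overlap (for example "A" and "A0" both give 0), so the sketch adds an offset of 26 to letter-digit names so that all 286 names fit in one table.

```python
def basic_address(name: str) -> int:
    """Map a BASIC variable name (<letter> or <letter><digit>) to a table slot."""
    letter = ord(name[0]) - 65            # 'A' -> 0 ... 'Z' -> 25
    if len(name) == 1:
        return letter                     # single-letter names use slots 0..25
    digit = ord(name[1]) - 48             # '0' -> 0 ... '9' -> 9
    # ASSUMPTION: the offset 26 is not in the notes; it keeps letter-digit
    # names clear of the 26 single-letter slots.
    return 26 + letter * 10 + digit       # slots 26..285
```

With this layout no searching is needed at all: the name itself determines the slot, which is why such a small name space can be handled by direct addressing.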

2. Ordered
In this method, the table is maintained in sorted form based on the variable names. In
such circumstances an insertion must be accompanied by a lookup procedure which
determines where in the symbol table the variable's attributes should be placed. The actual
insertion of a new entry may generate some additional overhead, primarily because other
entries may have to be moved to make room at the position of insertion.
For searching a particular key, we apply the binary search technique. Suppose
(K1,V1) .... (Kmid,Vmid) .... (Kn,Vn) are the entries in the table.
Here mid is the middle position of the current search range, initially (1 + n) div 2.
The algorithm is described below; k is the key being searched for, and the first call is Find(1, n).
Find(low, high)
begin
  while low < high do
  begin
    mid := (low + high) div 2;
    if k <= entry[mid].key then
      high := mid          { the key, if present, is at mid or before it }
    else
      low := mid + 1       { the key, if present, is after mid }
  end
end
When the loop ends, low = high; the key we are searching for, if it is present, is at that position.
The time complexity of this algorithm is O(log n).
Methods of Sorting:
i. Array
We sort the entries in the table in some particular ..... with arrays.
With arrays, searching for a particular entry is very fast, but insertion is time
consuming.
To insert an entry, we first have to find the position where it belongs, and then all the
entries below it are shifted down.

ii. Index
With this method, insertion is easy.

ptr

Here only the ptr field is manipulated; nothing has to be done with the table itself,
so insertion is easier.
iii. Linked List
In this approach, we combine the array and the linked list.
[Figure: array entries A, B, E, J, each with an associated linked list]
Here, the array is used for searching and the linked list is used for insertion and deletion. There is
no actual limit on the number of entries in the table.
To search for a key, the starting symbol is found by comparison, and then the entries are
scanned to find the exact match.
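The array method for an ordered table (binary search for the position, then shifting entries to make room) might look like the following Python sketch. The class and method names are hypothetical, and overwriting the value when a key is inserted twice is an assumption, not something the notes specify.

```python
from bisect import bisect_left

class OrderedTable:
    """Symbol table kept sorted by key in parallel arrays."""

    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        pos = bisect_left(self.keys, key)      # binary search: O(log n)
        if pos < len(self.keys) and self.keys[pos] == key:
            self.values[pos] = value           # ASSUMPTION: re-insertion overwrites
            return
        self.keys.insert(pos, key)             # shifts later entries down: O(n)
        self.values.insert(pos, value)

    def select(self, key):
        pos = bisect_left(self.keys, key)      # binary search: O(log n)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos], True
        return None, False
```

The sketch shows the trade-off described above: lookup costs O(log n), but every insertion may move up to n entries.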
3. Tree
In a binary-tree-structured symbol table, each node has the following format:

Left ptr | Key | Value | Right ptr

Here two new fields are present in the record structure: the left pointer
and the right pointer. Access to the tree is gained through the ... node. A search proceeds
down the structural links of the tree until the desired node is found or a NULL link field
is encountered.
Let us take an example of storing a string abcd in this format.
[Figure: chain of nodes, each with LP, value and RP fields; unused link fields are NULL]
In the case of a balanced binary tree, the time complexity of searching for a node among
the n nodes is O(log2 n).
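A minimal Python sketch of a tree-structured table with the node format described above is given here. The names are hypothetical, and the sketch does no rebalancing, so the O(log2 n) bound only holds when the keys happen to arrive in an order that keeps the tree balanced.

```python
class Node:
    """One record: left pointer, key, value, right pointer."""
    def __init__(self, key, value):
        self.left = None
        self.key = key
        self.value = value
        self.right = None

def insert(root, key, value):
    """Insert (key, value) and return the (possibly new) root. No rebalancing."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        root.left = insert(root.left, key, value)
    elif key > root.key:
        root.right = insert(root.right, key, value)
    else:
        root.value = value        # ASSUMPTION: an existing key is overwritten
    return root

def select(root, key):
    """Follow the links until the key is found or a None (NULL) link is met."""
    node = root
    while node is not None:
        if key == node.key:
            return node.value, True
        node = node.left if key < node.key else node.right
    return None, False
```

Inserting the keys a, b, c, d in that order produces a chain of right links, which is the degenerate case where search falls back to O(n).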
4. Hash
A hashing function, or key-to-address transformation, is defined as a mapping H:
K -> A. That is, a hashing function H takes as its argument a variable name and
produces a table address at which the set of attributes for that variable is stored.
With this method the search time is essentially independent of the number of records
in the table.
[Figure: the hashing function H maps keys K to addresses A in the table space]
Let n be the number of entries in the table. We define the load factor as
load factor = number of entries (n) / total address space (|A|)
If the load factor is high, it is difficult to manage the table.
In practice we have
|K| >> |A|
so more than one key may be assigned to one address, which creates the problem of collision.
Preconditioning:-
Each element of K usually contains characters which are numeric or alphabetic. The
individual characters of a name are not particularly amenable to arithmetical and
logical operations. The process of transforming a variable name to a form which can be
easily manipulated by a hashing function is called preconditioning.
Preconditioning can be handled most efficiently by using the numerically
coded internal representation, for example ASCII or EBCDIC, of each character in the
name.
There are a number of hashing functions that are applicable to symbol table
handling.
1) Division Method:-
The most widely accepted hashing function is the division method, which is
defined as H(x) = (x mod m) + 1 for divisor m.
In mapping keys to addresses, the division method preserves, to a certain
extent, the uniformity that exists in a key set. Keys which are closely bunched
together are mapped to unique addresses.
In general, if many keys are congruent modulo d, and m is not relatively
prime to d, then using m as a divisor can result in poor performance of the
division method.
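As an illustration, the division method combined with a simple preconditioning step might be sketched in Python as below. The function names are hypothetical, and treating the ASCII codes of the name as base-256 digits is just one possible preconditioning, chosen here as an assumption.

```python
def precondition(name: str) -> int:
    """Turn a name into an integer by reading its ASCII codes as base-256 digits."""
    x = 0
    for ch in name:
        x = x * 256 + ord(ch)
    return x

def division_hash(name: str, m: int) -> int:
    """H(x) = (x mod m) + 1, giving an address in 1..m."""
    return precondition(name) % m + 1
```

For example, "A" preconditions to 65, and with m = 11 the address is (65 mod 11) + 1 = 11. A prime divisor such as 11 is a common choice, because it is relatively prime to most regularities in the key set.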
2) Mid-square method:-
A second hashing function that performs reasonably well is the mid-square method. In this method, a key is multiplied by itself and an address is
obtained by.... bits or digits at both ends of the product until the number of bits
or digits left is equal to the desired address length.
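A possible Python sketch of the mid-square method on binary keys follows. The function and parameter names are hypothetical; the sketch assumes that the key fits in key_bits bits, so that the product has at most 2 * key_bits bits, and that the same number of bits is dropped from each end.

```python
def mid_square_hash(x: int, address_bits: int, key_bits: int = 16) -> int:
    """Square the key and keep the middle address_bits bits of the product."""
    square = x * x                                  # up to 2 * key_bits bits wide
    # ASSUMPTION: an equal number of bits is dropped from the low and high ends.
    drop = (2 * key_bits - address_bits) // 2
    return (square >> drop) & ((1 << address_bits) - 1)
```

With x = 1234 and an 8-bit address, the square is 1522756, twelve low-order bits are dropped, and the next eight bits give the address 115.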
3) Folding Method:-
For the folding method, a key is partitioned into a number of parts, each of
which has the same length as the required address, with the possible exception
of the last part. The parts are then added together, ignoring the final carry, to
form an address. If the keys are in binary form, the exclusive-OR operation
may be substituted for addition.
Folding is a hashing function which is useful for compressing multiword
keys so that other hashing functions can be used.
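The folding method on binary keys can be sketched in Python as follows, in both the additive and the exclusive-OR variant. The function names are hypothetical, and splitting the key from the low-order end (so that the possibly shorter last part is the high-order one) is an assumption.

```python
def fold_hash(x: int, address_bits: int) -> int:
    """Add address_bits-wide parts of x, discarding any carry out of the address."""
    mask = (1 << address_bits) - 1
    total = 0
    while x > 0:
        total = (total + (x & mask)) & mask   # add one part, ignore the carry
        x >>= address_bits                    # move on to the next part
    return total

def fold_hash_xor(x: int, address_bits: int) -> int:
    """Same partitioning, but the parts are combined with exclusive-OR."""
    mask = (1 << address_bits) - 1
    total = 0
    while x > 0:
        total ^= x & mask
        x >>= address_bits
    return total
```

For the 16-bit key 0x1234 and an 8-bit address, the parts are 0x12 and 0x34; addition gives 0x46 and exclusive-OR gives 0x26.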
4) Length-dependent method:-
In this approach, the length of the variable name is used in conjunction
with some subpart of the name to produce either a table address directly or,
more commonly, an intermediate key. The function that produced the best
results summed the internal binary representation of the first and last
characters and the length of the variable name shifted left four binary places.
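The length-dependent function described above can be written down directly; the sketch below is in Python with hypothetical function names. Applying the division method to the intermediate key to obtain the final address is an assumption, since the notes do not say which function is applied afterwards.

```python
def length_dependent_key(name: str) -> int:
    """First char code + last char code + (length shifted left four binary places)."""
    return ord(name[0]) + ord(name[-1]) + (len(name) << 4)

def length_dependent_hash(name: str, m: int) -> int:
    # ASSUMPTION: the intermediate key is reduced with the division method.
    return length_dependent_key(name) % m + 1
```

For the name ADD in ASCII the intermediate key is 65 + 68 + 48 = 181, and with m = 11 the address is (181 mod 11) + 1 = 6.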
A hashing function is a many-to-one mapping. That is, the name space K is in general
much larger than the address space A onto which K is mapped. Of course, two
records cannot occupy the same location, and therefore some method must be used to
resolve the collisions that can result.
Open Addressing:
To minimize the number of collisions, a hashing function should map the
variable names in a program to the address space as uniformly as possible.
With open addressing, if a variable name x is mapped to a storage location d,
and this location is already occupied, then other locations in the table are scanned
until a free record location is found for the new record. The locations are scanned
according to a sequence which can be defined in many ways. The simplest technique
for handling collisions is to use the following sequence:
d, d+1, ...., m-1, m, 1, 2, ...., d-1
A free record location is always found if at least one is available; otherwise the
search halts after scanning m locations. When a record is looked up, the same sequence
of locations is scanned until that record is located, or until an empty record position is
found. This method of collision resolution is called linear probing.
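Linear probing as described above might be sketched in Python like this. The class name and the toy hash (sum of character codes modulo m) are assumptions made only to keep the example short, and the sketch numbers locations 0..m-1 where the text uses 1..m.

```python
class LinearProbingTable:
    """Open addressing with the probe sequence d, d+1, ..., wrapping around."""

    def __init__(self, m):
        self.m = m
        self.slots = [None] * m            # each slot is None or (name, value)

    def _probe(self, name):
        d = sum(ord(ch) for ch in name) % self.m   # ASSUMPTION: toy hash function
        for step in range(self.m):                 # at most m locations are scanned
            yield (d + step) % self.m

    def insert(self, name, value):
        for i in self._probe(name):
            if self.slots[i] is None or self.slots[i][0] == name:
                self.slots[i] = (name, value)
                return True
        return False                       # table overflow: no free location

    def select(self, name):
        for i in self._probe(name):
            if self.slots[i] is None:      # an empty position ends the search
                return None, False
            if self.slots[i][0] == name:
                return self.slots[i][1], True
        return None, False
```

The names "ab" and "ba" hash to the same location under the toy hash, so the second one is placed in the next free location, which is the situation the probe sequence is designed to handle.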
There are three main difficulties with open addressing:
1) When trying to locate an open location for record insertion, it is in many
instances necessary to examine records that do not have the same initial
hash value.
2) A table-overflow situation cannot be satisfactorily handled using open
addressing. If an overflow occurs, the entire table must be reorganized.
3) It is difficult to physically delete records.

Chaining:
Chaining can be used in a variety of ways to handle overflow records. This method
involves the chaining of colliding records into a special overflow area which is
separate from the prime area. A separate chain is kept for each set of colliding records,
and consequently a pointer field must accompany each record in a primary or an overflow
location.
The figure shows this:
[Figure: table with columns Variable, Value and Link; the Variable column holds ADD, Empty, B, Empty, Empty, and the Link values shown are 1 and 3]

The algorithm performs insertion and deletion by first examining the prime
area location, as determined by the hashing function, and then the overflow area if
necessary. Note that for explicit declarations, the algorithm can be improved by having
insertion performed at the front of the list of unordered overflow records. This... allows
for fast insertion; however, it does not guarantee that duplicate declarations will be
detected.
The disadvantage here is that additional storage is required to store the links,
but the performance and versatility of chaining are superior to those of open addressing. The open
addressing scheme is easier to implement and, because of its efficient utilization of
storage, should be considered when implementing a compiler on a small machine.
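Chaining with a separate overflow area can be sketched in Python as below. The class name, the record layout [name, value, link] and the toy hash are assumptions for illustration. New colliding records are inserted at the front of the chain, so, as noted above, duplicates are not detected.

```python
class ChainedTable:
    """Prime area of m records plus an overflow area linked by index."""

    def __init__(self, m):
        self.m = m
        self.prime = [None] * m      # each record is [name, value, link] or None
        self.overflow = []           # overflow records, same layout

    def _hash(self, name):
        return sum(ord(ch) for ch in name) % self.m   # ASSUMPTION: toy hash

    def insert(self, name, value):
        d = self._hash(name)
        if self.prime[d] is None:
            self.prime[d] = [name, value, None]
            return
        # Collision: put the record at the front of this location's chain.
        # No duplicate check is made (fast insertion).
        self.overflow.append([name, value, self.prime[d][2]])
        self.prime[d][2] = len(self.overflow) - 1

    def select(self, name):
        rec = self.prime[self._hash(name)]   # examine the prime area first
        while rec is not None:
            if rec[0] == name:
                return rec[1], True
            rec = self.overflow[rec[2]] if rec[2] is not None else None
        return None, False
```

Only records that share a prime location are ever compared during a search, which is the main advantage over open addressing; the price is the extra link field in every record.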
Symbol-Table Organization for Block-Structured Languages:-
By a block-structured language, we mean a language in which a module can contain
nested submodules, and each submodule can have its own set of locally declared variables.
A variable declared within a module A is accessible throughout that module unless the same variable name is redefined within a
submodule of A. The redefinition of a variable holds throughout the scope of the submodule.
- Stack symbol tables
- Stack-implemented tree-structured tables
- Stack-implemented hash symbol tables

1. Stack symbol tables
The simplest symbol table organization for a block-structured language is the stack
symbol table. In this organization the records containing the variables' attributes are
stacked as the declarations are encountered. Upon reaching the end of a block, all records for variables declared in the
block are removed, since these variables cannot be re.. outside the block.

The insertion operation is very simple in a stack symbol table. New records are added
at the top location in the stack. Declarations involving duplicate names can exist in block-structured languages, but they cannot occur in the same block.
The lookup operation involves a linear search of the table from the top to the
bottom. The search must be conducted in this order to guarantee that the latest
occurrence of a variable with a particular name is located first. ..... because sets of
symbol table records are discarded as blocks are terminated. The average length of
search for a stack symbol table will be less than for the corresponding unordered symbol
table.
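A stack symbol table of the kind described above could be sketched in Python as follows. The class and method names are hypothetical, as is the separate list that remembers where each open block starts.

```python
class StackSymbolTable:
    """Records are pushed as declared and popped in bulk at block exit."""

    def __init__(self):
        self.records = []         # stack of (name, attributes)
        self.block_starts = []    # index of the first record of each open block

    def enter_block(self):
        self.block_starts.append(len(self.records))

    def exit_block(self):
        # Discard every record declared in the block being closed.
        del self.records[self.block_starts.pop():]

    def insert(self, name, attributes):
        start = self.block_starts[-1]
        if any(n == name for n, _ in self.records[start:]):
            return False          # duplicate declaration in the same block
        self.records.append((name, attributes))
        return True

    def lookup(self, name):
        # Search from the top of the stack so the latest declaration wins.
        for n, attrs in reversed(self.records):
            if n == name:
                return attrs, True
        return None, False
```

A name redeclared in an inner block hides the outer declaration until exit_block removes the inner records, after which the outer one becomes visible again.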
2. Stack-implemented tree-structured tables
In a block-structured language, when the compilation of a block is completed, the block's records
must be removed from the table. As a result, the problem of deleting table records must
be addressed. In a tree, the steps to delete a record are:
- Locate the position of the record in the tree.
- Remove the record from the tree by altering the structural links so as to bypass the
record.
- Rebalance the tree if the deletion of the record has left the tree unbalanced.
It should be observed that the symbol table is maintained as a stack. When a block is
entered during compilation, the value of TOS (top of stack) is updated. As declarations are
encountered, records are inserted at the top of the symbol table. The tree for a particular
block can be balanced as records are inserted.
For the lookup operation, some strategy is needed to locate the latest occurrence of the desired
record. The search must begin at the tree structure for the last block to be entered and
proceed down to the tree for the first block entered.
3. Stack-implemented hash-structured symbol table:
The insertion and deletion operations for stack-implemented hash symbol tables are
essentially the same as for a non-block-structured language, because local variables are
deleted as blocks are compiled in a block-structured language.
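One simplified way to picture a hash-structured table for nested blocks is a stack holding one hash table per open block, searched from the innermost block outward. The Python sketch below takes that approach. It is an assumption made for illustration and not necessarily the organization intended by the notes, which may use a single hash table shared by all blocks. The class name is hypothetical, and a Python dict plays the role of each hash table.

```python
class ScopedHashTable:
    """A stack of per-block hash tables (Python dicts)."""

    def __init__(self):
        self.blocks = [{}]                 # the outermost block is always open

    def enter_block(self):
        self.blocks.append({})

    def exit_block(self):
        self.blocks.pop()                  # all local names vanish together

    def insert(self, name, attributes):
        if name in self.blocks[-1]:
            return False                   # duplicate in the current block
        self.blocks[-1][name] = attributes
        return True

    def lookup(self, name):
        # Start at the last block entered and proceed to the first.
        for table in reversed(self.blocks):
            if name in table:
                return table[name], True
        return None, False
```

Within a single block, insertion and lookup work exactly as in the non-block-structured case; the stack only adds the bookkeeping needed at block entry and exit.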
Back end of a compiler:
- Concerned with generation of target-language code.
- Semantic analysis and code generation must be done.
- For code generation, memory assignment is required.

Run-Time Memory Organization:

Kinds of memory organization:
1. Static memory allocation:
In static storage allocation, it is necessary to be able to decide at compile time
exactly where each data object will reside at run time.
In order to make such a decision:
o The size of each object must be known at compile time.
o Only one occurrence of each object is allowed at a given moment during
program execution.
Restrictions:
o Because of the 1st ...., variable-length strings are not allowed, since their
length cannot be established at compile time.
o Similarly, dynamic arrays are not allowed, since their bounds are not
known at compile time.
o Because of the 2nd ...., nested procedures are not possible in a static storage
allocation scheme. This is the case because it is not known at compile time
which or how many nested procedures, and hence their local variables, will
be .... at execution time.
For example, FORTRAN does not provide variable-length strings, dynamic
arrays, nested procedures, or recursive procedures.
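The idea of static allocation, where every object gets a fixed address at compile time, can be illustrated with a short Python sketch. The function name, the (name, size) input format and the byte sizes in the usage note are assumptions for illustration only.

```python
def assign_static_addresses(declarations, base=0):
    """Give each declared object a fixed address; sizes must be known up front.

    declarations: list of (name, size_in_bytes) pairs, in declaration order.
    Returns the address map and the first free address after the last object.
    """
    addresses = {}
    next_free = base
    for name, size in declarations:
        addresses[name] = next_free    # exactly one occurrence of each object
        next_free += size
    return addresses, next_free
```

For instance, declarations of I (4 bytes), X (8 bytes) and an array A (40 bytes) would be placed at addresses 0, 4 and 12, with 52 bytes used in total. An object whose size is not known at compile time cannot be handled by this scheme, which is why the restrictions above apply.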
