Although I have surveyed a lot these days, there remain some things I cannot explain clearly, and some where I have only inferred my own conclusions from the information I have (especially the GOT and PLT). If I am wrong, please just interrupt me. If you know better, please share with all of us.
Function foo()
    .text: access to var1; call bar()
    .data: var1 (at offset 0)
    .relocation: text 0x0, text 0x4
    .symtab: text 0x4: bar() unknown

Function bar()
    .text: access to var2; call foo()
    .data: var2 (at offset 0)
    .relocation: text 0x0, text 0x4
    .symtab: text 0x4: foo() unknown

After linking
    bar() at 0x8
    var1 at 0x10
    var2 at 0x14
    .text: access var1; call bar(); access var2; call foo()
    .data: var1, var2
    The addresses of var1, bar, var2, and foo are all known.
The link editor examines all the input object files, libraries, etc., and then uses the relocation table and symbol table information to form the final output executable file.
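The merge described above can be sketched as a toy in C. Everything here is hypothetical and far simpler than a real object file: the ToyObject and ToyImage layouts, the field names, and the assumption that each object has exactly one function, one variable, and two relocations (one for its own variable, one for an external call). With 8 bytes of text and 4 bytes of data per object, the layout puts the second function at 0x8 and the two variables at 0x10 and 0x14.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy object file: one function, one variable, two relocations. */
typedef struct {
    const char *func_name;   /* function defined in .text (at offset 0) */
    const char *var_name;    /* variable defined in .data (at offset 0) */
    const char *callee;      /* undefined symbol referenced at text 0x4 */
    uint32_t text_size;      /* bytes of .text */
    uint32_t data_size;      /* bytes of .data */
} ToyObject;

typedef struct {
    const char *name;
    uint32_t addr;
} ToySymbol;

/* Toy output image: two patched slots per object stand in for real code. */
typedef struct {
    ToySymbol syms[8];
    int nsyms;
    uint32_t text[4];        /* [2*i] = variable address, [2*i+1] = callee address */
} ToyImage;

/* Returns the address recorded for name, or UINT32_MAX if it is unresolved. */
static uint32_t toy_lookup(const ToyImage *img, const char *name) {
    for (int i = 0; i < img->nsyms; i++)
        if (strcmp(img->syms[i].name, name) == 0)
            return img->syms[i].addr;
    return UINT32_MAX;
}

/* Pass 1: lay out all .text, then all .data, recording symbol addresses.
   Pass 2: apply the relocations at text 0x0 and text 0x4 of each object. */
static void toy_link(const ToyObject *objs, int n, ToyImage *img) {
    assert(n <= 2);
    uint32_t addr = 0;
    img->nsyms = 0;
    for (int i = 0; i < n; i++) {
        img->syms[img->nsyms++] = (ToySymbol){ objs[i].func_name, addr };
        addr += objs[i].text_size;
    }
    for (int i = 0; i < n; i++) {
        img->syms[img->nsyms++] = (ToySymbol){ objs[i].var_name, addr };
        addr += objs[i].data_size;
    }
    for (int i = 0; i < n; i++) {
        img->text[2 * i]     = toy_lookup(img, objs[i].var_name);
        img->text[2 * i + 1] = toy_lookup(img, objs[i].callee);
    }
}
```

Calling toy_link on the two objects { "foo", "var1", "bar", 8, 4 } and { "bar", "var2", "foo", 8, 4 } resolves every symbol, which is the point of the second pass: no relocation can be applied until all addresses are assigned.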
ELF
Now we know what the linker needs and thus what an object file must contain. Let's take a look at the ELF format! There are 3 main classes of ELF file:
A relocatable file (.o file)
An executable file
A shared object file (/lib ; /usr/lib)
ELF header
Sections
Every section in an object file has exactly one section header describing it. Each section occupies one contiguous (possibly empty) sequence of bytes within a file. Sections in a file may not overlap. No byte in a file resides in more than one section.
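The non-overlap rule can be checked mechanically. The sketch below is only an illustration: SectionExtent is a hypothetical, stripped-down stand-in for a section header that keeps just the file offset and size, and a real checker would also have to skip NOBITS sections, which occupy no bytes in the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal view of a section: its file offset and size only. */
typedef struct {
    uint32_t offset;
    uint32_t size;
} SectionExtent;

/* Returns true when no byte belongs to more than one section.
   Empty sections (size 0) occupy no bytes, so they never overlap. */
static bool sections_disjoint(const SectionExtent *s, size_t n) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (s[i].size == 0 || s[j].size == 0)
                continue;
            /* 64-bit ends avoid wraparound for sections near 4 GiB. */
            uint64_t end_i = (uint64_t)s[i].offset + s[i].size;
            uint64_t end_j = (uint64_t)s[j].offset + s[j].size;
            if (s[i].offset < end_j && s[j].offset < end_i)
                return false;
        }
    }
    return true;
}
```

Two sections that merely touch, such as one ending at 0x44 and the next starting at 0x44, count as disjoint because the byte ranges are half-open.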
Section header
sh_name (integer): an index into the section header string table section (names are stored as an index because every header field has a constant length)
sh_type: SHT_SYMTAB, SHT_DYNSYM, SHT_STRTAB, SHT_REL, SHT_HASH, SHT_DYNAMIC, SHT_NOBITS
sh_offset, sh_size
sh_link, sh_info
PROGBITS: program contents, including code and data.
NOBITS: bss.
SYMTAB and DYNSYM: the SYMTAB table contains all symbols and is intended for the regular linker, while DYNSYM (loaded) holds just the symbols for dynamic linking.
STRTAB: a string table. Unlike a.out files, ELF files can and often do contain separate string tables for separate purposes.
REL: relocation information, described later.
DYNAMIC and HASH: dynamic linking information and the runtime symbol hash table (loaded).
Typical sections
.text: instructions. type = PROGBITS, attr (sh_flags) = ALLOC + EXECINSTR
.data: static/global data. type = PROGBITS, attr = ALLOC + WRITE
.bss: static/global data without an initialized value. type = NOBITS, attr = ALLOC + WRITE
.strtab/.dynstr: tables of strings that contain symbol/section names; the latter is loaded at run time for the dynamic linker.
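As a rough illustration of how the type and attribute values combine, the sketch below encodes the three typical sections using the numeric constants from the ELF specification. The TypicalSection struct and the flags_to_string helper are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric values as defined in the ELF specification. */
#define SHT_PROGBITS  1u
#define SHT_NOBITS    8u
#define SHF_WRITE     0x1u
#define SHF_ALLOC     0x2u
#define SHF_EXECINSTR 0x4u

/* Hypothetical record pairing a section name with its type and flags. */
typedef struct {
    const char *name;
    uint32_t type;
    uint32_t flags;
} TypicalSection;

static const TypicalSection typical_sections[] = {
    { ".text", SHT_PROGBITS, SHF_ALLOC | SHF_EXECINSTR },
    { ".data", SHT_PROGBITS, SHF_ALLOC | SHF_WRITE },
    { ".bss",  SHT_NOBITS,   SHF_ALLOC | SHF_WRITE },
};

/* Writes a short flag string (W = write, A = alloc, X = execute) into out,
   which must hold at least 4 bytes. */
static void flags_to_string(uint32_t flags, char out[4]) {
    size_t k = 0;
    if (flags & SHF_WRITE)     out[k++] = 'W';
    if (flags & SHF_ALLOC)     out[k++] = 'A';
    if (flags & SHF_EXECINSTR) out[k++] = 'X';
    out[k] = '\0';
}
```

The flags are independent bits, so a section's attributes are simply the bitwise OR of the properties it needs.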
ELF differs from a.out in that it allows multiple string tables to reside in a single file, e.g., for section names, or dynstr, strtab ...
ndx  0   1   2   3   4   5   6   7   8   9
0    \0  H   E   L   L   O   \0  H   O   W
1    \0  A   R   E   \0  Y   O   U   \0
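A name is looked up in a string table simply by its byte index. The sketch below assumes the table above is one run of consecutive bytes (the second row continuing at index 10), which is an interpretation of the figure rather than something the slide states. The strtab_get helper is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example table holding "HELLO", "HOW", "ARE", "YOU". Byte 0 is the leading
   NUL, so index 0 always names the empty string. The compiler appends the
   final NUL, giving 19 bytes in total. */
static const char example_strtab[] = "\0HELLO\0HOW\0ARE\0YOU";

/* Returns the NUL-terminated string starting at byte index, or NULL when the
   index is out of range or the string is not terminated inside the table. */
static const char *strtab_get(const char *tab, size_t size, size_t index) {
    if (index >= size)
        return NULL;
    if (memchr(tab + index, '\0', size - index) == NULL)
        return NULL;
    return tab + index;
}
```

With this table, index 1 yields "HELLO" and index 7 yields "HOW". An index may also point into the middle of a string: index 3 yields "LLO", which is how several names can share storage.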
Symbol table
A symbol entry looks like this:
typedef struct {
    Elf32_Word    st_name;
    Elf32_Addr    st_value;
    Elf32_Word    st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half    st_shndx;
} Elf32_Sym;
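To make the fields concrete, here is a small self-contained sketch. The typedefs stand in for the ones normally supplied by the system's ELF header, the ELF32_ST_* macros and the STB_GLOBAL, STT_FUNC, and SHN_UNDEF values follow the ELF specification, and sym_is_undefined is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the types a system ELF header would normally provide. */
typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;
typedef uint16_t Elf32_Half;

typedef struct {
    Elf32_Word    st_name;   /* index into the symbol string table */
    Elf32_Addr    st_value;  /* value, e.g. an address or section offset */
    Elf32_Word    st_size;   /* size of the object the symbol names */
    unsigned char st_info;   /* binding (high 4 bits) and type (low 4 bits) */
    unsigned char st_other;
    Elf32_Half    st_shndx;  /* index of the section defining the symbol */
} Elf32_Sym;

/* Packing and unpacking of st_info, as given in the ELF specification. */
#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

#define STB_GLOBAL 1
#define STT_FUNC   2
#define SHN_UNDEF  0

/* A symbol with no defining section is a reference the linker must resolve,
   like the "bar() unknown" entry in the earlier example. */
static int sym_is_undefined(const Elf32_Sym *s) {
    return s->st_shndx == SHN_UNDEF;
}
```

An undefined global function reference would then be built as { name_index, 0, 0, ELF32_ST_INFO(STB_GLOBAL, STT_FUNC), 0, SHN_UNDEF }.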
Relocation
When two or more object files and libraries merge into an executable file or a bigger object file (a relocatable file, in ELF's terms), the link editor must connect symbolic references to their definitions, as in the earlier link editor slide. The relocation table holds the information about how to modify a relocatable file (object file) itself.
Relocation (cont.)
typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
} Elf32_Rel;
r_offset: gives the location at which the relocation applies (a section offset in a relocatable file, a virtual address in an executable or shared object).
r_info: the high 24 bits give the symbol table index. The low 8 bits give the type of relocation (later).
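The packing of r_info can be shown with the macros from the ELF specification, in which the relocation type is the low 8 bits and the symbol table index is everything above them. The typedefs are stand-ins for a system ELF header, and make_rel is a hypothetical helper used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the types a system ELF header would normally provide. */
typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
} Elf32_Rel;

/* Packing and unpacking of r_info, as given in the ELF specification. */
#define ELF32_R_SYM(i)     ((i) >> 8)
#define ELF32_R_TYPE(i)    ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

#define R_386_GLOB_DAT 6   /* i386 relocation type used for GOT data entries */

/* Hypothetical helper: builds a relocation entry from its three parts. */
static Elf32_Rel make_rel(Elf32_Addr offset, Elf32_Word sym_index, unsigned type) {
    Elf32_Rel r = { offset, ELF32_R_INFO(sym_index, type) };
    return r;
}
```

For example, make_rel(0x4, 5, R_386_GLOB_DAT) yields an r_info of 0x506: symbol index 5 in the upper bits and type 6 in the low byte.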
ELF executables
Recall that there are 3 kinds of ELF files: the relocatable file, the executable, and the shared object file. Only the latter two are considered executables.
ELF inherently has a dual view (the section view and the segment view). An ELF executable may have only a program header table, while a relocatable file has only a section header table. A shared object file, however, may contain both.
p_type:
    PT_LOAD: loadable segment (text, data)
    PT_DYNAMIC: just like the .dynamic section
    PT_INTERP: just like the .interp section
The remaining fields are just the same as those we discussed for the section header.
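To make p_type concrete, the sketch below scans a program header table for the first segment of a given type, for example to locate PT_DYNAMIC. The Elf32_Phdr field order and the PT_* values follow the ELF specification; find_segment is a hypothetical helper, and the sample values in the usage note are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Segment types, with the values from the ELF specification. */
#define PT_LOAD    1u
#define PT_DYNAMIC 2u
#define PT_INTERP  3u

/* 32-bit program header entry, fields in ELF specification order. */
typedef struct {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
} Elf32_Phdr;

/* Returns the first program header of the requested type, or NULL. */
static const Elf32_Phdr *find_segment(const Elf32_Phdr *ph, size_t n, uint32_t type) {
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == type)
            return &ph[i];
    return NULL;
}
```

Given a table with PT_INTERP, PT_LOAD, and PT_DYNAMIC entries, find_segment(ph, 3, PT_DYNAMIC) returns the third entry, whose p_vaddr tells a loader where the dynamic information will live in memory.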
PIC can thus be loaded anywhere with no (or little) relocation overhead. (If, however, the shared library is compiled as position-dependent code, the run-time linker overhead increases a lot.) gcc has options to generate PIC: -fpic, -mpic-register=reg, etc.
Dynamic linking
Adding the segments of the program and of the library files to the process image.
Performing relocations for both of them.
Transferring control to the program.
Unix does not relocate an executable file at load time, but it does relocate shared object files. This is needed when some external variables are unknown until run time, or when PIC code (a shared object, i.e., a library) needs its GOT and PLT initialized (later).
The link editor provides lots of info: .dynamic, .hash, .got and .plt, all loadable.
Dynamic shared libraries are usually compiled as PIC, but whenever they call an external library routine, and for every reference to a static, global, or external variable, they should use the PLT and GOT so that position independence (i.e., the code loads anywhere) holds.
How the GOT and PLT and dynamic linker make the code position independent
In PIC, if one wants to access global data, the code may be compiled in the following way:
static int a;  /* static variable */
extern int b;  /* global variable */
a = 1;
b = 2;

movl $1,a@GOTOFF(%ebx)   ;; R_386_GOTOFF reference to variable "a"
movl b@GOT(%ebx),%eax    ;; R_386_GOT32 ref to address of variable "b"
movl $2,(%eax)
where %ebx holds the address of the GOT. Before the code sequence above, the compiler usually emits code like the following to get the address of the GOT:
    call __i686.get_pc_thunk.bx
    add  $offset,%ebx
where __i686.get_pc_thunk.bx is:
    mov  (%esp),%ebx
    ret
Global and static variables are now read or written by first loading the address via a fixed offset from %ebx. What if some external symbols are still unknown? The program linker will create dynamic relocations for each entry in the GOT, telling the dynamic linker how to initialize the entry. These relocations are of type GLOB_DAT.
Before the dynamic linker gives control to the loaded program, it initializes the GOT cells, some of which may have relocations of type R_386_GLOB_DAT. It then determines the associated symbols and computes their absolute addresses. (The link editor might not be able to do this for lack of information.) Since every loadable program (shared object, executable) has its own GOT, and since a global variable access requires an entry for the variable in the GOT, a symbol may appear in n GOTs.
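The GOT initialization step can be imitated in plain C. This is only a simulation under stated assumptions: LoadedSymbol and GlobDatReloc are hypothetical, simplified stand-ins (a real relocation carries an offset and a symbol table index, not a name), and symbol lookup is a linear search rather than the hash table a real dynamic linker would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a symbol exported by an already loaded object. */
typedef struct {
    const char *name;   /* symbol name */
    uintptr_t   addr;   /* absolute address, known only after loading */
} LoadedSymbol;

/* Hypothetical stand-in for one GLOB_DAT-style relocation. */
typedef struct {
    size_t      got_index;  /* which GOT cell to fill */
    const char *sym_name;   /* symbol whose address belongs in that cell */
} GlobDatReloc;

/* Fills each GOT cell named by a relocation with the symbol's absolute
   address. Returns the number of relocations that could not be resolved. */
static size_t init_got(uintptr_t *got, size_t got_len,
                       const GlobDatReloc *rels, size_t nrels,
                       const LoadedSymbol *syms, size_t nsyms) {
    size_t unresolved = 0;
    for (size_t i = 0; i < nrels; i++) {
        assert(rels[i].got_index < got_len);
        int found = 0;
        for (size_t j = 0; j < nsyms; j++) {
            if (strcmp(syms[j].name, rels[i].sym_name) == 0) {
                got[rels[i].got_index] = syms[j].addr;
                found = 1;
                break;
            }
        }
        if (!found)
            unresolved++;
    }
    return unresolved;
}
```

Because each loaded object would run this over its own GOT, the same symbol address can end up stored in several GOTs, one per object that references it.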
Find all libs.
After the program is loaded, the dynamic linker uses the program header to find the .dynamic section, in which the link editor has recorded the needed libraries. The dynamic linker then uses environment variables such as LD_LIBRARY_PATH, or the .dynamic section, to find all the libraries and map them in.
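The directory search can be sketched as follows. This only shows how a colon-separated list such as the value of LD_LIBRARY_PATH is turned into candidate file names. The candidate_path helper is hypothetical, and a real dynamic linker also consults other sources (paths recorded in .dynamic, default directories) and handles empty entries, none of which is modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes "<dir>/<lib>" for the index-th directory of a colon-separated
   search path into out. Returns 1 on success, 0 if there is no such entry
   or the result does not fit in out_size bytes. */
static int candidate_path(const char *search_path, size_t index,
                          const char *lib, char *out, size_t out_size) {
    const char *p = search_path;
    for (size_t i = 0; ; i++) {
        size_t len = strcspn(p, ":");   /* length of the current entry */
        if (i == index) {
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, p, lib);
            return n >= 0 && (size_t)n < out_size;
        }
        if (p[len] == '\0')
            return 0;                   /* ran out of entries */
        p += len + 1;                   /* skip the entry and its ':' */
    }
}
```

A caller would try index 0, 1, 2, ... in order and map in the first candidate that exists. With the search path "/opt/lib:/usr/lib" and the library name "libfoo.so", the candidates are "/opt/lib/libfoo.so" and then "/usr/lib/libfoo.so".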
If PIC code calls an external routine, name1, the compiler redirects the call to an entry in the PLT. The entry looks like this:
// ebx is addr of GOT
PLT0:  pushl 4(%ebx)
       jmp   *8(%ebx)
       nop
       nop
PLT1:  jmp   *name1@GOT(%ebx)
       pushl $offset
       jmp   .PLT0@PC
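On the first call, the GOT entry for name1 still points back to the pushl instruction in PLT1, so the indirect jmp simply falls through. The entry pushes $offset and jumps to PLT0, which pushes a second word and forwards to the symbol resolution routine in the dynamic linker. The dynamic linker then pops the two elements off the stack and uses them to find out which object is being dynamically linked and that the desired routine is name1. It then resolves the address of name1, stores it back into the GOT entry, and jumps to name1. Subsequent calls to name1 have only one extra jump of overhead.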
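The lazy binding sequence can be imitated in portable C with a function pointer playing the role of the GOT cell. This is an analogy, not how the PLT is really implemented: name1_impl, name1_resolver_stub, and call_name1 are hypothetical names, and the "resolver" here already knows the target instead of looking it up by relocation offset.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* Hypothetical target routine, standing in for name1 in a shared library. */
static int name1_impl(int x) { return x + 1; }

static int resolve_count = 0;           /* how many times the resolver ran */
static int name1_resolver_stub(int x);  /* plays the role of PLT1 -> PLT0 */

/* The "GOT cell" for name1. Before the first call it points at the stub,
   just as the real GOT entry initially points back into the PLT. */
static func_t got_name1 = name1_resolver_stub;

/* Plays the role of the dynamic linker's symbol resolution routine. */
static int name1_resolver_stub(int x) {
    resolve_count++;
    got_name1 = name1_impl;   /* store the resolved address into the GOT */
    return got_name1(x);      /* then continue into the real routine */
}

/* Plays the role of "jmp *name1@GOT(%ebx)": always go through the cell. */
static int call_name1(int x) {
    return got_name1(x);
}
```

The first call_name1 goes through the stub and patches the cell. Every later call reaches name1_impl with a single indirect jump, and resolve_count stays at 1.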
tools