
Executable and Linkable Format

Linker, Loader, and the ELF


Although I have surveyed this topic a fair amount lately, there remain things I cannot explain clearly, and some conclusions I have only inferred from the information I have (especially about the GOT and PLT). If I am wrong, please interrupt me. If you know better, please share with all of us.

What is an executable file?


A bunch of CPU instructions? If so, how do we know that it is not encoded text, a video, or an image? So the file must carry some headers: the header must let us determine all of the file's properties without depending on anything else.

The Object file


The object file differs from the executable file in that it is not complete (in terms of symbol resolution, address relocation, etc.). It is the (dynamic/static) linker's responsibility to make it work! How and when?

Link editor (linker: ld)


When source code is compiled into object code, information about some symbols may be missing, e.g., library function calls or imported (external) variables may not be known at this time. Access to static/global variables within the object file itself may also cause problems, since their final addresses are not yet fixed.

Link editor (cont.)


All of this will be known eventually, so the compiler just collects the information about unresolved symbols and about addresses that need to be relocated, stores it in the object file, and passes it on to whoever takes over the job: the (static) link editor. This information lives in the relocation table and the symbol table.

Function foo()
  .text         access to var1, call bar()
  .data         var1 (at offset 0)
  .relocation   text 0x0, text 0x4
  .symtab       text 0x4: bar() unknown

Function bar()
  .text         access to var2, call foo()
  .data         var2 (at offset 0)
  .relocation   text 0x0, text 0x4
  .symtab       text 0x4: foo() unknown

Link editor (cont.)


foo() at 0x0
bar() at 0x8
var1 at 0x10
var2 at 0x14

  .text   access var1, call bar(), access var2, call foo()
  .data   var1, var2

Address of var1 is known
Address of bar is known
Address of var2 is known
Address of foo is known
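
The layout above can be mimicked with a toy link step. This is only a sketch under simplifying assumptions: every name in it (ToyObj, ToySym, ToyReloc, toy_link) is made up for illustration, each relocation site is a 32-bit slot that receives the absolute address of its target, and all .text sections are placed first, followed by all .data sections. A real link editor handles many more cases.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SEC_TEXT, SEC_DATA };

typedef struct { const char *name; int section; uint32_t offset; } ToySym;
typedef struct { uint32_t text_offset; const char *target; } ToyReloc;

typedef struct {
    uint32_t text_size, data_size;
    ToySym   syms[4];   int nsyms;    /* symbols this object defines */
    ToyReloc relocs[4]; int nrelocs;  /* places in .text that need an address */
} ToyObj;

#define TOY_MAX_OBJS 8

/* Computes the final address of `name` from each object's section bases. */
static int toy_resolve(const ToyObj *objs, int nobjs,
                       const uint32_t *text_base, const uint32_t *data_base,
                       const char *name, uint32_t *addr)
{
    for (int i = 0; i < nobjs; i++)
        for (int j = 0; j < objs[i].nsyms; j++)
            if (strcmp(objs[i].syms[j].name, name) == 0) {
                uint32_t base = objs[i].syms[j].section == SEC_TEXT
                              ? text_base[i] : data_base[i];
                *addr = base + objs[i].syms[j].offset;
                return 0;
            }
    return -1;  /* undefined symbol */
}

/* Places every .text, then every .data, and patches each relocation site in
 * `image` with the resolved 32-bit address. Returns 0 on success, -1 on error. */
int toy_link(const ToyObj *objs, int nobjs, uint8_t *image, size_t image_size)
{
    uint32_t text_base[TOY_MAX_OBJS], data_base[TOY_MAX_OBJS], next = 0;
    if (nobjs > TOY_MAX_OBJS) return -1;

    for (int i = 0; i < nobjs; i++) { text_base[i] = next; next += objs[i].text_size; }
    for (int i = 0; i < nobjs; i++) { data_base[i] = next; next += objs[i].data_size; }
    if (next > image_size) return -1;

    for (int i = 0; i < nobjs; i++)
        for (int j = 0; j < objs[i].nrelocs; j++) {
            uint32_t addr, site = text_base[i] + objs[i].relocs[j].text_offset;
            if (site + sizeof addr > image_size) return -1;
            if (toy_resolve(objs, nobjs, text_base, data_base,
                            objs[i].relocs[j].target, &addr) != 0)
                return -1;
            memcpy(image + site, &addr, sizeof addr);
        }
    return 0;
}
```

With two objects of 8 bytes of .text and 4 bytes of .data each, describing foo/var1 and bar/var2 as on the slides, the four patched slots should receive the addresses of var1, bar, var2, and foo in that order.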

Link editor (cont.)

The link editor examines all the input object files, libraries, etc., and then uses the relocation table and symbol table information to form the final output executable file.


What goes into an object file?


Header information
Object code
Relocation
Symbols
Debugging information

ELF
Now we know what the linker needs and thus what an object file must contain. Let's take a look at the ELF format! There are 3 main classes of ELF file:

A relocatable file (.o file)
An executable file
A shared object file (/lib, /usr/lib)

ELF file view

Object file view

Executable file view

ELF header

ELF header (cont.)


e_ident : mainly contains the magic number.
e_type : the file type, with values such as ET_REL and ET_EXEC.
e_entry : the entry point, where the system first transfers control; may be zero.
e_phoff : program header table offset; may be zero.
e_shoff : section header table offset; may be zero.
e_phentsize : size of one entry in the program header table.
e_phnum : number of entries in the program header table.

ELF header (cont.)


e_shentsize : size of one entry in the section header table.
e_shnum : number of entries in the section header table.
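
As a rough sketch of how these fields might be read, the following C fragment checks the magic number in e_ident and copies a 32-bit header out of a buffer. The names Ehdr32 and read_ehdr32 are made up for this example (a Linux system would normally provide Elf32_Ehdr in <elf.h>), and the code assumes the file uses the same byte order as the host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Field order and widths follow the 32-bit ELF header layout. */
typedef struct {
    unsigned char e_ident[16];  /* magic number and other identification */
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
} Ehdr32;

/* Returns 0 and fills *out if buf starts with a plausible 32-bit ELF header,
 * -1 otherwise. Assumption: file byte order equals host byte order. */
int read_ehdr32(const unsigned char *buf, size_t len, Ehdr32 *out)
{
    if (len < sizeof *out)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;              /* not the ELF magic number */
    if (buf[4] != 1)
        return -1;              /* EI_CLASS: 1 means a 32-bit file */
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

After a successful call, fields such as out->e_type, out->e_phoff, and out->e_shnum can be inspected directly.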

Sections
Every section in an object file has exactly one section header describing it. Each section occupies one contiguous (possibly empty) sequence of bytes within a file. Sections in a file may not overlap. No byte in a file resides in more than one section.

Section header

Section header (cont.)

sh_name : an integer index into the section header string table section (names are kept there because the header fields have a fixed length).
sh_type : SHT_SYMTAB, SHT_DYNSYM, SHT_STRTAB, SHT_REL, SHT_HASH, SHT_DYNAMIC, SHT_NOBITS, ...
sh_offset, sh_size
sh_link, sh_info

More on section types.

PROGBITS: Program contents, including code and data.
NOBITS: bss.
SYMTAB and DYNSYM: The SYMTAB table contains all symbols and is intended for the regular linker, while DYNSYM (loaded) holds just the symbols for dynamic linking.
STRTAB: A string table. Unlike a.out files, ELF files can and often do contain separate string tables for separate purposes.
REL: Relocation information, described later.
DYNAMIC and HASH: Dynamic linking information and the runtime symbol hash table (loaded).

Typical sections
.text : instructions. type = PROGBITS, attr (sh_flags) = ALLOC + EXECINSTR
.data : static/global data. type = PROGBITS, attr = ALLOC + WRITE
.bss : static/global data without an initial value. type = NOBITS, attr = ALLOC + WRITE
.strtab/.dynstr : string tables that contain symbol/section names; the latter is loaded at run time for the dynamic linker.

Typical sections (cont.)


.data : initialized static/global data that contributes to the program's memory image. The attr is SHF_WRITE + SHF_ALLOC.
.interp : contains the path of the program interpreter, usually the dynamic linker.
.rel.text/.rel.data : the relocation tables for text and data, respectively.

Typical sections (cont.)


.symtab : the symbol table, basically for the (static) link editor.
.dynamic : holds dynamic linking information and is loaded at run time, e.g., pointers to dynstr, dynsym, hash, and the relocation entries.
.dynsym : SHF_ALLOC; holds the symbols that are resolved at run time (GOT, PLT).

Typical sections (cont.)


.hash : probably just an aid for dynsym that makes the dynamic linker's lookups faster.
.got : global offset table, used by PIC to access static/global variables (described later).
.plt : procedure linkage table, analogous to the GOT.
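
The attribute combinations listed for the typical sections are bit flags in sh_flags. A small sketch in C, using the flag values from the ELF specification and a made-up helper name (shf_to_string), shows how such a field could be decoded into the short letters that tools like readelf print:

```c
#include <stdint.h>

/* Flag values as given in the ELF specification. */
#define SHF_WRITE     0x1u
#define SHF_ALLOC     0x2u
#define SHF_EXECINSTR 0x4u

/* Hypothetical helper: writes up to three letters (W, A, X) plus a
 * terminating NUL into out, which must hold at least 4 bytes. */
void shf_to_string(uint32_t sh_flags, char out[4])
{
    int n = 0;
    if (sh_flags & SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}
```

For .text (ALLOC + EXECINSTR) this yields "AX", and for .data or .bss (ALLOC + WRITE) it yields "WA".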

The string table section

ELF differs from a.out in that it allows multiple string tables to reside in a single file, e.g., for section names, .dynstr, .strtab, ...

index   +0  +1  +2  +3  +4  +5  +6  +7  +8  +9
0       \0  H   E   L   L   O   \0  H   O   W
10      \0  A   R   E   \0  Y   O   U   \0
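
Looking a name up in such a table is just a matter of pointing at the given index and reading up to the next NUL byte. A minimal sketch in C, where strtab_get is a made-up helper name:

```c
#include <stddef.h>

/* Returns the NUL-terminated name that starts at `index`, or NULL if the
 * index is out of range or the name is not terminated inside the table. */
const char *strtab_get(const char *tab, size_t size, size_t index)
{
    if (index >= size)
        return NULL;
    for (size_t i = index; i < size; i++)
        if (tab[i] == '\0')
            return tab + index;
    return NULL;
}
```

With the table above, index 1 gives "HELLO", index 7 gives "HOW", and index 3 gives "LLO", since a name may start in the middle of another string.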

Symbol table
A symbol entry looks like this:

typedef struct {
    Elf32_Word    st_name;
    Elf32_Addr    st_value;
    Elf32_Word    st_size;
    unsigned char st_info;
    unsigned char st_other;
    Elf32_Half    st_shndx;
} Elf32_Sym;

Symbol table (cont.)


st_name : an integer index into a string table.
st_value : holds the location where the symbol can be found in the program.
st_size : the size (in bytes) of the symbol.
st_info : the low 4 bits are the type of the symbol, and the high 4 bits are the binding.
st_shndx : the index of the section in which the symbol is defined.
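
The packing of st_info can be illustrated with the helper macros and constants the ELF specification defines for it. The function is_global_function is a made-up example built on top of them:

```c
/* Macros and constant values as given in the ELF specification. */
#define ELF32_ST_BIND(i)    ((i) >> 4)
#define ELF32_ST_TYPE(i)    ((i) & 0xf)
#define ELF32_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2, STT_SECTION = 3 };

/* Hypothetical helper: 1 if st_info describes a global function symbol. */
int is_global_function(unsigned char st_info)
{
    return ELF32_ST_BIND(st_info) == STB_GLOBAL
        && ELF32_ST_TYPE(st_info) == STT_FUNC;
}
```

For example, a global function has st_info equal to 0x12: binding 1 in the high 4 bits and type 2 in the low 4 bits.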

Relocation
When two or more object files and libraries are merged into an executable file or a bigger object file (a relocatable file, in ELF terms), the link editor must connect symbolic references to their definitions (see the earlier "Link editor" slides). The relocation table is the container of information about how to modify a relocatable file (object file) itself.

Relocation (cont.)

typedef struct {
    Elf32_Addr r_offset;
    Elf32_Word r_info;
} Elf32_Rel;

r_offset : gives the location in the file at which the relocation applies.
r_info : the high 24 bits give the symbol table index. The low 8 bits are the type of the relocation (described later).
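
To make the two fields concrete, here is a sketch in C that unpacks r_info with the macros from the ELF specification and applies two common i386 relocation types, R_386_32 (S + A) and R_386_PC32 (S + A - P). The function apply_rel and its parameters are assumptions made for this example: r_offset is treated as an offset into one section, `base` stands for the address at which that section ends up, and the addend is read from the relocation site itself, as Elf32_Rel entries imply.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Macros and relocation type numbers as given in the ELF/i386 specification. */
#define ELF32_R_SYM(i)     ((i) >> 8)
#define ELF32_R_TYPE(i)    ((unsigned char)(i))
#define ELF32_R_INFO(s, t) (((s) << 8) + (unsigned char)(t))

enum { R_386_32 = 1, R_386_PC32 = 2 };

/* Hypothetical helper: patches the 32-bit word at r_offset inside `section`.
 * sym_value is the already-resolved address of the referenced symbol.
 * Returns 0 on success, -1 for a bad offset or an unsupported type. */
int apply_rel(uint8_t *section, size_t size, uint32_t base,
              uint32_t r_offset, uint32_t r_info, uint32_t sym_value)
{
    uint32_t addend, value;

    if (r_offset > size || size - r_offset < sizeof addend)
        return -1;
    memcpy(&addend, section + r_offset, sizeof addend);  /* implicit addend */

    switch (ELF32_R_TYPE(r_info)) {
    case R_386_32:   /* absolute address: S + A */
        value = sym_value + addend;
        break;
    case R_386_PC32: /* PC-relative: S + A - P */
        value = sym_value + addend - (base + r_offset);
        break;
    default:
        return -1;
    }
    memcpy(section + r_offset, &value, sizeof value);
    return 0;
}
```

The symbol index from ELF32_R_SYM(r_info) would be used beforehand to find the symbol table entry whose resolved value is passed in as sym_value.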

Program loading & dynamic linking

What do we get so far?


Now I have a rough idea of how the link editor handles the fixups (in a static sense). What's next? How can I load a program or link to a library?

ELF executables
Recall that there are 3 kinds of ELF files: the relocatable file, the executable, and the shared object file. Only the latter 2 are considered executables.

Dual view of ELF files

ELF intrinsically has a dual view (the section view and the segment view). ELF executables may have only a program header table, while relocatable files have only a section header table. A shared object file, however, may contain both.

Relationship between the 2 views


One segment may map to multiple sections, e.g., a segment of type PT_LOAD can contain .data and .text.

ELF program header


Each program header is an entry in the program header table. Each program header describes a segment.

typedef struct {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
} Elf32_Phdr;

Program header fields

p_type :
  PT_LOAD : loadable segment (text, data)
  PT_DYNAMIC : just like the .dynamic section
  PT_INTERP : just like the .interp section

The remaining fields are much the same as those we discussed for the section header.
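
As a sketch of how the program header table might be used, the following C fragment scans an array of program headers for a given p_type and checks whether a file-backed section falls inside a segment. Phdr32, find_segment, and segment_contains are names made up for this example; the struct mirrors Elf32_Phdr with fixed-width integers, and the containment check deliberately ignores sections that occupy no file space, such as .bss.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as Elf32_Phdr, with every field 32 bits wide. */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

/* Segment type numbers as given in the ELF specification. */
enum { PT_NULL = 0, PT_LOAD = 1, PT_DYNAMIC = 2, PT_INTERP = 3 };

/* Returns the first program header of the given type, or NULL if none. */
const Phdr32 *find_segment(const Phdr32 *ph, size_t n, uint32_t type)
{
    for (size_t i = 0; i < n; i++)
        if (ph[i].p_type == type)
            return &ph[i];
    return NULL;
}

/* Returns 1 if the file range [sh_offset, sh_offset + sh_size) lies inside
 * the file range covered by the segment, 0 otherwise. */
int segment_contains(const Phdr32 *seg, uint32_t sh_offset, uint32_t sh_size)
{
    uint64_t sec_end = (uint64_t)sh_offset + sh_size;
    uint64_t seg_end = (uint64_t)seg->p_offset + seg->p_filesz;
    return sh_offset >= seg->p_offset && sec_end <= seg_end;
}
```

A loader would typically call find_segment with PT_INTERP to locate the interpreter path, and a tool that prints the section-to-segment mapping could call segment_contains for each pair.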

Position independent code


Typically the OS loads a program at the same virtual address every time, so no load-time relocation is required. But consider a shared library, which may not be loaded at the same address in every process's address space, so it requires load-time relocation, which is an overhead.

Position independent code (cont.)


One solution is to separate code from data and make the code independent of where it is loaded. Accesses to local variables on the stack are usually position independent already, and jumps and branches are generally PC-relative or relative to some base register. The only problem is access to global data or procedures, which we can solve with the GOT and PLT (described later).

Position independent code (cont.)

PIC can thus be loaded anywhere with no (or little) relocation overhead. (If, however, a shared library is compiled as position-dependent code, it greatly increases the run-time linker overhead.) gcc has options to generate PIC: -fpic, -mpic-register=reg, etc.

Dynamic linking

Program interpreter (dynamic linker)


Recall that we have the .interp section, or the PT_INTERP segment, which records the path of a dynamic linker (ld.so). When executing a program, exec() and the dynamic linker do the following:

Add the segments of the program and its library files to the process image.
Perform relocations for both of them.
Transfer control to the program.

Program interpreter (cont.)

Why is the load time relocation necessary?


Usually Unix does not load-time relocate an executable file, but it does relocate shared object files. Some external variables may be unknown until run time, and PIC code (shared objects, i.e., libraries) needs its GOT and PLT initialized (described later).

How does the dynamic linker do this?


The link editor provides lots of information: .dynamic, .hash, .got, and .plt, all loadable.

Static linked lib


Multiple object files. A directory header lists all the symbols inside the library. When linking statically, the linker finds the library file, searches for the desired symbol definitions, and copies the code and data into the output program file. This may need a lot of relocation (I guess). Fast at run time, but the file is fat and it is hard to update the library version (a relink is required).

Dynamic linked lib

Dynamic shared libraries are usually compiled as PIC. If they call other external library routines, or reference static/global/external variables, they should use the PLT and GOT so that position independence (i.e., loads anywhere) holds.

How the GOT and PLT and dynamic linker make the code position independent

Ref. to static global/external vars


The GOT and PLT do not reside in .o files; instead they are in executables or shared objects. They are tables that are loaded at run time. The GOT is a table in which each entry may contain a pointer to a global symbol. Hence access via the GOT (or PLT) has the extra overhead of one indirection.

GOT/PLT is private to each file.

GOT makes the global data access PIC!

In PIC, if one wants to access global data, the code may be compiled in the following way:

static int a;   /* static variable */
extern int b;   /* global variable */

a = 1;
b = 2;

movl $1, a@GOTOFF(%ebx)     # R_386_GOTOFF reference to variable "a"
movl b@GOT(%ebx), %eax      # R_386_GOT32 reference to the address of variable "b"
movl $2, (%eax)

where %ebx holds the address of the GOT. Preceding the code sequence above, the compiler usually emits code like the following to get the address of the GOT:

call __i686.get_pc_thunk.bx
add  $offset, %ebx

where __i686.get_pc_thunk.bx is:

mov (%esp), %ebx
ret

Global and static variables are now read or written via a fixed offset from %ebx: static variables directly, global variables by first loading their address from the GOT. What if some external symbols are still unknown? The program linker creates dynamic relocations for each entry in the GOT, telling the dynamic linker how to initialize the entry. These relocations are of type R_386_GLOB_DAT.

Before the dynamic linker gives control to the loaded program, it initializes the GOT cells, some of which may have relocations of type R_386_GLOB_DAT. It determines the associated symbols and computes their absolute addresses. (The link editor may not be able to do this for lack of information.) Since every loadable object (shared object, executable) has its own GOT, and since a global variable access requires an entry for that variable in the GOT, a symbol may appear in n GOTs.
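
The GOT initialization step can be modelled in a few lines of C. This is only a toy under stated assumptions: ToyExport stands in for a symbol some loaded object defines, ToyGlobDat stands in for an R_386_GLOB_DAT relocation (reduced to a GOT index and a symbol name), and toy_init_got is a made-up function, not the dynamic linker's real interface.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; void *addr; } ToyExport;         /* a defined global symbol */
typedef struct { size_t got_index; const char *name; } ToyGlobDat;  /* toy GLOB_DAT relocation */

/* Fills each referenced GOT slot with the address of the named symbol.
 * Returns the number of relocations that could not be resolved. */
int toy_init_got(void **got, size_t got_len,
                 const ToyGlobDat *rels, size_t nrels,
                 const ToyExport *exports, size_t nexports)
{
    int unresolved = 0;

    for (size_t i = 0; i < nrels; i++) {
        void *addr = NULL;
        for (size_t j = 0; j < nexports; j++)
            if (strcmp(exports[j].name, rels[i].name) == 0) {
                addr = exports[j].addr;
                break;
            }
        if (addr == NULL || rels[i].got_index >= got_len) {
            unresolved++;
            continue;
        }
        got[rels[i].got_index] = addr;  /* the slot now points at the variable */
    }
    return unresolved;
}
```

Once a slot is filled, code that wants to write the variable loads the pointer from the GOT and stores through it, which is the extra indirection mentioned earlier.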

PLT and dynamic linking.

Find all libs.
After the program is loaded, the dynamic linker uses the program header to find the .dynamic section, in which the linker recorded the needed libraries. The dynamic linker then uses environment variables such as LD_LIBRARY_PATH, or the .dynamic section, to find all the libraries and map them in.

PLT and dynamic linker

If PIC code calls an external routine, name1, the compiler redirects the call to an entry in the PLT. This entry looks like this:

// ebx is the address of the GOT

PLT0:
    pushl 4(%ebx)
    jmp   *8(%ebx)
    nop
    nop
PLT1:
    jmp   *name1@GOT(%ebx)
    pushl $offset
    jmp   .PLT0@PC

PLT and dynamic linker (cont.)


The link editor also creates a GOT entry for the PLT entry. This GOT entry is initialized to the address of the second line of the corresponding PLT entry. Along with this GOT entry there is a relocation entry of type JMP_SLOT that points to it and also records the symbol: name1.

PLT and dynamic linker (cont.)


The 2nd GOT entry is initialized to a special identifier. The 3rd GOT entry is initialized by the dynamic linker to the address of its own symbol resolution routine. The offset in the code sequence above is initialized to the offset of the corresponding relocation entry in the relocation table.

PLT and dynamic linker (cont.)


The effect is that when the program first calls the desired routine, say name1, it is directed to PLT entry 1. That entry jumps to the address recorded in the corresponding GOT entry, which is simply the next line. It then pushes the relocation info (offset) onto the stack.

PLT and dynamic linker (cont.)

It then jumps to PLT0, which forwards to the symbol resolution routine in the dynamic linker. The dynamic linker pops the 2 elements off the stack and uses them to find out that we are doing dynamic linking and that the desired routine is name1. It then resolves the address of name1, stores it back into the GOT entry, and jumps to name1. Subsequent calls to name1 have only 1 extra jump of overhead.
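
The lazy binding sequence can be imitated in plain C with a function pointer playing the role of the GOT entry. This is an analogy rather than what the dynamic linker actually executes: got_name1, name1_stub, call_name1, and lazy_resolver_calls are names invented for the sketch, and the "resolution" is a direct assignment instead of a symbol lookup.

```c
typedef int (*fn_t)(int);

/* The "external" routine that the program wants to call. */
static int name1(int x) { return x + 1; }

static int resolver_calls;      /* how many times the slow path ran */
static int name1_stub(int x);

/* Stand-in for the GOT entry of name1: it initially routes the call to the
 * stub, just as the real entry initially points back into the PLT. */
static fn_t got_name1 = name1_stub;

/* Stand-in for PLT0 plus the dynamic linker's resolver: find the real
 * address, store it in the GOT entry, then finish the original call. */
static int name1_stub(int x)
{
    resolver_calls++;
    got_name1 = name1;          /* patch the GOT entry */
    return got_name1(x);        /* jump to the resolved routine */
}

/* Stand-in for the PLT1 entry: an indirect jump through the GOT entry. */
int call_name1(int x) { return got_name1(x); }

/* Reports how often the resolver was invoked. */
int lazy_resolver_calls(void) { return resolver_calls; }
```

The first call_name1 goes through the stub and patches the pointer. Every later call jumps straight to name1 through the patched entry, so the resolver count stays at one.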

Tools

The file command, readelf, and objdump are useful for examining the binary content.

For more information


http://www.skyfree.org/linux/references/ELF_Format.pdf
http://os.pku.edu.cn:8080/gaikuang/submission/TN05.ELF.Format.Summary.pdf
Linker and loader: http://www.iecc.com/linker/
(For PLT only) http://www.airs.com/blog/archives/41
www.jamesisosmart.edu.tw
