
An Analysis of What Happens Before main()

Reposted by 火星人 @ 2014-03-12
Original author: alert7 (alert7)
Source: http://www.xfocus.org/

Author: alert7 <alert7@xfocus.org>

Home page: http://www.xfocus.org
Date: 2001-9-25


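★ Preface

This article analyzes the flow of an ELF program before main() is reached, in the hope of
giving you a clearer picture of how control moves through the startup code and, through that,
a deeper understanding of ELF. Corrections are welcome.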


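★ Overview

ELF executables and shared libraries are structurally very similar: both carry a program header
table (segment table) that describes how their segments are mapped into the process address space.
For an executable, the load addresses of the segments are fixed, and the program header table
records those addresses as they are. For a shared library, the load address floats: the code is
position independent, and the program header table records load addresses relative to a base
address of 0. Although a shared library is not fully linked, Linux allows one to be loaded and run
directly, which makes it convenient to test the dynamic linker. If the application has a segment
that names a dynamic linker (PT_INTERP), then once the kernel has loaded the program's segments it
loads the dynamic linker as well and transfers control to the dynamic linker's entry point. If
there is no such segment, control goes straight to the user program's entry point.
For this part, see 《分析ELF的載入過程》 ("An Analysis of the ELF Loading Process") by opera on the linuxforum forum.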

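Once control reaches the dynamic linker's entry point, it first calls _dl_start, which returns the
real program entry point (note: this address is not the address of main; in other words, the entry
point of an ordinary program is not main). It then calls the initialization function of each shared
object in a loop, and finally jumps to the real program entry point, normally the program's own
_start routine. That routine pushes some arguments on the stack and calls __libc_start_main
directly. __libc_start_main registers the destructors for the dynamic linker and for the program
itself, runs the program's initialization function, and only then hands control to main().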



★ The flow before main()

Below is the entry point of the dynamic linker.
/* Initial entry point code for the dynamic linker.
The C function `_dl_start' is the real entry point;
its return value is the user program's entry point. */

#define RTLD_START asm ("\
.text\n\
.globl _start\n\
.globl _dl_start_user\n\
_start:\n\
pushl %esp\n\
call _dl_start\n\/* when this function returns, %eax holds the user entry point address */
popl %ebx\n\/* %ebx now holds the %esp value pushed above */
_dl_start_user:\n\
# Save the user entry point address in %edi.\n\
movl %eax, %edi\n\/* keep the entry point address in %edi */

# Point %ebx at the GOT.
call 0f\n\
0: popl %ebx\n\
addl $_GLOBAL_OFFSET_TABLE_+[.-0b], %ebx\n\

# Store the highest stack address\n\
movl __libc_stack_end@GOT(%ebx), %eax\n\
movl %esp, (%eax)\n\/* store the stack top %esp into __libc_stack_end, whose address was fetched from the GOT */

# See if we were run as a command with the executable file\n\
# name as an extra leading argument.\n\
movl _dl_skip_args@GOT(%ebx), %eax\n\
movl (%eax), %eax\n\

# Pop the original argument count.\n\
popl %ecx\n\

# Subtract _dl_skip_args from it.\n\
subl %eax, %ecx\n\

# Adjust the stack pointer to skip _dl_skip_args words.\n\
leal (%esp,%eax,4), %esp\n\

# Push back the modified argument count.\n\
pushl %ecx\n\

# Push the searchlist of the main object as argument in\n\
# _dl_init_next call below.\n\
movl _dl_main_searchlist@GOT(%ebx), %eax\n\
movl (%eax), %esi\n\
0: movl %esi,%eax\n\

# Call _dl_init_next to return the address of an initializer\n\
# function to run.\n\
call _dl_init_next@PLT\n\/* returns the address of an initializer function in %eax */

# Check for zero return, when out of initializers.\n\
testl %eax, %eax\n\
jz 1f\n\

# Call the shared object initializer function.\n\
# NOTE: We depend only on the registers (%ebx, %esi and %edi)\n\
# and the return address pushed by this call;\n\
# the initializer is called with the stack just\n\
# as it appears on entry, and it is free to move\n\
# the stack around, as long as it winds up jumping to\n\
# the return address on the top of the stack.\n\
call *%eax\n\/* call the shared object's initializer function */

# Loop to call _dl_init_next for the next initializer.\n\
jmp 0b\n\

1: # Clear the startup flag.\n\
movl _dl_starting_up@GOT(%ebx), %eax\n\
movl $0, (%eax)\n\

# Pass our finalizer function to the user in %edx, as per ELF ABI.\n\
movl _dl_fini@GOT(%ebx), %edx\n\

# Jump to the user's entry point.\n\
jmp *%edi\n\
.previous\n\
");


In sysdeps\i386\start.s:
the user's entry point is the _start routine below.

/* This is the canonical entry point, usually the first thing in the text
segment. The SVR4/i386 ABI (pages 3-31, 3-32) says that when the entry
point runs, most registers' values are unspecified, except for:

%edx Contains a function pointer to be registered with `atexit'.
This is how the dynamic linker arranges to have DT_FINI
functions called for shared libraries that have been loaded
before this code runs.

%esp The stack contains the arguments and environment:
0(%esp) argc
4(%esp) argv[0]
...
(4*argc)(%esp) NULL
(4*(argc+1))(%esp) envp[0]
...
NULL
*/

.text
.globl _start
_start:
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp

/* Extract the arguments as encoded on the stack and set up
the arguments for `main': argc, argv. envp will be determined
later in __libc_start_main. */
popl %esi /* Pop the argument count. */
movl %esp, %ecx /* argv starts just at the current stack top.*/

/* Before pushing the arguments align the stack to a double word
boundary to avoid penalties from misaligned accesses. Thanks
to Edward Seidl for pointing this out. */
andl $0xfffffff8, %esp
pushl %eax /* Push garbage because we allocate
28 more bytes. */

/* Provide the highest stack address to the user code (for stacks
which grow downwards). */
pushl %esp

pushl %edx /* Push address of the shared library
termination function. */

/* Push address of our own entry points to .fini and .init. */
pushl $_fini
pushl $_init

pushl %ecx /* Push second argument: argv. */
pushl %esi /* Push first argument: argc. */

pushl $main

/* Call the user's main function, and exit with its value.
But let the libc call main. */
call __libc_start_main

hlt /* Crash if somehow `exit' does return. */



__libc_start_main is in sysdeps\generic\libc_start.c.
Assume the PIC version of the code is the one being defined.
struct startup_info
{
  void *sda_base;
  int (*main) (int, char **, char **, void *);
  int (*init) (int, char **, char **, void *);
  void (*fini) (void);
};

int
__libc_start_main (int argc, char **argv, char **envp,
                   void *auxvec, void (*rtld_fini) (void),
                   struct startup_info *stinfo,
                   char **stack_on_entry)
{

  /* the PPC SVR4 ABI says that the top thing on the stack will
     be a NULL pointer, so if not we assume that we're being called
     as a statically-linked program by Linux... */
  if (*stack_on_entry != NULL)
    {
      /* ...in which case, we have argc as the top thing on the
         stack, followed by argv (NULL-terminated), envp (likewise),
         and the auxilary vector. */
      argc = *(int *) stack_on_entry;
      argv = stack_on_entry + 1;
      envp = argv + argc + 1;
      auxvec = envp;
      while (*(char **) auxvec != NULL)
        ++auxvec;
      ++auxvec;
      rtld_fini = NULL;
    }

  /* Store something that has some relationship to the end of the
     stack, for backtraces. This variable should be thread-specific. */
  __libc_stack_end = stack_on_entry + 4;

  /* Set the global _environ variable correctly. */
  __environ = envp;

  /* Register the destructor of the dynamic linker if there is any. */
  if (rtld_fini != NULL)
    atexit (rtld_fini);   /* arrange the destructor for the dynamic linker */

  /* Call the initializer of the libc. */

  __libc_init_first (argc, argv, envp);   /* an empty function, see below */

  /* Register the destructor of the program, if any. */
  if (stinfo->fini)
    atexit (stinfo->fini);   /* arrange the program's own destructor */

  /* Call the initializer of the program, if any. */

  /* run the program's initialization function */
  if (stinfo->init)
    stinfo->init (argc, argv, __environ, auxvec);

  /* run the program's main function; only here does control reach
     what we normally think of as the program's entry point */
  exit (stinfo->main (argc, argv, __environ, auxvec));

}



void
__libc_init_first (int argc __attribute__ ((unused)), ...)
{
}

int
atexit (void (*func) (void))
{
  struct exit_function *new = __new_exitfn ();

  if (new == NULL)
    return -1;

  new->flavor = ef_at;
  new->func.at = func;
  return 0;
}


/* Run initializers for MAP and its dependencies, in inverse dependency
order (that is, leaf nodes first). */

ElfW(Addr)
internal_function
_dl_init_next (struct r_scope_elem *searchlist)
{
  unsigned int i;

  /* The search list for symbol lookup is a flat list in top-down
     dependency order, so processing that list from back to front gets us
     breadth-first leaf-to-root order. */

  i = searchlist->r_nlist;
  while (i-- > 0)
    {
      struct link_map *l = searchlist->r_list[i];

      if (l->l_init_called)
        /* This object is all done. */
        continue;

      if (l->l_init_running)
        {
          /* This object's initializer was just running.
             Now mark it as having run, so this object
             will be skipped in the future. */
          l->l_init_running = 0;
          l->l_init_called = 1;
          continue;
        }

      if (l->l_info[DT_INIT]
          && (l->l_name[0] != '\0' || l->l_type != lt_executable))
        {
          /* Run this object's initializer. */
          l->l_init_running = 1;

          /* Print a debug message if wanted. */
          if (_dl_debug_impcalls)
            _dl_debug_message (1, "\ncalling init: ",
                               l->l_name[0] ? l->l_name : _dl_argv[0],
                               "\n\n", NULL);

          /* load base address of the shared object + offset of its
             init function from that base */
          return l->l_addr + l->l_info[DT_INIT]->d_un.d_ptr;

        }

      /* No initializer for this object.
         Mark it so we will skip it in the future. */
      l->l_init_called = 1;
    }


  /* Notify the debugger all new objects are now ready to go. */
  _r_debug.r_state = RT_CONSISTENT;
  _dl_debug_state ();

  return 0;
}
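The flow before main() looks fairly simple on paper, but it is considerably more involved when the
program actually runs (trace it yourself with GDB and you will see), because an ordinary program
also has to resolve symbols through the PLT and GOT. Understanding that is especially important for
ELF; I will write a follow-up article when I get the chance.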


★ Manually determining the entry points of the program and the dynamic linker

[alert7@redhat62 alert7]$ cat helo.c
#include <stdio.h>
int main(int argc,char **argv)
{
    printf("hello\n");
    return 0;
}

[alert7@redhat62 alert7]$ gcc -o helo helo.c
[alert7@redhat62 alert7]$ readelf -h helo
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x8048320
Start of program headers: 52 (bytes into file)
Start of section headers: 8848 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 6
Size of section headers: 40 (bytes)
Number of section headers: 29
Section header string table index: 26
Here the program's entry point is 0x8048320. Let's check whether that is the main function.

[alert7@redhat62 alert7]$ gdb -q helo
(gdb) disass 0x8048320
Dump of assembler code for function _start:
0x8048320 <_start>: xor %ebp,%ebp
0x8048322 <_start+2>: pop %esi
0x8048323 <_start+3>: mov %esp,%ecx
0x8048325 <_start+5>: and $0xfffffff8,%esp
0x8048328 <_start+8>: push %eax
0x8048329 <_start+9>: push %esp
0x804832a <_start+10>: push %edx
0x804832b <_start+11>: push $0x804841c
0x8048330 <_start+16>: push $0x8048298
0x8048335 <_start+21>: push %ecx
0x8048336 <_start+22>: push %esi
0x8048337 <_start+23>: push $0x80483d0
0x804833c <_start+28>: call 0x80482f8 <__libc_start_main>
0x8048341 <_start+33>: hlt
0x8048342 <_start+34>: nop
End of assembler dump.
As you can see, it is not main: the program's entry point is the _start routine.

Next, let's find the entry point of the dynamic linker.
[alert7@redhat62 alert7]$ ldd helo
libc.so.6 => /lib/libc.so.6 (0x40018000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
The dynamic linker ld-linux.so.2 is loaded at address 0x40000000 in the process address space.

[alert7@redhat62 alert7]$ readelf -h /lib/ld-linux.so.2
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x1990
Start of program headers: 52 (bytes into file)
Start of section headers: 328916 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 23
Section header string table index: 20
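The shared object's entry point is 0x1990, and ld-linux.so.2 as a whole is loaded at 0x40000000 in
the process address space. The dynamic linker's entry point is therefore 0x1990+0x40000000=0x40001990.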

So the first instruction executed in user space is at 0x40001990, that is, the start of the #define RTLD_START code shown above.


