
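When a PC starts, an Intel-family CPU first enters real mode and begins executing the code at address 0xFFFF0, the ROM BIOS entry point. The BIOS first runs a series of self-tests and then initializes the interrupt vector table at address 0. Finally, the BIOS loads the first sector of the boot disk to 0x7C00 and starts executing the code there. This is the simplest possible description of how kernel initialization begins.
Originally, the very first part of the Linux kernel was written in 8086 assembly language. When it starts running, this code moves itself to absolute address 0x90000, loads the 2 KB that follow it to 0x90200, and finally loads the rest of the kernel to 0x10000.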

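While the system is loading, the message "Loading..." is displayed. When loading is complete, control passes to another piece of real-mode assembly code, boot/setup.S. Setup first configures some of the system's hardware devices and then moves the kernel from 0x10000 down to 0x1000. The system then switches to protected mode and starts executing the code at 0x1000.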

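Next comes kernel decompression. The code at 0x1000 comes from boot/compressed/head.S, which initializes the registers and calls decompress_kernel(). decompress_kernel() is made up of boot/compressed/inflate.c, unzip.c and misc.c. The decompressed data is placed at 0x100000, which is the main reason Linux cannot run with less than 2 MB of memory.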

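The decompressed code starts executing at 0x100000, and all of the 32-bit setup is then completed. The IDT, GDT and LDT are loaded, the processor is initialized, the memory pages are set up, and finally start_kernel is called. This is probably the most complex part of the whole kernel.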

[System start-up]
Kernel execution starts at the assembly label startup_32, which leads to the kernel's earliest C code, start_kernel:

|startup_32:
|start_kernel
|lock_kernel
|trap_init
|init_IRQ
|sched_init
|softirq_init
|time_init
|console_init
|#ifdef CONFIG_MODULES
|init_modules
|#endif
|kmem_cache_init
|sti
|calibrate_delay
|mem_init
|kmem_cache_sizes_init
|pgtable_cache_init
|fork_init
|proc_caches_init
|vfs_caches_init
|buffer_init
|page_cache_init
|signals_init
|#ifdef CONFIG_PROC_FS
|proc_root_init
|#endif
|#if defined(CONFIG_SYSVIPC)
|ipc_init
|#endif
|check_bugs
|smp_init
|rest_init
|kernel_thread
|unlock_kernel
|cpu_idle


·startup_32 [arch/i386/kernel/head.S]
·start_kernel [init/main.c]
·lock_kernel [include/asm/smplock.h]
·trap_init [arch/i386/kernel/traps.c]
·init_IRQ [arch/i386/kernel/i8259.c]
·sched_init [kernel/sched.c]
·softirq_init [kernel/softirq.c]
·time_init [arch/i386/kernel/time.c]
·console_init [drivers/char/tty_io.c]
·init_modules [kernel/module.c]
·kmem_cache_init [mm/slab.c]
·sti [include/asm/system.h]
·calibrate_delay [init/main.c]
·mem_init [arch/i386/mm/init.c]
·kmem_cache_sizes_init [mm/slab.c]
·pgtable_cache_init [arch/i386/mm/init.c]
·fork_init [kernel/fork.c]
·proc_caches_init
·vfs_caches_init [fs/dcache.c]
·buffer_init [fs/buffer.c]
·page_cache_init [mm/filemap.c]
·signals_init [kernel/signal.c]
·proc_root_init [fs/proc/root.c]
·ipc_init [ipc/util.c]
·check_bugs [include/asm/bugs.h]
·smp_init [init/main.c]
·rest_init
·kernel_thread [arch/i386/kernel/process.c]
·unlock_kernel [include/asm/smplock.h]
·cpu_idle [arch/i386/kernel/process.c]

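start_kernel() initializes the various parts of the kernel, including:

*Setting the memory bounds and calling paging_init() to initialize the memory pages.
*Initializing traps, interrupt channels and scheduling.
*Parsing the command line.
*Initializing device drivers and disk buffers.
*Calibrating the delay loop.

The last function, rest_init, does the following:

·Spawns the kernel thread 'init'
·Calls unlock_kernel
·Enters the kernel's cpu_idle loop, which spins forever when there is nothing to schedule

In fact, start_kernel never returns; it loops in cpu_idle indefinitely.

Finally, the kernel switches to move_to_user_mode() in order to create the initialization process (init). After that, process 0 enters an infinite loop.

Once the init process runs one of /etc/init, /bin/init or /sbin/init, the kernel no longer controls programs directly. From then on, the kernel's main roles are to provide system calls to processes and to handle asynchronous interrupt events. Multitasking is now in place, and the system starts handling user logins and the processes created by fork().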

[init]
init is the first process, or rather the first kernel thread

|init
|lock_kernel
|do_basic_setup
|mtrr_init
|sysctl_init
|pci_init
|sock_init
|start_context_thread
|do_init_calls
|(*call())-> kswapd_init
|prepare_namespace
|free_initmem
|unlock_kernel
|execve


--------------------------------------------------------------------------------


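Boot Steps

System boot:
Files involved:
./arch/$ARCH/boot/bootsect.s
./arch/$ARCH/boot/setup.s

bootsect.S
 This is the first program of the Linux kernel, and it contains Linux's own bootstrap code. Before explaining it, we must describe what a typical IBM PC does when it is switched on ("switched on" here means turning on the PC's power):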

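  When the power comes on, a typical PC starts executing at memory address FFFF:0000. This address is always inside the ROM BIOS, which normally occupies FE000h to FFFFFh. The instruction there is a jump to another location in the ROM BIOS, which then performs a series of checks on the RAM, keyboard, display, floppy and hard disks, and so on. These checks are carried out by the system test code. The details vary slightly between BIOS vendors but are broadly the same, and readers can watch the check messages their own machine prints at power-on.

  Right after the system test code, control passes to the ROM bootstrap routine. It reads the first sector of the disk into memory: cylinder 0, head 0, sector 1, since CHS sectors are numbered from 1. This is the so-called boot sector; if you have ever dealt with computer viruses, you have probably heard of it. Where in memory does it go? To the absolute address 07C0:0000 (i.e. 07C00h), which is a fixed property of IBM-compatible PCs. On a Linux boot disk, the boot sector holds Linux's bootsect program. In other words, bootsect is the first program loaded into memory and executed. Now we can look at what bootsect actually does.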

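Step 1
 First, bootsect moves "itself" from absolute address 0x7C00, where the ROM BIOS loaded it, to 0x90000. It then uses a jmpi (intersegment, i.e. far, jump) instruction to continue at the line after the jmpi in the new location.

Step 2
 Next, the other segment registers (DS, ES and SS) are pointed at 0x9000 to match CS. SP and DI are also set to an arbitrary offset. This location is used shortly afterwards to hold the disk parameter table.

Step 3
 Then function 0 of BIOS interrupt int 13h is used to reset the disk controller, so that the settings just made take effect.

Step 4
 After resetting the disk controller, bootsect uses function 2 of int 13h to read the setup program (setup.S), which immediately follows bootsect on the disk. The setup image is read into memory at absolute address 0x90200, right after bootsect. Once it is in memory, function 8 of int 13h is used to read the parameters of the current disk. In the version listed below, this call is disabled with #if 0 and the number of sectors per track is probed instead.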

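Step 5
 Next, the real Linux kernel is read in; this is the "vmlinuz" you can see in the root directory of a Linux system. Before that, function 3 of BIOS interrupt int 10h is called to read the cursor position. Function 13h of int 10h then prints the string "Loading" on the screen. This is the first string you see when Linux boots, so it should look familiar.

Step 6
 Next, the root device is checked. Then, as at the start, an intersegment jump transfers control to the setup code that was just read in.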

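Step 7
setup.S performs a version check in real mode and writes the hard disk, mouse and memory parameters into INITSEG. It is also responsible for switching into protected mode.

Step 8
Operating system initialization.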






--------------------------------------------------------------------------------


bootsect.S

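1. Move itself to 0x9000:0x0000 to leave address space free for loading the kernel;
2. Set up the runtime environment (ss=ds=es=cs=0x9000, sp=0x4000-12) so that the boot code can run;
3. The BIOS initializes interrupt vector 0x1E to point to the floppy parameter table; fetch that table and keep a copy;
4. Read setup into 0x9000:0x0200;
5. Find out how many sectors a floppy track has (there is no good way to do this, so it simply tries 36, 18, 15 and 9);
6. Print "Loading";
7. Read the kernel into 0x1000:0000. For a bzImage, each 64K chunk is moved to 0x100000 and up. In real mode this can only be done with interrupt 0x15, and that code does not fit in bootsect, so it lives in setup, which fortunately has already been read in by then;
8. Off to setup we go.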
From: seis (矛), Board: Linux
Title: A Detailed Analysis of the Linux Kernel Boot Loader
Posted at: Shuimu Tsinghua BBS (SMTH) (Fri Feb 2 14:12:43 2001)

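! bootsect.s Copyright (C) 1991, 1992 Linus Torvalds
! modified by Drew Eckhardt
! modified by Bruce Evans (bde)
!
! bootsect.s is loaded at 0x7c00 (31k) by the bios-startup routines, and moves
! itself out of the way to address 0x90000 (576k), and jumps there.
!
! bde - should not jump blindly, there may be systems with only 512k low
! memory. Use int 0x12 to get the top of memory, etc.
!
! It then loads 'setup' directly after itself (0x90200) (576.5k), and the
! system at 0x10000, using BIOS interrupts.
!
! NOTE! currently system is at most (8*65536-4096) (508k) bytes long. This should
! be no problem, even in the future. I want to keep it simple. This 508k kernel
! size should be enough, especially as this doesn't contain the buffer cache as
! in minix (and especially now that the kernel is compressed).
!
! The loader has been made as simple as possible, and continuous read errors
! will result in an unbreakable loop. Reboot by hand. It loads pretty fast by
! getting whole tracks at a time whenever possible.

#include /* to get the CONFIG_ROOT_RDONLY parameter */
!! CONFIG_ROOT_RDONLY is not defined in config.h (i.e. autoconf.h)!!!?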

#include

.text

SETUPSECS = 4 ! default nr of setup-sectors

BOOTSEG = 0x7C0 ! original address of boot-sector

INITSEG = DEF_INITSEG ! we move boot here (0x9000) - out of the way
SETUPSEG = DEF_SETUPSEG ! setup starts here (0x9020)
SYSSEG = DEF_SYSSEG ! system loaded at segment 0x1000 (address 0x10000, 64k)
SYSSIZE = DEF_SYSSIZE ! system size (0x7F00): nr of 16-byte paragraphs (clicks) to load
!! The four DEF_ parameters above are defined in boot.h:
!! DEF_INITSEG 0x9000
!! DEF_SYSSEG 0x1000
!! DEF_SETUPSEG 0x9020
!! DEF_SYSSIZE 0x7F00 (=32512=31.75k)*16=508k

! ROOT_DEV & SWAP_DEV are now written by "build".
ROOT_DEV = 0
SWAP_DEV = 0
#ifndef SVGA_MODE
#define SVGA_MODE ASK_VGA
#endif
#ifndef RAMDISK
#define RAMDISK 0
#endif
#ifndef CONFIG_ROOT_RDONLY
#define CONFIG_ROOT_RDONLY 1
#endif

! ld86 requires an entry symbol. This may as well be the usual one.
.globl _main
_main:
#if 0 /* hook for debugger, harmless unless BIOS is fussy (old HP) */
int 3
#endif
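mov ax,#BOOTSEG !! set ds to 0x7C0
mov ds,ax
mov ax,#INITSEG !! set es to 0x9000
mov es,ax
mov cx,#256 !! cx = 256 (move 256 words, i.e. 512 bytes)
sub si,si !! source ds:si = 0x07C0:0x0000
sub di,di !! destination es:di = 0x9000:0x0000
cld !! clear the direction flag
rep !! move this program from 0x7C0:0 (31k) to 0x9000:0 (576k),
movsw !! 256 words (512 bytes, 0x200) in total
jmpi go,INITSEG !! far jump to label go in the relocated copy

! ax and es already contain INITSEG (0x9000)

go: mov di,#0x4000-12 ! 0x4000 (16k) is an arbitrary value >= length of
! bootsect + length of setup + room for the stack;
! 12 is the disk parameter block size; es:di = 0x94000-12 = 592k-12

! bde - changed 0xff00 to 0x4000 to use debugger at 0x6400 up (bde). We
! wouldn't have to worry about this if we checked the top of memory. Also
! my BIOS can be configured to put the wini drive tables in high memory
! instead of in the vector table. The old stack might have clobbered the
! drive table.

mov ds,ax ! set ds to 0x9000
mov ss,ax ! set ss to 0x9000
mov sp,di ! stack pointer at INITSEG:0x4000-12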
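/*
* Many BIOS's default disk parameter tables will not
* recognize multi-sector reads beyond the maximum sector number
* specified in the default diskette parameter tables - this may
* mean 7 sectors in some cases.
*
* Since single sector reads are slow and out of the question,
* we must take care of this by creating new parameter tables
* (for the first disk) in RAM. We will set the maximum sector
* count to 36 - the most we will encounter on an ED 2.88.
*
* High doesn't hurt. Low does.
*
* Segments are as follows: ds=es=ss=cs - INITSEG (=0x9000),
* fs = 0, gs is unused.
*/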

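! cx is 0 after the rep above

mov fs,cx !! fs = 0
mov bx,#0x78 ! fs:bx is parameter table address (0x78 = 4*0x1E, the int 0x1E vector)
push ds
seg fs
lds si,(bx) ! ds:si is source
!! i.e. load the far pointer stored at fs:bx into ds:si
mov cl,#6 ! copy 12 bytes (6 words) to 0x9000:0x4000-12
cld
push di !! di = 0x4000-12 (in segment 0x9000)

rep
movsw

pop di !! di again points to 0x9000:0x4000-12 (start of the new table)
pop ds !! ds = INITSEG (0x9000) again

movb 4(di),*36 ! patch sector count (max sectors/track = 36)

seg fs
mov (bx),di !! point the vector at fs:bx (0000:0x0078) to the new table at 0x9000:0x4000-12
seg fs
mov 2(bx),es

! load the setup-sectors directly after the bootblock (i.e. from 0x90200 on).
! Note that es is already set up.
! Also cx is 0 from the rep above.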

load_setup:
xor ah,ah ! reset FDC
xor dl,dl
int 0x13

xor dx,dx ! drive 0, head 0
mov cl,#0x02 ! sector 2, track 0
mov bx,#0x0200 ! data buffer es:bx = 0x9000:0x200,
! i.e. 0x90200 inside INITSEG
mov ah,#0x02 ! service 2, "read sector(s)"
mov al,setup_sects ! nr of sectors to read (SETUPSECS = 4 by default)
! (assume all on head 0, track 0)
int 0x13 ! read it
jnc ok_load_setup ! ok - continue

push ax ! dump error code (ah = status returned by int 0x13)
call print_nl !! print a newline
mov bp,sp !! bp is the argument to print_hex
call print_hex !! print the word bp points to
pop ax

jmp load_setup !! try again

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!INT 13 - DISK - READ SECTOR(S) INTO MEMORY
!! AH = 02h
!! AL = number of sectors to read (must be nonzero)
!! CH = low eight bits of cylinder number
!! CL = sector number 1-63 (bits 0-5)
!! high two bits of cylinder (bits 6-7, hard disk only)
!! DH = head number
!! DL = drive number (bit 7 set for hard disk)
!! ES:BX -> data buffer
!! Return: CF set on error
!! if AH = 11h (corrected ECC error), AL = burst length
!! CF clear if successful
!! AH = status (see #00234)
!! AL = number of sectors transferred (only valid if CF set for some
!! BIOSes)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!


ok_load_setup:

! Get disk drive parameters, specifically nr of sectors/track

#if 0

! bde - the Phoenix BIOS manual says function 0x08 only works for fixed
! disks. It doesn't work for one of my BIOS's (1987 Award). It was
! fatal not to check the error code.

xor dl,dl
mov ah,#0x08 ! AH=8 is get drive parameters
int 0x13
xor ch,ch

!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! INT 13 - DISK - GET DRIVE PARAMETERS (PC,XT286,CONV,PS,ESDI,SCSI)
!! AH = 08h
!! DL = drive (bit 7 set for hard disk)
!!Return: CF set on error
!! AH = status (07h) (see #00234)
!! CF clear if successful
!! AH = 00h
!! AL = 00h on at least some BIOSes
!! BL = drive type (AT/PS2 floppies only) (see #00242)
!! CH = low eight bits of maximum cylinder number
!! CL = maximum sector number (bits 5-0)
!! high two bits of maximum cylinder number (bits 7-6)
!! DH = maximum head number
!! DL = number of drives
!! ESI -> drive parameter table (floppies only)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!

#else

! It seems that there is no BIOS call to get the number of sectors. Guess
! 36 sectors if sector 36 can be read, 18 sectors if sector 18 can be read,
! 15 if sector 15 can be read. Otherwise guess 9. [36, 18, 15, 9]

mov si,#disksizes ! ds:si -> table of sector counts to probe

probe_loop:
lodsb !! al = byte at ds:si, si = si+1
cbw ! extend to word
mov sectors, ax ! first value is 36, last is 9
cmp si,#disksizes+4
jae got_sectors ! if all else fails, try 9
xchg ax,cx ! cx = track and sector (36 = 0x0024 the first time)
xor dx,dx ! drive 0, head 0
xor bl,bl !! buffer es:bx = 0x9000:0x0a00 (578.5k),
mov bh,setup_sects !! setup_sects = 4 (2k in total)
inc bh
shl bh,#1 ! address after setup (es = cs)
mov ax,#0x0201 ! service 2 (read), 1 sector
int 0x13
jc probe_loop ! try next value if this one failed

#endif

got_sectors:

! restore es

mov ax,#INITSEG
mov es,ax ! es = 0x9000

! print some inane message (CR/LF, then "Loading")

mov ah,#0x03 ! read cursor pos
xor bh,bh
int 0x10

mov cx,#9
mov bx,#0x0007 ! page 0, attribute 7 (normal)
mov bp,#msg1
mov ax,#0x1301 ! write string, move cursor
int 0x10

! ok, we've written the message, now
! we want to load the system (at 0x10000) (64k)

mov ax,#SYSSEG
mov es,ax ! segment of 0x010000
call read_it !! read the system image; es is the input parameter
call kill_motor !! turn off the drive motor
call print_nl !! print CR/LF

! After that we check which root-device to use. If the device is
! defined (!= 0), nothing is done and the given device is used.
! Otherwise, either /dev/fd0H2880 (2,32) or /dev/PS0 (2,28)
! or /dev/at0 (2,8) is used, depending on the number of sectors
! we pretend to know we have.
!! (x,y) = (major, minor) device numbers

seg cs
mov ax,root_dev
or ax,ax
jne root_defined
seg cs
mov bx,sectors !! sectors = sectors per track
mov ax,#0x0208 ! /dev/ps0 - 1.2Mb
cmp bx,#15
je root_defined
mov al,#0x1c ! /dev/PS0 - 1.44Mb !! 0x1C = 28
cmp bx,#18
je root_defined
mov al,#0x20 ! /dev/fd0H2880 - 2.88Mb
cmp bx,#36
je root_defined
mov al,#0 ! /dev/fd0 - autodetect
root_defined:
seg cs
mov root_dev,ax !! holds the major/minor numbers of the root device

! after that (everything loaded), we jump to
! the setup-routine loaded directly after
! the bootblock:

jmpi 0,SETUPSEG !! jump to 0x9020:0000 (start of setup)


! This routine loads the system at address 0x10000, making sure
! no 64kB boundaries are crossed. We try to load it as fast as
! possible, loading whole tracks whenever we can.
!
! in: es - starting address segment (normally 0x1000)
!
sread: .word 0 ! sectors read of current track
head: .word 0 ! current head
track: .word 0 ! current track

read_it:
mov al,setup_sects
inc al
mov sread,al !! sread = 5 initially (boot sector + 4 setup sectors)
mov ax,es !! es = 0x1000
test ax,#0x0fff !! ax AND 0x0fff; zero flag set if ax = 0x1000
die: jne die ! es must be at 64kB boundary
xor bx,bx ! bx is starting address within segment
rp_read:
#ifdef __BIG_KERNEL__
#define CALL_HIGHLOAD_KLUDGE .word 0x1eff, 0x220 ! call far * bootsect_kludge
! NOTE: as86 can't assemble this
CALL_HIGHLOAD_KLUDGE ! this is in setup.S
#else
mov ax,es
sub ax,#SYSSEG ! current es minus the segment the system starts at (0x1000)
#endif
cmp ax,syssize ! have we loaded all yet? (ax vs. syssize, 0x7f00)
jbe ok1_read !! if ax <= syssize, keep reading
ret !! everything is loaded, return
ok1_read:
mov ax,sectors !! sectors = sectors per track
sub ax,sread !! minus sectors already read: al = sectors left on this track (ah = 0)
mov cx,ax
shl cx,#9 !! times 512: cx = bytes left on this track
add cx,bx !! plus the offset; es:bx is the current buffer address
jnc ok2_read !! no carry: stays below 64K, read the rest of the track
je ok2_read !! exactly 64K: also fine
xor ax,ax !! otherwise read only up to the 64K boundary:
sub ax,bx !! ax = (0x10000 - bx) / 512
shr ax,#9
ok2_read:
call read_track !! es:bx -> buffer, al = nr of sectors to read

mov cx,ax !! ax still holds its value from before read_track: sectors just read
add ax,sread !! ax = sectors read so far on this track
cmp ax,sectors !! finished this track?
jne ok3_read !! no: skip ahead
mov ax,#1
sub ax,head !! are we on head 1?
jne ok4_read !! no (head 0): skip ahead with ax = 1
inc track !! head 1 done: move on to the next track
ok4_read:
mov head,ax !! save the current head number
xor ax,ax !! reset the sectors-read count for the new track
ok3_read:
mov sread,ax !! store the sectors read on this track
shl cx,#9 !! sectors just read * 512
add bx,cx !! advance the buffer offset
jnc rp_read !! still inside this 64K segment: keep reading
mov ax,es !! otherwise add 0x1000 to the segment (next 64K)
add ah,#0x10
mov es,ax
xor bx,bx !! reset the offset
jmp rp_read !! keep reading


read_track:
pusha !! push ax, cx, dx, bx, sp, bp, si, di
pusha
mov ax,#0xe2e ! loading... message 2e = . !! print a dot
mov bx,#7
int 0x10
popa

mov dx,track !! dx = current track
mov cx,sread
inc cx !! cl = sector number: first sector to read
mov ch,dl !! ch = low 8 bits of the track number
mov dx,head
mov dh,dl !! dh = current head number
and dx,#0x0100 !! dl = drive number (0)
mov ah,#2 !! function 2 (read); es:bx points to the buffer

push dx ! save for error dump
push cx
push bx
push ax

int 0x13
jc bad_rt !! jump on error
add sp, #8 !! discard the four registers just pushed
popa
ret

bad_rt: push ax ! save error code
call print_all ! ah = error, al = read


xor ah,ah
xor dl,dl
int 0x13


add sp,#10
popa
jmp read_track

/*
* print_all is for debugging purposes.
* It will print out all of the registers. The assumption is that this is
* called from a routine, with a stack frame like
* dx
* cx
* bx
* ax
* error
* ret <- sp
*
*/

print_all:
mov cx,#5 ! error code + 4 registers
mov bp,sp

print_loop:
push cx ! save count left
call print_nl ! nl for readability

cmp cl, #5
jae no_reg ! see if register name is needed

mov ax,#0xe05 + 'A - 1
sub al,cl
int 0x10

mov al,#'X
int 0x10

mov al,#':
int 0x10

no_reg:
add bp,#2 ! next register
call print_hex ! print it
pop cx
loop print_loop
ret

print_nl: !! print CR/LF
mov ax,#0xe0d ! CR
int 0x10
mov al,#0xa ! LF
int 0x10
ret

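/*
* print_hex is for debugging purposes, and prints the word
* pointed to by ss:bp in hexadecimal.
* !! e.g. for 0x4321, al takes the values 4, 3, 2, 1 in turn and each digit is printed via int 0x10, giving 4321
*/

print_hex:
mov cx, #4 ! 4 hex digits
mov dx, (bp) ! load word into dx
print_digit:
rol dx, #4 ! rotate so that lowest 4 bits are used !! (moves the top 4 bits of dx into the low 4 bits)
mov ax, #0xe0f ! ah = request, al = mask for nybble
and al, dl !! al = low 4 bits of dl
add al, #0x90 ! convert al to ascii hex (in 4 instructions)
daa !! decimal adjust
adc al, #0x40 !! (adc dest, src ==> dest := dest + src + CF)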
daa
int 0x10
loop print_digit
ret


/*
* This procedure turns off the floppy drive motor, so
* that we enter the kernel in a known state, and
* don't have to worry about it later.
*/
kill_motor:
push dx
mov dx,#0x3f2
xor al,al
outb
pop dx
ret

!! data area
sectors:
.word 0 !! sectors per track of the current disk (36, 18, 15 or 9)

disksizes: !! table of sectors-per-track values to probe
.byte 36, 18, 15, 9

msg1:
.byte 13, 10
.ascii "Loading"

.org 497 !! place the following data at byte offset 497 of the boot sector binary
setup_sects:
.byte SETUPSECS
root_flags:
.word CONFIG_ROOT_RDONLY
syssize:
.word SYSSIZE
swap_dev:
.word SWAP_DEV
ram_size:
.word RAMDISK
vid_mode:
.word SVGA_MODE
root_dev:
.word ROOT_DEV
boot_flag: !! boot sector signature
.word 0xAA55






--------------------------------------------------------------------------------


setup.S

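1. A header is required, so it starts with the customary JMP;
2. The header carries a lot of information; we'll see how it is used as we go;
3. A self-check; not sure what it is for. Anti-forgery? Anti-tampering?
4. If the loader is wrong, just die! From here on we finally get to the point;
5. Get the memory size. Three methods are used; the E820 and E801 ones are hard to follow, but int 15 is an old friend, first met back in the late 1980s. It is impressive that it is still going strong ten years later, though with some misbehaving BIOSes who knows whether it still works;
6. Set the keyboard repeat rate to the maximum, for better responsiveness?
7. Check the hard disks; not sure why this is done here;
8. Check for the MCA bus (don't ask me what that is);
9. Check for a PS/2 mouse using int 11; not sure why it lives here either;
10. Check for a power-management (APM) BIOS. Alas, there is so much I don't know, how embarrassing; never mind, just don't touch what you don't understand. Next we head into the kernel;
11. Before entering protected mode, a real-mode routine you supply can be called to take one last look. If you don't supply one, a default is used, which basically plugs its ears and shuts its eyes: it disables all interrupts, including the famous NMI;
12. Set the address of the routine to run once protected mode starts. You can write your own routine, but it is added in front of the one setup provides rather than replacing it (to show a little duck?);
13. If the kernel is a zImage, move it down to 0x1000;
14. If setup itself is not at 0x90000, move it there;
15. Set up the IDT and GDT;
16. Enable A20;
17. Hold your breath and mask all interrupts;
18. Go! movw $1, %ax ; lmsw %ax; we are now in protected mode, followed right away by some local adjustments;
19. jmpi 0x100000, __KERNEL_CS, and at last we are in the kernel;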
setup.S
A summary of the setup.S code. The slight differences in the operation of setup.S for a big kernel are documented here. When the switch to 32-bit protected mode begins, the code32_start address is defined as 0x100000 (the load address) for a big kernel and 0x1000 otherwise:
code32_start:

#ifndef __BIG_KERNEL__
.long 0x1000
#else
.long 0x100000
#endif

setup.S first sets the keyboard repeat rate to the maximum, calls video.S, stores the video parameters, and checks for hard disks, a PS/2 mouse and an APM BIOS. After that, the preparation for the switch to protected mode begins.

The interrupts are disabled. Since the loader may have changed the code32_start address, the code32 variable is updated from it. setup.S uses this variable in the jmpi instruction when it finally jumps to compressed/head.S, which for a big kernel is located at 0x100000.

seg cs
mov eax, code32_start !modified above by the loader
seg cs
mov code32,eax

! code32 contains the correct address to branch to after setup.S finishes

After the above code, big and small kernels are handled slightly differently. A small kernel is moved down to segment address 0x100 (address 0x1000), but a big kernel is not moved: before decompression, it stays at 0x100000. The following code performs this check:

test byte ptr loadflags,#LOADED_HIGH
jz do_move0 ! a normal low loaded zImage is moved
jmp end_move ! skip move

The interrupt and global descriptors are initialized:

lidt idt_48 ! load idt with 0,0
lgdt gdt_48 ! load gdt with whatever appropriate

After enabling A20 and reprogramming the interrupts, it is ready to set the PE bit:

mov ax,#1
lmsw ax
jmp flush_instr
flush_instr:
xor bx,bx !flag to indicate a boot
! Manual, mixing of 16-bit and 32 bit code
db 0x66,0xea !prefix + jmpi-opcode
code32: dd 0x1000 !this has been reset in case of a big kernel, to 0x100000
dw __KERNEL_CS

Finally it prepares the opcode for jumping to compressed/head.S which in the big kernel is at 0x100000. The compressed kernel would start at 0x1000 in case of a small kernel.

compressed/head.S

When setup.S relinquishes control, compressed/head.S starts executing at the beginning of the compressed kernel at 0x100000. It first checks whether A20 is really enabled; if not, it loops forever.

It then initializes eflags and clears the BSS (Block Started by Symbol), the space reserved for uninitialized static or global variables. Finally, it reserves room on the stack for the moveparams structure (defined in misc.c) and pushes the current stack pointer. It then calls the C function decompress_kernel, which takes that struct moveparams * as its argument:

subl $16,%esp
pushl %esp
call SYMBOL_NAME(decompress_kernel)
orl %eax,%eax ! test the return value (high_loaded)
jnz 3f

The C function decompress_kernel returns the variable high_loaded. This variable is set to 1 in setup_output_buffer_if_we_run_high, which decompress_kernel calls when a big kernel was loaded.
When decompress_kernel returns a non-zero value, head.S jumps forward to label 3, which copies the move routine down to low memory.

movl $move_routine_start,%esi ! puts the offset of the start of the source in the source index register
mov $0x1000,%edi ! the destination index now contains 0x1000, so after the move the move routine starts at 0x1000
movl $move_routine_end,%ecx
sub %esi,%ecx ! ecx now contains the number of bytes to be moved
! (number of bytes between the labels move_routine_start and move_routine_end)
cld
rep
movsb ! moves the bytes from ds:esi to es:edi; each iteration increments esi and edi and decrements ecx
! the movs instruction repeats until ecx is zero

Thus the movsb instruction copies the bytes of the move routine between the labels move_routine_start and move_routine_end. At the end, the entire move routine (labeled move_routine_start) sits at 0x1000. The movsb instruction moves bytes from ds:esi to es:edi.

At the start of the head.S code, es, ds, fs and gs were all initialized to __KERNEL_DS, which /usr/src/linux/include/asm/segment.h defines as 0x18. This is an offset into the global descriptor table (gdt) that was set up in setup.S. Byte 24 is the start of the data segment descriptor, whose base address is 0. Thus the move routine is copied to, and starts at, offset 0x1000 from the base of __KERNEL_DS, the kernel data segment (which is 0).
The salient features of decompress_kernel are discussed in the next section. It is worth noting, though, that when decompress_kernel is invoked, space has already been created at the top of the stack to hold information about the decompressed kernel. A big decompressed kernel may be split between a high buffer and a low buffer. After decompress_kernel returns, the decompressed kernel has to be moved so that it is contiguous, starting at address 0x100000. The parameters needed for this move are the start addresses of the high and low buffers and the number of bytes in each. They sit at the top of the stack when decompress_kernel returns. The top of the stack was passed as the struct moveparams * argument, and the function adjusted the fields of the moveparams structure to reflect the state of the decompression.

/* in compressed/misc.c */
struct moveparams {
uch *low_buffer_start; /* start address of the low buffer */
int lcount; /* number of bytes in the low buffer after decompression is done */
uch *high_buffer_start; /* start address of the high buffer */
int hcount; /* number of bytes in the high buffer after decompression is done */
};

Thus when decompress_kernel returns, the relevant values are popped into the respective registers, as shown below. With these registers prepared, the decompressed kernel is ready to be moved, and control jumps to the relocated move routine at __KERNEL_CS:0x1000. The code that sets up the registers is:

popl %esi ! discard the pushed argument (the address of the moveparams structure)
popl %esi ! low_buffer_start
popl %ecx ! lcount
popl %edx ! high_buffer_start
popl %eax ! hcount
movl $0x100000,%edi
cli ! disable interrupts while the decompressed kernel is being moved
ljmp $(__KERNEL_CS), $0x1000 ! jump to the move routine, which was copied to low memory at 0x1000

The move routine at move_routine_start has two parts. First it moves the part of the decompressed kernel that is in the low buffer; then, if required, it moves the contents of the high buffer. Note that ecx has been initialized to the number of bytes in the low buffer, and the destination index register edi has been initialized to 0x100000.
move_routine_start:

rep ! repeat; it stops repeating when ecx == 0
movsb ! each iteration moves one byte from ds:esi to es:edi, increments esi and edi, and decrements ecx
! when the low buffer has been moved, edi is left unchanged, and the next part of the code
! uses it to transfer the bytes from the high buffer
movl %edx,%esi ! esi now holds the address of the start of the high buffer
movl %eax,%ecx ! ecx now holds the number of bytes in the high buffer
rep
movsb ! moves all the bytes in the high buffer; moves nothing if hcount is zero (i.e. if it was
! determined that the high buffer need not be moved down)
xorl %ebx,%ebx
mov $0x90000, %esp ! the stack pointer is moved back to low memory, most probably for the kernel's use during initialization
ljmp $(__KERNEL_CS), $0x100000 ! jump to __KERNEL_CS:0x100000, where the kernel code starts
move_routine_end:

At the end of this routine, control passes to the kernel code segment.


Linux assembly code taken from head.S and setup.S
Comments added by us





--------------------------------------------------------------------------------


head.S

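The last instruction of setup.S is a jump to the kernel's first instruction. Because the jump target is an absolute memory address, it does not tell us that control lands in head.S. However, the Makefiles make it easy to confirm that head.S sits at the very front of the kernel.
The arch/i386 Makefile defines
HEAD := arch/i386/kernel/head.o

and the top-level Linux Makefile contains the line
include arch/$(ARCH)/Makefile
so the HEAD definition is in effect there.

It then has the following rule: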

vmlinux: $(CONFIGURATION) init/main.o init/version.o linuxsubdirs
	$(LD) $(LINKFLAGS) $(HEAD) init/main.o init/version.o \
		$(ARCHIVES) \
		$(FILESYSTEMS) \
		$(DRIVERS) \
		$(LIBS) -o vmlinux
	$(NM) vmlinux | grep -v '\(compiled\)\|\(\.o$$\)\|\( a \)' | sort > System.map

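This dependency tells us a lot:

1> $(HEAD), i.e. head.o, really is the first object linked into the kernel.

2> All file systems supported by the kernel are compiled into $(FILESYSTEMS), i.e. fs/filesystems.a;
all supported network protocols into net.a;
all supported SCSI drivers into scsi.a;
...................
So the kernel is really just a collection of library archives and object files. Anyone interested in slimming the kernel down
can compare them to see which part takes up the space.

3> System.map lists the addresses of the kernel's symbols (functions and variables). The functions we can call when writing a kernel module
are among these, limited in practice to the ones the kernel exports.


Now that this question is settled, we can analyze head.S in detail.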

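Analysis of head.S

1 Point ds, es, fs and gs at the kernel data segment __KERNEL_DS.
__KERNEL_DS is defined in asm/segment.h and refers to entry 3 of the
global descriptor table (selector 0x18).
Note: the GDT in effect at this point is not the one defined in head.S;
it is still the one defined in setup.S.

2 Zero the BSS (uninitialized data) area.

3 setup_idt is a subroutine that points every entry of the interrupt descriptor table at the function ignore_int,
which prints: unknown interrupt
Naturally, such an interrupt handler cannot do anything useful.

4 Check whether the A20 address line is enabled; if not, loop and wait.
The A20 line is an x86 legacy: it determines whether memory above 1 MB can be accessed.

5 Copy the boot parameters into the first half of the page at 0x5000, and put the BIOS
parameters retrieved by setup.S into the second half.

6 Check the CPU type.
@#$#%$^*@^?(^%#$%!#!@? Who knows what it actually does?

7 Initialize the page tables, only for the first few pages.

1> Clear swapper_pg_dir (0x2000) and pg0 (0x3000).
swapper_pg_dir serves as the page directory of the whole system.

2> Use pg0 as the first page table and store its address in the first 32-bit
word of swapper_pg_dir.

3> Store the same entry at byte offset 3072 of swapper_pg_dir (entry 768, since 0xC0000000 >> 22 = 768),
so that virtual address 0xC0000000 also maps through pg0.

4> Fill pg0 so that it maps the first 4 MB of memory.

5> Enable paging.
Note: although the CPU was already in protected mode, paging was not enabled until now.

--------------------
| swapper_pg_dir   |       -----------
|                  |-------|   pg0   |---------- first 4 MB of memory
|                  |       -----------
|                  |
--------------------
8 Load the new GDT and LDT.

9 Reload the segment registers ds, es, fs and gs.

10 Switch to the system stack, i.e. the reserved page at 0x6000.

11 Call start_kernel, the first function written in C;
the kernel makes a fresh start.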






--------------------------------------------------------------------------------


compressed/misc.c

The differences in decompressing big and small kernels.
http://www.vuse.vanderbilt.edu/~ ... ation/hw3_part3.htm
The function decompress_kernel is invoked from head.S, and a pointer to the top of the stack is passed as its parameter. The results of the decompression are stored there: the start addresses of the high and low buffers that contain the decompressed kernel, and the number of bytes in each buffer (hcount and lcount).
int decompress_kernel(struct moveparams *mv)
{
if (SCREEN_INFO.orig_video_mode == 7) {
vidmem = (char *) 0xb0000;
vidport = 0x3b4;
} else {
vidmem = (char *) 0xb8000;
vidport = 0x3d4;
}
lines = SCREEN_INFO.orig_video_lines;
cols = SCREEN_INFO.orig_video_cols;
if (free_mem_ptr < 0x100000) setup_normal_output_buffer(); // small kernel
else setup_output_buffer_if_we_run_high(mv); // big kernel
makecrc();
puts("Uncompressing Linux... ");
gunzip();
puts("Ok, booting the kernel.\n");
if (high_loaded) close_output_buffer_if_we_run_high(mv);
return high_loaded;
}

The first place where a distinction is made is when the buffers are set up for the decompression routine gunzip(). free_mem_ptr is loaded with the address of the extern variable end, which marks the end of the compressed kernel. If free_mem_ptr is below 0x100000, the compressed kernel was loaded low (a small kernel), and setup_normal_output_buffer() is used. Otherwise, a big kernel was loaded at 0x100000 and a high buffer has to be set up. setup_output_buffer_if_we_run_high is then called with a pointer to the moveparams structure at the top of the stack, so that the start addresses of the buffers are recorded in that structure as they are set up. The function also checks whether the high buffer will have to be moved down after decompression. This is reflected in hcount, which is 0 if the high buffer need not be moved.

void setup_output_buffer_if_we_run_high(struct moveparams *mv)
{
high_buffer_start = (uch *)(((ulg)&end) + HEAP_SIZE);
// the high buffer starts at end + HEAP_SIZE
#ifdef STANDARD_MEMORY_BIOS_CALL
if (EXT_MEM_K < (3*1024)) error("Less than 4MB of memory.\n");
#else
if ((ALT_MEM_K > EXT_MEM_K ? ALT_MEM_K : EXT_MEM_K) < (3*1024)) error("Less than 4MB of memory.\n");
#endif
mv->low_buffer_start = output_data = (char *)LOW_BUFFER_START;
// the low buffer starts at 0x2000 and extends to 0x90000
high_loaded = 1; // high_loaded is set to 1; decompress_kernel returns it
free_mem_end_ptr = (long)high_buffer_start;
// free_mem_end_ptr points to the same address as high_buffer_start
// The code below decides whether the high buffer has to be moved after decompression.
// If high_buffer_start lies below 0x100000 + LOW_BUFFER_SIZE, it is shifted up to exactly
// that address. The part of the kernel decompressed there is then already where it belongs
// (right after the low part, once that has been moved to 0x100000), so hcount is set to 0
// and the high buffer need not be moved down. Otherwise hcount is made non-zero, and when
// the buffers are closed, the number of bytes in the high buffer is stored in hcount. Since
// the start address of the high buffer is known, those bytes can then be moved down.
if ((0x100000 + LOW_BUFFER_SIZE) > ((ulg)high_buffer_start)) {
high_buffer_start = (uch *)(0x100000 + LOW_BUFFER_SIZE);
mv->hcount = 0; /* say: we need not to move high_buffer */
}
else mv->hcount = -1;
mv->high_buffer_start = high_buffer_start;
// finally the high_buffer_start field is set to the variable high_buffer_start
}

After the buffers are set up, gunzip() is invoked to decompress the kernel. On return, bytes_out holds the number of bytes in the decompressed kernel. Finally, close_output_buffer_if_we_run_high is invoked if high_loaded is non-zero:

void close_output_buffer_if_we_run_high(struct moveparams *mv)
{
mv->lcount = bytes_out;
// if all of the decompressed kernel is in the low buffer, lcount = bytes_out
if (bytes_out > LOW_BUFFER_SIZE) {
// if part of the decompressed kernel is in the high buffer, the lcount field is set to
// the size of the low buffer and the hcount field holds the rest of the bytes
mv->lcount = LOW_BUFFER_SIZE;
if (mv->hcount) mv->hcount = bytes_out - LOW_BUFFER_SIZE;
// only if hcount is non-zero (set in setup_output_buffer_if_we_run_high) does the
// high buffer have to be moved down; the number of bytes in the high buffer is
// then in hcount
}
else mv->hcount = 0; // all the data is in the low buffer
}
Thus, at the end of decompress_kernel, the top of the stack holds the addresses of the buffers and their sizes. These are popped into the appropriate registers so that the move routine can move the entire kernel, after which the kernel resides at 0x100000. If a small kernel is being decompressed, decompress_kernel invokes setup_normal_output_buffer() instead. That function just initializes output_data to 0x100000, where the decompressed kernel will lie. The variable high_loaded stays 0, since setup_output_buffer_if_we_run_high() is not invoked, and decompression writes directly starting at address 0x100000. Because high_loaded is 0, eax holds zero when decompress_kernel returns to head.S, so control jumps directly to 0x100000. The decompressed kernel already lies there, and the move routine need not be called.

Linux code taken from misc.c
Comments added by us






--------------------------------------------------------------------------------


Kernel Decompression

Overview
----
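1) Linux's initial kernel image is stored in gzip-compressed form inside zImage or bzImage, and the kernel's bootstrap code decompresses it to the 1 MB mark. During kernel initialization, if a compressed initrd image has been loaded, the kernel also decompresses it into a RAM disk. Both decompressions use lib/inflate.c.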

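2) inflate.c was split off from the gzip sources. It contains some direct references to global data, so it has to be embedded directly into the code that uses it. When compressing, gzip always looks for repeated strings within the preceding 32 KB, so decompression needs a buffer of at least 32 KB, defined as window[WSIZE]. inflate.c reads its input with get_byte(), which is defined as a macro for efficiency. The input buffer index must be named inptr, because inflate.c decrements it. inflate.c calls flush_window() to output the decompressed bytes in the window buffer, and the variable outcnt gives the length of each output. flush_window() must also compute the CRC of the output bytes and update the crc variable. Before gunzip() is called to start decompressing, makecrc() is called to initialize the CRC table. Finally, gunzip() returns 0 to indicate success.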


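3) A zImage or bzImage consists of two parts: 16-bit boot code and a 32-bit self-extracting kernel image. For a zImage, the self-extracting image is loaded at physical address 0x1000 and the kernel is decompressed at 1 MB. For a bzImage, the self-extracting image is loaded starting at 1 MB, and the kernel is decompressed into two pieces. One occupies physical addresses 0x2000-0x90000. The other starts after the high-loaded compressed image, at an address no lower than 1 MB plus the maximum length of the low piece. After decompression, the two pieces are merged at the 1 MB mark.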


Code that decompresses the root RAM disk image
----------------------------------------------

; drivers/block/rd.c
#ifdef BUILD_CRAMDISK

/*
* gzip declarations
*/

#define OF(args) args /* macro used in function prototype declarations */
#ifndef memzero
#define memzero(s, n) memset ((s), 0, (n))
#endif
typedef unsigned char uch; /* the three data types used by inflate.c */
typedef unsigned short ush;
typedef unsigned long ulg;
#define INBUFSIZ 4096 /* size of the caller's input buffer */
#define WSIZE 0x8000 /* window size--must be a power of two, and */
/* at least 32K for zip's deflate method */

static uch *inbuf; /* caller's input buffer; not managed by inflate.c */
static uch *window; /* decompression window */
static unsigned insize; /* valid bytes in inbuf */
static unsigned inptr; /* index of next byte to be processed in inbuf */
static unsigned outcnt; /* bytes in output buffer */
static int exit_code;
static long bytes_out; /* total decompressed output length; not used by inflate.c */
static struct file *crd_infp, *crd_outfp;

#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf())
/* Diagnostic functions (stubbed out) */
#define Assert(cond,msg)
#define Trace(x)
#define Tracev(x)
#define Tracevv(x)
#define Tracec(c,x)
#define Tracecv(c,x)

#define STATIC static

static int fill_inbuf(void);
static void flush_window(void);
static void *malloc(int size);
static void free(void *where);
static void error(char *m);
static void gzip_mark(void **);
static void gzip_release(void **);

#include "../../lib/inflate.c"

static void __init *malloc(int size)
{
return kmalloc(size, GFP_KERNEL);
}

static void __init free(void *where)
{
kfree(where);
}

static void __init gzip_mark(void **ptr)
{
/* would record an allocator mark; a no-op here */
}

static void __init gzip_release(void **ptr)
{
/* would release memory back to the mark; a no-op here */
}

/* ===========================================================================
* Fill the input buffer. This is called only when the buffer is empty
* and at least one byte is really needed.
*/

static int __init fill_inbuf(void) /* fill the input buffer */
{
if (exit_code) return -1;
insize = crd_infp->f_op->read(crd_infp, inbuf, INBUFSIZ,
&crd_infp->f_pos);
if (insize == 0) return -1;
inptr = 1;
return inbuf[0];
}

/* ===========================================================================
* Write the output window window[0..outcnt-1] and update crc and bytes_out.
* (Used for the decompressed data only.)
*/

static void __init flush_window(void) /* write out the outcnt bytes in window[] */
{
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, ch;

crd_outfp->f_op->write(crd_outfp, window, outcnt, &crd_outfp->f_pos);
in = window;
for (n = 0; n < outcnt; n++) {
ch = *in++;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8); /* CRC of the output bytes */
}
crc = c;
bytes_out += (ulg)outcnt; /* update the total byte count */
outcnt = 0;
}

static void __init error(char *x) /* called when decompression fails */
{
printk(KERN_ERR "%s", x);
exit_code = 1;
}


static int __init
crd_load(struct file * fp, struct file *outfp)
{
int result;

insize = 0; /* valid bytes in inbuf */
inptr = 0; /* index of next byte to be processed in inbuf */
outcnt = 0; /* bytes in output buffer */
exit_code = 0;
bytes_out = 0;
crc = (ulg)0xffffffffL; /* shift register contents */

crd_infp = fp;
crd_outfp = outfp;
inbuf = kmalloc(INBUFSIZ, GFP_KERNEL);
if (inbuf == 0) {
printk(KERN_ERR "RAMDISK: Couldn't allocate gzip buffer\n");
return -1;
}
window = kmalloc(WSIZE, GFP_KERNEL);
if (window == 0) {
printk(KERN_ERR "RAMDISK: Couldn't allocate gzip window\n");
kfree(inbuf);
return -1;
}
makecrc();
result = gunzip();
kfree(inbuf);
kfree(window);
return result;
}

#endif /* BUILD_CRAMDISK */
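The input side of the same contract, as rd.c uses it, can be sketched in user-space C. In this sketch, a hypothetical in-memory byte array (src, src_len, src_pos) stands in for crd_infp and its read operation. fill_inbuf() reloads up to INBUFSIZ bytes, hands back the first one immediately and sets inptr to 1. It returns -1 at end of input, as rd.c does. get_byte() calls it only when inbuf is exhausted.

```c
#include <assert.h>
#include <string.h>

#define INBUFSIZ 4096

typedef unsigned char uch;

static uch inbuf[INBUFSIZ];
static unsigned insize;      /* valid bytes in inbuf */
static unsigned inptr;       /* index of the next byte to hand out */

/* Hypothetical in-memory "file" standing in for crd_infp. */
static const uch *src;
static size_t src_len, src_pos;

static int fill_inbuf(void)
{
    size_t n = src_len - src_pos;
    if (n > INBUFSIZ)
        n = INBUFSIZ;
    memcpy(inbuf, src + src_pos, n);   /* plays the f_op->read() call */
    src_pos += n;
    insize = (unsigned)n;
    if (insize == 0)
        return -1;                     /* end of input */
    inptr = 1;                         /* byte 0 is returned right now */
    return inbuf[0];
}

#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf())
```

The fast path is a single comparison and array access. The function call is paid only once per INBUFSIZ bytes, which is why inflate.c wants get_byte() as a macro.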

32-bit kernel self-decompression code
-------------------------------------

; arch/i386/boot/compressed/head.S

.text
#include <linux/linkage.h>
#include <asm/segment.h>
.globl startup_32 # entry point: 0x1000 for zImage, 0x100000 for bzImage
startup_32:
cld
cli
movl $(__KERNEL_DS),%eax
movl %eax,%ds
movl %eax,%es
movl %eax,%fs
movl %eax,%gs

lss SYMBOL_NAME(stack_start),%esp # the decompressor's stack is the 16 KB user_stack array defined in misc.c
xorl %eax,%eax
1: incl %eax # check that A20 really IS enabled
movl %eax,0x000000 # loop forever if it isn't
cmpl %eax,0x100000
je 1b

/*
* Initialize eflags. Some BIOS's leave bits like NT set. This would
* confuse the debugger if this code is traced.
* XXX - best to initialize before switching to protected mode.
*/
pushl $0
popfl
/*
* Clear BSS (the decompressor's own BSS segment)
*/
xorl %eax,%eax
movl $ SYMBOL_NAME(_edata),%edi
movl $ SYMBOL_NAME(_end),%ecx
subl %edi,%ecx
cld
rep
stosb
/*
* Do the decompression, and jump to the new kernel..
*/
subl $16,%esp # place for structure on the stack
movl %esp,%eax
pushl %esi # real mode pointer as second arg
pushl %eax # address of structure as first arg
call SYMBOL_NAME(decompress_kernel)
orl %eax,%eax # non-zero: the kernel was decompressed into separate low and high pieces
jnz 3f
popl %esi # discard address
popl %esi # real mode pointer
xorl %ebx,%ebx
ljmp $(__KERNEL_CS), $0x100000 # jump to the decompressed kernel, which goes on to start_kernel

/*
* We come here, if we were loaded high.
* We need to move the move-in-place routine down to 0x1000
* and then start it with the buffer addresses in registers,
* which we got from the stack.
*/
3:
movl $move_routine_start,%esi
movl $0x1000,%edi
movl $move_routine_end,%ecx
subl %esi,%ecx
addl $3,%ecx
shrl $2,%ecx # round up to whole 32-bit words
cld
rep
movsl # copy the merge routine to 0x1000; the low kernel piece starts at 0x2000

popl %esi # discard the address
popl %ebx # real mode pointer
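popl %esi # low_buffer_start: start address of the low kernel piece
popl %ecx # lcount: number of bytes in the low piece
popl %edx # high_buffer_start: start address of the high kernel piece
popl %eax # hcount: number of bytes in the high piece
movl $0x100000,%edi # destination of the merged kernel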
cli # make sure we don't get interrupted
ljmp $(__KERNEL_CS), $0x1000 # and jump to the move routine

/*
* Routine (template) for moving the decompressed kernel in place,
* if we were high loaded. This _must_ PIC-code !
*/
move_routine_start:
movl %ecx,%ebp
shrl $2,%ecx
rep
movsl # copy the first (low) piece in 32-bit words
movl %ebp,%ecx
andl $3,%ecx
rep
movsb # copy the remaining 0-3 bytes
movl %edx,%esi
movl %eax,%ecx # NOTE: rep movsb won't move if %ecx == 0
addl $3,%ecx
shrl $2,%ecx # round up to whole 32-bit words
rep
movsl # copy the second (high) piece in 32-bit words
movl %ebx,%esi # Restore setup pointer
xorl %ebx,%ebx
ljmp $(__KERNEL_CS), $0x100000 # jump to the decompressed kernel, which goes on to start_kernel
move_routine_end:
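The two copy loops of move_routine can be modelled in C on a simulated memory array. This is an illustration, not kernel code. mem[] and the function name are hypothetical, and memmove() stands in for rep movsl/movsb. The low piece is copied as lcount/4 dwords followed by the lcount%4 leftover bytes. The high piece is copied as (hcount+3)/4 whole dwords, so up to three bytes beyond its end are copied as well; they land just past the end of the merged image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C model of move_routine (hypothetical; mem[] stands in for physical memory
 * and addresses are offsets into it). Register roles in the assembly:
 * esi = low_start, ecx = lcount, edx = high_start, eax = hcount, edi = dest. */
static void move_routine(uint8_t *mem, size_t low_start, size_t lcount,
                         size_t high_start, size_t hcount, size_t dest)
{
    size_t edi = dest;
    size_t words = lcount >> 2;              /* shrl $2,%ecx; rep movsl */
    size_t rest = lcount & 3;                /* andl $3,%ecx; rep movsb */

    assert(dest + lcount <= high_start);     /* low copy must not clobber high piece */
    memmove(mem + edi, mem + low_start, words * 4);
    edi += words * 4;
    memmove(mem + edi, mem + low_start + words * 4, rest);
    edi += rest;

    /* addl $3 / shrl $2: the high piece is rounded UP to whole dwords */
    memmove(mem + edi, mem + high_start, ((hcount + 3) >> 2) * 4);
}
```

The assertion restates the layout rule from the overview: the high piece must start at or above where the moved low piece ends.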

; arch/i386/boot/compressed/misc.c

/*
* gzip declarations
*/

#define OF(args) args
#define STATIC static

#undef memset
#undef memcpy
#define memzero(s, n) memset ((s), 0, (n))


typedef unsigned char uch;
typedef unsigned short ush;
typedef unsigned long ulg;

#define WSIZE 0x8000 /* Window size must be at least 32k, */
/* and a power of two */

static uch *inbuf; /* input buffer */
static uch window[WSIZE]; /* Sliding window buffer */

static unsigned insize = 0; /* valid bytes in inbuf */
static unsigned inptr = 0; /* index of next byte to be processed in inbuf */
static unsigned outcnt = 0; /* bytes in output buffer */

/* gzip flag byte */
#define ASCII_FLAG 0x01 /* bit 0 set: file probably ASCII text */
#define CONTINUATION 0x02 /* bit 1 set: continuation of multi-part gzip file */
#define EXTRA_FIELD 0x04 /* bit 2 set: extra field present */
#define ORIG_NAME 0x08 /* bit 3 set: original file name present */
#define COMMENT 0x10 /* bit 4 set: file comment present */
#define ENCRYPTED 0x20 /* bit 5 set: file is encrypted */
#define RESERVED 0xC0 /* bit 6,7: reserved */

#define get_byte() (inptr < insize ? inbuf[inptr++] : fill_inbuf())
/* Diagnostic functions */
#ifdef DEBUG
# define Assert(cond,msg) {if(!(cond)) error(msg);}
# define Trace(x) fprintf x
# define Tracev(x) {if (verbose) fprintf x ;}
# define Tracevv(x) {if (verbose>1) fprintf x ;}
# define Tracec(c,x) {if (verbose && (c)) fprintf x ;}
# define Tracecv(c,x) {if (verbose>1 && (c)) fprintf x ;}
#else
# define Assert(cond,msg)
# define Trace(x)
# define Tracev(x)
# define Tracevv(x)
# define Tracec(c,x)
# define Tracecv(c,x)
#endif

static int fill_inbuf(void);
static void flush_window(void);
static void error(char *m);
static void gzip_mark(void **);
static void gzip_release(void **);

/*
* This is set up by the setup-routine at boot-time
*/
static unsigned char *real_mode; /* Pointer to real-mode data */

#define EXT_MEM_K (*(unsigned short *)(real_mode + 0x2))
#ifndef STANDARD_MEMORY_BIOS_CALL
#define ALT_MEM_K (*(unsigned long *)(real_mode + 0x1e0))
#endif
#define SCREEN_INFO (*(struct screen_info *)(real_mode+0))

extern char input_data[];
extern int input_len;

static long bytes_out = 0;
static uch *output_data;
static unsigned long output_ptr = 0;


static void *malloc(int size);
static void free(void *where);
static void error(char *m);
static void gzip_mark(void **);
static void gzip_release(void **);

static void puts(const char *);

extern int end;
static long free_mem_ptr = (long)&end;
static long free_mem_end_ptr;

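#define INPLACE_MOVE_ROUTINE 0x1000 /* run address of the code that merges the kernel pieces */
#define LOW_BUFFER_START 0x2000 /* start of the low decompressed kernel piece */
#define LOW_BUFFER_MAX 0x90000 /* upper limit of the low decompressed kernel piece */
#define HEAP_SIZE 0x3000 /* heap reserved for the decompressor; it starts at the end of BSS */
static unsigned int low_buffer_end, low_buffer_size;
static int high_loaded =0;
static uch *high_buffer_start /* = (uch *)(((ulg)&end) + HEAP_SIZE)*/;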

static char *vidmem = (char *)0xb8000;
static int vidport;
static int lines, cols;

#include "../../../../lib/inflate.c"

static void *malloc(int size)
{
void *p;

if (size < 0) error("Malloc error\n");
if (free_mem_ptr <= 0) error("Memory error\n");

free_mem_ptr = (free_mem_ptr + 3) & ~3; /* Align */

p = (void *)free_mem_ptr;
free_mem_ptr += size;

if (free_mem_ptr >= free_mem_end_ptr)
error("\nOut of memory\n");

return p;
}

static void free(void *where)
{ /* Don't care */
}

static void gzip_mark(void **ptr)
{
*ptr = (void *) free_mem_ptr;
}

static void gzip_release(void **ptr)
{
free_mem_ptr = (long) *ptr;
}
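The allocator above is a bump allocator. malloc() aligns free_mem_ptr to 4 bytes and advances it. free() does nothing. gzip_mark() and gzip_release() save and restore free_mem_ptr, so everything allocated after a mark is reclaimed at once. The following is a minimal user-space sketch. It assumes a fixed static arena (hypothetical) in place of the region between the end of BSS and free_mem_end_ptr, and returns NULL where misc.c would call error() and halt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed arena standing in for [end of BSS, free_mem_end_ptr). */
static unsigned char arena[0x3000];
static size_t free_off;                           /* plays free_mem_ptr */
static const size_t free_end = sizeof arena;      /* plays free_mem_end_ptr */

static void *bump_malloc(size_t size)
{
    free_off = (free_off + 3) & ~(size_t)3;       /* align to 4 bytes */
    assert(free_off <= free_end);
    if (size >= free_end - free_off)              /* misc.c: error() and halt */
        return NULL;
    void *p = arena + free_off;
    free_off += size;
    return p;
}

static void bump_free(void *where) { (void)where; }               /* "Don't care" */
static void bump_mark(size_t *mark) { *mark = free_off; }         /* gzip_mark */
static void bump_release(const size_t *mark) { free_off = *mark; } /* gzip_release */
```

Because inflate.c allocates its Huffman tables in a strictly nested fashion, mark/release is enough to reclaim memory without a real free().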

static void scroll(void)
{
int i;

memcpy ( vidmem, vidmem + cols * 2, ( lines - 1 ) * cols * 2 );
for ( i = ( lines - 1 ) * cols * 2; i < lines * cols * 2; i += 2 )
vidmem[ i ] = ' ';
}

static void puts(const char *s)
{
int x,y,pos;
char c;

x = SCREEN_INFO.orig_x;
y = SCREEN_INFO.orig_y;

while ( ( c = *s++ ) != '\0' ) {
if ( c == '\n' ) {
x = 0;
if ( ++y >= lines ) {
scroll();
y--;
}
} else {
vidmem [ ( x + cols * y ) * 2 ] = c;
if ( ++x >= cols ) {
x = 0;
if ( ++y >= lines ) {
scroll();
y--;
}
}
}
}

SCREEN_INFO.orig_x = x;
SCREEN_INFO.orig_y = y;

pos = (x + cols * y) * 2; /* Update cursor position */
outb_p(14, vidport);
outb_p(0xff & (pos >> 9), vidport+1);
outb_p(15, vidport);
outb_p(0xff & (pos >> 1), vidport+1);
}
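The cursor arithmetic in puts() is easy to misread. In colour text mode, each screen cell takes two bytes (the character, then its attribute), so pos is a byte offset into video memory. The CRT controller expects a cell index, pos / 2, split into a high byte (register 14) and a low byte (register 15). pos >> 9 is therefore (pos / 2) >> 8. The hypothetical helpers below reproduce that calculation.

```c
#include <assert.h>

/* Byte offset of character cell (x, y) in text-mode video memory:
 * each cell is 2 bytes (character, attribute). */
static int cell_offset(int x, int y, int cols)
{
    return (x + cols * y) * 2;
}

/* Split the cell index pos / 2 into the bytes written to CRT controller
 * registers 14 (cursor location high) and 15 (cursor location low). */
static void cursor_bytes(int pos, unsigned char *hi, unsigned char *lo)
{
    assert(pos % 2 == 0);
    *hi = (unsigned char)(0xff & (pos >> 9));   /* (pos / 2) >> 8 */
    *lo = (unsigned char)(0xff & (pos >> 1));   /* (pos / 2) & 0xff */
}
```

For the bottom-right cell of an 80x25 screen, pos is 3998, the cell index is 1999 (0x7CF), and the two register values are 0x07 and 0xCF.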

void* memset(void* s, int c, size_t n)
{
int i;
char *ss = (char*)s;

for (i=0;i<n;i++) ss[i] = c;
return s;
}

void* memcpy(void* __dest, __const void* __src,
size_t __n)
{
int i;
char *d = (char *)__dest, *s = (char *)__src;

for (i=0;i<__n;i++) d[i] = s[i];
return __dest;
}

/* ===========================================================================
* Fill the input buffer. This is called only when the buffer is empty
* and at least one byte is really needed.
*/
static int fill_inbuf(void)
{
if (insize != 0) {
error("ran out of input data\n");
}

inbuf = input_data;
insize = input_len;
inptr = 1;
return inbuf[0];
}

/* ===========================================================================
* Write the output window window[0..outcnt-1] and update crc and bytes_out.
* (Used for the decompressed data only.)
*/
static void flush_window_low(void)
{
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, *out, ch;
in = window;
out = &output_data[output_ptr];
for (n = 0; n < outcnt; n++) {
ch = *out++ = *in++;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
}
crc = c;
bytes_out += (ulg)outcnt;
output_ptr += (ulg)outcnt;
outcnt = 0;
}

static void flush_window_high(void)
{
ulg c = crc; /* temporary variable */
unsigned n;
uch *in, ch;
in = window;
for (n = 0; n < outcnt; n++) {
ch = *output_data++ = *in++;
if ((ulg)output_data == low_buffer_end) output_data=high_buffer_start;
c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
}
crc = c;
bytes_out += (ulg)outcnt;
outcnt = 0;
}

static void flush_window(void)
{
if (high_loaded) flush_window_high();
else flush_window_low();
}
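The only difference between the two flush routines is where the bytes go. flush_window_low() writes sequentially from 1 MB. flush_window_high() fills the low buffer and, the moment it reaches low_buffer_end, continues at high_buffer_start. The following is a reduced C sketch of that control flow, with the CRC update left out. Two small hypothetical arrays (low_area, high_area) stand in for the low buffer and the high buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for 0x2000..low_buffer_end and high_buffer_start.. */
static unsigned char low_area[8], high_area[32];
static unsigned char *output_data = low_area;
static unsigned char *const low_end = low_area + sizeof low_area;

/* Same control flow as flush_window_high(): copy byte by byte and switch
 * to the high buffer as soon as the low buffer is full. */
static void flush_high(const unsigned char *window, unsigned outcnt)
{
    for (unsigned n = 0; n < outcnt; n++) {
        *output_data++ = window[n];
        if (output_data == low_end)
            output_data = high_area;
    }
}
```

Because output_data persists between calls, a window flush that straddles the boundary is split correctly, and later flushes continue in the high buffer.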

static void error(char *x)
{
puts("\n\n");
puts(x);
puts("\n\n -- System halted");

while(1); /* Halt */
}

#define STACK_SIZE (4096)

long user_stack [STACK_SIZE];

struct {
long * a;
short b;
} stack_start = { &user_stack [STACK_SIZE] , __KERNEL_DS };

void setup_normal_output_buffer(void) /* zImage: decompress straight to 1 MB */
{
#ifdef STANDARD_MEMORY_BIOS_CALL
if (EXT_MEM_K < 1024) error("Less than 2MB of memory.\n");
#else
if ((ALT_MEM_K > EXT_MEM_K ? ALT_MEM_K : EXT_MEM_K) < 1024) error("Less than 2MB of memory.\n");
#endif
output_data = (char *)0x100000; /* Points to 1M */
free_mem_

[火星人] Linux boot process... [for details, see Zhao Jiong's "Linux Kernel 0.11 Detailed Annotations"]

http://coctec.com/docs/linux/show-post-189392.html