[Completed] MIT 6.828 lab 1: C, Assembly, Tools and Bootstrapping

Overview

It took me 30+ hours, but I finally got it done, orz.

Part 1: PC Bootstrap

The PC's Physical Address Space

The 8086/8088 era

+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000

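The 8086/8088 has only 20 address lines, so its physical address space is 2^20 bytes = 1MB, running from 0x00000 to 0xFFFFF. The first 640KB, starting at 0x00000, is called "low memory" and is the only RAM the PC can actually use. The 384KB from 0xA0000 to 0xFFFFF is reserved by the hardware for things like video display buffers and the BIOS, which is held in non-volatile memory.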

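The 80286/80386 era and later

For backward compatibility, the 0 to 1MB range keeps its original layout. As a result the address space appears to have a "hole" (though to me it really looks like two "holes"... isn't a "hole" supposed to be something empty?). The RAM the PC can use is split by this "hole" (that is, 0xA0000 to 0xFFFFF) into two parts: the 640KB of low memory from 0x00000000 to 0x0009FFFF, and extended memory from 0x00100000 upward.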

+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000
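
In addition, the very top of the address space is usually reserved by the BIOS for memory-mapped 32-bit PCI devices. Memory-mapped I/O is an I/O addressing scheme in which memory and I/O devices share the same address space. See [Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O) for details. For PCI devices, see [PCI_Express](https://en.wikipedia.org/wiki/PCI_Express) and [A Deep Dive into PCI and PCIe, Part 1: Hardware (in Chinese)](https://zhuanlan.zhihu.com/p/26172972).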

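Modern processors can support more than 4GB of memory. To stay backward compatible, the address space then gains one more "hole": the top of the 32-bit range still has to be left free so that the 32-bit devices can be mapped there.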

The ROM BIOS

Booting in QEMU, the first instruction executed on entering the BIOS is:

[f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b

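This shows that the first instruction the PC executes is at physical address 0xffff0.

Single-stepping with the si command then gives the next few instructions: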

[f000:e05b]    0xfe05b: cmpl   $0x0,%cs:0x6ac8
[f000:e062]    0xfe062: jne    0xfd2e1
[f000:e066]    0xfe066: xor    %dx,%dx
[f000:e068]    0xfe068: mov    %dx,%ss
[f000:e06a]    0xfe06a: mov    $0x7000,%esp
[f000:e070]    0xfe070: mov    $0xf34c2,%edx
[f000:e076]    0xfe076: jmp    0xfd15c
[f000:d15c]    0xfd15c: mov    %eax,%ecx
...

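If this only half makes sense, don't panic. It's not a big deal, because you don't need to work out exactly what the BIOS is doing here. I do suggest brushing up on x86 assembly first, for example with "General Registers (AX, BX, CX, and DX)" and the "Intel 80386 Reference Programmer's Manual Table of Contents". I also strongly recommend skimming the GDB tutorial on examining data, especially the sections on inspecting memory and registers; it helps a lot in figuring out what the BIOS is up to. (x [memory] shows the contents of an address, x/i [memory] prints the instruction at that address in human-readable form, and p/x $[register] prints the value of a register.)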

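So what does the BIOS do, roughly? Mainly it sets up an interrupt descriptor table (in effect, the x86 implementation of an interrupt vector table), initializes some hardware devices, and then looks for a "bootable" device. If it finds one, the BIOS loads that device's boot loader into memory and transfers control to it.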

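First, a few terms. A boot loader is a small program that runs before the OS is loaded. It usually lives in the first sector of the disk, which is why that sector is also called the boot sector. The master boot record (MBR) that we hear about more often is just a special kind of boot sector for partitioned media.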

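Incidentally, whether a device is "bootable" is decided by the two boot signature bytes 0x55 and 0xAA. Specifically, if the last two bytes of sector 0 on a device (offsets 510 and 511) are 0x55 and 0xAA respectively, the device is considered bootable. See boot sequence.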

Part 2: The Boot Loader

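Once initialization is done, the BIOS loads the boot loader into memory at addresses 0x7c00 through 0x7dff (one 512-byte sector).

Where does the magic number 0x7c00 come from? It doesn't really matter, but if you're curious, see "Why BIOS loads MBR into 0x7C00 in x86 ?". It's enough to know that this magic number isn't really an x86 thing; it comes from IBM's BIOS development team.

The boot loader consists of an assembly file, boot/boot.S, and a C file, boot/main.c.

Let's first look at what boot/boot.S does.

Before that, though, it's worth reviewing real mode and protected mode.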

real mode / protected mode

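  * [Real_mode](https://en.wikipedia.org/wiki/Real_mode): the address space is limited to 2^20 bytes (because the address bus is 20 bits wide). There is no notion of virtual memory; all memory is real physical memory. In real mode, segments sit at fixed locations in physical memory.
  * 16-bit Protected Mode: debuted on the Intel 80286 and introduced virtual memory for the first time. Relying on the principle of locality, only the parts of a program that are needed to run are kept in memory, while the parts not needed for now are stored on disk. When a segment comes back from disk into memory, it may end up somewhere different from before. Since segment locations are no longer fixed, the [Global Descriptor Table, GDT](https://en.wikipedia.org/wiki/Global_Descriptor_Table) was introduced to describe each segment: whether it is in memory, where in memory if so, and its access permissions. Because the registers are still 16-bit, a segment is still at most 64KB. For background on virtual memory, see [OSTEP](http://pages.cs.wisc.edu/~remzi/OSTEP/).
  * 32-bit Protected Mode: debuted on the Intel 80386. Compared with the 80286, its registers are 32-bit, so the maximum segment size grows to 4GB (2^32). And since segments are no longer as small as 64KB, the old policy of keeping a whole segment either entirely in memory or entirely on disk stops making sense. So [paging](https://en.wikipedia.org/wiki/Paging) was introduced: a segment is divided into smaller pages, which allows only part of a segment to be in memory. For paging, see chapter 18 of [OSTEP](http://pages.cs.wisc.edu/~remzi/OSTEP/).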

It's worth noting that a CPU that supports protected mode still starts up in real mode for backward compatibility, and only switches to protected mode afterwards.

_When a processor that supports x86 protected mode is powered on, it begins executing instructions in [real mode](https://en.wikipedia.org/wiki/Real_mode), in order to maintain [backward compatibility](https://en.wikipedia.org/wiki/Backward_compatibility) with earlier x86 processors.[[4]](https://en.wikipedia.org/wiki/Protected_mode#cite_note-Real_mode_on_powered_on-4) Protected mode may only be entered after the system software sets up one descriptor table and enables the Protection Enable (PE) [bit](https://en.wikipedia.org/wiki/Bit) in the [control register](https://en.wikipedia.org/wiki/Control_register) 0 (CR0)_

What boot/boot.S does

    #include <inc/mmu.h>

    # Start the CPU: switch to 32-bit protected mode, jump into C.
    # The BIOS loads this code from the first sector of the hard disk into
    # memory at physical address 0x7c00 and starts executing in real mode
    # with %cs=0 %ip=7c00.

    .set PROT_MODE_CSEG, 0x8         # kernel code segment selector
    .set PROT_MODE_DSEG, 0x10        # kernel data segment selector
    .set CR0_PE_ON,      0x1         # protected mode enable flag

    .globl start
    start:
      .code16                     # Assemble for 16-bit mode
      cli                         # Disable interrupts
      cld                         # String operations increment

      # Set up the important data segment registers (DS, ES, SS).
      xorw    %ax,%ax             # Segment number zero
      movw    %ax,%ds             # - Data Segment
      movw    %ax,%es             # - Extra Segment
      movw    %ax,%ss             # - Stack Segment

      # Enable A20:
      #   For backwards compatibility with the earliest PCs, physical
      #   address line 20 is tied low, so that addresses higher than
      #   1MB wrap around to zero by default.  This code undoes this.
    seta20.1:
      inb     $0x64,%al               # Wait for not busy
      testb   $0x2,%al
      jnz     seta20.1

      movb    $0xd1,%al               # 0xd1 - port 0x64
      outb    %al,$0x64

    seta20.2:
      inb     $0x64,%al               # Wait for not busy
      testb   $0x2,%al
      jnz     seta20.2

      movb    $0xdf,%al               # 0xdf - port 0x60
      outb    %al,$0x60

      # Switch from real to protected mode, using a bootstrap GDT
      # and segment translation that makes virtual addresses
      # identical to their physical addresses, so that the
      # effective memory map does not change during the switch.
      lgdt    gdtdesc  # lgdt means load global descriptor table
      movl    %cr0, %eax
      orl     $CR0_PE_ON, %eax  # cr0 = cr0 | 1
      movl    %eax, %cr0

      # Jump to next instruction, but in 32-bit code segment.
      # Switches processor into 32-bit mode.
      ljmp    $PROT_MODE_CSEG, $protcseg

      .code32                     # Assemble for 32-bit mode
    protcseg:
      # Set up the protected-mode data segment registers
      movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
      movw    %ax, %ds                # - DS: Data Segment
      movw    %ax, %es                # - ES: Extra Segment
      movw    %ax, %fs                # - FS
      movw    %ax, %gs                # - GS
      movw    %ax, %ss                # - SS: Stack Segment

      # Set up the stack pointer and call into C.
      movl    $start, %esp
      call bootmain

      # If bootmain returns (it shouldn't), loop.
    spin:
      jmp spin

    # Bootstrap GDT
    .p2align 2                                # force 4 byte alignment
    gdt:
      SEG_NULL                              # null seg
      SEG(STA_X|STA_R, 0x0, 0xffffffff)     # code seg
      SEG(STA_W, 0x0, 0xffffffff)           # data seg

    gdtdesc:
      .word   0x17                            # sizeof(gdt) - 1
      .long   gdt                             # address gdt

The first time I saw this code, the Enable A20 part was rather baffling.

See "A20 - a pain from the past". The key passage is:

One sets the output port of the keyboard controller by first writing 0xd1 to port 0x64, and then the desired value of the output port to port 0x60. One usually sees the values 0xdd and 0xdf used to disable/enable A20.

The next potentially confusing part is the "bootstrap GDT". See the cs421 x86 Assembly Guide, especially:

    .data
    var:
    .byte 64	/* Declare a byte, referred to as location var, containing the value 64. */
    .byte 10	/* Declare a byte with no label, containing the value 10. Its location is var + 1. */
    x:
    .short 42	/* Declare a 2-byte value initialized to 42, referred to as location x. */
    y:
    .long 30000    	/* Declare a 4-byte value, referred to as location y, initialized to 30000. */


    s:
    .long 1, 2, 3	/* Declare three 4-byte values, initialized to 1, 2, and 3.
    The value at location s + 8 will be 3. */
    barr:
    .zero 10	/* Declare 10 bytes starting at location barr, initialized to 0. */
    str:
    .string "hello"   	/* Declare 6 bytes starting at the address str initialized to
    the ASCII character values for hello followed by a nul (0) byte. */

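This tells us what gdtdesc does: at the location gdtdesc it defines a word (2 bytes) with the value 0x17, which, per the comment, is sizeof(gdt) - 1, i.e. the size of the gdt block (3 entries of 8 bytes each, 0x18 in total) minus one. Then at gdtdesc+2 it defines a long (4 bytes) holding the address of gdt.

Here gdt and gdtdesc are both labels. A label simply marks a memory address so that it is easy to refer to.

Specifically, the value of a label is the memory address of whatever comes right after it (the first instruction, or here the first piece of data).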

We use the notation 

Next, the gdt itself. SEG looks like a macro, and the relevant part of inc/mmu.h makes everything clear.

    #ifdef __ASSEMBLER__

    /*
     * Macros to build GDT entries in assembly.
     */
    #define SEG_NULL                                                \
            .word 0, 0;                                             \
            .byte 0, 0, 0, 0
    #define SEG(type,base,lim)                                      \
            .word (((lim) >> 12) & 0xffff), ((base) & 0xffff);      \
            .byte (((base) >> 16) & 0xff), (0x90 | (type)),         \
                    (0xC0 | (((lim) >> 28) & 0xf)), (((base) >> 24) & 0xff)

    #else   // not __ASSEMBLER__

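The next possibly unclear part is cr0.

Near the top of the code there is CR0_PE_ON, with the value 0x1. Later the code computes cr0 = cr0 | 0x1, which according to the comment turns on protected mode. Understanding this much is enough, but let me say a bit more. Control registers are registers that control the CPU's behavior, and cr0 is one of the x86 control registers. cr0 is a 32-bit register in which certain bits have names and fixed functions. Bit 0, for example, is named "Protected Mode Enable", PE for short; when it is 1, protected mode is on.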

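One last small detail is ".globl start". What does ".globl" mean, and why make the label start global? See "What is global _start in assembly language?". In plain words, a label declared .globl is exported in the generated .o file; otherwise the linker cannot find the symbol. Since start is the entry point of boot.S, the linker needs to see it.

Finally, taking the big-picture view, what does boot.S do? It was already mentioned in the previous subsection: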

_When a processor that supports x86 protected mode is powered on, it begins executing instructions in [real mode](https://en.wikipedia.org/wiki/Real_mode), in order to maintain [backward compatibility](https://en.wikipedia.org/wiki/Backward_compatibility) with earlier x86 processors.[[4]](https://en.wikipedia.org/wiki/Protected_mode#cite_note-Real_mode_on_powered_on-4) Protected mode may only be entered after the system software sets up one descriptor table and enables the Protection Enable (PE) [bit](https://en.wikipedia.org/wiki/Bit) in the [control register](https://en.wikipedia.org/wiki/Control_register) 0 (CR0)_

What boot/main.c does

    #include <inc/x86.h>
    #include <inc/elf.h>

    /**********************************************************************
     * This a dirt simple boot loader, whose sole job is to boot
     * an ELF kernel image from the first IDE hard disk.
     *
     * DISK LAYOUT
     *  * This program(boot.S and main.c) is the bootloader.  It should
     *    be stored in the first sector of the disk.
     *
     *  * The 2nd sector onward holds the kernel image.
     *
     *  * The kernel image must be in ELF format.
     *
     * BOOT UP STEPS
     *  * when the CPU boots it loads the BIOS into memory and executes it
     *
     *  * the BIOS initializes devices, sets up the interrupt routines, and
     *    reads the first sector of the boot device(e.g., hard-drive)
     *    into memory and jumps to it.
     *
     *  * Assuming this boot loader is stored in the first sector of the
     *    hard-drive, this code takes over...
     *
     *  * control starts in boot.S -- which sets up protected mode,
     *    and a stack so C code then run, then calls bootmain()
     *
     *  * bootmain() in this file takes over, reads in the kernel and jumps to it.
     **********************************************************************/

    #define SECTSIZE        512
    #define ELFHDR          ((struct Elf *) 0x10000) // scratch space

    void readsect(void*, uint32_t);
    void readseg(uint32_t, uint32_t, uint32_t);

    void
    bootmain(void)
    {
            struct Proghdr *ph, *eph;

            // read 1st page off disk
            readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);

            // is this a valid ELF?
            if (ELFHDR->e_magic != ELF_MAGIC)
                    goto bad;

            // load each program segment (ignores ph flags)
            ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
            eph = ph + ELFHDR->e_phnum;
            for (; ph < eph; ph++)
                    // p_pa is the load address of this segment (as well
                    // as the physical address)
                    readseg(ph->p_pa, ph->p_memsz, ph->p_offset);

            // call the entry point from the ELF header
            // note: does not return!
            ((void (*)(void)) (ELFHDR->e_entry))();

    bad:
            outw(0x8A00, 0x8A00);
            outw(0x8A00, 0x8E00);
            while (1)
                    /* do nothing */;
    }

    // Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
    // Might copy more than asked
    void
    readseg(uint32_t pa, uint32_t count, uint32_t offset)
    {
            uint32_t end_pa;

            end_pa = pa + count;

            // round down to sector boundary
            pa &= ~(SECTSIZE - 1);

            // translate from bytes to sectors, and kernel starts at sector 1
            offset = (offset / SECTSIZE) + 1;

            // If this is too slow, we could read lots of sectors at a time.
            // We'd write more to memory than asked, but it doesn't matter --
            // we load in increasing order.
            while (pa < end_pa) {
                    // Since we haven't enabled paging yet and we're using
                    // an identity segment mapping (see boot.S), we can
                    // use physical addresses directly.  This won't be the
                    // case once JOS enables the MMU.
                    readsect((uint8_t*) pa, offset);
                    pa += SECTSIZE;
                    offset++;
            }
    }

    void
    waitdisk(void)
    {
            // wait for disk ready
            while ((inb(0x1F7) & 0xC0) != 0x40)
                    /* do nothing */;
    }

    void
    readsect(void *dst, uint32_t offset)
    {
            // wait for disk to be ready
            waitdisk();

            outb(0x1F2, 1);         // count = 1
            outb(0x1F3, offset);
            outb(0x1F4, offset >> 8);
            outb(0x1F5, offset >> 16);
            outb(0x1F6, (offset >> 24) | 0xE0);
            outb(0x1F7, 0x20);      // cmd 0x20 - read sectors

            // wait for disk to be ready
            waitdisk();

            // read a sector
            insl(0x1F0, dst, SECTSIZE/4);
    }

First, notice a few things that look like assembly instructions, such as outb. Their definitions can be found in inc/x86.h.

    static inline void
    outb(int port, uint8_t data)
    {
            asm volatile("outb %0,%w1" : : "a" (data), "d" (port));
    }


    static inline void
    insl(int port, void *addr, int cnt)
    {
            asm volatile("cld\n\trepne\n\tinsl"
                         : "=D" (addr), "=c" (cnt)
                         : "d" (port), "0" (addr), "1" (cnt)
                         : "memory", "cc");
    }
    static inline uint8_t
    inb(int port)
    {
            uint8_t data;
            asm volatile("inb %w1,%0" : "=a" (data) : "d" (port));
            return data;
    }

It turns out they are just thin C wrappers around assembly. This is called "inline assembly"; see Brennan's Guide to Inline Assembly. The volatile keyword tells GCC not to optimize this code away or move it.

If your assembly statement _must_ execute where you put it, (i.e. must not be moved out of a loop as an optimization), put the keyword **volatile** after **asm** and before the ()'s. To be ultra-careful, use

asm volatile (...whatever...);

However, I would like to point out that if your assembly's only purpose is to calculate the output registers, with no other side effects, you should leave off the volatile keyword so your statement will be processed into GCC's common subexpression elimination optimization.

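The comment says the job is to "boot an ELF kernel image from the first IDE hard disk", so first we need to know what ELF is. ELF is simply a file format; the name stands for "Executable and Linkable Format". See Executable_and_Linkable_Format#File_layout. I recommend reading that whole section: it is short, and it will be very useful later.

With inc/elf.h and the comments in main.c as a guide, the overall purpose of this code becomes clear: read the ELF-format kernel image from disk into memory and hand control over to it.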

    #ifndef JOS_INC_ELF_H
    #define JOS_INC_ELF_H

    #define ELF_MAGIC 0x464C457FU	/* "\x7FELF" in little endian */

    struct Elf {
    	uint32_t e_magic;	// must equal ELF_MAGIC
    	uint8_t e_elf[12];
    	/* e_elf[0] 1 for 32-bit, 2 for 64-bit
    	        [1] 1 for little endianness, 2 for big endianness
    	        [2] version type
    	        [3] target OS
    	        [4] ABI version
    	        [5..11]  unused
    	*/
    	uint16_t e_type;     // object file type
    	uint16_t e_machine;  // instruction set arch, x86/MIPS/IA-64 etc.
    	uint32_t e_version;
    	uint32_t e_entry;    // the memory address of the entry point where the process starts executing.
    	uint32_t e_phoff;    // points to the start of the program header table.
    	uint32_t e_shoff;    // points to the start of the section header table.
    	uint32_t e_flags;
    	uint16_t e_ehsize;   // size of this header. 64 bytes for 64-bit, 52 bytes for 32-bit
    	uint16_t e_phentsize; // the size of a program header table entry.
    	uint16_t e_phnum;    // the number of entries in the program header table.
    	uint16_t e_shentsize; // the size of a section header table entry.
    	uint16_t e_shnum;    // the number of entries in the section header table.
    	uint16_t e_shstrndx;
    };

    struct Proghdr {
    	uint32_t p_type;    // type of the segment
    	uint32_t p_offset;  // offset of the segment in the file image
    	uint32_t p_va;      // virtual address of the segment in memory
    	uint32_t p_pa;      // physical address for segment(?)
    	uint32_t p_filesz;  // Size in bytes of the segment in the file image. May be 0.
    	uint32_t p_memsz;   // Size in bytes of the segment in memory. May be 0.
    	uint32_t p_flags;
    	uint32_t p_align;   // 0 and 1 specify no alignment. Otherwise should be a positive, integral power of 2
    };

    struct Secthdr {
    	uint32_t sh_name; // An offset to a string in the .shstrtab section that represents the name of this section
    	uint32_t sh_type; // the type of this header
    	uint32_t sh_flags; // the attributes of the section
    	uint32_t sh_addr; // Virtual address of the section in memory
    	uint32_t sh_offset;  // Offset of the section in the file image
    	uint32_t sh_size;    // Size in bytes of the section in the file image. May be 0.
    	uint32_t sh_link;
    	uint32_t sh_info;
    	uint32_t sh_addralign;
    	uint32_t sh_entsize;
    	/*
    	  Contains the size, in bytes, of each entry, for sections that contain fixed-size entries.
    	  Otherwise, this field contains zero.
    	 */
    };

    // Values for Proghdr::p_type
    #define ELF_PROG_LOAD		1

    // Flag bits for Proghdr::p_flags
    #define ELF_PROG_FLAG_EXEC	1
    #define ELF_PROG_FLAG_WRITE	2
    #define ELF_PROG_FLAG_READ	4

    // Values for Secthdr::sh_type
    #define ELF_SHT_NULL		0
    #define ELF_SHT_PROGBITS	1
    #define ELF_SHT_SYMTAB		2
    #define ELF_SHT_STRTAB		3

    // Values for Secthdr::sh_name
    #define ELF_SHN_UNDEF		0

    #endif /* !JOS_INC_ELF_H */

Now a few details. We know readsect reads one sector, but how do we know a sector is read this way? See the "x86 Directions" section of ATA_PIO_Mode.

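The second detail is "((void (*)(void)) (ELFHDR->e_entry))()". It looks intimidating at first glance, but it is just a function pointer: e_entry is the address of the entry function, and the value is cast to a pointer to a function that takes no arguments and returns nothing. Calling through that pointer hands control to the ELF-format kernel image.

Next, let's look at the disassembly obtained from compiling boot.S and main.c.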

  1
  2    obj/boot/boot.out:     file format elf32-i386                                                                                                                                                 
  3                                                                                                                                                                                                  
  4                                                                                                                                                                                                  
  5    Disassembly of section .text:                                                                                                                                                                 
  6                                                                                                                                                                                                  
  7    00007c00 <start>:                                                                                                                                                                             
  8    .set CR0_PE_ON,      0x1         # protected mode enable flag                                                                                                                                 
  9                                                                                                                                                                                                  
 10    .globl start                                                                                                                                                                                  
 11    start:                                                                                                                                                                                        
 12      .code16                     # Assemble for 16-bit mode                                                                                                                                      
 13      cli                         # Disable interrupts                                                                                                                                            
 14        7c00:       fa                      cli                                                                                                                                                   
 15      cld                         # String operations increment                                                                                                                                   
 16        7c01:       fc                      cld                                                                                                                                                   
 17                                                                                                                                                                                                  
 18      # Set up the important data segment registers (DS, ES, SS).                                                                                                                                 
 19      xorw    %ax,%ax             # Segment number zero                                                                                                                                           
 20        7c02:       31 c0                   xor    x,x                                                                                                                                      
 21      movw    %ax,%ds             # - Data Segment                                                                                                                                               
 22        7c04:       8e d8                   mov    x,%ds                                                                                                                                       
 23      movw    %ax,%es             # - Extra Segment                                                                                                                                              
 24        7c06:       8e c0                   mov    x,%es                                                                                                                                       
 25      movw    %ax,%ss             # - Stack Segment                                                                                                                                              
 26        7c08:       8e d0                   mov    x,%ss                                                                                                                                       
 27                                                                                                                                                                                                  
 28    00007c0a <seta20.1>:
 29      # Enable A20:
 30      #   For backwards compatibility with the earliest PCs, physical
 31      #   address line 20 is tied low, so that addresses higher than
 32      #   1MB wrap around to zero by default.  This code undoes this.
 33    seta20.1:
 34      inb     $0x64,%al               # Wait for not busy
 35        7c0a:       e4 64                   in     $0x64,%al
 36      testb   $0x2,%al
 37        7c0c:       a8 02                   test   $0x2,%al
 38      jnz     seta20.1
 39        7c0e:       75 fa                   jne    7c0a <seta20.1>
 40    
 41      movb    $0xd1,%al               # 0xd1 - port 0x64
 42        7c10:       b0 d1                   mov    $0xd1,%al
 43      outb    %al,$0x64
 44        7c12:       e6 64                   out    %al,$0x64
 45    
 46    00007c14 <seta20.2>:
 47    
 48    seta20.2:
 49      inb     $0x64,%al               # Wait for not busy
 50        7c14:       e4 64                   in     $0x64,%al
 51      testb   $0x2,%al
 52        7c16:       a8 02                   test   $0x2,%al
 53      jnz     seta20.2
 54        7c18:       75 fa                   jne    7c14 <seta20.2>
 55    
 56      movb    $0xdf,%al               # 0xdf - port 0x60
 57        7c1a:       b0 df                   mov    $0xdf,%al
 58      outb    %al,$0x60
 59        7c1c:       e6 60                   out    %al,$0x60
 60    
 61      # Switch from real to protected mode, using a bootstrap GDT
 62      # and segment translation that makes virtual addresses 
 63      # identical to their physical addresses, so that the 
 64      # effective memory map does not change during the switch.
 65      lgdt    gdtdesc  # lgdt means load global descriptor table
 66        7c1e:       0f 01 16                lgdtl  (%esi)
 67        7c21:       64 7c 0f                fs jl  7c33 <protcseg+0x1>
 68      movl    %cr0, x
 69        7c24:       20 c0                   and    %al,%al
 70      orl     $CR0_PE_ON, x  # crx = crx | 1
 71        7c26:       66 83 c8 01             or     $0x1,%ax
 72      movl    x, %cr0
 73        7c2a:       0f 22 c0                mov    x,%cr0
 74      
 75      # Jump to next instruction, but in 32-bit code segment.
 76      # Switches processor into 32-bit mode.
 77      ljmp    $PROT_MODE_CSEG, $protcseg
 78        7c2d:       ea                      .byte 0xea
 79        7c2e:       32 7c 08 00             xor    0x0(%eax,%ecx,1),%bh
 80    
 81    00007c32 <protcseg>:
 82    
 83      .code32                     # Assemble for 32-bit mode
 84    protcseg:
 85      # Set up the protected-mode data segment registers
 86      movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
 87        7c32:       66 b8 10 00             mov    $0x10,%ax
 88      movw    %ax, %ds                # - DS: Data Segment
 89        7c36:       8e d8                   mov    %eax,%ds
 90      movw    %ax, %es                # - ES: Extra Segment
 91        7c38:       8e c0                   mov    %eax,%es
 92      movw    %ax, %fs                # - FS
 93        7c3a:       8e e0                   mov    %eax,%fs
 94      movw    %ax, %gs                # - GS
 95        7c3c:       8e e8                   mov    %eax,%gs
 96      movw    %ax, %ss                # - SS: Stack Segment
 97        7c3e:       8e d0                   mov    %eax,%ss
 98      
 99      # Set up the stack pointer and call into C.
100      movl    $start, %esp
101        7c40:       bc 00 7c 00 00          mov    $0x7c00,%esp
102      call bootmain
103        7c45:       e8 c0 00 00 00          call   7d0a <bootmain>
104    
105    00007c4a <spin>:
106    
107      # If bootmain returns (it shouldn't), loop.
108    spin:
109      jmp spin
110        7c4a:       eb fe                   jmp    7c4a <spin>
111    
112    00007c4c <gdt>:
113            ...
114        7c54:       ff                      (bad)  
115        7c55:       ff 00                   incl   (%eax)
116        7c57:       00 00                   add    %al,(%eax)
117        7c59:       9a cf 00 ff ff 00 00    lcall  $0x0,$0xffff00cf
118        7c60:       00                      .byte 0x0
119        7c61:       92                      xchg   %eax,%edx
120        7c62:       cf                      iret   
121            ...
122    
123    00007c64 <gdtdesc>:
124        7c64:       17                      pop    %ss
125        7c65:       00 4c 7c 00             add    %cl,0x0(%esp,%edi,2)
126            ...
127    
128    00007c6a <waitdisk>:
129            }
130    }
131    
132    void
133    waitdisk(void)
134    {
135        7c6a:       55                      push   %ebp
136    
137    static inline uint8_t
138    inb(int port)
139    {
140            uint8_t data;
141            asm volatile("inb %w1,%0" : "=a" (data) : "d" (port));
142        7c6b:       ba f7 01 00 00          mov    $0x1f7,%edx
143        7c70:       89 e5                   mov    %esp,%ebp
144        7c72:       ec                      in     (%dx),%al
145            // wait for disk reaady
146            while ((inb(0x1F7) & 0xC0) != 0x40)
147        7c73:       83 e0 c0                and    $0xffffffc0,%eax
148        7c76:       3c 40                   cmp    $0x40,%al
149        7c78:       75 f8                   jne    7c72 <waitdisk+0x8>
150                    /* do nothing */;
151    }
152        7c7a:       5d                      pop    %ebp
153        7c7b:       c3                      ret    
154    
155    00007c7c <readsect>:
156    
157    void
158    readsect(void *dst, uint32_t offset)
159    {
160        7c7c:       55                      push   %ebp
161        7c7d:       89 e5                   mov    %esp,%ebp
162        7c7f:       57                      push   %edi
163        7c80:       53                      push   %ebx
164        7c81:       8b 5d 0c                mov    0xc(%ebp),%ebx
165            // wait for disk to be ready
166            waitdisk();
167        7c84:       e8 e1 ff ff ff          call   7c6a <waitdisk>
168    }
169    
170    static inline void
171    outb(int port, uint8_t data)
172    {
173            asm volatile("outb %0,%w1" : : "a" (data), "d" (port));
174        7c89:       ba f2 01 00 00          mov    $0x1f2,%edx
175        7c8e:       b0 01                   mov    $0x1,%al
176        7c90:       ee                      out    %al,(%dx)
177        7c91:       0f b6 c3                movzbl %bl,%eax
178        7c94:       b2 f3                   mov    $0xf3,%dl
179        7c96:       ee                      out    %al,(%dx)
180        7c97:       0f b6 c7                movzbl %bh,%eax
181        7c9a:       b2 f4                   mov    $0xf4,%dl
182        7c9c:       ee                      out    %al,(%dx)
183    
184            outb(0x1F2, 1);         // count = 1
185            outb(0x1F3, offset);
186            outb(0x1F4, offset >> 8);
187            outb(0x1F5, offset >> 16);
188        7c9d:       89 d8                   mov    %ebx,%eax
189        7c9f:       b2 f5                   mov    $0xf5,%dl
190        7ca1:       c1 e8 10                shr    $0x10,%eax
191        7ca4:       0f b6 c0                movzbl %al,%eax
192        7ca7:       ee                      out    %al,(%dx)
193            outb(0x1F6, (offset >> 24) | 0xE0);
194        7ca8:       c1 eb 18                shr    $0x18,%ebx
195        7cab:       b2 f6                   mov    $0xf6,%dl
196        7cad:       88 d8                   mov    %bl,%al
197        7caf:       83 c8 e0                or     $0xffffffe0,%eax
198        7cb2:       ee                      out    %al,(%dx)
199        7cb3:       b0 20                   mov    $0x20,%al
200        7cb5:       b2 f7                   mov    $0xf7,%dl
201        7cb7:       ee                      out    %al,(%dx)
202            outb(0x1F7, 0x20);      // cmd 0x20 - read sectors
203    
204            // wait for disk to be ready
205            waitdisk();
206        7cb8:       e8 ad ff ff ff          call   7c6a <waitdisk>
207    }
208    
209    static inline void
210    insl(int port, void *addr, int cnt)
211    {
212            asm volatile("cld\n\trepne\n\tinsl"
213        7cbd:       8b 7d 08                mov    0x8(%ebp),%edi
214        7cc0:       b9 80 00 00 00          mov    $0x80,%ecx
215        7cc5:       ba f0 01 00 00          mov    $0x1f0,%edx
216        7cca:       fc                      cld    
217        7ccb:       f2 6d                   repnz insl (%dx),%es:(%edi)
218    
219            // read a sector
220            insl(0x1F0, dst, SECTSIZE/4);
221    }
222        7ccd:       5b                      pop    %ebx
223        7cce:       5f                      pop    %edi
224        7ccf:       5d                      pop    %ebp
225        7cd0:       c3                      ret    
226    
227    00007cd1 <readseg>:
228    
229    // Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
230    // Might copy more than asked
231    void
232    readseg(uint32_t pa, uint32_t count, uint32_t offset)
233    {
234        7cd1:       55                      push   %ebp
235        7cd2:       89 e5                   mov    %esp,%ebp
236        7cd4:       57                      push   %edi
237            uint32_t end_pa;
238    
239            end_pa = pa + count;
240        7cd5:       8b 7d 0c                mov    0xc(%ebp),%edi
241    
242    // Read 'count' bytes at 'offset' from kernel into physical address 'pa'.
243    // Might copy more than asked
244    void
245    readseg(uint32_t pa, uint32_t count, uint32_t offset)
246    {
247        7cd8:       56                      push   %esi
248        7cd9:       8b 75 10                mov    0x10(%ebp),%esi
249        7cdc:       53                      push   %ebx
250        7cdd:       8b 5d 08                mov    0x8(%ebp),%ebx
251    
252            // round down to sector boundary
253            pa &= ~(SECTSIZE - 1);
254    
255            // translate from bytes to sectors, and kernel starts at sector 1
256            offset = (offset / SECTSIZE) + 1;
257        7ce0:       c1 ee 09                shr    $0x9,%esi
258    void
259    readseg(uint32_t pa, uint32_t count, uint32_t offset)
260    {
261            uint32_t end_pa;
262    
263            end_pa = pa + count;
264        7ce3:       01 df                   add    %ebx,%edi
265    
266            // round down to sector boundary
267            pa &= ~(SECTSIZE - 1);
268    
269            // translate from bytes to sectors, and kernel starts at sector 1
270            offset = (offset / SECTSIZE) + 1;
271        7ce5:       46                      inc    %esi
272            uint32_t end_pa;
273    
274            end_pa = pa + count;
275    
276            // round down to sector boundary
277            pa &= ~(SECTSIZE - 1);
278        7ce6:       81 e3 00 fe ff ff       and    $0xfffffe00,%ebx
279            offset = (offset / SECTSIZE) + 1;
280    
281            // If this is too slow, we could read lots of sectors at a time.
282            // We'd write more to memory than asked, but it doesn't matter --
283            // we load in increasing order.
284            while (pa < end_pa) {
285        7cec:       39 fb                   cmp    %edi,%ebx
286        7cee:       73 12                   jae    7d02 <readseg+0x31>
287                    // Since we haven't enabled paging yet and we're using
288                    // an identity segment mapping (see boot.S), we can
289                    // use physical addresses directly.  This won't be the
290                    // case once JOS enables the MMU.
291                    readsect((uint8_t*) pa, offset);
292        7cf0:       56                      push   %esi
293                    pa += SECTSIZE;
294                    offset++;
295        7cf1:       46                      inc    %esi
296            while (pa < end_pa) {
297                    // Since we haven't enabled paging yet and we're using
298                    // an identity segment mapping (see boot.S), we can
299                    // use physical addresses directly.  This won't be the
300                    // case once JOS enables the MMU.
301                    readsect((uint8_t*) pa, offset);
302        7cf2:       53                      push   %ebx
303                    pa += SECTSIZE;
304        7cf3:       81 c3 00 02 00 00       add    $0x200,%ebx
305            while (pa < end_pa) {
306                    // Since we haven't enabled paging yet and we're using
307                    // an identity segment mapping (see boot.S), we can
308                    // use physical addresses directly.  This won't be the
309                    // case once JOS enables the MMU.
310                    readsect((uint8_t*) pa, offset);
311        7cf9:       e8 7e ff ff ff          call   7c7c <readsect>
312                    pa += SECTSIZE;
313                    offset++;
314        7cfe:       58                      pop    %eax
315        7cff:       5a                      pop    %edx
316        7d00:       eb ea                   jmp    7cec <readseg+0x1b>
317            }
318    }
319        7d02:       8d 65 f4                lea    -0xc(%ebp),%esp
320        7d05:       5b                      pop    %ebx
321        7d06:       5e                      pop    %esi
322        7d07:       5f                      pop    %edi
323        7d08:       5d                      pop    %ebp
324        7d09:       c3                      ret    
325    
326    00007d0a <bootmain>:
327    void readsect(void*, uint32_t);
328    void readseg(uint32_t, uint32_t, uint32_t);
329    
330    void
331    bootmain(void)
332    {
333        7d0a:       55                      push   %ebp
334        7d0b:       89 e5                   mov    %esp,%ebp
335        7d0d:       56                      push   %esi
336        7d0e:       53                      push   %ebx
337            struct Proghdr *ph, *eph;
338    
339            // read 1st page off disk
340            readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);
341        7d0f:       6a 00                   push   $0x0
342        7d11:       68 00 10 00 00          push   $0x1000
343        7d16:       68 00 00 01 00          push   $0x10000
344        7d1b:       e8 b1 ff ff ff          call   7cd1 <readseg>
345    
346            // is this a valid ELF?
347            if (ELFHDR->e_magic != ELF_MAGIC)
348        7d20:       83 c4 0c                add    $0xc,%esp
349        7d23:       81 3d 00 00 01 00 7f    cmpl   $0x464c457f,0x10000
350        7d2a:       45 4c 46 
351        7d2d:       75 38                   jne    7d67 <bootmain+0x5d>
352                    goto bad;
353    
354            // load each program segment (ignores ph flags)
355            ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
356        7d2f:       a1 1c 00 01 00          mov    0x1001c,%eax
357        7d34:       8d 98 00 00 01 00       lea    0x10000(%eax),%ebx
358            eph = ph + ELFHDR->e_phnum;
359        7d3a:       0f b7 05 2c 00 01 00    movzwl 0x1002c,%eax
360        7d41:       c1 e0 05                shl    $0x5,%eax
361        7d44:       8d 34 03                lea    (%ebx,%eax,1),%esi
362            for (; ph < eph; ph++)
363        7d47:       39 f3                   cmp    %esi,%ebx
364        7d49:       73 16                   jae    7d61 <bootmain+0x57>
365                    // p_pa is the load address of this segment (as well
366                    // as the physical address)
367                    readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
368        7d4b:       ff 73 04                pushl  0x4(%ebx)
369                    goto bad;
370    
371            // load each program segment (ignores ph flags)
372            ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
373            eph = ph + ELFHDR->e_phnum;
374            for (; ph < eph; ph++)
375        7d4e:       83 c3 20                add    $0x20,%ebx
376                    // p_pa is the load address of this segment (as well
377                    // as the physical address)
378                    readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
379        7d51:       ff 73 f4                pushl  -0xc(%ebx)
380        7d54:       ff 73 ec                pushl  -0x14(%ebx)
381        7d57:       e8 75 ff ff ff          call   7cd1 <readseg>
382                    goto bad;
383    
384            // load each program segment (ignores ph flags)
385            ph = (struct Proghdr *) ((uint8_t *) ELFHDR + ELFHDR->e_phoff);
386            eph = ph + ELFHDR->e_phnum;
387            for (; ph < eph; ph++)
388        7d5c:       83 c4 0c                add    $0xc,%esp
389        7d5f:       eb e6                   jmp    7d47 <bootmain+0x3d>
390                    // as the physical address)
391                    readseg(ph->p_pa, ph->p_memsz, ph->p_offset);
392    
393            // call the entry point from the ELF header
394            // note: does not return!
395            ((void (*)(void)) (ELFHDR->e_entry))();
396        7d61:       ff 15 18 00 01 00       call   *0x10018
397    }
398    
399    static inline void
400    outw(int port, uint16_t data)
401    {
402            asm volatile("outw %0,%w1" : : "a" (data), "d" (port));
403        7d67:       ba 00 8a 00 00          mov    $0x8a00,%edx
404        7d6c:       b8 00 8a ff ff          mov    $0xffff8a00,%eax
405        7d71:       66 ef                   out    %ax,(%dx)
406        7d73:       b8 00 8e ff ff          mov    $0xffff8e00,%eax
407        7d78:       66 ef                   out    %ax,(%dx)
408        7d7a:       eb fe                   jmp    7d7a <bootmain+0x70>
409    
410

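As you can see, the code above starts executing at 0x7c00, yet debugging with gdb shows that the first instruction the BIOS executes is at 0xf000:0xfff0. So here is the question: when does CS change from 0xf000 to 0, and what is the BIOS doing before it reaches 0x7c00?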
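For reference, 0xf000:0xfff0 is a real-mode segment:offset pair. The small sketch below shows how such a pair turns into a physical address; the helper name `real_mode_phys` is made up for illustration and is not part of JOS.

```c
#include <stdint.h>

/* Real-mode address translation: physical = segment * 16 + offset.
 * real_mode_phys is a hypothetical helper, not a JOS function. */
uint32_t real_mode_phys(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}
```

With this, 0xf000:0xfff0 comes out as 0xffff0, just below the 1MB boundary, and 0x0000:0x7c00 as 0x7c00.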

Let's look at this part of the code in gdb:

 1    [f000:fff0]    0xffff0: ljmp   $0xf000,$0xe05b
 2    [f000:e05b]    0xfe05b: cmpl   $0x0,%cs:0x6ac8
 3    [f000:e062]    0xfe062: jne    0xfd2e1
 4    [f000:e066]    0xfe066: xor    %dx,%dx
 5    [f000:e068]    0xfe068: mov    %dx,%ss
 6    [f000:e06a]    0xfe06a: mov    $0x7000,%esp
 7    [f000:e070]    0xfe070: mov    $0xf34c2,%edx
 8    [f000:e076]    0xfe076: jmp    0xfd15c
 9    [f000:d15c]    0xfd15c: mov    %eax,%ecx
10    [f000:d15f]    0xfd15f: cli  
11    [f000:d160]    0xfd160: cld 
12    [f000:d161]    0xfd161: mov    $0x8f,%eax
13    [f000:d167]    0xfd167: out    %al,$0x70
14    [f000:d169]    0xfd169: in     $0x71,%al
15    [f000:d16b]    0xfd16b: in     $0x92,%al
16    [f000:d16d]    0xfd16d: or     $0x2,%al
17    [f000:d16f]    0xfd16f: out    %al,$0x92
18    [f000:d171]    0xfd171: lidtw  %cs:0x6ab8
19    [f000:d177]    0xfd177: lgdtw  %cs:0x6a74
20    [f000:d17d]    0xfd17d: mov    %cr0,%eax
21    [f000:d180]    0xfd180: or     $0x1,%eax
22    [f000:d184]    0xfd184: mov    %eax,%cr0
23    [f000:d187]    0xfd187: ljmpl  $0x8,$0xfd18f
24    
25    The target architecture is assumed to be i386
26    0xfd18f:     mov    $0x10,%eax
27    0xfd194:     mov    %eax,%ds
28    0xfd196:     mov    %eax,%es
29    0xfd198:     mov    %eax,%ss
30    0xfd19a:     mov    %eax,%fs
31    0xfd19c:     mov    %eax,%gs
32    0xfd19e:     mov    %ecx,%eax
33    0xfd1a0:     jmp    *%edx
34    0xf34c2:     push   %ebx
35    0xf34c3:     sub    $0x2c,%esp
36    0xf34c6:     movl   $0xf5b5c,0x4(%esp)
37    0xf34ce:     movl   $0xf447b,(%esp)
38    0xf34d5:     call   0xf099e
39    0xf099e:     lea    0x8(%esp),%ecx
40    0xf09a2:     mov    0x4(%esp),%edx
41    0xf09a6:     mov    $0xf5b58,%eax
42    0xf09ab:     call   0xf0574
43    0xf0574:     push   %ebp
44    0xf0575:     push   %edi
45    0xf0576:     push   %esi
46    0xf0577:     push   %ebx
47    0xf0578:     sub    $0xc,%esp
48    0xf057b:     mov    %eax,0x4(%esp)
49    0xf057f:     mov    %edx,%ebp
50    0xf0581:     mov    %ecx,%esi
51    0xf0583:     movsbl 0x0(%ebp),%edx
52    0xf0587:     test   %dl,%dl
53    0xf0589:     je     0xf0758
54    0xf058f:     cmp    $0x25,%dl
55    0xf0592:     jne    0xf0741
56    0xf0741:     mov    0x4(%esp),%eax
57    
58    

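Here lidtw loads the interrupt descriptor table register (IDT) and lgdtw loads the global descriptor table register (GDT); see LGDT/LIDT -- Load Global/Interrupt Descriptor Table Register.

Ports 0x70 and 0x71 on lines 13 and 14 are the CMOS ports; see CMOS#Accessing_CMOS_Registers, although I think this is too much detail and can be skipped.

Lines 15-17 are the fast way to enable A20 (setting bit 1 of port 0x92); see A20_Line.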
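To make the effect of the A20 gate concrete, here is a minimal sketch of what the address bus sees with A20 disabled versus enabled. The function name `a20_filter` is invented for this note; the only assumption is the usual behavior that a disabled A20 line forces address bit 20 to zero.

```c
#include <stdint.h>

/* Model of the A20 gate: when it is disabled, address bit 20 is forced
 * to 0, so addresses at and just above 1MB wrap around to low memory.
 * a20_filter is a hypothetical name used only for illustration. */
uint32_t a20_filter(uint32_t addr, int a20_enabled)
{
    const uint32_t a20_bit = 1u << 20;
    return a20_enabled ? addr : (addr & ~a20_bit);
}
```

With A20 off, 0x100000 aliases to 0x0, which is why A20 has to be enabled before anything is loaded at 1MB.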

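Then lines 18-23 look very familiar: these are exactly the steps for entering protected mode.

But the boot loader has not even been loaded yet, so why are we already in protected mode? According to "bootloader - switching processor to protected mode", some BIOS implementations briefly enter protected mode before loading the boot loader, possibly to use features of protected mode (such as 32-bit addressing of memory above 1MB), and then switch back to real mode before entering the boot loader. Also, according to an expert in a 6.828 study group, the BIOS enters protected mode in a different way from the boot loader, and there are four ways to enter protected mode in total. That feels like too much detail, so I am not going to worry about it for now.

As for the code from line 26 onward, sorry, I don't really understand it either. It does not look important; I will come back to it if it turns out to matter.

Now let's answer the questions.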

  * At what point does the processor start executing 32-bit code? What exactly causes the switch from 16- to 32-bit mode?

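The processor starts executing 32-bit code at 0x7c32, with the instruction mov    $0x10,%ax. Setting bit 0 (PE) of control register 0 turns on protected mode, but the switch from 16-bit to 32-bit mode actually takes effect at the following ljmp $PROT_MODE_CSEG, $protcseg, which reloads CS with a 32-bit code segment descriptor.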
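As a rough illustration of why the ljmp matters, the sketch below decodes the two selectors (0x8 and 0x10) and the bootstrap GDT entries. The descriptor bytes are copied from the raw bytes objdump prints at 0x7c4c in the boot.asm listing; the helper names are made up for this note.

```c
#include <stdint.h>

/* Bootstrap GDT as raw bytes (null, code, data), taken from the bytes
 * shown at 0x7c4c..0x7c63 in boot.asm. */
const uint8_t boot_gdt[3][8] = {
    {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x9a, 0xcf, 0x00},
    {0xff, 0xff, 0x00, 0x00, 0x00, 0x92, 0xcf, 0x00},
};

/* The upper 13 bits of a selector index into the descriptor table. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Base address is split across bytes 2, 3, 4 and 7. */
uint32_t desc_base(const uint8_t d[8])
{
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* 20-bit limit: bytes 0, 1 and the low nibble of byte 6. */
uint32_t desc_limit(const uint8_t d[8])
{
    return (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
           ((uint32_t)(d[6] & 0x0f) << 16);
}

/* D/B flag (bit 6 of byte 6): 1 means a 32-bit segment. */
int desc_is_32bit(const uint8_t d[8]) { return (d[6] & 0x40) != 0; }

/* G flag (bit 7 of byte 6): 1 means the limit counts 4KB pages. */
int desc_page_granular(const uint8_t d[8]) { return (d[6] & 0x80) != 0; }
```

Selector 0x8 picks entry 1 (code) and 0x10 picks entry 2 (data). Both have base 0 and a limit of 0xfffff pages, i.e. a flat 4GB segment, and the code descriptor has the 32-bit flag set, which is what makes the code after the far jump run as 32-bit code.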

  * What is the _last_ instruction of the boot loader executed, and what is the _first_ instruction of the kernel it just loaded?

The last instruction the boot loader executes is 0x7d61:      call   *0x10018, which corresponds to the C code ((void (*)(void)) (ELFHDR->e_entry))();   The first instruction the kernel executes after being loaded is movw   $0x1234,0x472

  * _Where_ is the first instruction of the kernel?

The first instruction of the kernel is at address 0x10000c.

  * How does the boot loader decide how many sectors it must read in order to fetch the entire kernel from disk? Where does it find this information?

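The boot loader first reads a small part of the kernel, namely 8 sectors, which is 1 page; the corresponding code is readseg((uint32_t) ELFHDR, SECTSIZE*8, 0); The part it reads contains the ELF header and the program headers, which say how large each segment of the kernel is: e_phoff and e_phnum locate the program headers, and each program header's p_pa, p_memsz and p_offset tell readseg what to read. The layout of these structures is defined in inc/elf.h.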
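The offsets in the disassembly (0x10018, 0x1001c, 0x1002c, and the 0x20 stride) line up with the ELF header layout. Below is a sketch with the structs written out as I understand inc/elf.h (check the field names against your own tree), plus a hypothetical helper `sectors_for_segment` that counts how many sectors the readseg loop would read for one segment.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTSIZE 512

/* Layout mirroring the 32-bit ELF header; field names follow JOS's
 * inc/elf.h as far as I recall, so treat them as an assumption. */
struct Elf {
    uint32_t e_magic;
    uint8_t  e_elf[12];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;      /* offset 0x18: call *0x10018 */
    uint32_t e_phoff;      /* offset 0x1c: mov 0x1001c,%eax */
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;      /* offset 0x2c: movzwl 0x1002c,%eax */
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

struct Proghdr {
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_va;
    uint32_t p_pa;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
};

/* Hypothetical helper: number of sectors the readseg loop reads for a
 * segment loaded at 'pa' with 'count' bytes (same rounding as readseg). */
uint32_t sectors_for_segment(uint32_t pa, uint32_t count)
{
    uint32_t end_pa = pa + count;
    uint32_t n = 0;

    pa &= ~(uint32_t)(SECTSIZE - 1);   /* round down to sector boundary */
    while (pa < end_pa) {
        pa += SECTSIZE;
        n++;
    }
    return n;
}
```

For the first call, readseg(0x10000, SECTSIZE*8, 0), this gives 8 sectors; for each later segment the count comes from that segment's p_memsz.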

Loading the Kernel

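Exercise 4 says you should get familiar with C pointers. I went and looked at the recommended "The C Programming Language" and found it to be a really great introductory book. I used to think it was an unapproachable tome like "Introduction to Algorithms". A pity that I am no longer a beginner. Exercise 4 gives a piece of code that uses C pointers; for the fifth output, watch out for endianness.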

#include <stdio.h>
#include <stdlib.h>

void
f(void)
{
    int a[4];
    int *b = malloc(16);
    int *c;
    int i;

    printf("1: a = %p, b = %p, c = %p\n", a, b, c);

    c = a;
    for (i = 0; i < 4; i++)
        a[i] = 100 + i;
    c[0] = 200;
    printf("2: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c[1] = 300;
    *(c + 2) = 301;
    3[c] = 302;
    printf("3: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c = c + 1;
    *c = 400;
    printf("4: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    c = (int *) ((char *) c + 1);
    *c = 500;
    printf("5: a[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n",
           a[0], a[1], a[2], a[3]);

    b = (int *) a + 1;
    c = (int *) ((char *) a + 1);
    printf("6: a = %p, b = %p, c = %p\n", a, b, c);
}

int
main(int ac, char **av)
{
    f();
    return 0;
}
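To see where the fifth output comes from, here is a small sketch that performs the same store with memcpy, which avoids dereferencing a misaligned pointer. It assumes a little-endian machine with 4-byte int (true on x86); the helper name is made up.

```c
#include <string.h>

/* Store 'value' at an arbitrary byte offset inside an int array.
 * Hypothetical helper that mimics "*c = 500" after c was advanced by
 * one byte; assumes little-endian, 4-byte int. */
void store_int_at_byte_offset(int *base, size_t byte_off, int value)
{
    memcpy((char *)base + byte_off, &value, sizeof value);
}
```

Starting from a = {200, 400, 301, 302} and storing 500 at byte offset 5, the low three bytes of 500 (f4 01 00) land in the upper three bytes of a[1] and the remaining zero byte overwrites the lowest byte of a[2], so a[1] becomes 0x0001f490 = 128144 and a[2] becomes 0x100 = 256.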

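Before going on, it is worth taking a careful look at the ELF file format; see ELF.

ELF files

An ELF file is divided into many sections. Typically the .data section holds initialized global/static variables, .text holds code, .rodata holds read-only data such as string constants, and .bss holds uninitialized global/static variables. The .bss section has no contents stored in the file, because uninitialized variables are required to start out as 0, so there is no need to store them again. "Thus there is no need to store contents for .bss in the ELF binary; instead, the linker records just the address and size of the .bss section. The loader or the program itself must arrange to zero the .bss section."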
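As a quick illustration, here is where a typical ELF toolchain such as GCC would usually place a few definitions. The section names in the comments are the usual defaults and an assumption on my part; the exact placement depends on the compiler and options, and the variable names are invented for this example.

```c
/* Usual placement with a typical ELF toolchain (an assumption, not a
 * guarantee; check with objdump -h on your own build). */
int initialized_counter = 42;          /* .data: initialized global */
int zeroed_counter;                    /* .bss: no bytes stored in the file */
const char *greeting = "hello";        /* pointer in .data, string in .rodata */

int add_one(int x) { return x + 1; }   /* .text: code */
```

Because .bss is zeroed by the loader (or by the program itself), zeroed_counter reads as 0 even though the file stores nothing for it.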

The sections we care about most are .data, .text, and .rodata.

We can use objdump -h to view the section headers of an ELF file:

objdump -h obj/kern/kernel

obj/kern/kernel:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001917  f0100000  00100000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000714  f0101920  00101920  00002920  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         00003889  f0102034  00102034  00003034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr      000018af  f01058bd  001058bd  000068bd  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         0000a300  f0108000  00108000  00009000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  5 .bss          00000648  f0112300  00112300  00013300  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  6 .comment      00000023  00000000  00000000  00013948  2**0
                  CONTENTS, READONLY

Here Size is the size of the section, VMA (Virtual Memory Address, called the link address in 6.828) is the memory address from which the section expects to execute, and LMA (Load Memory Address) is the address at which the section is loaded into memory. The two are usually the same, although for the kernel shown above they differ: it is linked at 0xf0100000 but loaded at 0x00100000.

The boot loader uses the program headers in the ELF file to decide how to load the sections. The program headers specify which parts of the ELF file need to be loaded into memory, and at what address. We can use objdump -x obj/kern/kernel to view all of the ELF headers.

Exercise 5. Trace through the first few instructions of the boot loader again and identify the first instruction that would "break" or otherwise do the wrong thing if you were to get the boot loader's link address wrong. Then change the link address in boot/Makefrag to something wrong, run make clean, recompile the lab with make, and trace into the boot loader again to see what happens. Don't forget to change the link address back and make clean again afterward!

I changed the boot loader's link address from 0x7c00 to 0x9c00 and then single-stepped in gdb.

The operand of lgdtw turns into a negative number: [ 0:7c1e] => 0x7c1e: lgdtw -0x639c. Continuing on to [ 0:7c2d] => 0x7c2d: ljmp $0x8,$0x9c32, the machine crashes.
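One way to read the negative operand: in 16-bit code the lgdt operand is a 16-bit displacement, and gdb prints it as a signed number. Assuming gdtdesc keeps its offset of 0x64 from the start of the boot sector (it sits at 0x7c64 in the boot.asm listing earlier), the relinked address is 0x9c64. The helper below is a made-up name that just does the signed reinterpretation.

```c
#include <stdint.h>

/* Interpret a 16-bit value as a signed two's-complement number, the way
 * a disassembler prints a 16-bit displacement. Hypothetical helper. */
int32_t as_signed16(uint16_t v)
{
    return v >= 0x8000 ? (int32_t)v - 0x10000 : (int32_t)v;
}
```

0x9c64 comes out as -0x639c, matching the gdb output, while the correct 0x7c64 stays positive. Since the boot sector is really sitting at 0x7c00, my reading is that lgdt then loads whatever happens to be at 0x9c64 rather than the real gdtdesc, which would explain why the later ljmp to 0x9c32 goes wrong.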

Looking at the generated boot.asm, the addresses do indeed start from 0x9c00 now.

protcseg:
  # Set up the protected-mode data segment registers
  movw    $PROT_MODE_DSEG, %ax    # Our data segment selector
    9c32:       66 b8 10 00             mov    $0x10,%ax
  movw    %ax, %ds                # - DS: Data Segment
    9c36:       8e d8                   mov    %eax,%ds
  movw    %ax, %es                # - ES: Extra Segment
    9c38:       8e c0                   mov    %eax,%es
  movw    %ax, %fs                # - FS
    9c3a:       8e e0                   mov    %eax,%fs
  movw    %ax, %gs                # - GS
    9c3c:       8e e8                   mov    %eax,%gs
  movw    %ax, %ss                # - SS: Stack Segment
    9c3e:       8e d0                   mov    %eax,%ss

  # Set up the stack pointer and call into C.
  movl    $start, %esp
    9c40:       bc 00 9c 00 00          mov    $0x9c00,%esp
  call bootmain
    9c45:       e8 c0 00 00 00          call   9d0a <bootmain>

00009c4a <spin>:

  # If bootmain returns (it shouldn't), loop.
spin:
  jmp spin
    9c4a:       eb fe                   jmp    9c4a <spin>

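In reality, though, the BIOS still loaded the boot loader at 0x7c00. Is this just the convention, with the BIOS ignoring the boot loader's link address and loading it at 0x7c00 regardless? I did not find a reference at the time, but as far as I can tell that is the case: the boot sector is a raw 512-byte image with no header, so the link address is not recorded anywhere the BIOS could see, and 0x7c00 is the conventional load address on a PC.

Exercise 6. Reset the machine (exit QEMU/GDB and start them again). Examine the 8 words of memory at 0x00100000 at the point the BIOS enters the boot loader, and then again at the point the boot loader enters the kernel. Why are they different? What is there at the second breakpoint? (You do not really need to use QEMU to answer this question. Just think.)

The question is why the 8 words starting at address 0x00100000 differ between the moment the BIOS enters the boot loader (at 0x7c00) and the moment the boot loader enters the kernel (at 0x10000c).

At 0x7c00, the 8 words at 0x00100000 are all 0.

At 0x10000c, the contents at 0x00100000, disassembled as instructions, are: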

   0x100000:    add    0x1bad(%eax),%dh
   0x100006:    add    %al,(%eax)
   0x100008:    decb   0x52(%edi)
   0x10000b:    in     $0x66,%al
   0x10000d:    movl   $0xb81234,0x472
   0x100017:    add    %dl,(%ecx)
   0x100019:    add    %cl,(%edi)
   0x10001b:    and    %al,%bl

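They differ because when the boot loader has only just been entered, the kernel has not been loaded into memory yet, so the region is empty. At the second breakpoint it holds the beginning of the kernel's .text section, which the boot loader has just loaded at its LMA 0x00100000.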
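The "instructions" in that dump are really data: the first 12 bytes look like the Multiboot header that precedes the kernel entry point. The sketch below reassembles the words from the bytes implied by the disassembly (an inference from the mnemonics, not a raw memory dump); the names are made up.

```c
#include <stdint.h>

/* First 12 bytes at 0x100000, inferred from the disassembly above:
 * 02 b0 ad 1b 00 00 | 00 00 | fe 4f 52 | e4 ... */
const uint8_t kernel_head[12] = {
    0x02, 0xb0, 0xad, 0x1b,   /* magic */
    0x00, 0x00, 0x00, 0x00,   /* flags */
    0xfe, 0x4f, 0x52, 0xe4,   /* checksum */
};

/* Assemble a little-endian 32-bit word from 4 bytes. */
uint32_t le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

The first word is 0x1badb002, the Multiboot magic number, and magic + flags + checksum wraps around to 0 in 32 bits. Using x/8x instead of x/8i at 0x100000 shows these words directly.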

Part 3: The Kernel

Using virtual memory to work around position dependence

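Operating system kernels often like to run at a high virtual address, such as 0xf0100000, in order to leave the low addresses for user programs. Some machines may not have that much memory, so the physical address 0xf0100000 does not exist, and a mapping from virtual memory to physical memory is needed. In this part of the lab we do not need to know how the address mapping works, only what effect it has.

Concretely, before CR0_PG is set, memory addresses are physical addresses (strictly speaking they are linear addresses, but boot/boot.S set up an identity mapping from linear to physical addresses). Once the CR0_PG flag is set, addresses become virtual addresses. Let's use gdb to see what happens.

Exercise 7.  Use QEMU and GDB to trace into the JOS kernel and stop at the `movl %eax, %cr0`. Examine memory at 0x00100000 and at 0xf0100000. Now, single step over that instruction using the stepi GDB command. Again, examine memory at 0x00100000 and at 0xf0100000. Make sure you understand what just happened.

What is the first instruction after the new mapping is established that would fail to work properly if the mapping weren't in place? Comment out the movl %eax, %cr0 in kern/entry.S, trace into it, and see if you were right.

First set a breakpoint with b *0x10000c, the address where the JOS kernel starts running. Then single-step a few times and stop at movl %eax, %cr0, that is, just before the CR0_PG flag is set. Look at the contents of 0x00100000 and 0xf0100000: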

(gdb) x/8x 0xf0100000
0xf0100000 <_start+4026531828>: 0x00000000      0x00000000      0x00000000      0x00000000
0xf0100010 <entry+4>:   0x00000000      0x00000000      0x00000000      0x00000000

(gdb) x/8i 0x00100000
   0x100000:    add    0x1bad(%eax),%dh
   0x100006:    add    %al,(%eax)
   0x100008:    decb   0x52(%edi)
   0x10000b:    in     $0x66,%al
   0x10000d:    movl   $0xb81234,0x472
   0x100017:    add    %dl,(%ecx)
   0x100019:    add    %cl,(%edi)
   0x10001b:    and    %al,%bl

Then single-step once more and use x/8i again to look at 8 instructions at 0x00100000 and at 0xf0100000:

(gdb) x/8i 0x00100000
   0x100000:    add    0x1bad(%eax),%dh
   0x100006:    add    %al,(%eax)
   0x100008:    decb   0x52(%edi)
   0x10000b:    in     $0x66,%al
   0x10000d:    movl   $0xb81234,0x472
   0x100017:    add    %dl,(%ecx)
   0x100019:    add    %cl,(%edi)
   0x10001b:    and    %al,%bl
(gdb) x/8i 0xf0100000
   0xf0100000 <_start+4026531828>:      add    0x1bad(%eax),%dh
   0xf0100006 <_start+4026531834>:      add    %al,(%eax)
   0xf0100008 <_start+4026531836>:      decb   0x52(%edi)
   0xf010000b <_start+4026531839>:      in     $0x66,%al
   0xf010000d <entry+1>:        movl   $0xb81234,0x472
   0xf0100017 <entry+11>:       add    %dl,(%ecx)
   0xf0100019 <entry+13>:       add    %cl,(%edi)
   0xf010001b <entry+15>:       and    %al,%bl

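We can see that before the `CR0_PG` flag is set, there is nothing meaningful at address 0xf0100000.

Once it is set, the contents at 0xf0100000 are identical to the contents at 0x00100000. Note that both of these are now virtual addresses. Specifically: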

Once `CR0_PG` is set, memory references are virtual addresses that get translated by the virtual memory hardware to physical addresses. `entry_pgdir` translates virtual addresses in the range 0xf0000000 through 0xf0400000 to physical addresses 0x00000000 through 0x00400000, as well as virtual addresses 0x00000000 through 0x00400000 to physical addresses 0x00000000 through 0x00400000
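
The two ranges in the quote can be modeled as a small function. This is only a sketch of the address arithmetic, not of how the MMU or JOS actually performs the lookup; `entry_pgdir_translate` and `MAPPED_LEN` are made-up names, and only `KERNBASE` comes from inc/memlayout.h.

```c
#include <stdint.h>

#define KERNBASE   0xF0000000u  /* from inc/memlayout.h */
#define MAPPED_LEN 0x00400000u  /* hypothetical name: the 4 MB covered by entry_pgdir */

/* Returns 1 and stores the physical address in *pa if va is mapped, else 0. */
static int entry_pgdir_translate(uint32_t va, uint32_t *pa)
{
	if (va < MAPPED_LEN) {
		/* low 4 MB: identity mapping */
		*pa = va;
		return 1;
	}
	if (va >= KERNBASE && va - KERNBASE < MAPPED_LEN) {
		/* high 4 MB at KERNBASE: mapped down to physical 0 */
		*pa = va - KERNBASE;
		return 1;
	}
	return 0; /* every other address is unmapped at this stage */
}
```

With this model, both 0xf0100000 and 0x00100000 come out as physical 0x00100000, which is why gdb shows the same instructions at both addresses.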

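Next, we comment out `movl %eax, %cr0` in kern/entry.S.

Debugging with gdb again, the kernel crashes at `0x10002a: jmp *%eax`. The reason is that paging was never turned on, so the high address in eax (0xf010002c, the link address of `relocated`) is used as a physical address, and nothing is there.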

Formatted Printing to the Console

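Formatted output in the style of printf does not come for free in a kernel; someone has to implement it. Start by reading the relevant source files: kern/printf.c, kern/console.c, and lib/printfmt.c.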

Exercise 8. We have omitted a small fragment of code - the code necessary to print octal numbers using patterns of the form "%o". Find and fill in this code fragment.

This one is simple. After the change, the code is:

    case 'o':
    	// (unsigned) octal: same as 'u' and 'x', but with base 8
    	num = getuint(&ap, lflag);
    	base = 8;
    	goto number;

Next, let's answer a few questions.

  1. Explain the interface between printf.c and console.c. Specifically, what function does console.c export? How is this function used by printf.c?

The interface between printf.c and console.c is `cputchar()`, exported by console.c, which prints a single character to the console. printf.c calls `cputchar()` from its `putch()` function.

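2. Explain the following from console.c:

    if (crt_pos >= CRT_SIZE) {
            int i;
            memmove(crt_buf, crt_buf + CRT_COLS, (CRT_SIZE - CRT_COLS) * sizeof(uint16_t));
            for (i = CRT_SIZE - CRT_COLS; i < CRT_SIZE; i++)
                    crt_buf[i] = 0x0700 | ' ';
            crt_pos -= CRT_COLS;
    }

The meaning is clear: when the cursor position runs past the number of characters the screen can hold, rows 2 through the last row are moved up by one row (overwriting the old first row), the last row is then blanked (its old contents now live in the second-to-last row), and crt_pos moves back by one row. In other words, this scrolls the screen by one line.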
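
The scrolling step can be tried on the host with a tiny buffer. This is a sketch under made-up dimensions (`COLS`, `ROWS`) and a made-up function name `scroll_if_full`; only the body mirrors the console.c fragment above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tiny screen: 3 rows of 4 cells each. */
#define COLS 4
#define ROWS 3
#define SIZE (COLS * ROWS)

/* If the cursor ran off the end, scroll up one row and blank the last row. */
static void scroll_if_full(uint16_t *buf, int *pos)
{
	if (*pos >= SIZE) {
		/* move rows 1..ROWS-1 up to rows 0..ROWS-2 */
		memmove(buf, buf + COLS, (SIZE - COLS) * sizeof(uint16_t));
		/* fill the last row with blanks (attribute 0x07, character ' ') */
		for (int i = SIZE - COLS; i < SIZE; i++)
			buf[i] = 0x0700 | ' ';
		/* the cursor moves back by exactly one row */
		*pos -= COLS;
	}
}
```

memmove is used rather than memcpy because the source and destination ranges overlap.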

3. For the following questions you might wish to consult the notes for Lecture 2. These notes cover GCC's calling convention on the x86.

Trace the execution of the following code step-by-step:

    int x = 1, y = 3, z = 4;
    cprintf("x %d, y %x, z %d\n", x, y, z);

  * In the call to `cprintf()`, to what does `fmt` point? To what does `ap` point?
  * List (in order of execution) each call to `cons_putc`, `va_arg`, and `vcprintf`. For `cons_putc`, list its argument as well. For `va_arg`, list what `ap` points to before and after the call. For `vcprintf` list the values of its two arguments.

Before answering, it helps to read up on C variadic arguments and the x86 calling conventions.

Let's first look at the code in kern/printf.c:

    static void
    putch(int ch, int *cnt)
    {
    	cputchar(ch);
    	*cnt++;
    }

    int
    vcprintf(const char *fmt, va_list ap)
    {
    	int cnt = 0;

    	vprintfmt((void*)putch, &cnt, fmt, ap);
    	return cnt;
    }

    int
    cprintf(const char *fmt, ...)
    {
    	va_list ap;
    	int cnt;

    	va_start(ap, fmt);
    	cnt = vcprintf(fmt, ap);
    	va_end(ap);

    	return cnt;
    }

Start with `int cprintf(const char *fmt, ...)`: the parameter fmt is the format string familiar from C's printf, i.e. the first argument.

The rest is the usual C variadic-argument routine, except that cprintf never calls va_arg itself. It calls `cnt = vcprintf(fmt, ap)` instead, which returns a count (meant to be the number of characters printed, as we will see).

Next is `int vcprintf(const char *fmt, va_list ap)`. There is not much to see: it simply forwards to vprintfmt, whose code is:

    void
    vprintfmt(void (*putch)(int, void*), void *putdat, const char *fmt, va_list ap)
    {
    	register const char *p;
    	register int ch, err;
    	unsigned long long num;
    	int base, lflag, width, precision, altflag;
    	char padc;

    	while (1) {
    		while ((ch = *(unsigned char *) fmt++) != '%') {
    			if (ch == '\0')
    				return;
    			putch(ch, putdat);
    		}

    		// Process a %-escape sequence
    		padc = ' ';
    		width = -1;
    		precision = -1;
    		lflag = 0;
    		altflag = 0;
    	reswitch:
    		switch (ch = *(unsigned char *) fmt++) {

    		// flag to pad on the right
    		case '-':
    			padc = '-';
    			goto reswitch;

    		// flag to pad with 0's instead of spaces
    		case '0':
    			padc = '0';
    			goto reswitch;

    		// width field
    		case '1':
    		case '2':
    		case '3':
    		case '4':
    		case '5':
    		case '6':
    		case '7':
    		case '8':
    		case '9':
    			for (precision = 0; ; ++fmt) {
    				precision = precision * 10 + ch - '0';
    				ch = *fmt;
    				if (ch < '0' || ch > '9')
    					break;
    			}
    			goto process_precision;

    		case '*':
    			precision = va_arg(ap, int);
    			goto process_precision;

    		case '.':
    			if (width < 0)
    				width = 0;
    			goto reswitch;

    		case '#':
    			altflag = 1;
    			goto reswitch;

    		process_precision:
    			if (width < 0)
    				width = precision, precision = -1;
    			goto reswitch;

    		// long flag (doubled for long long)
    		case 'l':
    			lflag++;
    			goto reswitch;

    		// character
    		case 'c':
    			putch(va_arg(ap, int), putdat);
    			break;

    		// error message
    		case 'e':
    			err = va_arg(ap, int);
    			if (err < 0)
    				err = -err;
    			if (err >= MAXERROR || (p = error_string[err]) == NULL)
    				printfmt(putch, putdat, "error %d", err);
    			else
    				printfmt(putch, putdat, "%s", p);
    			break;

    		// string
    		case 's':
    			if ((p = va_arg(ap, char *)) == NULL)
    				p = "(null)";
    			if (width > 0 && padc != '-')
    				for (width -= strnlen(p, precision); width > 0; width--)
    					putch(padc, putdat);
    			for (; (ch = *p++) != '\0' && (precision < 0 || --precision >= 0); width--)
    				if (altflag && (ch < ' ' || ch > '~'))
    					putch('?', putdat);
    				else
    					putch(ch, putdat);
    			for (; width > 0; width--)
    				putch(' ', putdat);
    			break;

    		// (signed) decimal
    		case 'd':
    			num = getint(&ap, lflag);
    			if ((long long) num < 0) {
    				putch('-', putdat);
    				num = -(long long) num;
    			}
    			base = 10;
    			goto number;

    		// unsigned decimal
    		case 'u':
    			num = getuint(&ap, lflag);
    			base = 10;
    			goto number;

    		// (unsigned) octal
    		case 'o':
    			// Replace this with your code.
    			putch('X', putdat);
    			putch('X', putdat);
    			putch('X', putdat);
    			break;

    		// pointer
    		case 'p':
    			putch('0', putdat);
    			putch('x', putdat);
    			num = (unsigned long long)
    				(uintptr_t) va_arg(ap, void *);
    			base = 16;
    			goto number;

    		// (unsigned) hexadecimal
    		case 'x':
    			num = getuint(&ap, lflag);
    			base = 16;
    		number:
    			printnum(putch, putdat, num, base, width, padc);
    			break;

    		// escaped '%' character
    		case '%':
    			putch(ch, putdat);
    			break;

    		// unrecognized escape sequence - just print it literally
    		default:
    			putch('%', putdat);
    			for (fmt--; fmt[-1] != '%'; fmt--)
    				/* do nothing */;
    			break;
    		}
    	}
    }


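A quick scan shows that this code handles the format specifiers: conversion type, precision, field width, and so on.

Note that putch prints one character to the console and keeps a running count of the characters printed so far.

Now let's answer the questions:

  * In the call to cprintf, fmt points to the format string "x %d, y %x, z %d\n", and ap points to the first variadic argument, i.e. the address of x on the call stack.
  * The calls to cons_putc, in order, are: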

    * cons_putc('x')
    * cons_putc(' ')
    * cons_putc('1')
    * cons_putc(',')
    * cons_putc(' ')
    * cons_putc('y')
    * cons_putc(' ')
    * cons_putc('3')
    * cons_putc(',')
    * cons_putc(' ')
    * cons_putc('z')
    * cons_putc(' ')
    * cons_putc('4')
    * cons_putc('\n')


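  * va_arg is called three times:

    * Before the first call, ap points to the address of x on the stack; afterwards, it points to the address of y.
    * Before the second call, ap points to the address of y; afterwards, it points to the address of z.
    * Before the third call, ap points to the address of z; afterwards, it points to the address 4 bytes past z.


  * vcprintf is called with the address of the string "x %d, y %x, z %d\n" as its first argument and the address of x on the call stack (the value of ap) as its second.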
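
The way va_arg walks through x, y and z can be reproduced on the host. This is a simplified sketch, not JOS code: `collect_args` is a hypothetical helper that treats every '%' as the start of one int conversion (it does not handle "%%" or non-int arguments).

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: stores one int per '%' found in fmt into out[],
   fetching each with va_arg the way vprintfmt does. Returns the count. */
static size_t collect_args(int *out, size_t max, const char *fmt, ...)
{
	va_list ap;
	size_t n = 0;

	va_start(ap, fmt);                  /* ap now refers to the first unnamed argument */
	for (; *fmt != '\0'; fmt++) {
		if (*fmt == '%' && n < max)
			out[n++] = va_arg(ap, int); /* read one int and advance ap */
	}
	va_end(ap);
	return n;
}
```

Calling `collect_args(got, 3, "x %d, y %x, z %d\n", 1, 3, 4)` fetches 1, 3 and 4 in that order, matching the three va_arg calls listed above.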

4. Run the following code.

    unsigned int i = 0x00646c72;
    cprintf("H%x Wo%s", 57616, &i);


What is the output? Explain how this output is arrived at in the step-by-step manner of the previous exercise. Here's an ASCII table that maps bytes to characters.

The output depends on that fact that the x86 is little-endian. If the x86 were instead big-endian what would you set i to in order to yield the same output? Would you need to change 57616 to a different value?

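The output is "He110 World". The e110 in the first half is simply 57616 written in hexadecimal. In the second half, %s prints the bytes of unsigned int i as a character string; hex 64, 6c and 72 correspond to the characters 'd', 'l' and 'r'.

First, a quick review of byte order. Converting between integer types (static_cast) is not affected by byte order, and pointer ++ and -- involve neither casts nor byte order. Byte order only matters when a pointer is reinterpreted as a pointer to another type (reinterpret_cast), for example:

    int a = 0x12345678;
    char *c = reinterpret_cast<char*>(&a);
    printf("%x %x %x %x\n",c[0],c[1],c[2],c[3]);
    // little-endian output: 78 56 34 12
    // big-endian output:    12 34 56 78

Because x86 is little-endian, the bytes of i sit in memory as 72 6c 64 00, so the characters are actually printed in the order 'r', 'l', 'd', and the 00 byte terminates the string.

If x86 were big-endian, the most significant byte would be stored first, so i would have to be 0x726c6400 to produce the same output. (0x00726c64 would not work: its first byte in memory would be 00, giving an empty string.)

57616 does not need to change: %x prints the value of the integer, and integer conversions (static_cast) have no byte-order issue.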
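
To check which value of i yields the bytes 72 6c 64 00 under each byte order without owning a big-endian machine, the two layouts can be written out explicitly. This is a host-side sketch; `store_le32` and `store_be32` are made-up helper names.

```c
#include <stdint.h>

/* Write v into out[] least significant byte first (little-endian layout). */
static void store_le32(uint8_t out[4], uint32_t v)
{
	for (int k = 0; k < 4; k++)
		out[k] = (uint8_t)(v >> (8 * k));
}

/* Write v into out[] most significant byte first (big-endian layout). */
static void store_be32(uint8_t out[4], uint32_t v)
{
	for (int k = 0; k < 4; k++)
		out[k] = (uint8_t)(v >> (8 * (3 - k)));
}
```

Reading the resulting buffer as a C string gives "rld" for `store_le32(buf, 0x00646c72)` and for `store_be32(buf, 0x726c6400)`, since both produce the bytes 72 6c 64 00.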

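5. In the following code, what is going to be printed after `'y='`? (note: the answer is not a specific value.) Why does this happen?

    cprintf("x=%d y=%d", 3);

x prints as 3, and y prints as a meaningless integer. The reason is that cprintf still calls va_arg to fetch a second argument even though the va_list has none left, and calling va_arg when there is no next argument is undefined behaviour. In practice, va_arg here just reads whatever 4 bytes happen to sit on the stack right after the 3.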

6.Let's say that GCC changed its calling convention so that it pushed arguments on the stack in declaration order, so that the last argument is pushed last. How would you have to change `cprintf` or its interface so that it would still be possible to pass it a variable number of arguments?

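Achieving this by changing only cprintf seems hard. Since the push order is now the reverse of what it was, the va_arg macro would have to change so that it walks the arguments in the opposite direction. Alternatively, add a buffer: instead of handling one argument at a time, read all the arguments first, reverse their order, and only then process them.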

The Stack

Exercise 9. Determine where the kernel initializes its stack, and exactly where in memory its stack is located. How does the kernel reserve space for its stack? And at which "end" of this reserved area is the stack pointer initialized to point to?

See obj/kern/kernel.asm:

    f010002c <relocated>:
    relocated:

            # Clear the frame pointer register (EBP)
            # so that once we get into debugging C code,
            # stack backtraces will be terminated properly.
            movl    $0x0,%ebp                       # nuke frame pointer
    f010002c:       bd 00 00 00 00          mov    $0x0,%ebp

            # Set the stack pointer
            movl    $(bootstacktop),%esp
    f0100031:       bc 00 00 11 f0          mov    $0xf0110000,%esp

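From this we see that the kernel initializes its stack with the instructions at 0xf010002c and 0xf0100031, and that the stack top is at address 0xf0110000. As for how the kernel reserves space for its stack, my reading is: the stack now has a starting position, but how does it know how much space it has? In other words, the question is how the kernel decides the size of the stack. The space itself is set aside in the .data section of kern/entry.S (a `.space KSTKSIZE` directive between the labels bootstack and bootstacktop), and the size is defined in inc/memlayout.h:

    // All physical memory mapped at this address
    #define	KERNBASE	0xF0000000

    // At IOPHYSMEM (640K) there is a 384K hole for I/O.  From the kernel,
    // IOPHYSMEM can be addressed at KERNBASE + IOPHYSMEM.  The hole ends
    // at physical address EXTPHYSMEM.
    #define IOPHYSMEM	0x0A0000
    #define EXTPHYSMEM	0x100000

    // Kernel stack.
    #define KSTACKTOP	KERNBASE
    #define KSTKSIZE	(8*PGSIZE)   		// size of a kernel stack
    #define KSTKGAP		(8*PGSIZE)   		// size of a kernel stack guard

    // Memory-mapped IO.
    #define MMIOLIM		(KSTACKTOP - PTSIZE)
    #define MMIOBASE	(MMIOLIM - PTSIZE)

As for the last question: on x86 the stack grows downward, so the stack pointer initially points to the high-address end (the top) of the reserved area.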
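
The extent of the reserved area follows from simple arithmetic. The sketch below assumes `PGSIZE` is 4096 (its value in inc/mmu.h); `stack_bottom` is a made-up helper name.

```c
#include <stdint.h>

#define PGSIZE   4096u           /* assumption: page size from inc/mmu.h */
#define KSTKSIZE (8 * PGSIZE)    /* from inc/memlayout.h: 32 KB */

/* Lowest address of the stack area, given the address of its top. */
static uint32_t stack_bottom(uint32_t stack_top)
{
	return stack_top - KSTKSIZE;
}
```

With bootstacktop at 0xf0110000, the stack occupies 0xf0108000 up to (but not including) 0xf0110000.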

Exercise 10. To become familiar with the C calling conventions on the x86, find the address of the `test_backtrace` function in obj/kern/kernel.asm, set a breakpoint there, and examine what happens each time it gets called after the kernel starts. How many 32-bit words does each recursive nesting level of `test_backtrace` push on the stack, and what are those words?

Note that, for this exercise to work properly, you should be using the patched version of QEMU available on the tools page or on Athena. Otherwise, you'll have to manually translate all breakpoint and memory addresses to linear addresses.

The entry address of test_backtrace is 0xf0100040. With a breakpoint set there, the final output is:

entering test_backtrace 5
entering test_backtrace 4
entering test_backtrace 3
entering test_backtrace 2
entering test_backtrace 1
entering test_backtrace 0
leaving test_backtrace 0
leaving test_backtrace 1
leaving test_backtrace 2
leaving test_backtrace 3
leaving test_backtrace 4
leaving test_backtrace 5
Welcome to the JOS kernel monitor!

On every call to test_backtrace, three 32-bit values are placed on the stack, as the disassembly shows:

    // Test the stack backtrace function (lab 1 only)
    void
    test_backtrace(int x)
    {
    f0100040:       55                      push   %ebp
    f0100041:       89 e5                   mov    %esp,%ebp
    f0100043:       53                      push   %ebx
    f0100044:       83 ec 14                sub    $0x14,%esp
    f0100047:       8b 5d 08                mov    0x8(%ebp),%ebx
            cprintf("entering test_backtrace %d\n", x);
    f010004a:       89 5c 24 04             mov    %ebx,0x4(%esp)
    f010004e:       c7 04 24 e0 18 10 f0    movl   $0xf01018e0,(%esp)
    f0100055:       e8 d7 08 00 00          call   f0100931 <cprintf>
            if (x > 0)
    f010005a:       85 db                   test   %ebx,%ebx
    f010005c:       7e 0d                   jle    f010006b <test_backtrace+0x2b>
                    test_backtrace(x-1);
    f010005e:       8d 43 ff                lea    -0x1(%ebx),%eax
    f0100061:       89 04 24                mov    %eax,(%esp)
    f0100064:       e8 d7 ff ff ff          call   f0100040 <test_backtrace>
    f0100069:       eb 1c                   jmp    f0100087 <test_backtrace+0x47>
            else
                    mon_backtrace(0, 0, 0);
    f010006b:       c7 44 24 08 00 00 00    movl   $0x0,0x8(%esp)
    f0100072:       00 
    f0100073:       c7 44 24 04 00 00 00    movl   $0x0,0x4(%esp)
    f010007a:       00 
    f010007b:       c7 04 24 00 00 00 00    movl   $0x0,(%esp)
    f0100082:       e8 18 07 00 00          call   f010079f <mon_backtrace>
            cprintf("leaving test_backtrace %d\n", x);
    f0100087:       89 5c 24 04             mov    %ebx,0x4(%esp)
    f010008b:       c7 04 24 fc 18 10 f0    movl   $0xf01018fc,(%esp)
    f0100092:       e8 9a 08 00 00          call   f0100931 <cprintf>
    }
    f0100097:       83 c4 14                add    $0x14,%esp
    f010009a:       5b                      pop    %ebx
    f010009b:       5d                      pop    %ebp
    f010009c:       c3                      ret

They are the argument x, ebp, and ebx. Storing x and saving ebp are routine, so I won't explain them. The push of ebx may raise questions: ebx is a callee-saved register and the function uses it to hold x, so it has to save and restore the caller's value; see "Why are these registers pushed to stack?". Strictly speaking, each nesting level takes up more than these three words: `call` also pushes the return address, and `sub $0x14,%esp` reserves five more words (the outgoing argument is written into that area with mov rather than pushed). That adds up to 4 + 4 + 4 + 0x14 = 0x20 bytes, i.e. eight 32-bit words per level.

Next exercise:

Exercise 11. Implement the backtrace function as specified above. Use the same format as in the example, since otherwise the grading script will be confused. When you think you have it working right, run make grade to see if its output conforms to what our grading script expects, and fix it if it doesn't. _After_ you have handed in your Lab 1 code, you are welcome to change the output format of the backtrace function any way you like.

If you use read_ebp(), note that GCC may generate "optimized" code that calls read_ebp() before mon_backtrace()'s function prologue, which results in an incomplete stack trace (the stack frame of the most recent function call is missing). While we have tried to disable optimizations that cause this reordering, you may want to examine the assembly of mon_backtrace() and make sure the call to read_ebp() is happening after the function prologue.

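This exercise relies mostly on the x86 calling conventions. The key facts are: the word at ebp holds the ebp of the previous stack frame, ebp+4 holds the return address, ebp+8 holds the first argument, and the initial value of ebp is 0 (which is where the chain ends).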
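
The frame chain can be rehearsed on the host before touching the kernel. In this sketch the "stack" is a plain array and "addresses" are word indices into it; `walk_frames` is a made-up name and the layout is just the one stated above (saved ebp at offset 0, return address at offset 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Follow the saved-ebp chain starting at index ebp, recording each frame's
   return address in eips[]. Stops at a saved ebp of 0 or after max frames. */
static size_t walk_frames(const uint32_t *mem, uint32_t ebp,
                          uint32_t *eips, size_t max)
{
	size_t n = 0;

	while (ebp != 0 && n < max) {
		eips[n++] = mem[ebp + 1]; /* return address sits one word above saved ebp */
		ebp = mem[ebp];           /* saved ebp links to the caller's frame */
	}
	return n;
}
```

The loop has the same shape as the real backtrace: print what hangs off the current ebp, then follow the saved ebp until it reaches 0.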

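The final implementation:

    int
    mon_backtrace(int argc, char **argv, struct Trapframe *tf)
    {
            // Your code here.
            uint32_t *ebp = (uint32_t*)read_ebp();
            int i ;
            // ebp is 0 in the outermost frame (set in entry.S), which ends the walk
            while (ebp)
            {
                    // ebp[1] is the return address of this frame
                    cprintf("ebp %08x  eip %08x  ",ebp,*(ebp+1));
                    cprintf("args");
                    // ebp[2]..ebp[6] are the first five argument words
                    for ( i = 2 ; i < 7 ; i++)
                    {
                            cprintf(" %08x",*(ebp+i));
                    }
                    cprintf("\n");
                    // follow the saved ebp to the caller's frame
                    ebp = (uint32_t*)*ebp;
            }
            return 0;
    }

Then the last exercise: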

Exercise 12. Modify your stack backtrace function to display, for each eip, the function name, source file name, and line number corresponding to that eip.

In debuginfo_eip, where do _STAB* come from? This question has a long answer; to help you to discover the answer, here are some things you might want to do:

  * look in the file kern/kernel.ld for __STAB_*
  * run objdump -h obj/kern/kernel
  * run objdump -G obj/kern/kernel
  * run gcc -pipe -nostdinc -O2 -fno-builtin -I. -MD -Wall -Wno-format -DJOS_KERNEL -gstabs -c -S kern/init.c, and look at init.s.
  * see if the bootloader loads the symbol table in memory as part of loading the kernel binary

Complete the implementation of debuginfo_eip by inserting the call to stab_binsearch to find the line number for an address.

Add a backtrace command to the kernel monitor, and extend your implementation of mon_backtrace to call debuginfo_eip and print a line for each stack frame of the form:

 K> backtrace
 Stack backtrace:
   ebp f010ff78  eip f01008ae  args 00000001 f010ff8c 00000000 f0110580 00000000
          kern/monitor.c:143: monitor+106
   ebp f010ffd8  eip f0100193  args 00000000 00001aac 00000660 00000000 00000000
          kern/init.c:49: i386_init+59
   ebp f010fff8  eip f010003d  args 00000000 00000000 0000ffff 10cf9a00 0000ffff
          kern/entry.S:70: <unknown>+0
 K> 

Each line gives the file name and line within that file of the stack frame's eip, followed by the name of the function and the offset of the eip from the first instruction of the function (e.g., monitor+106 means the return eip is 106 bytes past the beginning of monitor).

Be sure to print the file and function names on a separate line, to avoid confusing the grading script.

Tip: printf format strings provide an easy, albeit obscure, way to print non-null-terminated strings like those in STABS tables. printf("%.*s", length, string) prints at most length characters of string. Take a look at the printf man page to find out why this works.

You may find that some functions are missing from the backtrace. For example, you will probably see a call to monitor() but not to runcmd(). This is because the compiler in-lines some function calls. Other optimizations may cause you to see unexpected line numbers. If you get rid of the -O2 from GNUMakefile, the backtraces may make more sense (but your kernel will run more slowly).
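
The `%.*s` tip from the handout can be tried on the host with the standard snprintf. In this sketch `format_prefix` is a made-up helper, and the sample string used with it is an invented stab-style name, not output copied from the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Copy at most len characters of s into out, using the "%.*s" precision
   argument. Returns what snprintf returns: the number of characters that
   the full result needs, not counting the terminating NUL. */
static int format_prefix(char *out, size_t outsz, const char *s, int len)
{
	return snprintf(out, outsz, "%.*s", len, s);
}
```

For instance, `format_prefix(buf, sizeof buf, "monitor:F(0,1)", 7)` leaves "monitor" in buf, which is the same trick used to print eip_fn_name with eip_fn_namelen.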

First we need some background on stabs: in short, it is a debugging data format. For details, see "stabs" and "Debugging formats DWARF and STAB".

The output of objdump -h obj/kern/kernel is:

obj/kern/kernel:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00001937  f0100000  00100000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       0000079c  f0101940  00101940  00002940  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .stab         000038e9  f01020dc  001020dc  000030dc  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .stabstr      000018f0  f01059c5  001059c5  000069c5  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .data         0000a300  f0108000  00108000  00009000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  5 .bss          00000648  f0112300  00112300  00013300  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  6 .comment      00000023  00000000  00000000  00013948  2**0
                  CONTENTS, READONLY

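We can see that the link address (VMA) of the .stabstr section is f01059c5.

Then debug with gdb: set a breakpoint at 0x10000c, the kernel entry point that the boot loader jumps to after loading the kernel, and single-step a few instructions until paging is enabled. Examining address f01059c5 at that point gives the result below, which shows that the boot loader loaded the symbol table into memory along with the kernel: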

(gdb) x/8s 0xf01059c5
0xf01059c5:     ""
0xf01059c6:     "{standard input}"
0xf01059d7:     "kern/entry.S"
0xf01059e4:     "kern/entrypgdir.c"
0xf01059f6:     "gcc2_compiled."
0xf0105a05:     "int:t(0,1)=r(0,1);-2147483648;2147483647;"
0xf0105a2f:     "char:t(0,2)=r(0,2);0;127;"
0xf0105a49:     "long int:t(0,3)=r(0,3);-2147483648;2147483647;"

Next, let's look at kern/kdebug.c, the file we need to complete:

    int
    debuginfo_eip(uintptr_t addr, struct Eipdebuginfo *info)
    {
    	const struct Stab *stabs, *stab_end;
    	const char *stabstr, *stabstr_end;
    	int lfile, rfile, lfun, rfun, lline, rline;

    	// Initialize *info
    	info->eip_file = "<unknown>";
    	info->eip_line = 0;
    	info->eip_fn_name = "<unknown>";
    	info->eip_fn_namelen = 9;
    	info->eip_fn_addr = addr;
    	info->eip_fn_narg = 0;

    	// Find the relevant set of stabs
    	if (addr >= ULIM) {
    		stabs = __STAB_BEGIN__;
    		stab_end = __STAB_END__;
    		stabstr = __STABSTR_BEGIN__;
    		stabstr_end = __STABSTR_END__;
    	} else {
    		// Can't search for user-level addresses yet!
    		panic("User address");
    	}

    	// String table validity checks
    	if (stabstr_end <= stabstr || stabstr_end[-1] != 0)
    		return -1;

    	// Now we find the right stabs that define the function containing
    	// 'eip'.  First, we find the basic source file containing 'eip'.
    	// Then, we look in that source file for the function.  Then we look
    	// for the line number.

    	// Search the entire set of stabs for the source file (type N_SO).
    	lfile = 0;
    	rfile = (stab_end - stabs) - 1;
    	stab_binsearch(stabs, &lfile, &rfile, N_SO, addr);
    	if (lfile == 0)
    		return -1;

    	// Search within that file's stabs for the function definition
    	// (N_FUN).
    	lfun = lfile;
    	rfun = rfile;
    	stab_binsearch(stabs, &lfun, &rfun, N_FUN, addr);

    	if (lfun <= rfun) {
    		// stabs[lfun] points to the function name
    		// in the string table, but check bounds just in case.
    		if (stabs[lfun].n_strx < stabstr_end - stabstr)
    			info->eip_fn_name = stabstr + stabs[lfun].n_strx;
    		info->eip_fn_addr = stabs[lfun].n_value;
    		addr -= info->eip_fn_addr;
    		// Search within the function definition for the line number.
    		lline = lfun;
    		rline = rfun;
    	} else {
    		// Couldn't find function stab!  Maybe we're in an assembly
    		// file.  Search the whole file for the line number.
    		info->eip_fn_addr = addr;
    		lline = lfile;
    		rline = rfile;
    	}
    	// Ignore stuff after the colon.
    	info->eip_fn_namelen = strfind(info->eip_fn_name, ':') - info->eip_fn_name;


    	// Search within [lline, rline] for the line number stab.
    	// If found, set info->eip_line to the right line number.
    	// If not found, return -1.
    	//
    	// Hint:
    	//	There's a particular stabs type used for line numbers.
    	//	Look at the STABS documentation and <inc/stab.h> to find
    	//	which one.
    	//   use N_SLINE

    	// Your code here.





    	// Search backwards from the line number for the relevant filename
    	// stab.
    	// We can't just use the "lfile" stab because inlined functions
    	// can interpolate code from a different file!
    	// Such included source files use the N_SOL stab type.
    	while (lline >= lfile
    	       && stabs[lline].n_type != N_SOL
    	       && (stabs[lline].n_type != N_SO || !stabs[lline].n_value))
    		lline--;
    	if (lline >= lfile && stabs[lline].n_strx < stabstr_end - stabstr)
    		info->eip_file = stabstr + stabs[lline].n_strx;


    	// Set eip_fn_narg to the number of arguments taken by the function,
    	// or 0 if there was no containing function.
    	if (lfun < rfun)
    		for (lline = lfun + 1;
    		     lline < rfun && stabs[lline].n_type == N_PSYM;
    		     lline++)
    			info->eip_fn_narg++;

    	return 0;
    }
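
The part to fill in turns out to be easy to write: two binary searches have already been done before it, so we just follow the same pattern.

            stab_binsearch(stabs, &lline, &rline, N_SLINE, addr);
            // lline > rline means no line-number stab covers addr
            if (lline > rline)
                    return -1;
            // on success, stabs[lline] is the stab that contains addr
            info->eip_line = stabs[lline].n_desc;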
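
The idea behind the lookup (find the last entry whose address is not greater than the target) can be sketched on its own. This is not the real stab_binsearch, which also has to skip stabs of other types; `struct line_entry` and `find_line` are hypothetical names, and the table is assumed to be sorted by address.

```c
#include <stdint.h>

/* Hypothetical, simplified line-table entry (not the real struct Stab). */
struct line_entry {
	uint32_t addr; /* first address covered by this source line */
	int line;      /* source line number */
};

/* Returns the index of the last entry with addr <= target, or -1 if none. */
static int find_line(const struct line_entry *tab, int n, uint32_t target)
{
	int lo = 0, hi = n - 1, best = -1;

	while (lo <= hi) {
		int mid = lo + (hi - lo) / 2;
		if (tab[mid].addr <= target) {
			best = mid;    /* candidate; look for a later one */
			lo = mid + 1;
		} else {
			hi = mid - 1;  /* entry starts past target */
		}
	}
	return best;
}
```

A result of -1 plays the same role as the "not found" case in debuginfo_eip, where the function returns -1.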

Then modify mon_backtrace in kern/monitor.c to call debuginfo_eip. This part is easy as well.

    int
    mon_backtrace(int argc, char **argv, struct Trapframe *tf)
    {
            // Your code here.
            uint32_t *ebp = (uint32_t*)read_ebp();
            cprintf("Stack backtrace:\n");
            int i ;
            struct Eipdebuginfo info;
            while (ebp)
            {
                    // return address of the current frame
                    uint32_t eip = ebp[1];
                    cprintf("ebp %08x  eip %08x  ",ebp,eip);
                    cprintf("args");
                    for ( i = 2 ; i < 7 ; i++)
                    {
                            cprintf(" %08x",*(ebp+i));
                    }
                    cprintf("\n");
                    // look up file, line and function for this eip
                    int status = debuginfo_eip(eip,&info);
                    if (status == 0)
                    {

                      cprintf("%s:%d: ",info.eip_file,info.eip_line);
                      // eip_fn_name is not null-terminated, hence %.*s
                      cprintf("%.*s+%d\n",info.eip_fn_namelen,info.eip_fn_name,eip-info.eip_fn_addr);
                    }
                    ebp = (uint32_t*)*ebp;
            }


            return 0;
    }

The final result looks roughly like this:

entering test_backtrace 5
entering test_backtrace 4
entering test_backtrace 3
entering test_backtrace 2
entering test_backtrace 1
entering test_backtrace 0
Stack backtrace:
ebp f0110ec8  eip f0100b09  args f0102499 f0102499 f0100b09 00000000 f0100d9c
kern/monitor.c:66: mon_backtrace+26
ebp f0110f18  eip f010008b  args 00000000 00000000 00000000 00000000 f0102238
kern/init.c:19: test_backtrace+75
ebp f0110f38  eip f010006d  args 00000000 00000001 f0110f64 00000000 f0102238
kern/init.c:16: test_backtrace+45
ebp f0110f58  eip f010006d  args 00000001 00000002 f0110f84 00000000 f0102238
kern/init.c:16: test_backtrace+45
ebp f0110f78  eip f010006d  args 00000002 00000003 f0110fa4 00000000 f0102238
kern/init.c:16: test_backtrace+45
ebp f0110f98  eip f010006d  args 00000003 00000004 f0110fc4 00000000 f010226f
kern/init.c:16: test_backtrace+45
ebp f0110fb8  eip f010006d  args 00000004 00000005 f0110fe4 00000000 00000000
kern/init.c:16: test_backtrace+45
ebp f0110fd8  eip f01000f1  args 00000005 00001aac 00000640 00000000 00000000
kern/init.c:43: i386_init+81
ebp f0110ff8  eip f010003e  args 00000003 00001003 00002003 00003003 00004003
kern/entry.S:83: <unknown>+0
leaving test_backtrace 0
leaving test_backtrace 1
leaving test_backtrace 2
leaving test_backtrace 3
leaving test_backtrace 4
leaving test_backtrace 5
Welcome to the JOS kernel monitor!

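If some functions do not show up above, they were probably optimized away (for example, inlined). Try changing the compiler options in the makefile from -O2 or -O1 to -O0.

With that, all of lab 1 is done. Hooray!

It took thirty hours, but I really did learn a lot. It felt like playing a puzzle game, where the clues are the questions before and after each exercise.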
