NSFOCUS

NSFOCUS Monthly

how to write a module dumper & disassembler

Author: CoolQ  <qufuping@ercist.iscas.ac.cn>
Source: http://www.linuxforum.net/forum/showflat.php?Cat=&Board=security&Number=
Date: 2005-01-06

|=-----------[ how to write a module dumper & disassembler ]--------------=|
|=------------------------------------------------------------------------=|
|=---------------[ CoolQ  <qufuping@ercist.iscas.ac.cn> ]-----------------=|
|=------------------------------------------------------------------------=|

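0 - Preface
1 - How Linux 2.6 kernel modules are loaded
    1.1 Overview of the loading process
    1.2 Test module
    1.3 Analysis of the key points
    1.4 Feasibility conclusion
    1.5 Additional notes
2 - Extracting the module
    2.1 A simple module dumper
    2.2 What the program does
3 - Disassembling the module
    3.1 Introduction to BFD
    3.2 Basic format of Intel machine instructions
    3.3 Writing a simple disassembler with binutils
    3.4 Two things to note
4 - Conclusion
5 - References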

--------------------------------------------------------------------------
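- ---[ 0 Preface

Quite a few LKM Trojans have already appeared on Linux, including Knark, Adore,
Adore-ng and others. During incident response we can use madsys's method [1] to
find hidden LKM Trojans. For forensics, however, that is not enough: we also need
a sample of the Trojan module itself. A sufficiently careful attacker can securely
delete the .o/.ko file, and the module's name alone does not tell us what the
suspicious module actually does.

We therefore need a way to extract such suspicious modules and analyze what they
do. How feasible is this idea?

The body of the article has three parts:
Part 1 analyzes how a module is loaded, to see whether extracting and analyzing a
module is feasible.
Part 2 writes a simple module that extracts the contents of a specified module.
Part 3 briefly introduces the basics of disassembly, including the structure of
machine instructions and BFD, and then uses the binutils source code to write a
simple disassembler.
The conclusion proposes some ideas for improvement.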

- ---[ 1 How Linux 2.6 kernel modules are loaded

- ---[ 1.1 Overview of the loading process

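When a program runs insmod abc.ko, the sys_init_module system call [2] is invoked
and the entire contents of the file are passed into the kernel, which then
performs relocation, resolves symbols and runs the module's functions.
sys_init_module first performs some checks and then calls
    mod = load_module(umod, len, uargs);
load_module does most of the work. The module is then inserted into the modules
list; if it defines an initialization function (the one named by module_init),
that function is called at the end of loading, after which the space pointed to
by mod->module_init is freed. Note that the first module_init is the macro used
when writing a module to name its initialization function, while the second is a
pointer in struct module to memory that is needed only once and can then be freed.

Now to load_module itself. It first checks that the ELF file is valid and reads
the ELF header, then computes how much memory to allocate according to each
section's flags, copies the corresponding sections into place, and finally
relocates symbols.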

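To make the discussion easier to follow, we first give a test module.

- ---[ 1.2 Test module

Here is the test program test.c. It walks the kernel's module list and unlinks
the module whose name is mod_name, the basic trick LKM Trojans use to hide
themselves: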

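#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/string.h>

/* Name of the module to hide from the module list */
static char *mod_name = "module";
//module_param(mod_name, charp, 0);

/* Walk the module list and unlink the entry named mod_name.
 * Note: starting from our own node, the walk also reaches the global
 * `modules` list head, which is not embedded in a struct module;
 * list_entry() on it gives a bogus pointer, so that one comparison
 * reads unrelated memory. */
static int remove_init(void)
{
    struct module *mod_head, *mod_counter;
    struct list_head *p;

    mod_head = &__this_module;
    list_for_each(p, &mod_head->list){
        mod_counter = list_entry(p, struct module, list);
        if(strcmp(mod_counter->name, mod_name) == 0){
            list_del(p);    /* no longer listed in /proc/modules or lsmod */
            printk("remove module %s successfully.\n", mod_name);
            return 0;
        }
    }
    printk("Can't find module %s.\n", mod_name);
    return 0;
}
static void remove_exit(void)
{

}
module_init(remove_init);
module_exit(remove_exit);

MODULE_LICENSE("GPL");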

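- ---[ 1.3 Analysis of the key points

Once a .o/.ko file has been loaded into memory, its ELF header information no
longer exists. We therefore have to understand the exact memory layout of the
module's space to decide whether a given module can be extracted at all, and
where its code starts and ends (that is, whether it can be disassembled).

First, let's look at the ELF section layout of the generated test.ko: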

#readelf -S test.ko

There are 19 section headers, starting at offset 0x620:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 000080 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000918 000048 08     17   1  4
  [ 3] .altinstr_replace PROGBITS        00000000 0000b4 000006 00  AX  0   0  1
  [ 4] .rodata.str1.1    PROGBITS        00000000 0000ba 00003e 01 AMS  0   0  1
  [ 5] .altinstructions  PROGBITS        00000000 0000f8 000017 00   A  0   0  4
  [ 6] .rel.altinstructi REL             00000000 000960 000020 08     17   5  4
  [ 7] .modinfo          PROGBITS        00000000 000120 00005b 00   A  0   0 32
  [ 8] __versions        PROGBITS        00000000 000180 000100 00   A  0   0 32
  [ 9] .data             PROGBITS        00000000 000280 000004 00  WA  0   0  4
  [10] .rel.data         REL             00000000 000980 000008 08     17   9  4
  [11] .gnu.linkonce.thi PROGBITS        00000000 000300 000200 00  WA  0   0 128
  [12] .rel.gnu.linkonce REL             00000000 000988 000010 08     17   b  4
  [13] .bss              NOBITS          00000000 000500 000000 00  WA  0   0  4
  [14] .comment          PROGBITS        00000000 000500 000066 00      0   0  1
  [15] .note.GNU-stack   NOTE            00000000 000566 000000 00      0   0  1
  [16] .shstrtab         STRTAB          00000000 000566 0000b9 00      0   0  1
  [17] .symtab           SYMTAB          00000000 000998 000200 10     18  1c  4
  [18] .strtab           STRTAB          00000000 000b98 0000a6 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Next, let's see how the space occupied by a module is actually allocated.

The sizes are computed mainly by the layout_sections function:
/* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
   might -- code, read-only data, read-write data, small data.  Tally
   sizes, and place the offsets into sh_entsize fields: high bit means it
   belongs in init. */
static void layout_sections(struct module *mod,
                const Elf_Ehdr *hdr,
                Elf_Shdr *sechdrs,
                const char *secstrings)
{
    static unsigned long const masks[][2] = {
        /* NOTE: all executable code must be the first section
         * in this array; otherwise modify the text_size
         * finder in the two loops below */
        { SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL },
        { SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL },
        { SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL },
        { ARCH_SHF_SMALL | SHF_ALLOC, 0 }
    };
    unsigned int m, i;

    for (i = 0; i < hdr->e_shnum; i++)
        sechdrs[i].sh_entsize = ~0UL;

    DEBUGP("Core section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < hdr->e_shnum; ++i) {
            Elf_Shdr *s = &sechdrs[i];

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || strncmp(secstrings + s->sh_name,
                       ".init", 5) == 0)
                continue;
            s->sh_entsize = get_offset(&mod->core_size, s);
            DEBUGP("\t%s\n", secstrings + s->sh_name);
        }
        if (m == 0)
            mod->core_text_size = mod->core_size;
    }

    DEBUGP("Init section allocation order:\n");
    for (m = 0; m < ARRAY_SIZE(masks); ++m) {
        for (i = 0; i < hdr->e_shnum; ++i) {
            Elf_Shdr *s = &sechdrs[i];

            if ((s->sh_flags & masks[m][0]) != masks[m][0]
                || (s->sh_flags & masks[m][1])
                || s->sh_entsize != ~0UL
                || strncmp(secstrings + s->sh_name,
                       ".init", 5) != 0)
                continue;
            s->sh_entsize = (get_offset(&mod->init_size, s)
                     | INIT_OFFSET_MASK);
            DEBUGP("\t%s\n", secstrings + s->sh_name);
        }
        if (m == 0)
            mod->init_text_size = mod->init_size;
    }
}

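So every section whose flags are AX, AMS, A or WA and whose name does not start
with ".init" has its size added to mod->core_size (with alignment padding), and
the running total after the AX (code) pass is saved in mod->core_text_size.
Sections whose names start with ".init" and that carry A (possibly with X) are
added to mod->init_size in the same way, with the AX total saved in
mod->init_text_size. Sections without the A flag get no space at all. The offset
of each section within its region is stored in its sh_entsize field, with
INIT_OFFSET_MASK set for .init sections.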

The sections are copied by the following code:
for (i = 0; i < hdr->e_shnum; i++) {
    void *dest;

    if (!(sechdrs[i].sh_flags & SHF_ALLOC))
        continue;

    if (sechdrs[i].sh_entsize & INIT_OFFSET_MASK)
        dest = mod->module_init
            + (sechdrs[i].sh_entsize & ~INIT_OFFSET_MASK);
    else
        dest = mod->module_core + sechdrs[i].sh_entsize;

    if (sechdrs[i].sh_type != SHT_NOBITS)
        memcpy(dest, (void *)sechdrs[i].sh_addr,
               sechdrs[i].sh_size);
    /* Update sh_addr to point to copy in image. */
    sechdrs[i].sh_addr = (unsigned long)dest;
    DEBUGP("\t0x%lx %s\n", sechdrs[i].sh_addr, secstrings +
                        sechdrs[i].sh_name);
}
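So the copy loop walks the sections in their original ELF order and copies every
section whose flags include A to the offset that layout_sections assigned it in
the corresponding region (module_core or module_init). The only exception is
SHT_NOBITS, i.e. the BSS section, which occupies no space in the file and
therefore has nothing to copy.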

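- ---[ 1.4 Feasibility conclusion

Now we can draw a conclusion: most of a kernel module's contents can be extracted
through the fields of struct module. layout_sections places the executable (AX)
sections first, and .text is the first of them in the ELF file, so
mod->module_core actually points to .text. mod->core_text_size covers all AX
sections (here .text and .altinstr_replacement), so the range of code to
disassemble is well defined as well.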

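- ---[ 1.5 Additional notes

!!!     Now let's look at the .symtab and .strtab sections. load_module contains
    these lines:
    #ifdef CONFIG_KALLSYMS
        /* Keep symbol and string tables for decoding later. */
        sechdrs[symindex].sh_flags |= SHF_ALLOC;
        sechdrs[strindex].sh_flags |= SHF_ALLOC;
    #endif
    This means the two sections are allocated space only when CONFIG_KALLSYMS
    is defined.

!!!    Our program does not seem to have any .init section, so mod->module_init,
    mod->init_size and mod->init_text_size are all 0. When does an .init section
    appear, then? Let's change
    static int remove_init(void) -> static int __init remove_init(void)
    and look at the readelf output again:

  [2] .init.text        PROGBITS        00000000 000038 00007e 00  AX  0   0  1
    This time remove_init has been placed in the .init.text section, which is
    exactly what the __init marker does.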

- ---[ 2 Extracting the module

Following the module memory allocation process described above, it is easy to
write a simple module dumper.

- ---[ 2.1 A simple module dumper

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/list.h>
#include <linux/string.h>
#include <linux/err.h>
#include <asm/uaccess.h>

struct file *klib_fopen(const char *filename, int flags, int mode);
void klib_fclose(struct file *filp);
int klib_fwrite(char *buf, int len, struct file *filp);

/* module to dump; not reference-counted, so don't unload it meanwhile */
static struct module     *mod;
static char buffer[256];
static char *mod_name;                  /* target, set with mod_name=... */

module_param(mod_name, charp, 0);

/* Reading /proc/show_mod triggers the dump. Relative paths are resolved
 * against the current directory of the process doing the read. */
static ssize_t show_mod_read(struct file *fp, char __user *buf, size_t len,
                 loff_t *off)
{
    struct file     *filep;

    /* 1. raw image of the core region: code, rodata, data, bss */
    filep = klib_fopen("./dump.dat", O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if(filep == NULL){
        printk("Error open files.\n");
        return 0;
    }
    klib_fwrite(mod->module_core, mod->core_size, filep);
    klib_fclose(filep);

    /* 2. layout information needed to disassemble dump.dat */
    filep = klib_fopen("./dump.info", O_WRONLY | O_CREAT | O_TRUNC,
                S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
    if(filep == NULL){
        printk("Error open files.\n");
        return 0;
    }
    snprintf(buffer, sizeof(buffer),
            "mod->module_init = 0x%p\n"
            "mod->module_core = 0x%p\n"
            "mod->init_size = %lu\n"
            "mod->core_size = %lu\n"
            "mod->init_text_size = %lu\n"
            "mod->core_text_size = %lu\n",
            mod->module_init, mod->module_core,
            mod->init_size, mod->core_size,
            mod->init_text_size, mod->core_text_size);
    klib_fwrite(buffer, strlen(buffer), filep);
    klib_fclose(filep);

    return 0;    /* report EOF to the reader */
}
static struct file_operations show_mod_fops = {
    .owner = THIS_MODULE,
    .read  = show_mod_read,
};

static int dummy_init(void)
{
    struct proc_dir_entry     *entry;
    struct list_head     *p;
    struct module        *head, *counter;

    mod = NULL;
    if(!mod_name)
        mod = THIS_MODULE;    /* no target given: dump ourselves */
    else{
        head = THIS_MODULE;
        /* init runs right after we are added at the front of the module
         * list, so head->list.prev is the global `modules` head; walking
         * from there visits every loaded module and no bogus entry. */
        list_for_each(p, head->list.prev){
            counter = list_entry(p, struct module, list);
            if(strcmp(counter->name, mod_name) == 0){
                mod = counter;
                break;
            }
        }
    }
    if(!mod){
        printk("Can't find module named %s\n", mod_name);
        return -ENOENT;
    }
    entry = create_proc_entry("show_mod", S_IRUSR, &proc_root);
    if(!entry)
        return -ENOMEM;
    entry->proc_fops = &show_mod_fops;

    return 0;
}

static void dummy_exit(void)
{
    remove_proc_entry("show_mod", &proc_root);
}

/* Open a file from kernel space; NULL on error */
struct file *klib_fopen(const char *filename, int flags, int mode)
{
    struct file *filp = filp_open(filename, flags, mode);
    return (IS_ERR(filp)) ? NULL : filp;
}

void klib_fclose(struct file *filp)
{
    if (filp)
        filp_close(filp, NULL);
}

/* Write a kernel buffer through the file's ->write() method */
int klib_fwrite(char *buf, int len, struct file *filp)
{
    int writelen;
    mm_segment_t oldfs;

    if (filp == NULL)
        return -ENOENT;
    if (filp->f_op == NULL || filp->f_op->write == NULL)
        return -ENOSYS;
    if (((filp->f_flags & O_ACCMODE) & (O_WRONLY | O_RDWR)) == 0)
        return -EACCES;
    /* let ->write() accept a kernel pointer as if it were a user one */
    oldfs = get_fs();
    set_fs(KERNEL_DS);
    writelen = filp->f_op->write(filp, (char __user *)buf, len, &filp->f_pos);
    set_fs(oldfs);

    return writelen;
}

module_init(dummy_init);
module_exit(dummy_exit);

MODULE_LICENSE("GPL");

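- ---[ 2.2 What the program does

This is really a very small program. Reading /proc/show_mod saves some of the
descriptive fields of struct module to ./dump.info and the loaded contents of the
module's core region to ./dump.dat; both paths are relative to the current
directory of the process that performs the read. The target is chosen with the
mod_name module parameter (without it, the dumper dumps itself). Because
dump.info contains the base address, the length of the code and so on, it helps
when disassembling dump.dat. For legal forensics, however, the program is still
not good enough; we discuss this at the end of the article.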

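- ---[ 3 Disassembling the module

Now that we have the module's memory image, it is time to analyze what the module
actually does: is it a normal module or an LKM Trojan?

- ---[ 3.1 Introduction to BFD

BFD (Binary File Descriptor) is a library that supports many architectures and
binary formats. With it you can write tools that handle a variety of file formats
(ELF, COFF, a.out, ...); objdump is one example. When BFD opens a file, it
determines the file type from its magic number, reads the corresponding headers
and builds a number of canonical objects, giving its callers a uniform interface.
BFD is therefore a good starting point for tools that must handle several file
formats. It is not perfect, though: because it follows the specifications
strictly, it copes poorly with malformed files, and in such cases we may need
other tools such as Fenris [3].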

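- ---[ 3.2 Basic format of Intel machine instructions

An Intel x86 machine instruction has the following format [4]:
    [Prefix] Opcode [ModR/M] [SIB] [Displacement] [Immediate]

Only the Opcode is mandatory; everything else is optional. The Prefix part is a
combination of 0 to 4 bytes, drawn from the following values: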

The following are the allowable instruction prefix codes:
    F3H REP prefix (used only with string instructions)
    F3H REPE/REPZ prefix (used only with string instructions)
    F2H REPNE/REPNZ prefix (used only with string instructions)
    F0H LOCK prefix
The following are the segment override prefixes:
    2EH CS segment override prefix
    36H SS segment override prefix
    3EH DS segment override prefix
    26H ES segment override prefix
    64H FS segment override prefix
    65H GS segment override prefix
The following are the size override prefixes:
    66H Operand-size override
    67H Address-size override

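So when disassembling a piece of machine code, we start at the first byte and
skip all the prefixes. What follows is a 1- or 2-byte Opcode (a 2-byte Opcode
starts with the escape byte 0x0F). Depending on the Opcode, it may be followed by
[ModR/M], [SIB], [Displacement] and [Immediate] fields; see the Intel manual for
details.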

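- ---[ 3.3 Writing a simple disassembler with binutils

Writing a disassembler from scratch would take a lot of work, since Intel has a
great many instructions. Instead we can take existing open-source code and modify
it slightly to suit our needs. As the open-source saying goes: never reinvent the
wheel, never solve the same problem twice.

We cannot use the standard BFD library as is, because the dumped image no longer
has its ELF headers. We therefore have to call the low-level disassembler
functions underneath BFD directly and handle symbols ourselves.

To disassemble instructions we can use the print_insn function in
binutils/opcodes/i386-dis.c [5]; for symbol resolution we can use the contents of
the system's /boot/System.map file.

Because print_insn is a low-level function that uses many BFD-related data
structures, we have to build some of them ourselves before calling it. Analyzing
print_insn shows that we only need to set up the following structure: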

struct disassemble_info myinfo;    /* declared in binutils' dis-asm.h */
static void info_init(void)
{
    myinfo.mach = bfd_mach_i386_i386;                /* 32-bit x86 */
    myinfo.disassembler_options = "i386,att,addr32,data32";
    myinfo.fprintf_func = (fprintf_ftype) fprintf;   /* stream is a void * */
    myinfo.stream = stdout;
    myinfo.read_memory_func = my_read_func;
    myinfo.memory_error_func = my_error_func;
    myinfo.print_address_func = my_address_func;
    myinfo.buffer_vma = base_addr;     /* mod->module_core from dump.info */
    myinfo.buffer_length = dis_size;   /* mod->core_text_size */
    myinfo.buffer = malloc(dis_size);  /* filled from dump.dat afterwards */
}
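Here my_read_func is our own function for fetching instruction bytes,
my_error_func handles read errors, and my_address_func prints addresses; in the
latter we can look up the symbol table to find the symbol for an address.
base_addr is mod->module_core, dis_size is mod->core_text_size, and the buffer
must be filled with the first dis_size bytes of dump.dat before disassembling.

A simple my_address_func implementation: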
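void my_address_func(bfd_vma memaddr,
        struct disassemble_info *myinfo)
{
    char     *p;

    /* bfd_vma may be 64 bits wide, so print it through unsigned long */
    myinfo->fprintf_func(myinfo->stream, "0x%lx", (unsigned long) memaddr);
    /* find_symbol()/root: our System.map lookup helper (not shown) */
    p = find_symbol(root, memaddr);
    if(p)
        myinfo->fprintf_func(myinfo->stream, " <%s>", p);
}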

A simple my_read_func implementation:
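int my_read_func(bfd_vma memaddr,
        bfd_byte *myaddr,
        unsigned int length,
        struct disassemble_info *myinfo)
{
    bfd_vma offset;

    /* refuse reads outside the dumped buffer; any nonzero return value
     * makes the disassembler call memory_error_func */
    if (memaddr < myinfo->buffer_vma)
        return -1;
    offset = memaddr - myinfo->buffer_vma;
    if (offset > myinfo->buffer_length
        || length > myinfo->buffer_length - offset)
        return -1;

    memcpy(myaddr, myinfo->buffer + offset, length);
    return 0;
}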

Disassembling the dumped image of test.ko gives the following output (excerpt):

...
<8881000+59>    mov    %edx,(%eax)
<8881000+61>    movl   $0x200200,0x4(%ecx)
<8881000+68>    movl   $0x100100,(%ecx)
<8881000+74>    pushl  0x8881488
<8881000+80>    push   $0x888108d
<8881000+85>    jmp    0x8881072
<8881000+87>    mov    %edx,%ecx
<8881000+89>    mov    (%edx),%eax
<8881000+91>    prefetchnta (%eax)
<8881000+94>    nop
<8881000+95>    cmp    $0x8881504,%edx
<8881000+101>    jne    0x8881016
<8881000+103>    pushl  0x8881488
<8881000+109>    push   $0x88810ad
<8881000+114>    call   0x21188c7 <printk>
<8881000+119>    pop    %eax
<8881000+120>    xor    %eax,%eax
<8881000+122>    pop    %edx
<8881000+123>    pop    %esi
<8881000+124>    pop    %edi
<8881000+125>    ret
...

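- ---[ 3.4 Two things to note

1. The bfd.h header that i386-dis.c needs does not exist in the source package;
it is generated only by running ./configure && make. Running ./configure alone is
not enough.
2. The memory holding functions marked __init (mentioned above) and the sections
whose names start with .init is freed once the module has been inserted and its
init function has run, so this information cannot be recovered by the dump.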

-----------------------------------------------------------------------------

- ---[ 4 Conclusion

We have now implemented basic module extraction and disassembly, but there are
still many shortcomings and things worth improving:

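o  By default the dump is written to disk, which should be avoided in legal
   forensics (disk access should be kept to a minimum). One solution is to have
   the kernel send the dumped contents over the network to another host.

o  The disassembly still relies on the low-level binutils code and has no
   immunity to malformed instructions; simple junk-byte (anti-disassembly)
   tricks will make the output wrong.

o  If symtab and strtab are also kept in the module image (see 1.5), that
   information would help the disassembly a great deal; we have not used it.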

Readers can extend the disassembler to suit their own needs.

-----------------------------------------------------------------------------

- ---[ 5 References

[1] madsys http://www.phrack.org/show.php?p=61&a=3
     http://www.linuxforum.net/forum/gshowflat.php?Cat=&Board=security&Number=512152&page=0&view=collapsed&sb=5&o=all&fpart=
     http://www.linuxforum.net/forum/showflat.php?Cat=&Board=security&Number=437327&page=2&view=collapsed&sb=5&o=all&vc=1
[2] Linux kernel source code, www.kernel.org
[3] Fenris, http://lcamtuf.coredump.cx/fenris/devel.shtml
[4] Intel instruction set reference manual
[5] binutils source code, http://www.gnu.org/software/binutils/
