Inspiration from the QWB’s qemu

While analyzing the QWB qemu program, I found its checking mechanisms to be well designed, which is why I wrote this article.

There are two main checking mechanisms: stack checking and heap checking. They are not as thorough as AddressSanitizer, which inserts its checks at compile time from the source code rather than into the binary, and that is far superior to the black-box testing done by this qemu. However, qemu is better suited to analyzing a binary without source code.

QEMU emulator

QEMU is a generic and open source machine emulator and virtualizer. Its emulation is built on TCG: it fetches guest code, translates it into TCG intermediate code, and finally translates the intermediate code into code for the host architecture.

TCG

Take the translation of the x86 ret instruction as an example.

    case 0xc3: /* ret */
        ot = gen_pop_T0(s);
        gen_pop_update(s, ot);
        /* Note that gen_pop_T0 uses a zero-extending load.  */
        gen_op_jmp_v(s->T0);
        gen_bnd_jmp(s);
        gen_jr(s, s->T0);
        break;

Emulating one guest instruction takes many host instructions, which is inefficient. In return, specific instructions can be debugged by inserting instrumentation into their translation.
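The expansion of one guest instruction into several smaller operations can be sketched as follows. This is only a toy model: the `MicroOp` type, the three operation kinds, and `translate_guest_op` are hypothetical names invented for illustration, and real TCG ops are different and far more numerous.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical micro-ops for illustration; these are not real TCG ops. */
typedef enum { OP_LOAD_FROM_SP, OP_ADD_SP, OP_JUMP_T0 } MicroOpKind;

typedef struct {
    MicroOpKind kind;
    int64_t imm;   /* immediate operand, unused by some ops */
} MicroOp;

/*
 * Expand one guest opcode into micro-ops.
 * Returns the number of micro-ops written to out, or 0 if the opcode is
 * not handled by this toy translator or the buffer is too small.
 */
size_t translate_guest_op(uint8_t opcode, MicroOp *out, size_t cap)
{
    if (opcode == 0xc3 && cap >= 3) {               /* ret */
        out[0] = (MicroOp){ OP_LOAD_FROM_SP, 0 };   /* T0 = *(sp)  */
        out[1] = (MicroOp){ OP_ADD_SP, 8 };         /* sp += 8     */
        out[2] = (MicroOp){ OP_JUMP_T0, 0 };        /* pc = T0     */
        return 3;
    }
    return 0;
}
```

An instrumentation hook would be one more entry in this list, emitted before the jump, which is the property the stack check relies on.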

A few files deserve attention. A special-purpose debugging tool can even be built by modifying the qemu source.

  • /target-arch/translate.c: Translates guest code of a given architecture into TCG code.
  • /tcg/tcg.c: The core TCG code.
  • /tcg/arch/tcg-target.c: Translates the TCG code into host-specific architecture code.

Stack checking

CALL

Each call is checked, and its return address is logged on a call stack for later verification.

RET

On ret, the return address is verified against the one saved at the matching call.

void __cdecl helper_ret_log(CPUX86State_0 *env, target_ulong ret_ip)
{
  stack_frame_0 *v2; // [rsp+18h] [rbp-8h]

  if ( !check_frame_ok() )
  {
    puts("----------------");
    puts("----stack error----");
    puts("stack overflow detected!");
    puts("----------------");
    exit(0);
  }
  v2 = call_stack_pop();
  if ( v2 )
  {
    if ( ret_ip != v2->ret_addr )
    {
      puts("----------------");
      puts("----stack error----");
      puts("CF broken! stack overflow detected!");
      puts("----------------");
      exit(0);
    }
  }
}
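The push on call and compare on ret amount to a shadow stack. A minimal sketch of the idea, assuming a fixed-depth array; `ShadowFrame`, `shadow_push` and `shadow_check_ret` are hypothetical names and not the functions of the analyzed binary:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 1024   /* assumed limit for this sketch */

typedef struct {
    uint64_t ret_addr;      /* address the call should return to */
    uint64_t sp;            /* guest stack pointer at call time */
} ShadowFrame;

static ShadowFrame shadow[SHADOW_DEPTH];
static size_t shadow_top;

/* On call: remember where the callee must return. False if the stack is full. */
bool shadow_push(uint64_t ret_addr, uint64_t sp)
{
    if (shadow_top == SHADOW_DEPTH)
        return false;
    shadow[shadow_top++] = (ShadowFrame){ ret_addr, sp };
    return true;
}

/*
 * On ret: pop the saved frame and compare it with the address actually
 * used. An empty shadow stack is accepted, mirroring the helper above,
 * which skips the comparison when no frame was popped.
 */
bool shadow_check_ret(uint64_t ret_ip)
{
    if (shadow_top == 0)
        return true;
    return shadow[--shadow_top].ret_addr == ret_ip;
}
```

A false result from `shadow_check_ret` corresponds to the "CF broken" branch: the return address on the guest stack was changed between call and ret.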

Heap checking

Some information is recorded in do_syscall.

syscall

When sys_mmap is called, qemu judges by libc's mapping rules whether the mapped address is the libc base address. If it is, the addresses of malloc, calloc, free and the other allocator functions can be derived from it.

When sys_brk is called, the heap address is recorded according to the way libc requests heap memory.
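Deriving the allocator addresses from the libc base and then recognizing them as call targets can be sketched like this. The offsets and the `HOOK_*` identifiers are placeholders chosen for illustration; real offsets depend on the exact libc build, and the numbering used by the analyzed binary is only partly visible in the listing.

```c
#include <stdint.h>

/* Placeholder offsets, NOT taken from any real libc. */
enum { OFF_MALLOC = 0x1000, OFF_CALLOC = 0x2000, OFF_FREE = 0x3000 };

/* Hypothetical identifiers for the hooked functions. */
enum { HOOK_NONE = 0, HOOK_MALLOC, HOOK_CALLOC, HOOK_FREE };

typedef struct {
    uint64_t malloc_addr;
    uint64_t calloc_addr;
    uint64_t free_addr;
} HookAddrs;

/* Once the libc base is known, each function is base + fixed offset. */
HookAddrs resolve_hooks(uint64_t libc_base)
{
    HookAddrs h = {
        .malloc_addr = libc_base + OFF_MALLOC,
        .calloc_addr = libc_base + OFF_CALLOC,
        .free_addr   = libc_base + OFF_FREE,
    };
    return h;
}

/* Map a call target to a hook id, or HOOK_NONE for an ordinary call. */
int classify_call(const HookAddrs *h, uint64_t call_aim)
{
    if (call_aim == h->malloc_addr) return HOOK_MALLOC;
    if (call_aim == h->calloc_addr) return HOOK_CALLOC;
    if (call_aim == h->free_addr)   return HOOK_FREE;
    return HOOK_NONE;
}
```

A non-zero result means the call is served by the emulator's own allocator instead of jumping into libc, while `HOOK_NONE` falls through to the normal call path.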

  else if ( call_aim == BA_valloc_addr )
  {
    func_num = 5LL;
  }
  else if ( call_aim == BA_memalign_addr )
  {
    func_num = 6LL;
  }
  if ( func_num )
  {
    env->regs[0] = emulate_alloc(func_num, env->regs[7], env->regs[6], env->regs[2]);
    env->eip = next_eip;
  }
  else
  {
LABEL_18:
    call_stack_push(next_eip, env->regs[4]);
    addr = env->regs[4] - 8;
    v3 = cpu_mmu_index_kernel(env);
    cpu_stq_mmuidx_ra(env, addr, next_eip, v3, 0LL);
    env->regs[4] = addr;
    env->eip = call_aim;
  }
}

target_ulong __cdecl BA_malloc(target_ulong size)
{
  if ( !__readfsbyte(0xFFFEFAB8) )
    BA_init_pool();
  if ( size <= 0x7F )
    return BA_block_get(size, 0LL);
  if ( size <= 0xFF )
    return BA_block_get(size, 1uLL);
  if ( size <= 0x1FF )
    return BA_block_get(size, 2uLL);
  if ( size <= 0x3FF )
    return BA_block_get(size, 3uLL);
  if ( size <= 0x7FF )
    return BA_block_get(size, 4uLL);
  if ( size > 0xFFF )
    return BA_block_get(size, 6uLL);
  return BA_block_get(size, 5uLL);
}
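The chain of comparisons in BA_malloc selects one of seven size classes. An equivalent table-driven form, written only to make the class boundaries easier to read (`size_class` is a hypothetical name, and the limits are copied from the listing above):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the size class index (0..6) for a request of the given size.
 * Classes 0..5 cover sizes up to the matching limit; anything larger
 * than 0xFFF falls into class 6.
 */
unsigned size_class(uint64_t size)
{
    static const uint64_t limits[] = { 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF };
    const size_t n = sizeof limits / sizeof limits[0];

    for (size_t i = 0; i < n; i++) {
        if (size <= limits[i])
            return (unsigned)i;
    }
    return (unsigned)n;   /* class 6: larger than 0xFFF */
}
```

For instance, a request of 0x800 bytes lands in class 5, the same as the fall-through case at the end of BA_malloc.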


In QEMU's TCG, most read and write instructions are implemented with load and store operations, so this is where the accessed memory is checked. Combined with the custom heap allocator shown above, most heap bugs can be detected.

      if ( *(_BYTE *)j == 2 )
      {
        puts("----heap error----");
        printf("use after free found at 0x%016lx idx 0x%016lx type %lx\n", address, idx, type);
        puts("----------------");
        exit(0);
      }
      if ( *(_BYTE *)j == 1 )
      {
        puts("----heap error----");
        printf("access unallocated chunk found at 0x%016lx idx 0x%016lx type %lx\n", address, idx, type);
        puts("----------------");
        exit(0);
      }
      if ( idx + address >= j[1] && idx + address < j[1] + j[2] )
      {
        puts("----heap error----");
        printf("out of bound access chunk found at 0x%016lx idx 0x%016lx type %lx\n", address, idx, type);
        puts("----------------");
        exit(0);
      }
      if ( idx + address >= j[3] && idx + address < j[3] + j[4] )
      {
        puts("----heap error----");
        printf("out of bound access chunk found at 0x%016lx idx 0x%016lx type %lx\n", address, idx, type);
        puts("----------------");
        exit(0);
      }
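The four checks can be read as one classification of a memory access against per-chunk metadata. The sketch below is an interpretation of the decompiled fragment, not a recovered definition: the field layout of `ChunkInfo`, the reading of the two ranges as guard zones before and after the chunk, and the value 0 for an in-use chunk are all assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { ACCESS_OK, ACCESS_UAF, ACCESS_UNALLOCATED, ACCESS_OOB } AccessResult;

/* 1 and 2 match the values tested in the listing; 0 is an assumption. */
typedef enum { CHUNK_IN_USE = 0, CHUNK_UNALLOCATED = 1, CHUNK_FREED = 2 } ChunkState;

typedef struct {
    ChunkState state;
    uint64_t front_guard, front_len;   /* assumed guard zone before the data */
    uint64_t back_guard, back_len;     /* assumed guard zone after the data  */
} ChunkInfo;

/* True if addr lies in [start, start + len). */
static bool in_range(uint64_t addr, uint64_t start, uint64_t len)
{
    return addr >= start && addr - start < len;
}

/* Classify an access at address + idx, in the same order as the listing. */
AccessResult check_access(const ChunkInfo *c, uint64_t address, uint64_t idx)
{
    uint64_t target = address + idx;

    if (c->state == CHUNK_FREED)
        return ACCESS_UAF;
    if (c->state == CHUNK_UNALLOCATED)
        return ACCESS_UNALLOCATED;
    if (in_range(target, c->front_guard, c->front_len) ||
        in_range(target, c->back_guard, c->back_len))
        return ACCESS_OOB;
    return ACCESS_OK;
}
```

Checking the state before the guard zones means a freed chunk is always reported as a use after free, even when the access also touches a guard zone.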

Application

Based on the principles above, we can develop our own dynamic debugging tool, one that is not limited to stack overflows and heap vulnerabilities, or customize a dedicated debugger.

Since most memory accesses go through the TCG load and store instructions, memory is easy to check, and sensitive memory can be monitored by modifying QEMU. In reverse engineering, when we need to focus on certain instructions, such as observing program behavior through the call instruction, it is convenient to do this directly by modifying QEMU.
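As one possible shape for such monitoring, a store hook could consult a small table of watched ranges. This is a standalone sketch under assumptions: `WatchRange`, `watch_add` and `watch_hit` are hypothetical names, none of this is QEMU API, and it assumes the ranges do not wrap around the end of the address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WATCHES 16   /* assumed limit for this sketch */

typedef struct {
    uint64_t start;
    uint64_t len;
    const char *name;    /* label reported when the range is touched */
} WatchRange;

static WatchRange watches[MAX_WATCHES];
static size_t watch_count;

/* Register a sensitive range. Returns false if the table is full or len is 0. */
bool watch_add(uint64_t start, uint64_t len, const char *name)
{
    if (watch_count == MAX_WATCHES || len == 0)
        return false;
    watches[watch_count++] = (WatchRange){ start, len, name };
    return true;
}

/*
 * Called from a (hypothetical) store hook with the accessed address and
 * width. Returns the name of the first watched range the access overlaps,
 * or NULL if it touches none.
 */
const char *watch_hit(uint64_t addr, uint64_t size)
{
    for (size_t i = 0; i < watch_count; i++) {
        const WatchRange *w = &watches[i];
        if (addr < w->start + w->len && w->start < addr + size)
            return w->name;
    }
    return NULL;
}
```

Watching a GOT entry or a function pointer this way would report the write that corrupts it, rather than the later crash.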