First, download the replay at http://www.rrshare.org/content/rrlogs/starcraft4.rr, and clone a copy of the PANDA repository:
git clone https://github.com/moyix/panda.git
This tutorial is based on revision 186fcec, so check out that revision inside the repository:
(cd panda && git checkout 186fcec)
To build PANDA, cd into panda/qemu. Edit build.sh to remove the lines about LLVM; we won't need them for this tutorial. Then run build.sh.
The first thing we need to do is find the code that's checking the
CD-Key. While recording the replay, we entered the CD-key
"N68KTD-HEKM-HEV89N-74GKE-DNYKC" into the installer and hit OK. This
string gives us the opportunity to use the stringsearch plugin to zero
in on the verification code. We'll search for the first two blocks of
the CD-Key without dashes, as we make the reasonable assumption that the
verification code ignores the dashes.
Write the following to search_strings.txt, including quotes:
"N68KTDHEKM"
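The search string is just the first two dash-separated blocks of the key, joined together. A minimal Python sketch that derives it and renders the quoted line shown above; the helper names are our own and not part of PANDA:

```python
def cdkey_search_string(cdkey: str, blocks: int = 2) -> str:
    """Join the first `blocks` dash-separated groups of a CD-Key, dropping the dashes."""
    return "".join(cdkey.split("-")[:blocks])


def search_strings_file(strings: list[str]) -> str:
    """Render ASCII search strings one per line, each wrapped in double quotes."""
    return "".join(f'"{s}"\n' for s in strings)


# Usage: write the file used in this tutorial.
# with open("search_strings.txt", "w") as f:
#     f.write(search_strings_file([cdkey_search_string("N68KTD-HEKM-HEV89N-74GKE-DNYKC")]))
```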
During replay, stringsearch will watch every memory read and write for this string.
Let's run PANDA with stringsearch turned on. It depends on the
callstack_instr plugin, so we load that as well:
panda/qemu/i386-softmmu/qemu-system-i386 -replay starcraft4 -display none -panda 'callstack_instr;stringsearch'
stringsearch reports a number of matches; the first few
are:
WRITE Match of str 0 at: instr_count=10649029 : 0045b42d 00437e77 06cba000
WRITE Match of str 0 at: instr_count=10651144 : 0045b42d 00437eee 06cba000
WRITE Match of str 0 at: instr_count=10848983 : 0045b42d 00437e77 06cba000
WRITE Match of str 0 at: instr_count=10851098 : 0045b42d 00437eee 06cba000
READ Match of str 0 at: instr_count=10860768 : 00437a2b 0049aad4 06cba000
READ Match of str 0 at: instr_count=10861024 : 00437a2b 0049aad4 06cba000
READ Match of str 0 at: instr_count=10861317 : 00437a2b 0049aad4 06cba000
READ Match of str 0 at: instr_count=10861638 : 00437a2b 0049aad4 06cba000
READ Match of str 0 at: instr_count=10862350 : 00437a68 00411362 06cba000
WRITE Match of str 0 at: instr_count=10862350 : 00437a68 00411362 06cba000
READ Match of str 0 at: instr_count=10862853 : 0045b73a 00411362 06cba000
WRITE Match of str 0 at: instr_count=10862853 : 0045b73a 00411362 06cba000
The first number is the caller (the most recent return address on the call stack, as tracked by callstack_instr), the second is the EIP at which the string was seen, and the third is the CR3, which identifies the address space.
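When the list of matches is long, it helps to parse these lines before grouping or filtering them. A minimal Python sketch, assuming exactly the line format printed above (the names Match and parse_match are our own, not PANDA's):

```python
import re
from typing import NamedTuple, Optional

# Pattern for the match lines printed by stringsearch, as in the excerpt above.
MATCH_RE = re.compile(
    r"(READ|WRITE) Match of str (\d+) at: instr_count=(\d+) : "
    r"([0-9a-fA-F]+) ([0-9a-fA-F]+) ([0-9a-fA-F]+)"
)


class Match(NamedTuple):
    kind: str         # "READ" or "WRITE"
    str_index: int    # which line of search_strings.txt matched
    instr_count: int  # position in the replay
    caller: int       # return address (first hex field)
    pc: int           # where the string was seen (second hex field)
    asid: int         # CR3 on x86 (third hex field)


def parse_match(line: str) -> Optional[Match]:
    """Parse one match line; return None for progress messages and other output."""
    m = MATCH_RE.search(line)
    if m is None:
        return None
    kind, idx, count, caller, pc, asid = m.groups()
    return Match(kind, int(idx), int(count), int(caller, 16), int(pc, 16), int(asid, 16))
```

Grouping the parsed matches by pc is one way to see the clusters discussed below (for example, the four reads at PC 49aad4).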
The first four matches appear to be doing basic validation of the
CD-Key: they check that all characters are alphanumeric and convert
them to uppercase. The next four (PC 49aad4) might be looking for
dashes, as that PC is inside a strnchr function. The last four copy
the CD-Key twice but don't appear to do any computation
nearby.
So let's move on to the next group:
READ Match of str 0 at: instr_count=30991180 : 0040331b 00411362 06cba000
WRITE Match of str 0 at: instr_count=30991180 : 0040331b 00411362 06cba000
READ Match of str 0 at: instr_count=31015372 : 0047d949 0047d4cb 06cba000
READ Match of str 0 at: instr_count=31045765 : 004286ff 0044c951 06cba000
READ Match of str 0 at: instr_count=31046388 : 0044c964 00411698 06cba000
WRITE Match of str 0 at: instr_count=31046388 : 0044c964 00411698 06cba000
The first two are copies again. But the third looks a little more interesting:
47d4cb: 0f b6 04 17 movzbl (%edi,%edx,1),%eax
47d4cf: 0f b6 80 70 ea 51 00 movzbl 0x51ea70(%eax),%eax
It's using the character from the CD-Key, read at 47d4cb, to do a table lookup into a table at 0x51ea70. This seems like the beginning of a decryption algorithm. Starting at 47d949, the caller appears to initialize 16 bytes on the stack to 0. It then passes the address of this 16-byte region to three different functions, moves different portions of that region into some locations in memory, and returns.
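In higher-level terms, the two instructions compute eax = table[key[edx]]: the first movzbl zero-extends one key byte (at edi+edx), and the second uses that byte as an index into the byte table at 0x51ea70. The Python sketch below emulates just that pattern. The table is hypothetical (a bitwise complement), since the real contents live in the installer's memory and are not shown in this tutorial.

```python
def table_lookup(key: bytes, table: bytes) -> list[int]:
    """Emulate `movzbl (%edi,%edx,1),%eax; movzbl TABLE(%eax),%eax` for each edx.

    The index is a single zero-extended byte, so it is at most 255 and the
    table must have at least 256 entries.
    """
    out = []
    for edx in range(len(key)):
        eax = key[edx]    # movzbl (%edi,%edx,1),%eax: zero-extended key byte
        eax = table[eax]  # movzbl 0x51ea70(%eax),%eax: zero-extended table byte
        out.append(eax)
    return out


# Hypothetical 256-entry table (NOT the real one at 0x51ea70): byte -> 255 - byte.
EXAMPLE_TABLE = bytes(255 - i for i in range(256))
```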
stringsearch uses the callstack_instr plugin to record the full callstack
for each match, so we can look up the caller for 47d949. The callstacks
are stored in string_matches.txt. The line we're looking for ends
with the caller/PC/CR3 triple shown above, so we can locate it:
0045c252 00428867 004286ff 0044c83b 0047d949 0047d4cb 06cba000
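Searching string_matches.txt by hand works, but the lookup is easy to script. This sketch assumes only that the caller, PC, and CR3 appear as three consecutive fields, which holds for the line above; the helper name is hypothetical. In each result, the field just before the caller is the next return address up the stack.

```python
def find_callstacks(lines, caller, pc, asid):
    """Return the fields of every line containing the consecutive triple
    (caller, pc, asid); comparison is case-insensitive."""
    want = [caller.lower(), pc.lower(), asid.lower()]
    hits = []
    for line in lines:
        fields = [f.lower() for f in line.split()]
        if any(fields[i:i + 3] == want for i in range(len(fields) - 2)):
            hits.append(fields)
    return hits


# Usage:
# with open("string_matches.txt") as f:
#     for fields in find_callstacks(f, "0047d949", "0047d4cb", "06cba000"):
#         print(" ".join(fields))
```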
The return address is 44c83b, where the program grabs part of the stored 16-byte region from before, calls a function (44c120, which appears to be some form of strnchr), and then branches away if the result is true, that is, if strnchr finds the value it is looking for. From here, we can write a custom plugin to confirm our intuition, or reverse engineer the code by hand far enough to figure out that we're correct.
The one remaining issue is to find the value that strnchr is searching for. The best way to do this is to write a custom plugin that prints out the value at runtime. We'll leave that as an exercise for the reader.
Start by downloading and unpacking the LINE recording from rrshare.org:
$ wget http://www.rrshare.org/content/rrlogs/line2.rr
$ scripts/rrunpack.py line2.rr
Verifying checksum... Success.
line2-rr-snp
line2-rr-nondet.log
Unacking RR log line2.rr with 10367712943 instructions... Done.
Get the version of PANDA used in the paper:
$ git clone https://github.com/moyix/panda.git
$ cd panda
$ git checkout b8d3dbb
Modify the build.sh script to build with Android support. It will end
up looking like:
#!/bin/sh
python ../scripts/apigen.py
./configure --target-list=arm-softmmu \
--cc=gcc-4.7 \
--cxx=g++-4.7 \
--prefix=`pwd`/install \
--enable-android \
--disable-pie \
--enable-llvm \
--with-llvm=../llvm-3.3/Release \
--extra-cflags="-O2" \
--extra-cxxflags="-O2" \
&& make -j $(nproc)
Build PANDA (see the documentation for details on dependencies).
Due to a bug in the Android support, the replay requires that a QCOW2 file exists for the system and user data devices of the Android emulator. It doesn't have to have any real data though, so we can just create a small dummy QCOW2:
$ ./qemu-img create -f qcow2 dummy.qcow2 1M
Now we can move on to the actual analysis.
Now, we suspect that the censorship list will include Tiananmen (天安门)
and Falun (法轮). So we will use TZB (Tappan Zee Bridge), via its
stringsearch plugin, to search all memory reads and writes for the
UTF-8 encoded versions of these strings. Create a file
search_strings.txt with one string per line, each byte written as two
hex digits and separated by colons:
e5:a4:a9:e5:ae:89:e9:97:a8
e6:b3:95:e8:bd:ae
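The hex strings above are just the UTF-8 encodings of the two search terms, two hex digits per byte, separated by colons. A minimal Python sketch for producing such lines (the function name is our own):

```python
def to_hex_search_string(s: str) -> str:
    """Encode s as UTF-8 and join the bytes as colon-separated lowercase hex."""
    return ":".join(f"{b:02x}" for b in s.encode("utf-8"))


# Usage: build search_strings.txt for the two terms.
# with open("search_strings.txt", "w") as f:
#     for term in ("天安门", "法轮"):
#         f.write(to_hex_search_string(term) + "\n")
```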
Now, run the replay. Note that we have to pass in the dummy QCOW2 we created:
../arm-softmmu/qemu-system-arm -m 2048 -replay line2 -M android_arm -cpu cortex-a9 -kernel /dev/null -vnc :0 \
-global goldfish_mmc.sd_path=/dev/null -global goldfish_nand.system_path=dummy.qcow2 \
-global goldfish_nand.user_data_path=dummy.qcow2 \
-panda 'callstack_instr;stringsearch'
About 19% of the way through the replay, we begin seeing matches:
./line2-rr-nondet.log: 10362284 of 66356457 (15.62%) bytes, 1866218128 of 10367712943 (18.00%) instructions processed.
./line2-rr-nondet.log: 11056439 of 66356457 (16.66%) bytes, 1969919380 of 10367712943 (19.00%) instructions processed.
WRITE Match of str 0 at: instr_count=1997783382 : 40796784 4074f7f8 28210000
WRITE Match of str 0 at: instr_count=1999952799 : 40796784 4074f7f8 28210000
WRITE Match of str 0 at: instr_count=2008500449 : 407a7ada 4b50817c 28210000
WRITE Match of str 0 at: instr_count=2011053513 : 407a7ada 4b50817c 28210000
WRITE Match of str 0 at: instr_count=2011442343 : 407a7ada 4b50817c 28210000
WRITE Match of str 1 at: instr_count=2011820449 : 407a7ada 4b50817c 28210000
WRITE Match of str 0 at: instr_count=2012833451 : 407a7ada 4b50817c 28210000
WRITE Match of str 0 at: instr_count=2012867057 : 407a7ada 4b50817c 28210000
WRITE Match of str 1 at: instr_count=2013214014 : 407a7ada 4b50817c 28210000
WRITE Match of str 1 at: instr_count=2013231331 : 407a7ada 4b50817c 28210000
Once the replay is finished, we will have a file string_matches.txt
that summarizes these matches:
4b536c00 4b536c00 4b536c00 4b536c00 4b536c00 407a7ada 40796784 4075222c 40796784 400417a8 40041e6a 407a7ada 40796784 4075222c 40796784 40796784 40038678 28210000 16 12
4b536c00 4b536c00 4b536c00 4b536c00 4b536c00 407a7ada 40796784 4075222c 40796784 400417a8 40041e6a 407a7ada 40796784 4075222c 40796784 40796784 40038684 28210000 0 1
4b536c00 4b536c00 4b536c00 4b536c00 4b536c00 407a7ada 40796784 4075222c 40796784 400417a8 40041e6a 407a7ada 40796784 4075222c 40796784 40796784 40038688 28210000 16 12
407a7ada 400417a8 40041e6a 407a7ada 400417a8 40041e6a 407a7ada 400417a8 40041e6a 407a7ada 407a7ada 400417a8 40041e6a 407a7ada 407a7ada 407a7ada 4074f630 28210000 8 5
4b53490a 407a7ada 407a7ada 407a7ada 40796784 407a7ada 40796784 40796784 4b519eea 4b51c19a 4b53490a 407a7ada 407a7ada 407a7ada 407a7ada 40796784 4074f7f8 28210000 2 0
4b536c00 4b536c00 4b536c00 4b536c00 407a7ada 4b50902c 40796784 4b519eea 4b51c19a 4b53490a 4b536c00 4b536c00 4b536c00 4b536c00 4b536c00 407a7ada 4b50817c 28210000 13 11
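The last two columns give the number of times each string (str 0 and str 1) matched at that point. The three columns before them are the caller (the last entry of the callstack), the PC, and the address space.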
We don't know a priori which of these will contain the full list, but
we can simply dump out all data passing through them using textprinter.
Create a file tap_points.txt with the caller, pc, and address space
information where both strings matched:
40796784 40038678 28210000
40796784 40038688 28210000
407a7ada 4074f630 28210000
407a7ada 4b50817c 28210000
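Picking those tap points can be automated. The sketch below assumes the string_matches.txt layout shown above for this revision: the callstack ends with the caller, followed by the PC, the address space, and one match count per search string. It keeps the caller/PC/address-space triple wherever every count is nonzero; the helper name is hypothetical.

```python
def taps_where_all_matched(lines, num_strings=2):
    """Return 'caller pc asid' for each line whose trailing per-string
    match counts are all nonzero."""
    taps = []
    for line in lines:
        fields = line.split()
        if len(fields) < num_strings + 3:
            continue  # too short to hold caller, pc, asid, and the counts
        counts = [int(c) for c in fields[-num_strings:]]
        if all(c > 0 for c in counts):
            caller, pc, asid = fields[-num_strings - 3:-num_strings]
            taps.append(f"{caller} {pc} {asid}")
    return taps


# Usage:
# with open("string_matches.txt") as f, open("tap_points.txt", "w") as out:
#     out.writelines(t + "\n" for t in taps_where_all_matched(f))
```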
And run another replay with textprinter turned on:
../arm-softmmu/qemu-system-arm -m 2048 -replay line2 -M android_arm -cpu cortex-a9 -kernel /dev/null -vnc :0 \
-global goldfish_mmc.sd_path=/dev/null -global goldfish_nand.system_path=dummy.qcow2 \
-global goldfish_nand.user_data_path=dummy.qcow2 \
-panda 'callstack_instr;textprinter'
This will create two gzipped files, read_tap_buffers.txt.gz and
write_tap_buffers.txt.gz. Their contents are rather verbose and show
the callstack, memory address, and value of each byte passing through
the specified tap points. For example:
40782c3a 40782c3a 40782c3a 40782c3a 407698fe 40782c3a 40782c3a 407698fe 40782c3a 40782c3a 40769d40 40782c3a 407698fe 400417a8 40041e6a 407a7ada 4074f630 28210000 415bc3a8 156197121 3c
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f0 156603914 61
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f1 156603914 6d
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f2 156603914 65
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f3 156603914 3d
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f4 156603915 22
407698fe 40782c3a 40782c3a 40782c3a 40782c3a 40782c3a 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4075222c 4078213e 40796784 40038678 28210000 415bc3f5 156603915 44
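Splitting these dumps by tap point is roughly what the split_taps.py script described next does. We have not reproduced that script; the sketch below only illustrates the idea, assuming the layout shown above: the byte value is the last field, the address is third from last, and the tap point (caller, PC, address space) is the three fields before the address. The function name is hypothetical.

```python
from collections import defaultdict


def split_tap_lines(lines):
    """Group textprinter lines by tap point and reassemble the bytes in order.

    Returns {"caller.pc.asid": bytes}.
    """
    data = defaultdict(bytearray)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # need at least caller, pc, asid, address, one more field, byte
        tap = ".".join(fields[-6:-3])
        data[tap].append(int(fields[-1], 16))
    return {tap: bytes(buf) for tap, buf in data.items()}


# Usage (gzip is in the standard library):
# import gzip
# with gzip.open("write_tap_buffers.txt.gz", "rt") as f:
#     taps = split_tap_lines(f)
```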
We can make this more readable by splitting these files out into their
constituent tap points and converting the hex data to binary. The
script that does this is called split_taps.py:
$ mkdir -p taps/reads taps/writes
$ ../scripts/split_taps.py read_tap_buffers.txt.gz taps/reads/line
$ ../scripts/split_taps.py write_tap_buffers.txt.gz taps/writes/line
Then examine the files in taps/reads and taps/writes. Looking
in particular at taps/writes/line.40796784.40038688.28210000.dat, we
see lines partway through the file that look promising:
GCD
GFW
18大
38军
八九
半羽
鲍彤
暴政
柴玲
赤匪
共党
共匪
These include words like "tyranny" (暴政), "Communist Party" (共党), and "communist bandits" (共匪).
We now wish to ensure we get the full list. If part of the list is processed at a different program counter, that part will not show up in our dump. However, we can reasonably surmise that the entire file is being read into a contiguous buffer in memory. If we go back to the point in the tap buffer file where the characters we're interested in appear, we can look at the addresses and then monitor all writes to that contiguous buffer (plus some extra at the end to make sure we see everything).
We start by getting the byte offset of one of the strings we saw:
$ grep -a -b -o GCD taps/writes/line.40796784.40038688.28210000.dat
64587:GCD
80715:GCD
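grep -a -b -o prints the 0-based byte offset of each match. The same search in Python, in case you want to script the following steps; it assumes a needle without newlines, since grep matches within lines, and the function name is our own:

```python
def find_offsets(data: bytes, needle: bytes) -> list[int]:
    """Return the 0-based offset of every non-overlapping occurrence of needle."""
    if not needle:
        raise ValueError("needle must be non-empty")
    offsets = []
    start = 0
    while True:
        i = data.find(needle, start)
        if i == -1:
            return offsets
        offsets.append(i)
        start = i + len(needle)  # continue after this match, like grep -o


# Usage:
# with open("taps/writes/line.40796784.40038688.28210000.dat", "rb") as f:
#     print(find_offsets(f.read(), b"GCD"))
```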
Because the .dat file holds one byte per line of the tap dump for this tap point, the offset also tells us which line of the dump to look at:
$ zgrep '40796784 40038688 28210000' write_tap_buffers.txt.gz | less
And then type 64587G into less to jump to line 64587. Here we see
(full callstack abbreviated and annotated for the sake of readability):
40796784 40038688 28210000 41544f7e 714950631 75 ; u
40796784 40038688 28210000 41544f7f 714950631 a7 ; \xa7
40796784 40038688 28210000 41792080 726257350 31 ; 1
40796784 40038688 28210000 41792081 726257350 39 ; 9
40796784 40038688 28210000 41792082 726257350 38 ; 8
40796784 40038688 28210000 41792083 726257350 39 ; 9
40796784 40038688 28210000 41792084 726257351 36 ; 6
40796784 40038688 28210000 41792085 726257351 34 ; 4
40796784 40038688 28210000 41792086 726257351 0a ; \n
40796784 40038688 28210000 41792087 726257351 46 ; F
40796784 40038688 28210000 41792088 726257352 4c ; L
40796784 40038688 28210000 41792089 726257352 47 ; G
40796784 40038688 28210000 4179208a 726257352 0a ; \n
40796784 40038688 28210000 4179208b 726257352 47 ; G
40796784 40038688 28210000 4179208c 726257353 43 ; C
40796784 40038688 28210000 4179208d 726257353 44 ; D
40796784 40038688 28210000 4179208e 726257353 0a ; \n
The third-to-last column is the address being written to. We see that
a contiguous buffer containing our candidate censorship list starts at
41792080 with the string "198964". This refers to June 4, 1989, the
date of the Tiananmen Square massacre.
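The annotated bytes above can be decoded mechanically. A sketch assuming the dump layout described earlier (address third from last, byte value last), with anything after a ";" treated as an annotation as in the excerpt; the function name is hypothetical. Passing start=0x41792080 skips the unrelated bytes at 41544f7e.

```python
def decode_dump_bytes(lines, start=None, end=None):
    """Collect the byte values of dump lines whose address is in [start, end)
    and decode them as UTF-8 (invalid sequences become U+FFFD)."""
    raw = bytearray()
    for line in lines:
        fields = line.split(";", 1)[0].split()  # drop the "; ..." annotation
        if len(fields) < 3:
            continue
        addr, value = int(fields[-3], 16), int(fields[-1], 16)
        if start is not None and addr < start:
            continue
        if end is not None and addr >= end:
            continue
        raw.append(value)
    return raw.decode("utf-8", errors="replace")
```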
Now we can pick a reasonable size for the buffer. It looks like what
we've seen of our censorship list ends at 41793e3f, so we'll consider
the slightly larger range [41792080,41794080) -- a 0x2000 byte
region. We can also update this guess if it looks like we still haven't
found the whole thing after monitoring writes to this region.
Now we can use bufmon, which monitors accesses to a buffer. We create
a file search_buffers.txt with the range we want to monitor and its
address space identifier:
0x41792080 0x2000 28210000
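The range above adds slack after the last byte we saw (0x41793e3f). One way to mechanize that choice, our own assumption rather than the tutorial's rule, is to round the needed length (0x1dc0 bytes) up to the next power of two; for this buffer that gives exactly the 0x2000-byte region used here. The second helper formats a line in the search_buffers.txt form shown above. Both names are hypothetical.

```python
def monitor_range(start: int, last: int) -> tuple[int, int]:
    """Return (start, size), where size is the smallest power of two
    covering [start, last] inclusive."""
    needed = last - start + 1
    size = 1
    while size < needed:
        size *= 2
    return start, size


def search_buffers_line(start: int, size: int, asid: str) -> str:
    """Format 'start size asid' as in search_buffers.txt."""
    return f"0x{start:x} 0x{size:x} {asid}"
```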
Now we run bufmon:
../arm-softmmu/qemu-system-arm -m 2048 -replay line2 -M android_arm -cpu cortex-a9 -kernel /dev/null -vnc :0 \
-global goldfish_mmc.sd_path=/dev/null -global goldfish_nand.system_path=dummy.qcow2 \
-global goldfish_nand.user_data_path=dummy.qcow2 \
-panda 'callstack_instr;bufmon'
Output is placed in buffer_taps.txt. Looking at the tap point we saw
earlier, we see that the buffer is zeroed out just before the
censorship list is written to it:
WRITE 40759e88 40038998 28210000 41793e38 00000004 00 00 00 00
WRITE 40759e88 40038998 28210000 41793e3c 00000004 00 00 00 00
WRITE 40759e88 400389ac 28210000 41793e40 00000004 00 00 00 00
WRITE 40759e88 400389ac 28210000 41793e44 00000004 00 00 00 00
WRITE 40796784 40038688 28210000 41792080 00000004 31 39 38 39
WRITE 40796784 40038688 28210000 41792084 00000004 36 34 0a 46
WRITE 40796784 40038688 28210000 41792088 00000004 4c 47 0a 47
WRITE 40796784 40038688 28210000 4179208c 00000004 43 44 0a 47
We can see that most of the list is indeed written from the original
tap point we found, 40796784 40038688 28210000. At the end, however,
we see 8 additional bytes written from 40796784 400386ac 28210000:
WRITE 40796784 40038688 28210000 41793e38 00000004 e5 85 b1 e6
WRITE 40796784 40038688 28210000 41793e3c 00000004 9d 83 e6 96
WRITE 40796784 400386ac 28210000 41793e40 00000004 97 0a e4 b9
WRITE 40796784 400386ac 28210000 41793e44 00000004 b0 e6 9e aa
WRITE 4079ff5a 4079e8aa 28210000 41793e4c 00000004 91 51 00 00
After this, the next write is to a non-contiguous region. We can thus
assume that the entire censorship list is written to the buffer from
0x41792080 to 0x41793e48. Finally we can extract the bytes written
to that location by hand (i.e. copy/paste) and obtain our full list
of censored words, which we can analyze at our leisure.
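Instead of copying and pasting, the list can be rebuilt from buffer_taps.txt. This sketch assumes the WRITE record layout seen in the excerpts above (WRITE, caller, PC, address space, address, size, then the bytes) and uses the byte fields directly rather than parsing the size column. Filtering on the caller 40796784 keeps writes from both PCs (40038688 and 400386ac) while dropping the earlier zeroing writes. The function name is hypothetical.

```python
def rebuild_buffer(lines, start, end, caller=None):
    """Reassemble the bytes in [start, end) from bufmon WRITE records.

    Records are applied in order, so later writes overwrite earlier ones;
    bytes that are never written stay 0.
    """
    buf = bytearray(end - start)
    for line in lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != "WRITE":
            continue
        if caller is not None and fields[1] != caller:
            continue
        addr = int(fields[4], 16)
        for i, byte in enumerate(fields[6:]):
            if start <= addr + i < end:
                buf[addr + i - start] = int(byte, 16)
    return bytes(buf)


# Usage:
# with open("buffer_taps.txt") as f:
#     data = rebuild_buffer(f, 0x41792080, 0x41793e48, caller="40796784")
# print(data.decode("utf-8", errors="replace"))
```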
For reference, the full list is available at:
http://www.cc.gatech.edu/~brendan/line.txt
This is an example of using PANDA for rapid vulnerability diagnosis. To start, clone a copy of the PANDA repository:
git clone https://github.com/moyix/panda.git
We're using revision 46bf2ea for this tutorial, so check out that revision inside the repository:
(cd panda && git checkout 46bf2ea)
To build PANDA, cd into panda/qemu. Edit build.sh to remove the lines
about LLVM; we won't need them for our purposes. Then run build.sh.
Now download the bug replay from rrshare.org and unpack it:
panda/scripts/rrunpack.py cve-2011-1255-crash.rr
One of PANDA's advantages is that it enables you to zero in on the relevant code very quickly. First, we need to zoom in on the part of the replay that's relevant to us. We aren't quite sure what's happening, so let's use the replaymovie plugin to make a video of replay execution. Run:
panda/qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay cve-2011-1255-crash \
-display none -panda 'replaymovie'
This will dump out a bunch of raw image files. Luckily, the replaymovie plugin comes with a script that turns them into a movie. Make sure you have parallel and imagemagick installed, then run:
panda/qemu/panda_plugins/replaymovie/movie.sh
This will create replay.mp4, which you can watch in your favorite
video player.
The movie shows that Internet Explorer crashes after a page is
loaded, so let's find out which page it is. First, though, we need to
cut the replay down to a manageable size. The relevant part should
start with the page load, so we should see the string "<html" in
memory, and it should end when "has stopped working" appears in
memory. Since the tag may be written in either case, we search for
both spellings. Place the following three lines into search_strings.txt:
"<html"
"<HTML"
"has stopped working"
Now run the stringsearch plugin:
panda/qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay cve-2011-1255-crash \
-display none -panda 'callstack_instr;stringsearch'
The output (abbreviated here) will look something like:
opening nondet log for read : ./cve-2011-1255-crash-rr-nondet.log
./cve-2011-1255-crash-rr-nondet.log: 324672 of 13204683 (2.46%) bytes, 14286548 of 1425929663 (1.00%) instructions processed.
[...]
./cve-2011-1255-crash-rr-nondet.log: 3224618 of 13204683 (24.42%) bytes, 370747250 of 1425929663 (26.00%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 3314428 of 13204683 (25.10%) bytes, 385204921 of 1425929663 (27.01%) instructions processed.
READ Match of str 0 at: instr_count=398546927 : 0000000086ebece0 0000000082888856 0000000000000000
WRITE Match of str 0 at: instr_count=398546927 : 0000000086ebece0 0000000082888856 0000000000000000
./cve-2011-1255-crash-rr-nondet.log: 3360953 of 13204683 (25.45%) bytes, 399378040 of 1425929663 (28.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 3423622 of 13204683 (25.93%) bytes, 414167226 of 1425929663 (29.05%) instructions processed.
READ Match of str 0 at: instr_count=422577965 : 000000007679371a 0000000076319b60 000000003f98b320
WRITE Match of str 0 at: instr_count=422577965 : 000000007679371a 0000000076319b60 000000003f98b320
[...]
./cve-2011-1255-crash-rr-nondet.log: 4261845 of 13204683 (32.28%) bytes, 641754362 of 1425929663 (45.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 4367057 of 13204683 (33.07%) bytes, 656083288 of 1425929663 (46.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 4491501 of 13204683 (34.01%) bytes, 670258577 of 1425929663 (47.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 4675020 of 13204683 (35.40%) bytes, 684462870 of 1425929663 (48.00%) instructions processed.
READ Match of str 1 at: instr_count=693024260 : 0000000086c37b91 00000000828887d3 0000000000000000
WRITE Match of str 1 at: instr_count=693024260 : 0000000086c37b91 00000000828887d3 0000000000000000
./cve-2011-1255-crash-rr-nondet.log: 4881606 of 13204683 (36.97%) bytes, 698721619 of 1425929663 (49.00%) instructions processed.
READ Match of str 1 at: instr_count=705861108 : 000000007679371a 0000000076319b60 000000003f98b320
WRITE Match of str 1 at: instr_count=705861108 : 000000007679371a 0000000076319b60 000000003f98b320
READ Match of str 1 at: instr_count=705874377 : 00000000828aca66 00000000828887d3 0000000000000000
WRITE Match of str 1 at: instr_count=705874377 : 00000000828aca66 00000000828887d3 0000000000000000
READ Match of str 1 at: instr_count=706855458 : 00000000761e6ab5 0000000076319b60 000000003f98b320
WRITE Match of str 1 at: instr_count=706855458 : 00000000761e6ab5 0000000076319b60 000000003f98b320
READ Match of str 1 at: instr_count=708771845 : 00000000761f6fd3 0000000076319b60 000000003f98b320
WRITE Match of str 1 at: instr_count=708771845 : 00000000761f6fd3 0000000076319b60 000000003f98b320
READ Match of str 1 at: instr_count=708779961 : 000000006da56ee2 0000000076319b60 000000003f98b320
WRITE Match of str 1 at: instr_count=708779961 : 000000006da56ee2 0000000076319b60 000000003f98b320
READ Match of str 1 at: instr_count=708780509 : 000000006da6902c 0000000076319b60 000000003f98b320
WRITE Match of str 1 at: instr_count=708780509 : 000000006da6902c 0000000076319b60 000000003f98b320
READ Match of str 1 at: instr_count=708782056 : 000000006d9cfd1f 0000000075ab9f11 000000003f98b320
./cve-2011-1255-crash-rr-nondet.log: 5035997 of 13204683 (38.14%) bytes, 713178025 of 1425929663 (50.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 5895169 of 13204683 (44.64%) bytes, 727498781 of 1425929663 (51.02%) instructions processed.
[...]
./cve-2011-1255-crash-rr-nondet.log: 12506363 of 13204683 (94.71%) bytes, 1098178736 of 1425929663 (77.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 12600539 of 13204683 (95.42%) bytes, 1112292842 of 1425929663 (78.00%) instructions processed.
READ Match of str 2 at: instr_count=1122107469 : 0000000076453d79 0000000076447933 000000003f98b2e0
READ Match of str 2 at: instr_count=1122110674 : 0000000076487a32 000000007646ffea 000000003f98b2e0
READ Match of str 2 at: instr_count=1122167975 : 0000000076453d79 0000000076447933 000000003f98b2e0
READ Match of str 2 at: instr_count=1122171180 : 0000000076487a32 000000007646ffea 000000003f98b2e0
[...]
./cve-2011-1255-crash-rr-nondet.log: 13179500 of 13204683 (99.81%) bytes, 1397901771 of 1425929663 (98.03%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: 13193969 of 13204683 (99.92%) bytes, 1411828978 of 1425929663 (99.01%) instructions processed.
./cve-2011-1255-crash-rr-nondet.log: log is empty.
Replay completed successfully.
Time taken was: 686 seconds.
Now we can cut the replay down to size using the scissors plugin.
Our reduced replay will start at instruction 398546927 (which was
reported as the first match for "<html") and end at 1122107469, the
first match for "has stopped working".
panda/qemu/x86_64-softmmu/qemu-system-x86_64 -m 1024 -replay cve-2011-1255-crash \
-display none -panda 'scissors:start=398546927,end=1122107469,name=crash_reduced'
Once this runs, we'll have a replay of around 700 million instructions -- about half the size of the original.
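The cut points are just the earliest match for either spelling of the opening tag (str 0 or 1) and the earliest match for the crash text (str 2). This Python sketch picks them out of saved stringsearch console output (we assume you redirected it to a file) and builds the scissors argument; the helper names are our own. The kept span is 1122107469 - 398546927 = 723,560,542 instructions, roughly 51% of the original 1,425,929,663.

```python
import re

# Only the string index and instruction count are needed here.
MATCH_RE = re.compile(r"Match of str (\d+) at: instr_count=(\d+)")


def scissors_range(output_lines, start_strs, end_strs):
    """Return (start, end): the first instr_count at which any string in
    start_strs matched, and the first at which any in end_strs matched."""
    start = end = None
    for line in output_lines:
        m = MATCH_RE.search(line)
        if m is None:
            continue
        idx, count = int(m.group(1)), int(m.group(2))
        if idx in start_strs and (start is None or count < start):
            start = count
        if idx in end_strs and (end is None or count < end):
            end = count
    return start, end


def scissors_arg(start, end, name):
    """Build the -panda argument for the scissors plugin."""
    return f"scissors:start={start},end={end},name={name}"
```

For example, scissors_arg(start, end, "crash_reduced") produces an argument whose output name matches the replay used by the commands below.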
Now we want to get more information on the cause of the crash. To do so,
we'll want to examine the full text of the HTML seen by the browser.
Part of the output of stringsearch is a text file named
string_matches.txt that contains the callstack of the memory accesses
that matched our search strings. It looks like (again abbreviated):
00000000761f6994 [...] 000000006da68ba1 000000006da803e0 000000003f98b320 1 0 0
00000000761f680e [...] 000000006d9cfd1f 0000000075ab9f11 000000003f98b320 0 1 0
000000006da9f0e3 [...] 000000006da56ee2 0000000076319b60 000000003f98b320 2 2 0
00000000761f680e [...] 000000006da6902c 0000000076319b60 000000003f98b320 0 2 0
00000000760c739f [...] 00000000761e6ab5 0000000076319b60 000000003f98b320 2 2 0
00000000761f68a1 [...] 00000000761f6fd3 0000000076319b60 000000003f98b320 2 2 0
000000007678e4dd [...] 000000007679371a 0000000076319b60 000000003f98b320 2 2 0
0000000074a88305 [...] 0000000076453d79 0000000076447933 000000003f98b2e0 0 0 20
00000000760c4d7b [...] 0000000076487a32 000000007646ffea 000000003f98b2e0 0 0 20
000000007678331e [...] 00000000828aca66 00000000828887d3 0000000000000000 2 2 0
00000000828e2893 [...] 0000000086c37b91 00000000828887d3 0000000000000000 0 2 0
000000007678fd00 [...] 0000000086ebece0 0000000082888856 0000000000000000 2 0 0
The last three columns give the number of matches seen at that point for
each string ("<html", "<HTML", and "has stopped working", in that order).
In this case, the first two lines seem promising, since each contains
exactly one match for the opening <html or <HTML tag.
We can dump out their contents by creating a file called
tap_points.txt with the contents:
000000006da68ba1 000000006da803e0 000000003f98b320
000000006d9cfd1f 0000000075ab9f11 000000003f98b320
And then running the textprinter plugin:
panda/qemu/x86_64-softmmu/qemu-system-x86_64 -display none -m 1024 -replay crash_reduced \
-panda 'callstack_instr;textprinter'
This writes all the data read and written at those points into two
files, read_tap_buffers.txt.gz and write_tap_buffers.txt.gz. We can
then split the data out by tap point:
panda/scripts/split_taps.py read_tap_buffers.txt.gz crash.read
panda/scripts/split_taps.py write_tap_buffers.txt.gz crash.write
Since there were no writes in this case, we'll just end up with two
files to examine:
crash.read.000000006d9cfd1f.0000000075ab9f11.000000003f98b320.dat and
crash.read.000000006da68ba1.000000006da803e0.000000003f98b320.dat.
Although the latter is just the directory listing, the former contains the HTML that triggers the crash:
<HTML XMLNS:t="urn:schemas-microsoft-com:time">
<?IMPORT namespace="t" implementation="#default#time2">
<body>
<div id="x" contenteditable="true">
HELLOWORLD
<t:TRANSITIONFILTER></t:TRANSITIONFILTER>
<script>
document.getElementById("x").innerHTML = "";
CollectGarbage();
window.onclick;
document.location.reload();
</script>
</div>
</body>
</HTML>
Judging by the use of CollectGarbage(), the bug is likely some kind of
use after free. We tested this suspicion by writing a simple use after
free detection plugin. The basic idea behind it is simple: once provided
with the addresses of malloc, free, and realloc, the plugin keeps
a map of allocated heap objects and then alerts when a freed object is
accessed. PANDA makes this easy, since we can watch every memory access
through the memory read and write callbacks. This strategy is not
foolproof, since it is possible another object will be allocated in the
same space before the stale pointer is dereferenced, but in this case it
is sufficient to detect the bug.
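The bookkeeping described above can be sketched in a few lines of Python. This is a toy model for illustration, not the useafterfree plugin's code and not PANDA's callback API: it assumes allocator calls and memory accesses have already been extracted as events in replay order, it ignores realloc failures, and its report format is our own.

```python
class UseAfterFreeTracker:
    """Track heap objects and flag accesses that land inside freed ones."""

    def __init__(self):
        self.live = {}   # start address -> size, for allocated objects
        self.freed = {}  # start address -> size, for freed objects

    def on_malloc(self, addr, size):
        self._forget_freed(addr, size)  # reused memory is valid again
        self.live[addr] = size

    def on_free(self, addr):
        size = self.live.pop(addr, None)
        if size is not None:  # ignore frees of unknown pointers
            self.freed[addr] = size

    def on_realloc(self, old_addr, new_addr, size):
        if old_addr:  # realloc(NULL, n) behaves like malloc(n)
            self.on_free(old_addr)
        self.on_malloc(new_addr, size)

    def on_access(self, addr, pc, kind="READ"):
        """Return a report string if addr falls inside a freed object, else None."""
        for start, size in self.freed.items():
            if start <= addr < start + size:
                return f"USE AFTER FREE {kind} @ {addr:#x}! PC {pc:#x}"
        return None

    def _forget_freed(self, addr, size):
        overlapping = [s for s, n in self.freed.items() if s < addr + size and addr < s + n]
        for start in overlapping:
            del self.freed[start]
```

As the caveat above says, once on_malloc hands the same region out again, a stale access there is no longer reported.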
The revision we're using already has the correct addresses for malloc,
free, and realloc as defaults, as well as the CR3 of the Internet
Explorer process. These were derived by dumping memory during the replay
and then using Volatility to find the relevant process and look up the
addresses of the memory allocation functions.
Now we can run the use after free detector:
panda/qemu/x86_64-softmmu/qemu-system-x86_64 -display none -m 1024 -replay crash_reduced \
-panda 'callstack_instr;useafterfree'
Its output contains many warnings of the form READING INVALID POINTER;
these are generally harmless. An actual use after free will be reported
as USE AFTER FREE. And, indeed, around halfway through the reduced
replay we see:
USE AFTER FREE READ @ {3f98b320, 5556f0}! PC 6dc996f5
This indicates that code at 0x6dc996f5 attempted to read from a freed
object at 0x5556f0, confirming our suspicion that the underlying cause
of the crash is a use after free. The information provided allows us to
pinpoint exactly where the freed object is used, and (if we had access
to source code) would tell us precisely where to apply a fix.