Analysis

How to extract streams and shellcodes from a PDF, the easy way

Maybe it was not evident enough or not well documented, but until the moment there was a way of extracting streams, Javascript code, shellcodes and any type of information shown in the console output. What it's true is that it was not very straightforward. To extract something it was needed to set the especial variable "output" to a file or variable in order to store the console output in that new destination. For this to be accomplished we used the set command and after this the reset command to restore the original value of "output".

 

PPDF> set output file myFile
PPDF> rawstream 2

78 da dd 53 cb 6e c2 30 10 bc f7 2b 22 df c9 36 |x..S.n.0...+"..6|
39 54 15 72 c2 ad 3f 40 39 57 c6 5e 07 43 fc 50 |9T.r..?@9W.^.C.P|
6c 1e fd fb 6e 4a 02 04 54 a9 67 2c 59 9e 9d f5 |l...nJ..T.g,Y...|
8e 77 56 32 5f 9c 6c 9b 1d b0 8b c6 bb 8a 15 f9 |.wV2_.l.........|
2b cb d0 49 af 8c 6b 2a b6 fa fc 98 bd b3 45 fd |+..I..k*......E.|
92 d1 e2 27 15 e6 b4 33 aa 70 b1 47 15 db a4 14 |...'...3.p.G....|
e6 00 2e e6 42 f9 35 e6 d2 5b a0 04 b0 73 09 15 |....B.5..[...s..|
a1 aa 77 22 08 0e 04 46 4e 7a a7 4d 43 3a 92 84 |..w"...FNz.MC:..|
2e 22 c7 e3 31 b7 46 76 3e 7a 9d 72 df 35 10 e5 |."..1.Fv>z.r.5..|
06 ad 80 93 34 50 e6 6f 57 51 92 08 1d 46 74 e9 |....4P.oWQ...Ft.|
ca f4 9c d2 b7 31 31 83 af ba e0 30 c2 e9 05 bd |.....11....0....|
55 bb 36 8a ad f6 2a fc 1e 61 ab e8 5a ad 39 fc |U.6...*..a..Z.9.|
95 9a 0a 18 97 b0 13 32 99 03 f6 af dc 86 b7 ad |.......2........|

Dynamic analysis of a CVE-2011-2462 PDF exploit

After the exploit static analysis some things like the function of the shellcode were unclear, so a dynamic analysis could throw some light on it. When we open the exploit without the Javascript code used for heap spraying we obtain an access violation error in rt3d.dll. If we put a breakpoint in the same point when we launch the original exploit we can see this (better explanation of the vulnerability):

 

Instead of showing an access violation the CALL function is pointing to a valid address in icucnv36.dll, 0x4A8453C3. This address is not random and it's used in the Javascript code to perform part of the heap spraying:

 

 

 

Static analysis of a CVE-2011-2462 PDF exploit

CVE-2011-2462 was published more than one month ago. It's a memory corruption vulnerability related to U3D objects in Adobe Reader and it affected all the latest versions from Adobe (<=9.4.6 and <= 10.1.1). It was discovered while it was being actively exploited in the wild, as some analysis say. Adobe released a patch for it 10 days after its publication. I'm going to analyse a PDF file exploiting this vulnerability with peepdf to show some of the new commands and functions in action.

As usual, a first look at the information of the file:

info

I've highlighted the interesting information of the info command: one error while parsing the document, one object (15) containing Javascript code, one object (4) containing two ways of executing elements (/AcroForm, /OpenAction) and one U3D object (10), suspicious for its known vulnerabilities, apart of the latest one.

So we have several objects to explore, let's start from the /AcroForm element (object 4):

Analysing the Honeynet Project challenge PDF file with peepdf (II)

After the "useless" analysis of the fake objects now we can focus on the objects which will be parsed by the PDF reader:

/Catalog (27)
dictionary (28)
dictionary (22)
dictionary (23)
dictionary (22)
/Annot (24)
dictionary (23)
/Page (25)
/Pages (26)
/Page (25)
stream (21)
/Pages (26)

If we take a look at the Catalog object...

PPDF> object 27

<< /AcroForm 28 0 R
/MarkInfo << /Marked true >>
/Pages 26 0 R
/Type /Catalog
/Lang en-us
/PageMode /UseAttachments >>

There is no presence of any triggers here (/OpenAction) or in the rest of the objects (/AA) so it seems that the /AcroForm element has something to say. Also, the suspicious object 21 (/EmbeddedFile) is related with this interactive form:

PPDF> references to 21

[28]

PPDF> object 28

<< /DA /Helv 0 Tf 0 g
/Fields [ 22 0 R ]
/XFA [ template 21 0 R ] >>

In the dictionary of the form we can see that object 21 is a template and that there is a reference to a field object (object 22). So we continue analysing the field objects:

PPDF> object 22

<< /V

Analysing the Honeynet Project challenge PDF file with peepdf (I)

In past November The Honeynet Project published a new challenge, this time related to PDF files. Although it's quite old I'm going to analyse it with my tool because I think it has some interesting tricks and peepdf makes the analysis easier. The PDF file can be downloaded from here.

If we launch peepdf we obtain this error:

$ ./peepdf.py -i fcexploit.pdf

Error: parsing indirect object!!

It seems that there is an error in the parsing process. Talking about malicious PDF files it's recommended to add the -f option to ignore this type of errors and continue with the analysis:

$ ./peepdf.py -fi fcexploit.pdf

File: fcexploit.pdf
MD5: 659cf4c6baa87b082227540047538c2a
Size: 25169 bytes
Version: 1.3
Binary: True
Linearized: False
Encrypted: False
Updates: 0
Objects: 18
Streams: 5
Comments: 0
Errors: 2

Version 0:
Catalog: 27
Info: 11
Objects (18): [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 22, 23, 24, 25, 26, 27, 28]
Errors (1): [11]
Streams (5): [5, 7, 9, 10, 11]
Encoded (4): [5, 7, 9, 10]
Objects with JS code (1): [5]
Suspicious elements:
/AcroForm: [27]
/OpenAction: [1]
/JS: [4]
/JavaScript: [4]
getAnnots (CVE-2009-1492): [5]

Now we can see some statistics and information about the document. We can see some errors too, proof that it's not a normal PDF file:

peepdf v0.1 released: a tool to analyse/modify malicious PDF files

After some time of inactivity in the blog I return with good news. I released the first version of peepdf last Friday. peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not. The aim of this tool is provide all the necessary components that a security researcher could need in a PDF analysis without using three or four tools to make all the tasks. With peepdf it's possible to list all the objects in the document showing the suspicious elements, supports all the most used filters and encodings, it can parse different versions of a file, object streams and encrypted files. With the installation of Spidermonkey and Libemu it provides Javascript and shellcode analysis wrappers too. It is also able to create new PDF files and to modify existent ones. Thanks to the BackTrack team peepdf is included in the last version of this security distribution:

 

 

peepdf - PDF Analysis Tool


 

 


What is this?


peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not. The aim of this tool is to provide all the necessary components that a security researcher could need in a PDF analysis without using 3 or 4 tools to make all the tasks. With peepdf it's possible to see all the objects in the document showing the suspicious elements, supports the most used filters and encodings, it can parse different versions of a file, object streams and encrypted files. With the installation of PyV8 and Pylibemu it provides Javascript and shellcode analysis wrappers too. Apart of this it is able to create new PDF files, modify existent ones and obfuscate them.

 

PDFAnalyzer


 

Language: Python

Publication date: 2009-06-02

Updated: 2010-01-10

Description: Script to analyze malicious PDF files containing obfuscated Javascript code. It uses Spidermonkey to execute the found Javascript code and showing the shellcode to be launched. Sometimes it's not able to deobfuscate the code, but you can specify the parameter -w to write to disk the Javascript code, helping to carry out a later manual analysis. Its output has five sections where you can find trigger events (/OpenAction and /AA), suspicious actions (/JS, /Launch, /SubmitForm and /ImportData), vulnerable elements, escaped bytes and URLs, which can be useful to get an idea of the file risk.

Requirements: Spidermonkey (and Pyrex).

Download it!

 


Usage


 

Analysis of malicious PDF files

As I mentioned before, one of the ways to hide information in a PDF file is trough the encoding/compression of streams, thanks to filters (/Filter parameter), being /FlateDecode the most used. The bad guys have been using it some time ago to hide obfuscated Javascript code with some vulnerable functions (Collab.collectEmailInfo, util.printf, getAnnots, getIcon, spell.customDictionaryOpen), or using heap-spraying to exploit another vulnerability not related with Javascript, like the /JBIG2Decode filter one.

To help in the analysis of these malicious files I've written a mini Python tool, using Spidermonkey to execute the found Javascript code and showing the shellcode to be launched. Automating the execution of obfuscated Javascript code is not a simple issue because there are many ways of doing it and everyday a new one arises, so I've tried to do an approximation to the problem, thanks to the malicious samples that I've seen. In the case the script won't be able to go till the end it's possible to specify the parameter -w to write to disk the Javascript code, helping to carry out a later manual analysis.

Syndicate content