Adding a scoring system in peepdf

Just before the summer I announced that the student Rohit Dua would dedicate his time to improve peepdf and add a scoring system to the output. This was possible thanks to Google and his Google Summer of Code (GSoC) program, where I presented several projects as a member of The Honeynet Project. A beta version was presented during Black Hat Europe Arsenal 2015 last November, where I introduced the new functionalities.

The scoring system has the goal of giving valuable advice about the maliciousness of the PDF file that’s being analyzed. The first step to accomplish this task is identifying the elements which permit to distinguish if a PDF file is malicious or not, like Javascript code, lonely objects, huge gaps between objects, detected vulnerabilities, etc. The next step is calculating a score out of these elements and test it with a large collection of malicious and not malicious PDF files in order to tweak it.

The scoring is based on different indicators like:

  • Number of pages
  • Number of stream filters
  • Broken/Missing cross reference table
  • Obfuscated elements: names, strings, Javascript code.
  • Malformed elements: garbage bytes, missing tags…
  • Encryption with default password
  • Suspicious elements: Javascript, event triggers, actions, known vulns…
  • Big streams and strings
  • Objects not referenced from the Catalog

Black Hat Arsenal peepdf challenge solution

One week before my demo at the Black Hat Arsenal I released a peepdf challenge. The idea was solving the challenge using just peepdf, of course ;) This post will tell you how to solve the challenge so if you want to try by yourself (you should!) STOP READING HERE! The PDF file can be downloaded from here and it is not harmful. No shellcodes, no exploits, no kitten killed. In summary, you can open it with no fear, but do it with a version of Adobe Reader prior to XI ;)


Let's start! :) This is what you see with the last version of peepdf:


Peepdf Black Hat Arsenal Challenge


In a quick look you can spot some Javascript code located in object 13 and also an embedded file in the same object. Checking the references to this object and some info about it we see that it is an embedded PDF file:


Black Hat Arsenal peepdf challenge

In one week I will be traveling to Las Vegas to show how peepdf works in the Black Hat USA Arsenal. My time slot will be on Wednesday the 5th from 15:30 to 18:00, so you are more than welcome to come by and say hi, ask questions or just talk to me. I will also be presenting some of the work Rohit Dua is doing during the Google Summer of Code (GSoC), adding a scoring system for peepdf.


Black Hat Arsenal Peepdf


peepdf news: GitHub, Google Summer of Code and Black Hat

Two months ago Google announced that Google Code was slowly dying: no new projects can be created, it will be read only soon and in January 2016 the project will close definitely. peepdf was hosted there so it was time to move to another platform. The code is currently hosted at GitHub, way more active than Google Code:


If you are using peepdf you must update the tool because it is pointing to Google Code now. After executing -u the tool will point to GitHub and it will be able to be up to date with the latest commits. The peepdf Google Code page will also point to GitHub soon.


Another important announcement is that Rohit Dua will be the student who will work with peepdf this summer in the Google Summer of Code (GSoC). I initially presented three ideas to improve peepdf through The Honeynet Project:


Released peepdf v0.3

After some time without releasing any new version here is peepdf v0.3. It is not that I was not working in the project, but since the option to update the tool from the command line was released creating new versions became a secondary task. Besides this, since January 2014 Google removed the option to upload new downloads to the Google Code projects, so I had to figure out how to do it. From now on, all new releases will be hosted at, in the releases section.


The differences with version 0.2 are noticeable: new commands and features have been added, some libraries have been updated, detection for more vulnerabilities have been added, a lot of bug fixes, etc. This is the list of the most important changes (full changelog here):


  • Replaced Spidermonkey with PyV8 as the Javascript engine (see why here).

Analysis of a CVE-2013-3346/CVE-2013-5065 exploit with peepdf

There are already some good blog posts talking about this exploit, but I think this is a really good example to show how peepdf works and what you can learn next month if you attend the 1day-workshop “Squeezing Exploit Kits and PDF Exploits” at Troopers14 or the 2h-workshop "PDF Attack: A Journey from the Exploit Kit to the Shellcode" at Black Hat Asia (Singapore).  The mentioned exploit was using the Adobe Reader ToolButton Use-After-Free vulnerability to execute code in the victim's machine and then the Windows privilege escalation 0day to bypass the Adobe sandbox and execute a new payload without restrictions.

This is what we see when we open the PDF document (6776bda19a3a8ed4c2870c34279dbaa9) with peepdf:



PDF Attack: A Journey from the Exploit Kit to the Shellcode (Slides)

As I already announced in the last blog post, I was in Las Vegas giving a workshop about how to analyze exploit kits and PDF documents at BlackHat. The part related to exploit kits included some tips to analyze obfuscated Javascript code manually and obtain the exploit URLs or/and shellcodes. The tools needed to accomplish this task were just a text editor, a Javascript engine like Spidermonkey, Rhino or PyV8, and some tool to beautify the code (like peepdf ;p). In a generic way, we can say that the steps to analyze an exploit kit page are the following:

  • Removing unnecessary HTML tags
  • Convert HTML elements which are called in the Javascript code to Javascript variables
  • Find and replace eval functions with prints, for example, or hook the eval function if it is possible (PyV8)
  • Execute the Javascript code
  • Beautify the code
  • Find shellcodes and exploit URLs
  • Repeat if necessary



PDF Attack: A Journey from the Exploit Kit to the Shellcode

BlackHat USA 2013 is here and tomorrow I will be explaining how to analyze exploit kits and PDF documents in my workshop “PDF Attack: From the Exploit Kit to the Shellcode” from 14:15 to 16:30 in the Florentine room. It will be really practical so bring your laptop and expect a practical session ;) All you need is a Linux distribution with pylibemu and PyV8 installed to join the party. You can run all on Windows too if you prefer.

Now Spidermonkey is not needed because I decided to change the Javascript engine to PyV8, it really works better. Take a look at the automatic analysis of the Javascript code using Spidermonkey (left) and PyV8 (right).


NFC CreditCard Reader


Language: C

Publication date: 2012-12-21

Description: Program based on readnfccc (by Renaud Lifchitz) to read some private data from credit cards, like cardholder, Permanent Account Number (PAN), expiry date, etc., using NFC technology. It has been tested with Spanish contactless credit cards, but can also be used with other countries cards. Take a look at this post (Spanish) and this video.

Requirements: libnfc (and an NFC reader, of course!)

Download it!



After installing libnfc, just compile the code:

$ gcc nfc_creditcard_reader.c -lnfc -o nfc_creditcard_reader

Place an NFC credit card close to the reader and execute it:

New peepdf v0.2 (Black Hat Vegas version)

Last week I was in Vegas presenting the new release of peepdf, version 0.2. Since my release at Black Hat Amsterdam some months ago I hadn't created a new package so it was time to do it. You can now download the new package here or use “peepdf -u” to update it to the latest version.


So the main new features, besides the fixed bugs, are the following:

  • Added support for AES in the decryption process: Until now peepdf supported RC4 as a decryption algorithm but AES was a must. Now here it is, so no more worries for decrypted documents. I will be ready for new changes in the decryption process, someone in Vegas told me that the next AES modification for PDF files is coming...



Checking if reading an NFC tag is that secure

As I mentioned in my last post about NFC, we can use NFC Forum tags to store and share information, normally used by marketing departments. This information must have a specific format called NDEF (NFC Data Exchange Format). Thanks to this format different NFC devices can share NDEF messages between them. Each of these messages can store several NDEF records containing different type of information like plain text, images, audio or video (media in general), URIs, etc. You can take a look at the NDEF specification to learn more about it.



Here I'm going to focus on the URI records and their possibilities to perform actions in NFC capable mobile phones when reading this type of tags. The URI specification says that these are the supported schemes:


URI Identifier Codes


How to setup your own NFC lab

NFC is a reality today. A lot of cities in the world want to add this technology to their daily life, using it for transport, payments, access systems and almost all we can think (in some countries, like Japan and Korea, NFC is used years ago). Even reading NFC tags can be used to perform certain actions in our mobile phones like put it in flight mode, synchronize data, etc.



NFC is based on the ISO/IEC 18092 standard, published at the end of 2003, and it's compatible with other standards like ISO/IEC 14443 A/B (RFID) and ISO/IEC 15693 (FeliCa - Sony). As probably you know, it's a short distance wireless technology (normally < 10cm), high frequency (13'56 MHz) and low speed (normally until 424 Kbps). Unlike RFID, NFC is capable to perform bidirectional communications, and the time to establish the communication is much lower than using Bluetooth.

The aim of this blog post is not explaining how NFC works but giving some advice to setup a lab and start playing with this technology. The first thing we need is a NFC reader/writer. After looking around the most used are the following:


peepdf supports CCITTFaxDecode encoded streams

Stream filters, as I said some time ago, are a good way of obfuscating PDF files and hide Javascript code, for instance. Some weeks ago a post related to the use of the /CCITTFaxDecode filter was published by Sophos, although the Malware Tracker guys tracked a similar document created more than one year ago. Bad guys were using the /CCITTFaxDecode filter with some parameters to obfuscate the documents and try to bypass analysis tools and Antivirus. This filter was not supported by peepdf until the moment, so Binjo ported the Origami decoder to Python to include it (Thanks man!). Today I have uploaded the code and now peepdf also supports this filter :)

I've performed a quick analysis of the Sophos' document (6cc2a162e08836f7d50d461a9fc136fe) and it seems to work well:



We can identify two known vulnerabilities and it seems that object 30 contains Javascript code. If we take a look at the filters used in this stream we see that peepdf has been able to decode the /CCITTFaxDecode filter without problems:


New version of peepdf (v0.1 r92 - Black Hat Europe Arsenal 2012)

Last week I presented the last version of peepdf in the Black Hat Europe Arsenal. It was a really good experience that I hope I can continue doing in the future ;) Since the very first version, almost one year ago, I had not released any new version but I have been frequently updating the project SVN. Now you can download the new version with some interesting additions (and bugfixes), and take a look at the overview of the tool in the slides. I think it's important to mention that the version included in the Black Hat CD and the one in the Black Hat Arsenal webpage IS NOT the last version, this IS the last version. I've asked the Black Hat stuff to change the version on the site so I hope this can be fixed soon.

How to extract streams and shellcodes from a PDF, the easy way

Maybe it was not evident enough or not well documented, but until the moment there was a way of extracting streams, Javascript code, shellcodes and any type of information shown in the console output. What it's true is that it was not very straightforward. To extract something it was needed to set the especial variable "output" to a file or variable in order to store the console output in that new destination. For this to be accomplished we used the set command and after this the reset command to restore the original value of "output".


PPDF> set output file myFile
PPDF> rawstream 2

78 da dd 53 cb 6e c2 30 10 bc f7 2b 22 df c9 36 |x..S.n.0...+"..6|
39 54 15 72 c2 ad 3f 40 39 57 c6 5e 07 43 fc 50 |9T.r..?@9W.^.C.P|
6c 1e fd fb 6e 4a 02 04 54 a9 67 2c 59 9e 9d f5 |l...nJ..T.g,Y...|
8e 77 56 32 5f 9c 6c 9b 1d b0 8b c6 bb 8a 15 f9 |.wV2_.l.........|
2b cb d0 49 af 8c 6b 2a b6 fa fc 98 bd b3 45 fd |+..I..k*......E.|
92 d1 e2 27 15 e6 b4 33 aa 70 b1 47 15 db a4 14 |...'...3.p.G....|
e6 00 2e e6 42 f9 35 e6 d2 5b a0 04 b0 73 09 15 |....B.5..[...s..|
a1 aa 77 22 08 0e 04 46 4e 7a a7 4d 43 3a 92 84 |..w"...FNz.MC:..|
2e 22 c7 e3 31 b7 46 76 3e 7a 9d 72 df 35 10 e5 |."..1.Fv>z.r.5..|
06 ad 80 93 34 50 e6 6f 57 51 92 08 1d 46 74 e9 |....4P.oWQ...Ft.|
ca f4 9c d2 b7 31 31 83 af ba e0 30 c2 e9 05 bd |.....11....0....|
55 bb 36 8a ad f6 2a fc 1e 61 ab e8 5a ad 39 fc |U.6...*..a..Z.9.|
95 9a 0a 18 97 b0 13 32 99 03 f6 af dc 86 b7 ad |.......2........|
Syndicate content