Adding a scoring system in peepdf

Just before the summer I announced that the student Rohit Dua would dedicate his time to improve peepdf and add a scoring system to the output. This was possible thanks to Google and his Google Summer of Code (GSoC) program, where I presented several projects as a member of The Honeynet Project. A beta version was presented during Black Hat Europe Arsenal 2015 last November, where I introduced the new functionalities.

The scoring system has the goal of giving valuable advice about the maliciousness of the PDF file that’s being analyzed. The first step to accomplish this task is identifying the elements which permit to distinguish if a PDF file is malicious or not, like Javascript code, lonely objects, huge gaps between objects, detected vulnerabilities, etc. The next step is calculating a score out of these elements and test it with a large collection of malicious and not malicious PDF files in order to tweak it.

The scoring is based on different indicators like:

  • Number of pages
  • Number of stream filters
  • Broken/Missing cross reference table
  • Obfuscated elements: names, strings, Javascript code.
  • Malformed elements: garbage bytes, missing tags…
  • Encryption with default password
  • Suspicious elements: Javascript, event triggers, actions, known vulns…
  • Big streams and strings
  • Objects not referenced from the Catalog

peepdf news: GitHub, Google Summer of Code and Black Hat

Two months ago Google announced that Google Code was slowly dying: no new projects can be created, it will be read only soon and in January 2016 the project will close definitely. peepdf was hosted there so it was time to move to another platform. The code is currently hosted at GitHub, way more active than Google Code:


If you are using peepdf you must update the tool because it is pointing to Google Code now. After executing -u the tool will point to GitHub and it will be able to be up to date with the latest commits. The peepdf Google Code page will also point to GitHub soon.


Another important announcement is that Rohit Dua will be the student who will work with peepdf this summer in the Google Summer of Code (GSoC). I initially presented three ideas to improve peepdf through The Honeynet Project:


Syndicate content