peepdf v0.1: nueva herramienta de análisis y modificación de archivos PDF

Como ya comenté en mi post anterior, hace unos días se publicó la primera versión de peepdf. Se trata de una herramienta escrita en Python y enfocada al análisis de archivos PDF, por lo que su objetivo principal es el discernir si un documento PDF es malicioso o no. Se presenta inicialmente con una interfaz de consola interactiva donde se pueden ejecutar diferentes comandos para recabar información acerca del archivo. La idea es no tener que usar múltiples herramientas para decodificar objetos, analizar código Javascript o la shellcode, sino usar únicamente una herramienta (con sus wrappers) para el análisis de PDFs. También podéis encontrar la herramienta en la última versión de la distribución BackTrack (¡gracias al equipo de BackTrack!).

Las principales funcionalidades de peepdf son las siguientes:


  • Decodificación: hexadecimal, octal, objetos name
  • Implementación de los filtros más usados
  • Referencias en objetos
  • Listado de objetos donde se referencia a otro objeto

peepdf - PDF Analysis Tool



What is this?

peepdf is a Python tool to explore PDF files in order to find out if the file can be harmful or not. The aim of this tool is to provide all the necessary components that a security researcher could need in a PDF analysis without using 3 or 4 tools to make all the tasks. With peepdf it's possible to see all the objects in the document showing the suspicious elements, supports the most used filters and encodings, it can parse different versions of a file, object streams and encrypted files. With the installation of PyV8 and Pylibemu it provides Javascript and shellcode analysis wrappers too. Apart of this it is able to create new PDF files, modify existent ones and obfuscate them.


Application execution with a PDF file

As I mentioned some time ago we wan perform several actions with a PDF file. One of them is application execution, which we can use on different platforms like Windows, Unix or Mac.In order to check the potential of this functionality I'm going to modify a basic PDF. First of all we must include an action trigger, when we open the document, for example. For this task we have to put an /OpenAction element in the document catalog, pointing to an object that will be the /Launch action which will execute the desired application. The action object can include the following elements:


Actions in the Portable Document Format (PDF)

The PDF format is becoming more and more (in)famous due to the lately published vulnerabilities in Adobe products allowing the execution of arbitrary code in the system. Now I don't want to write about these malicious files but I'll do it in future posts.

After the brief comments about the objects we can find in a document of this type and its physic and logic structure I'm going to follow with the actions that can be executed in background. The PDF files aren't static documents but it's possible to specify some kind of programming depending on the user actions. This is where the security problem arises and that becomes a simple PDF in a potential malcode with high probabilities of being executed.

A PDF action is a dictionary object which can contain the following elements:

  • /Type: it's optional and it's used to specify the object type of the dictionary. In this case it's Action.
  • /S: it's an obligatory element that defines the type of the action we want to do.
  • /Next: it's optional too and specifies the next action or actions to be executed.



Portable Document Format (PDF) Basics

Some months ago in the Black Hat Europe, Eric Filiol gave a talk about the functionalities of the PDF format. Filiol said that thanks to some features a simple PDF could become malcode executing the attacker instructions. Besides this, the exploitation of vulnerabilities in this type of documents is more and more usual nowadays. This is why I'm going to write about the basics of the PDF structure and how it works internally. Maybe this can be boring but I promise you that next posts about this subject will be more practical;) To make it more enjoyable you can open a PDF file in a text or hexadecimal editor and take a look at what I mention in the next paragraphs.

A PDF file consist of multiple objects connected between them. This objects can belong to one type from eight possible values: boolean, integer and real numbers, text strings, names, arrays, dictionaries, streams and nulls. Apart of the "known" types, names are a kind of tag for the different elements that compose an object, dictionaries, delimited by "<<" and ">>", are a collection of pairs key-value, and streams, delimited by "stream" and "endstream", are bytes sequences, an information flow that the PDF readers can read incrementally, unlike the normal text strings. All the objects can be declared as indirect objects, assigning them an id to be referenced in any part of the file. This type of objects are delimited by the words "obj" and "endobj".

The physic structure of a PDF file is divided in header, body, cross references table and trailer:

Distribuir contenido