Blogs

My HelloWorld PDF

Before I continue with the different actions we can perform within a PDF file I'm gonna create a simple PDF file which we can modify easily. If you open a PDF with any text editor you'll see a lot of objects and elements that can confuse you a bit. In order to avoid this let's make a PDF document from scratch with a text editor, without all the unnecessary elements.

We must begin knowing which of the PDF elements are obligatory and must be present in our file. I've written some weeks ago about the physic and logic structure of these types of documents so I'll only enumerate what we'll need:

Actions in the Portable Document Format (PDF)

The PDF format is becoming more and more (in)famous due to the lately published vulnerabilities in Adobe products allowing the execution of arbitrary code in the system. Now I don't want to write about these malicious files but I'll do it in future posts.

After the brief comments about the objects we can find in a document of this type and its physic and logic structure I'm going to follow with the actions that can be executed in background. The PDF files aren't static documents but it's possible to specify some kind of programming depending on the user actions. This is where the security problem arises and that becomes a simple PDF in a potential malcode with high probabilities of being executed.

A PDF action is a dictionary object which can contain the following elements:

  • /Type: it's optional and it's used to specify the object type of the dictionary. In this case it's Action.
     
  • /S: it's an obligatory element that defines the type of the action we want to do.
     
  • /Next: it's optional too and specifies the next action or actions to be executed.

 

 

Portable Document Format (PDF) Basics

Some months ago in the Black Hat Europe, Eric Filiol gave a talk about the functionalities of the PDF format. Filiol said that thanks to some features a simple PDF could become malcode executing the attacker instructions. Besides this, the exploitation of vulnerabilities in this type of documents is more and more usual nowadays. This is why I'm going to write about the basics of the PDF structure and how it works internally. Maybe this can be boring but I promise you that next posts about this subject will be more practical;) To make it more enjoyable you can open a PDF file in a text or hexadecimal editor and take a look at what I mention in the next paragraphs.

A PDF file consist of multiple objects connected between them. This objects can belong to one type from eight possible values: boolean, integer and real numbers, text strings, names, arrays, dictionaries, streams and nulls. Apart of the "known" types, names are a kind of tag for the different elements that compose an object, dictionaries, delimited by "<<" and ">>", are a collection of pairs key-value, and streams, delimited by "stream" and "endstream", are bytes sequences, an information flow that the PDF readers can read incrementally, unlike the normal text strings. All the objects can be declared as indirect objects, assigning them an id to be referenced in any part of the file. This type of objects are delimited by the words "obj" and "endobj".

The physic structure of a PDF file is divided in header, body, cross references table and trailer:

Syndicate content