My HelloWorld PDF

Before I continue with the different actions we can perform within a PDF file, I'm gonna create a simple PDF file that we can modify easily. If you open a PDF with any text editor you'll see a lot of objects and elements that can be a bit confusing. To avoid this, let's make a PDF document from scratch with a text editor, without all the unnecessary elements.

We must begin by knowing which PDF elements are mandatory and must be present in our file. I wrote some weeks ago about the physical and logical structure of this type of document, so I'll only list what we'll need:

  • File header: %PDF-1.5 (the version number is not really important now).
     
  • Body: it must contain the dictionary objects that specify the catalog, the page tree root node and a leaf page. So we need at least 3 objects, which we can number consecutively starting from 1. The order of these objects in the file is not important; what matters is how they reference each other and that they are correctly indexed in the cross-reference table. These are the mandatory elements for each object:

    • Catalog: it must include a /Type entry with /Catalog as its value. Besides this, we have to write an indirect reference to the page tree root node of the file (/Pages).
       
    • Page tree root node: it must include a /Type entry with the value /Pages, and an Array object called /Kids containing indirect references to all its direct child nodes, whether they are intermediate nodes or leaf pages. We also have to include a /Count entry specifying the number of leaf nodes descending from the root, although if this number is not correct the file is parsed anyway.
       
    • Page tree leaf node: this is a page object that, like the others, must include a /Type entry, but this time its value must be /Page. Additionally we must specify an indirect reference to the parent node (with the /Parent key) and a /Resources dictionary, which will be empty if no resources are needed. If this entry is not included, it is treated as inherited from an ancestor node. Lastly, we need an Array object called /MediaBox, which must contain the coordinates of the lower-left and upper-right corners of a rectangle that defines the area where the page content will be shown.
       


  • Cross-reference table: each section begins with the word xref followed by one or more subsections. Each subsection starts with a line containing two numbers: the object number of the first object in the subsection and the number of objects in the subsection. After this line, the subsection contains one line per object, in consecutive order, including the free objects that were once in use. Each of these lines is exactly 20 bytes long: 10 digits for the offset from the beginning of the document (for a free entry, these digits hold the number of the next free object instead), 5 digits for the generation number, a letter specifying the object state (n when it's in use and f when it's free), and the end-of-line marker, preceded by a space if the marker is only one character long. The fields are separated by single spaces. In this case we'll only need one section with one subsection, containing one line for each object used in the document (3) and another one for the mandatory free object.

  • Trailer: this element was described in the first post about PDF files, so I'll only mention that the dictionary following the word trailer must contain at least the number of entries in the cross-reference table (/Size) and an indirect reference to the document catalog (/Root).
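
To see how these pieces fit together, here is a minimal Python sketch that assembles them and computes the cross-reference offsets, which is the tedious part when editing by hand. It is only an illustration, not the sample file itself: the function name build_minimal_pdf and the US Letter /MediaBox values are my own assumptions. It also writes the startxref line and the %%EOF marker that must close the file, which are not covered in the list above.

```python
def build_minimal_pdf() -> bytes:
    """Assemble an empty one-page PDF from the mandatory elements only."""
    # The three mandatory body objects, keyed by object number.
    # The MediaBox values (US Letter, in points) are an arbitrary choice.
    objects = {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        3: b"<< /Type /Page /Parent 2 0 R /Resources << >> "
           b"/MediaBox [0 0 612 792] >>",
    }

    out = bytearray(b"%PDF-1.5\n")  # file header
    offsets = {}
    for num, body in objects.items():
        offsets[num] = len(out)  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_offset = len(out)
    size = len(objects) + 1  # used objects plus the free object 0
    out += b"xref\n0 %d\n" % size  # one subsection starting at object 0
    out += b"0000000000 65535 f \n"  # mandatory free entry, 20 bytes
    for num in sorted(offsets):
        # 10-digit offset, 5-digit generation, state, space + newline
        out += b"%010d %05d n \n" % (offsets[num], 0)

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size
    # Closing lines: where the xref table starts, then the EOF marker.
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the result to disk in binary mode, for example open("hello.pdf", "wb").write(build_minimal_pdf()), should give a file with a single blank page. If you then edit the objects by hand, remember that any change in length shifts the offsets that follow it.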

With these tips in mind we can easily make our first PDF. If you want to learn more about this you can read Adobe's documentation. You can download my little sample here. I haven't mentioned anything about how to write content in the PDF; maybe writing "Hello World" is good homework for you ;)