Hiding information in a PDF

I'm going to take a break from writing about actions in PDFs and start on the filters that can be applied to stream objects. A stream object consists of a dictionary followed by the actual content, enclosed between the keywords stream and endstream. The dictionary defines the stream's properties, such as its length, the filters needed to decode or decompress it, or the file name when the stream data is stored in an external file.

As you might guess, one way to hide information in a PDF file is to apply one or more filters to it, so that it is not easy to identify and the real content is harder to extract. In fact, this is a common technique in most of the malicious files that try to exploit some of the latest vulnerabilities.

There are two types of filters: ASCII filters, which decode ASCII text into binary data, and decompression filters, which decode data compressed with some algorithm. The most common filter is FlateDecode, which uses the zlib/deflate method to recover the uncompressed data. Normally we find this filter on its own, that is, without another chained filter encoding its input or output. With a filter of this type, we must first isolate the encoded data: extract the bytes between the keywords stream and endstream, leaving out the newline and carriage return characters next to those keywords. After that, we only need the zlib library and the programming language of our choice to write a simple script that decompresses the stream and obtains the content. In Python we can do it in a few lines:
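Here is a minimal sketch of those steps. The name inflate_first_stream and the sample object are only illustrative, and the regular expression assumes the stream data is delimited by a single end-of-line marker on each side; a more careful parser would read the /Length entry of the dictionary instead.

```python
import re
import zlib

# Capture the bytes between "stream" and "endstream", leaving out the
# end-of-line characters that sit next to the two keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def inflate_first_stream(pdf_bytes):
    """Decompress the first FlateDecode stream found in pdf_bytes."""
    match = STREAM_RE.search(pdf_bytes)
    if match is None:
        raise ValueError("no stream object found")
    return zlib.decompress(match.group(1))


# Made-up object used only to exercise the function.
compressed = zlib.compress(b"Hello, hidden world!")
sample = (
    b"1 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)
print(inflate_first_stream(sample))
```

To try it on a real document, read the file in binary mode and pass its bytes to the function. Keep in mind that it only looks at the first stream and does not check which filter that stream actually declares.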

Normally filters aren't used to hide information; regular PDF files use them to reduce their size or to avoid binary characters, for example. There are more filters, all explained in section 3.3 of the PDF specification (version 1.7, for instance).
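When a stream declares several filters, they are undone one after another, in the order they are listed in the /Filter entry. The following sketch shows the idea for just two of them, ASCIIHexDecode and FlateDecode. The names DECODERS, ascii_hex_decode and decode_stream are mine, and the filter list is passed in by hand instead of being parsed from the dictionary.

```python
import binascii
import zlib


def ascii_hex_decode(data):
    """Undo ASCIIHexDecode: hex digits, whitespace ignored, '>' ends the data."""
    digits = b"".join(data.split(b">", 1)[0].split())
    if len(digits) % 2:
        digits += b"0"  # an odd final digit is treated as if followed by 0
    return binascii.unhexlify(digits)


# Only two filters are covered here; anything else raises KeyError.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(data, filters):
    """Apply each decoder in the order given by the /Filter array."""
    for name in filters:
        data = DECODERS[name](data)
    return data


# Made-up stream content: deflate first, then hex-encode the result.
hidden = binascii.hexlify(zlib.compress(b"app.alert('hi');")) + b">"
print(decode_stream(hidden, ["ASCIIHexDecode", "FlateDecode"]))
```

A lookup table keeps the chain easy to extend; the other filters from the specification could be added as further entries.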

If you are bored and want to practise, you can download this file and check it out! ;)


Error -3 while decompressing

Error -3 while decompressing data: incorrect header check. What am I doing wrong?

Hi! It really depends on the data you are supplying to the library. Are you sure the data is valid? Could you post the first and last 10 bytes of that data here? Most of the time it starts with the byte 0x78...
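A quick way to check this yourself is to test the first two bytes against the header rules in RFC 1950 before calling the library. This is only a sketch and the function names are made up. It assumes the usual causes of this error: stray bytes in front of the data, such as an end-of-line that was not stripped, or data that is raw deflate with no zlib header at all.

```python
import zlib


def looks_like_zlib(data):
    """Check the two-byte zlib header described in RFC 1950."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # The low nibble of CMF is the compression method (8 means deflate),
    # and CMF * 256 + FLG must be a multiple of 31.
    return (cmf & 0x0F) == 8 and (cmf * 256 + flg) % 31 == 0


def inflate(data):
    """Decompress zlib data, falling back to raw deflate if the header is missing."""
    if looks_like_zlib(data):
        return zlib.decompress(data)
    # A negative wbits value tells zlib to expect no header and no checksum.
    return zlib.decompress(data, -15)


good = zlib.compress(b"some stream content")
print(hex(good[0]), looks_like_zlib(good))
print(looks_like_zlib(b"\r\n" + good))
```

If looks_like_zlib returns False for your data, print its first few bytes: a leading 0x0d or 0x0a usually means the line break after the stream keyword is still attached.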

Here you can see some questions and responses related to this subject:

http://tools.ietf.org/html/rfc1950 (RFC)

I hope this helps!