Keeping it Small and Simple

2007.05.10

A tool for manipulating PDF documents: pdftk

Filed under: Utilities — Lorenzo E. Danielsson @ 19:04

I recently needed to split and merge some PDF documents. Each PDF consisted of several actual documents that had been scanned and saved together as a larger PDF file. The client I was doing this for wanted each document to be a separate PDF. In most cases the documents were a single page, but some documents were slightly larger.

Since I hadn’t done anything like this before, I wasn’t sure how to solve the problem. But I do have two reliable friends: Google and aptitude search/show. When I started my search I was at home, without an Internet connection, so my only hope was aptitude. Doing aptitude search pdf gave me a lot of packages, one of them being pdftk. I did an aptitude show pdftk and read through the description. Pdftk seemed very interesting and, most importantly, it said it could split and merge PDF files.

I quickly headed off to Accra, where I could get some connectivity, and ran aptitude install pdftk to download and install it. Soon thereafter I had split up the PDF files into separate documents, just as my client had wanted.

Pdftk is capable of much more than just splitting and merging documents, as its man page demonstrates. My own needs were very simple. First of all I needed to split a document. Assuming that you want every page to become a new document, you would use burst as follows:


% pdftk archive.pdf burst

This takes the PDF file archive.pdf and splits it into separate pages. Note that the original file is left in place. If, after issuing the command above, you issue ls, you will see a number of pg_*.pdf files in your directory. These are the split PDF documents. Each file name begins with pg_, followed by the page number in the original document. Each one of these is now exactly one page long.

If archive.pdf was 247 pages, then you should now have 247 new files in your directory, ranging from pg_0001.pdf up to pg_0247.pdf. Actually, there is one additional file created, doc_data.txt. As the extension reveals, the file is a regular text file. You can cat it to see the information it contains, if this sort of thing amuses you (and it should).
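The naming scheme described above can be sketched in shell. This is only an illustration: burst_name is a made-up helper, not part of pdftk, and it assumes the default pg_ prefix with a four-digit, zero-padded page number.

```shell
# burst_name: hypothetical helper (not part of pdftk).
# Prints the file name that burst is assumed to give to page N,
# i.e. pg_ followed by the page number padded to four digits.
burst_name() {
    printf 'pg_%04d.pdf\n' "$1"
}

burst_name 1     # prints pg_0001.pdf
burst_name 247   # prints pg_0247.pdf
```

A helper like this is handy when you know which pages of the original you are after and want the matching file names without scrolling through the output of ls.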

Now, let’s suppose some of the pages were upside down (as they in fact were, in my case). You could then issue this command to rotate the document 180 degrees (that is, flip it):


% pdftk upsidedown.pdf cat 1S output fixed.pdf

This works well if the document is a single page. It creates a file fixed.pdf, which is a copy of upsidedown.pdf with the page rotated 180 degrees. If you had a multi-page document and wanted to rotate the entire document, the following would do that:


% pdftk upsidedown.pdf cat 1-endS output fixed.pdf
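If several of the split files are upside down, the same command can be wrapped in a loop. The following is a minimal sketch, not a polished script: rotate_all, the _fixed.pdf suffix, the PDFTK variable and the file names in the example are all assumptions made for illustration. It uses the rotation syntax shown in this post; newer pdftk releases may expect the spelled-out form (such as 1-endsouth), so check the manual page of your version.

```shell
# rotate_all: hypothetical helper that rotates every given PDF by
# 180 degrees and writes the result as <name>_fixed.pdf.
# The originals are left in place.
# Setting PDFTK=echo gives a dry run that only prints the arguments
# that would be passed to pdftk.
rotate_all() {
    for f in "$@"; do
        "${PDFTK:-pdftk}" "$f" cat 1-endS output "${f%.pdf}_fixed.pdf"
    done
}

# Dry run on two made-up file names; nothing is read or written.
PDFTK=echo rotate_all pg_0003.pdf pg_0017.pdf
```

Doing a dry run first makes it easy to check the output names before any files are created. Dropping PDFTK=echo runs the real command.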

The cat operation used above is also how you merge documents. Let’s suppose that you have already split a larger document and want to merge pages 40-45 into a new document called pages.pdf. This is done with:


% pdftk pg_004[0-5].pdf cat output pages.pdf
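The merge relies on the shell, not pdftk, to expand pg_004[0-5].pdf into a sorted list of file names. To see what the pattern picks up before merging, something like the following sketch can be used. matching_pages is a made-up helper, and the scratch directory holds empty stand-in files rather than real PDFs.

```shell
# matching_pages: hypothetical helper that shows what the glob expands
# to inside directory $1, in the order the shell would hand the names
# to pdftk. If nothing matches, the pattern itself is printed.
matching_pages() {
    ( cd "$1" && echo pg_004[0-5].pdf )
}

# Demo with empty stand-in files in a scratch directory.
demo=$(mktemp -d)
for n in 0039 0040 0041 0042 0043 0044 0045 0046; do
    : > "$demo/pg_$n.pdf"
done
matching_pages "$demo"
rm -r "$demo"
```

Pages 39 and 46 are left out and the rest come back in page order, which is the order they will have in pages.pdf. If you still have the original file, pdftk archive.pdf cat 40-45 output pages.pdf extracts the same range without splitting first.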

These are just a few examples of things you can achieve with pdftk. As I mentioned earlier, you can do much more than this with this handy tool. The pdftk manual page comes with a few good examples that you can use to learn how to use the tool.

If you are somebody who works a lot with PDF documents and need a tool to manipulate them, then pdftk is definitely something you should look into. It’s not the only tool for manipulating PDF files, not even the only tool in the free software realm, but it is very capable. (No, I’m not implying that other tools are not capable.) At least I can say that I recently discovered it, and it helped me solve a particular problem. I hope it can do so for others as well.


