I recently needed to split and merge some PDF documents. Each PDF consisted of several actual documents that had been scanned and saved together as one larger PDF file. The client I was doing this for wanted each document to be a separate PDF. In most cases the documents were a single page, but some were a few pages long.
Since I hadn’t done anything like this before, I wasn’t sure how to solve the problem. But I do have two reliable friends: Google and aptitude search/show. When I started my search I was at home, without an Internet connection, so my only hope was aptitude. An aptitude search pdf gave me a lot of packages, one of them being pdftk. I did an aptitude show pdftk and read through the description. Pdftk seemed very interesting, and most importantly, it said it could split and merge PDF files.
I quickly headed off to Accra, where I could get some connectivity, and downloaded and installed pdftk with aptitude. Soon thereafter I had split up the PDF files into separate documents, just as my client had wanted.
Pdftk is capable of much more than just splitting and merging documents, as its man page demonstrates. My own needs were very simple. First of all, I needed to split a document. Assuming that you want every page to be a new document, you would use burst as follows:
% pdftk archive.pdf burst
This would take the PDF file archive.pdf and split it into separate pages. Note, however, that the original file will still exist. If you issue ls after the command above, you will see a number of pg_*.pdf files in your directory. These are the split PDF documents. Each file name begins with pg_, followed by the page's number in the original document, and each file is exactly one page long. If archive.pdf was 247 pages, you should now have 247 new files in your directory, ranging from pg_0001.pdf up to pg_0247.pdf. Actually, one additional file is created: doc_data.txt. As the extension reveals, it is a regular text file. You can cat it to see the information it contains about the original document, if this sort of thing amuses you (and it should).
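If you would rather drive the burst step from a script, here is a minimal Python sketch that simply shells out to the same pdftk command. It assumes pdftk is on your PATH and that burst keeps its default pg_0001.pdf naming; the function names are my own and purely illustrative.

```python
import subprocess
from pathlib import Path


def burst_command(pdf: str) -> list[str]:
    """Argument list for 'pdftk <pdf> burst', as run by hand above."""
    return ["pdftk", pdf, "burst"]


def expected_burst_files(page_count: int) -> list[str]:
    """File names burst should produce, assuming the default pg_0001.pdf pattern."""
    return [f"pg_{n:04d}.pdf" for n in range(1, page_count + 1)]


def burst(pdf: str, workdir: str = ".") -> list[Path]:
    """Run pdftk burst inside workdir and return the pg_*.pdf files found there, sorted."""
    # Resolve the input first; otherwise a relative path would be
    # interpreted relative to workdir instead of the caller's directory.
    source = str(Path(pdf).resolve())
    subprocess.run(burst_command(source), cwd=workdir, check=True)
    # burst also writes doc_data.txt into workdir; only page files are returned.
    return sorted(Path(workdir).glob("pg_*.pdf"))
```

Calling burst("archive.pdf") on the 247-page example should return paths whose names match expected_burst_files(247). Note that any pg_*.pdf files left over from an earlier run in the same directory would be picked up too, so bursting into an empty directory is the safer habit.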
Now, let’s suppose some of the pages were upside down (as, in my case, they in fact were). You could then issue this command to rotate the page 180 degrees (that is, turn it right side up):
% pdftk upsidedown.pdf cat 1S output fixed.pdf
This works well if the document is a single page: it creates a file fixed.pdf, which is a copy of upsidedown.pdf with the page rotated 180 degrees. Since the range 1S selects only page 1, any further pages would be left out of fixed.pdf. If you have a multi-page document and want to rotate all of it, use this instead:
% pdftk upsidedown.pdf cat 1-endS output fixed.pdf
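The rotation commands differ only in the page range, so they are easy to generate. Below is a minimal Python sketch that builds and runs them, assuming pdftk is installed; the function names are illustrative. One caveat I am hedging on: the single-letter S keyword matches the pdftk version used here, but newer releases may spell it south instead, so the keyword is a parameter you can change after checking man pdftk.

```python
import subprocess


def rotate_command(src: str, dst: str, pages: str = "1-end", keyword: str = "S") -> list[str]:
    """Build 'pdftk SRC cat <PAGES><KEYWORD> output DST', e.g. 'cat 1-endS'."""
    # 'S' sets the selected pages' rotation to 180 degrees in the syntax used above.
    # Assumption: newer pdftk versions may expect 'south'; pass keyword="south" then.
    return ["pdftk", src, "cat", f"{pages}{keyword}", "output", dst]


def rotate(src: str, dst: str, pages: str = "1-end", keyword: str = "S") -> None:
    """Run the rotation; raises subprocess.CalledProcessError if pdftk fails."""
    subprocess.run(rotate_command(src, dst, pages, keyword), check=True)
```

For example, rotate("upsidedown.pdf", "fixed.pdf") runs the whole-document command, while pages="1" reproduces the single-page one.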
In the above, I also hinted at how to merge documents. Suppose you have already split a larger document and want to merge pages 40-45 into a new document called pages.pdf. This is done with:
% pdftk pg_004[0-5].pdf cat output pages.pdf
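The shell glob works nicely for 40-45, but a range such as 38-45 cannot be written as a single bracket pattern. A small Python sketch that lists the page files explicitly avoids that, assuming the default pg_0001.pdf naming from burst and pdftk on your PATH; the helper names are hypothetical. pdftk concatenates its input files in the order they are given, so the list order becomes the page order.

```python
import subprocess


def burst_page_names(first: int, last: int) -> list[str]:
    """Names of burst output pages first..last inclusive, in page order."""
    return [f"pg_{n:04d}.pdf" for n in range(first, last + 1)]


def merge_command(inputs: list[str], dst: str) -> list[str]:
    """Build 'pdftk IN1 IN2 ... cat output DST'."""
    return ["pdftk", *inputs, "cat", "output", dst]


def merge(inputs: list[str], dst: str) -> None:
    """Concatenate the inputs into dst; raises CalledProcessError on failure."""
    subprocess.run(merge_command(inputs, dst), check=True)
```

merge(burst_page_names(40, 45), "pages.pdf") is then equivalent to the glob command above, and burst_page_names(38, 45) works just as well.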
These are just a few examples of what you can achieve with pdftk; as I mentioned earlier, it can do much more. The pdftk man page includes several good examples that you can learn from.
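The original job, cutting a scanned bundle into documents of different lengths, can also be done without bursting first, by giving cat one page range per document (the same range syntax as in 1-endS, minus the rotation). Below is a minimal Python sketch under two assumptions: you already know each document's page count (the doc_lengths list is something you would fill in by hand), and the output names doc_01.pdf, doc_02.pdf and so on are hypothetical.

```python
import subprocess


def split_commands(src: str, doc_lengths: list[int], prefix: str = "doc") -> list[list[str]]:
    """One 'pdftk SRC cat START-END output PREFIX_NN.pdf' call per document."""
    commands = []
    start = 1
    for i, length in enumerate(doc_lengths, start=1):
        if length < 1:
            raise ValueError("each document needs at least one page")
        end = start + length - 1
        # A single page is written as '5'; longer documents as '2-4'.
        pages = str(start) if length == 1 else f"{start}-{end}"
        commands.append(["pdftk", src, "cat", pages, "output", f"{prefix}_{i:02d}.pdf"])
        start = end + 1
    return commands


def split(src: str, doc_lengths: list[int], prefix: str = "doc") -> None:
    """Run every command; stops at the first pdftk failure."""
    for cmd in split_commands(src, doc_lengths, prefix):
        subprocess.run(cmd, check=True)
```

With doc_lengths=[1, 3, 1], archive.pdf would become doc_01.pdf (page 1), doc_02.pdf (pages 2-4) and doc_03.pdf (page 5). The sketch does not check the total against the real page count, so a list that runs past the end of the file will make pdftk report an error.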
If you work a lot with PDF documents and need a tool to manipulate them, pdftk is definitely something you should look into. It’s not the only tool for manipulating PDF files, not even the only one in the free software realm, but it is very capable. (No, I’m not implying that other tools are not capable.) All I can say is that I recently discovered it and it helped me solve a particular problem. I hope it can do the same for others.