15 February, 2010

Linux tools for PDF processing

The company where I work has some very nice Canon ImageRunner copy machines. One of their advantages is fast, good-quality scanning, so they are well suited to scanning large batches of documents. Normally you can set up these machines to send the scanned documents to email addresses or to store them directly on computers. In my company, however, email size is limited to 10 megabytes and we use chip card authentication, so I cannot set up the copy machine to store documents directly on computers.
But nothing is lost, because the ImageRunners have a web interface, and I have written a small tool to download the scanned documents from the copy machine to a local folder. The only disadvantage is that you can only download the documents page by page: the color images in JPG format and the black-and-white ones in TIFF.

Here are some tricks that I have used to convert them to PDF documents.

Converting TIFF to PDF

For this I used the libtiff-tools package, which you can get with sudo apt-get install libtiff-tools on an Ubuntu machine. You first need to copy all the TIFF images into one multipage TIFF document and then convert it to PDF:

tiffcp *.tiff allpages.tiff
tiff2pdf allpages.tiff > allpages.pdf
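
Two things to watch with the commands above: the shell expands *.tiff in plain alphabetical order, so page10.tiff comes before page2.tiff, and on a second run allpages.tiff itself matches the pattern. A minimal sketch that works around both, assuming GNU sort (for the -V version sort) and file names without whitespace. The function name tiffs_to_pdf is made up for illustration; the -o option of tiff2pdf writes to a named file instead of standard output.

```shell
# Hypothetical helper: merge the page TIFFs in the current directory into one PDF.
# Assumes GNU sort (-V) and file names without whitespace.
# Usage: tiffs_to_pdf OUTPUT.pdf
tiffs_to_pdf() (
    out_pdf=$1
    set -- *.tiff
    [ -e "$1" ] || { echo "no .tiff files in this directory" >&2; exit 1; }
    # Version sort puts page2.tiff before page10.tiff.
    pages=$(printf '%s\n' "$@" | sort -V)
    # Build the multipage TIFF in a temporary directory so that a later
    # run does not pick it up as one of the pages.
    tmpdir=$(mktemp -d) || exit 1
    tiffcp $pages "$tmpdir/allpages.tiff" &&
        tiff2pdf -o "$out_pdf" "$tmpdir/allpages.tiff"
    status=$?
    rm -rf "$tmpdir"
    exit "$status"
)
```

Called as tiffs_to_pdf allpages.pdf from the folder that holds the downloaded pages.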


Converting JPG to PDF

This was more difficult, but finally I found a small C program here. You have to look for jpg2pdf and compile the program. I had one error during compilation, but could resolve it by including one more standard library. The conversion is easy:

jpg2pdf *.jpg document.pdf
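
If jpg2pdf is not at hand, ImageMagick can also write a multi-page PDF from a list of JPEG files. This is an alternative to the program above, not the same tool. A minimal sketch, assuming the convert command of ImageMagick is installed (newer releases call it magick) and that its policy file allows writing PDF, which some distributions restrict. The function name jpgs_to_pdf is made up for illustration.

```shell
# Hypothetical helper: combine JPEG pages into one PDF with ImageMagick.
# Usage: jpgs_to_pdf OUTPUT.pdf page1.jpg page2.jpg ...
jpgs_to_pdf() (
    out_pdf=$1
    shift
    [ "$#" -gt 0 ] || { echo "no input images given" >&2; exit 1; }
    # convert reads the images in the order given and writes them all
    # to the single output file named last.
    convert "$@" "$out_pdf"
)
```

For example: jpgs_to_pdf document.pdf *.jpg
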

Rotating PDF pages

I had some old documents in which I wanted to rotate the pages. The pdftk package can be used for this. On Ubuntu you can install it using apt-get.

pdftk in.pdf cat 1-endE output out.pdf
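
The E in 1-endE is the rotation keyword: it sets every page to 90 degrees clockwise. As far as I know, newer pdftk releases spell the keyword out (east instead of E), so the single-letter form may be rejected there. A small sketch for the newer spelling; the function name rotate_pages and the default keyword are assumptions for illustration.

```shell
# Hypothetical helper: set the rotation of every page in a PDF.
# Assumes a pdftk release that uses spelled-out keywords (north, east,
# south, west); older releases expect single letters such as 1-endE.
# Usage: rotate_pages IN.pdf OUT.pdf [KEYWORD]
rotate_pages() {
    # ${3:-east}: default to 90 degrees clockwise when no keyword is given.
    pdftk "$1" cat "1-end${3:-east}" output "$2"
}
```

For example, rotate_pages in.pdf out.pdf south turns the whole document upside down.
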

Removing PDF pages

This can also be done with pdftk. The right command is:

pdftk in.pdf cat 1 3-end output out.pdf

This will remove the 2nd page from the document.
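
The same pattern works for any page: keep the range before it and the range after it. A minimal sketch that prints the page ranges for dropping page N; the function name ranges_without_page is made up, and the total page count has to be supplied by hand (pdftk in.pdf dump_data reports it on the NumberOfPages line).

```shell
# Hypothetical helper: print the pdftk page ranges that keep everything
# except page N of a document with TOTAL pages.
# Usage: ranges_without_page N TOTAL
ranges_without_page() (
    n=$1
    total=$2
    if [ "$total" -lt 2 ] || [ "$n" -lt 1 ] || [ "$n" -gt "$total" ]; then
        echo "cannot drop page $n of $total" >&2
        exit 1
    fi
    if [ "$n" -eq 1 ]; then
        echo "2-end"                          # drop the first page
    elif [ "$n" -eq "$total" ]; then
        echo "1-$((n - 1))"                   # drop the last page
    else
        echo "1-$((n - 1)) $((n + 1))-end"    # drop a page in the middle
    fi
)
```

For example: pdftk in.pdf cat $(ranges_without_page 4 10) output out.pdf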

Merging PDF documents

pdftk *.pdf cat output out.pdf
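
Note that *.pdf is expanded in alphabetical order, and if out.pdf already exists in the folder it is listed as an input as well. A minimal sketch that takes the inputs in an explicit order and refuses an output file that is also an input; the function name merge_pdfs is made up for illustration.

```shell
# Hypothetical helper: merge PDFs in exactly the order given.
# Usage: merge_pdfs OUTPUT.pdf first.pdf second.pdf ...
merge_pdfs() (
    out_pdf=$1
    shift
    [ "$#" -gt 0 ] || { echo "no input files given" >&2; exit 1; }
    for f in "$@"; do
        # With *.pdf, an existing output file would end up among the inputs.
        if [ "$f" = "$out_pdf" ]; then
            echo "output file $out_pdf is also listed as input" >&2
            exit 1
        fi
    done
    pdftk "$@" cat output "$out_pdf"
)
```

For example: merge_pdfs all.pdf cover.pdf chapter1.pdf chapter2.pdf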

Resizing images before combining them to a PDF

I have used ImageMagick for this. You can also get it using apt-get in Ubuntu. Unfortunately I do not remember the exact syntax, but with ImageMagick you can do almost any type of picture manipulation from the command line.
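
For reference, resizing with ImageMagick is usually done with the -resize option of convert. The sketch below is not necessarily the command used back then; it assumes the convert command is installed (newer ImageMagick releases call it magick), and the function name resize_images is made up for illustration.

```shell
# Hypothetical helper: write a scaled copy of each image into OUTDIR,
# leaving the originals untouched.
# Usage: resize_images GEOMETRY OUTDIR image...
resize_images() (
    geometry=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir" || exit 1
    for img in "$@"; do
        # -resize keeps the aspect ratio; a geometry of 50% halves
        # both width and height.
        convert "$img" -resize "$geometry" "$outdir/$(basename "$img")" || exit 1
    done
)
```

For example: resize_images 50% small *.jpg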

2 comments:

Anonymous said...

Hello Lacó, what's your email address? I'd like to send you a question...

/R

Lacó said...

Hi Anonymous!

I would not like to put my email address on the page. If you would like to get in touch, write a comment with your email address; I will not let it appear and will send you a reply.

Lacó