Book People Archive

re: Converting PDF Page Images



sam said:
>    The files consist of PDF page images.
>    The files are GIGANTIC and contain very little content.
>    Can I convert these enormous files into something more manageable 
>    without losing the text? I don't particularly care if I lose the 
>    graphics and images - but I would like to preserve the text. 
>    How can I convert the PDF Page Images into TXT, regular PDF, WORD, 
>    anything but this horrid waste of scarce hard disk space?

those .pdf files are probably big because
the content within consists of page-images,
not text as such. (an easy test of this is
whether you are able to search the text,
or to copy the text out to the clipboard.
if you can do neither, the pages are images.)
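
if you have a lot of files, the same test can be scripted.
here is a rough sketch in python, not a definitive tool.
it assumes the third-party "pypdf" library is installed,
and the function names and the 20-character threshold
are made up for illustration only.

```python
def looks_image_only(page_texts, min_chars=20):
    """Return True if no page yields at least min_chars of non-space text.

    page_texts is a list of strings, one per page. The threshold is an
    arbitrary assumption, there only to ignore stray characters.
    """
    return all(len("".join(text.split())) < min_chars for text in page_texts)


def extract_page_texts(path):
    """Return the extractable text of each page in the PDF at path.

    Assumption: pypdf is installed (pip install pypdf). It is imported
    here so the helper above works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can come back empty for a page that is only an image
    return [page.extract_text() or "" for page in reader.pages]
```

you would call it as looks_image_only(extract_page_texts("scan.pdf")),
where "scan.pdf" is a stand-in for your own file. one caveat:
a scanned file that already carries a hidden o.c.r. text layer
will report as having text, even though the pages are images.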

but image or not, your best solution is to
use abbyy's "transformer" (windows only),
which uses an o.c.r. process to get the text.
it can convert the text to many file-formats,
even word-processing ones retaining formatting.
it gets good reviews, and -- at $50 -- is cheap...

another option would have been pdfonline,
which converts to/from .pdf at no charge:
>    http://www.pdfonline.com/
except they have a 2-meg filesize limit.

however, you could buy their converters:
>    http://www.bcltechnologies.com/
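
to see quickly which files would fit under the free
2-meg limit, something like this python sketch would do.
it reads "2-meg" as 2 * 1024 * 1024 bytes, which is my
assumption, since the site's exact cutoff isn't given here.
the function names are hypothetical too.

```python
import os

# assumption: "2-meg" taken to mean 2 MiB; the real cutoff may differ
LIMIT_BYTES = 2 * 1024 * 1024


def under_limit(size_bytes, limit_bytes=LIMIT_BYTES):
    """Return True if a file of size_bytes is at or below the limit."""
    return size_bytes <= limit_bytes


def file_under_limit(path, limit_bytes=LIMIT_BYTES):
    """Return True if the file at path is at or below the limit."""
    return under_limit(os.path.getsize(path), limit_bytes)
```

run file_under_limit() on each file in turn; anything that
returns False is too big for the free service as described.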

-bowerbird