PDF, open, multipage

cday

Re: PDF, open, multipage

Post by cday

IxenPDF wrote: Here is a "two-page" example.
Quite a nice scan...
Unfortunately, in the meantime I have discovered something unpleasant:
My original document has a text layer behind it, so it is searchable. After converting it with XnViewMP it is no longer searchable, and remaining searchable is a must.
I don't think there's any way of changing the image in a PDF file without losing searchability, although it is probably theoretically possible.

If you have a budget for software, Abbyy FineReader is probably the best OCR program and also reasonable value for money, although I believe there are lower-cost options. If you have a large budget, Adobe Acrobat Standard (the lower-priced version) should produce good results if it is a recent version, and it also has a 'ClearScan' option which converts the bitmap image text into searchable, scalable vector text like that produced by a word processor.

I'm not sure whether there's a good freeware OCR tool that will produce a searchable image. A good place to ask would be the bookscanner.org forum mentioned above, as adding searchability to camera images and flatbed scans is a common need.

I'm attaching your uploaded file run through FineReader and also through Adobe Acrobat Standard 11 (I believe the current DC version is available on subscription if your need is temporary). Your text is vectorised and searchable, but the scan DPI is not high enough to produce really smooth character outlines when you zoom in. The file size would decrease with a higher scan DPI, because cleaner character shapes mean fewer custom vector fonts need to be created. The size per page would also fall if there were more pages, as they could share the same fonts.
Attachments:
Text+hinterlegt_ergebnis_Abbyy_FineReader_12.pdf (381.96 KiB)
Text+hinterlegt_ergebnis_Adobe_ClearScan.pdf (453.11 KiB)
IxenPDF

Re: PDF, open, multipage

Post by IxenPDF

Thank you for your interesting text recognition examples. It seems that these two programs have slightly better text recognition than my original file, which was made with the open-source Tesseract OCR.
cday wrote: Adobe Acrobat Standard has a 'ClearScan' option which converts the bitmap image text into searchable, scalable vector text like that produced by a word processor.
The text in the Abbyy FineReader example also looks smoother than in my example, but the vector text of the Adobe ClearScan example is impressive.

Thank you for your interesting and kind contributions.