
PDF problems

Posted: Wed Jul 11, 2012 11:30 pm
by musichawk
We're trying to use XnView 1.99 to fix PDFs, but XnView makes them worse -- in many cases fatally bad. It seems to support PDFs, but perhaps it should not.

Perhaps we are misusing XnView? Here's what is happening.

We have a collection of scanned documents. Our first need is to convert color PDF scans to black and white. This is a common need because most scanners scan in color, even when that is inappropriate. We are scanning only black-on-white pages, but the scanner software forces them to be saved as bloated color PDFs (a known problem: even if the scanner is set to no color, it simply saves the scan as shades of grey). This results in LARGE files and hard-to-read pages, because what should be black on white ends up as soft grey on sort-of white.

We noticed that XnView seems to convert a PDF from color to black and white via this sequence: Image > Convert to Binary > Binary (No Dither). This is a WONDERFUL capability! It definitely fixes the appearance of the PDF. (Though the option would be clearer if it were simply labeled Black-and-White rather than Binary (2... but 2 of what?).)
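"Binary" here means a two-colour image: every pixel ends up either black or white. The Python sketch below shows roughly what a no-dither conversion amounts to, assuming a simple fixed threshold of 128 on 8-bit grey values (XnView's actual cutoff is not stated in this thread) and using 1 for black, a convention real file formats do not all share. Packing eight pixels into each byte is why a black-and-white page needs only an eighth of the raw storage of an 8-bit greyscale one. The function names are hypothetical.

```python
def threshold_row(gray_row, cutoff=128):
    """Map each 0-255 grey value to 1 (black) if darker than cutoff, else 0 (white).
    No dithering: each pixel is decided on its own value alone."""
    return [1 if value < cutoff else 0 for value in gray_row]


def pack_bits_1bpp(bits):
    """Pack 0/1 pixels into bytes, most significant bit first,
    padding the last partial byte with white (0) pixels."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


# A "soft grey on sort-of white" row: faint text near 90, paper near 200.
row = [200, 198, 90, 85, 201, 92, 199, 200, 88, 202]
bits = threshold_row(row)
print(bits)                         # [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(pack_bits_1bpp(bits).hex())   # 3480 (10 pixels stored in 2 bytes)
```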

We then use XnView to crop the PDF (scanning often results in a sloppy page). Then we save the converted, cropped PDF.

BUT the resulting PDF is semi-BAD. It is usually much LARGER than the color original, even though it should be much smaller. AND when we try to re-open the PDF saved by XnView, it can't be opened -- XnView declares "Format of the file could not be determined" -- even though it was XnView that created the PDF file! (Our guess is that XnView does not save the file header correctly.) We say semi-bad because the PDF can be opened in PDF readers such as Primo and Nitro. (But those readers lack the ability to strip the color out of a PDF.)

So our current VERY CLUMSY process is this:
1. XnView: Open the color PDF.
2. XnView: Convert to black and white via Image > Convert to Binary > Binary (No Dither)
3. XnView: Crop the PDF as needed
4. XnView: Save the PDF
5. Nitro PDF Reader: Open PDF
6. Nitro PDF Reader: Save PDF, hoping to reduce the file size, which usually happens.

BUT the entire process still results in excessively large PDFs. For comparison, if we use a certain Canon scanner driver that can be set to black and white (not a user-friendly driver; it takes many clicks to use), the resulting PDF is both non-color and quite small. This proves that good-quality PDF files can be much, much smaller -- 90% to 99% smaller than if scanned in color. Yet XnView takes color PDF files and makes them even larger (big mystery: what is being added to bloat them?).

And there seems to be a FATAL BUG: when a PDF has 2 or more pages, XnView usually ignores all but page 1. If we then use XnView to resave the PDF, the file size remains large but every page except page 1 disappears. So we can only use XnView to process 1-page PDFs, which puts a burden on the user to know the number of pages in the PDF: it must first be opened in Nitro to check whether it is suitable for XnView.

Given all these problems, it might be prudent to remove PDF from the file formats XnView can open and save until the problems are corrected, because it literally damages PDFs, and an unaware user might not realize this until it is too late. (After we damaged some PDFs, we've been doing our testing on file copies...)

Re: PDF problems

Posted: Thu Jul 12, 2012 12:33 pm
by xnview
musichawk wrote: BUT the entire process still results in excessively large PDFs. For comparison, if we use a certain Canon scanner driver that can be set to black-and-white (not a user friendly driver, many clicks to use it), the resulting PDF is both non-color and quite small. This proves that good quality PDF files can be much, much smaller -- 90% to 99% smaller than if scanned in color, yet XnView takes color PDF files and makes them even larger (big mystery: what's added to the files to bloat them?).
Could you send me a sample file?
Have you changed the settings in Options > Write > PDF?
And, there seems to be a FATAL BUG: When a PDF is 2 or more pages, XnView usually ignores any but page 1. If we then use XnView to resave the PDF, the file size remains large but all the pages disappear except page 1. So we can only use XnView to process 1 page PDFs, which creates a burden on the user to know the number of pages in the PDF. So it must first be opened in Nitro to know if it is suitable for XnView.
Yes, it only works on the first page...

Re: PDF problems

Posted: Thu Jul 12, 2012 9:22 pm
by cday
Some thoughts on the issues raised in your post:

Although XnView doesn’t process multi-page PDFs, another member of the XnView family, NConvert, can be used to batch process multi-page PDF files in console mode, though the learning curve may be rather steep for anyone not used to the command line.

If you are cropping pages to tidy them up, you might try an ‘Automatic crop’ followed by a ‘Canvas resize’, both in the Image menu, if you haven’t already done so. The combination works well on text pages to centre the text on a standard-size page, and it is suitable for use in a batch process.
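A minimal sketch of what an automatic crop followed by a centred canvas resize does, written in Python on a page held as rows of 0 (white) and 1 (black) pixels. The function names are hypothetical and this is not XnView's implementation; in particular, a real auto-crop would need some tolerance for scanner specks, which this version treats as content.

```python
def content_bbox(page):
    """Return (top, left, bottom, right) of the black pixels, with bottom and
    right exclusive, or None if the page is blank."""
    rows = [y for y, row in enumerate(page) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(page[0])) if any(row[x] for row in page)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def auto_crop(page):
    """Trim white margins down to the bounding box of the content."""
    box = content_bbox(page)
    if box is None:
        return [row[:] for row in page]  # blank page: nothing to crop
    top, left, bottom, right = box
    return [row[left:right] for row in page[top:bottom]]


def canvas_resize_centered(image, width, height):
    """Place image in the middle of a white canvas of width x height pixels."""
    img_h = len(image)
    img_w = len(image[0]) if image else 0
    if img_w > width or img_h > height:
        raise ValueError("content is larger than the target canvas")
    off_y = (height - img_h) // 2
    off_x = (width - img_w) // 2
    canvas = [[0] * width for _ in range(height)]
    for y, row in enumerate(image):
        canvas[off_y + y][off_x:off_x + img_w] = row
    return canvas


# A sloppy scan: the content sits in the top-right corner of a 5 x 4 page.
page = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
cropped = auto_crop(page)                       # [[1, 1], [1, 0]]
centred = canvas_resize_centered(cropped, 4, 4)
for row in centred:
    print(row)
```

Because the target canvas size is fixed, every page in a batch comes out the same size with its text centred, which is what makes the pair convenient in a batch process.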

Black and white images compressed by a suitable method can indeed be much smaller than the equivalent colour or grayscale JPG image. In the XnView File | Save as… PDF dialog, under Options -- General -- Read/Write -- PDF -- Write tab -- Compression type, try the ‘Fax’ setting. The PackBits, LZW and ZIP settings may also be worth trying, but not JPG. (The ‘Quality’ setting presumably applies only to JPG compression, as the other compression methods are all lossless.)
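Why a lossless run-length method such as PackBits suits black and white text pages: a scanned row is mostly long runs of white, and each run collapses into a couple of bytes. The Python sketch below follows the PackBits byte layout described in the TIFF 6.0 specification (header 0 to 127: copy the next header + 1 bytes literally; header 129 to 255: repeat the next byte 257 - header times; 128 is a no-op). How XnView stores its ‘PackBits’ option inside a PDF is not stated in this thread, and the ‘Fax’ option presumably refers to CCITT fax coding, a different and more elaborate scheme designed specifically for bilevel images.

```python
def packbits_encode(data: bytes) -> bytes:
    """Encode bytes with PackBits run-length coding."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Length of the run of identical bytes starting at i (at most 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            # Header 257 - run is the signed value 1 - run: "repeat next byte run times".
            out.append(257 - run)
            out.append(data[i])
            i += run
        else:
            # Collect literal bytes until a repeated pair starts or 128 are gathered.
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)  # header h means "copy the next h + 1 bytes"
            out += data[start:i]
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Decode PackBits data back to the original bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:        # literal run of header + 1 bytes
            out += data[i:i + header + 1]
            i += header + 1
        elif header > 128:      # repeat the next byte 257 - header times
            out += bytes([data[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op
    return bytes(out)


# One 1-bit row 2480 pixels wide (310 bytes): white paper with a little text.
row = bytes(150) + bytes([0x3C, 0x7E, 0x3C]) + bytes(157)
encoded = packbits_encode(row)
print(len(row), len(encoded))   # 310 12
assert packbits_decode(encoded) == row
```

A grey JPG row has no such long identical runs, which is one reason the lossless options only pay off after the conversion to black and white.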

If you have more scanning to do you might look at using VueScan (www.hamrick.com) in place of the driver supplied with your scanner: this will likely enable you to use your existing scanner to scan directly to black and white multi-page PDF format. There is a trial download which would enable you to test it.

Note that black and white images need to be scanned at a higher resolution than colour or grayscale images for a given quality level, as they can’t be anti-aliased to smooth sharp edges. But even at the higher resolution the compressed file size should be much smaller than the equivalent colour or grayscale JPG file.
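To put rough numbers on the resolution trade-off, the Python sketch below computes uncompressed pixel-data sizes for one page, assuming a US Letter page (8.5 x 11 inches). These are raw sizes, not PDF file sizes; compression then shrinks each case by an amount that depends on the content and the method. The helper name is hypothetical.

```python
def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed pixel data for one page, with each row padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed page size: US Letter, 8.5 x 11 inches.
cases = [
    ("24-bit colour, 300 dpi", 300, 24),          # about 25.2 MB
    ("8-bit greyscale, 300 dpi", 300, 8),         # about 8.4 MB
    ("1-bit black and white, 300 dpi", 300, 1),   # about 1.1 MB
    ("1-bit black and white, 600 dpi", 600, 1),   # about 4.2 MB
]
for label, dpi, bpp in cases:
    size = raw_size_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {size / 1e6:.1f} MB")
```

Even at twice the resolution, the 1-bit page holds about half the raw data of the 8-bit greyscale page at 300 dpi, and about a sixth of the 24-bit colour one.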