PDF problems
Posted: Wed Jul 11, 2012 11:30 pm
We're trying to use XnView 1.99 to fix PDFs, but XnView makes them worse -- in many cases fatally so. It appears to support PDFs, but perhaps it should not.
Perhaps we are misusing XnView? Here's what is happening.
We have a collection of scanned documents. Our first need is to convert color PDF scans to black and white. This is a common need because most scanners scan in color even when that is inappropriate. We are scanning only black-on-white pages, but the scanner software forces them to be saved as bloated color PDFs. (This is a known problem: even if the scanner is set to no color, it simply saves the scan as shades of grey.) The result is LARGE files and hard-to-read pages, because what should be black on white ends up as soft grey on sort-of white.
We noticed that XnView seems to convert a PDF from color to black-and-white via this sequence: Image > Convert to Binary > Binary (No Dither). This is a WONDERFUL capability! It definitely fixes the appearance of the PDF. (Though the option would be clearer if it were simply labeled Black-and-White rather than Binary (2...but 2 of what?).)
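For anyone wondering what a "Binary (No Dither)" conversion amounts to, here is a minimal sketch in plain Python. It is not XnView's actual code; the function name to_binary and the cutoff of 128 are our own assumptions for illustration. Each grey value is simply pushed to pure black or pure white.

```python
def to_binary(gray_rows, threshold=128):
    """Map each 0-255 grey value to 0 (black) or 255 (white), with no dithering.

    The threshold of 128 is an assumed midpoint, not a value taken from XnView.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in gray_rows]


# A tiny made-up "scan": soft grey text (around 90) on a sort-of-white
# background (around 230), like the pages described above.
page = [
    [230, 228, 90, 231],
    [229, 95, 88, 232],
]

binary_page = to_binary(page)
print(binary_page)
```

With only two possible values per pixel, each pixel needs just 1 bit, which is why we expected the saved file to shrink.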
We then use XnView to crop the PDF (scanning often results in a sloppy page). Then we save the converted, cropped PDF.
BUT, the resulting PDF is semi-BAD. It usually is much LARGER than the color original, even though it should be much smaller. AND, when we try to re-open the PDF saved by XnView, it can't be opened -- XnView declares "Format of the file could not be determined" -- even though it was XnView that created the PDF file! (A guess is that XnView does not save the file header correctly.) We say semi-bad because the PDF can be opened in PDF readers such as Primo and Nitro. (But they lack the ability to strip out the color of a PDF.)
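One way to test our guess about the header, as a rough sketch: a PDF file normally starts with the marker "%PDF-" followed by a version number, and some readers are reported to accept the marker anywhere in the first 1024 bytes. The helper names below are hypothetical, and we do not know which rule XnView itself applies.

```python
def starts_with_pdf_marker(data: bytes) -> bool:
    """Strict check: the very first bytes are the '%PDF-' marker."""
    return data.startswith(b"%PDF-")


def has_pdf_marker_near_start(data: bytes, window: int = 1024) -> bool:
    """Lenient check: the marker appears somewhere in the first `window` bytes.

    The 1024-byte window is an assumption based on how tolerant readers behave.
    """
    return b"%PDF-" in data[:window]


def read_head(path: str, size: int = 1024) -> bytes:
    """Read the first `size` bytes of a file (hypothetical helper)."""
    with open(path, "rb") as handle:
        return handle.read(size)


# Made-up samples: a clean header, and one with stray bytes in front of it.
clean = b"%PDF-1.4\n..."
shifted = b"\r\n\x00%PDF-1.4\n..."
print(starts_with_pdf_marker(clean), starts_with_pdf_marker(shifted))
print(has_pdf_marker_near_start(shifted))
```

If a file saved by XnView passes the lenient check but fails the strict one, that would fit what we see: tolerant readers such as Primo and Nitro open it, while a stricter format detector refuses it. Running read_head on one of the saved files and passing the result to both checks would show which case applies.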
So our current VERY CLUMSY process is this:
1. XnView: Open the color PDF.
2. XnView: Convert to black and white via Image > Convert to Binary > Binary (No Dither).
3. XnView: Crop the PDF as needed.
4. XnView: Save the PDF
5. Nitro PDF Reader: Open PDF
6. Nitro PDF Reader: Save PDF, hoping to reduce the file size, which usually happens.
BUT the entire process still results in excessively large PDFs. For comparison, if we use a certain Canon scanner driver that can be set to black-and-white (not a user-friendly driver; it takes many clicks), the resulting PDF is both non-color and quite small. This shows that good-quality PDF files can be much, much smaller -- 90% to 99% smaller than if scanned in color. Yet XnView takes color PDF files and makes them even larger. (Big mystery: what is added to the files to bloat them?)
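A back-of-the-envelope calculation shows why we expect the black-and-white file to be so much smaller. This is only a sketch of uncompressed pixel data; the page size (8.5 x 11 inches at 300 dpi) is an assumed example, not a measurement of our scans.

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Size of uncompressed pixel data, with each row padded to whole bytes."""
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


# Assumed example page: 8.5 x 11 inches scanned at 300 dpi.
width, height = 2550, 3300

color_bytes = raw_image_bytes(width, height, 24)  # 24-bit color
bw_bytes = raw_image_bytes(width, height, 1)      # 1-bit black and white
saving = 1 - bw_bytes / color_bytes

print(color_bytes, bw_bytes, f"{saving:.1%}")
```

Going from 24 bits per pixel to 1 bit per pixel alone saves roughly 96% before any compression, and the PDF format also allows fax-style compression of 1-bit images, which can shrink them further. So a "binary" PDF that comes out larger than the color original suggests the page is not being stored as a compressed 1-bit image.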
And there seems to be a FATAL BUG: when a PDF has 2 or more pages, XnView usually ignores all but page 1. If we then use XnView to resave the PDF, the file size remains large but all the pages disappear except page 1. So we can only use XnView to process 1-page PDFs, which puts the burden on the user to know the number of pages in the PDF. The file must first be opened in Nitro to find out whether it is suitable for XnView.
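To avoid opening every file in Nitro just to count pages, a crude check like the following could flag multi-page PDFs beforehand. It is a heuristic sketch, not a real PDF parser: it counts "/Type /Page" entries in the raw bytes, so it will undercount (possibly to 0) when the page objects sit inside compressed object streams. The function name is hypothetical.

```python
import re

# Matches "/Type /Page" or "/Type/Page", but not "/Type /Pages"
# (the page-tree node), thanks to the trailing word boundary.
_PAGE_OBJECT = re.compile(rb"/Type\s*/Page\b")


def rough_page_count(data: bytes) -> int:
    """Heuristic page count from raw PDF bytes; 0 means 'could not tell'."""
    return len(_PAGE_OBJECT.findall(data))


# Made-up fragment resembling the object dictionaries of a 2-page PDF.
sample = (
    b"1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n"
    b"2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n"
    b"3 0 obj <</Type/Page/Parent 1 0 R>> endobj\n"
)
print(rough_page_count(sample))
```

Any file where this returns something other than 1 should be kept away from XnView, or at least processed only as a copy.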
Given all these problems, it might be prudent to remove PDF from the file formats XnView can open and save until the problems are corrected, because XnView literally damages PDFs, and an unaware user might not realize it until it is too late. (After we damaged some PDFs, we've been doing our testing on file copies...)