
Should I use nConvert or XnConvert?

Posted: Fri May 03, 2013 8:08 pm
by hackbrew
I'd like to convert a bunch of PDF files by auto-cropping the same section (about a 1" square) of each PDF, and output the cropped image in a format that lets me compare the file sizes (in KB) of the resulting images. For example, for a blank cropped image I'd like to see a size as close to zero as possible, while a cropped image with markings in the 1" square should give a much larger file size (in KB). What output format would give me the greatest variance in file size between the two cases in the example? If possible I'd also like to then move the original PDF to one of two particular folders based on the output file size results. In addition, I'd like to call a script from a program I would write, but should I be using nConvert or XnConvert for this? If it's nConvert, is there a sample script that shows how to perform an auto crop and output?

Re: Should I use nConvert or XnConvert?

Posted: Sat May 04, 2013 3:29 pm
by cday
The following may help to define the process you wish to implement:

Do the PDF files to be processed contain bitmap images?

PDF is a very versatile format and a file can contain bitmap images (as from a camera or scanner), vector graphics (a form of drawing or illustration) and of course text that can be rescaled for optimum viewing, as well as non-visual content.

Edit: In the probably unlikely event that the images held in the files are not bitmaps, they will be converted to bitmaps when they are rasterised at the current dpi setting as the files are opened: the pixel dimensions of the resulting bitmaps may be significant for later processing.

Does each PDF file contain a single image?

A PDF file can of course contain multiple images or pages.

Are the images to be processed colour, grayscale or black and white?

That may affect the optimum way to handle the cropped images.

Are the pixel dimensions of each image the same?

Are their dimensions identical or near-identical? That requires that the images all have the same shape, or aspect ratio, for example 2:1 landscape orientation; so if the shapes differ, the pixel dimensions of the images clearly can't be the same.

The Crop function in XnConvert and NConvert only accepts crop positions and crop sizes specified in pixels, so if the images have different pixel dimensions it is unlikely to be possible to crop a whole batch with the same Crop settings.

XnConvert and NConvert also have a Canvas resize function which is in effect a second crop function: that accepts a crop size specified as a percentage of the image dimensions and so could be used to make a similar crop in images with different pixel dimensions provided the aspect ratio is the same. However, the crop position is specified in a different way that allows less control.

Note that 'auto-crop' normally refers to the automatic determination of the crop area, for instance to crop white space from around a rectangular area of text by automatically determining the position of the text on the page.

How accurately does the crop area need to be positioned?

The Crop function, if it can be used, allows precise positioning of the crop area through the use of pixel dimensions; the Canvas resize function as stated above may not allow such precise control.

Can the blank and marked areas be separated manually, if necessary?

It seems likely that blank or near-blank areas and reasonably marked areas would produce file sizes that could be readily distinguished by eye in a file listing, without much consideration of the file format. If the difference in file size is to be maximised, the colour-mode of the images would be a consideration as would the optimum type of file compression to be used.

Automating the separation of blank and marked files would clearly require a way of reading the file sizes in software and is outside my immediate experience.

A script to automate the overall process, if that is found to be practical, would call NConvert to open the PDF file, perform the crop operation, and save the resulting image in a suitable format, and would also likely need to call another utility to determine the resulting file sizes to separate blank and marked areas. The NConvert part of the script might possibly be developed in and exported from XnConvert.
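
As a rough illustration of that shape of script (a sketch only, not a tested solution), the example below calls NConvert to crop each PDF, reads the size of the file it produces, and moves the original PDF into one of two folders. The nconvert switches (-out, -crop, -o), the crop coordinates, the folder names and the 10 KB threshold are all assumptions to be checked against "nconvert -help" and adjusted from real test results.

[code]
# Sketch only: switch names, coordinates, folder names and threshold are assumptions.
import os
import shutil
import subprocess

SOURCE_DIR = "scans"          # hypothetical folder holding the original PDFs
MARKED_DIR = "marked"         # destination when the cropped file is "large"
BLANK_DIR = "blank"           # destination when the cropped file is "small"
THRESHOLD = 10 * 1024         # placeholder split point in bytes; tune after testing

os.makedirs(MARKED_DIR, exist_ok=True)
os.makedirs(BLANK_DIR, exist_ok=True)

for name in os.listdir(SOURCE_DIR):
    if not name.lower().endswith(".pdf"):
        continue
    pdf_path = os.path.join(SOURCE_DIR, name)
    crop_path = os.path.join(SOURCE_DIR, name + ".crop.tif")

    # 1. Call NConvert to open the PDF, crop the region of interest and save it.
    subprocess.run(
        ["nconvert", "-out", "tiff",
         "-crop", "100", "100", "300", "300",    # x, y, width, height in pixels
         "-o", crop_path, pdf_path],
        check=True)

    # 2. Read the size of the resulting file (the "other utility" step).
    size = os.path.getsize(crop_path)

    # 3. Move the original PDF according to the size of its crop.
    dest = MARKED_DIR if size >= THRESHOLD else BLANK_DIR
    shutil.move(pdf_path, os.path.join(dest, name))
[/code]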

Which operating system and version are you using?

If reading file sizes programmatically isn't possible on a Windows computer -- I imagine it likely is -- it might well be possible in Linux.


If you think it would be helpful you might post one or more sample PDF files for examination.

Re: Should I use nConvert or XnConvert?

Posted: Sun May 05, 2013 2:08 pm
by hackbrew
Yes, the PDFs were generated by a scanner; they're binary (black & white) images at 300 DPI, they are near-identical in shape and aspect ratio, and some are multiple pages. I have gotten the crop to work in XnConvert using .jpg as the output format, but I was wondering what output format would give me the greatest variance in output file size. For the sample I tried a cropped 3" x 3" rectangle: one PDF produced a blank 3" x 3" rectangle and the other just had an "X" across the box, yet the file size difference was only about 2 KB. I thought maybe a different output format would give me a better result. I am capable of writing a Windows program, so it sounds like I should be making a procedure call from my program to nConvert?

Re: Should I use nConvert or XnConvert?

Posted: Sun May 05, 2013 5:33 pm
by cday
When saving scans, the optimum choice of output format and compression method can do much to reduce the file size, which is important for a scan of a document of 100 pages or more, for example; but optimising the ratio between the file sizes of blank and marked images is an unusual requirement.

Different file formats support different colour modes and compression methods so the choice is partly narrowed down by the fact that the images are black and white (although I suppose in principle they could be converted to grayscale or colour if there were an advantage in doing so).

Are you sure that your images currently have a 1-bit colour depth, even if the scans are of black and white material? I don't think the JPG format supports black and white (1-bit) images... I've done a quick test using two programs and both saved a 1-bit test image as a JPG with an 8-bit colour depth. I don't in any case think there would be much scope for optimising the ratio within the JPG format, although I might be wrong.

The TIFF format, which generally produces very large file sizes for colour and grayscale images even with optimum compression, can equally produce very small file sizes for black and white images when optimum compression methods are used. More relevantly, TIFF supports many alternative compression methods, of which XnConvert and NConvert support at least six. The methods have different characteristics, and some I know are particularly effective at compressing uniform areas and might be very effective at shrinking the size of the blank files, possibly increasing the ratio substantially.

My initial suggestion would be to check the colour mode of the images and, if they are not currently black and white, to convert copies (so that the originals are still available) to black and white in XnConvert (Image -- Change color depth -- Binary). Then take a representative pair of images and do a quick series of tests, saving them as TIFFs with each of the available compression methods (XnConvert Output tab). Hopefully one or more methods will produce the result you want (possibly ZIP or RLE, from memory?).
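
If it helps to script that comparison rather than click through the output settings for each method, something along the lines of the sketch below could run the same test. It uses the Pillow library (not XnConvert or NConvert) purely as a convenient stand-in, the compression names are Pillow's, and the two input file names are placeholders.

[code]
# Quick comparison of TIFF compression methods on a blank crop and a marked crop.
# Pillow is used here only as a scripting convenience for the test.
import os
from PIL import Image

METHODS = ["packbits", "tiff_lzw", "tiff_adobe_deflate", "group4"]

for src in ("blank_crop.png", "mark_crop.png"):     # placeholder test images
    img = Image.open(src).convert("1")              # force 1-bit black and white
    for method in METHODS:
        out = f"{src}.{method}.tif"
        img.save(out, compression=method)
        print(f"{src:>16}  {method:<20} {os.path.getsize(out):>8} bytes")
[/code]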

Failing that, there would be other possibilities but TIFF seems the obvious starting point.

I'll try to comment on other aspects later.

Re: Should I use nConvert or XnConvert?

Posted: Mon May 06, 2013 10:20 am
by cday
Some more thoughts on the file size ratio issue before moving on:

The requirement is unusual and in the absence of relevant past experience or some insight the only solution would seem to be to test a variety of possible formats and compression methods until a satisfactory solution is found.

There are around five or six commonly used image file formats, some supporting alternative compression methods, and three possible colour depths that could be used, so there are quite a number of potential combinations that could be tested if an acceptable solution isn’t found quickly. Common formats include JPG, TIFF, PNG, GIF and BMP.

Image files start with a header, so if the image content is small the size of the header information could set a limit to the maximum ratio that can be achieved. As the crop is quite small, it might be worth avoiding very small image file sizes, possibly by increasing the dpi or increasing the complexity of the marks, if that is controllable.
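
As a quick way to see that floor in practice, the sketch below (using Pillow rather than XnConvert) saves a completely blank 1-bit image in two formats and prints the resulting sizes; the exact numbers will vary by format and library version, but neither will be zero.

[code]
# Illustration of the fixed header/structure overhead of an image file:
# even a completely blank image has a non-zero size, which caps how far
# the blank-crop file size can shrink.
import os
from PIL import Image

blank = Image.new("1", (800, 800), 1)               # all-white 800 x 800 1-bit image
blank.save("blank.png")                             # PNG, default compression
blank.save("blank.tif", compression="group4")       # TIFF with CCITT Group 4
for path in ("blank.png", "blank.tif"):
    print(path, os.path.getsize(path), "bytes")     # small, but never zero
[/code]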

In addition, a compression method that compresses uniform colour areas efficiently would tend to compress the marks efficiently as well as the blank areas.

There is another possible approach to the problem which -- if it is practical -- would be a direct and elegant solution: to measure the ‘blackness’ of the images. In ideal circumstances I guess you might be looking at blacknesses of zero and, say, 25% (expressed as a ratio, that’s infinity!), or in the real world, if the scans have some noise, say 5% and 25%, which should be easily distinguished using a small threshold value.

The basic idea was outlined last year in this thread:

http://newsgroup.xnview.com/viewtopic.p ... 3&p=106369

If determining blackness were a common requirement, there would doubtless be readily available freeware utilities to measure it, and you would be looking for a command line version that could be called from a script.

In the thread Pierre indicates that the blackness value could be easily derived using GFL (whatever that is!) so possibly with your programming experience you could implement that yourself and make your own utility.
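
As a very rough idea of what such a utility could look like, here is a minimal sketch that measures blackness with the Pillow library rather than GFL; the file name and the 5% cut-off are placeholders to be tuned on real scans.

[code]
# Minimal blackness check, assuming the cropped area has already been saved
# as an image file; Pillow is used here instead of the GFL SDK.
from PIL import Image

def blackness(path, dark_limit=128):
    """Fraction of pixels darker than dark_limit, from 0.0 (blank) to 1.0 (solid black)."""
    img = Image.open(path).convert("L")     # grayscale: 0 = black, 255 = white
    pixels = list(img.getdata())
    return sum(1 for p in pixels if p < dark_limit) / len(pixels)

# Placeholder file name and 5% threshold.
print("marked" if blackness("crop.tif") > 0.05 else "blank")
[/code]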

Maybe Peter2 can also report on his experience, as he pursued the idea last year, and indicate whether there is a suitable utility available?

I hope that is useful.

Re: Should I use nConvert or XnConvert?

Posted: Mon May 06, 2013 3:30 pm
by XnTriq
cday wrote: In the thread Pierre indicates that the blackness value could be easily derived using GFL (whatever that is!) so possibly with your programming experience you could implement that yourself and make your own utility.
GFL is the graphics library XnView is based on.
CoolUtils (Tiff Pdf Cleaner, http://www.coolutils.com/tiffpdfcleaner) wrote: You can set the workspace to analyze. In other words you tell the program what exact workspace must be blank to delete the page as the blank one. That's most convenient as blank faxes still have footers or headers and other software will not detect them as blank.

Re: Should I use nConvert or XnConvert?

Posted: Mon May 06, 2013 10:06 pm
by cday
There is another possible approach to the problem which -- if it is practical -- would be a direct and elegant solution: to measure the ‘blackness’ of the images.
If a suitable blackness measurement utility can be created using GFL SDK, or the utility mentioned by Peter2 in his thread last year is suitable, that could be one viable solution.
CoolUtils (Tiff Pdf Cleaner) wrote: You can set the workspace to analyze. In other words you tell the program what exact workspace must be blank to delete the page as the blank one. That's most convenient as blank faxes still have footers or headers and other software will not detect them as blank.
If that means that blank images could be detected and deleted automatically, and that would meet hackbrew's requirements, that could be another viable solution.
The requirement is unusual and in the absence of relevant past experience or some insight the only solution would seem to be to test a variety of possible formats and compression methods until a satisfactory solution is found.
A new insight that looks promising:

The file size of an image generally increases with the complexity of the image, so the ratio of the file sizes of a mark image and a blank image would be increased if the complexity of the mark image could be increased without significantly increasing the complexity of the blank image.

Method: Apply the Tiles filter, Intensity 1, to both images: the complexity of the mark image is increased significantly, the blank image is unchanged.

In XnConvert the filter is Misc -- Tile; in XnView the filter is Filter -- Effect... Tile.

Quick test using 200px rectangular images, mark rectangle approx 10% of the total area, images JPG 24-bit colour Q100:

The mark image file size is increased by a factor of around 8, and the ratio of the file sizes rises from 1.5:1 to 12.7:1.

It looks as if the file size measurement utility needed for this method could also possibly be created with the GFL SDK, if required.

Three possible ways forward now...
Attachments: Mark 200px Q100.jpg (2.07 KiB); Mark 200px Tile-1 Q100.jpg (17.48 KiB)
Note: As this method effectively destroys mark images, it might need to be applied to copies of the images to be processed in order to identify those that are mark images.

Re: Should I use nConvert or XnConvert?

Posted: Wed May 08, 2013 3:19 pm
by hackbrew
Using Tile bumped it up to about a 75:1 ratio, which is great (1.02 MB vs. 14 KB). I used JPG - JPEG/JFIF for output, and right now the image I'm using is a filled-in (with black pen) circle about the size of a pea. I'm cropping and resizing using fill mode, 800 x 800 pixels, Enlarge/Reduce set to Always, Resampling set to Nearest Neighbour, and with Keep ratio and Follow orientation checked.

Now, I haven't tried it using a multipage PDF. So if I have this filled-in circle on page 1 of a multipage PDF, but pages 2-5 have no circle to check, am I able to apply this to just page 1 of the PDF?

Thanks in advance for all your help!

Re: Should I use nConvert or XnConvert?

Posted: Wed May 08, 2013 5:43 pm
by cday
Glad that effectively destroying the mark crops isn't an issue!

I was slightly concerned that if your real world scans were noisy the size of the blank images would be increased significantly too, but presumably the 75:1 ratio you are obtaining is the file size ratio for 'real world mark : real world blank' -- otherwise you might check that before you go too far.

You haven't posted much detail of the process you are developing so it is sometimes hard to second guess what help you need.

With regard to multi-page PDFs, if you wish to make a crop on page 1 to test for the presence or absence of a mark, as you are apparently doing for single-page PDFs, at first sight that shouldn't be a problem: just go ahead. I think the crop will be made on page 1 automatically by default, although I haven't tested that. If you encounter a problem, please report back with details of the problem you have encountered and some more detail of the overall process you are developing, so that it is easier to assist you.
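
If the crop is eventually made by calling NConvert from your own program rather than in XnConvert, there may also be a switch for selecting a single page of a multi-page file. The sketch below assumes a -page switch exists; that is an assumption to be confirmed against "nconvert -help" before relying on it, as are the other switch names and the crop values.

[code]
# Untested sketch: "-page 1" is an assumed switch for selecting the first page
# of a multi-page input; confirm it (and the other switches) with "nconvert -help".
import subprocess

subprocess.run(
    ["nconvert",
     "-page", "1",                            # assumed page-selection switch
     "-out", "tiff",
     "-crop", "100", "100", "300", "300",     # x, y, width, height in pixels
     "-o", "page1_crop.tif",
     "multipage.pdf"],
    check=True)
[/code]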