					
				Automatic cropping of scanned pages
				Posted: Sat Nov 19, 2016 3:15 pm
				by Daguerre
				I am attempting to automatically crop the text from scanned book pages. The files are jpegs. My procedure is to first straighten the pages with Auto de-skew in XnView, which works perfectly. I then use XnConvert to lightly crop the borders to square up the pages. I then apply Automatic cropping in XnConvert. This is where the problem is: the results are varied. Sometimes all 4 sides are cropped to give a perfect result, but sometimes only three, two, one or none of the borders are cropped. Is there any way of solving this problem or am I trying to push XnConvert beyond its design limits?
			 
			
					
				Re: Automatic cropping of scanned pages
				Posted: Sat Nov 19, 2016 3:41 pm
				by cday
A practical limitation of automatic crop functions is that they generally crop to the first dark pixel in from each edge, so that even a small dark mark in a scan determines the crop for that border.
If you haven't done so already, you might try adjusting the background 'color' and 'tolerance' parameters to crop more aggressively...
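
To make the 'first dark pixel' behaviour and the effect of the tolerance setting concrete, here is a minimal Python sketch of how such an auto-crop might work. It is only an illustration under assumptions: the function name `auto_crop_box`, the grey-value page representation and the exact meaning of `tolerance` are made up for the example and are not XnConvert's actual implementation.

```python
def auto_crop_box(pixels, background=255, tolerance=0):
    """Return (left, top, right, bottom) of the content, or None if blank.

    `pixels` is a list of rows of 0-255 grey values. A pixel is treated as
    background when it is within `tolerance` of `background` (assumed rule).
    """
    def is_content(value):
        return abs(value - background) > tolerance

    # Rows and columns that contain at least one non-background pixel.
    rows = [y for y, row in enumerate(pixels) if any(is_content(v) for v in row)]
    cols = [x for x in range(len(pixels[0]))
            if any(is_content(row[x]) for row in pixels)]
    if not rows:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


# A 6x6 "scan": white paper (255), a block of dark text (20),
# and one pale grey smudge (230) in the top-left corner.
page = [[255] * 6 for _ in range(6)]
for y in range(2, 5):
    for x in range(2, 5):
        page[y][x] = 20
page[0][0] = 230

print(auto_crop_box(page, tolerance=0))    # (0, 0, 5, 5): the smudge blocks the crop
print(auto_crop_box(page, tolerance=200))  # (2, 2, 5, 5): only the dark text counts
```

With a tolerance of 0 the single pale smudge pins the top and left borders, whereas a high tolerance lets the crop ignore anything that is not close to black.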
If you're seriously into book scanning, you might find www.diybookscanner.org of interest, if you aren't already aware of it.
 
			
					
				Re: Automatic cropping of scanned pages
				Posted: Sat Nov 19, 2016 5:20 pm
				by Daguerre
				Many thanks. Yes, increasing the tolerance to 200 (I left the colour at white) seems to solve the problem. As I'm scanning one page at a time on a platen, this 'first dark pixel' behaviour could still be a problem, as there are occasional small bits of dirt on the platen. I suppose it would be fail-safe, though, in that you would only lose the cropping action; the text would still be there.
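
For what it's worth, a truly black speck of dirt would not be helped by the tolerance setting. One possible workaround, sketched below in Python, is to count a row or column as content only when it holds several dark pixels. This is purely a hypothetical illustration: `robust_crop_box` and its `min_count` parameter are invented for the example, and I am not suggesting XnConvert offers such an option.

```python
def robust_crop_box(pixels, background=255, tolerance=200, min_count=2):
    """Crop box that ignores rows/columns with fewer than `min_count` dark pixels."""
    def is_content(value):
        return abs(value - background) > tolerance

    width = len(pixels[0])
    # Number of non-background pixels in each row and each column.
    row_counts = [sum(is_content(v) for v in row) for row in pixels]
    col_counts = [sum(is_content(row[x]) for row in pixels) for x in range(width)]

    rows = [y for y, n in enumerate(row_counts) if n >= min_count]
    cols = [x for x, n in enumerate(col_counts) if n >= min_count]
    if not rows or not cols:
        return None
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


# An 8x8 white page with a block of text and one black speck of dirt.
page = [[255] * 8 for _ in range(8)]
for y in range(3, 6):
    for x in range(3, 7):
        page[y][x] = 10
page[0][1] = 0  # the speck

print(robust_crop_box(page, min_count=1))  # (1, 0, 7, 6): speck sets top and left
print(robust_crop_box(page, min_count=2))  # (3, 3, 7, 6): speck is ignored
```

The trade-off is that a threshold set too high could also ignore genuine isolated marks such as a page number, so it would need tuning per book.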
I can now move on to finding suitable dimensions for resizing the canvas, in order to put some beautifully proportioned white margins back on the pages.
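
As a rough sketch of what putting the margins back amounts to, the Python below pads a cropped page with white on each side. The function name `add_margins` and the margin sizes are hypothetical, chosen only for illustration; in practice the same thing would be done with a canvas resize action and whatever dimensions suit the book.

```python
def add_margins(pixels, left, top, right, bottom, fill=255):
    """Return a new page with `fill`-coloured margins added around `pixels`."""
    new_width = left + len(pixels[0]) + right
    out = [[fill] * new_width for _ in range(top)]          # top margin
    for row in pixels:
        out.append([fill] * left + list(row) + [fill] * right)
    out.extend([fill] * new_width for _ in range(bottom))    # bottom margin
    return out


# A cropped 4x3 block of "text" (0 = black), padded with unequal margins.
text = [[0] * 4 for _ in range(3)]
padded = add_margins(text, left=2, top=1, right=1, bottom=2)

print(len(padded[0]), len(padded))  # 7 6: new width and height in pixels
```

Unequal margins are used here simply because book pages often have a deeper bottom margin; the actual proportions are a matter of taste.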
 
And thanks for the link - I'll have a look. I'm not a serious book scanner, it's just that I'm making a semi-serious effort to reduce the number of books I have cluttering up the house.