
Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Thu Nov 04, 2021 8:29 pm
by vinnythering
I have thousands of images that need to be converted and combined into multiple PDF files. Some of the images are used multiple times.

I have all of the .tif files in the same folder, and they are listed and organized in a spreadsheet. I want to use that file list and run a batch process to save myself hundreds of hours converting these files one by one.

I'm on Windows. Below is an example of the spreadsheet. These files are individual pages of many scanned documents. "First Page" refers to the beginning of a section. Example: 0066.tif-0068.tif is one document, where 0066.tif is the title page. 0070.tif-0081.tif is THREE separate documents that share 0070.tif as their title page, so the three PDFs would be 0070.tif-0072.tif, 0070.tif & 0073.tif-0074.tif, and 0070.tif & 0075.tif-0081.tif. 0069.tif is a single-page document.

[Attachment: Table.png (example spreadsheet)]

I need to specify individual files and/or ranges of files to be combined into single multi-page PDFs.

Thanks!

Re: Batch Conversion of Single-Page TIF images to Multi-Page PDF

Posted: Thu Nov 04, 2021 9:29 pm
by cday
Not so much an NConvert problem as a scripting problem, I think... :wink:

Unfortunately there is not much support for scripting here these days, so that part of the problem might be better raised on one of the general scripting websites.

You presumably first need to get the values in the cells of your spreadsheet into variables, then use those variables in some NConvert code in a batch file containing further code.

If it's any help: first, the source files in an NConvert command line must be placed at the right-hand end of the command, as the last item reading from left to right. Second, one option is to list the files consecutively, rather than use wildcards in a (normally) single source term. That's just an option, and one that is not used very often, but it might be useful in your case.

Re: Batch Conversion of Single-Page TIF images to Multi-Page PDF

Posted: Thu Nov 04, 2021 10:00 pm
by cday
Thinking about it, the code required to list the filenames of the source files required for each output PDF file in turn might not be too difficult for someone with coding knowledge, and that list could then be appended to a standard NConvert code line as above, with perhaps just one variable to hold the output filename.

But as a generalisation, problems tend to become more rather than less complex as they are examined in more detail!

There is a limit to the length of a command line, but based on your table it is likely to be well beyond your needs.
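
A minimal sketch of that idea, written in Python purely for illustration (nothing in the thread uses Python). It builds one command per output PDF with the source files listed last. The `-out pdf -multi -o` options are the ones used in the batch line posted later in this thread; whether they behave identically with files listed on the command line is an assumption to check. The function names and the example paths are hypothetical. The 8191-character figure is the documented cmd.exe limit.

```python
from typing import List

# cmd.exe accepts at most 8191 characters in a single command line.
CMD_MAX_CHARS = 8191


def build_command(output_pdf: str, sources: List[str]) -> List[str]:
    """Return the argument list for one multi-page PDF.

    The source files are placed last, in page order.
    """
    if not sources:
        raise ValueError("at least one source file is required")
    return ["nconvert", "-out", "pdf", "-multi", "-o", output_pdf, *sources]


def as_cmd_line(args: List[str]) -> str:
    """Join the arguments into one line for a .bat file, quoting spaces."""
    line = " ".join(f'"{a}"' if " " in a else a for a in args)
    if len(line) > CMD_MAX_CHARS:
        raise ValueError(f"command is {len(line)} characters, over the cmd.exe limit")
    return line


# Hypothetical example: a three-page document.
args = build_command("Output\\doc1.pdf", ["0066.tif", "0067.tif", "0068.tif"])
print(as_cmd_line(args))
```

With four-digit filenames like these, each page adds about nine characters, so a single command has room for several hundred pages before the limit matters.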

Re: Batch Conversion of Single-Page TIF images to Multi-Page PDF

Posted: Fri Nov 05, 2021 6:31 am
by cday
An alternative to listing the files needed for each output PDF in sequence at the end of the command would be to place those files in a folder. It would then be necessary to ensure that the files are read in the correct numeric sequence, which might or might not happen automatically depending on the actual filenames; you would have to check that.

The principal coding problem would then be to read each line of the spreadsheet in turn, and either create a list of the files to be used for each output PDF or place them in a folder. That would require some form of calculation or logic to produce the filenames between the first and last content pages.
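
The filename logic could look roughly like the sketch below. It is in Python for illustration only, and it assumes each spreadsheet row can be reduced to a first page, a last page, and an optional shared title page, with four-digit zero-padded names as in the first post. The function name and parameters are hypothetical.

```python
from typing import List, Optional


def expand_document(first: int, last: int, title_page: Optional[int] = None,
                    width: int = 4, ext: str = ".tif") -> List[str]:
    """Return the page filenames for one output PDF, in page order.

    first and last are the content page numbers (inclusive).
    title_page is an optional shared title page outside that range.
    """
    if last < first:
        raise ValueError(f"last page {last} is before first page {first}")
    pages = list(range(first, last + 1))
    # Re-use a shared title page by putting it in front of the content pages.
    if title_page is not None and title_page not in pages:
        pages.insert(0, title_page)
    return [f"{n:0{width}d}{ext}" for n in pages]


# The cases described in the first post:
print(expand_document(66, 68))                 # one document, 0066.tif first
print(expand_document(69, 69))                 # single-page document
print(expand_document(73, 74, title_page=70))  # shares title page 0070.tif
```

The third call returns 0070.tif followed by 0073.tif and 0074.tif, which is the "re-used title page" case that a plain numeric range cannot express.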

If you need support on the actual NConvert code line required I can probably provide that when you have defined what is actually required.

Edit:

NConvert will also accept input from a text file filelist.txt so scripting to create a filelist for each document to be created might be the way to go.

I've never used a scripting or coding forum myself, but one of several that have come up regularly while researching problems is Stack Overflow.

But it might well be worth posting your problem on the ImageMagick forum, where I think you might find someone who might take it on. You are welcome to use my table above without attribution!

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Fri Nov 05, 2021 8:30 pm
by cday
I am continuing to consider your issue, partly at least for my own interest as I sometimes like a challenge.

I think the filelist input option could be viable, and looks to have the possibility of splitting the task cleanly into two parts: creating the filelists required from your table as one scripting task which can be performed by someone with the coding knowledge required, and an NConvert script that can be written separately to use the filelists provided.
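
The first of those two parts might be sketched as follows. This is Python for illustration only, not the VBA that would more likely be used from inside Excel, and it assumes the format NConvert's filelist option accepts is one filename per line. The function name, folder name, and document names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def write_filelists(documents: Dict[str, List[str]], folder: str) -> List[Path]:
    """Write one <name>.txt filelist per output document.

    documents maps an output document name to its page files, in page order.
    """
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, pages in documents.items():
        path = out_dir / f"{name}.txt"
        # One filename per line, in the order the pages should appear.
        path.write_text("\n".join(pages) + "\n", encoding="ascii")
        written.append(path)
    return written
```

Calling `write_filelists({"doc_A": ["0066.tif", "0067.tif", "0068.tif"]}, "Filelists")` would create `Filelists\doc_A.txt`, so the filelist name can carry the desired output name.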

I am posting this partly for anyone who might read the thread in the future, and while I might possibly be able to get a demonstration script working, realistically there is absolutely no certainty that I can do so with my limited experience in the time I could spend on it.

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Sat Nov 06, 2021 12:30 pm
by XnTriq
Merge Multiple Images into PDF from List

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Sat Nov 06, 2021 1:27 pm
by cday
Interesting, I'll look later, good to see that you are running all cores again now! :wink:

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Sat Nov 06, 2021 4:45 pm
by XnTriq
cday wrote: Sat Nov 06, 2021 1:27 pm good to see that you are running all cores again now! :wink:
I wish my cerebral processing unit had multiple cores :mrgreen:

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Sat Nov 06, 2021 5:25 pm
by cday
XnTriq wrote: Sat Nov 06, 2021 12:30 pm Merge Multiple Images into PDF from List
This one looks rather familiar, and so far doesn't seem to have produced a clear way forward. I think my suggestion could in principle be a serious option but the first step, creating the required filelists from a spreadsheet, would have to be done by someone with relevant knowledge. Quite possibly reasonably quick and easy for anyone with the required knowledge.

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Sun Nov 07, 2021 6:55 pm
by cday
I remain positive about my suggested method above.

The basis of the method was to split the overall problem into two parts, each of which could be developed independently by someone with just the experience required for that part.

The first part, creating a filelist for each document in the table in the first post of the thread containing the files required for that document, shouldn't I think be difficult for someone familiar with working programmatically with Excel spreadsheets. I suspect that MS VBA (Visual Basic for Applications) might be the tool used, although any tool that works would be fine.

The second part, an NConvert script that takes as inputs a folder of the files used and a folder of filelists, and outputs the required multipage PDF files, is now developed and tested in draft form, but currently requires a workaround to generate the desired output filenames in the absence of a possible new NConvert filename option which would provide a neater solution. :D

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Tue Nov 16, 2021 9:36 am
by cday
I am publishing my proposed solution in draft form pending resolution of a bug in the NConvert -l filelist input option.

The code given is intended to be used in conjunction with an Excel VBA script that outputs a filelist with the required filename for each row of the source Excel spreadsheet described in the first post of the thread, which I think should be reasonably easily coded by someone with experience of VBA coding, including testing conditions and loops. Suitable output file compression can be added according to the colour mode used.

Code:

REM Run from a batch file: process each filelist in Filelists\ in turn.
FOR %%A IN ("Filelists\*.txt") DO nconvert -out pdf -multi -o Output\%%.pdf -l %%A

Use of the code above requires input filelists to be formatted in a form accepted by the NConvert -l option, which is currently limited to filenames on separate lines, although acceptance of the tab-separated or CSV forms supported by Excel has been requested. In extremis, CMD code could probably be used to convert Excel filelists to the form presently accepted, but such code is not provided!
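
For illustration, that conversion could be sketched in Python rather than CMD. It assumes each exported row holds the filenames for one document, separated by commas or tabs, and that no filename itself contains a comma or tab. The function names are hypothetical.

```python
import re
from typing import List


def row_to_filenames(row: str) -> List[str]:
    """Split one comma- or tab-separated row into clean filenames."""
    cells = re.split(r"[,\t]", row)
    # Drop surrounding whitespace and quotes, then skip empty cells.
    names = [cell.strip().strip('"') for cell in cells]
    return [name for name in names if name]


def row_to_filelist(row: str) -> str:
    """Return the row as filelist text with one filename per line."""
    return "".join(name + "\n" for name in row_to_filenames(row))


print(row_to_filelist("0070.tif,0073.tif,0074.tif"), end="")
```

Empty trailing cells, which Excel tends to export when rows have different lengths, are simply dropped.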

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Tue Nov 16, 2021 4:53 pm
by vinnythering
Okay. I am returning to this thread because I solved my issue, but as cday pointed out in my post on Stack Overflow, it looks to be overcomplicated. They seem to be invested in the solution so I'll try to explain what I did here, and I'll post an attachment if possible.

As stated before, I have multiple folders with 5,000+ raw .tif page scans each. Odd choice of filetype, since whoever scanned them did not save with multiple pages, only one page per file. These scans are of numbered documents with title pages: some are single documents with their own title page, others are multiple documents that share a title page. I needed a way to break out these individual pages into organized lists for each document number, sometimes re-using those multiple-document title pages, so a straight file list probably would not have worked. At least not easily.

My initial and very slow process with this task was to manually scroll through each scan and type its file name into a spreadsheet. Then, highlight those files in explorer, right click, and combine in Adobe. Save as. Name the file accordingly. After all 1,600+ documents were done, I needed to spot check to make sure I didn't miss any. I always did. So this solution would not only GREATLY speed up the process, but it would ensure accuracy as well.

Long story short, I coded the following with VBA in Excel (with great effort, trial and error, and frustration):
  • Keypad activates macros. Acts as a sort of control board for the spreadsheet.
    • Advances file number along with currently viewed photo.
    • Fills down other Excel formulas to reduce processing lag. (I need to adjust this because eventually, as I near the end of the list it will slow down.)
    • Macros to move back in the series as well as bring the Windows Photo Viewer into focus.
    • Fills in page numbers to identify title pages, first page in each document, and normal pages.
  • A series of Excel IF formulas generate nConvert commands.
  • A button to save the nConvert column of commands to a .bat file. I am having difficulty with this one, though. It saves the ENTIRE column, all the way down to 1048576 instead of only populated cells. Work in progress.
  • A column I can copy/paste to report on progress to project lead. Also serves to reference which raw scan files are contained in which document PDF.
  • A memory cache of sorts. I found that Excel does not save variable values when the file is closed, so I added a block of cells to save and load the current values of variables. A dirty solution, but it works. As a bonus, it also serves as an override if I need to go back a significant way or if I need to skip scan files.
I guess it's a bit difficult to explain without context. I am working on simplifying the formulas and the VBA code because this thing is UGLY. I was much more focused on function, not elegance. I'd attach the file but it looks like I'm not able to post Excel files. Link to my Dropbox below.

Real Estate Spreadsheet

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Tue Nov 16, 2021 5:23 pm
by cday
Glad your problem is solved, a bit more to the overall problem than indicated in your original post, but thanks for reporting back. :D

My experience of VBA is limited to having gotten a very small script running around 2007 to automate production of hyperlinks to files in a club archive; I invested in a book on VBA at the time, but thinking about your problem, came across a much better book in the local library yesterday!

My thoughts so far were limited to conceiving the outline of a possible algorithm to perform the task specified in your original post, without advancing into the detailed coding. It looked as if there should have been a straightforward solution.

Given that VBA macro, the whole task as originally described would have evolved to your running the macro on your spreadsheet to produce a filelist text file for each row, then loading all the images and filelists into the folders in my post and leaving the batch file to run to completion, hours or maybe a few days later, whereupon all your required multipage PDF files would hopefully be in the Output folder. So not much NConvert code required, although with my limited experience 'For loops' always take a lot of time and usually degenerate into much trial and error. :wink:

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Tue Nov 16, 2021 5:33 pm
by vinnythering
Trial and error was the name of this particular game.

I do wish I could just use file lists, but not every document uses the same commands so I'd have to break out the commands in separate lists, involving even more If Then statements and running nConvert multiple times. I figured having a command (or series of commands) for each document would result in better accuracy for file naming and ease of use. Set it and forget it, as it were.

Still trying to figure out how to export to .bat without a million blank lines.
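
One way around the blank lines, sketched here in Python as a post-processing step rather than as a fix to the VBA itself, is to keep only the cells that actually contain a command before writing the file. It assumes the command column has already been read into a list of cell values; the function names are hypothetical.

```python
from typing import Iterable, List


def populated_commands(cells: Iterable[object]) -> List[str]:
    """Keep only cells that contain a command, dropping empty ones."""
    commands = []
    for cell in cells:
        text = "" if cell is None else str(cell).strip()
        if text:
            commands.append(text)
    return commands


def write_bat(cells: Iterable[object], path: str) -> int:
    """Write the populated commands to a .bat file and return their count."""
    commands = populated_commands(cells)
    # newline="\r\n" makes every "\n" written come out as a Windows line ending.
    with open(path, "w", newline="\r\n") as bat:
        for command in commands:
            bat.write(command + "\n")
    return len(commands)
```

The same idea applies inside Excel: stop at the last populated row of the column instead of exporting all 1048576 rows.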

Thank you for your interest! Hopefully this helps someone out there.

Re: Batch Conversion of Single-Page TIFF images to Multi-Page PDF

Posted: Tue Nov 16, 2021 5:45 pm
by cday
vinnythering wrote: Tue Nov 16, 2021 5:33 pm I do wish I could just use file lists, but not every document uses the same commands...
That wasn't apparent in your original statement of the problem, are you thinking about the fact that the number of pages in the output file varies, or some similar issue?

Based on your original statement, in which I did see some possible ambiguity in need of clarification, I felt that an algorithm could have been designed to accommodate that, although I was assuming based on your original post that you possibly had more VBA experience than may be the case.

Anyway, my solution above remains untested except with my small number of test files, and it isn't unknown for an unexpected issue to arise. Two updates to NConvert options are also required before it could be fully used. :wink: