Renaming a PDF page label with information on the page, then extracting the page with that label as the file name

rcreveli

Sorry for the long title.

We have a variety of documents that we want to separate into individual pages. Each document is several thousand pages long. Ideally, here's what we'd like to do:

1. Rename each page label with the folio name. The folio names are not consistent, but their placement on the page is.
2. Extract each page as a separate PDF, using that folio name as the file name.

Tools: we have Acrobat Pro, PitStop Pro and Enfocus Switch, as well as tools like Excel.

I understand this may be impossible, but this is the first step in a very involved project, so anywhere we can automate is a huge plus.
 
OK, do you have any sample files you can send me, please?
You can send them to [email protected]
I only need one or two for a proof of concept.

I guess the files can be split into single pages first, and then have the page labels adjusted in a second step?
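
Since the folio always sits in the same place on the page, one possible approach in Acrobat JavaScript is to collect the words whose quads fall inside a fixed rectangle and use that text as the file name. The following is only a rough sketch, not a tested solution: `FOLIO_REGION` and its coordinates, and the helper names `quadBounds`, `insideRegion`, `readFolio` and `extractByFolio`, are all made up for illustration, and the rectangle would have to be measured on the real files. It skips the page-label step and goes straight from the text on the page to the extracted file. As far as I know, `extractPages` with a `cPath` only runs from a privileged context such as the console or an Action.

```javascript
// Sketch only: FOLIO_REGION and all helper names below are hypothetical.
// Assumed region in page coordinates (points): [left, bottom, right, top].
var FOLIO_REGION = [36, 20, 300, 60];

// Bounding box [left, bottom, right, top] of one quad (8 numbers = 4 x,y corners).
function quadBounds(quad) {
    var xs = [quad[0], quad[2], quad[4], quad[6]];
    var ys = [quad[1], quad[3], quad[5], quad[7]];
    return [Math.min.apply(null, xs), Math.min.apply(null, ys),
            Math.max.apply(null, xs), Math.max.apply(null, ys)];
}

// True when the box lies completely inside the region.
function insideRegion(box, region) {
    return box[0] >= region[0] && box[1] >= region[1] &&
           box[2] <= region[2] && box[3] <= region[3];
}

// Join, in page word order, the words whose first quad lies inside the region.
function readFolio(doc, pageIndex, region) {
    var parts = [];
    var count = doc.getPageNumWords(pageIndex);
    for (var w = 0; w < count; w++) {
        var quads = doc.getPageNthWordQuads(pageIndex, w);
        if (quads.length > 0 && insideRegion(quadBounds(quads[0]), region)) {
            // bStrip = false keeps punctuation and trailing spaces in the word.
            parts.push(doc.getPageNthWord(pageIndex, w, false));
        }
    }
    // Collapse runs of whitespace and trim both ends.
    return parts.join("").replace(/\s+/g, " ").replace(/^\s+|\s+$/g, "");
}

// Extract each page to "<folio>.pdf"; returns the page indexes with no folio text.
function extractByFolio(doc, region) {
    var missing = [];
    for (var p = 0; p < doc.numPages; p++) {
        var folio = readFolio(doc, p, region);
        if (folio.length === 0) {
            missing.push(p);
            continue;
        }
        doc.extractPages({nStart: p, nEnd: p, cPath: folio + ".pdf"});
    }
    return missing;
}

// In Acrobat: var skipped = extractByFolio(this, FOLIO_REGION);
```

Passing the document in as `doc` rather than using `this` directly keeps the logic easy to try against a stand-in object before running it on a file with several thousand pages. Pages where nothing is found in the region are reported back instead of being saved under an empty name.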
 
What is your plan? Can you script a solution using PitStop? If so, what language is it in? I just want to learn how to implement custom solutions to problems such as this.
 
I don't know what "folio name" refers to, but I'm assuming it's just text on the page?

I've used EverMap AutoBookmark before. I assume you're in the legal market?

It can create bookmarks from text on pages.

You can then export the bookmarked pages. I think you can easily do that with the bookmark name as the file name, but I'm not at my desk to check.

This software is cheap too.
 
As I had to remove the code from my blog to stop Google taking the site down, here are the two scripts:

JavaScript:
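// Minimal CSV reader: one row per line, comma-separated, first row is the header.
// Note: quoted fields that contain commas are not handled.
var CSV = function(data) {
    // Accept Windows (\r\n), Unix (\n) and classic Mac (\r) line endings.
    var lines = data.split(/\r\n|\r|\n/);
    var _data = [];

    for (var i = 0; i < lines.length; i++) {
        // Skip blank lines, such as the one left by a trailing line break.
        if (lines[i].length > 0) {
            _data.push(lines[i].split(','));
        }
    }

    // The first row holds the column names.
    var _head = _data.shift() || [];

    return {
        // Number of data rows (header excluded).
        length: function() {
            return _data.length;
        },
        getRow: function(row) {
            return _data[row];
        },
        // col is either a zero-based column index or a header name (case-insensitive).
        getRowAndColumn: function(row, col) {
            if (typeof col !== 'string') {
                return _data[row][col];
            }
            col = col.toLowerCase();
            for (var j = 0; j < _head.length; j++) {
                if (_head[j].toLowerCase() === col) {
                    return _data[row][j];
                }
            }
            // Unknown column name: falls through and returns undefined.
        }
    };
};

// Prompt for the CSV file and attach it to the current document.
this.importDataObject("CSV Data");
var dataObject = this.getDataObjectContents("CSV Data");

var csvData = new CSV(util.stringFromStream(dataObject));

// Expect exactly one data row per page.
if (this.numPages !== csvData.length()) {
    app.alert("Number of pages & CSV row count inconsistent");
} else {
    for (var i = 0; i < this.numPages; i++) {
        // Save page i as <PartnerHQ_Id>.pdf
        this.extractPages({nStart: i, cPath: csvData.getRowAndColumn(i, 'PartnerHQ_Id') + '.pdf'});
    }
}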
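
One thing to watch with names taken from a CSV column or from text on the page: they may contain characters that are not allowed in file names, and two pages may end up with the same name, in which case the later file would presumably overwrite the earlier one. Here is a small sketch of how that could be guarded against. The helpers `toSafeBaseName` and `makeUniqueNamer` are hypothetical names of my own, not part of Acrobat, and the replacement rules are just one reasonable choice.

```javascript
// Hypothetical helper: turn arbitrary label text into a file-system-safe base name.
function toSafeBaseName(label) {
    var cleaned = String(label)
        .replace(/[\\\/:*?"<>|]/g, "_")   // characters Windows rejects in file names
        .replace(/\s+/g, " ")             // collapse runs of whitespace
        .replace(/^[ .]+|[ .]+$/g, "");   // no leading or trailing spaces or dots
    return cleaned.length > 0 ? cleaned : "untitled";
}

// Hypothetical helper: returns a function that never hands out the same name twice.
// Comparison is case-insensitive because Windows and default macOS volumes are.
function makeUniqueNamer() {
    var used = {};
    return function (label) {
        var base = toSafeBaseName(label);
        var name = base;
        var n = 1;
        // The "k:" prefix keeps keys clear of built-in object property names.
        while (used["k:" + name.toLowerCase()]) {
            n += 1;
            name = base + "_" + n;
        }
        used["k:" + name.toLowerCase()] = true;
        return name;
    };
}
```

In the extraction loop this would be used by creating `var nextName = makeUniqueNamer();` once before the loop and then passing `nextName(csvData.getRowAndColumn(i, 'PartnerHQ_Id')) + '.pdf'` as the `cPath`.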
 
Attached is a .zip archive of various Acrobat Pro Action Wizard .sequ files for splitting files.
 

Attachments

  • Archive.zip
