Removing stacked images from a PDF that does not use layers

bcr

So I have a PDF which is a scan of an old book. It does not use layers; instead, each page contains two images, one atop the other. The lower one is the background image of the page, and the upper one is the scanned image of the text.

If I have several books scanned in this way and I want to quickly delete the lower (background) image, what would be the most efficient way of doing so?
 
Can you provide a sample file? I think a custom preflight fixup profile using 'remove object' or a related type of fixup might work for you. But I've never messed with a PDF that used stacked images instead of layers so I'm not sure how it would work in practice.
 
So I've managed to get the top image defined as a layer by using the fixup "Put transparent objects as layers", but I've not yet figured out how to quickly get rid of the image behind it, which is not defined as a layer.
 
Update: I figured out using preflight that the images at the back are all the same ppi and different to those at the front, so I was able to select them all using a rule in preflight and delete them in one pass.

Next issue will be trying to reduce file size and sharpen up the text
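For anyone who wants to check the same thing on their own files, here is a rough sketch of the idea in Python: tally the effective resolutions of all images and see whether they fall into two clean groups. The ppi list is hypothetical sample data, and `ppi_histogram` is just a name made up for this illustration; getting the real values out of a PDF (for example from the preflight report) is not shown.

```python
from collections import Counter


def ppi_histogram(ppi_values):
    """Count how many images share each resolution, rounded to whole ppi."""
    return Counter(round(v) for v in ppi_values)


# Hypothetical resolutions collected from a few pages of one scanned book.
sample = [166.556, 500.0, 166.556, 500.0, 166.556]
hist = ppi_histogram(sample)

# Two distinct groups suggest that one resolution value can be used
# to select every background image in a single rule.
for ppi, count in sorted(hist.items()):
    print(ppi, "ppi:", count, "image(s)")
```

If more than two groups show up, a single ppi rule would probably need to be a range rather than one value.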
 
For info, in PitStop we have an Unsharp Mask Action to make things sharper, and in the July release we put some effort into file size reduction. You might want to give it a try.
 
I wonder what software they use to scan these "old" books.

There are two applications out there that perform page segmentation: they scan at, for example, 900 ppi, then segment out the images, descreen and downsample them, and then cut out the old images and replace them (so text becomes 1-bit linework and images become 8-, 24- or 32-bit contone), so the pages are small.

VERY old slide show on the subject

 
Interesting, thanks!

I know in some instances people will use DSLRs on a rig with lighting to take the photos, and in some cases large machines are used which turn the pages and scan each page automatically with the book lying down. It can get quite sophisticated.
 
Nice work! I've been MIA, but recreated your success today. For anyone curious:

I used the Output Preview Object Inspector Panel to get an idea of which 'layers' were which:
Screenshot 2025-09-12 at 10.43.16 AM.png


Looks like the background image comes through at 166.556 ppi while the text portion comes in at 500 ppi.
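As a side note on where numbers like these come from: effective ppi is the pixel count divided by the placed size in inches, and default PDF user space is 72 points to the inch. A minimal sketch follows; `effective_ppi` is a made-up helper name, and the pixel and page dimensions are hypothetical, not measured from the file above.

```python
def effective_ppi(pixels: int, placed_points: float) -> float:
    """Effective resolution of an image along one axis.

    Default PDF user space is 72 points per inch, so the placed size
    in inches is placed_points / 72.
    """
    if placed_points <= 0:
        raise ValueError("placed size must be positive")
    return pixels / (placed_points / 72.0)


# Hypothetical page that is 432 pt (6 in) wide.
text_ppi = effective_ppi(3000, 432.0)        # 3000 px across 6 in
background_ppi = effective_ppi(1000, 432.0)  # 1000 px across 6 in
print(round(text_ppi, 3), round(background_ppi, 3))
```

So two images covering the same page area can report very different ppi purely because they hold different pixel counts, which is what makes resolution usable as a selector here.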

Knowing this, it was simple to write a custom fixup that targets the correct image. You make a new profile, and then in the profile you set a "custom fixup". In the properties of that custom fixup you can tell Acrobat to remove all objects except for images greater than 450 ppi. This removes the majority of the background images in the document. Though from my quick testing it seems like the pages without text fail this check and don't get removed. It needs some more fine-tuning to handle those edge cases.
Screenshot 2025-09-12 at 10.42.01 AM.png
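To spell out the logic of that rule outside Acrobat, here is a small Python model of it. This is not how Acrobat implements fixups; `PageObject` and `keep_only_hires_images` are hypothetical names for illustration, and the page contents are sample data using the two resolutions mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PageObject:
    """Hypothetical stand-in for one object on a PDF page."""
    name: str
    kind: str                    # e.g. "image" or "vector"
    ppi: Optional[float] = None  # effective resolution; None for non-images


def keep_only_hires_images(objects: List[PageObject],
                           min_ppi: float = 450.0) -> List[PageObject]:
    """Drop everything except images whose resolution exceeds min_ppi."""
    return [o for o in objects
            if o.kind == "image" and o.ppi is not None and o.ppi > min_ppi]


page = [
    PageObject("background", "image", 166.556),
    PageObject("text scan", "image", 500.0),
]
kept = keep_only_hires_images(page)
print([o.name for o in kept])
```

Note that a rule phrased this way also discards anything that is not an image, so it is worth checking the result on a page that contains other kinds of objects.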


Just wanted to put the "how" out into the ether so people can learn how to do this. I'm only just now starting to learn how these advanced preflight functions work, and they are as powerful as they are confusing.
 

Thanks! This is pretty much how I did it, except I told it to remove images within the relevant ppi range, rather than "remove everything except" as you did.
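In case it helps, the range-based version of the rule can be modelled in Python like this. It is only a sketch of the selection logic, not the actual preflight configuration: the names `PageObject` and `remove_images_in_range`, the 150 to 200 ppi bounds, and the sample page are all assumptions for illustration.

```python
from typing import List, NamedTuple, Optional


class PageObject(NamedTuple):
    """Hypothetical stand-in for one object on a PDF page."""
    name: str
    kind: str                    # e.g. "image" or "vector"
    ppi: Optional[float] = None  # effective resolution; None for non-images


def remove_images_in_range(objects: List[PageObject],
                           lo: float, hi: float) -> List[PageObject]:
    """Remove only images whose ppi lies in [lo, hi]; keep all else."""
    def is_target(o: PageObject) -> bool:
        return o.kind == "image" and o.ppi is not None and lo <= o.ppi <= hi

    return [o for o in objects if not is_target(o)]


page = [
    PageObject("background", "image", 166.556),
    PageObject("text scan", "image", 500.0),
    PageObject("page number", "vector"),  # hypothetical non-image object
]
remaining = remove_images_in_range(page, lo=150.0, hi=200.0)
print([o.name for o in remaining])
```

The practical difference is that a range rule touches only the images it names, so anything else on the page is left alone.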
 
   