Fixing scanned PDFs

exify

I have a 140-page journal currently and almost every page is a scan of a printout. The customer scanned them and sent them as TIFFs or something. I don't know why, but that's how she did it. I'm doing it for another printer, so I have no direct communication with the customer to have them fix it.

My usual method of just optimizing the PDF in Acrobat isn't working, and there are far too many pages to do anything one by one, as the deadline is practically yesterday.

Any ideas? I've never really had to deal with so many scanned PDFs that couldn't be fixed with Optimize.


On another note, some of them looked fixed after I optimized, but the CREO I printed it on decided it's gonna print the shadows/etc... so I'm not sure what's going on.
 
B&W scanned pages compress best with JBIG2 compression. Acrobat 8 & 9 can compress using JBIG2. So can Apago's PDF Enhancer.

What problems are you having using Acrobat's PDF Optimizer?

Dwight Kelly
Apago, Inc.
 
Duh, that would help.

Since they scanned crappy inkjet printouts there's shadows and hazes behind all the text and when I print using my CREO it shows up as blue haze... it goes away when I print grayscale but if a page has a picture or something, I'm stuck with it.
 
You don't need technical help, but a magic wand or a miracle: a contone scan will never be able to output correct text, even with a top-quality scan... of course it's worse when the scan is crappy...

Unfortunately, there is no quick way to fix that kind of crappy butcher job: the only effective solution is to spend an incredible amount of extra time and extra work redoing the whole job, retyping all the text page by page...

... so, the best thing to do is to throw this file in the trash and spank the author... or warn the customer about the more-than-poor "quality" of the file and print this shit "as is".
 
Since they scanned crappy inkjet printouts there's shadows and hazes behind all the text and when I print using my CREO it shows up as blue haze... it goes away when I print grayscale but if a page has a picture or something, I'm stuck with it.

You might try the techniques explained here:
Quality In Print: Fixing a common supplied art problem
Quality In Print: Fixing a common flatbed scanner problem

on one of your PDFs. Play around with the settings a bit in PShop - they might work for you and if so you could write a script or action to automate it a bit.
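
The linked articles are not reproduced in this thread, so the following is only an assumed illustration of the kind of levels-style cleanup that could be scripted: light haze is pushed to pure white and dark text to pure black. It is a minimal sketch in plain Python, assuming an 8-bit grayscale page held as rows of integers (0 = black, 255 = white). The name `clip_levels` and both cut-off values are hypothetical and would need tuning per scan.

```python
def clip_levels(rows, black_point=60, white_point=200):
    """Stretch tones so that anything at or above white_point becomes pure
    white and anything at or below black_point becomes pure black.
    Both cut-offs are illustrative guesses, not values from the thread."""
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    span = white_point - black_point
    # Build a 256-entry lookup table once, then map every pixel through it.
    lut = []
    for value in range(256):
        if value <= black_point:
            lut.append(0)
        elif value >= white_point:
            lut.append(255)
        else:
            lut.append(round((value - black_point) * 255 / span))
    return [[lut[pixel] for pixel in row] for row in rows]


# One scanline: dark text (30), a grey edge pixel (130), light haze (215, 230).
page = [[30, 130, 215, 230]]
cleaned = clip_levels(page)
print(cleaned)
```

The haze pixels come out as 255, so there is nothing left for the RIP to screen behind the text. Pages containing photos would need the picture areas excluded, since the same clipping would blow out their highlights.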

best gordon p
my print blog here: Quality In Print current topic: the creative design/production process
 
@ claude72,

um, well, gee wiz, just because YOU do not know how to fix images created by scanning paper documents does not mean it can't be done !

Back to life: Old books get a new life with Bibliolife, a local company that digitizes out-of-print materials

The idea that you would suggest that the customer be punished made me laugh out loud ! I guess they keep you out of the sales side of the business, eh ?

(smile and a wink) - okay, that was fun, now let's see if I can help here.

The Link above is an article about a company that does exactly what you are trying to do - and although the article does not mention this, they use IoFlex Bookmaker, a suite of Adobe Acrobat Plug-ins for Windows.

@ exify,

I normally get annoyed when a vendor jumps in on this forum and suggests that your problem will go away if only you buy their product, but as I am very familiar with the issue of working with scans of hard copy paper documents and converting them into something that resembles the original, I will tell you that you really should not expect Adobe's PDF Optimizer tool to help you accomplish this. The Adobe Acrobat PDF Optimizer is designed to clean up a scan for online viewing, and while it may correct skew and offer a few other improvements, it returns the PDF in such a way that you can no longer use the Adobe Acrobat TouchUp Object tool to pass it to Photoshop for additional clean-up. So, if you use PDF Optimizer and have a big fingerprint or speck of dust, it is then difficult to remove.

IoFlex offers a suite of Adobe Acrobat Plug-ins that are designed to enable the user to adjust and improve a scan of a paper document for re-print. IoFlex customers use our tools to convert scans of color, grayscale and B&W pages for use in traditional and digital printing. The IoFlex tagline is "scan to print ready" and it is designed to work on 1000s of pages (Adobe Photoshop converts 1 image at a time).

IoFlex recently showed their new "scan to print ready" solution at the Book Publishing Conference in NYC and On Demand in Philly.

here is a quick backgrounder;

IoFlex BlackBox - Backgrounder

Here is an online version of the brochure;

IoFlex BlackBox Brochure

Here is a presentation that describes the feature / benefits

http://docs.google.com/Presentation?id=dfwf37zj_32g7t6npg8

Michael Jahn

And yes, contact me if you would like a fully functional 30 day trial - but be advised this is not inexpensive - it is $12,500.00
 
@ claude72,

um, well, gee wiz, just because YOU do not know how to fix images created by scanning paper documents does not mean it can't be done !
I didn't wait for you to know how to scan text!!! :D

... but reading your answer gives me the feeling that YOU don't know how to do a real professional job... or that perhaps you read the topic too quickly: re-read the quoted sentence carefully, focusing on the bolded words, which give the important information:

"Since they scanned crappy inkjet printouts there's shadows and hazes behind all the text and when I print using my CREO it shows up as blue haze... it goes away when I print grayscale but if a page has a picture or something, I'm stuck with it."

The 2 bold words mean that the crappy scans were made in color mode (RGB or CMYK), that is, a CONTONE mode... (and that makes the difference!!!)

... and now re-read my answer carefully (still focusing on the bolded word):
"You don't need technical help, but a magic wand or a miracle: a contone scan will never be able to output correct text, even with a top-quality scan..."

You can do what you want: raise the resolution, raise the screen ruling, use magic software, implore God or sell your soul to Satan, a CONTONE-mode picture will never be able to print correct crisp text, and will always print hazy text. That's a basic law of DTP and basic printing knowledge (and a common mistake made by many users): the haze comes from the printer RIP rasterizing the contone pixels into a screen... contone pixels always need a screen to be printed (basics of printing), and the screen always gives haze (basics of PostScript, rasterization, DTP)...

• raising the scan resolution is absolutely no use, because with a contone picture what is printed is not the pixels of the picture but the dots of the screen, and the haze comes from the way the screen is handled in a contone picture,

• raising the screen ruling (when possible) can reduce the haze and more or less hide the flaw... but it's only a workaround that will make the printing not too bad, and will never really fix the flaws. With a 175 lpi screen, the haze around the text begins to be almost invisible, and a 200 lpi screen will hide it enough to make it invisible... but does your paper accept a 200 lpi screen??? Another workaround is to use a stochastic screen, which helps greatly in reducing the flaws...

(ink-jet printers use stochastic screens: that's why this flaw is often reduced so much by ink-jet printing that non-professional users and bad designers don't see it on their ink-jet prints...
... and not being aware of DTP basics or of offset-printing technologies and limitations, they often make the mistake of scanning text in contone mode, believing that their text pictures will output as well from an offset press as from their ink-jet printers)



So, if you go on with contone pictures, whatever you do will only improve a bad picture and reduce the flaws due to its bad quality: remove or reduce the shadows of a bad scan, reduce the haze in the picture due to the scanner descreening of printed text, but the printed result of contone-picture text will still be crap: strongly smelling or "unsmelling" crap depending on the picture, but crap...

If you want to print correct crisp text from scanned text, you just have to use the 1-bit (or line-art or line-work) mode for your pictures.
1-bit mode pictures can be printed directly by ink-jet and laser printers and offset presses and don't need any screening to be printed, which eliminates the haze due to the screening.

But the 1-bit mode needs a different resolution from contone pictures: the right resolution for 1-bit pictures is commonly 1200 ppi... (you can go down to 800 ppi without any noticeable degradation, and exceptionally to 600 ppi)...
That's not a problem when you scan your own pictures: you simply set your scanner to the right resolution...

... but when you receive crappy contone pictures, badly scanned at the standard 300 ppi commonly used for contone pictures, you cannot fix the problem by simply transforming a 300 ppi contone text scan into a 300 ppi 1-bit scan, because it is not possible to get a good print from a 1-bit picture at a resolution as low as 300 ppi!!!...

Sometimes, it is possible to "save" the job when the "designer" used oversized 300 ppi pictures scaled down in the layout, which can (with luck) give a real output resolution of 600 ppi or more: it is then possible to convert the RGB, CMYK or grayscale text pictures into 1-bit text pictures and get a good print...
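
The arithmetic behind that "rescue" case can be sketched as follows. This is a minimal illustration, assuming the placed picture's native resolution and its layout scaling are known; the function names are hypothetical, and the 600 ppi floor is the exceptional minimum quoted above.

```python
def effective_ppi(image_ppi, scale_percent):
    """Resolution actually delivered at output size after the picture has
    been scaled in the layout (100 = placed at real size)."""
    if scale_percent <= 0:
        raise ValueError("scale_percent must be positive")
    return image_ppi * 100.0 / scale_percent


def usable_as_one_bit(image_ppi, scale_percent, minimum_ppi=600):
    """True when the scaled picture reaches the minimum 1-bit resolution."""
    return effective_ppi(image_ppi, scale_percent) >= minimum_ppi


print(effective_ppi(300, 50))       # 600.0: a 300 ppi scan placed at 50 %
print(usable_as_one_bit(300, 50))   # True
print(usable_as_one_bit(300, 100))  # False: real-size 300 ppi is too coarse
```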


But in exify's job:
• the operator who made this crap has already shown her incompetence by scanning text in contone mode, so she has probably made all the other standard mistakes that go with it, and has probably scanned the images at the "standard" 300 ppi resolution and at real size... so, it is not possible to get a correct output resolution for a 1-bit picture...
• even if she had a sudden unexpected flash of intelligence and made the scans at 600 ppi (a sufficient resolution to save the job), the file is a PDF, and PDF creation is able to resample pictures, and contone pictures are resampled to 300 ppi in the default settings: so whatever the resolution of the scans was, 300 or 600 or anything else, making the PDF has probably resampled the contone scanned text to 300 ppi, as it resamples all contone pictures...

... so, there is no way to save this job, it's crap, it will stay crap.



******


To better understand what I am explaining, just have a look at the linked pictures below:
- there is no scanning in these pictures, so no degradation due to the scanner,
- the text pictures were made by direct conversion from vector to pixels in Photoshop,
- and the pictures shown come directly as a file from an Agfa Viper RIP; I simply colorized them and added an outline to show the real shape: each picture is the real output of an imagesetter for text at 8 pt size, printed at 2400 dpi with a 150 lpi screen and magnified 33.33 times:

• "AB0" is a vector text: no comment, it's the perfect shape of vectors...

• "AB1" and "AB2" are pixel 1-bit texts at 2400 and 1200 ppi:
- perfect shape, even with only 1200 dpi,
- and look at the edge of the screened shape: screen dots are cut to match exactly with the shape of the characters.

• "AB3", "AB4", "AB5", and "AB6" are pixel grayscale texts at 2400, 1200, 300 and 225 ppi: crappy shape, due to:
- extra dots outside the characters,
- and most of the dots that are on the edge of the characters have a part outside the shape
- and you can notice that the resolution doesn't change anything: the very high resolution 2400 ppi picture is as crappy as the 225 ppi picture (225 ppi is the normal resolution for a 150 lpi screen)...
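
The 225 ppi figure is consistent with the usual rule of thumb of supplying contone pictures at about 1.5 times the screen ruling. A minimal sketch of that arithmetic follows; the factor of 1.5 is inferred from the 225 ppi / 150 lpi pair above and is an assumption, not something the post states.

```python
def contone_ppi_for_screen(screen_lpi, quality_factor=1.5):
    """Contone image resolution (ppi) for a halftone screen of screen_lpi
    lines per inch. quality_factor=1.5 is inferred from the 225 ppi /
    150 lpi pair in the post and is an assumption."""
    return screen_lpi * quality_factor


print(contone_ppi_for_screen(150))  # 225.0
print(contone_ppi_for_screen(175))  # 262.5
```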

The difference between vectors and pixels is here:
- the RIP rasterization of vectors can "cut" the screen dots to match exactly the shape of the object,
- the RIP rasterization of pixel pictures leaves screen dots outside the shape and CANNOT "cut" the screen dots on the edge, leaving part of them outside the shape...
... and these screen dots lying wholly or partly outside the shape both alter the shape of the object (character or otherwise) and create the haze.

So, do what you want, but text in contone picture will always look hazy!!!




exify said:
it goes away when I print grayscale but if a page has a picture or something, I'm stuck with it.
As a pixel picture is not able to mix pixels in contone and 1-bit modes, the only solution, when the resolution is high enough, is to separate each picture into 2 pictures, "simply" by duplicating the picture file and erasing the unwanted elements in each file to create:

• one picture at 300 ppi, in contone mode (color or gray) for the pictures and colored/gray objects,

• one picture at minimum 600 ppi (better 800, much better 1200) in 1-bit mode for the black text and for line-art drawings,

and rebuild each page by superimposing these 2 pictures (the contone below, and the 1-bit above) in 2 picture boxes in XPress or InDesign...
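
The duplicate-and-erase step can be sketched in plain Python. This is only an illustration under stated assumptions: the page is an 8-bit grayscale image held as rows of integers (0 = black), the operator has already marked rectangles that contain only text or line art (the `text_boxes` argument is hypothetical), and the threshold of 128 is a guess. It does not change resolution, so the source still needs the 600 ppi or more mentioned above for the 1-bit layer to print well.

```python
def split_page(rows, text_boxes, threshold=128):
    """Split one grayscale page into a contone layer and a 1-bit layer.

    rows       -- page as rows of 0..255 values (0 = black)
    text_boxes -- hypothetical operator-marked rectangles given as
                  (left, top, right, bottom), right/bottom exclusive
    """
    def in_text_box(x, y):
        return any(left <= x < right and top <= y < bottom
                   for (left, top, right, bottom) in text_boxes)

    contone, line_art = [], []
    for y, row in enumerate(rows):
        contone_row, line_row = [], []
        for x, pixel in enumerate(row):
            if in_text_box(x, y):
                contone_row.append(255)  # erase the text from the contone copy
                line_row.append(1 if pixel < threshold else 0)  # 1 = ink
            else:
                contone_row.append(pixel)  # keep the picture untouched
                line_row.append(0)  # erase the picture from the 1-bit copy
        contone.append(contone_row)
        line_art.append(line_row)
    return contone, line_art


# One scanline: two text pixels on the left, two picture pixels on the right.
page = [[20, 240, 90, 120]]
contone, line_art = split_page(page, text_boxes=[(0, 0, 2, 1)])
print(contone)   # [[255, 255, 90, 120]]
print(line_art)  # [[1, 0, 0, 0]]
```

The two layers are then stacked in the layout with the 1-bit layer on top, as described above.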


***********

The vector and 1-bit pictures:
 

Attachments: AB0.jpg, AB1.jpg, AB2.jpg

The contone pictures:
 

Attachments: AB5.jpg, AB3.jpg, AB4.jpg, AB6.jpg

...and the 4 contone pictures in the same picture to compare the result at different resolutions...
 

Attachments: AB3456.jpg

@ Claude72

I read the original post and believe I understand the issue. It is not ideal, but it is often workable. In a perfect world, all type is vector - but when all you have is hard copy, well ... that's what you work with and that is what the customer is asking to be done.

IoFlex offers a set of tools that enables the user to improve text that was captured by scanning paper - or inkjet - or film - or microfilm - using image processing. This includes de-screening of color type, and improving low resolution scans using upscale thresholding and edge detection algorithms.

"You don't need technical help, but a magic wand or a miracle: a contone scan will never be able to output correct text, even with a top-quality scan..."

I will simply disagree with this statement.

"as it is not possible to have a good printing of 1-bit picture with a resolution as low as 300 ppi, so you cannot fix the problem by simply transforming a 300 ppi contone text scan in a 300 ppi 1-bit scan!!!..."

IoFlex customers would agree, but if that is all you have and you can't re-scan - especially if the 300ppi scans are in color, IoFlex offers tools to improve this and deliver 900 ppi 1 bit.

"So, do what you want, but text in contone picture will always look hazy!!!"

I understand your point, but if all we have is a printed original, and the customer is interested in scanning this and then printing this, my POINT is that all is NOT lost, this is not only POSSIBLE, but IoFlex customers do this on a daily basis.

IoFlex customers often start with 600 ppi RGB scans. Higher resolutions are helpful, of course. Using image processing as I described earlier, we have some customers who have gone to press starting with resolutions as low as 300 ppi. Using upscale thresholding and edge improvement, IoFlex output becomes 900 ppi 1-bit text (if that is what is desired), de-screened spot color text (if that is what is desired) or CMYK (removing the screens so they can be re-screened).
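
The thread does not describe how IoFlex implements this, so the following is not its algorithm. It is only a generic sketch of the idea of "upscale, then threshold": a grayscale image is enlarged 3 times with bilinear interpolation (300 ppi becomes 900 ppi) and then cut to 1-bit, so the grey edge pixels of the scan help place the edge more finely than a direct 300 ppi threshold would. The function names, the bilinear choice and the threshold of 128 are all assumptions.

```python
def upscale_bilinear(rows, factor):
    """Enlarge a grayscale image (rows of 0..255 values) by an integer
    factor using bilinear interpolation."""
    src_h, src_w = len(rows), len(rows[0])
    out = []
    for oy in range(src_h * factor):
        # Map the centre of each output pixel back into source coordinates.
        sy = min(max((oy + 0.5) / factor - 0.5, 0.0), src_h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        out_row = []
        for ox in range(src_w * factor):
            sx = min(max((ox + 0.5) / factor - 0.5, 0.0), src_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            top = rows[y0][x0] * (1 - fx) + rows[y0][x1] * fx
            bottom = rows[y1][x0] * (1 - fx) + rows[y1][x1] * fx
            out_row.append(top * (1 - fy) + bottom * fy)
        out.append(out_row)
    return out


def upscale_threshold(rows, factor=3, threshold=128):
    """Return a 1-bit image (1 = ink) at `factor` times the input resolution."""
    return [[1 if value < threshold else 0 for value in row]
            for row in upscale_bilinear(rows, factor)]


# A tiny diagonal edge: black, grey and white pixels at 300 ppi.
scan = [[0, 128, 255],
        [128, 255, 255]]
for line in upscale_threshold(scan):
    print("".join("#" if bit else "." for bit in line))
```

A production tool would add the edge detection and descreening mentioned above; this sketch covers only the resolution step.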

I mention 900 ppi for several reasons. One of our customers was interested in seeing just how much up-resing was helpful, and where there was a diminishing return. When it came to processing 600 ppi input scans, there was no difference in the photo-micrographs between 900 ppi or higher in the final printed sheet. Of course, if the input scan was higher (in high speed duplex book scanners like the Fujitsu 6670, this typically is not the case)

BTW - page segmentation scanners for companies like Xerox have been doing color copy improvement like this since the 1980s. This idea is not new. I was simply explaining that if you only have a hard copy original, and you need to scan and print it, that there are tools available.

I have been doing prepress for a little while, and was the product marketing manager for AGFA Apogee Series 1. I think I can grasp your "vector is better and here is why!" examples without the need of the images, and since the original post was about "what can I do with this mess I was handed", most of your last post really does not help exify. Your statement that converting scans into PDF resamples them is not actually true, as this is a settings and preference issue: one can use Adobe Acrobat to successfully retain the original resolution of the scans. Adobe does not "always resample contone images".

What can I say, we do this sort of thing all the time, and many of the books that you buy from Amazon were scanned at BookSurge or Lightning Source (who use IoFlex) and they were scanned at 600 ppi.
 
I read the original post and believe I understand the issue. It is not ideal, but it is often workable.
Everything is "workable"! I have seen customers happy with 72 ppi (or less) pictures and with 150 ppi posters made with Photoshop, saved in JPEG low quality and simply printed "as is"...

... but there is difference between workable and quality printing...

Ok, your software can do a half-miracle by extracting only the text and artificially raising the resolution to have enough pixels to use the 1-bit mode... OK... but it is not my way of working: I have already had to deal with text on paper, and I prefer to do a 1200 ppi scan and separate the text from the other stuff myself...
... and when the customer gives me crappy 300 ppi contone scan, I throw him away because I refuse to be involved in such a butcher-job...


(but I am always surprised by this trade, where so many tools and software and options are created simply to fix the mistakes and the bad jobs of incompetent users... instead of trying to teach them good practices, everybody prefers to let them go on with their errors and tries to find solutions to make crap printable... it always amazes me!)



your statement that converting scans into PDF is not actually true, as this is a settings and preference issue
Yes, of course, this is only a settings and preference issue!

... but my statement is true, not because of the technical possibilities of the Distiller, but because of human behaviour: most often, a user who is incompetent enough to scan crappy ink-jet paper for text is not able to change the settings of the default Adobe job options and/or to set the preferences of his/her software correctly... in most cases, that kind of incompetent user reaches his/her limits in creating PDFs by simply managing to click on "Export as PDF" and on one option - hopefully, most often the "Press" option - to output a PDF... and the resolution setting of Adobe's default "Press" option resamples contone pictures to 300 ppi.

Sorry, I'm very negative... but I have been doing this pre-press job for more than 15 years, and you will not believe all the horrors I have seen...



one can use Adobe Acrobat to successfully retain the original resolution of the scans
??? Please explain how Acrobat can recover all the original pixels that have been discarded by the Distiller down-sampling. Generally, whatever has been discarded, especially on another computer, is impossible to recover.


Adobe does not "always resample contone images"
You're right, it's only an option in the settings...

... but the default "Press" option, which is the one most used to easily produce "print-ready" PDFs, always down-samples to 300 ppi all contone pictures that have a resolution above 450 ppi.
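
That rule can be written down as a small function to see which scans would survive such an export. This is a minimal sketch using only the two figures quoted in the post (target 300 ppi, trigger above 450 ppi); the function name is hypothetical and the actual preset values have not been checked here.

```python
def resolution_after_press_export(image_ppi, target_ppi=300, trigger_ppi=450):
    """Resolution of a contone picture after an export that down-samples to
    target_ppi every picture above trigger_ppi. The defaults are the values
    quoted in the post, taken here as an assumption."""
    return target_ppi if image_ppi > trigger_ppi else image_ppi


for scan_ppi in (300, 400, 600, 1200):
    print(scan_ppi, "->", resolution_after_press_export(scan_ppi))
```

Under these figures a 600 ppi scan comes out at 300 ppi, while a 400 ppi scan passes through unchanged, so the resolution found inside the PDF says little about the resolution of the original scan.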
 
Claude72,

Wow.

Okay, please take a deep breath and make that - oooohhhhmmm sound. there. that is better, yes ?

This thread is - as far as I can see - entitled "Fix Scanned PDFs" - which (unless I have lost my mind) means that someone has a paper document, these documents were scanned - and these scans ended up in a PDF.

This means that Adobe Acrobat Distiller & Distiller settings have nothing to do with this.

This also means that "export to PDF" and "Press options" have nothing to do with this either.

If I have a folder full of uncompressed 1 bit, 8 bit and/or 24 bit TIFF files - and open Adobe Acrobat Professional, then go under the File menu and select Open - Acrobat will simply convert the file as is, it will not "ruin" the quality, it will not automatically use JPEG and introduce artifacts. If your resolution was 2453, the image resolution will stay at 2453. It will not downsample. In fact, Distiller and Adobe InDesign can be set up in the same manner - it is all about taking a moment and changing the settings.

So, what can I say Claude72. Just because you have been in prepress 15 years does not somehow mean you are always right or that you fully understand any and all there is to know about PDF files and the tools that make and modify them.

I am 53 years old. I have known PDF since it was just P - PostScript. I have no idea why you keep rambling on about what can't be done - IoFlex customers do this every day and have been doing this for millions of pages.

The fact is that even our most picky customers start with input scans of color text books with fine lines, mathematical formulas and screened halftoned images at 600 ppi and RGB mode - most use JPEG compression settings at 80% quality. That is where they start. What they end up with is a PDF where the images are descreened, the type is 1 bit, the dust speckles are removed, the page is deskewed and aligned front to back, and they either create color separations and make plates and fire up a Heidelberg - or print them on a Kodak Nexpress, an HP Indigo or a Xerox iGen.

I am through here I guess - have a great day and I wish you well. I live in Simi Valley CA, perhaps we might somehow get together somewhere and I can show you our software, or email me and I can arrange for you to try it yourself.

 
This thread is - as far as I can see - entitled "Fix Scanned PDFs" - which (unless I have lost my mind) means that someone has a paper document, these documents were scanned - and these scans ended up in a PDF.

This means that Adobe Acrobat Distiller & Distiller settings have nothing to do with this.

This also means that "export to PDF" and "Press options" have nothing to do with this either.
Perhaps yes, perhaps no: the resolution of the scanned pictures in the PDF matters.



So, what can I say Claude72. Just because you have been in prepress 15 years does not somehow mean you are always right or that you fully understand any and all there is to know about PDF files and the tools that make and modify them.
Yes, you're right... I know... and that was not what I was saying... I never used my 15 years of experience to argue about my knowledge, I just said that in 15 years I saw many horrors...



In fact, Distiller and Adobe InDesign can be set up in the same manner - it is all about taking a moment and changing the settings.
Yes, I know that... again, I'm not completely stupid and I have a little knowledge about PDF.

You're right on the technical subject, but the problem is not with the technology and its possibilities, the problem is with some users and their lack of knowledge: it's easy to change the settings, but to change something, users must first know that it exists and know how to change it...

But it's not the subject...
 
@ Claude72

when you write "the problem is with some users and their lack of knowledge" - I would suggest that this is the case with all of us. We can never be exposed to every situation, nor are we all-knowing. I would like to (again) point out that in the original post, "exify" simply was asking how to work with what he has. Your suggestion was to somehow punish the client - my suggestion was that there are several software applications that might help improve the results. Using PDF Optimizer (as exify mentions) had worked for them in the past, but not so much this time.

here are a few other tools that help convert scanned images into 'print ready' PDF.

IoFlex BookMaker Pro
IoFlex BlackBox
ELAN Proofer Suite (from ELAN GMK)
Kodak SmartBoard Document Mastering Software
Xerox FreeFlow

If you have the originals to scan

Kodak Perfect Page
http://graphics.kodak.com/docimaging/uploadedfiles/en_perfectPage_Web.swf

Kofax VirtualReScan
VirtualReScan

hope this helps
 
Your suggestion was to somehow punish the client
It was a kind of joke... in general, I prefer to teach the customer good practices rather than fix his/her mistakes and let him/her make these mistakes again, and again, and again...



Edit:
when you write "the problem is with some users and their lack of knowledge" - I would suggest that this is the case with all of us.
Yes, you're right... and that's my case with understanding and writing the English language...: I'm not competent enough in English to have a discussion with native English speakers...: I don't understand everything you write, I miss some of your expressions, and I don't manage to say exactly what I want... Sorry. Bye.
 
and when the customer gives me crappy 300 ppi contone scan, I throw him away because I refuse to be involved in such a butcher-job...

(but I am always surprised by this job where so many tools and softwares and options are simply created to fix the mistakes and the bad jobs of incompetent users... instead of trying to teach them the good practices, everybody prefer to leave them go on with their errors and try to find solutions to make crap become printable... it always amazes me!)

Throw customers away? Bad business if you ask me. Look, not everyone who puts a job together is a graphic artist or pre-press professional. They are more than likely a personal assistant or office worker who was given the task of creating a newsletter or flyer. They have neither the time to learn the correct software nor, for that matter, the money to buy it. My job is to get it to look the best it can in any given situation for the customer (who I might add is paying the bill AND keeping me in work) and get the job out the door and THEN go back and make suggestions for the next time. I don't want to bounce a job back and forth for 4 or 5 days with the customer so they miss their deadline. I'd rather be the "savior" and get the job done and out the door. Period. Why bicker about something that is NEVER going to change as long as customers try to be designers? I agree that the quality of printed work is turning to crap, and has been for a long time, but it is what it is and it's not going to change.
 
Nice pissing match here but...try the OCR functionality built into Acrobat. It might just help the original poster.
 
can't even OCR it... was scanned as a TIF and converted to PDF... or at least that's what I assume from the customer's "i did the thing and it said this so i made it do that and boom PDF" conversation.
 
