How to locate italics in PDF files?

Prepper

Does anyone know of a way to locate all the italics in a PDF? We have a large number of legacy files that we're converting to digital: we OCR them, save them as .rtf, place them into InDesign and format them. This works well, but all the formatting has to be reapplied to prep the files for importing into our system, and the most time-consuming part is manually scanning every page for italics and applying them back to the text. The italics are scattered throughout, so you have to look closely, and that takes a lot of time.

Or any other ideas? It may be possible to locate the italics in the .rtf files somehow, but those aren't the originals; the originals are PDFs in this case. For info's sake, these are old PageMaker files. We were able to export most of them to .rtf and convert them pretty straightforwardly, but some wouldn't export that way because of font issues. We're working with many languages, Russian being this particular one.

Thanks for any input
 
There is no simple way of locating (i.e., searching for) italicized text in a PDF file. Unlike an editable document (such as a Word document), PDF doesn't have attributes such as italic, bold, etc. associated with text. For that matter, unless you have a tagged PDF file, there isn't even any information about the document's logical structure in the PDF either!

Conceivably, one could write software that scans the PDF file and looks for text set in fonts known to be italic faces (the font definition's header might carry an italic attribute, and/or the word “italic” might appear in the font's name, but neither is guaranteed). This would be a very non-trivial task! I know of no such existing application (at least not yet :( ).

- Dov
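The approach Dov describes could be sketched roughly as follows. This is a minimal sketch, not an existing tool: it assumes the text runs have already been pulled out of the PDF by some extraction library, together with each run's font name and, where present, the `/Flags` and `/ItalicAngle` entries of the font descriptor. The `TextRun` type, the name patterns and the function names are all hypothetical. The Italic flag (bit 7, value 64) and the subset-tag convention (`ABCDEF+FontName`) come from the PDF specification.

```python
import re
from typing import List, NamedTuple, Optional

# Italic is bit 7 of the font-descriptor /Flags entry (bit positions are
# 1-based in the PDF specification), i.e. the value 1 << 6 == 64.
ITALIC_FLAG = 1 << 6

# Hypothetical name heuristics: common words and abbreviations found in the
# names of italic or oblique faces (e.g. "Times-Italic", "MinionPro-It").
ITALIC_WORD_RE = re.compile(r"italic|oblique", re.IGNORECASE)
ITALIC_ABBREV_RE = re.compile(r"[-,](?:Bold|Semibold)?It$")


class TextRun(NamedTuple):
    """One run of extracted text plus whatever font data the extractor exposed."""
    page: int
    base_font: str
    text: str
    flags: Optional[int] = None           # /Flags from the FontDescriptor
    italic_angle: Optional[float] = None  # /ItalicAngle from the FontDescriptor


def strip_subset_prefix(base_font: str) -> str:
    """Remove a subset tag such as 'ABCDEF+' from an embedded font name."""
    return re.sub(r"^[A-Z]{6}\+", "", base_font)


def looks_italic(base_font: str, flags: Optional[int] = None,
                 italic_angle: Optional[float] = None) -> bool:
    """Return True if any available clue suggests an italic face.

    None of these clues is guaranteed to be present or correct.
    """
    if flags is not None and flags & ITALIC_FLAG:
        return True
    if italic_angle:  # a non-zero angle means slanted vertical strokes
        return True
    name = strip_subset_prefix(base_font)
    return bool(ITALIC_WORD_RE.search(name) or ITALIC_ABBREV_RE.search(name))


def find_italic_runs(runs: List[TextRun]) -> List[TextRun]:
    """Keep only the runs whose font looks italic."""
    return [r for r in runs
            if looks_italic(r.base_font, r.flags, r.italic_angle)]
```

Because any of these clues can be missing or wrong, a list produced this way still needs a visual check, but it narrows the search to the flagged runs and their page numbers instead of every line.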
 
Enfocus PitStop Pro to the rescue, with its support for finding fonts using a regular expression:

[Screenshot: props.png]

[Screenshot: found.png]

Stephen Marsh
 
OK, thanks. I did find a workaround of sorts: if I export to Word from Acrobat, I can search for italic formatting and locate it that way. Even though the fonts aren't right in our case, it still finds the italics without my having to scan every line visually.
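The Word-export workaround can also be scripted. A .docx file is a ZIP archive whose main text lives in `word/document.xml`, and directly applied italics appear as a `<w:i/>` element inside a run's properties (`<w:rPr>`). The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes Acrobat's export writes italics as direct run formatting, so italics that come only from Word paragraph or character styles are not detected.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# w:val values that switch an on/off property such as <w:i> off
OFF_VALUES = {"0", "false", "off"}


def run_is_italic(run):
    """Return True if a <w:r> run has direct italic formatting (<w:i/>)."""
    rpr = run.find(f"{W}rPr")
    if rpr is None:
        return False
    italic = rpr.find(f"{W}i")
    if italic is None:
        return False
    # <w:i/> with no w:val means "on"; w:val="0"/"false"/"off" means "off"
    return italic.get(f"{W}val", "true").lower() not in OFF_VALUES


def find_italic_text(source):
    """Return the text of every directly italicised run, in document order.

    `source` may be a path, a binary file object, or the raw .docx bytes.
    Italics that come only from paragraph or character styles are missed.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    results = []
    for run in root.iter(f"{W}r"):
        if run_is_italic(run):
            text = "".join(t.text or "" for t in run.iter(f"{W}t"))
            if text:
                results.append(text)
    return results
```

For example, `find_italic_text("converted.docx")` (a hypothetical file name) returns the italic fragments in reading order. Word often splits one italic phrase across several runs, so neighbouring fragments in the result may belong to the same phrase.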
 
I found a way to do it in Acrobat Pro. You can create a new Check (in the Preflight panel) using the "Font is Italic" property. Beware: when you add the property to define the check, the default setting is "is not true", which is the opposite of what we're looking for. You have to set it to "is true" to find italicised text.

Actually there is a lot more; we can look for small caps, serif and sans serif fonts etc. Wonderful!