Indesign Data Merge Question

gg2017 · Jan 7, 2010

I am importing a list of names into InDesign using data merge. The Excel file has names in Roman characters, but some of them also have names or titles in Chinese characters. The problem is that I have to import 2,000-3,000 names every week, so searching for and fixing the ones with Chinese characters by hand isn't impossible, just time-consuming. The fields in Excel aren't separated into Chinese and Roman; it's all in one name field. When I import it into InDesign, the font I'm using is Futura: the Roman characters come through fine but the Chinese characters show as boxes. Is there a relatively easy way to fix this, either by creating a new mixed font called Futura-Chi, or can I somehow fix this with Find/Change? Find Font doesn't work because the font is loaded... it's just that the characters aren't available, so it doesn't show any problems. Any help is appreciated.

Stephen Marsh · Jan 7, 2010

Hmmm....

What font is the original Excel file using? I presume that it is an OpenType font?

The Futura that you are using, I presume, is not an OT font but a TT or PS Type 1 font?

What happens if you use the same font in INDD as in Excel?

Still thinking on this...

[EDIT] Or is the problem when you save out to .CSV and the Chinese characters are lost there?

Stephen Marsh

gg2017 · Jan 8, 2010

The Excel file is using Arial for the names and a Chinese font for whatever the Chinese part of the name is. I am saving the file out as a UTF-16 Unicode file because there are lots of funky characters in it for Polish names, German names, etc.
That file looks good in TextWrangler; all the characters (Chinese, German and so forth) show correctly. It's just that when I data merge the .txt file, the font in InDesign is Futura, so the Chinese characters aren't merging right since Futura has no matching glyphs for them.

Thanks for helping with this.

Stephen Marsh · Jan 8, 2010

Hmm, I am running out of ideas.

I thought that this may just be a simple font issue, with the same font used for both Western and Chinese characters, as some OT fonts contain both character sets. If the Chinese uses a different font, then I am guessing that you need to bring the character or paragraph style into InDesign so that you can map it to a style and font in InDesign that has Chinese characters.

You say that the Chinese looks fine in TextWrangler...what if you open the same file into TextEdit...does it display incorrectly as InDesign does? Is this a rich text vs. plain text issue? Why does TextWrangler work correctly and why does InDesign fail?

There may be some macro or other way to isolate the Chinese text in MS Excel, however that is beyond my knowledge.

I personally would see if the client could send you a separate database that only has the Chinese names; then you could set up a separate master page and data merge with the correct font.

You would obviously like to automate this task. The best InDesign/Automation source that I know of is here, join up and ask your question (email list):

Browse - InDesign Talk

Please let the forum know what the solution is when you find it!

Good luck,

Stephen Marsh

kyle · Jan 8, 2010

You should be able to do this with InDesign's Find/Change process.

With all of the text in Futura, select Edit > Find/Change.... In the Find/Change window, switch to the "GREP" tab. In the "Find what:" field, enter "[^[:space:][:print:]]" (without the quotation marks), then click the magnifying glass icon to the right of "Change Format:" and select the font for the Chinese characters.

This should change all characters that are not normal printable characters to the other typeface.

The outer enclosing brackets define a list of things to match, and the caret inverts the match (it selects everything that isn't listed). "[:print:]" means all printable characters including white space (which should also cover carriage returns, but doesn't), and "[:space:]" means whitespace characters, which will capture the carriage returns.
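
The same idea can be tried outside InDesign in Python. Python's `re` module has no POSIX classes such as `[:print:]`, so this sketch uses an ASCII-only stand-in (whitespace plus printable ASCII, U+0020 to U+007E). It is an approximation of the pattern's intent, not a statement of how InDesign's own `[[:print:]]` class behaves. Under this approximation the accented letters in Polish or German names match as well; the narrower CJK range shown as an alternative is an assumption, not part of the suggestion above.

```python
import re

# ASCII-only stand-in for "[^[:space:][:print:]]": match runs of characters
# that are neither whitespace nor printable ASCII (U+0020..U+007E).
# This approximates the idea; it is not InDesign's exact [:print:] class.
NON_ASCII_PRINTABLE = re.compile(r"[^\s\x20-\x7E]+")

# Narrower alternative (an assumption, not from the thread): only the
# CJK Unified Ideographs block, U+4E00..U+9FFF.
CJK_IDEOGRAPHS = re.compile(r"[\u4e00-\u9fff]+")


def find_runs(pattern: re.Pattern, text: str) -> list:
    """Return (start, end, matched_text) for every run the pattern matches."""
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


sample = "Wang Wei 王伟, Łukasz Müller"
print(find_runs(NON_ASCII_PRINTABLE, sample))  # Chinese run plus Ł and ü
print(find_runs(CJK_IDEOGRAPHS, sample))       # Chinese run only
```

In InDesign the matched runs would get the Chinese font via "Change Format:"; here they are only reported with their positions.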

Stephen Marsh · Jan 8, 2010

Kyle, thanks for bringing up GREP. It has a lot of power, but it is not easy to learn! The InDesign email list that I linked earlier has a few GREP experts on it, and I have tried to follow their posts without much success; GREP syntax is hard going for the casual user.

That being said, I don't think that the Chinese font is in the InDesign file, nor is it in the data merge source .csv file (unless UTF-16 includes font metadata?). The font data is in the original Excel file, but this metadata would be lost when exporting to .csv or tab-delimited text. Or am I wrong in this regard? There are many encoding options (UTF-8, Unicode, etc.). The data merge source file is just a simple text file in .csv format (or perhaps with tabs separating the fields), exported from Excel. Please correct me if I am wrong: the plain text file does not contain any font information, just ASCII text.

When I have performed data merges, the .csv file did not bring in any font or other information, just the raw text. Perhaps this is because I generally export out a .csv in Unicode format from NeoOffice. I have had data merge character translation issues in CS3 when using UTF formats. The font info is in the InDesign master page items that are linked to the data merge document.

I am looking to learn something new here too, thanks!

Stephen Marsh

kyle · Jan 8, 2010

Stephen,

In Unicode, there is not a one-to-one relationship between bytes in a file and characters (except that in UTF-8 encoding the standard 128 ASCII characters map directly to the same bytes they occupy in 8-bit ASCII). This allows for far more than 256 standard code points: there are currently more than 100,000 defined, and the specification allows for more than 1,000,000. A capital Eth in Icelandic (Ð) is always code point 00D0 (hexadecimal), and is stored as the bytes 00 D0 in UTF-16 (big-endian) and C3 90 in UTF-8. The different encodings (UTF-8, UTF-16, and others) are just different ways of representing the same Unicode characters as data. If text is mostly Chinese with a small amount of English, UTF-16 encoding would be more compact, whereas text that is mostly English with a little bit of Chinese would take up less space stored as UTF-8.
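
A short Python sketch makes the encoding point concrete: the same code point U+00D0 (Ð) becomes different bytes under UTF-8 and UTF-16, and the byte counts shift with the mix of scripts. The `utf-16-be` and `utf-16-le` codecs are used so that no byte-order mark is counted; files saved as "UTF-16" by applications typically begin with one.

```python
# One code point, two encodings: U+00D0 LATIN CAPITAL LETTER ETH ("Ð").
eth = "\u00d0"
utf8 = eth.encode("utf-8")        # b"\xc3\x90"
utf16 = eth.encode("utf-16-be")   # b"\x00\xd0" (big-endian, no byte-order mark)


def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count), ignoring any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


print(utf8.hex(), utf16.hex())       # c390 00d0
print(encoded_sizes("王伟"))          # (6, 4): Chinese is smaller in UTF-16
print(encoded_sizes("Wang Wei"))     # (8, 16): English is smaller in UTF-8
```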

Before Unicode and multibyte fonts, non-ASCII characters would be mapped often arbitrarily to different code points in a font, creating a sort of cipher text (like how the Wingdings font replaces "Y" with the Star of David, and Zapf Dingbats puts the same character in place of "A"). Fonts mapped in this way were often interdependent with the text that used them - the text made no sense unless rendered with that particular font.

With Unicode, there is no ambiguity about the intended character represented by a code, and the only concern between one font and the next (other than aesthetic considerations) is whether one or the other contains a glyph for all of the characters to be used. If the Unicode specification had existed from the beginning of time, Wingdings and other "pi" fonts would probably be empty in the normal ASCII range, because there is a defined Unicode code for "smiley face," "skull & cross-bones," and "ridiculous-looking ornamental detritus."
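
To illustrate that a Unicode code has one fixed meaning regardless of font, Python's standard `unicodedata` module can look up the official name of a code point. The characters below are just examples; `describe` is an illustrative helper, not an existing API.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and standard Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The same code always names the same character, whatever font renders it.
for ch in ["\u263a", "\u2620", "\u2721", "\u00d0", "\u738b"]:
    print(describe(ch))
```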

The data is probably stored by Excel natively as Unicode, and when the file is saved as UTF-16 text (not CSV - all non-ASCII characters would change to underscores), the raw text is preserved as English and Chinese in Unicode, without any font information at all.
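
A sketch of that round trip in Python, assuming a hypothetical one-column name list: writing it as UTF-16 text and reading it back returns exactly the same characters and nothing else, since plain text has nowhere to store font names. Forcing the same text into ASCII shows the loss; Python substitutes "?" for each character it can't encode, which may differ from what Excel's own export does.

```python
import os
import tempfile

# Hypothetical one-column merge file: a header row plus two names.
text = "Name\nWang Wei 王伟\nŁukasz Müller\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.txt")
    # Python's "utf-16" codec writes a byte-order mark, then the text.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)
    with open(path, "rb") as f:
        bom = f.read(2)
    with open(path, encoding="utf-16", newline="") as f:
        round_trip = f.read()

# Only characters come back; there is no place for font names or styles.
print(round_trip == text)  # True

# Squeezing the same text into ASCII loses every non-ASCII character.
ascii_only = text.encode("ascii", errors="replace").decode("ascii")
print(ascii_only)
```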

The poster's file probably looks okay in one application because it is showing all of the text in a font that contains all necessary glyphs, or using a different font for just the Chinese characters because it knows the default font is missing glyphs for them.

In InDesign, it sounds like everything is in Futura, which lacks glyphs for the Chinese characters and probably shows them as squares. If the poster changes just these characters to another font, they can keep Futura for all of the English and display the Chinese in another font.
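
The split described here can be sketched in Python: group consecutive characters by whether the Latin font can draw them, and assign each run a font. Both font names are placeholders, and `latin_font_has_glyph` is a hypothetical coverage test (anything below U+0180); a real check would consult the font's character map.

```python
from itertools import groupby

# Placeholder font names; substitute whatever fonts are actually installed.
LATIN_FONT = "Futura"
CJK_FONT = "SomeChineseFont"  # hypothetical, not a real font name


def latin_font_has_glyph(ch: str) -> bool:
    """Assumed coverage for the Latin font: code points below U+0180
    (Basic Latin, Latin-1 Supplement, Latin Extended-A). A real check
    would read the font's character map instead."""
    return ord(ch) < 0x0180


def font_runs(text: str) -> list:
    """Split text into (font, run) pairs, grouping consecutive characters
    that the same font should render."""
    return [
        (LATIN_FONT if covered else CJK_FONT, "".join(chars))
        for covered, chars in groupby(text, key=latin_font_has_glyph)
    ]


print(font_runs("Wang Wei 王伟"))
```

Spaces stay with the Latin font here, so a space between two Chinese runs would split them; that is a simplification of this sketch.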

If you are using OS X, run Terminal.app from Applications/Utilities, then type "man grep" and press Return, and you can read the manual for the Unix program "grep," from which InDesign's find/change option gets its name and copies its syntax. The manual is probably available online also.

Stephen Marsh · Jan 9, 2010

Kyle, firstly, my sincere thanks for your time and effort in your posts, it is greatly appreciated!

The OS X Terminal grep manual is also noted with thanks!

I am not sitting in front of InDesign while I write these posts, so I am flying blind. Perhaps I am over thinking the situation or I am missing something basic in what I think I understand (very possible!).

From my experience with data merge, fonts in the database file are not recognised; the data just comes in as "characters" that take on the text and font formatting used on the master page. So I would imagine that there is only one font, and I can't visualise how the OP would use GREP to search for the Chinese font, as there would only be Futura in the document and the Chinese font would not be listed (or am I wrong in presuming this?).

There are many names in a single column of the database, and the Chinese names are scattered randomly among thousands of entries. In such a situation, handling these exceptions manually would not be productive.

With luck the original poster can try your suggestions and comment and/or supply sample files. I would really like the OP to get back to us all here with an answer when the issue is finally resolved.

Thanks again for the conversation!

Stephen Marsh

kyle · Jan 9, 2010

Stephen,

You are correct that the OP could not search for a different font because everything is likely in Futura. The find/change process I suggested does not do this, however. Rather, it searches for all characters that are not normal ASCII or white-space characters, regardless of what font they are set in, and changes the font used for them to another font that contains the necessary glyphs to show the Chinese characters. It changes one property of the matching text (the font) using an independent property to determine what matches (the characters in the text), sort of like finding all text that is red and changing its point size.

Similarly, strictly staying within the bounds of ASCII, you could have a document that is all English text that is all set in Helvetica, and change all vowels to Garamond if you wanted to go for that elegant ransom note effect. That's the same thing the process I suggested would do, only it is targeted specifically at the characters that Futura is not able to display.

When the document is first merged, all of the Chinese characters probably show up as squares because Futura lacks the glyphs for them. They are still there, however - just not plainly visible. If the text were then changed to a font containing the necessary glyphs, the Chinese text would appear, but the English might not look good because many Asian language fonts don't have pleasant-looking roman letters, or could be missing them altogether. The find/change process would divide the text into two groups - text that can be displayed in Futura and text that must be displayed in another font - and change the latter to another font, leaving Futura set for text that it can show.

Stephen Marsh · Jan 9, 2010

Kyle, thanks again for your patience and excellent explanations - I finally see where you are coming from!

As mentioned, I don't know much about grep - only enough to know that it has the answers to many problems in InDesign...if only one knows how to use it.

I will give your grep search a go in CS3 when I return to work (from what I understand, CS4 has improved grep over the options in CS3).

I have a very small list of grep links, if anybody has more links to add they would be appreciated!

Stephen Marsh

Adobe CS: Awesome Find/Change GREP tip for InDesign CS3 :: Tech Videos, Screencasts, Webinars, Techtalks, Tutorials

Free InDesign Tutorial | Using GREP styles changes | Layers Magazine

Lightning Brain GREPGrokker

Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns

"Grep query" thread from InDesign Talk

EDIT (NEW LINK)

http://www.night-ray.com/regex.pdf

Stephen Marsh · Jan 10, 2010

Kyle, I tried the grep string in CS3, it comes back with an error that a match can't be found - a screenshot is attached below.

Stephen Marsh

Stephen Marsh · Jan 10, 2010

Attached is a CS3 .zip archive containing the INDD file before and after running the data merge, including the Unicode UTF-16 .csv database file.

Stephen Marsh

kyle · Jan 11, 2010

Stephen,

You need to add a colon before "print."

gg2017 · Jan 11, 2010

Kyle... thanks for posting about this. I did try to use GREP before I posted, but I didn't know what the hell I was doing. You are correct about my file setup. I will report back tomorrow after I get back to work and try this new information. The client said I could just delete the Chinese characters for this run, so that's what I did to get the job done Friday. I'll try this and see how it goes on following orders. Thanks again for all the help to both of you.

Schnitzel · Jan 11, 2010

I had the same problem as you, when producing variable data jobs with mixed Hebrew and English characters.
To phrase it simply: I wanted InDesign to use a specific font for the Hebrew text, and another for the English text. Neither InDesign nor my VDP plugin (InData) had a way to set this preference.

As you suspected, I finally opened the Hebrew font in FontLab, pasted the English font's glyphs at the appropriate character range, saved it as a new OpenType font, and voilà! Using this single new font, InDesign automatically takes the correct glyphs for every language in the text. Maybe the open-source FontForge editor will work too.

By the way, I have no idea whether copying/pasting the glyphs violates the license agreements of the two fonts, but I couldn't find a better way to solve this, and bottom line, I am using the fonts as they were; it's just a small hack behind the scenes to make life easier...

Stephen Marsh · Jan 11, 2010

Thanks Kyle, I obviously missed that colon, cheers - it all works correctly now!

Stephen Marsh
