Ok,
I did not have a "pstopdf" process during the imposition process, but I had a process called "Runnorm".
InDesign was claiming 80-95%.
Runnorm was at 99.6%.
The interesting note is that my processor idle was around 71-72% during both processes. I've observed this before and wondered why it seemed to be running at only about 25%; on a 4-core machine, a single saturated core shows up as roughly 25% of total capacity. Also, I checked top in a Terminal window and did not see the "pstopdf" process there either.
Ok, my versions are older, and the ability to use more than one core hasn't been consistent with newer versions of XMPIE. I will do a few tests, but this is what works under CS3/10.4.11/XMPIE 3.5.
PSTOPDF is the process name under XMPIE3.5. I believe the XMPIE4 plug-in calls it RUNNORM.
If you create additional OSX user accounts, you can run more than one InDesign. With a 4-core machine, you can theoretically run 4 instances of InDesign (one in each user) and max out all the cores. Even though InDesign isn't capable of using more than 100% of a CPU in a single instance, you can use multiple users to get processing on the unused cores.
There's a problem, though. When all the users on a machine are running the same copy of InDesign (/MacHD/Applications/..), the InDesign process itself is capped at 100% even across multiple instances. That is, in one user you launch ID and start running a job (almost 100%); then you switch to another user and run another job, and both InDesign processes drop to 50% each (100% total). Run a third user and each one is at 33%, and so on. The Runnorm processes are spawned independently and can each go to 100%, but InDesign itself is capped. Apparently there's some sort of shared printing resource at the system level. So while using multiple instances of Runnorm is helpful, it's only half of the job, and InDesign is still a bottleneck.
The trick is to copy InDesign to the local user directory, where the other users cannot see it. Sandboxing it to /User/Applications/ means the process and all its resources run independently. Each user runs its own copy of InDesign and can go up to 100%. This probably violates the end-user agreement if you are using the same serial number over multiple users, but it's still on one machine. Think of it as crude virtualization. This didn't work so well under XMPIE 4.6, so your mileage may vary. I plan on revisiting this soon.
Once each user is running their own local InDesign, you will see each user's process get up to 100% of a CPU. There's some overhead, though. While composing records, InDesign's GUI shows you a status bar, which eats up 3-5% of the user's max (in a process called WindowServer). That's why the InDesign process utilization is a bit lower than Runnorm's. For some reason, InDesign+WindowServer can't exceed 100%. That overhead adds up, and if you run 4 InDesigns across 4 users, you will probably see WindowServer rise to 15-30% and cut each InDesign process down to 80-85%. Once Runnorm is spawned and takes over, WindowServer and InDesign drop to almost 0% and Runnorm hovers closer to 100% (gaining a bit of CPU because there's no GUI activity during this phase).
It may sound a bit complicated, but once you get used to the workflow, it really takes on a natural rhythm. Typically, we will do something like this when faced with a large processing job:
1. Job is 32,000 records. I do some testing to find a quantity that works for bindery and the digital press operator, and gets me a good record-per-hour speed. There's a point of diminishing returns with batch quantity and XMPIE. In my experience, 4000 records is a comfortable set. Some testing may be in order. You might find that a batch of 2000 is done in 5 minutes, 4000 in 15 minutes, 6000 in 30 minutes, and 8000 takes an hour. That means 2K batches yield 24Krph, 4K=16Krph, 6K=12Krph, and 8K=8Krph. Even though the 2K batch is the fastest, 5 minutes is just too short, so I'd settle for 4K batches. Testing is important, and you should weigh batch size against what works in your situation. Like I said, 4K is a nice batch for the work I do.
2. Link your ID file to the run CSV. Save and close.
3. Dupe your InDesign file, name it RUN-00-04, and make more copies, called RUN-04-08, RUN-08-12, etc. For a 32K file divided into 4K batches, you'd get 8 files.
4. Open RUN-00-04 and dynamic-print records 1-4000.
5. Change users, open the next file (RUN-04-08), and print records 4001-8000. I find it useful to take notes so I know where I am. Being linked to the same CSV doesn't cause problems.
6. Change to another user, open the next file (RUN-08-12), and print records 8001-12000. You make multiple files because the other files are busy until you close them.
7. Repeat for all users until you've pegged all the cores.
8. Knowing how long it takes, come back in 15 minutes and start another batch for all users.
9. Double-check your work by opening the PDFs and making sure the start/stop records and page counts are correct.
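The bookkeeping in steps 3-6 can be sketched in a few lines. This assumes the RUN-00-04 naming convention from the text and generates the file names and 1-based print ranges for a 32K job in 4K batches (adjust the totals to your job):

```python
# Derive duplicate-file names and dynamic-print ranges for batching.
# Naming follows the RUN-<start>K-<end>K convention from the text.
total_records = 32_000
batch_size = 4_000

batches = []
for start in range(0, total_records, batch_size):
    end = start + batch_size
    name = f"RUN-{start // 1000:02d}-{end // 1000:02d}"
    batches.append((name, start + 1, end))  # print ranges are 1-based

for name, first, last in batches:
    print(f"{name}: print records {first}-{last}")
```

For 32,000 records this yields 8 files, RUN-00-04 through RUN-28-32, matching the example in step 3. Taking notes (step 5) amounts to checking off each line of this list as a user picks it up.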
I know that sounds like a lot of manual work, but it's not bad. You wind up cranking out a lot more pages in the same amount of time. In the above example (a real job, actually), you get 64Krph instead of 16Krph. More cores would get you more performance. For 32K records, I'd be done in about 30 minutes.
It's important to note that parallel processing speeds aren't linear. That is, just because you can get an effective rate of 600Krph doesn't mean you can do 100K records in 10 minutes. Like carpooling in a sedan: just because you can move 4 people 60 miles in one hour doesn't mean you can get one person 60 miles in 15 minutes. It's a parallel process, and running things simultaneously is different from overclocking.
Fonts can be an issue if you try to control them on a user-level. I try to use global fonts by putting them in /Library/Fonts. Be very careful here: all users must use the same fonts for consistent output.
Also, this works for my type of work. My run lengths are in the 10K-25K range, and the file complexity is 10-20 fields, layer visibility, plus an address block. Performance varies, but we don't typically throw multiple users at a job unless the thing's going to take one user more than 30 minutes. It depends on workload. The resulting files are also easy to split across our different output devices (two Digimasters). In another company, this might not make much sense. We developed this system within our workflow to address what we needed from it. I've never tried to use ARD with this, but since I can generate so many pages per hour, I don't have much need to.
Again, you may have problems with newer versions of InDesign or XMPIE. Earlier testing indicated inconsistent results, and sometimes the parallel processing didn't work. I'm revisiting the issue because we've got a new 4-way Mac Pro and CS5 upgrades. It shows 8 virtual cores, and I'm curious to see if I can get the same performance from a virtual core as from a real one. I can't seem to get this right under 10.6.3 either. If I can get linearity back, I'd love to get my hands on a new 12-way Mac Pro, which hyperthreads into 24 virtual cores. We never got this to work under Windows either (using more than one core).