Open-source SVG-based data merge engine (Inkscape). Looking for VDP/prepress feedback

xoel

Hello.

I’m working on an open-source data merge engine built on top of Inkscape, focused on flexible, SVG-based templating for variable graphics.

The system is already functional and supports:

  • Template-driven batch generation from structured data (CSV / Google Sheets)
  • Precise positioning and transformations of elements.
  • Dynamic text handling (including variable-length content, inline icons/images)
  • Use of external sources (e.g. iconify, wikimedia, pixabay, openfreemap...)
  • Generation of large sets of unique outputs from a single template
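
As a rough illustration of the template-driven idea in the list above, here is a minimal sketch using only the Python standard library. It is not the engine's actual code: the `${name}` / `${title}` placeholders, the template, and the `merge` function are hypothetical names made up for this example.

```python
import csv
import io
from string import Template
from xml.sax.saxutils import escape

# Hypothetical SVG template: ${name} and ${title} are placeholder names
# chosen for this sketch, not the project's actual syntax.
SVG_TEMPLATE = Template(
    '<svg xmlns="http://www.w3.org/2000/svg" width="85mm" height="55mm">'
    '<text x="10" y="20">${name}</text>'
    '<text x="10" y="40">${title}</text>'
    '</svg>'
)


def merge(csv_text: str) -> list[str]:
    """Return one SVG document per CSV row, escaping XML-special characters."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        SVG_TEMPLATE.substitute({key: escape(value) for key, value in row.items()})
        for row in rows
    ]


DATA = "name,title\nAda,Engineer\nBob & Co,Sales\n"
documents = merge(DATA)
```

Escaping each value matters because a raw `&` or `<` in the data would otherwise produce an invalid SVG file.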

I’m now exploring how far this approach could be pushed towards more production-oriented VDP workflows.

Some of the areas I’m considering next:

  • Preset layouts for labels and cards (e.g. Avery-style sheets)
  • Barcode and QR code generation (likely via bwip-js or similar)
  • PDF/X output using Ghostscript for better prepress compatibility
  • Basic imposition workflows for print-ready sheets
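
To make the preset-layout and imposition items more concrete, the sketch below maps a record index to a sheet number and a position on a label grid. The `SheetLayout` class is hypothetical, and the geometry is the commonly published one for a 30-up US Letter address sheet; it is an assumption that should be checked against the manufacturer's template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SheetLayout:
    """Label-sheet geometry in inches (all values are assumptions for this sketch)."""
    cols: int
    rows: int
    left_margin: float
    top_margin: float
    pitch_x: float  # distance between the left edges of adjacent columns
    pitch_y: float  # distance between the top edges of adjacent rows

    @property
    def per_sheet(self) -> int:
        return self.cols * self.rows

    def position(self, record_index: int) -> tuple[int, float, float]:
        """Map a zero-based record index to (sheet number, x, y), filling rows first."""
        sheet, slot = divmod(record_index, self.per_sheet)
        row, col = divmod(slot, self.cols)
        x = self.left_margin + col * self.pitch_x
        y = self.top_margin + row * self.pitch_y
        return sheet, x, y


# Commonly published geometry for a 30-up US Letter address label sheet;
# verify against the manufacturer's template before relying on it.
LETTER_30UP = SheetLayout(cols=3, rows=10, left_margin=0.1875,
                          top_margin=0.5, pitch_x=2.75, pitch_y=1.0)
```

For example, `LETTER_30UP.position(30)` lands on the first label of the second sheet.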

Since my background is more on the development side, I’d really value input from people experienced in VDP / prepress:

- At what point would something like this become usable in a real production workflow?

- What are the critical gaps you’d expect (color management, PDF consistency, font embedding, etc.)?

- Is an SVG-based templating approach viable long-term, or are there fundamental limitations?

- Any standards or best practices I should prioritize early on?


The goal is not to replace existing VDP systems, but to understand where a lightweight, open-source approach could realistically fit.

Thanks in advance!
 
Although not my area (large-format printing and textiles), from what I know, you have to export to PDF for people to use your program. SVG for printing probably has two big problems: color management (recent changes) and support from RIPs. Just my two cents.
 
Thanks, that’s very helpful.

Yes, my idea is not to rely on SVG for final output, but to use it as an internal templating format and then export to PDF for production.

I’m currently exploring a Ghostscript-based step for PDF/X output to improve compatibility with RIP workflows. (My experience in this area is very limited; I had to look up what a RIP is.)
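
For anyone curious what such a step might look like, here is a sketch that only builds the command line and does not run it. The flags follow the PDF/X example in the Ghostscript documentation; the file names and the `pdfx_command` helper are hypothetical. The `PDFX_def.ps` definition file that ships with Ghostscript has to be edited to point at an ICC output profile, and newer Ghostscript versions may need extra options that are not shown here.

```python
import shlex


def pdfx_command(source: str, output: str,
                 definition_file: str = "PDFX_def.ps") -> list[str]:
    """Build (but do not run) a Ghostscript argv list for a PDF/X conversion."""
    return [
        "gs",
        "-dPDFX",                          # request PDF/X output from pdfwrite
        "-dBATCH", "-dNOPAUSE",            # non-interactive run
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=CMYK",  # convert colors to CMYK
        f"-sOutputFile={output}",
        definition_file,                   # listed before the input file
        source,
    ]


# Hypothetical file names, for illustration only.
command = pdfx_command("merged.pdf", "merged-x.pdf")
print(shlex.join(command))
```

The list could be passed to `subprocess.run(command, check=True)` once the definition file and paths are in place.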

Good to know that SVG support itself would be a limitation in practice.
 
Since my background is more on the development side, I’d really value input from people experienced in VDP / prepress:

- At what point would something like this become usable in a real production workflow?
I don't see an example of a use case here.
Exactly WHAT are you planning to output? Just VDP products?

- What are the critical gaps you’d expect (color management, PDF consistency, font embedding, etc.)?
Inkscape is a bitmap application so Fonts should be no issue as they will disappear at output.
File size for bitmap items - 4x4 is 16 times larger than 1x1 - so your final output product size multiplies the file size.
300 dpi minimum but we routinely use 1200 dpi. One single letter (A4) size page at 1200 dpi is 30+mb.
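
The scaling for raster pages can be checked with a quick calculation. This is only a sketch: it assumes a US Letter page, CMYK at 8 bits per channel, and no compression, so real files will usually be smaller than these figures.

```python
def raster_megabytes(width_in: float, height_in: float, dpi: int,
                     channels: int = 4, bits_per_channel: int = 8) -> float:
    """Uncompressed raster size in megabytes (10**6 bytes) for one page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * channels * bits_per_channel / 8 / 1_000_000


# US Letter page as CMYK, 8 bits per channel (assumptions for this sketch).
at_300 = raster_megabytes(8.5, 11, 300)
at_1200 = raster_megabytes(8.5, 11, 1200)
ratio = at_1200 / at_300  # 4x the resolution in each direction
```

Quadrupling the resolution multiplies the pixel count, and therefore the uncompressed size, by 16.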
- Is an SVG-based templating approach viable long-term, or are there fundamental limitations?
Fundamentally bitmap is the issue.
- Any standards or best practices I should prioritize early on?
Ask what works - first - and understand the difference between bitmap and vector objects. You can use image replacement options for the RIP in VDP output to lower the overall multipage file sizes but if you make them SVG and they are unique you have to send individual pages so BIG files or lower quality.
The goal is not to replace existing VDP systems, but to understand where a lightweight, open-source approach could realistically fit.
Programmatically Inkscape is useful. How large is your output file when you send 1,000 pages of 5x7 postcards with bleed and variable text at the minimum 300 dpi? I can send a file that is tens of MB if I use PDF with vector fonts and type, and most of that file size is PDF overhead. YMMV
Thanks in advance!
 
I don't see an example of a use case here.
Exactly WHAT are you planning to output? Just VDP products?
Hi @chriscozi. I have a powerful system that can create SVG files from a dataset. I think that in terms of design, it probably has many more features than existing VDP tools, but it wasn't designed for professional VDP.
From SVG, I can export to all the formats Inkscape supports (PDF, PNG, JPEG, TIFF...), but for VDP, I understand that only PDF is a valid output.
Inkscape is a bitmap application so Fonts should be no issue as they will disappear at output.
File size for bitmap items - 4x4 is 16 times larger than 1x1 - so your final output product size multiplies the file size.
300 dpi minimum but we routinely use 1200 dpi. One single letter (A4) size page at 1200 dpi is 30+mb.
Inkscape is a vector application that works natively with the SVG format. It is the open-source counterpart to Illustrator, so by default it has no dpi limit, or rather its limit is as high as that of any vector tool.
Fundamentally bitmap is the issue.

Ask what works - first - and understand the difference between bitmap and vector objects. You can use image replacement options for the RIP in VDP output to lower the overall multipage file sizes but if you make them SVG and they are unique you have to send individual pages so BIG files or lower quality.
SVG is a standard almost as comprehensive as PDF. It can be as simple as a small vector icon of less than 1 KB (like most of the icons you see on web pages around the world, including the arrow inside this "Post reply" button) or contain vectors, text, images, pages, filters, gradients, and all sorts of commands, just like PDF.
It can use clones to avoid repeating elements, and even external references, so it can be a very efficient file format.

For example, imagine a 50MB image. An SVG could generate 1000 instances of that image cropping different parts, rotating them, stretching them, applying all kinds of filters, adding all kinds of text, and anything else you can imagine, and the SVG wouldn't take up more than 50KB, since only commands are stored in XML (and the original image could be stored separately).
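
A small sketch of that idea: one linked image in `<defs>` and many `<use>` references, each with its own transform. The file name `photo.jpg`, the grid spacing, and the rotation step are arbitrary assumptions; the point is only that each instance costs one short XML element.

```python
def build_svg(instances: int, image_href: str = "photo.jpg") -> str:
    """Build an SVG that references one external image and reuses it via <use>."""
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        'width="1000" height="1000">',
        # The image is linked, not embedded, so its bytes stay outside the SVG.
        f'<defs><image id="src" xlink:href="{image_href}" '
        'width="200" height="200"/></defs>',
    ]
    for i in range(instances):
        x, y = (i % 40) * 25, (i // 40) * 25
        angle = (i * 7) % 360
        # Each instance is a single element: a reference plus a transform.
        parts.append(
            f'<use xlink:href="#src" transform="translate({x},{y}) rotate({angle})"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = build_svg(1000)
size_kb = len(svg.encode("utf-8")) / 1024
```

With 1,000 instances the SVG text stays in the tens of kilobytes, regardless of how large the linked image is.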

Programmatically Inkscape is useful. How large is your output file when you send 1,000 pages of 5x7 postcards with bleed and variable text at the minimum 300 dpi? I can send a file that is tens of MB if I use PDF with vector fonts and type, and most of that file size is PDF overhead. YMMV
Since VDP workflows do not support SVG, a conversion to PDF has to be done, and this is probably where the problem lies: there are many ways to make a PDF, and depending on how it is created it can be efficient or tremendously unmanageable.
 
Hello. I have some updates I'd like to share:

I can now convert SVG to PDF with virtually perfect fidelity, except for complex filters that exist in SVG but not in PDF. For those, I can rasterize only the affected areas, layer by layer (as professional programs do). I can also create standard and CMYK PDFs with any ICC profile, and PDF/X output in versions 1a, 3, and 4 (only PDF/X-4 supports transparency).
Overall, the quality and optimization of the final PDF are very good, but generation is not as fast.

Now, I have many questions about things that are important for VDP users.

Is speed crucial?
Can jobs be divided into parts, or do they have to be huge PDFs?
If they are divided, where is this done? -In the dataset? -In the designer (Adobe)? -In the final stage (pdf)?
 
Is speed crucial?
Can jobs be divided into parts, or do they have to be huge PDFs?
If they are divided, where is this done? -In the dataset? -In the designer (Adobe)? -In the final stage (pdf)?
Breaking a large data file up would be an extra step no one wants to take. The batch process should be an option in the vdp software.

Option 1 within vdp software - single file output
Option 2 within vdp software - Multifile Output, then under option 2 another option for records per file
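
The two options could be sketched as a single planning function. This is an illustration only: `plan_outputs`, the file-naming pattern, and the sample records are hypothetical.

```python
from typing import Iterator, Optional, Sequence


def plan_outputs(records: Sequence[dict], basename: str,
                 records_per_file: Optional[int] = None
                 ) -> Iterator[tuple[str, Sequence[dict]]]:
    """Yield (filename, records) pairs for single-file or multi-file output."""
    if not records_per_file or records_per_file >= len(records):
        # Option 1: everything goes into one file.
        yield f"{basename}.pdf", records
        return
    # Option 2: consecutive slices of at most records_per_file records.
    starts = range(0, len(records), records_per_file)
    for part, start in enumerate(starts, start=1):
        yield f"{basename}_{part:03d}.pdf", records[start:start + records_per_file]


records = [{"id": n} for n in range(10_000)]
plan = list(plan_outputs(records, "job", records_per_file=2_500))
single = list(plan_outputs(records, "job"))
```

Keeping the split inside the software means the data file itself never has to be broken up by hand.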

Speed would always be important. I don't want to wait 30 minutes for a job with 10,000 records. If we are talking about the difference between seconds and a few extra minutes, that's not a big deal.
 
Breaking a large data file up would be an extra step no one wants to take. The batch process should be an option in the vdp software.

Option 1 within vdp software - single file output
Option 2 within vdp software - Multifile Output, then under option 2 another option for records per file

Speed would always be important. I don't want to wait 30 minutes for a job with 10,000 records. If we are talking about the difference between seconds and a few extra minutes, that's not a big deal.
Thank you very much for your reply.

Regarding splitting the work, the system can do that (and in fact, it already does it internally to scale to large projects). My question is whether, for printing, it's preferable to have one 2GB PDF or ten 200MB PDFs...

The system is based on Inkscape, so it's completely controllable in batch mode and automatable (command lines/scripting).

Regarding speed, the problem is understanding what constitutes a single record. For example, 350 cards with numerous different images, advanced filters (blurred edges), transparencies, text, inline icons, etc., can be a much larger job than 35,000 cards with just text and the same logo.

Please, could someone provide an example of a VDP project I can find online and tell me what reasonable processing times would be?

[ For reference, the 350 highly complex cards on 88 pages (my complex example for now) currently take me 26 minutes to generate as an optimized 28 MB PDF. ]
 
Regarding splitting the work, the system can do that (and in fact, it already does it internally to scale to large projects). My question is whether, for printing, it's preferable to have one 2GB PDF or ten 200MB PDFs...
My opinion would be that splitting up large files should be up to the user to decide whether it’s necessary and if necessary, into how many files. Different digital presses and rip combinations will most likely determine what size files don’t slow down production, thus the reason why users need the ability to decide optimum file size.
 
My opinion would be that splitting up large files should be up to the user to decide whether it’s necessary and if necessary, into how many files. Different digital presses and rip combinations will most likely determine what size files don’t slow down production, thus the reason why users need the ability to decide optimum file size.
Agreed. The user may also want to split the files out to separate printers to speed up throughput. (i.e. sending files 1 through 5 to printer # 1, and, files 6 through 10 to printer # 2.)
 
Thank you very much for your reply.

Regarding splitting the work, the system can do that (and in fact, it already does it internally to scale to large projects). My question is whether, for printing, it's preferable to have one 2GB PDF or ten 200MB PDFs...

The system is based on Inkscape, so it's completely controllable in batch mode and automatable (command lines/scripting).

Regarding speed, the problem is understanding what constitutes a single record. For example, 350 cards with numerous different images, advanced filters (blurred edges), transparencies, text, inline icons, etc., can be a much larger job than 35,000 cards with just text and the same logo.

Could someone provide an example of a project I can find online and show me what reasonable processing times would be?

[ For reference, the 350 highly complex cards on 88 pages (my complex example for now) currently take me 26 minutes to generate as an optimized 28 MB PDF. ]
This might just be me, but I'd rather just have one large PDF. Acrobat has a "split pages" feature, so I don't mind just splitting the final output into x # of pages for my use. My RIP is pretty decent though, so I generally don't split my VDP up unless it's over 10,000 pages.

26 minutes seems extremely slow though. I can generate thousands of records with PSL Jetletter in under 5 minutes. XMPie or InDesign's built-in VDP are the only things I can think of that are as slow as you describe.
 
@TJPrinter , @MailGuru , @namelessentity , thank you very much for your comments. They are very helpful in understanding the VDP issues a bit better.

I understand, therefore, that having a job splitter is highly recommended as a feature (although PDF splitting can be done afterward with external tools).

Regarding speed, I would like to run a test with a simple job to see the speeds achievable. The example I provided is not valid because it is extremely complex.

So as not to try things that don't make sense, please, would someone provide an example of a VDP project I can see online and tell me what reasonable processing times would be?

[ For example, would a business card with 3 variable text fields, a logo image, a fixed text field, and a couple of embellishments (a separator line and a box) be a good simple example? ]
 
I've already done some initial tests that should be quite indicative of the speed.

Using the example I mentioned: a business card with one 300 KB PNG logo image, three variable text fields, one fixed text field, and three rectangles...

The generation speed for 1,000 records (1,000 cards arranged on 88 pages) was 42 seconds.

The export speed to PDF was 96 seconds.

Total speed: 434 records/min
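
The total follows from adding the two stage times, as this small sketch shows. It assumes the stages run strictly one after the other; the helper name `records_per_minute` is made up for this example.

```python
def records_per_minute(records: int, *stage_seconds: float) -> float:
    """Throughput when the stages run one after another."""
    return records / sum(stage_seconds) * 60


generation_only = records_per_minute(1000, 42)   # SVG generation alone
export_only = records_per_minute(1000, 96)       # PDF export alone
end_to_end = records_per_minute(1000, 42, 96)    # both stages in sequence
```

Because the export stage takes more than twice as long as generation, it dominates the end-to-end figure.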

From what I've seen, the generation step could be optimized to roughly double its speed, and it might do better still on a PC with no other CPU load or on upgraded hardware (Core i7, etc.). Maybe 1,000-1,500 records/min could be reached.

But honestly, the export step is currently limited by Inkscape and Ghostscript. There, the export is optimized for fidelity, quality, file size, or standards compatibility, but not for generation speed. So today (until Inkscape 1.5 is released with its new PDF module), I think I have a bottleneck of 100-200 pages per minute.

Now, my honest question... Since I've seen that PSL jet reports 60,000-180,000 records/min...

Do you think this system, with these speed limitations, has any place in the VDP world?

Should I abandon the idea of using it in the VDP world and focus on the original functional objectives?
 
I understand that the lack of responses to the last post means a system with such poor throughput is pointless.

However, before giving up, I wanted to give the generator a chance, so I seriously optimized this part and achieved surprising and exceptional results...

63,777 records/min (a test with 3,000 different business cards arranged on A4 label sheets)

And this is on a basic i5 laptop with other CPU load running, using an essentially single-process architecture.
With a move to a multi-process architecture and powerful, dedicated hardware, I believe this speed could be more than quadrupled, reaching about 300,000 records/min.
 
I am working on some software myself, but I am a prepress operator/designer by trade. Is your project in a GitHub repo I could check out? I am interested in this concept and would love to take a look.
 
I am working on some software myself, but I am a prepress operator/designer by trade. Is your project in a GitHub repo I could check out? I am interested in this concept and would love to take a look.
Yes, the project is free and public on GitHub. You can find it by googling "pnpink".

These latest developments (PDF export, optimizations, ...) have not been pushed yet, but probably will be very soon.
 
Awesome, thank you! You mentioned you were a developer; I would be very interested in talking with you in general if you are up for it. I know a lot about the printing industry and would love to build more powerful tools to help it into the future.
I am currently working on this site (it is actively being updated as I type this, so bear with me).
I have bigger plans on the roadmap and will at some point need professional help, like most people in the commercial printing industry... :)
 
Hello.
I tested it on a standard laptop less than a year old, and it exceeded 100K records/min in single-process mode, so generation speed is no longer an issue. I also achieved an optimized PDF conversion time of 8.9 seconds in single-process mode, using XObjects for the static parts. I therefore believe end-to-end speeds of 20,000 records/min are feasible today with a multi-process architecture for PDF export. That doesn't quite reach 60K, but I think these are reasonably fast figures, at least for lightweight VDP environments.

So now I need to identify the next points or features it should support.

Any ideas? Or any references where I can find the "must-haves" of a VDP software system?
 
Having the option to output the file(s) with or without the static background would be a needed feature. Most RIPs will process files better when using the built-in VDP option of creating a static page and then sending the variable data separately.

For me, your speeds of 20,000 records per minute would be outstanding!
 
Thank you so much @TJPrinter for your feedback. It's very helpful.

What you're suggesting is very easy to do, and in fact, I was already doing it to maximize speed:
the template is preprocessed to separate "layers of variable element groupings" from "layers of static element groupings." Keep in mind that in board game design, templates can become extremely complex: text over transparencies, over other variable decorative elements that change in position, size, color, rotation, splitting, masking, shadows, and complex filters, or even inline icons. For example:

":Tb(You) win 1:coin: for each:burger: in :Tb(your) 🏠." -> You win 1🪙 for each🍔 in your 🏠.

The system takes all of this and much more into account, so "1 variable layer + 1 static layer" is the simplest case.
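
For readers who want a feel for that simplest case, here is a sketch that classifies top-level SVG groups as static or variable. It is not the project's actual preprocessing: the `{{field}}` placeholder syntax, the sample template, and both function names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
PLACEHOLDER = re.compile(r"\{\{\s*\w+\s*\}\}")  # hypothetical {{field}} syntax


def is_variable(element: ET.Element) -> bool:
    """True if any text or attribute inside the subtree contains a placeholder."""
    for node in element.iter():
        values = [node.text or "", *node.attrib.values()]
        if node is not element:
            # A child's tail text still lies inside the group being tested.
            values.append(node.tail or "")
        if any(PLACEHOLDER.search(value) for value in values):
            return True
    return False


def split_layers(svg_text: str) -> tuple[list[str], list[str]]:
    """Return (static ids, variable ids) for the top-level <g> elements."""
    root = ET.fromstring(svg_text)
    static, variable = [], []
    for group in root.findall(f"{{{SVG_NS}}}g"):
        target = variable if is_variable(group) else static
        target.append(group.get("id", ""))
    return static, variable


TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="background"><rect width="100" height="60" fill="#eee"/></g>
  <g id="names"><text x="5" y="20">{{name}}</text></g>
  <g id="logo"><image href="{{logo_file}}" width="20" height="20"/></g>
</svg>"""

static_ids, variable_ids = split_layers(TEMPLATE)
```

Groups that end up in the static list could be rendered once and reused, or written out as a separate background file, while only the variable groups are regenerated per record.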

For now, I was planning to add QR and barcode support, as well as preset layouts for standard labeling (Avery_5160, Avery_L7160, ...).
But given my complete lack of knowledge of these systems, I have the feeling that I am forgetting basic features.


I'm glad to hear that 20K/min is an acceptable speed in this environment.
 
   