Xpdf Version 3.04 Released
2014 May 28
Glyph & Cog is pleased to announce the release of Version 3.04 of the Xpdf tools.
If you are a current licensed customer, you should have received download links for Version 3.04 of all licensed products.
Feel free to contact us with any questions: info@glyphandcog.com.
New Release
New features in Version 3.04 include:
A new text extractor. We completely rewrote the text extractor – used in XpdfText, XpdfViewer, and XpdfWidget – to make it more accurate. The new extractor also includes a "table" mode optimized for extracting tabular data, and a "line printer" mode for PDF files that use monospaced fonts.
A new rasterizer core, used in XpdfRasterizer, XpdfViewer, and XpdfWidget. This new PDF rendering engine is significantly faster than the old one, while maintaining accuracy and following the PDF scan conversion spec.
The Roadmap
Now that 3.04 is done, our next big project is re-architecting the Xpdf core to allow multithreaded rendering. This will make screen updates faster on multicore machines. More importantly, it will allow us to decouple rendering from the user interface event loop, making XpdfViewer and XpdfWidget/Qt more responsive.
We're also working on HTML support: new products will handle PDF-to-HTML conversion and the reverse, HTML-to-PDF conversion.
Color Separation
XpdfRasterizer 3.04 includes a DeviceN rasterizer, which can generate separations – CMYK as well as spot (custom) colors.
The code sample here uses the DLL/library version of XpdfRasterizer. The COM version has similar functions.
First, load the PDF file:
To rasterize page 5 at 300 dpi, in DeviceN mode:
That function generates the color separation in memory, but doesn't return any information yet.
To get the number of channels:
The first four channels (0..3) will always be Cyan, Magenta, Yellow, and Black. Channels 4 and up will be any spot colors used on the page.
To get the channel name ("Cyan", etc.):
To get the CMYK value for the channel – a 32-bit value, with 8 bits per component, in CCMMYYKK order:
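As an illustration of the packed format, here is a minimal sketch of splitting that 32-bit value into its four components. It assumes "CCMMYYKK order" means cyan sits in the most significant byte and black in the least significant. The ChannelCMYK type and unpack_cmyk helper are hypothetical names used for this example; they are not part of XpdfRasterizer.

```c
#include <stdint.h>

/* Hypothetical helper type: one channel's CMYK equivalent,
   8 bits per component (0..255). */
typedef struct {
    uint8_t c, m, y, k;
} ChannelCMYK;

/* Split a packed 32-bit CCMMYYKK value into components.
   Assumption: cyan is the high byte, black is the low byte. */
static ChannelCMYK unpack_cmyk(uint32_t packed)
{
    ChannelCMYK out;
    out.c = (uint8_t)((packed >> 24) & 0xFF);
    out.m = (uint8_t)((packed >> 16) & 0xFF);
    out.y = (uint8_t)((packed >> 8) & 0xFF);
    out.k = (uint8_t)(packed & 0xFF);
    return out;
}
```

Under that assumption, a pure Cyan channel would unpack to c = 255 with the other three components at zero, while a spot color would typically have several nonzero components.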
And finally, to get the bitmap for this channel:
The pdfGetDeviceNBitmap function works much like the pdfConvertPageToBitmap2 function. The returned bitmap is 8-bit grayscale.
Possible uses are to:
save the channel bitmaps separately
recombine channel bitmaps, possibly dropping or modifying certain channels
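The second use can be sketched roughly as follows. This is only one possible approach, not the XpdfRasterizer method: it scales each kept channel's CMYK equivalent by the channel's pixel value and sums the results into an interleaved CMYK buffer. It assumes that a pixel value of 255 means full ink coverage, that rows are tightly packed, and that cyan is the high byte of the CMYK value. Check all three against the manual. The Separation type and recombine function are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one separation channel. */
typedef struct {
    const uint8_t *pixels; /* npixels bytes, 8-bit grayscale (assumed 255 = full ink) */
    uint32_t cmyk;         /* CCMMYYKK equivalent (assumed cyan in the high byte) */
    int keep;              /* 0 = drop this channel from the composite */
} Separation;

/* Sum the ink of all kept channels into an interleaved CMYK buffer
   (4 bytes per pixel), clamping each component at 255. */
static void recombine(const Separation *ch, size_t nch,
                      size_t npixels, uint8_t *out)
{
    for (size_t p = 0; p < npixels; ++p) {
        unsigned sum[4] = {0, 0, 0, 0};
        for (size_t i = 0; i < nch; ++i) {
            if (!ch[i].keep)
                continue;
            unsigned cov = ch[i].pixels[p];
            for (int j = 0; j < 4; ++j) {
                unsigned comp = (ch[i].cmyk >> (24 - 8 * j)) & 0xFF;
                /* Scale the component by coverage, rounding to nearest. */
                sum[j] += (cov * comp + 127) / 255;
            }
        }
        for (int j = 0; j < 4; ++j)
            out[4 * p + j] = (uint8_t)(sum[j] > 255 ? 255 : sum[j]);
    }
}
```

Dropping a spot color is then a matter of setting keep to 0 for that channel. Modifying one (for example, remapping a spot color to a different CMYK equivalent) means changing its cmyk field before recombining.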
Be sure to free the bitmap memory when you're done:
The XpdfRasterizer manual is available on our web site.
Glyph & Cog
Glyph & Cog has been in the business of providing software components for PDF manipulation since 2002.
For more information on any of the products mentioned here, see our web site:
http://www.glyphandcog.com/
or email us at info@glyphandcog.com
This newsletter is sent to all of our customers. To be removed from the list, please email info@glyphandcog.com.