PDFdeconstruct is a command-line tool for PDF-to-XML conversion. The XML output describes text, vector graphics, and images. PDFdeconstruct also extracts fonts and bitmap images from the PDF file.

The output from PDFdeconstruct can be used for content analysis, format conversion, and similar downstream processing.
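
As a rough illustration of content analysis on the XML output, the following Python sketch counts how often each element type occurs in a file. It makes no assumption about the PDFdeconstruct XML schema: it simply tallies whatever tag names it finds. The function name count_elements and the tag names in the sample document (doc, page, word) are hypothetical and exist only for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_elements(source):
    """Return a Counter mapping each element tag to its number of occurrences.

    `source` is a file path or a binary file-like object containing XML.
    """
    counts = Counter()
    # iterparse streams the document, so a large file is not loaded all at once.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        elem.clear()  # drop the element's text and children once counted
    return counts


if __name__ == "__main__":
    # Hypothetical stand-in for a real output file; the tags are made up.
    sample = io.BytesIO(b"<doc><page><word>a</word><word>b</word></page></doc>")
    for tag, n in sorted(count_elements(sample).items()):
        print(tag, n)
```

To run the same count on an actual output file, pass its path instead of the in-memory sample. If the XML uses namespaces, ElementTree reports tags in the form {namespace}name.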

Supported Platforms

Source code licenses are also available.

Intellectual Property

The PDFdeconstruct software and documentation are Copyright 1996-2021 Glyph & Cog, LLC.

This software includes libpng, which is copyright 2004, 2006-2012 Glenn Randers-Pehrson.

This software is based in part on the work of the Independent JPEG Group.

This software includes libtiff, which is:
Copyright (c) 1988-1997 Sam Leffler
Copyright (c) 1991-1997 Silicon Graphics, Inc.

The PDF data structures and operators are specified in ISO 32000-2:2020.

About Glyph & Cog

Glyph & Cog designs and implements software for manipulating electronic documents. Current offerings include software libraries, components, and consulting services related to reading, viewing, and converting PDF files.

For more information, visit our web site at