Output Format
PDFdeconstruct generates the following XML output (which may be
written to multiple files):
- Info: There will be one XML info element, containing the PDF
document-level metadata.
- Outline: There will be one XML outline element if the -outline
switch was used and the document has an outline. If the PDF file does
not have an outline, the outline element will not be present, even
when -outline is given.
- Pages: There will be one XML page element for each page in the PDF
file (possibly constrained by the -f and -l options). Each page
element describes the content of a page.
- Resources: There will be one XML resources element, listing all
fonts and images used in the PDF document. Fonts and images can be
reused across multiple pages, but each PDF font or image object is
listed only once in the resources element.
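As a rough illustration of how a consumer might check which of these elements are present, here is a minimal Python sketch. The sample document is hypothetical: the root element name, the nesting, and the absence of XML namespaces are all assumptions, and only the element names info, outline, page, and resources come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The <document> root and the flat nesting are
# assumptions for illustration; real PDFdeconstruct output may differ.
SAMPLE = """<document>
  <info/>
  <page/>
  <page/>
  <resources/>
</document>"""


def summarize(xml_text):
    """Count the info, outline, page, and resources elements at any depth."""
    root = ET.fromstring(xml_text)
    names = ("info", "outline", "page", "resources")
    # iter(name) walks the whole tree, so the count does not depend on nesting.
    return {name: sum(1 for _ in root.iter(name)) for name in names}


counts = summarize(SAMPLE)
print(counts)
```

A count of zero for outline is the expected result whenever the -outline switch was not used or the PDF has no outline. If the output is split across multiple files, the same function could be applied to each file and the counts added together.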
It also generates separate files for resource data:
- Fonts: Any fonts embedded in the PDF file will be extracted and
placed in the output directory.
- Images: All images in the PDF file will be extracted and placed in
the output directory. Format conversion is controlled by the
-imagefmt and -keepjpeg options.
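Because the extracted fonts and images land in the output directory next to the XML, it can help to take an inventory of that directory after a run. The sketch below groups files by extension. The file names in the demo are invented for illustration and do not reflect PDFdeconstruct's actual naming scheme.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def group_by_suffix(output_dir):
    """Map each lowercase file suffix to the file names that carry it."""
    groups = defaultdict(list)
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file():
            groups[path.suffix.lower()].append(path.name)
    return dict(groups)


# Demo on a temporary directory populated with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("font1.ttf", "img1.png", "img2.PNG", "out.xml"):
        (Path(tmp) / name).touch()
    demo = group_by_suffix(tmp)

print(demo)
```

Which image suffixes appear depends on the -imagefmt and -keepjpeg settings used for the run.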