Limitations
PDFdeconstruct has some limitations:
- Complex content: Certain complex PDF constructs are either
missing from PDFdeconstruct's output or included but incorrectly
formatted:
  - clipping regions
  - transparency
  - gradient shadings
  - rotated text and images
  - masked images (binary masks, color-keying, and soft masks)
  - other, more obscure constructs
- Content ordering: Text is extracted and processed
separately from images and vector graphics. A PDF file can draw some
text, then vector graphics that cover part of that text, then more
text that covers part of the vector graphics. PDFdeconstruct's output
does not represent such interleaved content correctly: all of the
content is present, but the drawing order between text and graphics
is lost.
- Text extraction issues: PDFdeconstruct converts text to
Unicode. In some cases, such as bad font encodings, this conversion
is impossible. As a rule of thumb, if copying and pasting text from
the PDF file in Adobe Reader fails, then PDFdeconstruct's text
extraction will usually fail as well.
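
Text extraction failures can sometimes be spotted after the fact by
scanning the extracted text for characters that suggest a failed
Unicode mapping. The Python sketch below is only an illustration and
is not part of PDFdeconstruct: the function names and the 10%
threshold are hypothetical, and it assumes that a failed conversion
shows up as U+FFFD replacement characters, private-use or unassigned
code points, or stray control characters, which may not hold for every
file.

```python
import unicodedata

# Unicode general categories treated as suspicious (an assumption):
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc"}


def is_suspicious(ch: str) -> bool:
    """Return True if ch looks like the result of a failed Unicode mapping."""
    if ch == "\ufffd":  # U+FFFD REPLACEMENT CHARACTER
        return True
    return unicodedata.category(ch) in SUSPICIOUS_CATEGORIES


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look unmapped."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    bad = sum(1 for ch in visible if is_suspicious(ch))
    return bad / len(visible)


def looks_garbled(text: str, threshold: float = 0.10) -> bool:
    """Flag text whose suspicious-character ratio exceeds the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return suspicious_ratio(text) > threshold
```

A check like this can only catch failures that leave visible traces.
Text that maps to valid but wrong Unicode characters passes
unnoticed, so the copy-and-paste test in Adobe Reader remains the more
reliable indicator.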