XpdfText

The XpdfText® library/component extracts plain text from PDF files. The PDF file can be on disk or in memory, and likewise, the text can be extracted to memory or directly to disk.

XpdfText can be used in different ways:

The extracted text can be converted to a wide range of standard encodings, including UTF-8 (Unicode), ISO-8859-1 (Latin-1), 7-bit ASCII, and various language-specific encodings.
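The choice of output encoding matters because not every character fits in every encoding. The following is a minimal sketch of that effect using Python's built-in codecs; it does not use the XpdfText API, and the helper name `encode_all` and the sample string are hypothetical, chosen only for illustration.

```python
# Illustration only: Python's standard codecs, not the XpdfText API.
# encode_all is a hypothetical helper name used for this sketch.

def encode_all(text):
    """Encode the same text into three of the encodings named above.

    Characters that the target encoding cannot represent are replaced
    with '?' rather than raising an error.
    """
    encodings = ["utf-8", "iso-8859-1", "ascii"]
    return {name: text.encode(name, errors="replace") for name in encodings}


sample = "Prix: 10 € (café)"
encoded = encode_all(sample)

for name, data in encoded.items():
    # Show the size and raw bytes produced by each encoding.
    print(f"{name:<10} {len(data):>2} bytes  {data!r}")
```

UTF-8 preserves every character, Latin-1 keeps the accented letter but has no euro sign, and 7-bit ASCII keeps neither, so the same extracted text can differ in both size and fidelity depending on the encoding chosen.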

The XpdfText library also includes all of the functionality of XpdfInfo.

XpdfText is easy to use:

    pdf = new XpdfText.XpdfText
    pdf.loadFile("input.pdf")
    ' convert to a text file on disk...
    pdf.convertToTextFile(1, 5, "output.txt")
    ' ... or convert in memory
    s = pdf.convertToTextString(1, 5)

Supported platforms:

See also: For content extraction to XML (instead of plain text), try our PDFdeconstruct tool.

Contact Glyph & Cog for more information, including pricing, documentation, and evaluation copies.