Introduction

Overview

The XpdfText® COM component extracts plain text from PDF files. Text can be extracted from one or more pages, and can be written to a file on disk or stored in a buffer in memory. Text can also be extracted from a rectangular region on a page.

The XpdfText component uses Unicode internally. It can provide text in Unicode format, or it can convert to a user-selected encoding.
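To illustrate what converting Unicode text to a user-selected encoding involves, here is a minimal Python sketch. It does not call the XpdfText API. The helper name convert_text, the sample string, and the choice to replace unmappable characters with '?' are assumptions made for this example only, and the component's own behavior for unmappable characters may differ.

```python
# Illustrative only: this is plain Python, not the XpdfText API.
# convert_text is a hypothetical helper; the '?' replacement policy
# is an assumption chosen for the example.

def convert_text(text: str, encoding: str) -> bytes:
    """Encode Unicode text in the selected encoding.

    Characters that the target encoding cannot represent are
    replaced with '?' instead of raising an error.
    """
    return text.encode(encoding, errors="replace")


# Sample Unicode text: e-acute (U+00E9), en dash (U+2013), Greek pi (U+03C0).
sample = "Caf\u00e9 \u2013 \u03c0"

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = convert_text(sample, "utf-8")

# Latin-1 covers only U+0000..U+00FF, so the en dash and pi are replaced.
latin1_bytes = convert_text(sample, "latin-1")

print(utf8_bytes)
print(latin1_bytes)
```

The point of the sketch is that a Unicode-capable output preserves every character, while a narrower encoding can lose characters it has no code for, which is worth keeping in mind when selecting an output encoding.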

The XpdfText component also includes all of the XpdfInfo functions for extracting PDF Info dictionary entries.

Supported Platforms

Intellectual Property

The XpdfText COM component and documentation are Copyright 1996-2024 Glyph & Cog, LLC.

The PDF data structures, operators, and specification are documented in ISO 32000-2:2020.

About Glyph & Cog

Glyph & Cog designs and implements software for manipulating electronic documents. Current offerings include software libraries, components, and consulting services related to reading, viewing, and converting PDF files.

For more information, visit our website at www.glyphandcog.com.