pdfExtractTextFromRect2
Extract text from a rectangular region.
char *pdfExtractTextFromRect2(PDFHandle pdf, int page,
                              double x0, double y0, double x1, double y1,
                              int *length)
This function extracts text from a rectangular region on a page, and
returns the resulting text in a string.
The rectangle is defined by two opposite corners: (x0, y0) and (x1, y1). The coordinates are in PDF coordinate space.
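As a minimal sketch of how corner coordinates might be computed, the helper below converts a rectangle given in inches (left edge, bottom edge, width, height) into two opposite corners in points. It assumes the default PDF user space: 72 units per inch, measured from the left and bottom edges of the page, as in the example at the end of this section. `RectPts` and `rectFromInches` are hypothetical names introduced for illustration; they are not part of the library.

```c
#include <stdio.h>

/* Assumption: default PDF user space, 72 units per inch. */
#define POINTS_PER_INCH 72.0

/* Hypothetical helper type (not part of the library):
 * two opposite corners of a rectangle, in PDF points. */
typedef struct {
    double x0, y0;  /* lower-left corner */
    double x1, y1;  /* upper-right corner */
} RectPts;

/* Convert a rectangle given in inches, measured from the left and
 * bottom edges of the page, into corner coordinates in points. */
static RectPts rectFromInches(double left, double bottom,
                              double width, double height)
{
    RectPts r;
    r.x0 = left * POINTS_PER_INCH;
    r.y0 = bottom * POINTS_PER_INCH;
    r.x1 = (left + width) * POINTS_PER_INCH;
    r.y1 = (bottom + height) * POINTS_PER_INCH;
    return r;
}
```

With this helper, `rectFromInches(4, 1, 2, 0.5)` gives the corners (288, 72) and (432, 108), which could then be passed as x0, y0, x1, y1.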
pdfExtractTextFromRect2 returns a string if successful, or NULL if text extraction is prohibited by this PDF file.
On success, *length is filled in with the string length. The string is zero-terminated, but it may also contain embedded zero bytes, depending on the current text encoding (see pdfSetTextEncoding), so use *length rather than the terminator to find the end of the text. The caller is responsible for freeing the string with the pdfFreeMemory function.
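Because the text may contain zero bytes, functions such as strlen or strcpy can stop early. The following sketch shows one way to copy the returned text into caller-owned memory using the reported length. `copyExtractedText` is a hypothetical helper, not a library function; it uses only the standard C library, and its result is released with free, not pdfFreeMemory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the library): duplicate `length`
 * bytes of extracted text, including any embedded zero bytes, and
 * append a terminating zero byte. Returns NULL on bad input or
 * allocation failure. Release the copy with free(). */
static char *copyExtractedText(const char *buf, int length)
{
    char *copy;

    if (!buf || length < 0) {
        return NULL;
    }
    copy = (char *)malloc((size_t)length + 1);
    if (!copy) {
        return NULL;
    }
    /* memcpy copies exactly `length` bytes and does not stop at zeros */
    memcpy(copy, buf, (size_t)length);
    copy[length] = '\0';
    return copy;
}
```

After making such a copy, the original buffer returned by the library can be passed to pdfFreeMemory right away.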
This function is identical to pdfExtractTextFromRect except that it takes points in PDF coordinate space.
See the "Setting parameters" section in the function list for settings that affect text extraction.
C:
char *buf;
int length;
/* extract a rectangle 4" from the left side, 1" up from
 * the bottom, 2" wide, 0.5" high, on page 1 */
if (!(buf = pdfExtractTextFromRect2(pdf, 1, 4*72, 1*72, 6*72, 1.5*72, &length))) {
  /* handle the error */
}
...
pdfFreeMemory(buf);