getNumVisibleChars
Get the number of visible chars on the most recent page.
getNumVisibleChars([out, retval] int *n)
This function returns the number of visible characters on the most
recently converted page or region, i.e., the last page (or region)
processed by the most recent call to convertToTextFile,
convertToTextString, extractTextFromRect, extractTextFromRect2,
buildWordList, or buildWordListFromRect2.
This function, along with getNumInvisibleChars and
getNumRemovedDupChars, is useful for detecting problematic scanned
pages. In "electronic" (non-scanned) PDF files, all of the text is
visible, and there are zero invisible characters. Removed duplicate
characters usually come from "fake boldface" text, and their number is
typically small. Invisible characters are used in scanned PDF files,
where invisible OCR text is overlaid on top of the scanned image. If an
electronic PDF file is OCRed, it can end up with both visible and
invisible characters.
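VB:
' Query the counts after one of the conversion functions listed above
Dim nVis As Long, nInvis As Long, nDup As Long
nVis = pdf.getNumVisibleChars()
nInvis = pdf.getNumInvisibleChars()
nDup = pdf.getNumRemovedDupChars()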
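The three counts can be combined into a rough per-page classification
following the reasoning above. This is a minimal sketch, not part of the
library: ClassifyPage is a hypothetical helper, the "no text" case is an
added assumption (a page with no extracted text at all), and the 10%
duplicate threshold is an arbitrary assumption chosen only to illustrate
flagging an unusually high number of removed duplicates.

```vb
' Hypothetical helper (not part of the library): classifies the most
' recently converted page from its character counts.
Function ClassifyPage(nVis As Long, nInvis As Long, nDup As Long) As String
    If nVis = 0 And nInvis = 0 Then
        ' Assumption: no text at all, e.g., an image-only page never OCRed
        ClassifyPage = "no text"
    ElseIf nInvis = 0 Then
        ' All text is visible: typical of an electronic (non-scanned) page
        ClassifyPage = "electronic"
    ElseIf nVis = 0 Then
        ' Only invisible OCR text overlaid on a scanned image
        ClassifyPage = "scanned"
    Else
        ' Both kinds present, e.g., an electronic page that was OCRed
        ClassifyPage = "mixed"
    End If
    ' Assumption: removed duplicates exceeding 10% of the visible chars
    ' are unusual (normally a few from fake boldface) and worth a look
    If nDup > 0 And nDup * 10 > nVis Then
        ClassifyPage = ClassifyPage & " (many duplicates)"
    End If
End Function
```

After a conversion, the helper would be called with the values returned
by getNumVisibleChars, getNumInvisibleChars, and getNumRemovedDupChars.
Any real threshold should be tuned against the documents at hand.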