Extracting Text

<< Click to Display Table of Contents >>

Navigation:  Review  >

Extracting Text

The tool will generate text of images and documents without embedded text. Several features use this generated text such as:

Keyword searching

Productions (exporting a separate .txt file for each document)

Enabling AI Search

To access the functions:

Right-click on the filter tree and select "Extract Text..."

clip0234

Right-click one or more selected documents and select "Extract text..."

clip0099

In the  dialog choose the option needed for the data set:

clip0245

Make production images searchable - create text searchable PDFs for export either in a production or as individual documents (see Exporting Documents)

Extract and save text - for productions requiring a text file for each document

Enable AI Search - turn this on to generate AI Search indexes

Overwrite exiting text - for use when existing text should be replaced (all other options will skip documents for which text has already been extracted)

Attempt forced OCR of PDFs - certain PDFs are delivered with an overlaid text file (usually a form or a produced PDF with a Bates number overlaid) in which only the text overlay is indexed.  This option forces the tool to use Optical Character Resolution of the PDF image and not extract the overlaid text.

When data sets involve additional languages the OCR engine can be optimized to recognize those languages.

To monitor the progress go to the Process tab and select the View Jobs interface.

After the tool completes extracting text where available from the documents, any document where text was not extracted will be OCR'ed and the results from that OCR will be added to the Index.

clip0038