02 Apr 2020

Optical character recognition (OCR) FAQ

Product: HighQ Collaborate
Product area: System administration

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed or printed text into machine-encoded text, whether from a scanned document, a photo of a document or from subtitle text superimposed on an image.

HighQ has partnered with ABBYY to provide the OCR service. This is a premium service - for more information, contact your CSM. 

To use the OCR service on an individual site, enable the feature in the site administration section.

Once enabled, all existing and new documents uploaded to the site will be sent to the OCR server for processing, following a set schedule. Please note that it might take some time for all documents to go through the OCR process. 

The status of OCRed documents can be viewed within the site administration interface. The OCR functionality also includes page numbering: once a document has gone through the OCR process, its page numbers become visible in the document details.

 

Which document types are supported by OCR?

The supported document types are a combination of:

  • Files that are supported by the document viewer
  • Files that are whitelisted for the instance within the system configuration settings
  • File types supported by the ABBYY server. For more information on the file types supported by the ABBYY server, see: https://help.abbyy.com/en-us/finereader/14/user_guide/formats
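
As a rough illustration of how the three conditions might combine, the sketch below reads "a combination of" as meaning that a file type must satisfy all three at once. That reading is an assumption. The extension sets and the function name is_ocr_candidate are hypothetical placeholders, not the actual lists from the document viewer, your instance whitelist, or ABBYY.

```python
from pathlib import PurePath

# Hypothetical example sets; the real lists come from the document viewer,
# the instance's system configuration settings, and the ABBYY documentation.
VIEWER_SUPPORTED = {"pdf", "tif", "tiff", "jpg", "png", "docx"}
INSTANCE_WHITELIST = {"pdf", "tif", "tiff", "docx", "xlsx"}
ABBYY_SUPPORTED = {"pdf", "tif", "tiff", "jpg", "png", "bmp"}


def is_ocr_candidate(filename: str) -> bool:
    """Return True if the file extension appears in all three sets (assumed rule)."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return ext in (VIEWER_SUPPORTED & INSTANCE_WHITELIST & ABBYY_SUPPORTED)
```

With these placeholder sets, a .tiff file would qualify, while a .docx file would not, because it is missing from the hypothetical ABBYY set.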

 

OCR formatting

Does it auto-rotate/straighten OCRed documents?

No - the current ABBYY implementation does not rotate or straighten OCRed documents. 

Does it only OCR documents that need to be converted?

The file types and languages targeted for OCR conversion can be configured on your instance. All matching files are sent for conversion.

Are multipage TIFF documents converted to a single multipage PDF?

Yes. When a multipage TIFF document is OCRed, a single multipage PDF is created, as long as the source file matches the file type and language settings configured on your instance.

 

Is there any impact of OCR on system performance?

OCR is managed by a separate service and is scheduled at a fixed rate, so enabling OCR should have little to no impact on the performance of the HighQ instance. The configuration allows you to change how many documents are OCRed in a given time.
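
To give a feel for why fixed-rate scheduling limits load and why a large backlog can take a while to clear, here is a minimal sketch. It assumes the scheduler sends one fixed-size batch per run. The names plan_batches, estimated_minutes, batch_size and interval_minutes are hypothetical and do not correspond to actual HighQ or ABBYY settings.

```python
import math


def plan_batches(doc_ids, batch_size):
    """Split the queue into consecutive batches, one batch per scheduled run."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [doc_ids[i:i + batch_size] for i in range(0, len(doc_ids), batch_size)]


def estimated_minutes(doc_count, batch_size, interval_minutes):
    """Rough time until the last batch is sent, assuming one batch per interval."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    runs = math.ceil(doc_count / batch_size)
    return runs * interval_minutes
```

Under these assumptions, a smaller batch size or a longer interval lowers the load on the OCR server at the cost of a longer wait before every document has been processed.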

 

Can you force the OCR(ing) of an individual document?

No. There is no option to force the OCR(ing) of an individual document. Once the feature is enabled, all documents are sent for OCR. If the OCR quality of a document is poor, the only way to send it for OCR again is to download the document and add it as a new version, which triggers the OCR of that document again.

Supported languages

The image below shows all languages supported by the OCR service:

Please note that English and Danish are selected by default.
