OCR + Searchable PDFs

Posted Nov 29, 2011 in Features

We’re excited to finally be able to offer our customers an OCR solution that is quick, accurate and easy to use. Our partnership with OpenText Corporation, as announced here, lets us use their RecoStar OCR engine inside ReadySuite. Our OCR functionality will support various languages, searchable-PDF creation, and several image enhancement tools, such as auto-rotation, deskew, and halftone removal.

OCR Options

Upon opening the ‘OCR Documents’ wizard, which handles the output of OCR into text files, users will be presented with several export options. OCR text can be saved inline (that is, in the same folder as its image file) or exported to a new folder. If files are saved to a new folder, both the folder and filename structure can be customized. OCR can be generated on a per-document or per-page basis. We’ve also included an option to run the OCR process only on documents that do not already have it.
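To make these export options concrete, here is a minimal Python sketch of how the output location could be decided. It is an illustration only, not ReadySuite's code: the function names, the `.txt` extension, and the zero-padded page suffix are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def ocr_text_path(image_path: Path,
                  output_root: Optional[Path] = None,
                  page: Optional[int] = None) -> Path:
    """Return where the OCR text for an image would be written (hypothetical scheme)."""
    # Inline mode keeps the text next to the image; export mode uses a new folder.
    folder = image_path.parent if output_root is None else output_root
    # Per-page mode appends a zero-padded page number; per-document mode does not.
    stem = image_path.stem if page is None else f"{image_path.stem}_{page:04d}"
    return folder / f"{stem}.txt"


def needs_ocr(text_path: Path) -> bool:
    """True when no non-empty OCR text file exists yet for this document or page."""
    return not (text_path.is_file() and text_path.stat().st_size > 0)
```

Calling `ocr_text_path(Path("scans/DOC001.tif"))` models the inline case, while passing an `output_root` and a `page` models a per-page export to a new folder. The `needs_ocr` check stands in for the option to skip documents that already have OCR.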

OCR Engine

To help produce highly accurate results while keeping the performance of the OCR process in mind, we’ve added several options to tweak the output. If you know the target language, you can choose from over 30 supported languages. Cleanup and pre-processing options are: Auto-Invert, Auto-Rotate, Auto-Border Crop, Conversion to Binary (B&W), Deskew, Despeckle, and Halftone Removal.

The output encoding is adjustable, allowing for ASCII, UTF-7, UTF-8 and Unicode. If you need to add specific page markers to the bottom of each page, such as the Bates number and/or page number, we’ve provided that as an option.
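For readers who post-process OCR text themselves, the encoding and page-marker options can be pictured with a short Python sketch. This is not how ReadySuite implements them: the marker layout is invented for the example, and mapping "Unicode" to UTF-16 is an assumption based on common Windows usage of the term.

```python
# Map the encoding labels from the post to Python codec names.
ENCODINGS = {
    "ASCII": "ascii",
    "UTF-7": "utf-7",
    "UTF-8": "utf-8",
    "Unicode": "utf-16",  # assumption: "Unicode" taken to mean UTF-16
}


def add_page_marker(page_text: str, bates_number: str = "", page_number: int = 0) -> str:
    """Append a marker line to the bottom of a page (hypothetical layout)."""
    parts = []
    if bates_number:
        parts.append(bates_number)
    if page_number:
        parts.append(f"Page {page_number}")
    if not parts:
        return page_text  # no marker requested
    return page_text.rstrip("\n") + "\n" + " ".join(parts) + "\n"


def encode_page(page_text: str, encoding_label: str) -> bytes:
    """Encode page text, replacing characters the target encoding cannot represent."""
    return page_text.encode(ENCODINGS[encoding_label], errors="replace")
```

One practical consequence the sketch makes visible: choosing ASCII loses any accented or non-Latin characters (they become `?` here), so UTF-8 or Unicode is the safer pick for multilingual documents.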

OCR Create Searchable PDFs

On the searchable-PDF creation side, we built the ‘Create Searchable PDFs’ wizard to handle this task. Once you have your data imported (which can be any combination of existing PDFs, single-page or multi-page TIFF files, and other bitmap files such as JPGs), you’ll just need to specify an output folder.

As outlined above, the same cleanup and pre-processing options are available when generating searchable PDFs. Further, you can choose to export PDF/A compliant files if necessary.

We hope to have a release that includes this functionality ready in the next couple of weeks.