Automatic document analysis and recognition

The task of automatic document analysis and recognition is very common in everyday life. Whenever a user needs to automatically parse and recognize content such as text, tables, or links from a picture taken with a mobile phone or tablet, or from a scanned document, automatic document recognition and text analysis come into play.

Project Description

Within this project, we needed to build an engine that would help our client automate the processing of medical forms. The workflow of medical institutions requires handling a large number of paper documents, which is very time-consuming when done manually and leaves room for data-entry errors. Now, let’s imagine we have a completed paper form and want to get a digital document with proper fields and recognized data just by taking a picture of it with a mobile phone. This task is quite challenging because of geometric distortions of the paper and varying illumination. Nevertheless, we have built a page unwrapping engine that performs the image analysis and successfully handles these problems.

Although machine learning is often used for document structure recognition, our solution is based solely on classical computer vision approaches. Below we describe the first part of the project, which aimed to replace the real lines in the captured document with synthetic, perfectly aligned ones.

Document Recognition Technology

The developed algorithm contains several key components: Preprocessor, Feature Extractor, Geometric Model Estimator and Refiner.

The Preprocessor filters the image, roughly locates the document boundaries, and extracts vertical and horizontal spans (lines and text regions).
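The article does not specify how the spans are extracted. As a minimal sketch, assuming the image has already been filtered and binarized (1 = ink, 0 = background), horizontal spans can be taken as sufficiently long runs of foreground pixels along each row, and vertical spans as the same runs in the transposed image. The function names and the `min_len` threshold are hypothetical, not the project's API.

```python
from typing import List, Tuple

# A span is (fixed_coord, start, end): the row for horizontal spans,
# the column for vertical spans; `end` is inclusive.
Span = Tuple[int, int, int]


def horizontal_spans(binary: List[List[int]], min_len: int) -> List[Span]:
    """Return runs of foreground (1) pixels in each row that are at least min_len long."""
    spans: List[Span] = []
    for y, row in enumerate(binary):
        start = None
        for x, value in enumerate(row + [0]):  # trailing 0 closes a run at the row end
            if value and start is None:
                start = x
            elif not value and start is not None:
                if x - start >= min_len:
                    spans.append((y, start, x - 1))
                start = None
    return spans


def vertical_spans(binary: List[List[int]], min_len: int) -> List[Span]:
    """Vertical spans are the horizontal spans of the transposed image."""
    transposed = [list(col) for col in zip(*binary)]
    return horizontal_spans(transposed, min_len)
```

On a real page, text regions would usually need some smearing or dilation before their characters merge into long runs; that step is omitted here.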

The Feature Extractor module parses the document content. For robustness, we fuse several types of features, including corners and edges extracted from images preprocessed in different ways.
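How the feature types are fused is not described. A simple option, assumed here, is a greedy radius-based merge: keypoints from every detector or preprocessing variant are pooled, and points falling within `radius` of an existing cluster's centroid are merged into it, so the same physical corner found by two detectors yields one keypoint. `fuse_keypoints` and its parameters are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def fuse_keypoints(sources: Dict[str, List[Point]], radius: float) -> List[Point]:
    """Merge keypoints from several sources; each output point is a cluster mean."""
    clusters: List[List[Point]] = []
    for name in sorted(sources):  # sorted for a deterministic result
        for p in sources[name]:
            for cluster in clusters:
                cx = sum(q[0] for q in cluster) / len(cluster)
                cy = sum(q[1] for q in cluster) / len(cluster)
                if math.hypot(p[0] - cx, p[1] - cy) < radius:
                    cluster.append(p)  # close enough: same physical feature
                    break
            else:
                clusters.append([p])  # no nearby cluster: start a new one
    return [
        (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
        for c in clusters
    ]
```

A greedy merge depends on input order; a production system might prefer a spatial index or non-maximum suppression on detector scores.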

Figures: Feature Extractor output, showing all selected spans and the keypoints after sampling.

The resulting nonlinear grid of points is used as the input for the next system component: the Geometric Model Estimator.
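The sampling step that turns spans into keypoints is not detailed in the article. One plausible approach, assumed here, treats each span as a polyline and resamples it at uniform arc-length intervals; stacking the resampled spans gives a grid of points that is curved (nonlinear) in image coordinates. `resample_polyline` is a hypothetical helper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_polyline(points: List[Point], n: int) -> List[Point]:
    """Return n points evenly spaced by arc length along a polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two points and n >= 2")
    # Cumulative arc length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out: List[Point] = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```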

To properly account for nonlinear geometric distortions of the document, we constructed a dedicated 2D-3D model in which the page shape is restored as a three-dimensional surface, together with the 6-DoF camera pose (position and orientation) in three-dimensional space.
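The exact parameterization of the 2D-3D model is not given. As a sketch under simplifying assumptions, the page can be modeled as a surface whose depth varies polynomially along the horizontal page axis, viewed by an ideal pinhole camera with focal length `focal`; the 6-DoF pose is then an axis-angle rotation vector `rvec` plus a translation vector `tvec`. All names here are hypothetical, and the project's model (with its several polynomial variants) is more general.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rodrigues(rvec: Vec3) -> List[List[float]]:
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(comp * comp for comp in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (comp / theta for comp in rvec)  # unit rotation axis
    c, s, v = math.cos(theta), math.sin(theta), 1.0 - math.cos(theta)
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def page_surface(x: float, y: float, coeffs: Sequence[float]) -> Vec3:
    """Assumed page model: depth z is a polynomial in the horizontal coordinate x."""
    z = sum(a * x ** k for k, a in enumerate(coeffs))
    return (x, y, z)


def project(point: Vec3, rvec: Vec3, tvec: Vec3, focal: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D page point under the pose (rvec, tvec)."""
    R = rodrigues(rvec)
    X = [sum(R[i][j] * point[j] for j in range(3)) + tvec[i] for i in range(3)]
    return (focal * X[0] / X[2], focal * X[1] / X[2])
```

Estimating the model then means searching for the surface coefficients and pose that make the projected grid match the detected keypoints, for example by nonlinear least squares.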

Figures: 3D model illustration; model types.

We used three different polynomial models: 1D along an axis, 1D piecewise, and 2D over the point grid.
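The article names the three model families but not their exact form. The functions below show one plausible parameterization of each; the coefficient layout, the local coordinate used by each piece, and the clamping of out-of-range inputs are assumptions. The coefficients are what the Geometric Model Estimator would fit.

```python
import bisect
from typing import Sequence


def poly1d(x: float, coeffs: Sequence[float]) -> float:
    """1D model along one axis: sum_k a_k * x**k, evaluated with Horner's scheme."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def piecewise_poly1d(x: float, breaks: Sequence[float],
                     pieces: Sequence[Sequence[float]]) -> float:
    """1D piecewise model: pieces[i] applies on [breaks[i], breaks[i+1]).

    Each piece uses the local coordinate x - breaks[i]; len(breaks) == len(pieces) + 1.
    """
    i = bisect.bisect_right(breaks, x) - 1
    i = min(max(i, 0), len(pieces) - 1)  # clamp out-of-range x to the end pieces
    return poly1d(x - breaks[i], pieces[i])


def poly2d(x: float, y: float, coeffs: Sequence[Sequence[float]]) -> float:
    """2D model over the point grid: sum_ij c[i][j] * x**i * y**j."""
    return sum(c * x ** i * y ** j
               for i, row in enumerate(coeffs) for j, c in enumerate(row))
```

The 1D model is the most constrained and the 2D model the most flexible; a piecewise model sits in between, capturing local bends such as a fold without the extra degrees of freedom of a full 2D surface.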

Finally, after estimating the model parameters, we are ready to dewarp the document and correct most of the existing distortions.
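A common way to apply an estimated geometric model is inverse mapping: iterate over the pixels of the flat output page, map each one through the model into the captured photo, and sample the photo there with bilinear interpolation. In the sketch below, `model` is a hypothetical callable standing in for the estimated page model (for instance, page surface followed by camera projection); it is not the project's API, and images are plain lists of grayscale rows.

```python
import math
from typing import Callable, List, Tuple

Image = List[List[float]]


def bilinear(img: Image, x: float, y: float, fill: float = 0.0) -> float:
    """Sample a grayscale image at a sub-pixel location; out of bounds returns `fill`."""
    h, w = len(img), len(img[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return fill
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def dewarp(src: Image, out_w: int, out_h: int,
           model: Callable[[float, float], Tuple[float, float]]) -> Image:
    """For every output (flat page) pixel (u, v), sample the photo where the model maps it."""
    return [[bilinear(src, *model(u, v)) for u in range(out_w)] for v in range(out_h)]
```

Inverse mapping is preferred over pushing source pixels forward because every output pixel receives exactly one value, so no holes appear in the dewarped page.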

Here is a high-level diagram of the solution:

Figure: image correction and document recognition algorithm.

The algorithm steps include:

  • Extraction of horizontal and vertical linear spans.
  • Extraction of several groups of local features.
  • Document correction based on the estimated parameters of the geometric model of the page.
  • Compensation of residual image distortions in both the vertical and horizontal directions using a dense sampling grid.
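The last step compensates residual distortions using a dense sampling grid. One plausible reading, assumed here, is that after the model-based correction the remaining offsets (dx, dy) are measured at a coarse grid of keypoints and bilinearly interpolated to every pixel, giving a dense correction map that is applied as a second remapping pass (presumably the Refiner's job). The `densify` function and its grid layout are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[Tuple[float, float]]]


def densify(coarse: Grid, step: int) -> Grid:
    """Bilinearly upsample a coarse grid of (dx, dy) residuals to one value per pixel.

    coarse[r][c] is the residual measured at pixel (c * step, r * step);
    the grid must have at least 2 rows and 2 columns.
    """
    rows, cols = len(coarse), len(coarse[0])
    h, w = (rows - 1) * step + 1, (cols - 1) * step + 1
    dense: Grid = []
    for y in range(h):
        r = min(y // step, rows - 2)  # index of the coarse cell row containing y
        fy = (y - r * step) / step
        row = []
        for x in range(w):
            c = min(x // step, cols - 2)  # index of the coarse cell column containing x
            fx = (x - c * step) / step
            value = []
            for k in range(2):  # k = 0 -> dx, k = 1 -> dy
                top = coarse[r][c][k] * (1 - fx) + coarse[r][c + 1][k] * fx
                bottom = coarse[r + 1][c][k] * (1 - fx) + coarse[r + 1][c + 1][k] * fx
                value.append(top * (1 - fy) + bottom * fy)
            row.append((value[0], value[1]))
        dense.append(row)
    return dense
```

Each corrected pixel (x, y) would then be sampled from (x + dx, y + dy) in the first-pass dewarped image, for example with bilinear interpolation.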

Here is an example of how the Page Unwrapper engine works.


Business Value

The developed technology provides various functions for automated document analysis and recognition that can be used in many applications, such as scanning, search, sorting, and information management.
