Automatic document analysis and recognition

Automatic document analysis and recognition is an active topic in modern computer vision. In a common scenario, a user photographs a document with a mobile phone or tablet, and the goal is to automatically parse and recognize its content: pictures, tables, text, links, and so on. This setting poses several challenges: geometric distortions of the paper, varying illumination, and occlusions.

To address these challenges, we have built a page unwrapping engine that successfully handles the problems listed above.

The algorithm consists of four key components: Preprocessor, Feature Extractor, Geometric Model Estimator, and Refiner.

The Preprocessor filters the image, roughly locates the document boundaries, and extracts vertical and horizontal spans (lines and text regions).
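To make the idea of span extraction concrete, here is a minimal sketch in plain Python. It is not the engine's actual Preprocessor: the fixed threshold, the run-length criterion, and the function names `binarize`, `horizontal_spans`, and `vertical_spans` are all assumptions chosen for illustration.

```python
def binarize(gray, threshold=128):
    """Mark pixels darker than threshold as ink (1), everything else as paper (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def horizontal_spans(binary, min_len=3):
    """Return (row, col_start, col_end) for every horizontal ink run of at
    least min_len pixels; col_end is exclusive."""
    spans = []
    for r, row in enumerate(binary):
        start = None
        for c, v in enumerate(row):
            if v and start is None:
                start = c                        # a run begins
            elif not v and start is not None:
                if c - start >= min_len:         # a run ends; keep it if long enough
                    spans.append((r, start, c))
                start = None
        if start is not None and len(row) - start >= min_len:
            spans.append((r, start, len(row)))   # run touching the right border
    return spans


def vertical_spans(binary, min_len=3):
    """Same run detection on the transposed image.
    Returns (col, row_start, row_end) with row_end exclusive."""
    transposed = [list(col) for col in zip(*binary)]
    return horizontal_spans(transposed, min_len)


# Tiny synthetic grayscale image: low values are ink, 255 is paper.
gray = [
    [255,  20,  30,  10,  25, 255],
    [255, 255,  15, 255, 255, 255],
    [255, 255,  40, 255, 255, 255],
    [ 35,  20,  10, 255, 255, 255],
]
binary = binarize(gray)
print(horizontal_spans(binary))  # [(0, 1, 5), (3, 0, 3)]
print(vertical_spans(binary))    # [(2, 0, 4)]
```

A real preprocessor would likely use adaptive thresholding and tolerate small gaps in a run, since photographed pages have uneven illumination.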

The Feature Extractor module parses the document content. For robustness, we fuse several types of features, including corners and edges, extracted from images that were preprocessed in different ways.

Figures: all selected spans; keypoints after sampling

The output, a nonlinear grid of points, serves as the input to the next system component, the Geometric Model Estimator.
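One plausible way to turn extracted spans into such a grid is to resample every span at a fixed number of points spaced evenly by arc length, so that each span contributes one row of keypoints. The sketch below assumes a span is given as a polyline of (x, y) vertices; the function name `resample_polyline` is hypothetical and the sampling strategy is an assumption, not a description of the engine.

```python
import math


def resample_polyline(points, n):
    """Return n points spaced evenly by arc length along a polyline."""
    if n < 2 or len(points) < 2:
        raise ValueError("need at least two vertices and two samples")

    # Cumulative arc length at every vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]

    samples = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return samples


# An L-shaped span of total length 8, sampled at 5 keypoints.
span = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0)]
print(resample_polyline(span, 5))
# [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (4.0, 4.0)]
```

Applying the same `n` to every horizontal span yields rows with equal point counts, which is what makes the result usable as a grid even when the spans themselves are curved.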

To properly account for nonlinear geometric distortions of the document, we constructed a dedicated 2D-3D model in which the page shape is reconstructed as a 3D surface together with the 6 DoF camera pose in 3D space.
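The relation between a 3D page surface and its 2D image can be sketched with a standard pinhole camera. In the example below, the page is assumed to curve along x only (height given by a polynomial in x), the 6 DoF pose is expressed as a rotation vector plus a translation, and lens distortion and the principal point are ignored. These simplifications, and the names `page_point`, `rodrigues`, and `project`, are assumptions for illustration.

```python
import math


def page_point(x, y, height_coeffs):
    """3D point on a page whose height is a polynomial in x (assumed shape)."""
    z = sum(c * x ** k for k, c in enumerate(height_coeffs))
    return (x, y, z)


def rodrigues(rvec):
    """Rotation matrix from an axis-angle rotation vector (Rodrigues formula)."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def project(point, rvec, tvec, focal):
    """Project a 3D point into the image: 3 rotation + 3 translation
    parameters give the 6 DoF pose; focal is in pixels."""
    R = rodrigues(rvec)
    cam = [sum(R[i][j] * point[j] for j in range(3)) + tvec[i] for i in range(3)]
    if cam[2] <= 0:
        raise ValueError("point is behind the camera")
    return (focal * cam[0] / cam[2], focal * cam[1] / cam[2])


# A gently curved page seen by a camera two units away, looking straight at it.
p = page_point(0.5, 0.2, [0.0, 0.0, 0.4])
print(project(p, rvec=(0.0, 0.0, 0.0), tvec=(0.0, 0.0, 2.0), focal=100.0))
```

Estimating the model then amounts to finding the surface coefficients and pose for which the projected grid points best match the keypoints observed in the photo.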

Figures: 3D model illustration; model types

We used three different polynomial models: 1D along an axis, 1D piecewise, and 2D over the point grid.
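The three model families differ mainly in their basis functions, so one least-squares routine can illustrate all of them. The sketch below is an assumption-laden illustration: the quadratic degree, the normal-equation solver, the single split point for the piecewise case, and the names `fit`, `evaluate`, and `fit_piecewise` are not taken from the engine.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system; more samples are needed")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(basis, samples, values):
    """Least-squares coefficients c_k so that sum(c_k * basis_k(s)) approximates values."""
    rows = [[phi(s) for phi in basis] for s in samples]
    m = len(basis)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(m)]
    return solve(ata, atb)


def evaluate(basis, coeffs, sample):
    return sum(c * phi(sample) for c, phi in zip(coeffs, basis))


# 1D along an axis: the sample is a scalar x.
basis_1d = [lambda x: 1.0, lambda x: x, lambda x: x * x]

# 2D over the point grid: the sample is an (x, y) pair.
basis_2d = [lambda p: 1.0, lambda p: p[0], lambda p: p[1],
            lambda p: p[0] * p[0], lambda p: p[0] * p[1], lambda p: p[1] * p[1]]


def fit_piecewise(basis, xs, values, split_x):
    """1D piecewise: independent fits left and right of split_x.
    No continuity constraint is enforced at the split in this sketch."""
    left = [(x, v) for x, v in zip(xs, values) if x <= split_x]
    right = [(x, v) for x, v in zip(xs, values) if x > split_x]

    def fit_side(pairs):
        return fit(basis, [x for x, _ in pairs], [v for _, v in pairs])

    return fit_side(left), fit_side(right)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
heights = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]
coeffs = fit(basis_1d, xs, heights)
print(coeffs)  # close to [1.0, 2.0, 0.5]
```

The 1D model is the cheapest and suits pages that bend along one direction, such as an open book; the 2D model has more parameters and can follow creases or twists at the cost of needing more keypoints.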

Finally, once the model parameters are estimated, we can dewarp the document and correct most of the existing distortions.
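For a page that bends along one axis, dewarping can be understood as flattening by arc length: a point's position on the flat page equals the distance travelled along the curved surface to reach it. The following sketch assumes a known height profile `height(x)` and uses the hypothetical helpers `arc_length_table` and `flat_coordinate`; it illustrates the principle only and does not resample any pixels.

```python
import bisect
import math


def arc_length_table(height, x_min, x_max, steps=1000):
    """Tabulate arc length along the curve z = height(x) between x_min and x_max."""
    xs = [x_min + (x_max - x_min) * i / steps for i in range(steps + 1)]
    arc = [0.0]
    for x0, x1 in zip(xs, xs[1:]):
        # Approximate the curve by short straight segments.
        arc.append(arc[-1] + math.hypot(x1 - x0, height(x1) - height(x0)))
    return xs, arc


def flat_coordinate(x, xs, arc):
    """Position on the flattened page for surface coordinate x (linear lookup)."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return arc[i] + t * (arc[i + 1] - arc[i])


# A page that rises quadratically towards one edge.
xs, arc = arc_length_table(lambda x: 0.2 * x * x, 0.0, 2.0)
print(arc[-1])                        # true page width, larger than the 2.0 footprint
print(flat_coordinate(1.0, xs, arc))  # flat position of the point above x = 1.0
```

An image-level dewarp would combine this mapping with the camera projection to decide, for every pixel of the flat output, where to read from in the photo.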

Here is a high-level diagram of the solution:

Figure: image correction and document recognition algorithm

First, horizontal and vertical linear spans are extracted. Next, several groups of spans are located and prepared for geometric model estimation. Once the model parameters are estimated, an initial document correction is performed. Finally, the Refiner module applies additional corrections across horizontal and vertical lines along dense sampling grids.
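The refinement step can be pictured as measuring how far each nearly straight line still deviates after the initial correction and interpolating a small offset along a dense set of sample positions. The sketch below is one simple way to do that for a horizontal text line, using the line's median height as the target; the target choice and the names `line_offsets` and `offset_at` are assumptions, not the Refiner's actual method.

```python
from statistics import median


def line_offsets(line):
    """For (x, y) samples of a nearly horizontal line, return (x, dy) pairs
    where adding dy to y moves the sample onto the line's median height."""
    target = median(y for _, y in line)
    return [(x, target - y) for x, y in sorted(line)]


def offset_at(x, offsets):
    """Linearly interpolate the vertical offset at x; clamp outside the samples."""
    if x <= offsets[0][0]:
        return offsets[0][1]
    if x >= offsets[-1][0]:
        return offsets[-1][1]
    for (x0, d0), (x1, d1) in zip(offsets, offsets[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return d0
            t = (x - x0) / (x1 - x0)
            return d0 + t * (d1 - d0)
    return offsets[-1][1]


# A text line that still bulges slightly after the initial correction.
line = [(0, 10.0), (10, 10.0), (20, 12.0), (30, 10.0), (40, 9.0)]
offsets = line_offsets(line)
print(offsets)                 # [(0, 0.0), (10, 0.0), (20, -2.0), (30, 0.0), (40, 1.0)]
print(offset_at(15, offsets))  # -1.0
```

The same procedure with x and y swapped would handle vertical lines, and offsets from neighbouring lines could be blended to cover the area between them.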

Here is an example of how Page Unwrapper works.

Page Unwrapper