Augmented Reality (AR) is one of the most popular and challenging fields in computer vision research. It allows supplementing the real world with digital content, for example, virtual 3D objects. The key feature of Augmented Reality compared to other image processing tools is that virtual objects are moved and rotated in 3D world coordinates rather than 2D image coordinates.
The main objectives of AR are to analyze changes in the captured camera frames and to correctly align virtual data with the camera scene based on the tracking results. The marker-based approach, in turn, provides accurate tracking using visual markers, for instance, binary markers (such as ArUco or Metaio markers) or photos of real planar objects in the camera scene.
Fig. 1. Binary markers: ARUCO (left) and Metaio (right)
The simplified scheme of an Augmented Reality system is as follows:
Fig. 2. AR system flowchart
Let’s consider the AR system flowchart in detail.
First, we need the marker image and the consecutive camera frames. The tracking module in the flowchart (Fig. 2) is the core of the augmented reality system. It calculates the relative pose of the camera based on a correctly detected and recognized marker in the scene. The term “pose” means the six degrees of freedom (DOF) position, i.e. the 3D location and 3D orientation of an object. The tracking module enables the system to add virtual components as a part of the real scene. And since we are dealing with camera frames in a 2D coordinate system, it is necessary to use projective geometry for virtual 3D object augmentation.
Detection and recognition
In the case of tracking by a binary marker, the first step is to print the desired marker and place it in front of the camera. This requirement is an obvious drawback of the tracking algorithm.
The detection algorithm is very simple and based on the marker’s nature:
- Applying adaptive thresholding to extract edges;
- Extracting closed contours from the binary image;
- Filtering the contours;
- Approximating the contours and detecting quadrilateral-shaped ones.
After the above steps, the marker candidates are stored for further marker recognition.
Each candidate is warped to the frontal view and divided into blocks. The task of the recognition algorithm is to extract binary code from the marker candidate and compare it with the code of the true marker. The most similar candidate is considered a matched marker.
Fig. 3. Scene with a binary marker (left) and detected marker (right)
Fig. 3 illustrates an example of the scene and how detection and recognition of the binary marker are accomplished.
The more advanced tracking algorithm based on a photo marker eliminates the need to place synthetic binary markers in the scene. It is enough to take a picture of a planar object in the real scene and use it as a marker.
Methods based on local features are the most common for this task. Good candidates are robust SURF [1] descriptors or one of the binary descriptors: ORB [2], FREAK [3], BRIEF [4], BRISK [5], or LATCH [6]. Local descriptors are typically matched with a common brute-force matcher or with the more efficient FLANN algorithm. After matching, the data augmentation can be done. A high-level scheme of this process is given below.
Fig. 4. Algorithm of image-based tracking
This method also has some disadvantages. It is quite resource-intensive because of the large number of computations at the feature detection, descriptor calculation and feature matching stages. Nevertheless, our team has developed a robust tracking algorithm with real-time performance.
An example of augmenting a virtual object with a real planar marker in the scene is shown in Fig. 5.
Fig. 5. Augmentation result
[1] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346–359, 2008.
[2] Rublee, E., V. Rabaud, K. Konolige, and G. Bradski. “ORB: An Efficient Alternative to SIFT or SURF.” IEEE International Conference on Computer Vision (ICCV), 2011.
[3] Alahi, A., R. Ortiz, and P. Vandergheynst. “FREAK: Fast Retina Keypoint.” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[4] Calonder, M., V. Lepetit, C. Strecha, and P. Fua. “BRIEF: Binary Robust Independent Elementary Features.” European Conference on Computer Vision (ECCV), pp. 778–792, 2010.
[5] Leutenegger, S., M. Chli, and R. Y. Siegwart. “BRISK: Binary Robust Invariant Scalable Keypoints.” IEEE International Conference on Computer Vision (ICCV), 2011.
[6] Levi, G., and T. Hassner. “LATCH: Learned Arrangements of Three Patch Codes.” IEEE Winter Conference on Applications of Computer Vision (WACV), 2016.