AR tracking with different markers
Augmented Reality (AR) is one of the most popular and challenging fields in computer vision research. It supplements the real world with digital content, for example, virtual 3D objects. The key feature of augmented reality, compared with other image processing tools, is that virtual objects are moved and rotated in 3D coordinates rather than in 2D image coordinates.
The main objectives of AR are to analyze changes in the captured camera frames and to align the virtual data correctly with the camera scene based on the tracking results. A marker-based approach provides accurate tracking by means of visual markers, for instance, binary markers (designed by ARUCO, Metaio, etc.) or photos of real planar objects in the camera scene.
Fig. 1. Binary markers: ARUCO (left) and Metaio (right)
A simplified scheme of an augmented reality system is shown below.
Fig. 2. AR system flowchart
Let’s consider the AR system flowchart in detail.
First, we need the marker image and a sequence of consecutive camera frames. The tracking module in the flowchart (Fig. 2) is the core of the augmented reality system. It calculates the relative pose of the camera from the marker that has been correctly detected and recognized in the scene. The term “pose” means the six degrees of freedom (DOF) position, i.e. the 3D location and 3D orientation of an object. The tracking module enables the system to add virtual components as a part of the real scene. Since we are dealing with camera frames in a 2D coordinate system, it is necessary to use projective geometry to augment the scene with virtual 3D objects.
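To make the role of the pose and of projective geometry more concrete, here is a minimal sketch of how a 3D point of a virtual object could be mapped to pixel coordinates once the pose is known. It assumes a simple pinhole camera without lens distortion; the function names and the intrinsic parameters (fx, fy, cx, cy) are illustrative assumptions, not part of the system described here.

```python
import math


def rotation_y(angle_rad):
    """Return a 3x3 rotation matrix about the Y axis (illustrative helper)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def project_point(point, R, t, fx, fy, cx, cy):
    """Project a 3D point given in marker coordinates into pixel coordinates.

    R (3x3 rotation) and t (3-vector translation) together form the 6-DOF pose.
    fx, fy, cx, cy are assumed pinhole intrinsics (focal lengths, principal point).
    """
    # Rigid transform into the camera frame: X_cam = R * X + t
    xc, yc, zc = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the intrinsic mapping
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Usage: a marker corner 2 units in front of an unrotated camera
R = rotation_y(0.0)
t = [0.0, 0.0, 2.0]
u, v = project_point([0.5, -0.5, 0.0], R, t, fx=800, fy=800, cx=320, cy=240)
```

The three rotation parameters and the three translation components are exactly the six degrees of freedom that the tracking module has to estimate for every frame.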
Detection and recognition.
When tracking with a binary marker, the first step is to print the desired marker and place it in front of the camera. This requirement is an evident drawback of the approach.
The detection algorithm is simple and relies on the structure of the marker:
- Application of adaptive thresholding to extract edges;
- Extraction of closed contours from binary image;
- Filtration of contours;
- Contour approximation and detection of quadrilateral-shaped contours.
After these steps, the marker candidates are stored for subsequent marker recognition.
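The last two steps of the list above can be sketched in a few lines. Thresholding and contour extraction are usually delegated to an image processing library and are not shown; the sketch assumes that the contours have already been approximated by polygons given as lists of (x, y) vertices. The function names and the min_area value are illustrative assumptions.

```python
def polygon_area(pts):
    """Absolute area of a polygon via the shoelace formula."""
    total = 0.0
    n = len(pts)
    for i in range(n):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_convex(pts):
    """True if every turn along the polygon has the same direction."""
    n = len(pts)
    sign = 0
    for i in range(n):
        ax, ay = pts[i]
        bx, by = pts[(i + 1) % n]
        cx, cy = pts[(i + 2) % n]
        # z-component of the cross product of two consecutive edges
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            if sign != 0 and (cross > 0) != (sign > 0):
                return False
            sign = 1 if cross > 0 else -1
    return sign != 0


def select_marker_candidates(polygons, min_area=100.0):
    """Keep convex quadrilaterals that are large enough to be a marker."""
    return [
        p for p in polygons
        if len(p) == 4 and is_convex(p) and polygon_area(p) >= min_area
    ]


# Usage: only the first polygon is a plausible marker outline
polygons = [
    [(10, 10), (60, 12), (58, 61), (9, 58)],   # large convex quadrilateral
    [(0, 0), (3, 0), (3, 3), (0, 3)],          # too small
    [(0, 0), (40, 0), (20, 30)],               # triangle
]
candidates = select_marker_candidates(polygons)
```

Rejecting small and non-convex shapes early keeps the number of candidates passed to the recognition stage low.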
Each candidate is warped to the frontal view and divided into blocks. The task of the recognition algorithm is to extract the binary code from the marker candidate and compare it with the code of the true marker. The most similar candidate is considered the matched marker.
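A minimal sketch of this recognition step is given below. It assumes that every candidate has already been warped to a square grayscale image stored as a list of rows, that a block counts as white when its mean intensity is at least 128, and that similarity is measured by the Hamming distance. Checking all four rotations is also an assumption, added because the orientation of a candidate is not known in advance. All names are hypothetical.

```python
def extract_code(image, grid, threshold=128):
    """Split a square grayscale image into grid x grid blocks and binarize them."""
    cell = len(image) // grid
    bits = []
    for by in range(grid):
        row = []
        for bx in range(grid):
            total = 0
            for y in range(by * cell, (by + 1) * cell):
                for x in range(bx * cell, (bx + 1) * cell):
                    total += image[y][x]
            # A block is 1 (white) if its mean intensity reaches the threshold
            row.append(1 if total / (cell * cell) >= threshold else 0)
        bits.append(row)
    return bits


def rotate90(bits):
    """Rotate a square bit grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*bits[::-1])]


def hamming(a, b):
    """Number of differing cells between two bit grids of equal size."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def match_distance(bits, reference):
    """Smallest Hamming distance to the reference over the four rotations."""
    best = None
    current = bits
    for _ in range(4):
        d = hamming(current, reference)
        best = d if best is None else min(best, d)
        current = rotate90(current)
    return best


def best_candidate(candidate_images, reference, grid):
    """Return (index, distance) of the most similar candidate, or None if empty."""
    scored = [
        (match_distance(extract_code(img, grid), reference), i)
        for i, img in enumerate(candidate_images)
    ]
    if not scored:
        return None
    distance, index = min(scored)
    return index, distance
```

In practice the smallest distance would also be compared with an acceptance threshold, so that a scene without the true marker does not produce a false match.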
Fig. 3. Scene with binary marker (left) and detected marker (right)
Fig. 3 shows an example scene and the result of detecting and recognizing the binary marker.
A more advanced tracking algorithm based on a photo marker removes the need to place synthetic binary markers in the scene. It is enough to take a picture of a planar object in the real scene and use it as the marker.
Methods based on local features are the most common for this task. Good candidates are the robust SURF [1] descriptors or one of the binary descriptors: ORB [2], FREAK [3], BRIEF [4], BRISK [5], or LATCH [6]. Local descriptors are typically matched with a brute-force matcher or with the more efficient FLANN algorithm. Once the matches are found, the augmentation can be performed. A high-level scheme of this process is given below.
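To illustrate what brute-force matching of binary descriptors involves, here is a small pure-Python sketch. It assumes that each descriptor is stored as an integer whose bits form the binary string, and it keeps only mutual nearest neighbours within a maximum Hamming distance. The cross-check and the max_distance value are illustrative assumptions; a real system would normally rely on an optimized library matcher.

```python
def hamming_distance(a, b):
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def nearest(descriptor, pool):
    """Index and distance of the descriptor in pool closest to the given one."""
    best_index, best_dist = -1, None
    for i, other in enumerate(pool):
        d = hamming_distance(descriptor, other)
        if best_dist is None or d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


def brute_force_match(query, train, max_distance=64):
    """Return (query_index, train_index, distance) for mutual nearest neighbours."""
    matches = []
    if not query or not train:
        return matches
    for qi, q in enumerate(query):
        ti, d = nearest(q, train)
        # Cross-check: the train descriptor must also pick this query descriptor
        back, _ = nearest(train[ti], query)
        if back == qi and d <= max_distance:
            matches.append((qi, ti, d))
    return matches


# Usage with toy 8-bit descriptors (real binary descriptors are much longer)
marker_descriptors = [0b11110000, 0b00001111, 0b10101010]
frame_descriptors = [0b00001110, 0b11110001, 0b01010101]
matches = brute_force_match(marker_descriptors, frame_descriptors, max_distance=2)
```

Every query descriptor is compared with every train descriptor, so the cost grows with the product of the two set sizes. This is the reason approximate methods such as FLANN are attractive for large feature sets.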
Fig. 4. Algorithm of image-based tracking
This method also has some disadvantages. It is resource intensive because of the large number of computations at the stages of feature detection, descriptor calculation, and feature matching. Nevertheless, our team has developed a robust tracking algorithm with real-time performance.
An example of augmenting a scene that contains a real planar marker with a virtual object is shown in Fig. 5.
Fig. 5. Augmentation result
[1] Bay, H., A. Ess, T. Tuytelaars, and L. Van Gool. “SURF: Speeded Up Robust Features.” Computer Vision and Image Understanding (CVIU), Vol. 110, No. 3, pp. 346–359, 2008.
[2] Rublee, Ethan, et al. “ORB: An efficient alternative to SIFT or SURF.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[3] Alahi, Alexandre, Raphael Ortiz, and Pierre Vandergheynst. “FREAK: Fast retina keypoint.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
[4] Calonder, Michael, et al. “BRIEF: Binary robust independent elementary features.” Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010, pp. 778–792.
[5] Leutenegger, Stefan, Margarita Chli, and Roland Y. Siegwart. “BRISK: Binary robust invariant scalable keypoints.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
[6] Levi, Gil, and Tal Hassner. “LATCH: Learned Arrangements of Three Patch Codes.” IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, March 2016.