By Ruslan Timchenko, CV engineer @It-Jim

Apple events always amaze the entire world, and 2020 was no exception. Apple presented its first mobile devices equipped with LiDAR: the iPad Pro 11 and the iPhone 12 Pro (and Pro Max). This active sensor measures physical distances to objects on a two-dimensional spatial grid. Nowadays, it is widespread in the automotive field for object detection and collision avoidance.

How can developers and computer vision engineers use LiDAR in their work? With the lack of technical documentation, there is no other way to answer that question than to run our own experiments. In this post, we are going to show you how to create a logger that retrieves data from the iPhone’s LiDAR, and an experiment on the accuracy of distance measurements with this scanner. If you want to follow our steps, you are going to need an iPhone 12 Pro with LiDAR, a ruler, a tape measure, and some spare time.

Source: https://www.forbes.com/

 

iOS Logger Application

First things first. In order to play with the LiDAR data, we need to store it somehow. For this purpose, we created a basic logger application that saves RGB camera frames and depth maps obtained from the scanner.

Screenshot of Logger Application

To use a logger, you need to follow four main steps:

  1. Configuring and starting ARSession,
  2. Capturing RGB and depth frames,
  3. Getting distances to the objects,
  4. Saving the results.

Let’s have a closer look at each of them. 

Step 1: Configure and Start ARSession

The logger is based on ARSession, which combines data from the cameras and the motion-sensing hardware to fill an ARFrame object. The latter contains all the information we need.

Sensor data storage principle

First of all, we import the ARKit framework into our project. Then we create an ARSession and set up its configuration. The configuration consists of a set of options that enable or disable sensors and tell ARKit how to process the data for a better user experience. As the base configuration, we choose ARWorldTrackingConfiguration, which tracks changes in the translation and rotation of the device (6 degrees of freedom).

import UIKit
import ARKit
import Zip

class ViewController: UIViewController, ARSessionDelegate {
    var session: ARSession!

    override func viewDidLoad() {
        super.viewDidLoad()

        session = ARSession()
        session.delegate = self
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        let configuration = setupARConfiguration()
        session.run(configuration)
    }

    func setupARConfiguration() -> ARConfiguration {
        let configuration = ARWorldTrackingConfiguration()

        // add specific configurations
        ...
        return configuration
    }
}

Since we want to get depth data from the LiDAR, we need to check whether our device supports this sensor and enable the ‘.sceneDepth’ frame semantic in the ARConfiguration.


func setupARConfiguration() -> ARConfiguration{
    let configuration = ARWorldTrackingConfiguration()

    // add specific configurations
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        configuration.frameSemantics = .sceneDepth
    } 

    return configuration
}

ARSession is ready.

Step 2: Capture RGB and Depth Frames

The next step is to capture an ARFrame at a specific moment. For this purpose, we added a UIButton “SaveFrame” to the screen. Tapping it retrieves the current ARFrame from the ARSession, with full information from all enabled sensors.

@IBAction func onSaveFrameClicked(_ sender: Any) {
    if let currentFrame = session.currentFrame {
        let frameImage = currentFrame.capturedImage
        let depthData = currentFrame.sceneDepth?.depthMap

        // Process obtained data
        ...
    }
}

This code retrieves the RGB frame and the depth map as ‘CVPixelBuffer’ objects. Additionally, ‘sceneDepth’ contains a confidence map. It is worth taking a closer look at it, since depth data can be unreliable on surfaces with varying reflectivity.
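
As an illustration, a minimal sketch of inspecting this confidence map could look as follows. The counting of high-confidence pixels is our own example, not part of the logger, and it reuses ‘currentFrame’ from the snippet above.

// Sketch: inspect the confidence map that accompanies the depth map.
// Each pixel is a UInt8 matching ARConfidenceLevel (0 = low, 1 = medium, 2 = high).
if let confidenceMap = currentFrame.sceneDepth?.confidenceMap {
    CVPixelBufferLockBaseAddress(confidenceMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(confidenceMap, .readOnly) }

    let width = CVPixelBufferGetWidth(confidenceMap)
    let height = CVPixelBufferGetHeight(confidenceMap)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(confidenceMap)
    guard let base = CVPixelBufferGetBaseAddress(confidenceMap) else { return }

    // Count pixels that reached the highest confidence level (illustrative metric)
    var highConfidencePixels = 0
    for y in 0..<height {
        let row = base.advanced(by: y * bytesPerRow).assumingMemoryBound(to: UInt8.self)
        for x in 0..<width where row[x] == UInt8(ARConfidenceLevel.high.rawValue) {
            highConfidencePixels += 1
        }
    }
    print("High-confidence depth pixels: \(highConfidencePixels) of \(width * height)")
}

Filtering the saved depth values by this map, for example keeping only high-confidence pixels, is one way to reduce artifacts on problematic surfaces.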

Let’s now prepare the RGB frame and the depth map for saving. For that, we convert both pixel buffers into ‘UIImage’ objects in almost the same way. Since depth is supported only by a limited number of devices, it is an optional type.

if let currentFrame = session.currentFrame {
    ...
    // Process obtained data
    // Prepare RGB image to save
    let imageSize = CGSize(width: CVPixelBufferGetWidth(frameImage),
                           height: CVPixelBufferGetHeight(frameImage))
    let ciImage = CIImage(cvPixelBuffer: frameImage)
    let context = CIContext.init(options: nil)

    guard let cgImageRef = context.createCGImage(ciImage, from: CGRect(x: 0, y: 0, width: imageSize.width, height: imageSize.height)) else { return }
    let uiImage = UIImage(cgImage: cgImageRef)

    // Prepare normalized grayscale image with DepthMap
    if let depth = depthData {
        let depthWidth = CVPixelBufferGetWidth(depth)
        let depthHeight = CVPixelBufferGetHeight(depth)
        let depthSize = CGSize(width: depthWidth, height: depthHeight)

        ...

        let depthCIImage = CIImage(cvPixelBuffer: depth)
        guard let depthCGImageRef = context.createCGImage(depthCIImage, from: CGRect(x: 0, y: 0, width: depthSize.width, height: depthSize.height)) else { return }
        let depthUIImage = UIImage(cgImage: depthCGImageRef)
    }
}

While the size of the RGB frame is 1920×1440, the depth map is quite small: only 256×192 pixels (width × height).

However, even a depth map of such a small resolution can be very helpful in object detection or background subtraction tasks.
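
Assuming the depth map covers the same field of view as the RGB frame (both share a 4:3 aspect ratio), a depth pixel can be related to the RGB frame by simple scaling. The helper below is our own illustrative sketch, not part of the logger.

import CoreGraphics

// Sketch: map a depth-map coordinate to the corresponding RGB-frame coordinate
// by simple scaling (assumes both buffers cover the same field of view).
func rgbCoordinate(forDepthX depthX: Int, depthY: Int,
                   depthSize: CGSize, rgbSize: CGSize) -> CGPoint {
    let scaleX = rgbSize.width / depthSize.width    // e.g. 1920 / 256 = 7.5
    let scaleY = rgbSize.height / depthSize.height  // e.g. 1440 / 192 = 7.5
    return CGPoint(x: CGFloat(depthX) * scaleX, y: CGFloat(depthY) * scaleY)
}

// Usage: the centre of the depth map falls roughly at pixel (960, 720) of the RGB frame
let rgbPoint = rgbCoordinate(forDepthX: 128, depthY: 96,
                             depthSize: CGSize(width: 256, height: 192),
                             rgbSize: CGSize(width: 1920, height: 1440))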

Step 3: Get Distances to the Objects

The depth UIImage is a normalized grayscale image. Distances are encoded in brightness: the closest objects are dark, while the farther ones are light.
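
For reference, such a normalization can be done by hand: find the minimum and maximum depth in the frame and map that range to 8-bit gray levels. The function below is a minimal sketch of this idea, assuming (like the snippets above) that the buffer has no row padding; ‘normalizedDepthImage’ is our own helper.

import UIKit

// Sketch: convert a Float32 depth buffer into a normalized grayscale UIImage.
// Min-max normalization: the closest point becomes black, the farthest white.
func normalizedDepthImage(from depthMap: CVPixelBuffer) -> UIImage? {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return nil }
    let floatBuffer = base.assumingMemoryBound(to: Float32.self)

    // Find the depth range of the current frame
    let values = UnsafeBufferPointer(start: floatBuffer, count: width * height)
    guard let minDepth = values.min(), let maxDepth = values.max(), maxDepth > minDepth else { return nil }

    // Map every depth value to an 8-bit gray level
    var pixels = [UInt8](repeating: 0, count: width * height)
    for i in 0..<(width * height) {
        pixels[i] = UInt8(255.0 * (values[i] - minDepth) / (maxDepth - minDepth))
    }

    // Wrap the bytes into a grayscale CGImage
    guard let provider = CGDataProvider(data: Data(pixels) as CFData),
          let cgImage = CGImage(width: width, height: height,
                                bitsPerComponent: 8, bitsPerPixel: 8,
                                bytesPerRow: width,
                                space: CGColorSpaceCreateDeviceGray(),
                                bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
                                provider: provider, decode: nil,
                                shouldInterpolate: false,
                                intent: .defaultIntent) else { return nil }
    return UIImage(cgImage: cgImage)
}

This is only for visualization; the physical distances are extracted separately below.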

While for some tasks it is enough to have relative distances, in our case we need the real physical values. To get LiDAR distances in meters, we need to read the CVPixelBuffer as Float32. In the code below, we fill the two-dimensional array ‘depthArray’ row by row, where each row is a ‘distancesLine’ of raw depth values.

 

if let depth = depthData {
    let depthWidth = CVPixelBufferGetWidth(depth)
    let depthHeight = CVPixelBufferGetHeight(depth)
    var depthArray = [[Float32]]()   // one row of distances per depth-map row

    CVPixelBufferLockBaseAddress(depth, CVPixelBufferLockFlags(rawValue: 0))
    let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depth),
                                    to: UnsafeMutablePointer<Float32>.self)

    for y in 0..<depthHeight {
        var distancesLine = [Float32]()
        for x in 0..<depthWidth {
            let distanceAtXYPoint = floatBuffer[y * depthWidth + x]
            distancesLine.append(distanceAtXYPoint)
            print("Depth in (\(x),\(y)): \(distanceAtXYPoint)")
        }
        depthArray.append(distancesLine)
    }
    CVPixelBufferUnlockBaseAddress(depth, CVPixelBufferLockFlags(rawValue: 0))
    ...

Depth data spans from floatBuffer[0] up to floatBuffer[height * width - 1]. In our case, there are 192 rows and 256 columns, 49152 elements in total. Keep in mind that floatBuffer is just a pointer to the memory address holding the depth data. Like in C++, the pointer knows nothing about the real size of the depth array, so you can easily go out of bounds without any warning.
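
To make this explicit, here is a tiny bounds-checked accessor; ‘depthValue(atX:y:from:width:height:)’ is our own helper, not an ARKit API.

// Sketch: a bounds-checked read from the raw depth buffer.
// Returns nil instead of reading memory outside the 256x192 grid.
func depthValue(atX x: Int, y: Int,
                from buffer: UnsafeMutablePointer<Float32>,
                width: Int, height: Int) -> Float32? {
    guard x >= 0, x < width, y >= 0, y < height else { return nil }
    return buffer[y * width + x]
}

// Usage with the variables from the snippet above:
// depthValue(atX: 10, y: 300, from: floatBuffer, width: depthWidth, height: depthHeight)  // nil, y is out of range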

Step 4: Save Results

Finally, we need to save our results to a folder on the device so that we can analyze them later. The following auxiliary code creates a folder, returns its path, and clears it in case it was created before.

 

func getTempFolder() throws -> URL {
    let path = try FileManager.default.url(for: .documentDirectory,
                                           in: .userDomainMask,
                                           appropriateFor: nil,
                                           create: true).appendingPathComponent("tmp", isDirectory: true)

    if !FileManager.default.fileExists(atPath: path.path) {
        do {
            try FileManager.default.createDirectory(atPath: path.path, withIntermediateDirectories: true, attributes: nil)
        } catch {
            print(error.localizedDescription)
        }
    }
    return path
}


func clearTempFolder() {
    let fileManager = FileManager.default
    let tempFolder = try! getTempFolder()
    do {
        let filePaths = try fileManager.contentsOfDirectory(atPath: tempFolder.path)
        for filePath in filePaths {
            try fileManager.removeItem(at: tempFolder.appendingPathComponent(filePath))
        }
    } catch {
        print("Could not clear temp folder: \(error)")
    }
}

We call this code when setting up our ARSession, so the folder exists before the first frame is saved.

override func viewDidLoad() {
    super.viewDidLoad()

    clearTempFolder()

    session = ARSession()
    session.delegate = self
}

Now we can save the images as JPEG files and the depth array as a .txt file.

// Save image (the same for depth)
let imagePath = try! getTempFolder().appendingPathComponent("\(frames.count).jpg")
try! uiImage.jpegData(compressionQuality: 0.9)?.write(to: imagePath)


// Save depth map as txt with float numbers
let depthTxtPath = try! getTempFolder().appendingPathComponent("\(frames.count)_depth.txt")
let depthString: String = getStringFrom2DimArray(array: depthArray, height: depthHeight, width: depthWidth)
try! depthString.write(to: depthTxtPath, atomically: false, encoding: .utf8)


// Auxiliary function to make String from depth map array
func getStringFrom2DimArray(array: [[Float32]], height: Int, width: Int) -> String {
    var arrayStr: String = ""
    for y in 0..<height {
        var lineStr = ""
        for x in 0..<width {
            lineStr += String(array[y][x])
            if x != width - 1 {
                lineStr += ","
            }
        }
        lineStr += "\n"
        arrayStr += lineStr
    }
    return arrayStr
}

LiDAR Experiments

Now we are ready to compare the depth measured by LiDAR with the real physical distances to objects. The ARKit documentation suggests avoiding highly reflective or light-absorbing surfaces. Our company’s poster meets these requirements perfectly, so we used it as a target. We fixed the smartphone on a tripod and centered the object on the screen.

Scene configuration

Since we were dealing with a flat, extended object, it was enough to take the data just from the central pixel of the depth map. We recorded the LiDAR data for distances from 20 cm up to 5.5 m, with a 5 cm step for distances up to 1 m and a 50 cm step for the larger ones. Each distance was captured several times to evaluate the repeatability of the results.
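
Reading that central value is a one-liner once the depth buffer from Step 3 is at hand; the sketch below reuses ‘floatBuffer’, ‘depthWidth’, and ‘depthHeight’ from that snippet.

// Sketch: distance (in meters) to the point at the centre of the depth map
let centerDistance = floatBuffer[(depthHeight / 2) * depthWidth + depthWidth / 2]
print("Distance to the target: \(centerDistance) m")

Here is what we obtained.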

Experimental results

One can see that LiDAR provides reasonable accuracy for distances of up to 4 meters, which is sufficient for portrait mode and short-range AR. Larger distances produced different results on every button click (see the error bars in the figure above). This indicates the limit of the iPhone LiDAR’s operating range.

Summary

The iPhone’s LiDAR sensor is definitely interesting to play with. We hope the provided code snippets and findings will be useful for those who are willing to examine the new LiDAR sensors. Our iPhone experiments are certainly to be continued. Stay tuned!
