Apple RoomPlan: Is There a Room for Improvement?

By Oleg Ponomaryov, Tech Lead @It-Jim

Introduction

At WWDC 2022, Apple introduced the RoomPlan API for Swift, which allows obtaining room scans using the camera and LiDAR on an iPhone or iPad. It might look similar to the Scene Reconstruction API, which was introduced earlier and also uses LiDAR. That API, however, produces a polygonal mesh of the environment, which essentially describes only its shape. But what if you want to measure the size of a window or, maybe, know how an oven and a fridge are placed in a kitchen? That is exactly what RoomPlan is for: the parametric model it outputs contains the positions and dimensions of walls, doors, windows, furniture, and household appliances.

In this article, we’ll see how one can use the RoomPlan API in an application, explore the output data structure, and look at some limitations discovered after a couple of weeks of extensive testing. Please note that RoomPlan is still a relatively new API, so certain things could change and some issues might be fixed in future updates. The core concepts, however, are unlikely to change, so let’s dive into this new Apple API and see what it is all about.

Getting started with RoomPlan API

The simplest way to integrate the RoomPlan API into your application is by using the default RoomCaptureView for Storyboard. There is a certain hierarchy behind it:

While the RoomCaptureView handles all visualization and interaction with the user, the actual scanning is done by the RoomCaptureSession, which can be accessed through the corresponding view’s property. The RoomCaptureSession itself uses a standard ARSession from ARKit.
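In code, the hierarchy looks roughly like this (a quick sketch just to show how the layers nest):

import RoomPlan
 
// Each layer wraps the next one down:
let captureView = RoomCaptureView(frame: .zero)    // UI and visualization
let captureSession = captureView.captureSession    // scanning logic
let arSession = captureSession.arSession           // the underlying ARKit session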

Knowing these basic ideas, let’s look at the simplest usage example:

import UIKit
import RoomPlan
 
class RoomPlanViewController: UIViewController, RoomCaptureViewDelegate {
  
   @IBOutlet weak var doneButton: UIButton!
   @IBOutlet weak var exportButton: UIButton!
   @IBOutlet weak var statusLabel: UILabel!
  
   var roomCaptureView: RoomCaptureView!
  
   var finalResults: CapturedRoom?
  
   override func viewDidLoad() {
       super.viewDidLoad()
       roomCaptureView = RoomCaptureView(frame: view.bounds)
       roomCaptureView.delegate = self
       view.insertSubview(roomCaptureView, at: 0)
   }
  
   override func viewDidAppear(_ animated: Bool) {
       super.viewDidAppear(animated)
       roomCaptureView?.captureSession.run(configuration: RoomCaptureSession.Configuration())
   }
   @IBAction func doneButtonPressed(_ sender: UIButton) {
       roomCaptureView?.captureSession.stop()
       doneButton.isHidden = true
   }
  
   func captureView(didPresent processedResult: CapturedRoom, error: Error?) {
       statusLabel.isHidden = false
       if let error = error {
           statusLabel.text = "An error occurred during the scan: \(error)"
       } else {
           statusLabel.text = "The scan was successfully completed"
       }
       finalResults = processedResult
       exportButton.isHidden = false
   }
     
   @IBAction func exportButtonPressed(_ sender: UIButton) {
       let filename = "Room.usdz"
       let destinationURL = FileManager.default.temporaryDirectory.appending(path: filename)
       do {
           try finalResults?.export(to: destinationURL)
          
           let activityController = UIActivityViewController(activityItems: [destinationURL], applicationActivities: nil)
          
           // For an iPad:
           if let popoverController = activityController.popoverPresentationController {
               popoverController.sourceView = exportButton
           }
          
           present(activityController, animated: true, completion: nil)
       }
       catch {
           let alertController = UIAlertController(
               title: "Export error", message: "An error occurred during the export:  \(error)", preferredStyle: .alert)
           alertController.addAction(UIAlertAction(title: "OK", style: .default))
           self.present(alertController, animated: true, completion: nil)
       }
   }
}

Let’s break it down. In viewDidLoad(), we add the RoomCaptureView. We also set our view controller to be the RoomCaptureView’s delegate. In viewDidAppear(), we start the capture session that is accessed through the corresponding view’s property.

From that point, the RoomCaptureView will handle everything. It will start with a special coaching overlay that tells the user to point the camera at different edges of a wall to initialize the capture session:

When the initial stage is finished, the main scanning routine starts, with captured surfaces and objects being highlighted as well as shown on the miniature 3D room model at the bottom of the screen:

When the user is done scanning, we stop the capture session in doneButtonPressed(). This makes RoomPlan post-process the scan to get the final result, which is then presented to the user by the RoomCaptureView without requiring any additional actions on our side:

We will also receive the final model in the captureView(didPresent: …, error: …) callback. The model can then be exported to a USDZ file, as we do in exportButtonPressed(). There is plenty of software to open it on a Mac: there are dedicated 3D editing tools, but if you’re not used to working with them, the Preview app is probably the easiest way to view the model. To access the parameter values, you can also open it in Xcode:

What feedback is provided by the API?

Certain user actions or environmental conditions might negatively influence scan quality. RoomPlan provides feedback on such issues, which include:

  • Too close to a wall
  • Too far from a wall
  • Moving too fast
  • Not enough light
  • Not enough texture

When using the standard RoomCaptureView, issues are displayed to a user as a text prompt:

Issues with being too close, moving too fast, or not having enough light are easy to reproduce, while the other ones seem to be pretty rare: we haven’t encountered them in practice.
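When you go beyond the standard RoomCaptureView (as we do in the custom view example later), the same feedback can be received programmatically via the RoomCaptureSessionDelegate. Here is a minimal sketch that maps the instruction callback to the prompts above:

// A minimal sketch (inside a RoomCaptureSessionDelegate conformer):
func captureSession(_ session: RoomCaptureSession, didProvide instruction: RoomCaptureSession.Instruction) {
    switch instruction {
    case .moveAwayFromWall: print("Too close to a wall")
    case .moveCloseToWall:  print("Too far from a wall")
    case .slowDown:         print("Moving too fast")
    case .turnOnLight:      print("Not enough light")
    case .lowTexture:       print("Not enough texture")
    case .normal:           break // no issues detected
    @unknown default:       break
    }
}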

A fatal error can also happen, in which case it will be passed to the captureView(didPresent: …, error: …) callback:

  • A device is not supported
  • A device is too hot
  • A scene size limit exceeded
  • A world tracking failure
  • An invalid AR configuration
  • An internal error

Looking into output room models

RoomPlan performs pretty well in the standard case of a single average-sized room:

It works even for larger rooms with more furniture:

When we compared scan dimensions to actual measurements, they turned out to be accurate enough, with the error usually staying below 5%.

RoomPlan data structure

Now that we know how to run a RoomPlan scan and how results look visually, let’s see what data is actually contained in the output model. The room scan consists of surfaces and objects.

Surfaces

There are 4 types of surfaces:

  1. Wall
  2. Door
  3. Window
  4. Opening

Each surface contains the following set of properties:

  • Confidence
  • Dimensions
  • Transform
  • Normal
  • Curve
  • Completed edges

The confidence is discrete and can take only 3 possible values (low, medium and high). Dimensions contain width and height (a depth value is also present, but it is always set to 0, since RoomPlan doesn’t consider walls and other surfaces to have any thickness). The transform is represented by a typical 4×4 matrix and the normal is a 3-dimensional vector, so no surprises here. The curve property is for non-flat surfaces and is nil when there is no curvature.

The properties we’ve discussed so far limit a surface to being a (possibly curved) rectangle, so handling a triangular window, for example, would be impossible with the current parametrization of RoomPlan.

Finally, the list of completed edges contains the edges that have already been scanned by the user. That is an interesting property, since it can be used to evaluate how thoroughly a user has scanned a room. However, there are some limitations. First, only vertical edges can currently receive the completeness status. Second, not all surfaces can have completed edges: windows, for example, do not have them, and their list is always empty.

There is also a simple parent-child relationship between surfaces: walls are parents to the doors, windows and openings that belong to them.
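To make this concrete, here is a minimal sketch that walks all surfaces of a finished scan and prints their parameters (reading width and height from the x and y components of the dimensions vector is our assumption, consistent with how we render boxes later):

import RoomPlan
 
func printSurfaces(of room: CapturedRoom) {
    // Walls, doors, windows and openings all share the Surface type
    let surfaces = room.walls + room.doors + room.windows + room.openings
    for (index, surface) in surfaces.enumerated() {
        let width = surface.dimensions.x   // meters (assumed axis order)
        let height = surface.dimensions.y  // meters; dimensions.z is always 0
        let position = surface.transform.columns.3  // translation column of the 4x4 transform
        print("Surface \(index) [\(surface.category)]: \(width) x \(height) m,",
              "confidence: \(surface.confidence), completed edges: \(surface.completedEdges),",
              "at \(position)")
    }
}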

Objects

RoomPlan can detect a number of different objects:

  1. Bathtub
  2. Bed
  3. Chair
  4. Dishwasher
  5. Fireplace
  6. Oven
  7. Refrigerator
  8. Sink
  9. Sofa
  10. Stairs
  11. Storage
  12. Stove
  13. Table
  14. Television
  15. Toilet
  16. Washer dryer

Like with surfaces, each object has a set of properties:

  • Confidence
  • Dimensions
  • Transform

The properties are similar to the corresponding ones of surfaces, but dimensions now contain 3 proper values. The dimensions and the transform essentially define a bounding box around an object. Note that the bounding box is NOT axis-aligned, i.e. its orientation matches the object rather than the world coordinate axes.
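Since the box is oriented, simple axis-aligned min/max math won’t work on it directly. Here is a small sketch that computes the eight world-space corners of an object’s bounding box from its dimensions and transform (pure simd math; we assume the transform maps a box centered at the local origin into world coordinates, which matches how we render boxes later):

import simd
 
func boundingBoxCorners(dimensions: simd_float3, transform: simd_float4x4) -> [simd_float3] {
    let half = dimensions / 2
    var corners: [simd_float3] = []
    // Enumerate all eight combinations of +/- half-extents
    for x in [-half.x, half.x] {
        for y in [-half.y, half.y] {
            for z in [-half.z, half.z] {
                let world = transform * simd_float4(x, y, z, 1)
                corners.append(simd_float3(world.x, world.y, world.z))
            }
        }
    }
    return corners
}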

Advanced usage examples

Custom view

For real-world tasks you might want to go beyond the standard RoomCaptureView, so let’s see how to do this. We will make a simple custom view that will color walls and objects according to their confidence value:

import Foundation
import ARKit
import RealityKit
import RoomPlan
 
class CustomCaptureView: ARView, RoomCaptureSessionDelegate {
   let captureSession: RoomCaptureSession = RoomCaptureSession()
   let roomBuilder: RoomBuilder = RoomBuilder(options: [.beautifyObjects])
      
   // weak to avoid a retain cycle with the view controller acting as delegate
   weak var delegate: RoomCaptureViewDelegate?
  
   required init(frame: CGRect) {
       super.init(frame: frame)
       initSession()
   }
  
   @MainActor required dynamic init?(coder decoder: NSCoder) {
       super.init(coder: decoder)
       initSession()
   }
  
   func initSession() {
       self.cameraMode = .ar
       captureSession.delegate = self
       self.session = captureSession.arSession
   }
  
   func captureSession(_ session: RoomCaptureSession, didUpdate: CapturedRoom) {
       DispatchQueue.main.async {
           self.scene.anchors.removeAll()
          
           for wall in didUpdate.walls {
               self.drawBox(scene: self.scene, dimensions: wall.dimensions,
                            transform: wall.transform, confidence: wall.confidence)
           }
          
           for object in didUpdate.objects {
               self.drawBox(scene: self.scene, dimensions: object.dimensions,
                            transform: object.transform, confidence: object.confidence)
           }
       }
   }
  
   func drawBox(scene: Scene, dimensions: simd_float3, transform: float4x4, confidence: CapturedRoom.Confidence) {
       // Color-code by confidence: low = red, medium = yellow, high = green
       var color: UIColor = confidence == .low ? .red : (confidence == .medium ? .yellow : .green)
       color = color.withAlphaComponent(0.8)
      
       let anchor = AnchorEntity()
       anchor.transform = Transform(matrix: transform)
      
       // Depth is 0 for surfaces, in which case we set it to 0.1 for visualization
       let box = MeshResource.generateBox(width: dimensions.x,
                                          height: dimensions.y,
                                          depth: dimensions.z > 0 ? dimensions.z : 0.1)
      
       let material = SimpleMaterial(color: color, roughness: 1, isMetallic: false)
      
       let entity = ModelEntity(mesh: box, materials: [material])
       anchor.addChild(entity)
      
       scene.addAnchor(anchor)
   }
  
   func captureSession(_ session: RoomCaptureSession, didEndWith data: CapturedRoomData, error: Error?) {
       Task {
           // Post-process the raw scan data into the final parametric model
           let finalRoom = try! await roomBuilder.capturedRoom(from: data)
           delegate?.captureView(didPresent: finalRoom, error: error)
       }
   }
}

The custom view is derived from ARView. At initialization, we create a RoomCaptureSession and hand its underlying ARSession to the ARView, so the view renders on top of the session that performs the scan. We also set our custom view as the capture session’s delegate to get access to some useful callbacks. In the captureSession(…, didUpdate: …) callback, the capture session provides us with the current state of the room scan, from which we take walls and objects and draw them using the drawBox() method, which simply creates an anchor with a single box mesh of a confidence-dependent color for each wall or object.

This is a somewhat simplified approach, since we remove all entities and redraw them from scratch on each update. One could also use captureSession(…, didAdd: …), captureSession(…, didChange: …) and captureSession(…, didRemove: …) to keep track of each entity, create smooth transition animations when its state changes, and so on, but as a basic example our simple approach should suffice.
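For instance, here is a hedged sketch of such an incremental update, assuming surfaces and objects keep a stable identifier between callbacks (which is what we observed):

// Hypothetical incremental handling inside our CustomCaptureView.
// objectEntities would be populated in captureSession(_:didAdd:).
var objectEntities: [UUID: ModelEntity] = [:]
 
func captureSession(_ session: RoomCaptureSession, didChange room: CapturedRoom) {
    for object in room.objects {
        guard let entity = objectEntities[object.identifier] else { continue }
        // Smoothly move the existing entity to the updated pose
        // instead of deleting and recreating it
        entity.move(to: Transform(matrix: object.transform), relativeTo: nil, duration: 0.2)
    }
}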

Our custom view mimics the default one by also having the captureSession property that can be used to start and stop scanning and by accepting a RoomCaptureViewDelegate. In captureSession(…, didEndWith: …), which is called after scanning has ended, we get the final room scan data, post-process it with a RoomBuilder and then pass the result to the delegate’s captureView(didPresent: …, error: …) callback. Thanks to this, it is easy to replace the default view with the custom one in the code that we’ve already explored earlier.
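Here is a sketch of the swap in the view controller from the beginning of the article (retyping the roomCaptureView property is our assumption; everything else stays the same):

// In RoomPlanViewController, assuming the property is declared as:
// var roomCaptureView: CustomCaptureView!
override func viewDidLoad() {
    super.viewDidLoad()
    roomCaptureView = CustomCaptureView(frame: view.bounds)
    roomCaptureView.delegate = self
    view.insertSubview(roomCaptureView, at: 0)
}

The visualization using our custom view will look like this: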

Additional data

The RoomPlan parametric model is pretty helpful, but what if you need something more, like detecting an object class that is not supported by default? Such a task would require access to the source point cloud captured during the scan, so that a 3D object detection model can be run on it, and, luckily, this is possible. As we’ve seen previously, RoomCaptureSession uses a standard ARSession, so one can get all the data that is typically available from it while performing a room scan. The data can be obtained from an ARFrame via the ARSession’s session(_:didUpdate:) callback and includes camera poses, LiDAR depth maps and RGB frames, which is enough to build a point cloud of the environment. You can read more about dealing with LiDAR data in our blog.
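Here is a hedged sketch of grabbing per-frame data during a RoomPlan scan. We assume that taking over the ARSession’s delegate doesn’t disturb RoomPlan’s internal processing (it didn’t in our tests); alternatively, one can poll captureSession.arSession.currentFrame:

import ARKit
 
class FrameCollector: NSObject, ARSessionDelegate {
    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let cameraPose = frame.camera.transform      // 4x4 camera-to-world pose
        let intrinsics = frame.camera.intrinsics     // 3x3 pinhole camera intrinsics
        let rgb = frame.capturedImage                // RGB frame (YCbCr pixel buffer)
        if let depthMap = frame.sceneDepth?.depthMap {
            // Unproject depthMap + intrinsics + cameraPose into a world-space
            // point cloud here (see our LiDAR blog post for details)
            _ = depthMap
        }
        _ = (cameraPose, intrinsics, rgb)
    }
}
 
// Attach it to the session RoomPlan uses:
// captureSession.arSession.delegate = frameCollector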

As an example, we’ve extracted a point cloud and rendered it on top of the RoomPlan scan:

There are some inconsistencies between point clouds from different frames, so for a real-world task one would need to perform at least basic stitching, but the current result is enough for our simplified example.

Limitations and failure cases

Multiple rooms

Apple itself advises against scanning either very large rooms or multiple rooms at once. The reasons are understandable: a higher chance of tracking drift during long scans, or simply the device overheating. Here we can see an extreme case of scanning a whole house at once, and the problems are pretty obvious:

However, there is another issue, rooted in the parametrization of RoomPlan’s model. As we’ve already discussed, it currently assumes that all surfaces, including walls, have zero thickness. So, when multiple rooms are scanned, the thickness of a wall between them isn’t properly accounted for: it is either spread between both rooms’ dimensions or goes entirely into one of them, increasing the measurement error.

This can be observed by comparing a point cloud of a thick wall with the scan produced by RoomPlan. The detected wall surface was placed on one side of the actual wall, making one of the rooms on the scan much larger than it really is:

Another issue is that RoomPlan doesn’t handle ceilings and floors in any way, assuming them to be defined by the top and bottom edges of walls. That makes it impossible to properly scan multiple floors at once, since the output model would have furniture from the upper floor just hanging in the air:

Odd surface shapes

Another issue with RoomPlan’s parametrization is that it assumes all surfaces to be rectangles (possibly with some curvature, but nothing beyond that). As we can see in these examples, it failed with both an angled top edge of a wall and a rounded top edge of an opening (and it missed a small window, by the way):

Here are a couple of other examples with a window and a door:

Phantom objects

RoomPlan can sometimes see something that isn’t actually there, plain and simple. It’s surprising that this happens with LiDAR, which should give accurate enough depth measurements to avoid such cases, but we’ve encountered it in practice multiple times:

Limited usage scope

As Apple states, RoomPlan is aimed at residential rooms, so it’s very likely to fail in industrial premises. But it is somewhat limited even for ordinary households. For instance, the list of supported object classes is not that extensive, so RoomPlan cannot detect some appliances, like this water boiler:

Open doors

Currently, open doors are most often misclassified as plain openings. And if you have a double door with only one half closed, RoomPlan is likely to detect it as two separate doors, or as a door and an opening:

Thus, it is recommended to close all doors before scanning.

Honorable mentions

There are some other cases that RoomPlan doesn’t handle very well right now. Mirrors, for example, are challenging for 3D reconstruction in general. RoomPlan doesn’t seem to be affected by smaller ones, but large mirrors can even make it crash.

Here, a combination of a window and a door leading to a balcony wasn’t handled very well:

And here, for example, the window boundaries are wrong and the wall edges don’t look accurate enough (the latter is probably caused by the ceiling molding):

Where can we use RoomPlan?

Probably the most obvious use cases for the new API are home repair, interior design, furniture retail and so on. Apple RoomPlan applications might be used to estimate the amount of materials needed for a room renovation, to visualize how a room would look after renovation, or to show how a new sofa would fit in.

General knowledge about a room’s layout is also useful for certain generic AR scenarios, and it is very likely that this technology will somehow be involved in Apple’s upcoming AR headset.

What’s coming next?

As was mentioned in the introduction, both RoomPlan API and iOS 16 are still relatively new releases, so we’re definitely going to see some accuracy improvements and bug fixes. Updates of this kind are most likely to come in the near future.

However, some more significant changes could be introduced in the next few years as the Apple RoomPlan API evolves. Currently, the most promising direction seems to be support for non-rectangular surfaces, since examples with odd shapes are pretty common. That, however, would require a reparametrization of the model, i.e. defining surfaces with a set of corners and edges between them, with each edge having its own curvature, so such an update is unlikely to come during the first year of the API’s lifetime.

Another useful update would be improving RoomPlan’s handling of multiple rooms, which includes measuring wall thickness and maybe even defining floors and ceilings to support scanning several floors at once. The former would be easier to implement; the latter would also require some changes to the structure of the output model.

 
