Computer vision (CV) and machine learning (ML) algorithms solve a tremendous range of problems. However, many businesses do not understand what hardware to choose for running their favorite neural net or an advanced image- and video-processing pipeline. With this blog post, we start a series of articles about embedded vision and the specific practical things you need to know before making your choice.
Embedded, USB stick, and single-board computers
An embedded computer (in a narrow sense) is a computer typically found inside your car, router, or washing machine. Of course, the SpaceX Dragon 2 and the Boeing 777 have even more serious on-board computers. An embedded computer (Fig. 1) is typically a printed circuit board, or sometimes a single chip, which has rather specialized input-output connectors and typically lacks the more usual connectors like USB, Ethernet, or HDMI. It cannot be easily used outside of the larger piece of hardware (your car, router, etc.), hence the name “embedded”.
Fig. 1. Embedded computers.
Sometimes the word “embedded” is, in a broader and somewhat inaccurate sense, also applied to USB-stick and single-board computers. USB-stick computers (Fig. 2) include the Google Coral Stick and the Intel Neural Compute Stick 2 (previously known as “Movidius”), the two principal USB deep learning accelerators. There are many other USB-stick devices, but they are not very interesting for CV/ML.
Fig. 2. Google Coral Stick (left) and Intel Neural Compute Stick 2 (right), the two competing USB deep learning accelerators.
Finally, single-board computers (Fig. 3) include the Raspberry Pi series, Google Coral, the Nvidia Jetson series, and many others. Such computers are typically a single printed circuit board (sometimes in a box), but unlike the embedded computers, the single-board ones have standard ports like USB, HDMI, and Ethernet, and sometimes also Bluetooth and Wi-Fi units, so you can use them as small desktop computers. Some of them (e.g. the Nvidia Jetson Xavier) come in both embedded and single-board versions.
Fig. 3. Single-board computers: Raspberry Pi 4 (UL), Google Coral (UR), Nvidia Jetson Nano (LL), Nvidia Jetson Xavier (LR).
These three classes of devices can run various operating systems (OSes). In this blog, we mainly focus on devices running some sort of Unix/Linux, as opposed to Windows, Android, or non-Linux embedded OSes found in some hardcore embedded devices.
Computer Vision and Machine Learning on Embedded Computers
How do you use single-board computers for computer vision and machine learning? (Truly embedded devices have some differences we are not going into.) Or, for that matter, how do you use them for anything at all? Let’s start with the easier second question:
- If you have something like a Raspberry Pi, just think of it as a small desktop computer. Connect the power, monitor, keyboard, mouse, and either Ethernet or Wi-Fi, then use your device as a Linux computer with a GUI desktop. You will need basic Linux skills, of course, and the Linux versions for single-board computers are somewhat different from what you are used to on PCs; for example, you will hardly ever see heavyweight desktops like Gnome 3 or KDE.
- Some more primitive devices are “strictly headless”, i.e. they cannot drive a monitor and often have no GPU. Moreover, even devices like the Raspberry Pi are often used in headless mode (and with a headless Linux) in order to maximize performance and use less disk space; you can still attach a monitor if you want, but only with a text-mode Linux console and no GUI. Using headless devices typically requires some sort of connection (network, serial, or USB) so that you can log in to the device via SSH from your computer. Of course, you must be fluent in the Linux command line (typically bash).
- But wait, we have not installed Linux yet! How do we do that? It depends. The Raspberry Pi series (and copycats such as the Orange Pi) is the simplest: you use a standard micro-SD card (like the one in your smartphone or digital camera) as your one and only hard disk. What does that mean? You guessed right: to install Linux, you simply download the desired Linux image and burn it onto the SD card using your laptop. Make sure you buy the biggest and fastest SD card you can find! Other devices, however, have built-in SSD drives, which means you have to use some sort of USB connection to install the OS from a host computer, often with instructions like “push a hidden button with a pencil when powering on the device”.
- Things are more “interesting” for the Nvidia Jetson series. On the one hand, Jetsons usually ship with pre-installed Linux. But if you have to reinstall, the real fun starts. You need to install something called “Nvidia SDK Manager” on your laptop or desktop PC (known as the “host”), which is available only for “Ubuntu Linux x64 version 18.04 or 16.04”. Not only are Windows, macOS, and non-Ubuntu Linux users excluded, but who runs something as ancient as Ubuntu 18.04 nowadays? Moreover, you will typically be asked for an Nvidia GPU with the latest drivers (which excludes host PCs without an Nvidia GPU) and a very particular version of CUDA (which is never the same as the version you have on your PC). We are not sure, though, that the latter applies to all Nvidia SDK Manager versions. In the end, using Docker (on a Linux host) seems to be the only solution that works, and it is far from easy (you have to turn on GUI, GPU, and USB support in Docker).
- Compared to PCs, single-board computers (especially the cheaper or older ones) tend to be painfully slow and very limited in resources. They also overheat easily.
OK, suppose we managed to get our device up and running, and installed some sort of Linux on it. What next? We are all programmers. We like to write code. How do you do that on single-board computers?
- C and C++ are typically supported (although the hardcore embedded devices might have C only). This includes the usual Linux C/C++ infrastructure of gcc, cmake, gdb, etc. Use the package manager of your OS (apt for the Ubuntu-Debian family) to install packages. Many common C/C++ libraries are also available via apt, although often in very outdated versions.
- If a library is missing (or the apt-provided version is too old), you will have to build it from source, which can take many hours on a single-board device.
- As an alternative to building libraries and your own projects on-device, you can cross-compile on your PC for the Raspberry Pi (or whatever device you have), but this requires setting up a toolchain with the right libraries and dependencies, which is not easy. You will only need cross-compiling skills if you are a professional embedded developer; beginners should build on-device if possible.
- Our small devices are typically too weak to run an Integrated Development Environment (IDE) efficiently, so use the default text editor of your OS to edit the C++ code, cmake to build it, and text-mode gdb if you really need debugging. If you really want an IDE, try Code::Blocks or KDevelop. It is a good idea to develop the code on your PC first and only then port it to the device.
- The architecture of single-board devices is almost always ARM, which means having Neon SIMD instructions instead of Intel SIMD (SSE, MMX), so low-level optimization strategies are quite different. On the plus side, ARM devices are often mutually compatible.
- Python 3.x is available on most devices. However, some packages are missing from the pip3 repositories, or only very old versions are available. Installing Python packages often involves building C/C++ code, which on single-board devices can take forever or fail outright with C++ compile errors. Do not expect to be able to use all your favorite Python libraries on a Raspberry Pi or Jetson Nano.
- Other languages, like Java, might also be available, but as they are seldom used for CV/ML, we will not focus on them.
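The last two points above can be checked programmatically. Before porting code to a board, it helps to know which CPU architecture you are on and which version of a package (if any) actually made it onto the device. Here is a minimal sketch using only the Python standard library (the bogus package name is ours, for illustration):

```python
import platform
from importlib import metadata  # Python 3.8+

def installed_version(pkg: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Architecture: typically "x86_64" on a PC, "aarch64" or "armv7l" on ARM boards.
machine = platform.machine()
print("architecture:", machine, "(ARM)" if machine.startswith(("arm", "aarch64")) else "")

# Package check: "pip" is usually present; the second name is deliberately bogus.
print("pip version:", installed_version("pip"))
print("bogus package:", installed_version("surely-not-installed"))  # None
```

Running this on your PC and on the device side by side is a quick way to spot version mismatches before they bite.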
Finally, how do we do CV/ML on single-board devices?
- As explained above, C++ and Python are available, so if you can do CV/ML on a Linux PC, you can also do it on a device, although speed can be an issue.
- In particular, the popular computer vision library OpenCV is available. If you have to build OpenCV from source, it can take many hours (especially with the contrib modules) and a lot of disk space.
- Some single-board devices have unique hardware, such as CSI cameras, hardware video encoders/decoders, and deep learning accelerators. These topics will be addressed below.
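To make the speed caveat concrete, here is the kind of per-frame work that is cheap on a PC but can bottleneck a single-board CPU: a pure-NumPy RGB-to-grayscale conversion (NumPy is assumed to be available; in practice OpenCV’s cv2.cvtColor does the same thing faster in optimized C++):

```python
import numpy as np

def rgb_to_gray(frame: np.ndarray) -> np.ndarray:
    """Luma conversion (ITU-R BT.601 weights), done per frame on the CPU."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (frame.astype(np.float32) @ weights).astype(np.uint8)

# A fake 640x480 RGB frame stands in for a camera grab.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 0] = 255  # pure red
gray = rgb_to_gray(frame)
print(gray.shape, int(gray[0, 0]))  # (480, 640) 76
```

At 30–90 fps, even a conversion like this has to run dozens of times per second, so on a weak ARM core it eats a noticeable share of the frame budget.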
Computer Vision with a Raspberry Pi
While the Raspberry Pi 4 is available nowadays, here we will talk about the Raspberry Pi 3, since most of our experience is with this edition of the single-board computer. So when we say that something is “very slow”, expect things to be slightly better on the newer model. As explained above, the Raspberry Pi 3 is basically a tiny Linux PC that uses a micro-SD card as a hard disk. It has 4 USB ports, HDMI, Ethernet, and also Wi-Fi and Bluetooth (the latter two are not very reliable). For Linux, we strongly suggest Raspberry Pi OS, a.k.a. Raspbian, as it is guaranteed to support all Raspberry-specific hardware. Currently, Raspbian 10 is available. C++ and Python are supported reasonably well. The Raspberry Pi (RPi) is an ideal gadget for toy CV projects. It has no hardware-accelerated deep learning, though.
When many people hear “Raspberry Pi”, their first reaction is “Camera!”. Indeed, Raspberries are most often used with some sort of camera (Fig. 4).
Fig. 4. Raspberry Pi cameras: CSI, USB and Spy CSI.
On PCs, USB web cameras (Fig. 4, center) are most commonly used, but Raspberries have another interface called the Camera Serial Interface (CSI; Fig. 4, left and right). All real RPi geeks use CSI cameras! In particular, a “spy camera” (an ultra-small CSI camera, Fig. 4, right) is a very popular toy.
But how do we use these cameras? If you google, you will find information about command-line tools like raspivid and raspistill, but we are programmers; we do not want that junk. How do we use cameras from C++ or Python code? On Linux, there is a standard camera interface called Video4Linux2 (V4L2). OpenCV and many other libraries use V4L2 under the hood. USB cameras work out of the box with V4L2, while for CSI cameras you need a special driver, bcm2835-v4l2 (update: apparently, with the latest Raspberry Pi OS 10 it is no longer needed). There are several ways to use a CSI camera in your code:
- V4L2 (including OpenCV VideoCapture)
- Raspicam C++ library
- Multi-Media Abstraction Layer (MMAL) specification
- OpenMAX specification
The last three options are unique to the RPi. Why would you possibly need them when you have V4L2? Because they might be faster or allow you to build a more efficient processing pipeline. And now we come to the next question often asked:
Can I get a 90 fps video stream with a Raspberry Pi camera?
Many CSI cameras advertise 90 fps. Does it really work? We did a lot of testing a while ago with the RPi 3. The short answer: not really. Maybe the camera itself can do it (at 640×480 resolution and with a weird bluish tint), but the RPi 3 is too slow to correctly process the video. First, it is not entirely trivial to set the camera to the 90 fps mode, but it can be done with both OpenCV VideoCapture and Raspicam. When only grabbing the frames in C++ code without any processing, we got about 80 fps at most. When streaming over the local network via UDP with the H264 codec, we could get about 50 fps at most. On a slow computer like the RPi 3, any image processing operation can easily become a fatal bottleneck: conversions between RGB, BGR, and YUV420, encoding to video codecs, saving frames or video to disk, displaying video on screen with cv::imshow(), streaming, etc.
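The simplest way to see what your own setup sustains is to time a bare grab loop. Here is a hedged Python sketch: the timing harness is ours, and it falls back to a stub “grab” so it runs even without OpenCV or a camera attached (on a real RPi, the grab is cap.read() on a cv2.VideoCapture):

```python
import time

def measure_fps(grab, seconds: float = 1.0) -> float:
    """Run a grab loop for `seconds` and return the achieved frame rate."""
    frames = 0
    start = time.perf_counter()
    while time.perf_counter() - start < seconds:
        grab()
        frames += 1
    return frames / (time.perf_counter() - start)

try:
    import cv2

    cap = cv2.VideoCapture(0, cv2.CAP_V4L2)  # first camera, V4L2 backend
    cap.set(cv2.CAP_PROP_FPS, 90)            # request the 90 fps mode
    if cap.isOpened():
        print(f"{measure_fps(cap.read, 0.5):.0f} fps (camera)")
    else:
        print("no camera found")
    cap.release()
except ImportError:
    # No OpenCV here: time a stub grab instead so the sketch still runs.
    stub = lambda: b"\x00" * (640 * 480)  # fake 640x480 single-channel frame
    print(f"{measure_fps(stub, 0.2):.0f} fps (stub)")
```

Add your actual processing inside the loop to see how quickly the advertised frame rate melts away.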
The RPi also has a video accelerator that can encode/decode H264 and some other codecs (but not H265). It is not trivial to use. The most “native” way is the OpenMAX API, which can use both the video accelerator and CSI cameras and allows building efficient pipelines in a way similar to GStreamer. However, OpenMAX is not easy to use, involves tons of boilerplate code, and, most importantly, is typically unavailable on devices other than the RPi (such as PCs), so your OpenMAX code will run on the RPi only. The common media libraries ffmpeg and GStreamer can also use the RPi video accelerator, but ffmpeg requires building from source, which takes many hours on a Raspberry Pi 3. For comparison: PCs do not always have video accelerators unless they have a good Nvidia GPU. Video accelerators do not always speed up encoding/decoding much, but at least they take the heavy load off the CPU and leave it available for other things.
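For illustration, one way to drive the hardware encoder from Python is to shell out to ffmpeg. The sketch below only builds and prints the command (device path, bitrate, and output name are placeholders; h264_omx is ffmpeg’s OpenMAX-based hardware H264 encoder on the RPi, so double-check the flags against your ffmpeg build):

```python
import shlex  # shlex.join needs Python 3.8+

cmd = [
    "ffmpeg",
    "-f", "v4l2", "-framerate", "30", "-video_size", "640x480",
    "-i", "/dev/video0",   # V4L2 camera device (placeholder)
    "-c:v", "h264_omx",    # hardware encoder instead of the CPU-only libx264
    "-b:v", "2M",
    "out.mp4",
]
print(shlex.join(cmd))  # on the device itself, run it with subprocess.run(cmd)
```

Building the command as a list and joining it with shlex avoids shell-quoting bugs when paths or options contain spaces.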
We have covered just the first batch of practical insights about embedded vision. There are plenty of things we still want to discuss: for example, how to run deep learning algorithms on single-board/embedded devices, including TensorRT and DL accelerators. We will talk about that in our next article, so stay tuned!