In the previous article, we learned what GStreamer is and what its most common use cases are. Now it’s time to start coding in C++. This tutorial does not replace the official GStreamer tutorials but rather complements them. Here we focus on using appsrc and appsink for custom video (or audio) processing in C++ code. In such situations, GStreamer is used mainly for encoding and decoding various audio and video formats.
You might have heard of something called “GStreamer”. I know what you’re thinking. This is some old and boring geek-and-nerd stuff from Linux, right? But what is it? What is GStreamer used for? If we want computer vision or audio (speech, music) processing, can GStreamer help us? In this article, I’ll try to answer these questions. This article is beginner-level and assumes little or no previous experience with GStreamer.
As we already explained, MediaPipe is a C++ pipeline library. It is very poorly documented: basically, the only documentation is the comments and docstrings in the MP source code. There are also examples, but they are not very readable. There is only one trivial “hello world” example; the rest are deep learning examples, which is counterproductive for learning basic MP concepts. Moreover, these examples are artificially obscured by things like GLog and GFlags.
It was not easy at all to master MediaPipe. We thought little in C++ could surprise us. MP did. They say Google libraries do not work outside of Google. We can confirm this is true. The ways Google uses the C++ language are highly unusual from our point of view. Normally (at least where we come from), people use CMake, a nice cross-platform build system, for C++ projects.
In the ML/DL community, you can often hear “Nowadays you must know Google MediaPipe”, “It’s a cool framework”, and sometimes “It’s internally used by YouTube!” Videos of various computer vision tasks, such as hand tracking, often appear on LinkedIn and forums with the comment “This is MediaPipe!” At this point, we decided we could not ignore it anymore.
Fiducial markers are widely used in various applications such as robot navigation, logistics, and augmented reality.
Fig. 1. Applications of fiducial markers
The advantages are obvious:
- High contrast
- Simple code generation
- Resistance to extreme viewing angles
However, when we deal with a large number of markers, real-time recognition becomes challenging, especially on embedded devices with low-power onboard CPUs.
Indoor positioning systems are becoming popular nowadays. Indeed, there are plenty of opportunities for real-time user navigation in GPS-denied environments. Some interesting use cases are as follows: Fig. 1. Indoor navigation use cases. There are several options for hardware (see the It-Jim blog post). We have developed a positioning algorithm based on cheap Bluetooth beacons and the built-in IMU sensors of a mobile device.
Our task was to develop an algorithm for automatic road detection in radar images. The challenge was that radar images are quite different from optical ones. In particular, in the case of synthetic aperture radar (SAR), the image is formed by coherent processing of the received signals backscattered from the Earth's surface. As a result, multiplicative speckle noise appears in SAR images.
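To make the term “multiplicative” concrete, here is the model of speckle commonly found in textbooks. It is only a sketch: the symbols I (observed intensity), R (underlying reflectivity) and n (unit-mean speckle) are our own notation and are not taken from the article, which may use a different formulation.

```latex
% Assumed notation: I = observed image, R = noise-free reflectivity, n = speckle.
I(x, y) = R(x, y)\, n(x, y), \qquad \mathbb{E}[n(x, y)] = 1

% Taking the logarithm turns the multiplicative noise into an additive term:
\log I(x, y) = \log R(x, y) + \log n(x, y)
```

Because the noise scales with the signal, bright regions look noisier than dark ones, which is one reason why methods tuned for additive noise in optical images tend to work poorly on SAR data without adaptation.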