
A C++ Mini-Tutorial on MediaPipe

Posted on October 14, 2021, updated February 5, 2026, by admin

What MediaPipe Really Is: a C++ Mini-Tutorial

As we already explained, MediaPipe is a C++ pipeline library. It is very poorly documented: essentially, the only documentation is the comments and docstrings in the MP source code. There are also examples, but they are not very readable. Only one of them is a trivial “hello world”; the rest are deep learning examples, which is counterproductive for learning basic MP concepts. Moreover, these examples are artificially obscured by things like GLog and GFlags. So we had to learn MP the hard way while dealing with the Bazel issues. This kind of low-level MediaPipe work is exactly what later enabled us to build production-grade computer vision pipelines for demanding domains like motion analysis and performance tracking in professional sports.

As a result, we wrote the following tutorial: https://github.com/agrechnev/first_steps_mediapipe. It gives a gentle introduction to the basic MediaPipe C++ API (no deep learning or solutions). Below we give a very brief summary of this tutorial; see the actual code for more details.

The core MP concepts (unlike the C++ API) are pretty well explained in the official MP docs. The basic terminology:

Packet: An immutable data packet of an arbitrary type (with a timestamp). MP also has standard types for image and audio.
Graph: The pipeline, represented as a graph.
Node: A node of the graph, which processes data.
Stream: Graph edge, a stream of packets with monotonically increasing timestamps.
Calculator: A registered class for creating nodes.

First example

How does it work in practice? Let’s look at our first example 1.1. It deals with packets of doubles (more complicated types, such as images, would be a poor choice for a first example). Let’s define a very simple graph, as a Protobuf text string:

string protoG = R"(
    input_stream: "in"
    output_stream: "out"
    node {
        calculator: "PassThroughCalculator"
        input_stream: "in"
        output_stream: "out1"
    }
    node {
        calculator: "PassThroughCalculator"
        input_stream: "out1"
        output_stream: "out"
    }
    )";

It has two PassThroughCalculator nodes. What does it do? Basically nothing: it forwards all input data packets to the output. The graph has an input stream in, an output stream out, and one more stream out1 in the middle. The graph looks like this (visualized by the MP visualizer):

Next, we parse the config and create our graph.

mediapipe::CalculatorGraphConfig config =
  mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(protoG);
mediapipe::CalculatorGraph graph;
MP_RETURN_IF_ERROR(graph.Initialize(config));

Next, we should add an observer to process output packets of a graph asynchronously (synchronous processing is also possible if needed). Then we start running the graph:

// Callback invoked for every packet that arrives on the observed stream
auto cb = [](const mediapipe::Packet &packet)->mediapipe::Status{
  cout << packet.Timestamp() << ": RECEIVED " << packet.Get<double>() << endl;
  return mediapipe::OkStatus();
};
MP_RETURN_IF_ERROR(graph.ObserveOutputStream("out", cb));
MP_RETURN_IF_ERROR(graph.StartRun({}));

At this point, the graph starts running. It is now waiting for the input packets. But wait, we did not supply any! This is what we do next. The packet is sort of like an immutable shared_ptr<any>, plus a timestamp. It can hold data of any type. The timestamps in a stream must increase monotonically. Of course, they don’t have to be absolute timestamps since the epoch. Let’s send a few double packets, then “close the stream” to tell MP that no more packets are coming.

for (int i=0; i<13; ++i) {
  mediapipe::Timestamp ts(i);
  mediapipe::Packet packet = mediapipe::MakePacket<double>(i*0.1).At(ts);
  MP_RETURN_IF_ERROR(graph.AddPacketToInputStream("in", packet));
}
MP_RETURN_IF_ERROR(graph.CloseInputStream("in"));

Adding the timestamp is crucial: MP will not work otherwise! Now let us wait for MP to process all packets and finish.

MP_RETURN_IF_ERROR(graph.WaitUntilDone());
return mediapipe::OkStatus();

That’s it, we are done!
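
To make the “immutable shared_ptr<any> plus a timestamp” analogy concrete, here is a minimal standalone sketch in plain C++17. It is not the real mediapipe::Packet: the names ToyPacket, MakeToyPacket and IsStrictlyIncreasing are hypothetical, and the timestamp is a plain integer. It only illustrates that copies of a packet share one read-only payload, and that the timestamps in a stream are expected to increase.

```cpp
#include <any>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a packet: a shared, read-only payload plus a timestamp.
// NOT the real mediapipe::Packet; every name here is hypothetical.
class ToyPacket {
 public:
  explicit ToyPacket(std::shared_ptr<const std::any> data)
      : data_(std::move(data)) {}

  // Returns a copy that shares the same payload but carries timestamp `ts`.
  ToyPacket At(std::int64_t ts) const {
    ToyPacket copy(*this);
    copy.timestamp_ = ts;
    return copy;
  }

  // Read-only access to the payload; throws std::bad_any_cast on a type mismatch.
  template <typename T>
  const T& Get() const {
    assert(data_ != nullptr);
    return std::any_cast<const T&>(*data_);
  }

  std::int64_t Timestamp() const { return timestamp_; }

 private:
  std::shared_ptr<const std::any> data_;
  std::int64_t timestamp_ = -1;  // -1 means "unset" in this sketch
};

// Wraps a value of type T into a new packet without a timestamp.
template <typename T>
ToyPacket MakeToyPacket(T value) {
  return ToyPacket(std::make_shared<const std::any>(std::move(value)));
}

// Checks the stream rule: every timestamp is larger than the previous one.
bool IsStrictlyIncreasing(const std::vector<ToyPacket>& stream) {
  for (std::size_t i = 1; i < stream.size(); ++i) {
    if (stream[i].Timestamp() <= stream[i - 1].Timestamp()) return false;
  }
  return true;
}
```

Calling MakeToyPacket<double>(0.5).At(3) mirrors the MakePacket<double>(...).At(ts) call used above; stamping the same packet with another timestamp copies only the pointer, not the payload.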

Writing a custom calculator

Let us now write a custom calculator (example 1.2). Our calculator will multiply a double number by 2, aka “double the double”. A custom calculator must be defined in the mediapipe namespace and registered with the REGISTER_CALCULATOR() macro. After that, MediaPipe finds the calculator by name (as specified in the Protobuf graph description), so there is no need to include any header for the calculator class.

Every calculator must implement the static method GetContract() to describe its inputs and outputs (streams in MP can have numbers, string tags, or both), and the method Process(), which processes each incoming packet (or, in general, a synchronized bunch of packets with the same timestamp). Methods Open() and Close() are typically also overridden. The code for the “double the double” calculator is:

#include "mediapipe/framework/calculator_framework.h"

namespace mediapipe {
class GoblinCalculator12 : public CalculatorBase {
public:
  // Declares the types of all input and output streams
  static Status GetContract(CalculatorContract *cc) {
    using namespace std;
    cc->Inputs().Index(0).Set<double>();   // 1 double input
    cc->Outputs().Index(0).Set<double>();  // 1 double output
    return OkStatus();                     // Never forget to say "OK" !
  }

  // Called once per input timestamp
  Status Process(CalculatorContext *cc) override {
    using namespace std;
    Packet pIn = cc->Inputs().Index(0).Value();  // Receive the input packet
    double x = pIn.Get<double>();                // Extract the double number
    double y = x * 2;                            // Process the number
    Packet pOut = MakePacket<double>(y).At(cc->InputTimestamp()); // Create packet
    cc->Outputs().Index(0).AddPacket(pOut);      // Send it to the output stream
    return OkStatus();                           // Never forget to say "OK" !
  }
};
REGISTER_CALCULATOR(GoblinCalculator12);  // Register this calculator (outside the class)
}  // namespace mediapipe
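
Since the calculator is looked up by its registered name, a graph that uses it only needs to mention that name. The snippet below is a sketch of what such a graph description could look like. The stream names "in" and "out" and the variable name kDoublerGraph are our assumptions, chosen to match the first example; see example 1.2 in the repository for the actual graph.

```cpp
#include <string>

// Hypothetical graph for the "double the double" calculator.
// The stream names are assumptions that follow the first example.
const std::string kDoublerGraph = R"(
    input_stream: "in"
    output_stream: "out"
    node {
        calculator: "GoblinCalculator12"
        input_stream: "in"
        output_stream: "out"
    }
    )";
```

Such a string would be parsed and run in the same way as protoG in the first example.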

Example 1.3 contains further examples of custom calculators.

Can you configure a calculator? MP gives a few ways to do that:

Options: Parameter specified in the Protobuf graph definition, see example 1.4.
Side packets: Input and output data packets that are sent only once (and not for each timestamp). Example 1.5.
Extra stream: This can contain options for each timestamp. For example, stream 0 for video frames, and stream 1 for crop boxes of some sort. Example 2.3.

Let’s process images

Now let’s process images (and a stream of images is actually a video). Once you’re comfortable with image and video streams at this level, extending the pipeline to full-body keypoint detection and temporal pose tracking becomes a natural next step.

MP has a special type for images, ImageFrame. It can be converted back and forth to cv::Mat. Example 2.1 is a trivial example with PassThroughCalculator, but with video data. The graph is simple:

string protoG = R"(
    	input_stream: "in",
    	output_stream: "out",
    	node {
        	calculator: "PassThroughCalculator",
        	input_stream: "in",
        	output_stream: "out",
    	}
    	)";

Our Observer callback now converts the packet to cv::Mat and displays the image on the screen.

auto cb = [](const Packet &packet)->Status{
    	cout << packet.Timestamp() << ": RECEIVED VIDEO PACKET !" << endl;
    	// Get data from packet (you should be used to this by now)
    	const ImageFrame & outputFrame = packet.Get<ImageFrame>();
    	// Represent ImageFrame data as cv::Mat (MatView is a thin wrapper, no copying)
    	cv::Mat ofMat = formats::MatView(&outputFrame);
    	// Convert RGB->BGR
    	cv::Mat frameOut;
    	cvtColor(ofMat, frameOut, cv::COLOR_RGB2BGR);
    	// Display frame on screen and quit on ESC
    	// Returning non-OK status aborts graph execution
    	// I'll make a nicer quit in later examples
    	cv::imshow("frameOut", frameOut);
    	if (27 == cv::waitKey(1))
        	// I was not sure which Abseil error to use here ...
        	return absl::CancelledError("It's time to QUIT !");
    	else
        	return OkStatus();
	};

Note that we return an error code for a smooth quit from the application if the ESC key is pressed. If the Observer callback returns an error, the whole MP graph stops.

Now we take frames from the camera, convert them to ImageFrame, and send them to MP in an endless loop, which we break out of on a failed MP_RETURN_IF_ERROR() check:

for (int i=0; ; ++i){
    	// Read next frame from camera
    	cap.read(frameIn);
    	if (frameIn.empty())
        	return absl::NotFoundError("CANNOT OPEN CAMERA !");
    	// Convert BGR to RGB
    	cv::cvtColor(frameIn, frameInRGB, cv::COLOR_BGR2RGB);
    	// Create an empty RGB ImageFrame with the same size as our image
    	ImageFrame *inputFrame =  new ImageFrame(
        	ImageFormat::SRGB, frameInRGB.cols, frameInRGB.rows, ImageFrame::kDefaultAlignmentBoundary
    	);
    	// Copy data from cv::Mat to Imageframe, using
    	// MatView: a cv::Mat representation of ImageFrame
    	frameInRGB.copyTo(formats::MatView(inputFrame));
    	// Create and send a video packet
    	uint64 ts = i;
    	// Adopt() creates a new packet from a raw pointer, and takes this pointer under MP management
    	MP_RETURN_IF_ERROR(graph.AddPacketToInputStream("in",
        	Adopt(inputFrame).At(Timestamp(ts))
    	));
	}

Our further video examples:

2.2: Video pipeline with ImageCroppingCalculator and ScaleImageCalculator
2.3: Video pipeline with ImageCroppingCalculator (dynamic crop)
2.4: Video pipeline with FeatureDetectorCalculator and custom image processing. Here we write a custom calculator for processing images.

ImageCroppingCalculator, ScaleImageCalculator and FeatureDetectorCalculator are three standard image-processing calculators of MediaPipe. There are many more.

How to make MediaPipe real-time?

By default, MP is NOT real-time. It processes all packets deterministically, in the order of increasing timestamps, without losing any packets. Any MP stream automatically has a buffer of unlimited size. This is fine if we want to process a video file offline.

However, this is NOT acceptable for real-time pipelines. These real-time constraints become especially critical in applications where pose dynamics are used not just for visualization, but for quantitative analysis, for example detecting subtle motor impairments or asymmetries over time. If we set up a real-time source of packets and the pipeline is not fast enough to process them, the buffers keep growing and the lag keeps increasing, until the buffers fill all RAM and your application crashes (Example 3.1).
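
The growth of an unlimited buffer is easy to see in a toy model. The sketch below is not MediaPipe code: BacklogAfter is a hypothetical helper that simulates a source emitting one packet every source_period ticks and a single node that needs process_time ticks per packet, with an unbounded buffer in between.

```cpp
#include <cassert>
#include <cstddef>

// Toy model, not MediaPipe code. Returns the number of packets still waiting
// in the unbounded buffer after `total_ticks` ticks.
std::size_t BacklogAfter(int total_ticks, int source_period, int process_time) {
  assert(source_period > 0 && process_time > 0);
  std::size_t queued = 0;
  int busy_until = 0;  // tick at which the node becomes free again
  for (int tick = 0; tick < total_ticks; ++tick) {
    if (tick % source_period == 0) ++queued;  // a new packet arrives
    if (tick >= busy_until && queued > 0) {   // node is free: take one packet
      --queued;
      busy_until = tick + process_time;
    }
  }
  return queued;
}
```

When the node keeps up (process_time equal to source_period), the backlog stays at zero. When the node is twice as slow as the source, half of all packets pile up, and the backlog grows linearly with the running time.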

Is it possible to create a real-time pipeline in MP? Yes. There are several ways. The simplest (and used in Google deep learning examples) is to put a FlowLimiterCalculator at the beginning of the pipeline. This calculator has a second input stream, which should be plugged into the output stream of the pipeline. It then compares the timestamps of two streams. If they are too different, it means that the buffers start to fill up, and, above a certain threshold (which can be adjusted), FlowLimiterCalculator starts to drop packets. A typical pipeline from the Google face detection example is (output video is actually sent to the “FINISHED” input of FlowLimiter, but the visualizer does not show such connections):

The right panel shows the subgraph FaceDetectionFrontCpu, which is a typical TFLite inference pipeline.

Our example 3.2 demonstrates the use of FlowLimiter.
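
For orientation, a throttled graph could be described roughly as follows. This is a sketch under assumptions: the stream names are illustrative, PassThroughCalculator merely stands in for the real (slow) processing nodes, and the back_edge marking of the “FINISHED” input follows the pattern we have seen in Google’s example graphs. See example 3.2 in the repository for working code.

```cpp
#include <string>

// Sketch of a throttled graph; stream names are illustrative assumptions.
// PassThroughCalculator is a placeholder for the actual processing nodes.
const std::string kThrottledGraph = R"(
    input_stream: "input_video"
    output_stream: "output_video"
    node {
        calculator: "FlowLimiterCalculator"
        input_stream: "input_video"
        input_stream: "FINISHED:output_video"
        input_stream_info: {
            tag_index: "FINISHED"
            back_edge: true
        }
        output_stream: "throttled_input_video"
    }
    node {
        calculator: "PassThroughCalculator"
        input_stream: "throttled_input_video"
        output_stream: "output_video"
    }
    )";
```

The second input of FlowLimiterCalculator closes a loop from the end of the pipeline back to its start, which is why that input is declared as a back edge.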

What’s next?

MediaPipe has the following modules, each with a number of standard calculators:

audio
core
image
tensor
tensorflow
tflite
util
video

In our tutorial we focused on the basic MP concepts, so there are lots of things we did not cover:

We barely touched the standard calculators
Using GPU
Audio processing
Deep learning with TFLite or TensorFlow
Solutions
Input policies
Languages and OSes other than C++/Desktop

And we repeat our final verdict: MediaPipe would be very nice, if not for Bazel. Bazel (and all related issues) makes you think twice before deciding to use MediaPipe in your C++ project.

If you’re more into watching than reading – we have a YouTube lecture on MediaPipe. Enjoy!

The Bizarre Google World: Bazel, ProtoBuf, and More

Posted on October 14, 2021 by admin

It was not easy at all to master MediaPipe. We thought little in C++ could surprise us. MP did. They say Google libraries do not work outside of Google. We can confirm this is the truth. The ways Google uses the C++ language are highly unusual from our point of view.

How is C++ normally used?

Normally (at least where we come from) people use CMake, a nice cross-platform build system, for C++ projects. Other somewhat common build systems for C++ include Autotools (aka configure+make, mostly Linux/Unix), qmake, and Visual Studio projects (Windows+Visual Studio only). These build systems are similar in the way they handle dependencies. Libraries needed by your projects are typically downloaded and installed system-wide, and not attached to any particular project (as is done in the Java or JavaScript worlds). In Linux, macOS and MSYS2 you typically use the system package manager (e.g. ‘sudo apt install libopencv-dev’). For Windows+Visual Studio, you can use vcpkg. If a library is not in the package manager repo, you can download it by hand (as a binary), or, in the worst case, build it from the source. By the way, in the latter case, we always install it in a user’s home directory in Linux (e.g. “/home/mickeymouse/opencv-cuda”); we never do “sudo make install”.

What is an installed C/C++ library (by ‘sudo apt install’ or otherwise)? It is a bunch of headers (.h or .hpp files); and one or more static (.a/.lib) or more often dynamic (.so/.dll) library files. In any case, an “installed library” is compiled once, then used as a binary, which is a good idea, since building a large library like OpenCV, FFMpeg or Boost from the sources takes significant time even on modern PCs. As a C++ developer, you rarely (if ever) have to deal with building standard libraries from the source.

But how do you use installed libraries in your C++ project? First, your project must find the libraries. CMake has a find_package() command for CMake packages, and pkg-config packages can be found by both CMake and Autotools projects on Linux. Things are a bit worse in Windows, but CMake find_package() still mostly works, if used properly.

How does MediaPipe use C++? Part 1.

MP logic is very different. MP does not use CMake. It uses a different build system called Bazel. We’ll tell you in a moment what it is. MP also has tons of dependencies. Namely:

Source downloaded from github (Non-google): Bazel-skylib, EasyExif, pybind11, Ceres
Source downloaded from github (Google): Abseil, GoogleTest, Benchmark, GLog, GFlags, Protobuf, libyuv, AudioTools, TensorFlow
The choice between building from source or using system libraries: OpenCV, ffmpeg

Below we will explain the “downloading and building from source” part. It is practically impossible to build MP in any other way (e.g. with CMake). Maybe a C++ professional could solve this, given time, but the sheer number of dependencies would make it very hard. Definitely not a project for beginners.

What is Bazel?

Bazel is a multi-language build system, which Google uses for many C++ projects, MP included. Probably there are production-related reasons for this, but for us (we are not Google professionals) our experience with Bazel was predominantly negative.

A Bazel project root directory has a file named WORKSPACE, which can be empty. What is a minimal Bazel project? It has an empty WORKSPACE file and a subdirectory fun1. This subdirectory contains a source file hello.cpp and a project file called BUILD:

load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "hello",
    srcs = ["hello.cpp"],
)

Note that a project has only one WORKSPACE file, but it can have multiple BUILD files, usually as a hierarchical subdirectory structure. To build the target hello, type (in the project root):

bazel build //fun1:hello

It builds the target and creates 4 directories, which are actually symbolic links to somewhere in ${HOME}/.cache/bazel (tricky !): bazel-bin, bazel-out, bazel-hello and bazel-testlogs. Or, if you want to build and run, type:

bazel run //fun1:hello

How does Bazel treat dependencies? First, there are internal dependencies: other targets of the same project. These are not interesting. Second, there are external dependencies, both Bazel and non-Bazel. Bazel dependencies must be Bazel projects built from the source. Non-Bazel dependencies, in theory, can be binary libraries, i.e. combinations of .h and .so files. All external dependencies must be listed in the WORKSPACE file.

Here the trouble starts. First, Bazel cannot look for CMake packages. It cannot even find pkg-config packages (we saw a library on GitHub which is supposed to do this, but it did not work for us, at least with OpenCV). We don’t think Bazel can even use standard system paths for libraries and include files (in Linux); you must specify an exact path to each and every library and its headers in WORKSPACE. And even this is nontrivial. Just look at the third_party directory of the MediaPipe repo to see how ugly things can get.

The preferred way in the Bazel world (or at least for Google projects like MP) is to download each and every dependency as source code (and a Bazel project), and include it as an external Bazel dependency. Bazel has a macro called http_archive() for downloading, but you still must supply a URL. No, there is no “Bazel code repo”; it’s not like Gradle for Java or pip for Python. Bazel does not manage any “packages”. It can only download stuff from the internet, and even CMake can do that (with probably less boilerplate code).

And even such a model does not work properly, as Bazel does not understand “dependency of dependency”. Suppose your project P depends on library A, which in turn depends on B, C, D, E, F. Do you add only A as the external dependency in P? No, you must add A, B, C, D, E, F, otherwise P will not build. And don’t forget that building all your dependencies from the source takes time, to say the least, especially if your dependencies are large libraries like OpenCV.
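
What you end up doing by hand is computing the transitive closure of the dependency graph. As a small illustration (plain C++, unrelated to Bazel’s own code; DepGraph and AllDependencies are hypothetical names), the following sketch collects everything that would have to be listed for a given project:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each project or library to its direct dependencies.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Collects every direct and indirect dependency of `root` (depth-first walk).
std::set<std::string> AllDependencies(const DepGraph& graph,
                                      const std::string& root) {
  std::set<std::string> seen;
  std::vector<std::string> stack{root};
  while (!stack.empty()) {
    const std::string current = stack.back();
    stack.pop_back();
    const auto it = graph.find(current);
    if (it == graph.end()) continue;  // leaf: no further dependencies
    for (const std::string& dep : it->second) {
      // Only visit a dependency the first time we see it
      if (seen.insert(dep).second) stack.push_back(dep);
    }
  }
  return seen;
}
```

For the graph P → A and A → B, C, D, E, F, the function returns all six libraries A to F, which is exactly the list you must write into the WORKSPACE file of P.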

Is there any reason for using Bazel in C++ projects? We did not see any. However, in production, it might be good to download all dependencies from the internet and not rely on the Linux version and APT package versions, for example.

Another odd thing: suppose executable target A depends on a library target B. Then, if you build target A, Bazel compiles all source files (including the ones belonging to B) to .o, and links the executable A, but never actually links library B (as an .a or .so file). Only if you build target B explicitly, will the library be built.

Finally, how well is Bazel supported by IDEs? Our answer: Not at all. A CLion plugin was announced, but it is incompatible with recent CLion versions. VS Code plugin did not work either, giving very weird error messages, something about Android, while running on Linux desktop. We don’t know enough Bazel or VS Code to fix it.

To summarize, while Bazel documentation says how great Bazel is, our impression is quite the opposite.

How does MediaPipe use C++? Part 2.

Disclaimer: When we say “impossible” in this chapter, it actually means “impossible, unless you are a highly skillful C++ professional ready to devote a lot of effort to the task”.

Google MediaPipe is a Bazel project. What does it mean? It means it cannot be installed with “sudo apt install libmediapipe-dev”. And it cannot be installed as a pre-built binary library (.h and .so files). Can you build it from the source? Again, the answer is no, at least if you want .h and .so files you can use in your project. So, for all practical purposes (see the disclaimer above), MP can be only used in Bazel C++ projects. Moreover, MP itself has to be built from the source.

How does MP handle dependencies? As we explained above, it downloads >10 dependencies from the internet as source Bazel projects. An exception is made only for OpenCV and FFMpeg, where you can choose between source and system libraries (in the latter case you must specify full paths). Can you use MP as an external Bazel dependency of your project? Basically no, or at least it is very hard (we saw an example in GitHub though). The reason is the “dependencies of dependency” issue, you will need to specify basically all MP dependencies in your project, and not only MP itself.

So the only way (at least for beginners) to use MP is to make your projects not only Bazel projects but parts of the MP project, located inside the mediapipe/ directory, just like MP examples. From our point of view, this is extremely ugly. And not using any IDE does not make coding in C++ any easier.

If this is not enough for you, there are many other ways MP complicates things unnecessarily. For example:

You cannot build anything without the --define MEDIAPIPE_DISABLE_GPU=1 flag. The default is a GPU build that fails for rather obscure reasons.
MP examples use the GLog logger a lot instead of cout and will not work without GLOG_logtostderr=1.
The same examples require command line arguments with paths to graph, and will not work if called from a different directory.
MP creates its own wrappers for OpenCV headers and other dependencies, instead of using these libraries as they are.

We promised the final verdict by the end of the series of articles, but actually, we can put it here: MediaPipe would be very nice, if not for Bazel. Bazel (and all related issues) makes you think twice before deciding to use MediaPipe in your C++ project. In particular, if something like GStreamer is suitable for you, it is a much better choice, as it does not require Bazel.

What about using non-C++ wrappers? As we explained before, writing custom calculators requires rebuilding MP from C++ sources. Once again, you will have to deal with Bazel, and also an additional complication of integrating Bazel with Python or Android or whatever.

Google Libraries

MP uses a lot of Google libraries and some non-google ones, which it builds from sources as Bazel projects. What are those libraries? A few Google examples:

TensorFlow: If you are reading this, you should know what it is 😉
GLog: A pretty standard logger, and probably the worst logger we have seen. By default, it logs to files in some obscure locations (instead of console), and it’s hard to override.
GFlags: Google library for parsing command line arguments, and another reason why MP examples are so hard to read.
GTest: A well-known unit test library for C++.
Abseil: Google’s answer to Boost, and a “thousand useful things for C++” type of library. It can actually be installed with apt and used in CMake projects (but not the latest version). It can be pretty nice, but as far as we know, MP uses only the error codes from Abseil.
Protobuf: The only library we genuinely liked. We devote a whole section to it.

Google Protocol Buffers (Protobuf)

What is Protobuf? It is a cross-language and cross-platform library from Google for class definition and serialization. Where is it used? TensorFlow and MediaPipe and probably many other things.

What does it all mean? Let’s do a simple example. Suppose we want to define a data type (or “message” in the Protobuf lingo) Hero in hero.proto:

syntax = "proto3"; // Language version: proto2, proto3
package goblin;  // Becomes C++ namespace
message Hero{
	string name = 1;
	int32 age = 2;
}

“Package” corresponds to a Python or Java package, or a C++ namespace. “proto3” is the language version; there are versions 2 and 3 (they are incompatible). “=1” and “=2” are NOT default values but unique field IDs, and they are compulsory.

Next, we must compile the .proto file to the class definition of your language of choice. For C++, it is:

protoc --cpp_out=. hero.proto

It generates C++ files hero.pb.h and hero.pb.cc containing a C++ class Hero. It’s very important that Hero is not a “simple C++ data class of 2 fields”, but a monster class with lots of obscure methods that requires the Protobuf C++ library. However, it’s not a big problem, as Protobuf can be installed by APT and included in CMake projects easily. Then you can use this class in your own code, with getters and setters and such:

// Create a goblin::Hero object and set fields
goblin::Hero h1;
h1.set_name("Brianna");
h1.set_age(18);
// Can be copied by value (clone aka deep copy, expensive !)
goblin::Hero h2 = h1;
// Print it
cout << "h1: name=" << h1.name() << ", age=" << h1.age() << endl;
// Or like this
cout << h1.DebugString() << endl;

Classes like Hero (but not non-Protobuf classes) can be serialized in both binary and text formats. Such serialization is efficient, cross-language, cross-platform and immune to little/big-endian and 32/64-bit issues.

// Serialize to binary, then deserialize
string buf; // Here std::string is used for BINARY data !
bool ret = h1.SerializeToString(&buf);
goblin::Hero h2;
ret = h2.ParseFromString(buf);

// Serialize to text, then deserialize
string buf;
bool ret = google::protobuf::TextFormat::PrintToString(h1, &buf);
goblin::Hero h2;
ret = google::protobuf::TextFormat::ParseFromString(buf, &h2);
// Text format looks like this:
//   name: "Brianna"
//   age: 18

The binary serialization is, well, binary, even if it is contained in an std::string. Why use Protobuf? We think its potential is enormous. TensorFlow uses it to serialize models (.pb files). MediaPipe uses text format to define graphs. And you can use it in your own projects. Every time you see JSON, XML, YAML, TOML and such, Protobuf would probably be better. Binary serialization is efficient, while text serialization is human-readable, and good for e.g. config files.
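
To show what “binary” means here and why the field IDs matter, the sketch below hand-encodes the Hero message without the Protobuf library. It is an illustration under assumptions, not the generated code: AppendVarint and EncodeHero are hypothetical names, and the sketch relies on our understanding of the proto3 wire format (each field starts with a tag equal to the field ID shifted left by 3 bits plus a wire type, integers are stored as base-128 varints, strings are length-prefixed, and fields with default values are omitted).

```cpp
#include <cstdint>
#include <string>

// Appends `value` as a base-128 varint: 7 payload bits per byte, lowest group
// first, with the top bit set on every byte except the last.
void AppendVarint(std::uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Hand-rolled encoding of goblin::Hero {name = 1, age = 2}; illustration only.
std::string EncodeHero(const std::string& name, std::int32_t age) {
  std::string out;
  if (!name.empty()) {                   // default (empty) values are omitted
    AppendVarint((1u << 3) | 2u, &out);  // field 1, wire type 2 (length-delimited)
    AppendVarint(name.size(), &out);     // length prefix
    out += name;                         // raw string bytes
  }
  if (age != 0) {
    AppendVarint((2u << 3) | 0u, &out);  // field 2, wire type 0 (varint)
    // int32 values are sign-extended to 64 bits before varint encoding
    AppendVarint(static_cast<std::uint64_t>(static_cast<std::int64_t>(age)), &out);
  }
  return out;
}
```

For name “Brianna” and age 18 this produces 11 bytes: 0x0A 0x07, the seven letters of the name, then 0x10 0x12. The field names never appear in the binary form, only the IDs, which is why the IDs are compulsory and must stay stable.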

Let’s now move to our next article and see how MediaPipe works in practice!

Down the Rabbit Hole: Our Journey to the Land of MediaPipe and Other Google Technologies

Posted on October 13, 2021 by admin

What is Google MediaPipe (MP) for Dummies?

In the ML/DL community you can often hear “Nowadays you must know Google MediaPipe”, “It’s a cool framework”, and sometimes “It’s internally used by YouTube!” Videos with various computer vision tasks like this hand tracking often appear on LinkedIn and forums with the comment “This is MediaPipe”! At this point, we decided we could not ignore it anymore. So we packed our backpacks, said our goodbyes, and embarked on the journey to the Magical Land of MediaPipe and Google Technologies.

We quickly discovered that most people who praise MediaPipe on social media have no idea what it really is. “For Dummies” version: MediaPipe is a bunch of “solutions”, such as “Hand”, or “Face Mesh”. The table of all available solutions can be found here. As we can see, not all solutions are available for all platforms, although things are improving: this table nowadays has a few more checkmarks than it did half a year ago. But MediaPipe is not “solutions”. What is it really?

Fact #1: Google MediaPipe is a C++ library, other languages are wrappers around C++, with very limited functionality. If you want MediaPipe for real, you must use C++.
Fact #2: Google MediaPipe is a pipeline library. Look at the Wikipedia articles for Pipeline and related concepts of Dataflow- and Flow-Based Programming. Our previous blog post stressed the importance of pipelines for computer vision.

But what exactly is a pipeline? It is a number of Nodes organized as a Flow Graph. Data Packets (a data packet is a video frame, audio segment or some other data) run through the graph and are processed at the Nodes. Different nodes usually run on different CPU threads, so that they can utilize the available resources to the maximum. There are typically Buffers between nodes. For Real-Time Pipelines the buffers should have a limited capacity, and frames are lost if a buffer overflows. On the other hand, we want non-real-time pipelines (e.g. converting a VP9-encoded video file to H265) to be Deterministic: i.e. not-random, and with no frame loss.
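
The difference between the two kinds of buffers can be shown with a toy class. The sketch below is plain C++ and not taken from any pipeline library. DroppingBuffer is a hypothetical name, and dropping the oldest frame on overflow is just one possible policy (dropping the newest one is another).

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Toy bounded buffer between two nodes of a real-time pipeline.
// When the buffer is full, the oldest frame is dropped to make room.
template <typename Frame>
class DroppingBuffer {
 public:
  explicit DroppingBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Adds a frame, dropping the oldest one if the buffer is already full.
  void Push(Frame frame) {
    if (capacity_ == 0) {  // degenerate case: nothing can be stored
      ++dropped_;
      return;
    }
    if (frames_.size() == capacity_) {
      frames_.pop_front();
      ++dropped_;
    }
    frames_.push_back(std::move(frame));
  }

  // Moves the oldest frame into `out`; returns false if the buffer is empty.
  bool Pop(Frame* out) {
    if (frames_.empty()) return false;
    *out = std::move(frames_.front());
    frames_.pop_front();
    return true;
  }

  std::size_t size() const { return frames_.size(); }
  std::size_t dropped() const { return dropped_; }

 private:
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::deque<Frame> frames_;
};
```

A deterministic (non-real-time) pipeline would use the same queue without the capacity check, so that no frame is ever lost and the producer’s output simply accumulates.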

Fact #3: MP can process arbitrary data types in pipelines, although it has special types for image and audio data.

But what about MP Solutions? What do they have to do with pipelines? MP Solutions are basically just pre-trained TensorFlow Lite (TF Lite) models under the hood. MP graphs add a few minor extra blocks to the raw inference, such as Non-Maximum Suppression and results visualization, sometimes also detection+tracking logic. But basically very little is added to TF Lite. So when you hear “MediaPipe is amazing, both fast and accurate” people are actually talking about TF Lite and particular pre-trained models. MP Solutions are rather trivial to use, and well-documented. We will not discuss them anymore.

Fact #4: MP uses TFLite or TF models for deep learning (DL), but it is in no way limited to DL. MP solutions are pre-trained TFLite models with some rather elementary pre- or post-processing. For the sake of DL, “MediaPipe” and “TFLite” are basically the same thing.

Can you do something similar with your own pre-trained TF Lite (or TF) networks? In theory, yes. In practice, the choice of standard pipeline building blocks (called Calculators in MP) is rather limited. Basically, any TFLite model can be plugged into the standard TfLiteInferenceCalculator, but MP might lack building blocks for pre/post-processing if your task is different from the tasks in the solutions. It is possible to write your own calculators, but only in C++.

What is Our Interest in MediaPipe?

We were interested mostly in MP as a universal pipeline C++ framework, and not in “solutions”. We wanted to see if MP was suitable for writing custom computer vision (CV) pipelines in C++ (see the end of this article series for the final verdict). In the process, we experimented with core MP C++ API a lot and wrote a tutorial: https://github.com/agrechnev/first_steps_mediapipe.

Can you use MP in languages other than C++ and on platforms other than desktop? For solutions, yes: Python, JavaScript, Android (Kotlin/Java) and iOS (Swift) are supported. But once again, all these things are just wrappers around the C++ library. Presumably, they can also be used for a custom graph composed of standard MP calculators. However, any custom calculator must be written in C++. Moreover, if you use any custom calculators, you must (as far as we know) rebuild MP from the source, including the respective wrapper (Python, JavaScript, etc.). You must be a fluent MP C++ user in order to do that! So, for all practical purposes, MP is a C++ library, and the wrappers are a joke. With this explained, we are not going to discuss any languages other than C++ in MP.

How does MP compare to another well-known pipeline library, GStreamer? Let’s have a look:


Feature	GStreamer	MediaPipe
Part of, year of birth	GNOME universe, 2001	Google universe, ~2019
Language	C (GObject) + wrappers	C++ + wrappers
Main purpose	Audio/video conversion, filtering, resampling	Audio/video processing, usually with deep learning
Standard A/V codecs	All you can think of: uses many plugins	Limited: OpenCV for video, FFMpeg for audio
Buffering, flow control	No buffering by default; enable buffers by hand	Unlimited buffering by default; enable flow control by hand
GPU, neural nets	Yes, with DeepStream + TensorRT, NVidia GPUs only	Yes, TensorFlow + TF Lite
Desktop use, docs	Easy, good	Hard, bad
Graph definition	C code (hard) or text string (limited)	ProtoBuf text string (easy)

In the following sections, we present our experience of designing pipelines with MediaPipe C++.

It-Jim’s 2021 Summer Internship on Computer Vision: an Overview

Posted on September 29, 2021 by admin

Another summer, another edition of our internship on computer vision to be proud of! This time we received well over 100 applications from more than 20 cities including Kyiv, Kharkiv, Lviv, Dnipro, Odesa, Mykolaiv, Vinnytsia, Uzhhorod, Poltava, Kremenchuk, Sumy, Zaporizhzhia, Kryvyi Pih, and Mariupol. What an impressive geography! Only three of the applicants made it to the ‘finals’. Curious what projects they worked on under the mentorship of It-Jim’s engineers? Let’s find out!

The Fifth Edition of It-Jim’s Internships

But first, let’s look at some more numbers. After pre-screening the list of candidates, we reached out to 75 of them and asked them to complete a couple of test assignments. Although only 25 participants sent in their solutions, 15 did so well that they made it to the next step: a technical interview with our engineers. This stage is always a little harsh: we’ve come such a long way together, yet the number of places is limited, and most of the candidates, unfortunately, do not receive a positive answer. Three of the participants eventually became our summer interns, and another one became our trainee (and later a junior CV engineer, but that is a whole different story).

So what were the computer vision tasks our interns were working on for 4 weeks?

Project Zoo

One of the reasons we interview prospective interns is to understand their strengths and weaknesses, so that we can give each of them a project that is doable and a little challenging, yet educational and skill-broadening.

This summer’s list of projects included:

Soccer video analytics: creating an app to help assess a soccer player’s agility during practice by tracking the player and the ball and counting the number of kicks the player takes during drills.
Liveness detection system: creating a solution that detects attempts to cheat a facial verification system by showing a photo of a person instead of a live face.
Traffic statistics estimation: creating an algorithm that counts the number of cars and pedestrians crossing a certain line on the road.

Interns’ Solutions

Soccer video analytics

Demo of automatic kick counting

Tools and technologies: OpenCV, deep learning, TensorFlow Lite, Kotlin

Liveness detection system

Demo of a liveness detection system

Tools and technologies: feature crafting, deep learning

Traffic statistics estimation

Demo of traffic statistics estimation

Tools and technologies: OpenCV, image processing, object tracking

Summary

Our interns say this program is one of the best ways to get commercial experience and try your hand at being a computer vision engineer. If you’re still wondering whether this is the right path for you, remember that you can always try it first, for example by joining us in winter 2022 for a computer vision internship. We are looking forward to receiving your application!

Computer Vision in Healthcare

Posted on September 13, 2021 by admin

Want to know what stands behind remote photoplethysmography (rPPG) and how to non-invasively monitor vital parameters such as heart rate and respiration, oxygen saturation, and blood pressure using just a phone camera?

During the event, our CEO Ievgen Gorovyi will dive into the details of developing a computer vision-based solution for such a healthcare application.

📅 Join us on September 18 at 11:00 on Zoom!

🎯 Participation is free by pre-registration 👉🏻 https://cutt.ly/mWT8uv0.

Computer vision: DL or not DL?

Posted on September 9, 2021 by admin

📅 When: 7 p.m. EEST | September 23, 2021
🏡 Where: Online, details will be sent via email
🔊 Speakers: Pavlo Vyplavin, CTO at It-Jim, Ph.D., and Yurii Chyrka, Head of ML at It-Jim, Ph.D.
💬 Language: Russian and Ukrainian

📝 Registration: https://bit.ly/2YuiRLs 👈

AI Ukraine Online Conference 2021

Posted on September 1, 2021 by admin

On October 30, the AI Ukraine Online Conference will take place. Since 2014, it has been gathering experts immersed in Data Science and Machine Learning.
Every year AI Ukraine brings together more than 900 participants from all over the world to build expertise, share experience, and take another step toward the future of emerging technologies.
The conference will be held online. In addition to three thematic streams, it will consist of Q&A sessions, interactives, and networking.

Applied Computer Vision Course

Posted on August 31, 2021 by admin

After several very successful editions of internships and schools on computer vision and lots of interviews for CV/ML/DL engineers’ positions at our company, we are super excited to announce that we are launching our course in October 2021! 🚀

10 weeks, 20 lessons, each one being a mixture of theory, enhanced with mathematics essentials for computer vision, and practical workshops showcasing the methods learned. Unlike many other courses, we are going to focus not only on DL methods but also on classical CV algorithms.

Our #ACV course will be ideal for:

✅ experienced software developers who want to switch to the CV/ML/DL domain,

✅ fourth- or fifth-year students of technical specialties with a passion for computer vision,

✅ data scientists with little background in computer vision aiming to change that,

✅ anyone looking for an extensive and solid base in computer vision.

More details 👉 edu.it-jim.com. Contact Daryna Pesina should you have any questions.