Blog

GStreamer C++ Tutorial
Oleksiy Grechnyev CV/ML engineer @It-Jim

In the previous article, we’ve learned what GStreamer is and its most common use cases. Now, it’s time to start coding in C++. This tutorial does not replace but rather complements the official GStreamer tutorials. Here we focus on using appsrc and appsink for custom video (or audio) processing in the C++ code. In such situations, GStreamer is used mainly for encoding and decoding of var...

GStreamer for Computer Vision and Audio Processing
Oleksiy Grechnyev CV/ML engineer @It-Jim

You might have heard of something called “GStreamer”. I know what you think. This is some old and boring geek-and-nerd stuff from Linux, right? But what is it? What is the use of GStreamer? If we want computer vision or audio (speech, music) processing, can GStreamer help us? In this article, I’ll try to answer these questions. This article is beginner-level and assumes little or no previous...

Apple RoomPlan: Is There a Room for Improvement?
Oleg Ponomaryov Tech Lead @It-Jim

At WWDC 2022, Apple introduced the RoomPlan API for Swift, which produces room scans using the camera and LiDAR on an iPhone or iPad. This might look similar to the Scene Reconstruction API, which was introduced earlier and also uses LiDAR. Scene Reconstruction produces a polygonal mesh of the environment, which essentially describes the environment’s shape. But what if you want to measure the ...

Writings on the Wall: Recognizing Speech on Spectrograms
Oleg Ponomaryov Tech Lead @It-Jim

If you’ve ever come close to anything related to audio or other signal processing, you likely already know about spectrograms. Those fancy-looking and usually colorful plots are commonly used to represent a spectrum’s change over time. But can they provide us with some higher-level information about, let’s say, human speech? What if I told you that one could effectively get a transcript of a...

Computer Vision in a Web Browser: Practical Examples
Oleksiy Grechnyev CV/ML engineer @It-Jim

This blog post covers some important aspects of deploying and running classical computer vision algorithms as well as convolutional neural networks in a web front-end. Please make sure you have read the first part of the blog post. This will definitely help you follow all technical aspects much more easily. How can you pass an image or a video frame from JS to C++ and back? We’ll give a minimal ex...

Computer Vision in a Web Browser: Basics
Oleksiy Grechnyev CV/ML engineer @It-Jim

Are you interested in Computer Vision (CV)? Probably yes, if you are reading this. If you read CV tutorials, you might have noticed that most of them are in Python. This applies to both traditional CV (without neural networks) and, even more, to deep learning (neural networks). Occasionally, CV tutorials use C++ instead of Python, but any other programming languages are very rare. The fact is, Pyt...

Computer Vision in the Food Domain
Kateryna Arkhypova Business Analyst @It-Jim

Surprising but true: according to market research, customers prefer apples with a maximum diameter of 75 to 80 mm 🍏 Now you know 🙂 People would obviously struggle to accurately evaluate fruits’ size with their naked eyes. In contrast, computer vision (CV) systems can measure the precise diameter of an apple in the blink of an eye, literally. CV systems can collect and process a variety...

A C++ Mini-Tutorial on MediaPipe
Oleksiy Grechnyev CV/ML engineer @It-Jim

As we already explained, MediaPipe is a C++ pipeline library. It is very poorly documented: basically, the only documentation is the comments and docstrings in the MP source code. There are also examples, but they are not very readable. There is only one trivial “hello world” example; the rest is deep learning, which is counterproductive for learning basic MP concepts. Moreover, thes...

The Bizarre Google World: Bazel, ProtoBuf, and More
Oleksiy Grechnyev CV/ML engineer @It-Jim

It was not easy at all to master MediaPipe. We thought little in C++ could surprise us. MP did. They say Google libraries do not work outside of Google. We can confirm this is the truth. The ways Google uses the C++ language are highly unusual from our point of view. Normally (at least where we come from) people use CMake, a nice cross-platform build system, for C++ projects. Other somewhat common...

Down the Rabbit Hole: Our Journey to the Land of MediaPipe and Other Google Technologies
Oleksiy Grechnyev CV/ML engineer @It-Jim

In the ML/DL community you can often hear “Nowadays you must know Google MediaPipe”, “It’s a cool framework”, and sometimes “It’s internally used by YouTube!” Videos with various computer vision tasks such as hand tracking often appear on LinkedIn and forums with the comment “This is MediaPipe!” At this point, we decided we could not ignore it anymore. So we packe...

Audio Processing Basics in Python
Oleksiy Grechnyev CV/ML engineer @It-Jim

If you want to try some sound processing in Python (with neural networks or otherwise) and don’t know where to start, then this article is for you. This post is for absolute beginners. What do we want? Basically, three tasks. Read and write audio files in different formats (WAV, MP3, WMA, etc.). Play the sound on your computer. Represent the sound as a waveform and process it: filter, resample, buil...

WebAR Development and Deployment: Cloud-Based or Serverless?
Ruslan Timchenko CV engineer @It-Jim

Enhancing the physical world with virtual content, connecting real life with the digital world, and making that interaction an immersive experience are the reasons for many businesses to turn to extensive usage of augmented reality (AR). In many cases, however, installation of a specific mobile application is required. Would it not be easier and less time-consuming for a user to have AR directly i...

Automatic Floor Segmentation Using Computer Vision
Yurii Chyrka Head of ML @It-Jim

Automatic floor segmentation can serve many interesting purposes, including mixed reality (MR) applications, interior design, entertainment, computation of available space in a room, and indoor robot navigation. In this project, we worked on a scene understanding problem and, in particular, on determining which pixels of the image belong to the floor. The problem of floor segmentation is...

Becoming a Computer Vision Engineer in 2021
Daryna Pesina COO @It-Jim

If you want to dig into Computer Vision (CV) but have no idea where to start, this beginner guide is for you. Here we recommend some sources that will come in handy for learning and understanding the basics of both computer vision and deep learning. When you search for a computer vision engineer position, you’re likely to see that companies are looking for a candidate with: digital image ...

iPhone’s 12 PRO LiDAR: How to Get and Interpret Data
Ruslan Timchenko CV engineer @It-Jim

Apple events always amaze the entire world, and 2020 was no exception. Apple presented the first mobile devices equipped with LiDAR: iPad Pro 11 and iPhone 12 Pro (and the Pro Max version). This active sensor measures physical distances to objects on a spatial two-dimensional grid. Nowadays it is widespread in the automotive area for object detection and collision avoidance. How can developers...

4 Ways How Computer Vision Is Deepening the Fashion Industry
Daryna Pesina COO @It-Jim

What is your first thought when you hear about computer vision (CV) in fashion? Or, what is the first thing that pops into your head when you hear about deep learning fashion? Let us guess – online clothing shopping or virtual try-on applications? Well, this might be surprising, but deep fashion is no longer a distant prospect. What’s more, fashionably speaking, the usage of deep learning in the fa...

Computer Vision in Healthcare
Daryna Pesina COO @It-Jim

Artificial intelligence (AI) and machine learning (ML) are being progressively used across different sectors including healthcare. One of the AI-powered tools is computer vision (CV), the ability to recognize, interpret, and process visual data. Thus, potential applications of computer vision in the medical field are multifold, from image processing and predictive analysis to automated health reco...

Applications of Artificial Intelligence in Automotive Industry
Daryna Pesina COO @It-Jim

A century ago, the very thought of machines being able to think, make complicated calculations, and come up with effective solutions to pressing problems was more a figment of a science fiction writer’s fantasy than a foreseeable reality. Still, as we move into the third decade of the 21st century, we cannot imagine our life without manufacturing robots, marketing and stock trading bots,...

Practical Aspects of Real-Time Video Pipelines
Oleksiy Grechnyev CV/ML engineer @It-Jim

Video is an extremely popular way to represent information. Indeed, sometimes it is enough to watch a short clip instead of listening to or reading a long explanation of complicated technical concepts. From a user’s point of view, a video is just a sequence of images shown one after another with a very short inter-frame interval. Typically it has around 30 frames per second (FPS). However, many things are lef...

Embedded and Single-Board Computer Vision: Running Deep Neural Nets
Oleksiy Grechnyev CV/ML engineer @It-Jim

Deep learning (DL) and neural networks are extremely widespread in different computer vision (CV) applications. Indeed, many typical problems (like object recognition or semantic segmentation) are effectively solved by convolutional neural networks (CNNs). In this article, we are going to discuss how to utilize CNNs on embedded devices. Neural networks today are ubiquitous. In particular, it is ha...

Embedded and Single-Board Computer Vision: Introduction
Oleksiy Grechnyev CV/ML engineer @It-Jim

Computer vision (CV) and machine learning (ML) algorithms solve a tremendous number of problems. However, many businesses do not know what hardware to choose for running their favorite neural net or some advanced image and video processing pipelines. With this blog post, we start a series of articles about embedded vision and specific practical things you need to know before making your ...

Binary Marker Recognition on Raspberry
Ievgen Gorovyi CEO @It-Jim

Fiducial markers are widely used in applications such as robot navigation, logistics, and augmented reality (Fig. 1, applications of fiducial markers). Their advantages are obvious: high contrast, simple code generation, and resistance to extreme angles. However, when we deal with a large number of markers, real-time recognition becomes challenging, especially on embedded devices with low-power CPUs on board....

Watch Your Steps: a Brief Review of Step Detection Using Mobile Sensors
Daryna Pesina COO @It-Jim

In our bustling world, it is hard to imagine someone without a mobile phone in the pocket of their jeans, dress, or suit. Even the most inveterate skeptic has to accept that smartphones have entered our lives and become an inalienable part of them, a part of us. Mobile phones have become our assistants in all aspects of life, like filming the greatest events of our lives, scheduling our time, b...

Biological Cells Segmentation
Daryna Pesina COO @It-Jim

The task of accurate cell segmentation is essential for cellular biology and single-cell analysis, as well as for studying biological processes as a whole. In biomedical image processing, this includes reconstruction of microscopy images, foreground segmentation, cell detection, cellular compartments and organelles segmentation. Despite the tremendous progress in microscopy cell imaging and numero...

Overview of Indoor Navigation Technologies
Ievgen Gorovyi CEO @It-Jim

The development of indoor navigation services and algorithms has become a popular trend in the IT industry in recent years. Some modern buildings, like airports, shopping malls, and warehouses, have grown large enough (Fig. 1) to need their own navigation tools for customers. Closed environment conditions exclude the usage of common satellite-based navigation systems like GPS or GLONASS...

Marker-Based Augmented Reality
Dmytro Sharapov CV engineer @It-Jim

Augmented Reality (AR) is one of the most popular and challenging fields in computer vision research. It allows supplementing the real world with some kind of digital content, for example, virtual 3D objects. The key feature of Augmented Reality in comparison to other image processing tools is that virtual objects are moved and rotated in 3D coordinates instead of 2D image coordinates. The main obje...

Tesseract Library Configuration
Dmytro Sharapov CV engineer @It-Jim

You’ve undoubtedly seen it before… It’s widely used to process everything from scanned documents to the handwritten scribbles on your tablet PC and Google Translate. And today you’ll create your first app for text recognition. Optical Character Recognition, or OCR, is the process of electronically extracting text from images and reusing it in a variety of ways such as document editing, fre...

Automatic Number Plate Recognition (ANPR) Systems
Ievgen Gorovyi CEO @It-Jim

Currently, the number of cars in the world is well over 1 billion. It is no wonder that one of the most common computer vision tasks is the effective control of these vehicles through automatic number plate recognition (ANPR) systems. The applications of automatic vehicle number plate detection and recognition vary depending on the area of use and include, among others, border control, stolen car ...