Text-to-speech (TTS) has been a popular topic for some time, and its development shows no signs of slowing down. There are a plethora of deep learning models, software programs, and companies offering this service. It’s no surprise, given the broad range of applications, from voice assistants and answering machines to creating audio versions of articles, books, and even automatic voiceovers ...
People are excited about extended reality. They dream of having a Metaverse where they would spend their time and earn money. And one of the key aspects of the Metaverse is the virtual world itself. It may be designed by artists, created via procedural 3D modeling, or taken from the real world via scanning. The broader availability of consumer 3D scanning makes it one of the most promising ways fo...
Historically, there have been many Deep Learning (DL) frameworks, like Theano, CNTK, Caffe2, and MXNet. Nowadays, they appear to be dead or dying, as just two frameworks heavily dominate the DL scene: Google TensorFlow (TF), which includes Keras, and PyTorch from Meta (formerly Facebook). However, there is no reason to believe such a duopoly will persist forever. All the time, new DL frameworks are propo...
In the previous article, we learned what GStreamer is and what its most common use cases are. Now, it’s time to start coding in C++. This tutorial does not replace but rather complements the official GStreamer tutorials. Here we focus on using appsrc and appsink for custom video (or audio) processing in C++ code. In such situations, GStreamer is used mainly for encoding and decoding of var...
You might have heard of something called “GStreamer”. I know what you are thinking. This is some old and boring geek-and-nerd stuff from Linux, right? But what is it? What is the use of GStreamer? If we want computer vision or audio (speech, music) processing, can GStreamer help us? In this article, I’ll try to answer these questions. This article is beginner-level and assumes little or no previous...
At WWDC 2022, Apple introduced the RoomPlan API for Swift, which allows obtaining room scans using the camera and LiDAR on iPhone and iPad. This might look similar to the Scene Reconstruction API, which was introduced earlier and also uses LiDAR. The latter produces a polygonal mesh of the environment, which essentially provides information about its shape. But what if you want to measure the ...
If you’ve ever come close to anything related to audio or other signal processing, you likely already know about spectrograms. Those fancy-looking and usually colorful plots are commonly used to represent a spectrum’s change over time. But can they provide us with some higher-level information about, let’s say, human speech? What if I told you that one could effectively get a transcript of a...
This blog post covers some important aspects of deploying and running classical computer vision algorithms as well as convolutional neural networks in a web front-end. Please make sure you have read the first part of the blog post. This will definitely help you follow all the technical aspects much more easily. How can you pass an image or a video frame from JS to C++ and back? We’ll give a minimal ex...
Are you interested in Computer Vision (CV)? Probably yes, if you are reading this. If you read CV tutorials, you might have noticed that most of them are in Python. This applies to both traditional CV (without neural networks) and, even more, to deep learning (neural networks). Occasionally, CV tutorials use C++ instead of Python, but other programming languages are very rare. The fact is, Pyt...
Surprising but true: according to market research, customers prefer apples with a maximum diameter of 75 to 80 mm. 🍏 Now you know 🙂 People would obviously struggle to accurately evaluate fruit size with the naked eye. In contrast, computer vision (CV) systems can measure the precise diameter of an apple in the blink of an eye, literally. CV systems can collect and process a variety...
As we already explained, MediaPipe is a C++ pipeline library. It is very poorly documented: basically, the only documentation is the comments and docstrings in the MP source code. There are also examples, but they are not very readable. There is only one trivial “hello world” example; the rest is deep learning, which is counterproductive for learning basic MP concepts. Moreover, thes...
It was not easy at all to master MediaPipe. We thought little in C++ could surprise us. MP did. They say Google libraries do not work outside of Google. We can confirm this is the truth. The ways Google uses the C++ language are highly unusual from our point of view. Normally (at least where we come from) people use CMake, a nice cross-platform build system, for C++ projects. Other somewhat common...
In the ML/DL community, you can often hear “Nowadays you must know Google MediaPipe”, “It’s a cool framework”, and sometimes “It’s internally used by YouTube!” Videos of various computer vision tasks, like this hand tracking, often appear on LinkedIn and forums with the comment “This is MediaPipe!” At this point, we decided we could not ignore it anymore. So we packe...
If you want to try some sound processing in Python (with neural networks or otherwise) and don’t know where to start, then this article is for you. This post is for absolute beginners. What do we want? Basically, three tasks. Read and write audio files in different formats (WAV, MP3, WMA, etc.). Play the sound on your computer. Represent the sound as a waveform, and process it: filter, resample, buil...
Enhancing the physical world with virtual content, connecting real life with the digital world, and making that interaction an immersive experience are the reasons for many businesses to turn to extensive usage of augmented reality (AR). In many cases, however, installation of a specific mobile application is required. Would it not be easier and less time-consuming for a user to have AR directly i...
Automatic floor segmentation can serve many interesting purposes, including mixed reality (MR) applications, interior design, entertainment, computation of available space in a room, or indoor robot navigation. In this project, we address the problem of scene understanding and, in particular, of determining which pixels of the image belong to the floor. The problem of floor segmentation is...
If you want to dig into Computer Vision (CV) but have no idea where to start, this beginner guide is for you. Here we recommend some sources which will come in handy for learning and understanding the basics of both computer vision and deep learning. When you search for a computer vision engineer position, you’re likely to see that companies are looking for a candidate with: digital image ...
Apple events always amaze the entire world, and 2020 was no exception. Apple presented the first mobile devices equipped with LiDAR: iPad Pro 11 and iPhone 12 Pro (and the Pro Max version). This active sensor measures physical distances to objects on a two-dimensional spatial grid. Nowadays, it is widespread in the automotive industry for object detection and collision avoidance. How can developers...
What is your first thought when you hear about computer vision (CV) in fashion? Or, what is the first thing that pops into your head when you hear about deep learning in fashion? Let us guess – online clothing shopping or virtual try-on applications? Well, this might be surprising, but deep fashion is no longer a thing of the distant future. What’s more, fashionably speaking, the usage of deep learning in the fa...
Artificial intelligence (AI) and machine learning (ML) are increasingly used across different sectors, including healthcare. One of the AI-powered tools is computer vision (CV), the ability to recognize, interpret, and process visual data. Potential applications of computer vision in the medical field are manifold, from image processing and predictive analysis to automated health reco...
A century ago, the very thought of machines being able to think, make complicated calculations, and come up with effective solutions to pressing problems was more a figment of a science fiction writer’s imagination than a foreseeable reality. Still, as we move into the third decade of the 21st century, we cannot imagine our life without manufacturing robots, marketing and stock trading bots,...
Video is an extremely popular way to represent information. Indeed, sometimes it is enough to watch a short clip instead of listening to or reading a long explanation of complicated technical concepts. From a user’s point of view, a video is just a sequence of images shown one after another with a very short inter-frame interval. Typically, it has around 30 frames per second (FPS). However, many things are lef...
Deep learning (DL) and neural networks are extremely widespread in different computer vision (CV) applications. Indeed, many typical problems (like object recognition or semantic segmentation) are effectively solved by convolutional neural networks (CNNs). In this article, we are going to discuss how to utilize CNNs on embedded devices. Neural networks today are ubiquitous. In particular, it is ha...
Computer vision (CV) and machine learning (ML) algorithms solve a tremendous number of problems. However, many businesses do not know what hardware to choose for running their favorite neural net or some advanced image and video processing pipelines. With this blog post, we start a series of articles about embedded vision and specific practical things you need to know before making your ...
Fiducial markers are widely used in various applications like robot navigation, logistics, and augmented reality. Fig. 1. Applications of fiducial markers. Their advantages are obvious: high contrast, simple code generation, and resistance to extreme angles. However, when we deal with a large number of markers, real-time recognition becomes challenging, especially on embedded devices with low-power on-board CPUs....
In our bustling world, it is quite hard to imagine someone having no mobile phone in the pocket of their jeans, dress, or suit. Even the inveterate skeptic has to accept the fact that smartphones have entered our lives and become an integral part of them, a part of us. Mobile phones have become our assistants in all aspects of our lives, like filming the greatest events of our life, scheduling our time, b...
The task of accurate cell segmentation is essential for cellular biology and single-cell analysis, as well as for studying biological processes as a whole. In biomedical image processing, this includes reconstruction of microscopy images, foreground segmentation, cell detection, and segmentation of cellular compartments and organelles. Despite the tremendous progress in microscopy cell imaging and numero...
The development of indoor navigation services and algorithms has become a popular trend in the IT industry in recent years. Some modern buildings, like airports, shopping malls, and warehouses, have grown large enough (Fig. 1) to need their own navigation tools for customers. Enclosed environments rule out the use of common satellite-based navigation systems like GPS or GLONASS...
Augmented Reality (AR) is one of the most popular and challenging fields in computer vision research. It allows supplementing the real world with some kind of digital content, for example, virtual 3D objects. The key feature of Augmented Reality in comparison to other image processing tools is that virtual objects are moved and rotated in 3D coordinates instead of 2D image coordinates. The main obje...
You’ve undoubtedly seen it before… It’s widely used to process everything from scanned documents to the handwritten scribbles on your tablet PC and Google Translate. And today you’ll create your first app for text recognition. Optical Character Recognition, or OCR, is the process of electronically extracting text from images and reusing it in a variety of ways such as document editing, fre...
Currently, the number of cars in the world is well over 1 billion. It is no wonder that one of the most common computer vision tasks is the effective control of these vehicles through automatic number plate recognition (ANPR) systems. The applications of automatic vehicle number plate detection and recognition vary depending on the area of use and include, among others, border control, stolen car ...