audio processing

How Text-to-Speech Models Work: Theory and Practice

Text-to-speech (TTS) has been a popular topic for some time, and its development shows no signs of slowing down. There is a plethora of deep learning models, software programs, and companies offering this service. That is no surprise given the broad range of applications, from voice assistants and answering machines to audio versions of articles and books, and even automatic voiceovers for videos.

GStreamer C++ Tutorial

In the previous article, we learned what GStreamer is and what its most common use cases are. Now it’s time to start coding in C++. This tutorial complements the official GStreamer tutorials rather than replacing them. Here we focus on using appsrc and appsink for custom video (or audio) processing in C++ code. In such situations, GStreamer is used mainly for encoding and decoding various audio and video formats.

GStreamer for Computer Vision and Audio Processing

You might have heard of something called “GStreamer”. I know what you’re thinking: this is some old and boring geek-and-nerd stuff from Linux, right? But what is it? What is GStreamer used for? If we want to do computer vision or audio (speech, music) processing, can GStreamer help us? In this article, I’ll try to answer these questions. This article is beginner-level and assumes little or no previous experience with GStreamer.

Writings on the Wall: Recognizing Speech on Spectrograms

If you’ve ever come close to anything related to audio or other signal processing, you likely already know about spectrograms. Those fancy-looking and usually colorful plots are commonly used to represent a spectrum’s change over time.

Audio Processing Basics in Python

If you want to try some sound processing in Python (with neural networks or otherwise) and don’t know where to start, then this article is for you. This post is for absolute beginners. What do we want? Basically, three tasks: read and write audio files in different formats (WAV, MP3, WMA, etc.); play the sound on your computer; and represent the sound as a waveform and process it (filter, resample, build spectrograms, etc.).
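As a rough illustration of the first task (reading and writing audio files), here is a minimal sketch that uses only Python's standard library. It is not taken from the article itself: the helper names (`make_tone`, `write_wav`, `read_wav`, `roundtrip_demo`), the 16 kHz sample rate, and the restriction to mono 16-bit PCM WAV are all assumptions made for this example. Formats such as MP3 or WMA would need a third-party library.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # samples per second (assumed for this sketch)


def make_tone(freq_hz, duration_s, amplitude=0.5):
    """Return a sine tone as a list of floats in [-1, 1]."""
    n = int(round(SAMPLE_RATE * duration_s))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def write_wav(path, samples):
    """Write float samples as a mono 16-bit PCM WAV file."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)   # mono
        f.setsampwidth(2)   # 2 bytes = 16 bits per sample
        f.setframerate(SAMPLE_RATE)
        f.writeframes(struct.pack("<%dh" % len(ints), *ints))


def read_wav(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, floats)."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("this sketch handles mono 16-bit PCM only")
        rate = f.getframerate()
        n = f.getnframes()
        raw = f.readframes(n)
    ints = struct.unpack("<%dh" % n, raw)
    return rate, [v / 32767 for v in ints]


def roundtrip_demo():
    """Write a 440 Hz tone to a temporary file and read it back."""
    tone = make_tone(440.0, 0.1)
    path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_wav(path, tone)
    rate, samples = read_wav(path)
    return rate, len(samples), max(abs(s) for s in samples)
```

Calling `roundtrip_demo()` returns the sample rate, the number of samples read back, and the peak amplitude, which should be close to the 0.5 that was written. Once the waveform is a plain list of numbers like this, filtering, resampling, or building a spectrogram is a matter of numeric processing on that list.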