Text Prompt Engineering for Image Generation

The development of modern neural networks has revolutionized image generation. One example is the text-to-image neural network DALL-E 2, which can generate beautiful art when supplied with good text descriptions (typically referred to as “prompts”).

Project Description

The quality of images generated by DALL-E 2 depends heavily on how well the input is structured. But what if someone has a poem and wants to generate matching art for it without learning all the intricacies of writing good prompts? That is exactly what our client was looking for. Our team recognized this challenge and leveraged our experience with GPT-3, a powerful large language model, to create an automatic prompt generator for DALL-E 2. The combined GPT-3 and DALL-E 2 pipeline lets a user get wonderful images from the poem alone, just as if they had a professional prompt engineer to help them.

Solution

One of the key challenges in developing this solution was the lack of a dataset of poem-prompt pairs. Without such data for fine-tuning, we had to rely on a zero-shot or few-shot approach. We tested multiple prompts for GPT-3 and accompanied the best one with examples of good DALL-E 2 prompts. The example prompts were designed to resemble those typically used for literature illustrations and were randomized on each request to reduce the likelihood of repetitive results.
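The few-shot setup with randomized examples can be sketched roughly as follows. This is a minimal illustration, not the delivered implementation: the instruction text, the example prompts, and the function name `build_gpt3_prompt` are all hypothetical, and the call to GPT-3 itself is left out.

```python
import random

# Hypothetical pool of example DALL-E 2 prompts in a literature-illustration
# style. The real examples used in the project are not given in the text.
EXAMPLE_PROMPTS = [
    "A lonely lighthouse on a stormy cliff, watercolor book illustration",
    "An old oak tree in autumn fog, ink and wash, muted earth tones",
    "A child reading by candlelight, vintage storybook engraving",
    "A ship under a crescent moon, oil painting, deep blue palette",
    "A winding village road in spring, soft pastel illustration",
]

# Hypothetical instruction; the project tested several and kept the best one.
INSTRUCTION = (
    "Write a single-sentence image prompt that illustrates the poem below. "
    "Describe the subject, the setting, and the artistic style."
)


def build_gpt3_prompt(poem, num_examples=3, rng=None):
    """Assemble a few-shot prompt with a fresh random subset of examples."""
    if rng is None:
        rng = random.Random()
    k = min(num_examples, len(EXAMPLE_PROMPTS))
    # Sampling a different subset on each call is what reduces repetitive output.
    examples = rng.sample(EXAMPLE_PROMPTS, k)
    lines = [INSTRUCTION, "", "Examples of good image prompts:"]
    lines += [f"- {example}" for example in examples]
    lines += ["", "Poem:", poem.strip(), "", "Image prompt:"]
    return "\n".join(lines)
```

The returned string would be sent to the language model as its input, and the model's completion would then be passed to DALL-E 2 as the image prompt. Passing a seeded `random.Random` makes the example selection reproducible for debugging.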

Here are some examples of how our solution works:

The solution we developed was delivered to our client as a web application, with deployment on AWS. This powerful tool allows anyone to generate stunning artwork based on a poem without the need for any prior expertise in prompt engineering.

WebAR: Augmented Reality in Web

Augmented reality has already proven its positive impact on many businesses. One of the latest trends is so-called WebAR. Indeed, what could be easier than just opening a web page for an instant immersive experience?

Project Description

The goal of this project was to develop and optimize an image detection and tracking algorithm for AR applications. The main challenge was to make it work directly in the mobile web front-end, with all computations done on the edge.

Technical Aspects

To avoid dependence on an internet connection and potential lags, we built an efficient data processing pipeline for web front-end deployment. First, custom C++ code was developed for content-based image retrieval and tracking. Second, we modified the C++ code and compiled it to a WebAssembly binary using the Emscripten SDK so that it runs directly in the browser. Finally, the algorithm was fine-tuned and optimized within the Emscripten toolchain to ensure proper performance on the web.

WebAR Infrastructure Overview

The image processing engine was just a part of the overall WebAR system. Here is a high-level overview of the created infrastructure:

In principle, the architecture of such systems is similar to that of common SDKs for mobile AR. The key difference is the way it runs in the browser. A huge advantage is that you do not need to install any application. Moreover, the WebAR front-end works in both mobile and desktop browsers.

The demo below demonstrates how our WebAR solution works.

If you would like to read more about It-Jim’s WebAR development, please check the blog post and our paper.

Interested in building your own WebAR system or boosting your business with cool content in the browser? Let’s discuss! Use the form below to get in touch with us.

SDK for Augmented Reality Applications

Project Overview

Our client’s goal was to enhance various printed media (magazines, posters, banners, etc.) with an interactive experience using augmented reality. With AR, certain areas of the reading material can be overlaid with digital information of different kinds: from videos, images, and 3D models to weather information and buttons that bring additional functionality. Imagine, for example, a cooking video popping up when you hover your phone over a recipe in a journal, or an instant 3D view of a model’s outfit you liked in a catalog, along with information on where to buy it. This AR experience makes printed products more fun, exciting, emotional, and interactive, and it ultimately requires robust computer vision algorithms. That is why the client was looking for a partner with solid experience in computer vision to create a mobile AR SDK.

AR SDK: Key Components

The developed AR SDK provides a complete set of tools for creating augmented reality experiences, which makes it simple and ready to use. The components built are presented in Fig. 1:

Fig. 1. Key components of the developed AR SDK.

They include:

  • AR tools (Web and desktop versions) 
  • AR engine including object detection, tracking and image retrieval modules 
  • Native part including high-level APIs for iOS and Android

AR tools provide a simple user interface for digital content creation. In particular, the user can upload the target images (magazine pages, banners, restaurant menus, photos, etc.). Built-in image analysis algorithms automatically determine the image quality, estimate how suitable the image is as a target, and enhance its content for a better AR experience. Another important role of AR tools is to provide an easy way to manage the digital layers, i.e. to change the layout and geometrical properties of AR models (videos, images, 3D models). As a result, users can see exactly how the AR content will look on mobile devices.

The AR engine is the core of the system. It contains a set of custom computer vision algorithms and solutions for analyzing and recognizing the video stream from the mobile camera. The AR engine comprises three major modules:

  • Marker detection and tracking modules provide robust real-time image recognition. The stability of this part is a key to the smooth augmentation of the digital AR layer. 
  • An additional visual search module allows adding AR experience to large image collections.

The AR engine is written in C++ and additionally optimized for real-time performance directly on mobile devices. This means that all algorithms work on the edge without an internet connection.

The data flow within AR SDK is illustrated in Fig. 2.

Fig. 2. AR SDK data flow.


First, the user selects the image targets that will be used as triggers in the mobile AR application. Second, the digital layer is uploaded and managed using the Web AR tool. All auxiliary data is automatically generated and stored in the cloud infrastructure. Finally, once the mobile application is installed, all necessary data is downloaded from the server and the user is ready to enjoy the AR immersion directly on the device.

Here is an example of AR SDK usage on different markers:

If you are looking for more technical details, check our blog post on marker-based augmented reality or the research paper on an advanced planar tracking approach for augmented reality applications.

Value delivered 

The developed AR SDK opened up a number of possibilities for applying advanced computer vision algorithms in a seamless manner. The AR tools can be used without any additional expertise in AR or computer vision, which makes the process of adding digital layers to a product really simple.

Do you have a plan on how augmented reality could bring new functionality to your app or software? Use the form below to get in touch with us and discuss your idea.

Page Unwrapper

The task of automatic document analysis and recognition is very common in everyday life. Automatic document recognition and text analysis come into play every time a user needs to parse and recognize content (for example, text, tables, or links) from a scanned document or a picture captured with a mobile phone or tablet.

Object Recognition in Radar Images

Object recognition is an important computer vision and machine learning problem. A specific case is automatic target recognition (ATR) in radar images. ATR can be effectively used in border security and safety systems to identify either man-made objects (such as buildings, ground and air vehicles) or people, as well as for target surveillance. In other words, ATR makes it possible to obtain visual information about the ground and objects without direct physical contact.

In this project, our team developed a custom classification algorithm based on two different tools.

We worked with the MSTAR dataset, a public dataset containing ten classes of vehicles at different orientations, imaged at a resolution of 0.3 m × 0.3 m.

Indoor Positioning Engine

Indoor positioning systems are becoming popular nowadays. Indeed, there are plenty of opportunities for real-time user navigation in GPS-denied environments.

Some interesting use cases are as follows:

Fig. 1. Indoor navigation use cases

There are several options for hardware (see It-Jim blog post).

We developed a positioning algorithm based on cheap Bluetooth beacons and the built-in IMU sensors of a mobile device.
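The text does not describe the algorithm itself, so the sketch below only illustrates one common way of getting a position from beacons: converting signal strength (RSSI) to distance with a log-distance path-loss model and then solving a linearized least-squares trilateration. The function names, the calibration values `tx_power` and `path_loss_exponent`, and the omission of IMU fusion are all assumptions made for illustration.

```python
def rssi_to_distance(rssi, tx_power=-59.0, path_loss_exponent=2.0):
    """Estimate distance in meters from RSSI (dBm) with a log-distance model.

    tx_power is the assumed RSSI measured at 1 m from the beacon; both
    defaults are illustrative and would need per-site calibration.
    """
    return 10.0 ** ((tx_power - rssi) / (10.0 * path_loss_exponent))


def trilaterate(beacons, distances):
    """Estimate a 2D position from three or more beacons by least squares.

    beacons is a list of known (x, y) positions, distances the matching
    range estimates. Subtracting the first circle equation from the others
    gives a linear system, solved here through its 2x2 normal equations.
    """
    if len(beacons) < 3 or len(beacons) != len(distances):
        raise ValueError("need at least 3 beacons with matching distances")
    (x0, y0), d0 = beacons[0], distances[0]
    s_aa = s_ab = s_bb = s_ac = s_bc = 0.0
    for (xi, yi), di in zip(beacons[1:], distances[1:]):
        a = 2.0 * (xi - x0)
        b = 2.0 * (yi - y0)
        c = d0**2 - di**2 + xi**2 + yi**2 - x0**2 - y0**2
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ac += a * c
        s_bc += b * c
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("beacons are collinear; position is ambiguous")
    # Cramer's rule on the normal equations.
    x = (s_ac * s_bb - s_ab * s_bc) / det
    y = (s_aa * s_bc - s_ab * s_ac) / det
    return x, y
```

In practice RSSI is noisy, so range estimates like these are usually smoothed and combined with motion information, for example from the IMU, before being shown to the user.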

High-performance Autofocusing for Synthetic Aperture Radars

Synthetic aperture radar (SAR) systems are a very popular instrument for high-resolution imaging of the ground surface. Unlike optical systems, SAR can be used in all weather and lighting conditions.

The basic idea of the SAR technique is coherent processing of the signals received on a moving platform (aircraft or satellite). The main challenge is to measure the platform position very precisely at each moment in time.

Road Detector

Our task was to develop an algorithm for automatic road detection in radar images. The challenge was that radar images differ from optical ones. In particular, in synthetic aperture radar (SAR), the image is formed by coherent processing of the received signals backscattered from the Earth’s surface. As a result, multiplicative speckle noise appears in SAR images.
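To show what “multiplicative” means in practice, here is a small simulation sketch. It assumes a commonly used model in which the observed intensity is the true reflectivity multiplied by unit-mean gamma-distributed noise; the model choice, the `looks` parameter, and the function names are illustrative assumptions, not details taken from the project.

```python
import math
import random
import statistics


def add_speckle(reflectivity, looks=4, rng=None):
    """Multiply each reflectivity value by unit-mean gamma noise.

    looks is the assumed number of looks; the noise has shape=looks and
    scale=1/looks, so its mean is 1 and its variance is 1/looks.
    """
    if rng is None:
        rng = random.Random()
    return [r * rng.gammavariate(looks, 1.0 / looks) for r in reflectivity]


def to_log(values):
    """Log-transform intensities, turning multiplicative noise into additive."""
    return [math.log(v) for v in values]


def noise_spread(values):
    """Population standard deviation, used here as a simple noise measure."""
    return statistics.pstdev(values)
```

With this model, a uniform bright region is noisier in absolute terms than a uniform dark one, because the noise scales with the signal. After `to_log`, the noise spread no longer depends on the brightness, which is one reason SAR images are often processed in the log domain or with speckle-specific filters before tasks such as road detection.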