The development of modern neural networks has brought about a revolution in the field of image generation. One example is the text-to-image neural network DALL-E 2, which can generate beautiful art when supplied with good text descriptions (typically referred to as “prompts”).
Project Description
The quality of images generated by DALL-E 2 depends heavily on how well the input is structured. But what if someone has a poem and wants to generate matching art for it without learning all the intricacies of writing good prompts? That is exactly what our client was looking for. Our team recognized this challenge and leveraged our experience with GPT-3, a powerful large language model, to create an automatic prompt generator for DALL-E 2. The combined pipeline of GPT-3 and DALL-E 2 allows a user to get wonderful images from the poem alone, just as if they had a professional prompt engineer to help them.
Solution
One of the key challenges we faced in developing this solution was the lack of a dataset of poem-prompt pairs, which meant we could not train a model on the task directly. To overcome this, we had to use either a zero- or few-shot learning approach. We tested multiple instruction prompts for GPT-3 and paired the best one with examples of good DALL-E 2 prompts. The example prompts were designed to resemble those typically used for literature illustrations and were randomized each time to reduce the likelihood of repetitive results.
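The few-shot setup described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the instruction text, the example prompts, and the function name `build_few_shot_prompt` are all hypothetical. It also assumes that "randomized" means drawing a random subset of examples for each request. The resulting string would then be sent to GPT-3, and the model's completion passed on to DALL-E 2.

```python
import random

# Hypothetical example prompts in the style of literature illustrations.
# The actual examples used in the project are not shown in this article.
EXAMPLE_PROMPTS = [
    "A lonely lighthouse on a stormy coast, oil painting, dramatic lighting",
    "A misty forest path at dawn, watercolor book illustration, soft colors",
    "An old sailor gazing at the sea, detailed ink drawing, vintage engraving",
    "A field of poppies under a summer sky, impressionist painting",
    "A candlelit attic full of books, warm tones, storybook illustration",
]

# Hypothetical instruction; the best-performing wording was found by testing.
INSTRUCTION = (
    "Write a short image description for DALL-E 2 that illustrates the poem "
    "below. Follow the style of the example descriptions."
)


def build_few_shot_prompt(poem, examples=EXAMPLE_PROMPTS, k=3, rng=None):
    """Assemble a GPT-3 prompt from the instruction, k random examples, and the poem."""
    rng = rng or random.Random()
    # Draw a fresh random subset on every call to reduce repetitive results.
    chosen = rng.sample(examples, k)
    lines = [INSTRUCTION, "", "Example descriptions:"]
    lines += [f"- {example}" for example in chosen]
    lines += ["", "Poem:", poem.strip(), "", "Description:"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_few_shot_prompt("The fog comes\non little cat feet."))
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is convenient when comparing candidate instruction texts against each other.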
We delivered the solution to our client as a web application deployed on AWS. This tool allows anyone to generate stunning artwork from a poem without any prior expertise in prompt engineering.