Imagen, or the AI creation of images from texts – SEO and search engine news
In the wake of OpenAI’s DALL-E 2 or, in a different vein, Microsoft’s XiaoIce, the text-image pairing is currently in the spotlight, powered by rather surprising artificial intelligence (AI) algorithms. This is the case with Imagen, a new Google project that creates images from descriptive text…
Do you know Imagen?
This is a Google R&D project that, given a description incorporating a certain number of notions, creates images representing that source information. Here is how the official website explains it: Imagen is "a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding. Imagen leverages the power of large transformer language models for text comprehension and the strength of diffusion models for high-fidelity image generation.
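To make the "diffusion model" part of that quote concrete: such models gradually add Gaussian noise to an image over many steps, then train a network to reverse the process. The forward (noising) step has a closed form, which the toy NumPy sketch below illustrates on a small vector standing in for an image. This is purely illustrative, with a made-up linear noise schedule; it is not Imagen's code, and a real model would *predict* the noise rather than be handed it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear noise schedule (not Imagen's actual schedule).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)  # cumulative signal-retention factor

def q_sample(x0, t, eps):
    """Forward process: draw x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def predict_x0(xt, t, eps):
    """Invert the closed form. A trained denoiser would predict eps from
    (xt, t, text embedding); here we pass the true noise to show the algebra."""
    return (xt - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])

x0 = rng.standard_normal(8)    # stand-in for an image
eps = rng.standard_normal(8)   # the noise the network learns to predict
xt = q_sample(x0, 500, eps)
assert np.allclose(predict_x0(xt, 500, eps), x0)
```

In Imagen, a frozen text encoder conditions that denoising network, which is how the description steers what the reversed noise turns into.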
Our key finding is that generic large language models (e.g. T5), pre-trained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen improves both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model. Imagen achieves a new state-of-the-art FID score of 7.27 on the COCO dataset, without ever training on COCO, and human evaluators find Imagen samples to be on par with the COCO data itself in terms of image-text alignment. To evaluate text-to-image models in more depth, we present DrawBench, a comprehensive and challenging benchmark for text-to-image models.
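For readers unfamiliar with the FID score cited above: the Fréchet Inception Distance compares the statistics of generated images with those of real ones, by fitting a Gaussian to Inception-v3 features of each set and computing the Fréchet distance between the two Gaussians (lower is better). The sketch below computes that distance formula itself with NumPy/SciPy; it is a generic illustration of the metric, not Imagen's actual evaluation code.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^(1/2)).
    FID applies this to Inception-v3 feature statistics."""
    diff = mu1 - mu2
    covmean = linalg.sqrtm(sigma1 @ sigma2)  # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerics
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Identical distributions have distance zero; shifting the mean adds ||diff||^2.
mu, sigma = np.zeros(4), np.eye(4)
print(frechet_distance(mu, sigma, mu, sigma))          # ~0.0
print(frechet_distance(mu, sigma, mu + 1.0, sigma))    # ~4.0
```

Imagen's 7.27 on COCO is this distance measured between features of generated samples and of the COCO validation images.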
Using DrawBench, we compare Imagen with recent methods including VQ-GAN+CLIP, Latent Diffusion models and DALL-E 2, and find that human raters prefer Imagen over other models in side-by-side comparisons, both in terms of sample quality and image-text alignment."
A system that is still very basic and not very usable
For the moment, the system is quite basic and only lets you create images that meet certain criteria chosen from a predetermined list. Here are some examples, each followed by the text that produced it:
A majestic oil painting of a raccoon Queen wearing red French royal gown. The painting is hanging on an ornate wall decorated with wallpaper. Source: Imagen
A marble statue of a Koala DJ in front of a marble statue of a turntable. The Koala is wearing large marble headphones. Source: Imagen
A bucket bag made of blue suede. The bag is decorated with intricate golden paisley patterns. The handle of the bag is made of rubies and pearls. Source: Imagen
A giant cobra snake on a farm. The snake is made out of corn. Source: Imagen
Do you see the concept? Obviously, these demo examples are deliberately a bit absurd, because it’s a safe bet that you will very rarely need this type of image in real life… 🙂 What is more interesting is to imagine what could be done later on in terms of illustration (particularly in animation and advertising, for example, but not only) once these algorithms have matured and can be used at full scale. Maybe SEOs will even get involved and try to understand how certain images were created, in order to position themselves with the same starting text. A kind of “reverse engineering” in metaverse mode? Who knows how SEO will evolve in the years to come? Tools to follow in any case, for their promises as well as for the possible excesses they may generate…