Text-to-image generative AI

 

Text-to-image generative AI is a type of artificial intelligence (AI) that generates images from text descriptions. The technology has the potential to revolutionize many industries, including art, design, and media. These models use a neural network trained on a massive dataset of text-image pairs, which lets the model learn the patterns linking words to the images they describe.
Once trained, the network can generate new images: the user provides a text prompt, and the model produces an image that matches the description.
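
To make that concrete, here is a minimal sketch of the prompt-to-image flow using the Hugging Face diffusers library and a public Stable Diffusion checkpoint. The model name, prompt, and settings are my own illustrative choices, not anything prescribed above.

    # Minimal prompt-to-image sketch with Hugging Face diffusers.
    # Assumes: pip install diffusers transformers torch, and a CUDA GPU
    # (drop the .to("cuda") and float16 parts to run on CPU, slowly).
    import torch
    from diffusers import StableDiffusionPipeline

    # Load a checkpoint pretrained on a large dataset of text-image pairs
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    )
    pipe = pipe.to("cuda")

    # The user supplies a text prompt; the model returns a matching image
    prompt = "a watercolour painting of a lighthouse at sunset"
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save("lighthouse.png")

Here guidance_scale controls how strongly sampling steers toward the prompt; higher values follow the text more literally at the cost of variety.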

In the art world, text-to-image generative AI can be used to create new forms of art and to help artists explore new creative possibilities. In the design industry, it can be used to generate prototypes and to create new products. In the media industry, it can be used to create realistic images for news articles and other forms of content.

As text-to-image generative AI continues to develop rapidly, we can expect even more innovative applications to arrive sooner than we expect (or are ready for).

Text-to-image generators

  • Stability AI released Stable Diffusion as open source, and a lot of people jumped on board developing plugins or scripts (myself included). The two best installations with a web interface are AUTOMATIC1111 and ComfyUI. My favourite!
  • DreamStudio is Stability AI's web interface for Stable Diffusion. Free, although you need an account, and it's very close to being the standard in this area.
  • Midjourney - a mature product, quite popular and very good. Free for the first 25 queries (you need an account); after that it's not expensive.
  • DALL-E 2 and DALL-E 3 - you need an account; it's cheap and excellent. DALL-E can generate realistic images from a wide variety of text prompts.
  • Craiyon generator (formerly DALL-E mini). Accessible, free, and reasonably good, although the free option is really slow.
  • DeepAI - use of the web interface is free and the quality is good. The API is free for a trial period; afterwards it's cheap (see the sketch after this list).
  • NightCafe is very popular. You need an account. Up to 5 queries per day are free and after that you will be charged. It is good and very versatile (a lot of options and engines to choose from). Worth visiting.
  • Imagen by Google - known for its ability to generate images that are both realistic and creative.
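
For the API route mentioned under DeepAI, here is a hedged sketch of what a call looks like; the endpoint, field names, and response shape are from my reading of their docs and should be checked against the current documentation (YOUR_API_KEY is a placeholder).

    # Hypothetical DeepAI text2img call; verify the endpoint and fields
    # against the current DeepAI docs before relying on this.
    import requests

    resp = requests.post(
        "https://api.deepai.org/api/text2img",
        data={"text": "a cosy cabin in a snowy forest"},
        headers={"api-key": "YOUR_API_KEY"},  # placeholder key
    )
    resp.raise_for_status()
    print(resp.json().get("output_url"))  # URL of the generated image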

In the press

 

Composing better prompts for text-to-image generation

Publications related to text-to-image generation

Ilya Sutskever: “If you learn all of these, you’ll know 90% of what matters”

Apparently, this is the list Ilya gave to John Carmack more than a year ago [1]; it was shared by an OpenAI employee on 8 May 2024 [2].

[1] https://dallasinnovates.com/exclusive-qa-john-carmacks-diffe...

[2] https://twitter.com/keshavchan/status/1787861946173186062
 

A PDF library to improve your background in the machine learning field

 

An alternative to Ilya's collection of papers covering AI/ML

Unlocking the Secrets of AI: A Journey through the Foundational Papers by @vrungta (2023)

1. "Attention is All You Need" (2017) - https://arxiv.org/abs/1706.03762 (Google Brain)

2. "Generative Adversarial Networks" (2014) - https://arxiv.org/abs/1406.2661 (University of Montreal)

3. "Dynamic Routing Between Capsules" (2017) - https://arxiv.org/abs/1710.09829 (Google Brain)

4. "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks" (2016) - https://arxiv.org/abs/1511.06434 (University of Montreal)

5. "ImageNet Classification with Deep Convolutional Neural Networks" (2012) - https://papers.nips.cc/paper/4824-imagenet-classification-wi... (University of Toronto)

6. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" (2018) - https://arxiv.org/abs/1810.04805 (Google)

7. "RoBERTa: A Robustly Optimized BERT Pretraining Approach" (2019) - https://arxiv.org/abs/1907.11692 (Facebook AI)

8. "ELMo: Deep contextualized word representations" (2018) - https://arxiv.org/abs/1802.05365 (Allen Institute for Artificial Intelligence)

9. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context" (2019) - https://arxiv.org/abs/1901.02860 (Google AI Language)

10. "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (2019) - https://arxiv.org/abs/1906.08237 (Google AI Language)

11. "T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (2020) - https://arxiv.org/abs/1910.10683 (Google Research)

12. "Language Models are Few-Shot Learners" (2021) - https://arxiv.org/abs/2005.14165 (OpenAI)