OpenAI's most recent picture-making AI, DALL-E 2, is incredible

OpenAI's picture-making neural network DALL-E was revealed in 2021. Its human-like ability to combine different concepts in new ways was remarkable when it was first announced. The images it produced were surreal and cartoonish, but they showed that the AI had learnt key lessons about how the world fits together. DALL-E's avocado chairs had all the key features of both avocados and chairs; its dog-walking daikons wore their tutus around their waists and held the leashes in their hands.

DALL-E 2, the successor to DALL-E, produces better images and is simpler to use. According to Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence (AI2), the leap from DALL-E to DALL-E 2 is similar to the leap from GPT-2 to GPT-3. DALL-E 2 could even stretch current definitions of artificial intelligence, forcing us to examine the concept and decide what it really means.

In just a few short years, image-generation models such as DALL-E have come a long way. In 2020, AI2 presented a neural network that could create images from prompts like "Three people play videogames on a couch." Although the results were blurry and distorted, they were still easily recognizable. Last year, the Chinese tech giant Baidu improved on the image quality of the original DALL-E with its ERNIE-ViLG model.

DALL-E 2 extends this approach. It can create amazing images: astronauts riding horses, teddy bear scientists, or sea otters in the style of Vermeer. "One way to think about the neural network is transcendent beauty as a service," says Ilya Sutskever, OpenAI's cofounder and chief scientist. "Every now and then, it generates something that just makes me gasp."

DALL-E 2's better performance is the result of a complete redesign. The original DALL-E was an extension of GPT-3, which can be described as a supercharged autocomplete: it starts with a few words or sentences and then predicts the next several hundred words. DALL-E worked in a similar manner, but it swapped words for pixels. It completed a text prompt by predicting the string of pixels that would come next, thus producing an image.
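The autocomplete-style generation described above can be illustrated with a toy loop. This is only a minimal sketch, not OpenAI's code: the function names, the four pixel levels, and the hand-written probability table are all assumptions made for illustration, and the table stands in for what would really be a large trained neural network.

```python
import random

NUM_LEVELS = 4  # assumption: each toy "pixel" takes one of 4 intensity levels


def toy_next_token_probs(sequence):
    """Hypothetical stand-in for a trained model.

    Returns a dict mapping each candidate next token to its probability,
    given everything generated so far.
    """
    last = sequence[-1]
    if isinstance(last, str):
        # The sequence so far is only the text prompt: start the image.
        return {0: 1.0}
    # Toy rule: usually repeat the previous pixel, sometimes step up one level.
    return {last: 0.7, (last + 1) % NUM_LEVELS: 0.3}


def generate(prompt_tokens, next_token_probs, num_new_tokens, seed=0):
    """Extend a sequence one token at a time (autoregressive generation)."""
    rng = random.Random(seed)
    sequence = list(prompt_tokens)
    for _ in range(num_new_tokens):
        probs = next_token_probs(sequence)
        tokens = list(probs)
        weights = [probs[token] for token in tokens]
        # Sample the next token, then feed the longer sequence back in.
        sequence.append(rng.choices(tokens, weights=weights, k=1)[0])
    return sequence


# The text prompt comes first; the "pixels" are appended after it.
result = generate(["avocado", "chair"], toy_next_token_probs, num_new_tokens=16)
pixels = result[2:]
print(pixels)
```

The point of the sketch is the loop: each new pixel depends on the prompt and on every pixel produced before it, which is the same mechanism GPT-3 uses for words.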

DALL-E 2 works in two steps. First, it uses OpenAI's CLIP model, which links text and images, to translate the text prompt into an intermediate representation that captures the key characteristics an image should have to match the prompt. Second, it uses a type of neural network known as a diffusion model to create the image. Diffusion models are trained on images that have been distorted with random pixels, and they learn to convert these images back to their original form. In DALL-E 2 there is no original image to recover, so the diffusion model starts from random pixels and converts them into a brand-new image that matches the text prompt.
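The two halves of the diffusion idea, adding noise to images and then removing it step by step, can be sketched in a few lines. This is a heavily simplified illustration under stated assumptions, not DALL-E 2's actual sampler: the "image" is a short list of numbers, and toy_denoiser is a hypothetical stand-in that always predicts one fixed target, where the real system would use a trained network guided by the CLIP representation of the prompt.

```python
import math
import random


def add_noise(image, keep, rng):
    """Forward process: blend a clean image with Gaussian noise.

    keep is the share of the original signal retained:
    1.0 returns the clean image, 0.0 returns pure noise.
    """
    return [
        math.sqrt(keep) * pixel + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0)
        for pixel in image
    ]


def sample(denoiser, num_pixels, num_steps, rng):
    """Reverse process: start from pure noise and denoise step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]
    for step in range(num_steps, 0, -1):
        predicted_clean = denoiser(x, step)
        # Move part of the way toward the prediction; the last step lands on it.
        blend = 1.0 / step
        x = [(1.0 - blend) * xi + blend * ci for xi, ci in zip(x, predicted_clean)]
    return x


TARGET = [0.0, 0.5, 1.0, 0.5]  # assumption: the image the prompt "describes"


def toy_denoiser(noisy_image, step):
    """Hypothetical stand-in for a trained, prompt-guided denoising network."""
    return TARGET


rng = random.Random(0)
training_example = add_noise(TARGET, keep=0.5, rng=rng)  # what training sees
generated = sample(toy_denoiser, num_pixels=4, num_steps=10, rng=rng)
print(training_example)
print(generated)
```

During training, a real model would see many noisy examples like training_example and learn to predict the clean image. At generation time only the reverse loop runs, which is why the output is created from scratch rather than retrieved.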

DALL-E 2 still slips up. It can have problems with prompts that ask it to combine multiple objects with multiple attributes. OpenAI believes this is because CLIP doesn't always connect attributes to the correct objects. In addition to riffing on text prompts, DALL-E 2 can create variations on existing images. Each new image can be used to start its own series of variations, a feedback loop that could prove very useful for designers.

DALL-E 2 appears much more polished than the previous version. OpenAI plans to release DALL-E 2 after a limited rollout to trusted users. This is similar to what happened with GPT-3.

GPT-3 can create toxic text. OpenAI claims it used feedback from GPT-3 users to create InstructGPT, a safer version. OpenAI hopes to follow a similar path with DALL-E 2, which will also be shaped by user feedback. OpenAI will encourage its first users to break the AI by tricking it into creating offensive or dangerous images. As it solves these issues, OpenAI will make DALL-E 2 more widely accessible.

OpenAI has also released a user policy for DALL-E. It prohibits asking the AI to create offensive images: no violence, pornography, or political images are allowed. To prevent deepfakes, users will not be allowed to ask DALL-E for images of real people.

OpenAI has also removed certain images from DALL-E 2's training data, including those that show graphic violence. OpenAI claims that it will also pay human moderators to review every image created on its platform.

Multiskilled AIs can see the world and work with concepts across multiple modalities, such as language and vision. This could be seen as a step toward general-purpose intelligence, and DALL-E 2 shows impressive results. However, deep learning systems are still directed by humans, who create the tasks and give the systems their marching orders.

Mark Riedl, an AI researcher at Georgia Tech in Atlanta, believes creativity is a good way to measure intelligence. His Lovelace 2.0 test differs from the Turing test: it measures a machine's intelligence by how well it responds to requests to create something, such as "A penguin walking a robot dog beside Santa Claus on Mars." DALL-E scores well on this test.

Still, AI systems remain very far from understanding, if by understanding we mean human understanding. Like all AI systems, DALL-E 2 works on information and produces images that satisfy human expectations. But DALL-E 2 and other AIs encourage us to consider what intelligence and understanding really mean.
