You know DALL-E 2, but what's Google Imagen?
You’ve probably heard of DALL-E 2, OpenAI’s AI image generator, if you’ve been on Twitter in the last few months. Now Google has released its own text-to-image model, called “Imagen”, which pairs image generation with a deep level of language understanding.
AI systems like this can “unleash the joint creativity of humans and computers,” according to the head of Google’s AI division.
DALL-E VS IMAGEN
DALL-E 2’s output is for the most part realistic, but we may see some regulation coming down the pike regarding the rights of the creators whose images DALL-E was trained on.
As Google’s researchers describe it:

Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation. Our key discovery is that generic large language models (e.g. T5), pretrained on text-only corpora, are surprisingly effective at encoding text for image synthesis: increasing the size of the language model in Imagen boosts both sample fidelity and image-text alignment much more than increasing the size of the image diffusion model.
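The pipeline described above has two parts: a frozen, pretrained language model encodes the prompt into an embedding, and a diffusion model starts from pure noise and repeatedly removes predicted noise, conditioned on that embedding. The toy sketch below shows the shape of that loop only; the function names, the hashing "encoder", and the stubbed denoiser are illustrative stand-ins, not Google's actual models or code.

```python
import math
import random

def encode_text(prompt, dim=8):
    """Stand-in for a frozen text-only language model (e.g. T5):
    Imagen conditions its diffusion model on the embedding such a model
    produces. Here we just bucket tokens into a fixed-size vector."""
    vec = [0.0] * dim
    for token in prompt.lower().split():
        vec[sum(ord(c) for c in token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def denoise_step(x, text_emb, t):
    """Placeholder for the learned denoising network: predicts the noise
    present in x at timestep t, conditioned on the text embedding.
    A real model is a U-Net trained on (image, caption) pairs; this
    fake prediction just keeps the sampling loop runnable."""
    bias = sum(text_emb) / len(text_emb)
    return [0.1 * (xi - bias) for xi in x]

def sample(prompt, n_pixels=16, steps=50, seed=0):
    """Reverse diffusion: begin with Gaussian noise and iteratively
    subtract the predicted noise, guided by the text embedding."""
    rng = random.Random(seed)
    text_emb = encode_text(prompt)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]
    for t in range(steps, 0, -1):
        eps = denoise_step(x, text_emb, t)
        x = [xi - ei for xi, ei in zip(x, eps)]
    return x

image = sample("a corgi playing a flute")
print(len(image))
```

The point of the sketch is the division of labor: the text encoder is trained separately on text alone and never updated, which is exactly why scaling it up improves image-text alignment independently of the diffusion model's size.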
And there is some evidence to back this up: to evaluate and benchmark text-to-image models, Google created a new benchmark called DrawBench.
Due to potential abuse, Google “decided not to release code or a public demo” of Imagen. In addition:
Imagen relies on text encoders trained on uncurated web-scale data, and this inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.
Have a great week everyone!