<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>pollinations.ai</title>
        <link>https://paragraph.com/@pollinations</link>
        <description>Pollinations is a platform for AI generative media. We want to facilitate the creation and translation of multiple forms of human expression</description>
        <lastBuildDate>Thu, 23 Apr 2026 20:15:09 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <copyright>All rights reserved</copyright>
        <item>
            <title><![CDATA[Questions Lidia Zunin]]></title>
            <link>https://paragraph.com/@pollinations/questions-lidia-zunin</link>
            <guid>1mQAtIhjP5SujFimhyYu</guid>
            <pubDate>Mon, 27 Dec 2021 23:29:55 GMT</pubDate>
            <description><![CDATA[How does a generative AI work to create images, when you have written prompts? Does it follow a different "path" than those AI that generate random photos? Generative art using deep learning models has been gaining a lot of traction recently due to advances in a few areas. Models that can generate random photorealistic images just by learning from huge datasets of images have become very common. Although impressive at first (e.g. it is possible to smoothly transform the face of a celebrity into...]]></description>
            <content:encoded><![CDATA[<ol><li><p>How does a generative AI work to create images, when you have written prompts? Does it follow a different &quot;path&quot; than those AI that generate random photos?</p></li></ol><p>Generative art using deep learning models has been gaining a lot of traction recently due to advances in a few areas.</p><p>Models that can generate random photorealistic images just by learning from huge datasets of images have become very common. Although impressive at first (e.g. it is possible to smoothly transform the face of a celebrity into a horse), the creative applications were still limited.</p><p>In my opinion, the magic starts to appear when these generative models are used in combination with models from other modalities, e.g. models that can understand how texts and images relate to each other.</p><p>This was made possible by a model called CLIP (released about 10 months ago by OpenAI), which is able to judge how well a text corresponds to a given image. The model learned from a huge dataset of millions of images with associated text captions.</p><p>Surprisingly quickly, a community of hackers, researchers and artists managed to connect the CLIP model to other models that can generate images. To everyone’s surprise, simply connecting these models allows a human to write a short sentence, and the machine learning models will draw a creative interpretation of those words.</p><p>Imagine a king who is very keen to have any art he desires, but does not have the skills or time to make it himself.</p><p>The king does, however, have:</p><ul><li><p>A deaf painter, a virtuoso at drawing any kind of image but unable to understand text.</p></li><li><p>An art critic who has seen and read about many artworks and images but has absolutely no talent for painting. The art critic can give the painter a (multidimensional) thumbs-up or thumbs-down sign, but cannot communicate any words.</p></li></ul><p>First, the king tells the art critic his desired painting. 
So he would say to the art critic: “Draw me a painting of Glowing cacti and peyote flowers in the Arizona desert dream night. Painted by Shaun Tan, Ivan Bilibin and Studio Ghibli”. This is called the prompt.</p><p>The painter will start by drawing a random image. The art critic looks at the image and gives the painter a thumbs-up sign if the image is close to the king’s prompt, or a thumbs-down sign if it is far away.</p><p>The painter modifies the image, and they repeat this process over many steps until the painting resembles the prompt more and more. Once the art critic is happy, they are done and the king can enjoy his unique painting.</p><p>“Glowing cacti and peyote flowers in the Arizona desert dream night. Painted by Shaun Tan, Ivan Bilibin and Studio Ghibli”</p><p>For more detail I recommend this article: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://ml.berkeley.edu/blog/posts/clip-art/">https://ml.berkeley.edu/blog/posts/clip-art/</a></p><ol start="2"><li><p>Where do the images come from, in the case of <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://pollinations.ai">pollinations.ai</a>? Do you have a specific database, or does the AI use the responses of a query on Google, for instance?</p></li></ol><p>The way these models currently work can be split into two phases: training and inference.</p><p>For training, researchers scrape large datasets of images, texts or other media from the internet, which are then repeatedly fed into a large neural network (a simplified mathematical abstraction of a biological brain). In this phase the neural network learns to represent all of this data internally, which means it does not need access to the internet or the original data to recreate it. 
Since it cannot simply memorize all the images, the process forces the network to compress this information and find more abstract ways of storing it internally.</p><p>Once a model is trained, it can be downloaded and used to generate new content, or combined with other models as in the text-to-image use case.</p><p>Interestingly, researchers are coming to the conclusion that it could be an advantage to allow the models to access external data, but this research is still pretty young. More details here: <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://ai.stanford.edu/blog/retrieval-based-NLP/">https://ai.stanford.edu/blog/retrieval-based-NLP/</a></p><ol start="3"><li><p>While there are already books written in partnership with engines like GPT-3, do you think image-generative AIs could be used for creative work as well?</p></li></ol><p>If I were to judge based on my Twitter feed, I would say it is already common practice, although still slightly niche. In my opinion, it is only a matter of time until it becomes mainstream. Even my colleague’s dentist has started using <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://pollinations.ai">pollinations.ai</a> to generate art.</p><ol start="4"><li><p>How sophisticated is AI nowadays? Do you think we will ever reach a general artificial intelligence, or an AI that &quot;surpasses&quot; human intelligence?</p></li></ol><p>My opinion on this is probably a little more radical than most. I believe that, through the advances in language modeling and multimodal learning (learning to connect different types of media, e.g. video, audio and text), we are now on a clear and unstoppable path towards AGI.</p><p>The objective of GPT-3 is quite simple: given the last n words, predict the next word in the sentence. By doing this repeatedly, GPT-3 learns to write and continue text very well. 
In order to write convincing text and be indistinguishable from a human, it needs to imitate how a human would write as closely as possible. For this it needs to develop a form of self-awareness or consciousness, just to meet the training objective of predicting the next word in a sentence.</p><p>In the words of the neuroscientist and philosopher David Chalmers: “What fascinates me about GPT-3 is that it suggests a potential mindless path to artificial general intelligence (or AGI). GPT-3’s training is mindless. It is just analyzing statistics of language. But to do this really well, some capacities of general intelligence are needed, and GPT-3 develops glimmers of them. It has many limitations and its work is full of glitches and mistakes. But the point is not so much GPT-3 but where it is going. Given the progress from GPT-2 to GPT-3, who knows what we can expect from GPT-4 and beyond?” <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="https://dailynous.com/2020/07/30/philosophers-gpt-3/">https://dailynous.com/2020/07/30/philosophers-gpt-3/</a></p><ol start="5"><li><p>Last but not least, how do you want me to present you in the article? You can send me a minibio if it&apos;s easier!</p></li></ol><p>I just copied this from LinkedIn:</p><p>After studying Computer Science &amp; Artificial Intelligence at Edinburgh University, I spent 9 years in São Paulo creating an art collective, while always keeping one foot in the technology sector. I conceived and implemented a variety of interactive installations, combining my passion for art, research, and technology.</p><p>After settling back in Germany, I completed a project researching future trends in Artificial Intelligence for the World Government Summit. 
Since then I have spent a few years researching and working with data-driven generative audio modeling.</p><p>At the moment, I am building the platform <a target="_blank" rel="noopener noreferrer nofollow ugc" class="dont-break-out" href="http://Pollinations.AI">Pollinations.AI</a>, which aims to make generative machine learning more accessible.</p>]]></content:encoded>
            <author>pollinations@newsletter.paragraph.com (pollinations.ai)</author>
        </item>
    </channel>
</rss>