Friday, February 23, 2024

How to make DALL-E 2’s AI output consistent – tips from the pros!

Table of Contents

  • Foreword
  • Prompt must contain the same keyword
  • Batch issue
  • Character and context matter
  • Choose an AI
  • Prompt refinement
  • Community

Foreword

DALL-E 2 can paint amazing pictures from words alone. However, if you want to use its output as illustrations for books or comics, the characters and style need to be consistent.

It is currently not possible to upload a character to DALL-E 2 and instruct it to reuse that character in a new drawing. The following advice will give you at least some degree of consistency in your output images.

Prompt must contain the same keyword

Once you have decided on a style, repeat the same style keywords in every prompt.

When I did the art direction for the picture book Indecisive Chameleon, I used the keywords “oil painting, children’s art.” With these keywords, I unified the style and character design.
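The keyword-repetition idea can be sketched as a small Python helper. This is only an illustration, not the author's actual workflow: the function name `with_style` and the scene descriptions are hypothetical, and the only values taken from the article are the keywords “oil painting, children’s art.” The helper just builds prompt strings; it does not call any image-generation service.

```python
# Style keywords quoted in the article; everything else here is a
# hypothetical illustration.
STYLE_KEYWORDS = ("oil painting", "children's art")


def with_style(scene: str, style_keywords=STYLE_KEYWORDS) -> str:
    """Append the same style keywords to a scene description."""
    # Drop surrounding whitespace and trailing punctuation so the
    # keywords always attach in the same way.
    scene = scene.strip().rstrip(" ,.")
    return ", ".join([scene, *style_keywords])


# Hypothetical scenes for a picture book (not from the article).
scenes = [
    "a chameleon sitting on a branch",
    "a chameleon looking at a rainbow",
]

prompts = [with_style(scene) for scene in scenes]
for prompt in prompts:
    print(prompt)
```

Keeping the keywords in one constant means that every prompt in a project ends with exactly the same style phrase, which is the habit this section recommends.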


Batch issue

DALL-E 2 seems to treat prompts submitted in the same batch consistently. As an example, take a look at the chameleons I generated. All of the paintings above were generated in the same batch, and the style appears to be the same.

A few days later, the following image was generated with the same keywords (oil painting, children’s art). The painting style is broadly consistent, but slightly different from the earlier batch.

Character and context matter

DALL-E 2 cannot (or will not) generate consistent images of certain characters or settings. The model is therefore best used in media that tolerate some inconsistency. I used DALL-E 2 to create a picture book with a chameleon as the main character, and because the pictures are generated, the chameleon looks slightly different each time. Precisely because the medium is a children’s picture book about a chameleon, the book still feels consistent from beginning to end. Here’s a short video of the book in its entirety.

(*Translation Note 1) The picture book “Indecisive Chameleon” was produced by Dori Adar, the author of this article. He illustrated it using DALL-E 2, and the story was generated by GPT-3.

Choose an AI

Painting style varies from one AI model to another. DALL-E 2 is the most “style agnostic” of them: the styles of its output images are so diverse that they cannot be traced to a single source in the training data. Models like Midjourney, on the other hand, seem to have a distinctive painting style. It’s easy to maintain consistency with these models, but the style can quickly become stale. Check out the awesome Midjourney video clip below to get a feel for the style.

(*Translation Note 2) For a look at the differences in style between DALL-E 2 and Midjourney, see the blog post comparing Craiyon, DALL-E 2, and Midjourney published by Michael, a digital analyst living in Canada. The article includes images showing that DALL-E 2’s range of styles is wider than Midjourney’s. The image below is the output for the prompt “Two Astronauts Exploring a Tomb on Mars”: the top is DALL-E 2’s output, and the bottom is Midjourney’s.

Prompt refinement

DALL-E 2 has been trained on millions of images, and much of its training material comes from stock photos.

It is therefore safe to assume that the generated images may reflect biases in that training data. Below is a DALL-E 2 generated image for “A portrait of a kindergarten teacher”. As you can see, only women are depicted (although male kindergarten teachers exist too).

(*Translation Note 3) The racial and gender bias in images generated by DALL-E 2 is described in the “Bias and representation” section of the DALL-E 2 system card released on April 10, 2022. Later, on July 18, 2022, OpenAI announced that it had applied a technique to reduce this bias. In an internal evaluation after the change, respondents were 12 times more likely to say that the output images were racially and gender diverse. Therefore, giving the same model the “A portrait of a kindergarten teacher” prompt from this article after August 2022 will likely produce male kindergarten teachers as well.

I had trouble getting DALL-E 2 to render “woman with her 3 months old baby in a cafe”. The prompt “A young family with a 3-month-old baby in a cafe” gave a much more accurate output.

All in all, DALL-E 2 seems to do its job when the prompt makes sense.

Below is an example output for a monkey eating a banana. No problem.

However, when I typed in “monkey-eating bananas”, things didn’t go as planned, as shown below.

If you want DALL-E 2 to draw a banana eating a monkey, you need to elaborate. For example, the following image was generated from the prompt “A giant banana with a big mouth is taking a bite from a monkey”.

Entrepreneur Joy Zhang, author of the AINOW-translated article “What I Learned After Spending $15 DALL-E 2 Credits To Create This AI Image,” reported typing the prompt “llama playing basketball” into DALL-E 2 and getting a cartoonish output image. Zhang speculates that the result was cartoonish because the training data didn’t include pictures of llamas playing basketball.
Furthermore, entering “a realistic photo of llama playing basketball” produced a llama that was more photorealistic than the previous output. Therefore, adding the phrase “photorealistic image” to the “monkey eating banana” prompt referred to in this article may produce photorealistic bananas and monkeys.

The image above is close to what I had in mind, but it’s still not perfect.

You can get more accurate results by telling DALL-E 2 what style you want to see in the output and making sure the style suits the prompt. For example, the output for “A giant banana with a big mouth is taking a bite from a monkey, a surrealist digital art” is shown below.

Now that the character and art style are established, we can continue the giant banana batch and leave a little more creative freedom to DALL-E 2. After all, no one likes to have every detail dictated, and this seems to apply to AI models as well. The results for “A giant monster banana with a big mouth is the fear of all monkeys, a surrealist digital art” are below.
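The pattern used in this section, a fixed character and a fixed style with only the action changing, can be sketched in Python as follows. This is a rough illustration under stated assumptions: `build_prompt` is a hypothetical helper, the third action is invented for the example, and the article's second prompt actually says “giant monster banana” rather than reusing the character phrase word for word. The code only assembles prompt strings and does not call DALL-E 2.

```python
# Character and style phrases taken from the prompts quoted in the article.
CHARACTER = "A giant banana with a big mouth"
STYLE = "a surrealist digital art"


def build_prompt(action: str, character: str = CHARACTER, style: str = STYLE) -> str:
    """Combine a fixed character, a varying action, and a fixed style."""
    return f"{character} {action.strip()}, {style}"


actions = [
    "is taking a bite from a monkey",       # action from the article
    "is the fear of all monkeys",           # looser action from the article
    "is chasing monkeys through a jungle",  # hypothetical extra scene
]

batch = [build_prompt(action) for action in actions]
for prompt in batch:
    print(prompt)
```

Holding the character and style constant while loosening only the action mirrors the approach described above: the established elements stay fixed, and the model gets freedom in the rest of the prompt.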

Community

DALL-E 2 has received millions of prompts, and the community has been very generous in sharing what it has learned. DALL-E’s official Discord has lots of tips and tricks. There is also a dictionary that describes how DALL-E 2 reacts to different filters, camera angles, and famous creators. These materials will help you anticipate how DALL-E 2 will react to your prompts without actually typing them.
