
What I Learned After Spending $15 DALL-E 2 Credits To Create This AI Image!


Table of contents

  • Yes, the image above is of a llama dunking a basketball. A summary of the process, limitations and lessons learned while experimenting with the closed beta of DALL-E 2
  • Starting point
  • Prompt engineering is the art of telling the model exactly what you want
  • As you may have noticed, the composition produced by DALL-E 2 is not good
  • DALL-E 2 struggles to generate realistic faces
  • Other DALL-E 2 limitations
    • DALL-E 2 interprets angles and shots loosely
    • DALL-E 2 can’t spell words
    • DALL-E 2 can be capricious with complex or poorly worded prompts
  • DALL-E 2’s ability to transfer styles is impressive
    • “Abstract style…”
    • “Vaporwave”
    • “Digital Art”
    • “Screenshots from the Miyazaki animated film”
  • Final impression

Yes, the image above is of a llama dunking a basketball. A summary of the process, limitations and lessons learned while experimenting with the closed beta of DALL-E 2

Ever since I first saw this AI-generated image of a Shiba Inu bento, I’ve been dying to try DALL-E 2.

Wow, this is disruptive technology.

For those who don’t know, DALL-E 2 is a system created by OpenAI that can generate original images from text.

It’s currently in closed beta; I joined the waitlist in early May and got access at the end of July. During the beta period, users receive free credits (50 in the first month, 15 per month thereafter), each generation consumes 1 credit and returns 3–4 images, and you can purchase an additional 115 credits for US$15.
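To put that pricing in perspective, here is a quick back-of-the-envelope calculation (a sketch using only the numbers above; 4 images per generation is the top of the stated range, so treat the per-image cost as a lower bound):

    # Rough DALL-E 2 beta cost estimate, using the pricing quoted above.
    PAID_CREDITS = 115       # credits in one $15 purchase
    PACK_PRICE_USD = 15.0    # price of that credit pack
    IMAGES_PER_CREDIT = 4    # each generation returns 3-4 images; assume 4

    cost_per_generation = PACK_PRICE_USD / PAID_CREDITS
    cost_per_image = cost_per_generation / IMAGES_PER_CREDIT
    print(f"~${cost_per_generation:.2f} per generation, ~${cost_per_image:.3f} per image")
    # prints: ~$0.13 per generation, ~$0.033 per image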

(*Translation Note 1) OpenAI announced the DALL-E 2 public beta in an official blog post published on July 20, 2022.
(*Translation Note 2) DALL-E mini, the model hosted by Hugging Face, the company that operates the AI community platform, has been renamed “Craiyon” to avoid confusion with the original DALL-E developed by OpenAI.

I’m sure you’ve seen curated selections online of what DALL-E 2 can do (given the right creative prompts). In this article, I’ll give you a candid, behind-the-scenes look at what it takes to create an image from scratch. The theme is “llama playing basketball.” If you’re thinking of trying DALL-E 2, or just want to understand what it can do, read on.

Starting point

Knowing what prompts to give DALL-E 2 is both an art and a science. For example, the results for “llama playing basketball” are:

Why does DALL-E 2 lean toward cartoon images for this prompt? I suspect it has something to do with the fact that the model never saw actual images of llamas playing basketball during training.

Going a step further and adding the keyword “realistic photo of”, the results are as follows.

The llamas look more realistic, but the whole image starts to look like a botched Photoshop job. Clearly, DALL-E 2 needs some help to create a cohesive scene.
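The experiments in this article were run through the DALL-E 2 web interface, but the same model was later exposed programmatically through OpenAI’s Images API. For readers who would rather iterate in code, here is a minimal sketch, assuming the pre-1.0 openai Python package and an OPENAI_API_KEY environment variable; the prompt strings are the ones used above:

    import openai  # pip install "openai<1.0"; the key is read from OPENAI_API_KEY

    # Generate four candidates per prompt, mirroring the web UI's behavior.
    for prompt in ["llama playing basketball",
                   "realistic photo of a llama playing basketball"]:
        response = openai.Image.create(prompt=prompt, n=4, size="1024x1024")
        print(prompt)
        for item in response["data"]:
            print(" ", item["url"])  # each URL points to one generated image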

Prompt engineering is the art of telling the model exactly what you want

Prompt engineering in the context of DALL-E refers to the process of designing prompts to achieve desired results.

The DALL-E 2 Prompt Book is a great resource for this: it provides a detailed, illustrated list of prompt ideas, covering everything from photography keywords to art styles.

Why is something like this needed? It’s hard to get usable output from DALL-E 2 (for business use, for example), especially if you don’t know what it’s capable of. That’s why some startups have even created marketplaces that sell individual prompts for $1.99, saving users the time and money of coming up with their own.

(*Translation Note 3) According to an article published by TechCrunch on July 30, 2022, PromptBase, a marketplace where users can buy and sell DALL-E 2 prompts for $1.99 each, began operating in June 2022. PromptBase takes 20% of each transaction. However, given the abundance of free prompt-engineering resources, the service has also drawn criticism. All prompts traded on PromptBase are vetted, so there is little concern that they will produce offensive images.

One of my personal favorite discoveries from experimenting with prompt engineering is “dramatic backlighting.”

The key to prompt engineering is telling DALL-E 2 exactly what you want it to output. Apparently it isn’t clear from the prompt’s context (as you can see in the image above) whether the llama I’m asking for should be dressed. By specifying “llama wearing a jersey,” however, the model produces the fantastic scene below.

It doesn’t stop there. To make this llama really fly and add drama to the image, specific phrases like “dunking a basketball” or “action shot of…” are needed. My favorite of these is “…llama in a jersey dunking a basketball like Michael Jordan.”
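Refinements like these accumulate quickly, so it can help to treat a prompt as a list of components rather than one long string. A small sketch of that idea (the build_prompt helper is hypothetical, invented for this example; nothing in DALL-E 2 requires it):

    def build_prompt(subject, *modifiers):
        """Join a subject and optional modifier phrases into one prompt string."""
        return ", ".join([subject, *modifiers])

    prompt = build_prompt(
        "action shot of a llama wearing a jersey",
        "dunking a basketball like Michael Jordan",
        "dramatic backlighting",  # the lighting keyword discovered above
    )
    print(prompt)
    # action shot of a llama wearing a jersey, dunking a basketball like Michael Jordan, dramatic backlighting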

As you may have noticed, the composition produced by DALL-E 2 is not good

From the context of the phrase “dunking a basketball,” you would think the relative positions of the llama, the ball, and the hoop would be obvious. More often than not, however, the llama dunks in the wrong direction, or the ball is placed somewhere it could never go in. Even when the prompt clearly states every element that should appear, DALL-E 2 doesn’t really “understand” the spatial relationships between them. This article delves deeper into the topic.

(*Translation Note 4) According to an article published on August 4, 2022 by the AI-focused outlet Unite.AI, a research team at Harvard University published a paper on July 29, 2022 titled “Testing Relational Understanding in Text-Guided Image Generation,” which discusses deficiencies in the spatial relationships between objects in images produced by DALL-E 2. According to the paper, 87% of 169 human raters judged images generated from realistic prompts such as “a child touching a bowl” to correctly depict the described relationship, but only 11% judged images from the less realistic prompt “a monkey touching an iguana” to be correct. The paper proposes CLIPORT, an AI model jointly researched by the University of Washington and NVIDIA, as a way to improve DALL-E 2’s understanding of spatial relationships; the model was developed for robot control and implements spatial understanding on top of image recognition.

Another flaw caused by DALL-E 2 not “understanding” the scene is that it occasionally mixes up textures. In the image below, the net is made of fur (with a moment’s thought, a human would realize something is off about this scene).

Image generated by the author with the DALL-E 2 prompt “Expressive photo of a llama wearing a jersey dunking a basketball like Michael Jordan, low angle, extreme wide shot, indoors, dramatic backlighting, high detail.”

DALL-E 2 struggles to generate realistic faces

Some sources suggest that the struggle to generate realistic faces may be a deliberate measure to prevent deepfakes. You would think such a measure would only apply to humans, but apparently it extends to llamas as well.

Some of the failed attempts at a realistic llama face were downright creepy.

(*Translation Note 5) An article published on July 14, 2022 by IEEE Spectrum, an outlet run by the IEEE, discusses the limitations of DALL-E 2 from multiple angles. Among them, it reports that the model is not good at drawing multiple people: an image of a single female astronaut comes out fine, but an image of seven engineers has distorted faces.
The aforementioned official OpenAI blog post also addresses DALL-E 2’s restriction against generating photorealistic human faces.

Other DALL-E 2 limitations

Below are some other minor issues I’ve encountered.

DALL-E 2 interprets angles and shots loosely

No matter how many times you enter phrases like “in the distance” and “extreme long shot,” it’s hard to get an image that fits the entire llama in the frame.

In some cases, framing was completely ignored.

Image generated by the author with the DALL-E 2 prompt “Dramatic film still of a llama wearing a jersey dunking a basketball, low angle, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, indoors, dramatic backlighting, high detail.”

DALL-E 2 can’t spell words

Given that DALL-E 2 struggles to “understand” the positional relationships between elements in an image, its inability to spell words correctly doesn’t seem too surprising (*Translation Note 6). However, in the right context, it can generate fully formed letters.

(*Translation Note 6) The article “7 Limitations and Risks,” which examines DALL-E 2 in detail, states that the model is not very good at generating text. Regarding this limitation, see also “Weaknesses in image generation” in the AINOW translation article “What is DALL-E 2? Unraveling from architecture to risk.”

DALL-E 2 can be capricious with complex or poorly worded prompts

Depending on the keywords you add and how you phrase them, you may also get completely different results than you expected.

For example, in the case below, the real subject of the prompt (a llama in a jersey) was completely ignored.

It’s certainly an impressive dunk. Image generated by the author with the DALL-E 2 prompt “low angle, long shot, indoors, dramatic backlighting, professional photo of a llama wearing a jersey, dunking a basketball.”

In some cases, even adding the word “fluffy” dramatically degraded the output, making DALL-E 2 seem broken.

Image generated by the author with the DALL-E 2 prompt “Film still of a fluffy llama in a jersey dunking a basketball like Michael Jordan, high detail, indoors, dramatic backlighting.” (Image deliberately blurred to hide the face.)

When working with DALL-E 2, it’s important to be specific about what you’re looking for without overstuffing the prompt or adding redundant language.
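As a concrete illustration of that balance, compare the two prompts below (both invented for this example; only the first follows the advice above):

    # Specific but lean: every phrase adds new information.
    good_prompt = ("film still of a llama wearing a jersey "
                   "dunking a basketball, low angle, dramatic backlighting")

    # Overstuffed: redundant and contradictory modifiers that can derail the model.
    bad_prompt = ("photo film still of a fluffy furry llama animal wearing a "
                  "jersey shirt dunking slamming a basketball ball, low angle, "
                  "high angle, dramatic backlighting, bright daylight")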

DALL-E 2’s ability to transfer styles is impressive

Be sure to try DALL-E 2’s style transfer.

Once you’ve settled on a subject to use as the keyword, you can render it in a surprising number of art styles.

“Abstract style…”

“Vaporwave”

“Digital Art”

Image generated by the author with the DALL-E 2 prompt “llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35°, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art.”

“Screenshots from the Miyazaki animated film”

Image generated by the author with the DALL-E 2 prompt “llama in a jersey dunking a basketball like Michael Jordan, screenshot from the Miyazaki anime movie.” Thanks to this article for the tip.

(*Translation Note 7) The article “What DALL-E 2 can and cannot do,” posted on the opinion platform LessWrong on May 2, 2022, points out that the model is good at generating images related to various pop culture properties. For example, it can generate images of Marvel heroes and Disney princesses, as shown below. The “screenshots from Miyazaki’s animated films” in this article are another example of pop-culture imagery.

Image generated by prompting “art nouveau stained glass window depicting Marvel’s Captain America”

Image generated by prompting “Elsa from Frozen, cross-stitched sampler”
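If you want to explore styles systematically, one convenient pattern is to sweep a list of style suffixes over a fixed subject. A minimal sketch of that loop, again assuming the pre-1.0 openai Python package (the style list simply collects keywords from this section):

    import openai  # the key is read from the OPENAI_API_KEY environment variable

    subject = "llama in a jersey dunking a basketball like Michael Jordan"
    styles = ["abstract style", "vaporwave", "digital art",
              "screenshot from a Miyazaki anime movie"]

    # Render the same subject once per style and print the resulting image URLs.
    for style in styles:
        response = openai.Image.create(prompt=f"{subject}, {style}",
                                       n=1, size="512x512")
        print(style, "->", response["data"][0]["url"])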

Final impression

After investing more than 100 credits (equivalent to about 13 US dollars) in trial and error, the image below was finished.

The image isn’t perfect, but DALL-E 2 fulfilled about 80% of my expectations.

I spent most of the credits on getting the style, face, and composition right.

OpenAI’s announcement about DALL-E describes how generated images may be used.

(The statement regarding commercialization rights for DALL-E 2-generated images can be found in the official OpenAI blog post announcing the public beta, mentioned above.)

Many users are likely to be affected by this rule.

For content creators, DALL-E 2 will be most useful for creating simple illustrations, photos and graphics for blogs and websites. My plan is to use it instead of Unsplash to create unique blog cover images.

For those of you who want to try DALL-E 2 yourself, here are some things to know before you start.

  • To get what you want, be prepared for trial and error. 15 free credits may sound like a lot, but it really isn’t; assume you’ll use at least 15 credits to generate one usable image. DALL-E 2 is by no means cheap.
  • Don’t forget to save your favorite images.