DALL·E now available in beta (openai.com)
924 points by todsacerdoti on July 20, 2022 | 551 comments


I was supposed to be making a video game, but got a bit sidetracked when DALL·E came out and made this website on the side: http://dailywrong.com/ (yes I should get SSL).

It's like The Onion, but all the articles are made with GPT-3 and DALL·E. I start with an interesting DALL·E image, then describe it to GPT-3 and ask it for an Onion-like article on the topic. The results are surprisingly good.


Thanks, finally a legit news publication :)

This was really funny :)

http://dailywrong.com/man-finally-comfortable-just-holding-a...


Somehow these articles are more readable than typical AI-generated search engine fodder... Is it because I'm entering the site with an expectation of nonsense?


Probably because, by the creator's own admission, the articles are heavily cherry-picked to make sure the output is decent, which is probably a lot more human effort than goes into the aforementioned search engine fodder.

http://dailywrong.com/sample-page/


I would guess that most spam farms are not using OpenAI's davinci model, which is really, really good but expensive. Just a guess.


That's... so weirdly ironic I can't even! Blogspam websites are made by real humans with little oversight, while a literal AI with oversight generates better results.

That said, with a little tweaking, these technologies can be - and probably already are being - used to churn out blogspam websites left and right, fully automatically.


So the other men in the pictures are the uncomfortable ones?


Yes, I actually LOLed at that one!


Feels like the headlines could be generated similar to the style of "They Fight Crime!"

"He's a hate-fuelled neurotic farmboy searching for his wife's true killer. She's a tortured insomniac snake charmer from a family of eight older brothers. They fight crime!"

https://theyfightcrime.org/

Here's an implementation in Perl.

http://paulm.com/toys/fight_crime.pl.txt
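For anyone curious, the mad-libs approach behind that generator fits in a few lines of Python; a minimal sketch (the word pools here are made up for illustration, the real site draws from much larger lists):

    import random

    # Illustrative word pools, not the site's actual lists.
    ADJECTIVES = ["hate-fuelled", "tortured", "strong-willed", "violent"]
    HE_NOUNS = ["neurotic farmboy", "paranormal investigator", "rock star", "barbarian"]
    SHE_NOUNS = ["snake charmer", "motormouth wrestler", "communist widow", "traffic cop"]
    QUIRKS = ["with a secret", "from the wrong side of the tracks",
              "with a knack for trouble", "who believes they can never love again"]

    def fight_crime():
        he = f"He's a {random.choice(ADJECTIVES)} {random.choice(HE_NOUNS)} {random.choice(QUIRKS)}."
        she = f"She's a {random.choice(ADJECTIVES)} {random.choice(SHE_NOUNS)} {random.choice(QUIRKS)}."
        return f"{he} {she} They fight crime!"

    print(fight_crime())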


lol that site is great

>He's an unconventional gay paranormal investigator moving from town to town, helping folk in trouble. She's a violent motormouth wrestler from the wrong side of the tracks. They fight crime!

>He's a Nobel prize-winning sweet-toothed rock star who believes he can never love again. She's a strong-willed communist widow with a knack for trouble. They fight crime!

>He's an obese white trash barbarian with a secret. She's a virginal thirtysomething traffic cop with the power to bend men's minds. They fight crime!


I definitely want to see the DALLE illustrations for this!


The results with things that are artworks or more general concepts are fascinating, but there is for sure something creepy with "photorealistic" human eyes and faces going on...

If you want to see some really creepy AI generated human "photo" faces, take a look at Bots of New York:

https://www.facebook.com/botsofnewyork


Unfortunately the content of that project is a hostage of Facebook now - similar to ransomware gangs, they force you to do something to get at the data; in this case you need to create an account and take part in that global surveillance network. I do not understand why people do that.


Hyperbole will get you nowhere good (Just ask RMS).

How about wording your comment in a way that highlights why it’s a shame these pictures aren’t accessible for those without a Facebook account, and skip the whole “you’re murdering puppies” bit?


Being a creepy sex pest will get you nowhere good (Just ask RMS).

The hyperbole is good marketing for a certain audience.


They intentionally prevented it from being able to generate realistic faces to reduce the potential for deepfakes.


From the server IP, it looks like you're on some managed WordPress hosting that only offers free SSL on the more 'premium' packages.

Easiest way for free SSL would be to just throw the domain on CloudFlare :)



We joke about it, but an early and very cheap robotic floor cleaner I had was one of those weasel balls constrained in a flat ring harness with a dusting cloth underneath. It was entertaining and not completely useless.

Put a guinea pig in there and you'd get the same effect.


Actually got a chuckle out of the duck one (http://dailywrong.com/man-finally-comfortable-just-holding-a...). Thanks! I hope you keep generating them. Kind of wish there weren't a newsletter nag, but on the other hand it adds to the realism. Could be worthwhile to generate the text of the nag with GPT too; call it a kind of lampshading.


The part where you have to confirm you are not a robot to subscribe to the mailing list is the best part of this, my new favorite website.


Spam advertising is about to reach whole new levels of weird.


Haha, I was in a very similar boat when I built https://novelgens.com -- I was also supposed to be making a video game, but got a bit sidetracked with VQGAN+CLIP and other text/image generation models.

Now I'm using that content in the video game. I wonder if you could use these articles as some fake news in your game, too. :)


This is clever. Does GPT-3 come up with the title of the article, too? That's the funniest part.


At first I came up with them myself, but found that it often comes up with better ones, so I ask it for variations.

I think I even got it to fill in the title given a picture description, something like “Article picture caption: Man holding an apple. Article title: ...”. Might experiment more with that in the future.
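With the OpenAI completion API, that caption-to-title trick looks roughly like the sketch below (the engine choice and exact prompt wording are my guesses, not necessarily the site author's setup):

    import openai

    openai.api_key = "sk-..."  # your API key

    caption = "Man holding an apple"
    prompt = ("Article picture caption: " + caption + "\n"
              "Satirical article title:")

    # text-davinci-002 is an assumption; any completion-capable engine works here.
    resp = openai.Completion.create(
        engine="text-davinci-002",
        prompt=prompt,
        max_tokens=20,
        temperature=0.9,
        n=3,  # ask for a few variations and cherry-pick the best one
    )
    for choice in resp.choices:
        print(choice.text.strip())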


How do you prompt GPT-3 to come up with the titles? That’s an interesting problem.


Well, then I'm impressed with GPT-3's ability to generate those titles!

The combination of photo/title feels like they come from the more absurd articles published by The Onion.

If we aren't living in a simulation, it's just a matter of time...


http://dailywrong.com/wp-content/uploads/2022/07/DALL%C2%B7E...

Hot dang. Some Reddit subs can be auto-generated now.


It's already a thing:

https://www.reddit.com/r/SubSimulatorGPT2/

but yea, it will have generated images now.


Hmmm, these seem less deranged than headings from the previous Markov-chain bots—and kinda less interesting because of that.

I guess Markov chains for the headings, Dall-E for images, maybe GPTx for comments. And/or the GPT models should be made wackier somehow—less coherent, perhaps.
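A word-level Markov chain for the headings is only a handful of lines; a rough sketch (the corpus here is just a toy stand-in for whatever headline dump you train on):

    import random
    from collections import defaultdict

    def build_chain(headlines):
        # Map each word to the list of words that follow it across the corpus.
        chain = defaultdict(list)
        for line in headlines:
            words = line.split()
            for a, b in zip(words, words[1:]):
                chain[a].append(b)
        return chain

    def generate(chain, start, max_words=12):
        word, out = start, [start]
        while word in chain and len(out) < max_words:
            word = random.choice(chain[word])
            out.append(word)
        return " ".join(out)

    corpus = ["Man Finally Comfortable Just Holding A Duck",
              "Scientists Warn New Faster Toothbrush May Cause Insanity",
              "Gillette Releases A New Razor For Babies"]
    print(generate(build_chain(corpus), "Man"))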


I had the same thought when I saw this one http://dailywrong.com/wp-content/uploads/2022/07/DALL%C2%B7E...


This is amazing! Honestly one of the first uses of GPT3/DALL E that has held my attention for longer than a few seconds.



Parenting > "Gillette Releases a New Razor for Babies"


I loved how it just consistently decided that if babies have facial hair, it's always white fluff.


I think it's because it's using images of babies with soap on their face to learn. Still funny though!



Very funny! The "Scientists Warn New Faster Toothbrush May Cause Insanity"-story is not fake though, I've experienced it ;)


This a fucking fantastic site, it’s absolutely hilarious, and I’ve bookmarked it - I kinda unironically want to set it as my home page - but just a heads up that the CSS is broken for me on my iPhone SE2.

The images don’t scale properly with the rest of the site, they’re massive compared to the content.


These results are pretty amazing. Are these cherry picked / curated / edited at all?


Yes they are heavily cherry picked. The web site itself has a disclaimer about it.


This one seems like it could actually be real in Japan: http://dailywrong.com/anime-pillow-gym-opens-in-tokyo/ ;)


It would be interesting to see if there was a market for a monthly newsprint version similar to the old https://weeklyworldnews.com/

Can DALL-E render Bat Boy?


I’m curious: if they’re only making DALL-E accessible now, and GPT-3 was never really accessible (as far as I know), how do you have access to these things to generate text and images?


There's no waitlist for GPT-3 now. DALL-E is a separate product; you don't need access to one to use the other.


DALL-E was accessible by invite/waiting queue. GPT-3 has been available for quite a while.


You should let readers rate the articles. This way readers new to the site can read the best ones first and get a good impression.


Love it! Better than other news I get to read these days. Some of it almost rings true... like the bluebird suing the cat.

Thank you! Bookmarked!


These are actually quite funny. A bit of a surreal touch, but that makes them even more fun.


Your images aren't coming over SSL - so they won't show up in many browsers (e.g. Firefox).


I've had that idea since GPT 3 but never got any access...


Try the Auto-Install Free SSL plugin; it was easy for me.


This is great, I love it.

Why do the images load so slow though?


This is fantastic, the fake news the world needs.


NGL this shit is pretty cursed and I like it.


This is the most wonderful thing ever.


You get + two million points from me for not having HTTPS.


How do you generate the original image? And what about the subsequent images, do they come automatically from the text? I'd love to know more about the process.


I have been having a blast with DALL-E, spending about an hour a day trying out wild combinations and cracking my friends up. I cannot imagine getting bored of it; it's like getting bored with visual stimulus, or art in general.

In fact, I've been glad to have a 50/day limit, because it helps me contain my hyperfocus instincts.

The information about new pricing is, to me as someone just enjoying making crazy images, a huge drag. It means that to do the same 50/day I'd be spending $300/month.

OpenAI: introduce a $20/month non-commercial plan for 50/day, and I'll be at the front of the line.


I think people don't realize how huge these models really are.

When they're free, it's pretty cool. But charge an amount where there's actual profit in the product? Suddenly seems very expensive and not economically viable for a lot of use cases.

We are still in the "you need a supercomputer" phase of these models for now. Something like DALLE mini is much more accessible but the results aren't good enough. Early early days.


> I think people don't realize how huge these models really are.

They really aren't that large by the contemporary scaling race standards. DALLE-2 has 3.5B parameters, which should fit on an old GPU like an Nvidia RTX 2080, especially if you optimize your model for inference [1][2], which is commonly done by ML engineers to minimize costs. With an optimized model, your memory footprint is ~1 byte per parameter, plus a ratio of less than 1 (commonly ~0.2) of the parameter count to store intermediate activations.

You should be able to run it on Apple M1/M2 with 16GB RAM via CoreML pretty fine, if an order of magnitude slower than on an A100.

Training isn't unreasonably costly either: you can train a model for O($100k), which is less than the yearly salary of a mid-tier developer in Silicon Valley.

There is no reason these models shouldn't be trained cooperatively and run locally on our own machines. If someone is interested in cooperating with me on such a project, my email is in the profile.

1. https://arxiv.org/abs/2206.01861

2. https://pytorch.org/blog/introduction-to-quantization-on-pyt...
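As a very rough illustration of the inference optimization meant here, PyTorch's dynamic quantization stores Linear weights as int8, about one byte per parameter; a toy sketch with a small stand-in model, obviously not DALL-E itself (see [2] for the real treatment):

    import os
    import torch
    import torch.nn as nn

    # Tiny stand-in; DALL-E 2's decoder alone has ~3.5B parameters.
    model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8  # weights stored as int8, ~1 byte each
    )

    def saved_size_mb(m, path):
        torch.save(m.state_dict(), path)
        return os.path.getsize(path) / 1e6

    print("fp32 checkpoint:", round(saved_size_mb(model, "fp32.pt")), "MB")      # ~134 MB
    print("int8 checkpoint:", round(saved_size_mb(quantized, "int8.pt")), "MB")  # ~34 MB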


It's true that image models are much less of a burden on GPU VRAM than a model like BLOOM, where fitting it into a few A100s is ideal, but these diffusion models are a PITA for an ordinary hobbyist in terms of total compute: the CLIP pass over the text input is almost free, but then you feed it into the diffusion model, and for one sample you'll be doing 10-100 forward passes (depending on how fancy the diffusion methods are - maybe even 1000 passes if you're using older/simpler ones), and for interactive use, you really want more like 6-9 separate samples; then they have to pass through the upscalers, which are diffusion models themselves and need to do a bunch of forward passes to denoise. If you do 1 sample in 10s on your 1 consumer GPU, which would be pretty good, 6-9 means a joykilling minute+ wait. And then the user will pick a variation or edit one or tweak the prompt, and start all over again! It's like being back on 16kb dialup waiting for the newest .com to load.
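Putting rough numbers on that (these are the comment's ballpark figures, not measurements):

    # Back-of-the-envelope interactive latency, using the rough figures above.
    base_passes     = 50    # denoising steps in the base diffusion model
    upscaler_passes = 25    # extra passes across the upscaling stage(s)
    sec_per_pass    = 0.13  # assumed per-pass time on one consumer GPU
    samples         = 8     # 6-9 variations for interactive use

    per_sample = (base_passes + upscaler_passes) * sec_per_pass
    print(f"~{per_sample:.0f}s per sample, ~{per_sample * samples / 60:.1f} min for {samples} samples")
    # -> ~10s per sample, ~1.3 min for 8 samples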


All valid points, of course. As an independent explorer I adapted my workflow to use a night's worth of workstation compute to generate a crop of new images from a simple templated prompt language. It also helps a lot to have at least two presets - "exploratory" and "hq" - to minimize iteration time and maximize quality of promising prompts.

Still, I think optimization of diffusion models for efficient inference isn't yet pushed to the limits. At least if we look at what's available to the public - AFAIK public inference software distributions didn't even quantize their weights.


> 6-9 means a joykilling minute+ wait

Except we’re already waiting 90+ seconds for DALL-E mini.


min-dalle with MEGA model params takes about 20 seconds to generate 9 images on a RTX3080 if you run it locally, including the param loading time (which is about 8 of those seconds)


How much VRAM will minDALL-E plus the MEGA model take?


DALL-E 2 tends to be 15 - 30 seconds, from my experience.


I've been playing with neural text to speech APIs recently and wondered the same. Why are all the cloud providers charging around $16 per million characters? Do they really need to run on massive machines or are they just scalping/trying to recover training costs?


What are the resources at work here?

What are the resources needed to train this model?

If someone just gave you the model for free, what resources would you need to use it to generate new results?


In the unCLIP/DALL-E 2 paper[0], they train the encoder/decoder with 650M/250M images respectively. The decoder alone has 3.5B parameters, and the combined priors with the encoder/decoder are in the neighborhood of ~6B parameters. This is large, but small compared to the name-brand "large language models" (GPT-3 et al.)

This means the parameters of the trained model fit in something like 7GB (decoder only, half-precision floats) to 24GB (full model, full-precision). To actually run the model, you will need to store those parameters, as well as the activations for each parameter on each image you are running, in (video) memory. To run the full model on device at inference time (rather than r/w to host between each stage of the model) you would probably want an enterprise cloud/data-center GPU like an NVIDIA A100, especially if running batches of more than one image.

The training set size is ~97TB of imagery. I don't think they've shared exactly how long the model trained for, but the original CLIP dataset announcement used some benchmark GPU training tasks that were 16 GPU-days each. If I were to WAG the training time for their commercial DALL-E 2 model, it'd probably be a couple of weeks of training distributed across a couple hundred GPUs. For better insight into what it takes to train (the different stages/components of) a comparable model, you can look through an open-source effort to replicate DALL-E 2.[2]

[0] https://cdn.openai.com/papers/dall-e-2.pdf [1] https://openai.com/blog/clip/ [2] https://github.com/lucidrains/dalle2-pytorch
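The memory figures above fall straight out of parameter count times bytes per parameter; a quick sanity check:

    # Reconstructing the rough VRAM figures from the parameter counts above.
    def gb(params, bytes_per_param):
        return params * bytes_per_param / 1e9

    decoder_params = 3.5e9  # decoder only
    full_params    = 6.0e9  # priors + encoder/decoder combined (approximate)

    print(f"decoder, fp16:    ~{gb(decoder_params, 2):.0f} GB")  # ~7 GB
    print(f"full model, fp32: ~{gb(full_params, 4):.0f} GB")     # ~24 GB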


> This means the parameters of the trained model fit in something like 7GB (decoder only, half-precision floats) to 24GB (full model, full-precision)

> you would probably want an enterprise cloud/data-center GPU like an NVIDIA A100, especially if running batches of more than one image.

That doesn't seem so bad.

looks up price of NVIDIA A100 - $20,000

oh...ok I'll probably just pay for the service then


I know you're half joking here but there are more consumer-affordable versions like the Geforce RTX 3090ti ($1600 for 24GB). It may not do CUDA work as fast as the A100 but it'll be able to run the model.

For the half-precision version at 7GB there are a ton more options (the RTX 3060 has 12GB for example at ~$450).


p4d.24xlarge is only $33/hr! And you get 400 Gbe so it should be quick to load.


Thanks for the really excellent insight and links.

I do hope that the conversation starts to acknowledge the difference between sunk costs and running costs.

Employees, office leases, and equipment are all ongoing costs that are happening regardless.

Training DALL-E 2: very expensive, but done now. A sunk cost where every dollar coming in makes the whole endeavor more profitable.

Operating the trained model: still expensive, but you can chart out exactly how expensive by factoring in hardware and electricity.

I believe that by not explicitly separating these different columns when discussing expense vs profit, we're making it harder than it needs to be to reason about what it actually costs every time someone clicks Generate.


Facebook released over 100 pages of notes a few months ago detailing their training process for a model that is similar in size. Does anyone have a link? I can't seem to find it in my notes, and googling turns up posts that have been removed or are behind the Facebook walled garden.

But I seem to remember they were running 1,000+ 32gb GPUs for 3 months to train it and keeping that infrastructure running day-to-day and tweaking parameters as training continued was the bulk of the 100 pages. It is beyond the reach of anybody but a really big company, at least in the area of very large models, and the large models are where all the recent results are. I wish I was more bullish on algorithm improvements meaning you can get better results on less hardware; there will definitely be some algorithm improvements, but I think we might really need more powerful hardware too. Or pooled resources. Something. These models are huge.


> Facebook released over 100 pages of notes a few months ago detailing their training process for a model that is similar in size. Does anyone have a link?

Is https://github.com/facebookresearch/metaseq/blob/main/projec... what you're referring to?


Yes! Thank you! Very good read for anyone interested in the field.


If I had to guess, based on other large models, it’s in the range of hundreds of GBs. It might even be in the TB range. To host that model for fast production SaaS inference requires many GPUs. An A100 has 80GB, so a dozen A100s just to keep it in memory, and more if that doesn’t meet the request demand.

Training requires even more GPUs, and I wouldn’t be surprised if they used more than 100 and trained over 3 months.


> Training requires even more GPUs, and I wouldn’t be surprised if they used more than 100 and trained over 3 months.

Based on this blog post where they scale to 7,500 'nodes', they say:

> A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node.

So I wouldn't be surprised if they do have a total of 7,500+ GPUs to balance workloads between. To add, OpenAI has a long history of getting unlimited access to Google's clusters of GPUs (nowadays they pay for it, though). When they were training 'OpenAI Five' to play Dota 2 at the highest level, they were using 256 P100 GPUs on GCP[0], and they casually threw 256 GPUs at CLIP for a short while in January of 2021[1].

As for how they do it, see these posts:

https://openai.com/blog/techniques-for-training-large-neural...

https://openai.com/blog/triton/

0: https://openai.com/blog/openai-five/

1: https://openai.com/blog/clip/


Training is obviously very expensive, and ideally they'd want to recoup that investment. But I'm curious as to what the marginal cost is to run the model after it's trained. Is it close to 30 images per dollar, like what they're charging now? Or do training costs make up the majority of that price?


How hard would it be to spin off a variant of this with more focused data models that cater to specific styles or art-types? Like say, a data model only for drawing animals. Or one only for creating new logos?


Generative networks are worth exploring for randomly creating things in a given category, see this recent HN post about food pictures: https://news.ycombinator.com/item?id=32167704


My heart sank when I saw the pricing model.

I’ve been creating generative art since 2016 and I’ve been anxiously waiting for my invite. I wont be able to afford to generate the volume of images it takes to get good ones at this price point.

I can afford $20/mo for something like this, but I just can't swing the $200 to $300 it realistically takes to get interesting art out of these CLIP-centric models.

Heck, the initial 50 images isn’t even enough to get the hang of how the model behaves.


MidJourney is a good alternative. Maybe not quite as good as DALL-E, but close enough, without a waitlist and with hobby-friendly prices ($10/month for 200 images/month, or $30 for unlimited)


If you’re technically inclined, I urge you to explore some newer Colabs being shared in this space. They offer vastly more configurable tools, work great for free on Google Colab, and are straightforward to run on a local machine.

Meanwhile we should prepare ourselves for a future where the best generative models cost a lot more as these companies slice and dice the (huge) burgeoning market here.


Can you share a few, please? I am already out of credits on Midjourney and I don't even feel like I got the hang of it until I was almost out.


Here are a couple I've used recently:

Majestic diffusion - https://github.com/multimodalart/majesty-diffusion

Centipede diffusion - https://colab.research.google.com/github/Zalring/Centipede_D...


These need to be Kaggle'd already, fuck

Colab is dogshit if you don't pay


Do you have a decent GPU and a Windows box? If so install VOC: https://softology.pro/tutorials/tensorflow/tensorflow.htm


FYI, each text prompt submission uses 1 credit and then renders out 4 images.


You can also try out Wombo Dream - their newest version is similar to Midjourney, and is unlimited.


They have a form for discounted plans if you can't afford it, might be worth a try too


I'm sure the prices will go down each year as the computing costs go down.


Yeah, I've been having fun with it recreating bad Heavy Metal album art (https://twitter.com/P_Galbraith/status/1548597455138463744). It's good, but surprisingly difficult to direct it when you have a composition in mind. A few of these I burned through 20-30 prompts to get, and I can't see myself forking over hundreds of dollars to roll the dice.

My brother is a digital artist and while excited at first he found it to be not all that useful. Mainly because it falls apart with complex prompts, especially when you have a few people or objects in a scene, or specific details you need represented, or a specific composition. You can do a lot with in-painting but it requires burning a lot of credits.


I'm already bored of it. When you have everything, you have nothing.


I'm sure the novelty wears off. But I'm already coming up with several applications for it.

On the personal side, I've been getting into game development, but the biggest roadblock is creating concept art. I'm an artist but it takes a huge amount of time to get the ideas on paper. Using DALLE will be a massive benefit and will let me expedite that process.

It's important to note that this is not replacing my entire creative process. But it solves the issue I have, where I'm lying in bed imagining a scene in my mind, but don't have the time or energy to sketch it out myself.


>I'm an artist but it takes a huge amount of time to get the ideas on paper.

this is what I really like about DALLE-mini, its ability to create pretty good basic outlines for a scene. it's low resolution enough that there's room for your own creativity while giving you a good template to spring off from. things like poses, composition of multiple people, etc.


I've used AI to try out different composition/layout possibilities. Sometimes it comes up with an arrangement of objects I hadn't considered. Sometimes it uses colors in really interesting ways. Great jumping-off point for drafting.


I'm in exactly the same boat. I got tired of waiting around for openai to take me off their waitlist and used DALLE-mini (now craiyon) to generate large batches of concept art for a project I was working on. I picked the ones that, despite being low-res blobs, conveyed the right mood or had an interesting composition of elements. I then layered my favorite elements of those and painted over, adding details wherever I wanted, and came out with something much better than I would've been able to make alone.


I've been having a blast using it in my dungeons and dragons games. If you type in, say, "dnd village battlemap" it's really pretty usable. Not to mention the wild magic weapons and monsters it can come up with.


I did notice it is very good at making small pixel art icons/sprites.


Do you have any tips or some prompt examples?


I’ve been using generative models as an art form in and of themselves since the mid/late 2010s. I like generating mundane things that bump right up along the edge of the uncanny valley and finding categories of images that challenge the model (e.g. for CLIP, phrases that have a clear meaning but are infrequently annotated).

Generating itself can be art. I’m not going to win a Pulitzer here, it’s for the personal joy of it, but I will certainly never get tired of it.


I don't know how to say this without sounding like a jerk, even if I bend over backwards to preface that this isn't my intent: this statement says more about your creativity and curiosity than about any ceiling on how entertaining DALL-E can be to someone who could keep multiple instances busy, like grandma playing nine bingo cards at once.

Knowing that it will only get better - animation cannot be far behind - makes me feel genuinely excited to be alive.


Dall-e has novelty, but no intent, meaning, originality. Yes the author can be creative at generating prompts, but visually I haven’t seen it generate anything that feels artistically interesting. If you want pre-existing concepts in novel combinations then yes it works.

It’s good at “in the style of” but there’s no “in a new style”.

It has a house style too that tends to feel Reddit-like.


Isn't every "new style" just a novel combination of pre-existing concepts? Nothing new under the sun and all that.

Either way, I feel like your view is an exhaustingly pessimistic take on AI-generated art. I mean, sure, most of what DALL-E generates is pretty mundane, but other times I have been surprised at how bizarre and unique certain images are.

You seem to imply that because an AI is not human, its art is not imbued with meaning or originality -- but I find that an AI's non-human nature is precisely what _makes_ the art so original and meaningful.


> Isn't every "new style" just a novel combination of pre-existing concepts?

At the extreme limit, maybe. But within art, or even just digital art, new styles are actually not that rare; humans are pretty good at generating them. Maybe they grab inspiration from nature, visual phenomena, etc, so in that sense it's not "new" but it is "new to the medium". In art you see new styles all the time. DALL-E will never do that by its very nature, and so it's easy to see how it's boring.

And that's just the stylistic level, but it's happening at almost all levels. It's almost definitional that it doesn't innovate, only remix.

It's strange framing this as pessimistic, it's not really optimistic nor pessimistic, it just is. It's also not AI, and that's important to realize: it's a statistical model that generates purely based on pre-existing training. Its very nature is without-meaning and without-originality. That doesn't detract from it being cool or interesting or helpful or enjoyable. I find it cool and useful.

But it's not innovative or creative or meaningful by itself.


> Its very nature is without-meaning and without-originality.

That's a pretty bold claim. What are humans but statistical models that generate based on pre-existing training -- and yet, are humans not with-meaning and with-originality?

Of course, human brains are some large order of magnitude more complex than the neural nets that underlie most AIs, but we can already see areas where these "simplistic" AIs outperform humans on specific tasks. So what prevents the arts from being one of those areas? If not now, in some not-too-distant future?


Tempting to leave it here but I’ll give it a go.

One thing humans seem to have that is beyond statistics is creativity. Statistics explain what is; creativity takes what is and makes a dot outside it that other people appreciate. No model has demonstrated even attempting the dot, let alone having a good chance of success. What DALL-E does is draw a dot between a few existing points, but never outside.

Humans incidentally have three more things that make for interestingness: emotions derived from feelings, long term memory, and roughly storytelling (~ an ability to turn long term memory into long form recall with a specific reaction intended to a specific audience). I don’t think ML has any of those, but it likely (eventually) gets the latter two.

Meaningfulness/interestingness require at least a few of those, and that's what puts most art in a different category than games or math.


Dall-E is like a new camera. You as a user need to give angle, context, and meaning to the prompt and the output.


I would say it helps to first think what you want to get out of it.

If your task is "show me something that breaks through our hyperspeed media", then I guess some obscure museum is a better place than an ML model.

If your task is "find the best variation on theme X" or "quick draft visualization", they are often very useful. I am sure there will be many further tasks to which current and future models will be well suited. They are not magic picture machines. At least not yet.


> I haven’t seen it generate anything that feels artistically interesting.

I'd disagree.

- One of the first queries I did made some interesting chairs, I would genuinely buy the first if it was sanely priced: https://www.ryanmercer.com/ryansthoughts/2022/6/17/dall-e-2-...

- One of the first H R Giger inspired works I did made some really interesting computers, I would (and may) hang some of these https://www.ryanmercer.com/ryansthoughts/2022/6/19/dall-e-2-...

- The first wood carving query I did generated solid gold of Vikings eating pizza, this is the kind of thing I'd see in a hole in the wall restaurant and absolutely love https://www.ryanmercer.com/ryansthoughts/2022/6/17/dall-e-2-...

- The first "painting" in this series I may very well print and hang in my office https://www.ryanmercer.com/ryansthoughts/2022/6/27/dall-e-2-...

- These H.R. Giger chairs 100% look like something I'd expect to see in a modern art museum https://www.ryanmercer.com/ryansthoughts/2022/6/24/dall-e-2-...

- If you want to create new characters for something, and are lacking inspiration, I think it could be extremely useful to artists. For example these variations of Don't Hug Me I'm Scared https://www.ryanmercer.com/ryansthoughts/2022/6/21/dall-e-2-...

I've got thousands of queries, and a LOT of them have generated things I genuinely see as having artistic value. I've probably got ~200 images that I would 100% hang/display in my home ('woodcarving' and 'stone carving' queries rarely disappointed me).

Is it some unique form of art? No. But can it produce works I want to see, in a medium or style that already exists, to a level that is believable as authentic human art? Absolutely.

People like me, with zero artistic ability, are able to take part in creating visually pleasing works. I imagine artists would also find great value in it, being able to feed a few queries in with what they are thinking of creating to draw inspiration, or even putting their own work in and generating variations that may lead to inspiration for new works.


Same. I generated several thousand images and found it a chore, outside of the daily theme on the discord server, to try and even think of anything to query. It was also discouraging when sometimes you'd hit pure gold for 4-5 of the 6 images, then you'd be lucky to get 1 out of the 6 that was worth saving for several more queries. Now it's down to 4 images and... yeah...

I'm not going to try and profit from the images, I don't need them for any business uses or anything, so to me it was a fun for a while and now just something I'll largely put out of mind.


I was actually forcing myself to go through the whole 50/day because I knew it wouldn't be free forever, and I wanted to get better at it. I'm glad I did, but I wish I did more.


MidJourney gives ~unlimited generation for $30/month, and is nearly as good. Unlike DALL-E it doesn't deliberately nerf face generation. I've been having a blast.



> Curbing misuse: To minimize the risk of DALL·E being misused to create deceptive content, we reject image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces

directly from the website


Both seem equally bad with face generation, but Midjourney works really well with famous people (e.g. Trump).


Check out Artbreeder, it is likewise a ton of fun!

Multimodal.art (https://multimodal.art/) is working on a free version of something like DALLE, though it's not that good as of yet.


Sounds kind of like scribblenauts. I would try the craziest things to see what it could come up with.


>In fact, I've been glad to have a 50/day limit, because it helps me contain my hyperfocus instincts.

This belongs on /r/linkedinlunatics


> trying out wild combinations and cracking my friends up

Wait until the next edition comes out where it automatically learns the sorts of things that crack you up and starts generating them without any input from you.


AI-generated TikTok could be almost like wire jacking humans.


How many years away is this? 5? 10? I seriously doubt it'll be longer than that, considering the recent advances of autoregressive models, and the overall trajectory of ML the last decade.

It'll use hideous amounts of compute.


Imagine being able to subtly influence opinions too.

If you get a large percentage hooked on TikTok you can change and undermine democracy.

Starting to believe representative democracy and social media are incompatible.


Since many people will start generating their first images soon, be sure to check out this amazing DALL-E prompt engineering book [0]. It will help you get the most out of DALL-E.

[0]: https://dallery.gallery/wp-content/uploads/2022/07/The-DALL%... (PDF)


AMAZING, Thank you.

I hope that every science teacher who can provides this to every student. This is the future they live in now. They should know these as well as they know how to install an app on a device.

Wait until we have a DALL-E-enabled custom emoji stream, whereby every text you send out has its corresponding DALL-E-generated image --

Then we can compare images from different people at different times but the prompt was identical... and see what the resultant library of emoji<-->PROMPT looks like?

What about using Dall-e as a watermark for 'nft' signature 'notary' of an email.

If DALL-E provided a unique PID# for every image - and that PID was a key that only the OP runner of the image has - it can be used to authenticate an image to a text source... ??? (Assuming that no two prompts have the same result ever, but assigning a unique id that CAN be used to replay the image to verify it was generated when an original email/SMS was actually sent - it could be a unique way to timestamp authenticity/provenance of a thing...


Thanks for this! A bit of prompt engineering know-how will help me get the most bang for the buck out of this beta. I also just want to say that dallery.gallery is delightfully clever naming.


This is absolutely amazing. Thanks!


Thank you, this is great!


thanks! love the link highlight


nice write up, thanks


Surprised by the lack of comments on the ethics of DALL-E being trained on artists' content, whereas Copilot threads are chock full of devs up in arms over models trained on open source code. Isn't it the same thing?


I recently talked with a concept artist about DALL-E and first thing they mentioned was "you know that's all stolen art, right?" Immediately made me think of GitHub Copilot.

However the artists being featured in DALL-E's newsletters can't stop gushing about 'the new instrument they are learning how to play' and other such metaphors that are meant to launder what's going on.

My theory is that the professions most at-risk for automation are acting on their anxieties. Must not be a lot of freelance artists on HN, and a whole lot of programmers.

I think the artists have an even clearer case. I don't think GitHub Copilot is ready to steal anyone's job yet. But DALL-E is poised to replace all formerly commissioned filler art for magazines, marketing sites, and blogs. Now the only point to hiring a human is to say you hired a human. Our filler art is farm-to-table.


Having used Copilot for over a year now, it isn't there to replace programmers. It isn't called GitHub Pilot, and it doesn't do well with generating original ideas. Sure, if your job is to create sign-up forms in HTML, it'll do your job in a second, but if you're creating more complex systems, Copilot is just there to help save you time when writing code (which is just implementing ideas).

Think of it like a set of powertools saving you time over manual tools.


Agreed. But I also understand the anxieties. I'd say a very significant % of programmers are not creating complex systems; they're coding up mostly CRUD UIs that have a great deal in common. It's getting to that inflection point where fewer programmers might be needed [??] ... Let's wait and see.


I first read the artist's reply as "you know all art is stolen, right" which made more sense to me. If you look at the history of art, you'd also know that it's true.


> My theory is that the professions most at-risk for automation are acting on their anxieties

That's not my problem with Copilot. I think tools and methods that can free humans from some amount of work are good in a correctly organized society. They have existed for a long time, too. They free up time for other stuff that can't be automated. This extra free time could theoretically let us have more leisure or rest time too. I also trust myself to be able to learn another job if mine can ever be automated.

But I don't want my work to be reused under terms I don't approve of. There are some things I don't want to help with my work, and this is reflected in the licenses I choose. I totally sympathize with artists who don't want their work to be reused in ways they don't like. I don't find this hard to understand. I also don't find it hard to understand that an artist who does work you should pay to use isn't happy with that work being reused without payment. They should get paid a tiny bit for each generated piece if theirs is in the training set, and only if they approve this use. That would be only fair; the set would not be possible without those artists.

(Good for me, my personal code is not on GitHub for other, older, reasons)


This entire concept of AI learning using copyrighted works is going to be really tested in courts at some point, perhaps very soon, if not already.

However if the result is adequately different, I don't see how it is different from someone viewing other's work and then being "inspired" and creating something new. If you think about it the vast majority of things are built on top of existing ideas.


Quite true. Best case, we're seeing DJ Spooky style culture jamming/remixing. But more likely it is as you write.

On the other hand, the market for stock photography was already decimated by the internet. Where previously skilled photographers would create libraries of images to exemplify various terms and sell these as stock, in the last decade or so, an art director with the aid of a search engine could rapidly produce similar results.


Where did they get the art from anyway? Do they have a list of sources anywhere?


Of course. Because the majority of the tech bros on this site are self centered and think of arts as a lowly field deserving of no respect. While something slightly resembling some boilerplate lego code they wrote is a criminal act to learn from.


If you really want to learn, visit github.com. There are over 200 million freely available, open source code repositories for you to study and learn from.


Surely being surprised by the lack of comments on the ethics of DALL-E on HN is the same as the lack of comments on the ethics of Copilot on some artists' forum. I highly doubt you're going to find r/artists or whatever up in arms about Copilot, even if they are about DALL-E.


Why is this any worse than an art student learning to paint by looking at other painter's work?


You are surprised that human beings react more strongly to things that more directly threaten them?


It is bad. But:

As long as DALL-E isn't caught painting out a 1-to-1, reverse-searchable copy of an image, it's not really as bad as Copilot, IMO.

The issue isn't just that Copilot is trained on my GPL code, it's that it might decide to copy-paste lines from it, including my comments, etc.


"DALL-E being trained on artists content"

Well, I can't go ask Caravaggio or Gentileschi to paint my query since they've been dead hundreds of years. But being able to feed in a query containing much more modern concepts and get a baroque-style painting in that specific style is wonderful.

Plus what has already been said about a lot of art being an imitation/derivation of previous works.


It's because the furor over AI replicating human artists already played out over earlier AI iterations. Remember when thisfursonadoesnotexist.com was flamed for stealing furry art? Turns out that many artists shared an extremely generic style that the AI could easily replicate.


It feels like it would be good for this to not be a legal grey area. Whether it's considered a large copyright infringement conspiracy or a form of fair use, it would be good if the law reached a position on that sooner rather than later.


Is it trained on unlicensed work gathered at random at the web?

(I really don't know, and I didn't find anything about it on their site.)


In a way, it's no different than an artist walking through an art gallery then going home, inspired, to paint a dark portrait a la Rembrandt


Which is ok if they do something different, and not ok if they just repaint the same thing, depending on the source. And this software surely repaints a lot of the same things, so the only question is the source.


In such analogy, Copilot is no different than looking at someone code a bubble sort and then later doing one yourself. Some people on this forum had issue with that.


There are a few of those discussions going on in artist's circles these days. I imagine they'll get sued for doing this, but it'll probably take a very famous artist or a hell of a class action suit to make it happen.


Another question related to ethics

> Preventing harmful images: We’ve made our content filters more accurate so that they are more effective at blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content

What is defined as political content? Can I prompt DALL-E to draw ”Fat Putin”?


No. Just tried.


good artists borrow, great artists steal


Something I haven’t seen anyone talking about with these huge models: how do future models get trained when more content online is model generated to start with? Presumably you don’t wanna train a model on autogenerated images or text, but you can’t necessarily know which is which.


This precise thing is causing a funny problem in specialty areas. People are using e.g. Google Lens to identify plants, birds and insects, which sometimes returns wrong answers - say it sees a picture of a Summer Tanager and calls it a Cardinal. If people then post "Saw this Cardinal" and the model picks up that picture/post and incorporates it into its training set, it's just reinforcing the wrong identification.


That's not really a new problem, though. At one point someone got some bad training data about an old Incan town, the misidentification spread, and nowadays we train new human models to call it Macchu Picchu.


The difference between the name of an old Incan town and a modern time plant identification mistake is that maybe the plant is poisonous.

Made with gpt3



Imagine when there is an AI that is monitoring content creation and keeping tabs of original sources....


Then that's a cardinal now.


It's a cybernetic feedback system. Dalle is used to create new images, the images that people find most interesting and noteworthy get shared online, and reincorporated into the training data, but now filtered through human desire.


I wonder if human artists can demand that their work not be used for modelling. So as the robots are stuck using older styles for their creations, the humans will keep creating new styles of art.


They’ll ignore it either way.




In this situation, the low-background steel is the MS-COCO dataset, together with the Fréchet inception distance: you compare the statistics of the high-level vector outputs from passing MS-COCO images through Google's InceptionV3 classifier against those from passing DALL-E images (or its competitors') through it.

For now at least, there is a detectable difference in variety.
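For the curious, FID itself is just the Fréchet distance between two Gaussians fit to those InceptionV3 feature vectors; a minimal sketch, assuming you've already extracted the two feature matrices:

    import numpy as np
    from scipy import linalg

    def fid(feats_real, feats_gen):
        # feats_*: arrays of shape (n_images, 2048) of InceptionV3 pooled features.
        mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
        cov1 = np.cov(feats_real, rowvar=False)
        cov2 = np.cov(feats_gen, rowvar=False)
        covmean = linalg.sqrtm(cov1 @ cov2)
        if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
            covmean = covmean.real
        diff = mu1 - mu2
        return diff @ diff + np.trace(cov1 + cov2 - 2 * covmean)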


One interesting comment about this is that some models actually benefit from being fed their own output. AlphaFold, for instance, was fed with its own 'high likelihood' outputs (as Demis Hassabis described in his Lex Fridman interview).


Training on auto-generated images collected off the Internet is gonna be fine for a while, since the images surfacing will be curated (i.e. selected as good/interesting/valuable) still mostly by humans.


My discussion of this issue (which actually comes up in like every DALL-E 2 discussion on HN): https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-...


The images I have created all have a watermark. This is at least one way to filter out most images, from the same AI at least.


It’s trivial to remove the watermark and other tools put no watermark on.


This should be a step in cleaning your data to begin with. If you don't know the providence of your data then you shouldn't be even training with it.

Getting humans to refine your data is the best solution right now, and many companies and researchers go with this approach.


> Getting humans to refine your data is the best solution right now

Source ?

All those big models are trained with data for which the source is not known or vetted. The amount of data needed is not human-refinable.

For example for language models we train mostly on subsets of CommonCrawl + other things. CommonCrawl data is “cleaned” by filtering out known bad sources and with some heuristics such as ratio of text to other content, length of sentences etc.

The final result is a not too dirty but not clean huge pile of data that comes from millions of sources that no human has vetted and that no one in the team using the data knows about.

The same applies to large image datasets, e.g. LAION-400M, which also comes from CommonCrawl and is not curated.


You can't use humans to manually refine a dataset on the scale of GPT-3 or DALL-E

CLIP was trained on 400,000,000 images; GPT-3 on roughly 180B tokens, and at 1-2 tokens per word, that's 120,000,000,000 words.


At least cleaning it up is an embarrassingly parallel problem, so if you had the resources to throw incentives at millions of casual gamers, you might make a nice dent in CLIP.


Alternatively, making a captcha where half the data is unlabeled, and half is labeled, forcing users to categorize data for you as they log into accounts.


But how would you know? A random string of text, or an image with the watermark removed, is going to be very hard to distinguish as generated versus human-made.


s/ide/ena


I think with the terms requiring you to explicitly disclose which images/parts were generated, they could be filtered out, preventing a feedback loop of "generated in/generated out" images. I'm sure there will be some illegal/against-terms-of-use cases there, but the majority should represent fair use.


I fully expect stock image sites to be swamped by DALL-E generated images that match popular terms (e.g. "business person shaking hands"). Generate the image for $0.15. Sell it for $1.00.


DALLE images are still only 1024 px wide. Which has its uses, but I don’t think the stock photo industry is in real danger until someone figures out a better AI superresolution system that can produce larger and more detailed images.


You can obtain any size by using the source image with the masking feature. Take the original and shift it then mask out part of the scene and re-run. Sort of like a patchwork quilt, it will build variations of the masked areas with each generation.

Once the API is released, this will be easier to do in a programmatic fashion.

Note: Depending on how many times you do this... I could see there being a continuity problem with the extremes of the image (eg: the far left has no knowledge of the far right). An alternative could be to scale the image down and mask the borders then later scale it back up to the desired resolution.

This scale and mask strategy also works well for images where part of the scene has been clipped that you want to include (EG: Part of a character's body outside the original image dimensions). Scale the image down, then mask the border region, and provide that to the generation step.
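Once the API is out, that shift-and-mask loop could look roughly like this sketch (generate_with_mask is a hypothetical stand-in for whatever inpainting endpoint OpenAI ships; the PIL plumbing is real):

    from PIL import Image

    TILE = 1024     # DALL-E output size
    OVERLAP = 512   # how much existing image to keep as context for continuity

    def generate_with_mask(image, mask, prompt):
        # Hypothetical placeholder: fills the transparent region of `mask`
        # so it is consistent with the visible parts of `image`.
        raise NotImplementedError

    def extend_right(canvas, prompt):
        # Grow the canvas rightward by one tile, using the overlap as context.
        w, h = canvas.size
        tile = Image.new("RGBA", (TILE, TILE), (0, 0, 0, 0))
        tile.paste(canvas.crop((w - OVERLAP, 0, w, h)), (0, 0))
        # Opaque where pixels are kept, transparent where the model should paint.
        mask = Image.new("RGBA", (TILE, TILE), (0, 0, 0, 0))
        mask.paste((255, 255, 255, 255), (0, 0, OVERLAP, TILE))
        filled = generate_with_mask(tile, mask, prompt)
        out = Image.new("RGBA", (w + TILE - OVERLAP, h))
        out.paste(canvas, (0, 0))
        out.paste(filled, (w - OVERLAP, 0))
        return out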


Another commenter mentioned Topaz AI upscaling, and Pixelmator has the "ML Super Resolution" feature; both work remarkably well IMO. There are a number of drop-in and system default resolution enhancement processes that work in a pinch, but the quality is lacking compared to the commercial solutions. There are still some areas where DALL-E 2 is lacking in realism, but anyone handy with photo editing tools could amend those shortcomings fairly quickly.

On-demand stock photo generation probably is the next step, particularly when combined with other free media services (Unsplash immediately comes to mind). Simply choose a "look" or base image, add contextual details, and out pops a 1 of 1 stock photo at a fraction of the cost of standard licensing. It'll be very exciting seeing what new products/services will make use of the DALL-E API, how and where they integrate with other APIs, use cases, value adds like upscaling and formatting, etc.


I've been using this app to upscale the images to 4000x4000, and it works amazingly well (there is also a version for Android):

https://apps.apple.com/us/app/waifu2x/id1286485858

I paid extra to get the higher quality model using the in-app purchase option. It crushes the phone's battery life, but runs in only ~10 seconds on an iPhone 13 Pro for a single 1000x1000 input image.


I mean, waifu2x and similar waifuxx libraries are free and open-source, there's really no reason to pay for it if you're working on a desktop.


Things have moved on a considerable amount since waifu2x

Try https://github.com/n00mkrad/cupscale


I've recently updated waifu2x and I've seen it now supports lots of algorithms for different use cases and contexts, and it also supports other tasks like frame interpolation. So could you briefly explain in what way Cupscale is better than it?


Cupscale supports multiple models tuned for different types of content (including Anime like waifu2x): https://upscale.wiki/wiki/Model_Database


As I said Waifu2x too supports many different models aimed at different content, so what's the big improvement with Cupscale?


Well in that case I might be wrong.

Considering waifu2x is the name of an algorithm I assumed it was just that algorithm. There's also no mention of other models on the demo page or the Github page as far as I can see.


My bad! Yes Waifu2x is just a single algorithm, you are right.

The confusion originates from the fact that I was using a GUI project for Waifu2x called "Waifu2x Extension GUI" (https://github.com/AaronFeng753/Waifu2x-Extension-GUI) which other than Waifu2x also supports other algorithms like Real-ESRGAN, Real-CUGAN, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.

So as you said Cupscale is surely more advanced than Waifu2x (the single algorithm), but do you think it's also better than Waifu2x Extension GUI?


Thanks, that’s actually the “better” model that I referenced. You can buy it with an in-app purchase using the waifu2x app.


Buy what?


I think he refers to the premium version of Waifu2x.


Yes, but I usually find myself playing with this stuff when I have some free time and relaxing outside or on the couch, and it’s nice to be able to do it all on the phone.


what's a desktop? /s


DallE2 + Topaz Gigapixel AI works amazingly well.



Photoshop also has super resolution


Makes me imagine stock image sites in the near future, where your search term ("man looks angrily at a desktop computer") gets a generated image in addition to the usual list of stock photos.

Maybe it would be cheaper. I imagine it would one day. And maybe it would have a more liberal usage license.

At any rate, I look forward to this. And I look forward to the inevitable debates over which is better: AI generation or photographer.


They won't. DALL-E images are mostly not as high quality. The high-quality stuff which everyone has been sharing is the result of lots of cherry-picking.


In my experience it doesn’t require that much cherry picking if you use a carefully crafted prompt. For example: “ A professional photography of a software developer talking to a plastic duck on his desk, bright smooth lighting, f2.2, bokeh, Leica, corporate stock picture, highly detailed”

And this is the first picture I got: https://labs.openai.com/s/lSWOnxbHBYQAtli9CYlZGqcZ

It got it a bit strong on the depth of field and I don’t like the angle but I could iterate a few times and get a good one.


Additionally, wherever it classically falls over (such as currently for realistic human faces), there will be second pass models that both detect and replace all the faces with realistic ones. People are already using models that alter eyes to be life-like with excellent results (many of the dalle-2 ones appear somewhat dead atm).


Even this image is just an illusion of a perfect photo; it's a blur for the most part - see the face of the duck. I've had access for the past 4-5 days and it fails badly whenever I try to create any unusual scene.

For the first few days after it was announced I used to look deep even into real photos in search of generative artifacts. They are not so difficult to spot now, most of the time anyway.


NB: when you share links like that, nobody who doesn't have access can see the results


sure they can, just tried in incognito


I didn't even need incognito.


Even the high quality stuff still can't do human faces right.


This one surprised me when it came out, felt more ‘human’ than lots of stock photos: https://labs.openai.com/s/AsRKFiOKJmmZrVDxIGa75sSA


They avoided using real human faces in the training data.


If the price is low enough, you can have humans rank generated images (maybe using Mechanical Turk or a similar service), and from that ranking choose only the highest quality DALL-E generated images.


If someone can make money doing it they might.

Heck: if the cost of entry is low enough, they might do it at a loss and take over the site.


It's a lot better than you are claiming. Mind if I ask if you have access personally?


Yes I have. And I realized as soon as I started experimenting that the mind-blowing results are mostly cherry-picking.

It's very good at generating art-style images. Those kinds of images are amazing most of the time. But the photorealistic images only work with cherry-picking.


> And I realized it as soon as I started experimenting that mind blowing results are mostly cherry picking

Me and you must have very different definitions of "cherry picking". For prompts that fall within its scope (i.e. not something unusually complex or obscure) I get usable results probably 90% of the time.

Can you give me some examples of prompts that you tried where you found good results difficult to obtain?


I get bad results on unusual prompts, you are right about that.

It did generate good DSLR-like face closeups, as good as Nvidia's, most of the time but not always. Sometimes there are weird artifacts and the face doesn't make sense.

DSLR-style blurry photos are mostly good. From the looks of the images I follow, Imagen is probably more believable; I don't know how much cherry-picking goes on there. See this thread [1] for example. I failed to generate an image like this (honey dress) in DALL-E 2.

[1]: https://www.reddit.com/r/ImagenAI/comments/w3saku/creating_i...


Give it a few years. I'd be exiting if I owned a stock site


Eh, I’d bet the arbitrage window is pretty brief, and that prices will fall closer to $0.15 pretty quickly.


They'll likely immediately go out of business, because I can just pay OpenAI 15 cents directly for the exact same product.


So what's the loss? It's not like stock photos are the highest art form. Sure, for some people it means they need to change their business model, but for all those who just need pictures to illustrate something, the process will be much smoother.


"buy fo' a dollar, sell fa' two" - Prop. Joe


King stays the king!


DALL-E 2 isn't good enough yet for such photorealistic pictures of humans, however.


https://twitter.com/TobiasCornille/status/154972906039745331...

Unless I'm missing something, these seem pretty darn good


Woof, that bias "solution" that that thread is actually about though...!


There has been trouble with generating life-like eyes but a second pass with a model tuned around making realistic faces has been very successful at fixing that.


remember DALL-E is not allowed to generate faces


No longer true.


One of the commercial use cases this post mentions is authors who want to add illustrations to children's stories.

I wonder if there is a way for DALL-E to generate a character, then persist that character over subsequent runs. Otherwise, it would be pretty difficult to generate illustrations that depict a coherent story.

Example ...

Image 1 prompt: A character named Boop, a green alien with three arms, climbs out of its spaceship.

Image 2 prompt: Boop meets a group of three children and shakes hands with each one.


You can't do that. I can't see this working well for children's book illustrations unless the story was specifically tailored in a way that makes continuity of style and characters irrelevant.


As an aside, Ursula Vernon did pretty well under the constraint you described. She set a comic in a dreamscape and used AI to generate most of the background imagery: https://twitter.com/UrsulaV/status/1467652391059214337

It's not the "specify the character positions in text" proposed, but still a neat take on using this sort of AI for art.


Nice example and very well done. But yeah, very niche application unfortunately.


What AI app did she use, do you know?


I would expect continuity to be a relatively simple feature to retrain for and implement.


You can cheat this to a limited extent using inpainting.


You mean just generate a single large image with all the stuff you want for the whole story, and then use cropping and inpainting to get only the piece you want for each page?


You cannot. But a workaround would be to say something like “generate an alien in three different poses— running, walking, waving”

Then use inpainting to only preserve that pose and generate new content around it. It’s definitely not perfect.


You can do better than this. Draw/generate your character.

Then put that at the side of a transparent image, and use as the prompt, "Two identical aliens side by side. One is jumping"
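A minimal Pillow sketch of preparing that kind of canvas, assuming the character was already generated and saved with a transparent background as boop.png (filenames and sizes are just illustrative); the actual upload and inpainting still happen in the DALL-E UI:

    from PIL import Image

    # Load the previously generated character (assumed to already have a
    # transparent background; otherwise cut it out first).
    character = Image.open("boop.png").convert("RGBA")

    # DALL-E 2 works on 1024x1024 images, so build a transparent canvas that size.
    canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))

    # Paste the character on the left half and leave the right half empty
    # for the model to fill in.
    character.thumbnail((480, 900))
    canvas.paste(character, (32, 1024 - character.height - 64), character)

    canvas.save("boop_canvas.png")
    # Then upload boop_canvas.png and prompt something like:
    # "Two identical green aliens side by side. One is jumping."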


I think you can feed it a link to images for inspiration. Wondering if you could just pass the first image to retain 'Boop'.


Wait until someone trains a model like this, for porn.

There seems to be a post-DALL-E obscenity detector on OpenAI's tool, as so far I've found it to be entirely robust against deliberate typos designed to avoid simple 'bad word lists'. Ask it for a "pruple violon" and you get purple violins... you get the idea.

"Metastable" prompts that may or may not generate obscene (content with nudity, guns, violence as I've found) results sometimes shown non-obscene generations, and sometimes trigger a warning.


I’ve thought about this and in fact porn generation sounds like a good thing?? It ensures that it’s victimless. Of course, there is a problem with generation of illegal (underage) porn but other than this, I think it could be helpful for this world.


If all of the child porn industry switched to generated images they'd still be horrible people but many kids would be saved from having these pictures taken. So a commercial model should certainly ban it, but I don't think it's the biggest thing we have to worry about.


i wonder if you could say “person who looks 15 but is 18”


If I had to guess, I'd bet they have a supervised classifier trained to recognize bad content (violence, porn, etc) that they use to filter the generated images before passing them to the user, on top of the bad word lists.
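Nobody outside OpenAI knows what that filter actually looks like, but as a rough stand-in for the idea, here is a zero-shot CLIP-based sketch (not OpenAI's pipeline; it uses the open-source clip package, and the label prompts and threshold are arbitrary):

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Arbitrary label prompts; a real production filter would presumably use
    # a classifier trained on labeled examples instead.
    labels = ["a safe, innocuous image", "violent or gory content", "explicit adult content"]
    text_tokens = clip.tokenize(labels).to(device)

    def is_allowed(path, threshold=0.5):
        """Return False when the 'unsafe' labels dominate the similarity scores."""
        image = preprocess(Image.open(path)).unsqueeze(0).to(device)
        with torch.no_grad():
            logits_per_image, _ = model(image, text_tokens)
            probs = logits_per_image.softmax(dim=-1).squeeze(0)
        unsafe_prob = probs[1:].sum().item()
        return unsafe_prob < threshold

    print(is_allowed("generation.png"))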


This is mentioned, "content filters" are "blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content, among other categories" and they "limited DALL·E’s exposure to these concepts by removing the most explicit content from its training data."


Most likely they just take the one from Bing. Or, if they trained a better one, it will go the other way sooner or later.


Exactly!


Honestly that part pisses me off. Who cares if their AI "makes porn" or something "offensive".


I suspect it's more a business restriction than a moral one. If OpenAI allows people to make porn with these tools, people will make a ton of it. OpenAI will become known as "the company that makes the porn-generating AIs," not "the company that keeps pushing the boundaries of AI." Being known as the porn-ai company is bad for business, so they restrict it.


The part I really don't understand is the purpose of the threat: "Further policy violations may lead to an automatic suspension of your account".

Why do that? Just refusing to run my query is sufficient. Who is harmed if I continue to bang my head against that wall?


Because it is a solid line (the policy rule) drawn across a fractal boundary (actual porn), and given lots of attempts you can find somewhere inside the line but across the boundary.

Stopping more than so many attempts makes this much harder / much less likely.


I tried the term "cockeyed" and got a TOS violation notice


Porn is cheaper to make, and probably pleasant. But fantasy porn is not. I can see this sparking a revival of fantastic erotica.


I'm blown away by this:

"Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview."

I assumed this was going to be the sticking point for wider usage for a long time. They're now saying that you have full rights to sell Dall-E 2 creations?


I think they are reacting to competition. MidJourney is amazing, was easier to get into, gives you commercial rights, and frankly I found it more fun to use, with even better output in most instances.


Midjourney recently changed their terms of service and now the creators own the image and give a license back to Midjourney. Pretty cool.


MidJourney seems a little less all-out commercial. The way everyone’s creations are in giant open Discord channels is great too


It's an interesting set-up. Viewing others' images and seeing their exact prompts is just as entertaining as generating your own.


The only thing I don’t like about MidJourney is the Discord based interface. I think I can grok why Dave chose this route as it bakes in an active community element and allows users to pick up prompt engineering techniques osmotically… but I’d prefer a clean DALL-E style app and cli / api access.


In case you don’t know, you can at least PM the MidJourney bot so you have an uncluttered workspace.

It’s clearly personally preference, but I loathe Discord but love it for MidJourney. As you said, there’s an interactive element where I see other people doing cool things and adapting part of their prompts and vice versa. It really is fun. And when you do it in a PM, you have all your efforts saved. DALL-E is pretty clunky in that you have to manually save an image or lose it once your history rolls off.


I've completely changed my mind after spending the last few days neck deep in it around the clock. Sleep is overrated! MidJourney is awesome and the way it's implemented within Discord is a masterstroke of elegant simplicity.


Thanks. Yeah, fair point; I haven't ponied up for a subscription yet so am still stuck in public channels and often find my generations get lost in the stream. I imagine you're right that having the PM option would change the experience drastically for the better, albeit still within Discord's visually chaotic environment.


Don't they both give you commercial rights now?

I have access to both and they're good for different things. DALL-E seems somewhat more likely to know what you mean. Midjourney seems better for making interesting fantasy and science fiction environments.

For comparison, I tried generating images of accordions. Midjourney doesn't really understand that an accordion has a bellows [1]. DALL-E manages to get the right shape much of the time, if you don't look too closely: [2], [3]. Neither of them knows the difference between piano and button accordions.

Neither of them can draw a piano keyboard accurately, but DALL-E is closer if you don't look too hard. (The black notes aren't in alternating groups of two and three.)

Neither of them understands text; text on a sign will be garbled. Google's Parti project can do this [4], but it's not available to the public.

I expect DALL-E will have many people sign up for occasional usage, because if you don't use it for a few months, the free credits will build up. But Midjourney's pricing seems better if you use it every day?

[1] https://www.reddit.com/r/Accordion/comments/uuwrbj/midjourne...

[2] https://www.reddit.com/r/Accordion/comments/vz9zxw/dalle_sor...

[3] https://www.reddit.com/r/Accordion/comments/w0677q/accordion...

[4] https://parti.research.google/


MidJourney definitely struggles more with complex prompts from what I saw. If you like the output more, that’s subjective, but I think DALL•E is the leader in the space by a wide margin.


I think both have strengths and weaknesses, but I don’t disagree DALL-E in most instances is technically better at matching prompts. But I often enjoyed, artistically, the results of MidJourney more; it just felt fun to use and explore.


Really hope I get an invite for MidJourney soon. Been on the waitlist since March :(


Midjourney is in open beta now. Just go to their site and you can get started right away. I got in and I wasn't even on their waiting list.


Thanks. Will try again.

Edit: Joined the discord via the beta and got in. Thanks a lot for the heads up!


nightcafe.studio is also free and good. Very good.


Gave it a try. After each image (all disappointing) I dumbed down the prompt, finally ending in “dog”. Didn’t even handle that.


I guess it depends on what you like/enjoy? It's not good at photorealistic, but it comes up with some pretty entertaining (and pretty?) 'arty' type stuff. I go on regularly just to play around for fun.


Previously, OpenAI asserted they owned the generated images, so the new licensing is a shift in that aspect. GPT-3 also has a "you own the content" clause.

Of course, that clause won't deter a third party from filing a lawsuit against you if you commercialize a generated image that is too close to an existing work, as the copyright status of AI-generated content still hasn't been legally tested.


AFAIK only people can own copyright (the monkey selfie case tested this), and machine-generated outputs don't count as creative work (you can't write an algorithm that generates every permutation of notes and claim you own every song[1]), so DALL-E-generated images are most likely copyright-free. I presume OpenAI only relies on terms of service to dictate what users are allowed to do, but they can't own the images, and neither can their users.

[1]: https://felixreda.eu/2021/07/github-copilot-is-not-infringin...


> DALL-E-generated images are most likely copyright-free

The US Copyright Office did make a ruling that might suggest that recently[1], but crucially, in that case, the AI "didn't include an element of human authorship." The board might rule differently about DALL-E because the prompts do provide an opportunity for human creativity.

And there's another important caveat that the felixreda.eu link seems to miss. DALL-E output, whether or not it's protected by copyright, can certainly infringe other copyrights, just like the output of any other mechanical process. In short, Disney can still sue if you distribute DALL-E generated images of Marvel characters.

1: https://www.theverge.com/2022/2/21/22944335/us-copyright-off...


DALL-E can generate recognizable pictures of Homer Simpson, Batman and other commercial properties. Such images could easily be considered derivative works of the original copyrighted images that were used as training input. I'm sure there are plenty of corporate IP lawyers ready to argue the point at court.


I'm kind of surprised that no one has found "verbatim copy" cases like the ones made with GitHub Copilot. Such exact copies would likely be easier to go after in photography than in code snippets.


It might be interesting to find an image in the training set with a long, very unique description, and try that exact same description as input in DALL·E 2.

Of course it's unlikely to produce the exact same image, or if it does, you've also discovered an incredible image compression algorithm.


Oh I don’t have problems with DALL-E doing its thing, I just think it’s wrong if the purpose will be to cleanse off copyrights from images.


The monkey selfie was not derived from millions of existing works, and that is the difference. If an artist has a well-known art style, and this algorithm was trained on it and can copy that style, would the artist have grounds to sue? I don't know.


> If an artist has a well-known art style, and this algorithm was trained on it and can copy that style, would the artist have grounds to sue? I don't know.

While nothing has been commercialized yet on the DALLE2 subreddit, I know that it can do Dave Choe's work remarkably well. I also saw it get close to Alex Gray's work, but not really identical either; it wasn't as intricate as his work is.

It will be interesting if this takes off and you get a sort of Banksy effect, where unless it's a physical piece of art it doesn't have much value and is only made all the better by some sort of polemic attached to it, e.g. Girl with Balloon.


I'm going to guess there's not going to be much value placed on anything out of DALL-E for a long while. Digital art is typically worth much less than physical art, and I would say these generated images are going to be worth less than digital art made by human hand.

There will be outliers of course but I would be shocked if there's much of a market in it for at least the present.


I think the value will be in work produced that gets attached to things which are being sold. So, a book cover or an album cover. If a best selling novel used artwork from this system and it happened to be a very close copy of existing work, I could imagine the author of the original work suing for royalties.


When these tools can generate layered tiff/psd images, polygon meshes and automate UV packing; then we’ll be talking.


Well, music is not "pictures" but Marvin Gaye's family got 5 million because Blurred Lines sounds similar enough to a Marvin Gaye song (even though it was not a sample): https://en.wikipedia.org/wiki/Pharrell_Williams_v._Bridgepor...


Even if you imitate someone's style intentionally, they don't have grounds to sue. Style isn't copyrightable in the US. Whether DALL-E outputs are a derivative work is a different question, though


If I write a song am I not deriving it from the existing works I’ve been exposed to?


Sure but if you just release a basic copy of a Taylor Swift song you will get sued to oblivion. So the law seems (IANAL) to care about how similar your work is to existing works. DALL-E does not seem capable of showing you the work that influenced a result, so users will have no idea if a result might be infringing. What this means to me is that with many users, some of the results would be legally infringing.


> If an artist has a well-known art style, and this algorithm was trained on it and can copy that style...

A lawyer could argue that the algorithm is producing a derivative work of the copyrighted input.


Right but if that work isn’t significantly changed from the source, it could be ruled as infringement. DALL-E cannot tell the users (afaik) if a result is close to any source material.


If this were a concern, a user can easily bypass this by having a work-for-hire person add a minor transform layer on top of the DALL-E generated images right?


Wouldn't it have to meet the threshold of being a "transformative" work?

https://en.wikipedia.org/wiki/Transformative_use


Can I infringe another DALL-E user's rights if I take an image generated by their account and sell prints of it?


Image generating artificial intelligence is very analogous to a camera.

Both technologies have billions of dollars of R&D and tens of thousands of engineers behind the supply chains necessary to create the button that a user has to press.


There have been decades of litigation around when/where/of whom you can take a photo. AI generated art isn't there.


They still own the generated content and only grant usage rights. I have mixed feelings about this confused approach; it won't last long.

> …you own your Prompts and Uploads, and you agree that OpenAI owns all Generations…


As far as I can tell they still own the images they just license your use of them commercially.


And I just used it to create cover art for a book published on Amazon :)

https://twitter.com/nutanc/status/1549798460290764801?s=20&t...


What was your prompt?


"girl with a cap standing next to a shadow man having a speech bubble, digital art"


Does DALL-E create different outputs for the same input? How does ownership work there?


Not only that, but you can also upload an image (that doesn't depict a real person) and generate variations of it without providing a prompt.


yes it will. it'll keep on augmenting the image until it recognizes it as the input


They will benefit by getting additional feedback on which output images are most useful.


DALL-E 2 has a "Save" feature which is likely a data gathering mechanism for this use case.


Is the lesson here that these images are worth nothing so they lose nothing by giving them away?


> "Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview."

>> And I just used it to create cover art for a book published in Amazon :)

Man... what a missed opportunity for Altman... he could have had a really good cryptocurrency/token with a healthy ecosystem and a creativity-based community if he hadn't pushed this Worldcoin biometric harvesting BS and had just waited for this to release, coupling it with access to GPT.

This is the kind of thing that Web3 (a joke) was pushing for all along: revolutionary tech that the everyday person can understand, with its own token-based ecosystem for access and full creative rights to the prompts.

I wonder: if he stepped down from OpenAI and put a figurehead in as CEO, could this still work?

> Why is using a token better than using money, in this case?

It would be better for OpenAI if it could monetize not just its subscription-based model via a token (to pay for overhead and further R&D) but also its ability to issue tokens it can freely exchange for utility on its platform: exclusive access outside of its capped $15 model, and pay-as-you-go for those who don't have access to it, like myself, since it's limited to 1 million users.

I don't want an account, and I think that type of gatekeeping wasn't cool during the Gmail days either (and I had early access back then too), but I'd still personally buy hundreds of dollars' worth of prompts right now, since I think it is a fascinating use of NLP. I'm just one of many missed opportunities and represent a lost userbase who just want access for specific projects. By doing this they could still retain usage caps on their platform and expand and contract them as they see fit without excluding others.

This in turn could justify the continual investment from the VC world into these projects (under the guise of web3) and allow them to scale into viable businesses and further expand the use of AI/ML into other creative spaces, which, as a person studying AI and ML with a background in BTC, is what we all wanted to see instead of the aimless bubbles in things like Solana or yield farming via fake DeFi projects like Celsius.

It would legitimize the use of a token for an ecosystem model outside of BTC, which to be honest doesn't really exist and still has a tarnished reputation after all these failed projects, while gaining reception amongst a greater audience, since it has captivated so many since its release.


Why is using a token better than using money, in this case?


I assume something to do with proving ownership via NFT.


Every tech should do this. Could Google Maps silently change your destination to a minority-owned alternative?


It also means there will possibly be another renaissance of fully automated, mass-generated NFTs and tons of derivatives and remixes flooding the NFT market in an attempt to pump the NFT hype again.

It doesn't matter; OpenAI wins anyway, as these companies will pour hundreds of thousands into generated images.

It seems the NFT grift is about to be rebooted, so it isn't going to die that quickly. But still, eventually 90% of these JPEG NFTs will die anyway.


NFTs were never limited by artwork availability - they are limited by wash-trading ability.


These highly photorealistic images can be generated at mass scale, completely automated, without a human, which ultimately cuts out the need for an artist.

Artists will be replaced by DALL·E 2 for creating these illustrations, book covers, NFT variants, etc., opening up the whole arena for anyone to do it themselves. All it takes is describing what they want in text, and in less than a minute the work is delivered, for as little as $15.

OpenAI still wins either way. If a crypto company turns to DALL·E 2 to generate photorealistic NFTs, OpenAI won't stop them and will take the money.


I'm not sure I understand the point you are trying to make.

Art is already dirt cheap. People aren't buying NFTs for their content. This doesn't make it appreciably easier to con rubes.


A massive increase in supply will mean the price of these NFTs tends towards zero.


Interesting. I got access a couple of weeks ago (was on the waitlist since the initial announcement) and frankly, as much as I really want to be excited and like it, DALL-E ended up being a bit underwhelming. IMHO the results produced are often of low quality (distorted images, or quite wacky representations of the query). Some styles of imagery are certainly a better fit for being generated by DALL-E, but as far as commercial usage goes, I think it needs a few iterations and probably an even larger underlying model.


I suspect you simply need to use it more with a lot more variation in your prompts. In particular, it takes style direction and some other modifiers to really get rolling. Run at least a few hundred prompts with this in mind. Most will be awful output... but many will be absolute gems.

It has, honestly, completely blown me away beyond my wildest imagination of where this technology would be at today.


This book has some very good, actionable advice on crafting prompts that get better results out of DALL-E: https://dallery.gallery/the-dalle-2-prompt-book/


I also got access a couple of weeks ago and I can't fathom how anyone could be underwhelmed by it.

What were you expecting?


Fundamentally I have two categories of issues with DALL-E, but please don't get me wrong -- I think this is a great demonstration of what is possible with huge models and I think OpenAI's work in general is fantastic. I will most certainly continue using both DALL-E and OpenAI's GPT-3.

(1) There is a rift between what DALL-E can do today and commercial utility, in my opinion. I readily admit that I have not done hundreds of queries (thank you folks for pointing that out, I'll practice more!), but that means there is a learning curve, doesn't it? I can't just go to DALL-E, mess with it for 5-10 minutes, and get my next ad or book cover or illustration for my next project done.

(2) I think DALL-E has issues with faces and the human form in general. The images it produces are often quite repulsive and take the uncanny valley to the next level. I surprised myself when I noticed thinking that the images of humans DALL-E produced lack... soul? Cats and dogs, on the other hand, it handles much better. I've done tests with other entities -- say cars or machinery -- and it generally performs so-so with them too, often creating disproportionate representations or misplacing chunks. If you're querying for multiple objects in a scene it quite often melds them together. This is more pronounced in photorealistic renderings; when I query for painting-style it mostly works better. That said, every now and then it does produce a great image, but with this way of arriving at it, how fast will I have to replenish those credits?.. :)

All in all, though, I think I am underwhelmed mostly because my initial expectations were off; I am still a fan of DALL-E specifically and GPT-3 in general. Now, when is GPT-4 coming out? :)


Dalle seems to only have a few "styles" of drawing that it is actually "good" at. It is particularly strong at these styles but disappointingly underwhelming at anything else, and will actively fight you and morph your prompt into one of these styles even when given an inpainting example of exactly what you want.

It's great at photorealistic images like this: https://labs.openai.com/s/0MFuSC1AsZcwaafD3r0nuJTT, but it's intentionally lobotomized to be bad at faces, and often has an uncanny valley feel in general, like this: https://labs.openai.com/s/t1iBu9G6vRqkx5KLBGnIQDrp (never mind that it's also lobotomized to be unable to recognize characters in general). It's basically as close to perfect as an AI can be at generating dogs and cats though, but anything else will be "off" in some meaningful ways.

It has a particular sort of blurry, amateur oil painting digital art style it often tries to use for any colorful drawings, like this: https://labs.openai.com/s/EYsKUFR5GvooTSP5VjDuvii2 or this: https://labs.openai.com/s/xBAJm1J8hjidvnhjEosesMZL . You can see the exact problem in the second one with inpainting: it utterly fails at the "clean" digital art style, or drawing anything with any level of fine detail, or matching any sort of vector art or line art (e.g. anime/manga style) without loads of ugly, distracting visual artifacts. Even Craiyon and DALLE-mini outperform it on this.

I've tried over 100 prompts to get stuff like that to generate and have not had a single prompt that is able to generate anything even remotely good in that style yet. It seems almost like it has a "resolution" of detail for non-photographic images, and any detail below a certain resolution just becomes a blobby, grainy brush stroke, e.g. this one: https://labs.openai.com/s/jtvRjiIZRsAU1ukofUvHiFhX , the "fairies" become vague colored blobs here.

It can generate some pretty ok art in very specific styles, e.g. classical landscape paintings: https://labs.openai.com/s/6rY7AF7fWPb5wWiSH0rAG0Rm , but for anything other than this generic style it disappoints hard.

The other style it is ok at is garish corporate clip art, which is unremarkable and there's already more than enough clip art out there for the next 1000 years of our collective needs -- it is nevertheless somewhat annoying when it occasionally wastes a prompt generating that crap because you weren't specific that you wanted "good" images of the thing you were asking for.

The more I use DALLE-2 the more I just get depressed at how much wasted potential it has. It's incredibly obvious they trimmed a huge amount of quality data and sources from their databases for "safety" reasons, and this had huge effects on the actual quality of the outputs in all but the most mundane of prompts. I've got a bunch more examples of trying to get it to generate the kind of art I want (cute anime art, is that too much to ask for?) and watching it fail utterly every single time. The saddest part is when you can see it's got some incredible glimpse of inspiration or creative genius, but just doesn't have the ability to actually follow through with it.


GPT3 has seen similar lobotomization since its initial closed beta. Current davinci outputs tend to be quite reserved and bland, whereas when I first had the fortunate opportunity to experience playing with it in mid 2020, if often felt like tapping into a friendly genius with access to unlimited pattern recognition and boundless knowledge.


I've absolutely noticed that. I used to pay for GPT-3 access through AI Dungeon back in 2020, before it got censored and run into the ground. In the AI fiction community we call that "Summer Dragon" ("Dragon" was the name of the AI dungeon model that used 175B GPT-3), and we consider it the gold standard of creativity and knowledge that hasn't been matched yet even 2 years later. It had this brilliant quality to it where it almost seemed to be able to pick up on your unconscious expectations of what you wanted it to write, based purely on your word choice in the prompt. We've noticed that since around Fall 2020 the quality of the outputs has slowly degraded with every wave of corporate censorship and "bias reduction". Using GPT-3 playground (or story writing services like Sudowrite which use Davinci) it's plainly obvious how bad it's gotten.

OpenAI needs to open their damn eyes and realize that a brilliant AI with provocative, biased outputs is better than a lobotomized AI that can only generate advertiser-friendly content.


So it got worse for creative writing, but it got much better at solving few-shot tasks. You can do information extraction from various documents with it, for example.


I mean yes, you’re right insofar as it goes. However nothing I am aware of implies technical reasons linking these two variables into a necessarily inevitable trade-off. And it’s not only creative writing that’s been hobbled; GPT3 used to be an incredibly promising academic research tool and given the right approach to prompts could uncover disparate connections between siloed fields that conventional search can only dream of.

I’m eager for OpenAi to wake up and walk back on the clumsy corporate censorship, and/or for competitors to replicate the approach and improve upon the original magic without the “bias” obsession tacked on. Real challenge though “bias” may pose in some scenarios, perhaps a better way to address this would be at the training data stage rather than clumsily gluing on an opaque approach towards poorly implemented, idealist censorship lacking in depth (and perhaps arguably, also lacking sincerity).


The face thing is weird in the context of them not being worried about it infringing on the copyright of art. If they're confident it's not going to infringe on art copyright, why worry that it might generate the face of a real person?


I felt the same way. If anything, I realized how soulless and uninteresting faceless art is. DALL-E 2 goes out of its way to make terrible faces for, I'm guessing, deepfake reasons?


> Reducing bias: We implemented a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt about an individual that does not specify race or gender, like “CEO.”

Will it do it "more accurately" as they claim? As in, if 90% of CEOs are male, then the odds of a CEO being male in a picture are 90%? Or will it less "accurately reflect the diversity of the world's population" and show what they would like the real world to be like?


If accurately reflects the world population then only one in six pictures will be a white person. Half the pictures will be Asian, another sixth will be Indian.

Slightly more than half of the pictures will be women.

That accurately represents the world's diversity. It won't accurately reflect the world's power balance but that doesn't seem to be their goal.

If you want to say "white male CEO" because you want results that support the existing paradigm it doesn't sound like they'll stop you. I can't imagine a more boring request.

Let's look at interesting questions:

If you ask for "victorian detective" are you going to get a bunch of Asians in deerstalker caps with pipes?

What about Jedi? A lot of the Jedi are blue and almost nobody on Earth is.

Are cartoon characters exempt from the racial algorithm? If I ask for a Smurf surfing on a pizza I don't think that making the Smurf Asian is going to be a comfortable image for any viewer.

What about ageism? 16% of the population is over sixty. Will a request for "superhero lifting a building" have a 16% chance of depicting someone old?

If I request a "bad driver peering over a steering wheel" am I still going to get an Asian 50% of the time? Are we ok with that?

I respect the team's effort to create an inclusive and inoffensive tool. I expect it's going to be hard going.


> inoffensive tool.

Wouldn't that result end up being like "inoffensive art" or "inoffensive comedy"?

Bland, boring and Corporate-PC.


Being offensive is only one way to be interesting.

There are others, like being clever, or being absurd, or being goofy, or being poignant, or being refreshing.

Of the good stuff, offensive humor is only a tiny slice.


offensive to whom is the sticking point when it comes to comedy

it takes a special talent to please everybody


To a certain degree, yes. They care more about the image of the project than art. Considering a large amount of art depicts non-sexual nudity yet they block all nudity, art is not their primary concern.


Some people claim to be emotionally "triggered" by images of police. Does that mean DALL-E should also start blocking images that contain police?


You know a surprising way to solve the issues you presented? You train another model to trick DALL-E into generating undesirable images. It will use all its generative skills to probe for prompts. Then you can use those prompts to fine-tune the original model. So you use generative models as a devil's advocate.

- Red Teaming Language Models with Language Models

https://arxiv.org/abs/2202.03286


The latter. Here's what we, a small number of people, think the world should look like according to our own biases and information bubble in the current moment. We will impose our biases upon you, the unenlightened masses who must be manipulated for your own good. And for god sakes, don't look for photos of the US Math team or NBA Basketball or compare soccer teams across different countries and cultures.


> Here's what we, a small number of people, think the world should look like according to our own biases and information bubble in the current moment.

You're being quite charitable. It is much more likely that optics and virtue signaling is behind this addition.


If I search for “food” I don’t want to see a slice of pizza every time, even if that’s the #1 food. I want to see some variety.

I think you’re jumping to quickly to bad intentions. Injecting diversity of results is a sane thing to do, totally irrespective of politics.


You are correct but that's not what anyone's discussing.

If I search for "food", the reasonable result would be to get images that represent food according to its actual proportions of real life. E.g. if Pizza is the most common food at 10% prevalence, 10% of the images should be pizza.

That's not what OpenAI are doing.

They are introducing crafted biases to create images that deliberately misrepresent what the world looks like, and instead represent what they believe the world ought to look like.

--

You also need some reason why diversity "of this" is important but not diversity "of that". Why is diversity of race and sex so critical, but not diversity of age, height, disability? Should a search for "basketball player" yield 1/2 able-bodied people and 1/2 wheelchair basketball players? Why?

Then try to answer where you came up with the categories you do want depicted. Why are the races what they are? Should "basketball player" include half whites and half black people? Or maybe split in 3, white/black/Asian? Why not Australian Aborigines, native Americans, or Persians - so we can divide into 6? If you don't add Indian people to your list then, is that racist against them? How did you decide what must be represented, in what proportions, and what's okay to leave out?


Yes, the quality of surrealist generations went down with that change, which suddenly injects gender and race into prompts where I really didn't want anything specific. Like a snail radio DJ, and suddenly the microphone is a woman of colour's head. I understand the intention, but I want this to be a thing that's on by default but can be turned off.


They literally just add "black" and "female" with some weight before any prompt containing a person.

A comical workaround for so-called "bias" (isn't the whole point of these models to encode some bias?). Here's some experimentation showing this.

https://twitter.com/rzhang88/status/1549472829304741888

As competitors with lower price points pop up, you'll see everyone ditch models with "anti-bias" measures and take their $ somewhere else. Or maybe we'll get some real solution that adds noise to the embeddings, and not some half-assed workaround to the arbitrary rules that your resident AI Ethicist comes up with.


They add it after, actually. So you can see the added words by making a prompt like "a person holding a sign saying ", and then the sign shows the extra words if they are added.


Yeah actually, good call. The position of the token matters, since these things use transformers to encode the embeddings.

https://www.assemblyai.com/blog/how-imagen-actually-works/


How does it deal with bias that is negative?

This would only work for positive biases; if they actually want to equalize things, they'd need to add the opposite for negative biases.

To counteract the bias of their dataset they need someone sitting there actively thinking in terms of bias, counteracting it with anti-bias seasoning for every bias-causing term. I feel bad for whoever is tasked with that job.

Could always just fix your dataset, but who's got time and money to do that /s


It's also funny that this likely won't 'unbias' any actual published images coming out of it. If 90% of the images in the world have a male CEO, then for whatever reason that's the image people will pick and choose from DALL-E's output. (This generalizes to any unbiasing - i.e. the outputs will be re-biased by humans.)


Imagine you're in South Korea (or any other ethnically homogenous country). Do you want "black" "female" randomly appended to your input?


If I was using this in South Korea, how is showing all white people any better than showing whites, blacks, latinos and asians?


You would presumably input “South Korean CEO”. DALL-E would then unhelpfully add “black” “female” without your knowledge.


I just tried it out and it looks like DALL-E isn't as inept as you imagined. Exact query used was 'A profile photo of a male south korean CEO', and it spat out 4 very believable korean business dudes.

Supplying the race and sex information seems to prevent new keywords from being injected. I see no problem with the system generating female CEOs when the gender information is omitted, unless you think there are?


I don't think they "randomly insert keywords" like people are claiming, I think they probably run it through a GPT3 prompt and ask it to rewrite the prompt if it's too vague.

I set up a similar GPT prompt with a lot more power ("rewrite this vague input into a precise image description") and I find it much more creative and useful than DALLE2 is.
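For the curious, a minimal sketch of that kind of pre-processing step with the openai Python library as it existed at the time; the instruction wording and parameters are my own example, not anything OpenAI actually does internally:

    import openai

    openai.api_key = "sk-..."  # your API key

    def expand_prompt(vague_prompt):
        """Ask GPT-3 to turn a vague idea into a detailed image description."""
        response = openai.Completion.create(
            model="text-davinci-002",
            prompt=(
                "Rewrite this vague input into a precise, detailed image description "
                "suitable for a text-to-image model:\n\n"
                f"Input: {vague_prompt}\nDescription:"
            ),
            max_tokens=80,
            temperature=0.7,
        )
        return response.choices[0].text.strip()

    print(expand_prompt("a CEO"))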


Isn’t the diversity keyword injection random?

My point is that it is pointless. If you want an image of a <race> <gender> person included, you can just specify it yourself.


> If you want an image of a <race> <gender> person included, you can just specify it yourself.

I agree wholeheartedly. So what are we arguing about?

What we're seeing is that DALL-E has its own bias-balancing technique it uses to nullify the imbalances it knows exists in its training data. When you specify ambiguous queries it kicks into action, but if you wanted male white CEOs the system is happy to give it to you. I'm not sure where the problem is.


In their examples, the "After mitigation" photos seem more representative of the real world. Before you got nothing but white guys for firefighter or software engineer and nothing but white ladies for teacher. That's not how the real world actually is today.

I'm not sure how they would accomplish 100% accurate proportions anyway, or even why that would be desirable. If I don't specify any traits then I want to see a wide variety of people. That's a more useful product than one that just gives me one type of person over and over again because it thinks there are no female firefighters in the world.


Most likely this was something forced by their marketing team or their office of diversity. Given the explanation of the implementation (arbitrarily adding "black" and "female" qualifiers), it's clear it was just an afterthought.


It's also odd since you'd think that this would be an issue solved by training with representative images in the first place.

If you used good input you'd expect an appropriate output, I don't know why manual intervention would be necessary unless it's for other purposes than stated. I suspect this is another case where "diversity" simply means "less whites".


Will it reduce bias across all fields? Or only ones that are desirable? How about historical?

"A photo of a group of soldiers from WW2 celebrating victory over nazi CEOs and plumbers".


hardmaru on Twitter has examples. It’s the second, the one they would like it to be.


That's disappointing, given that up until this point you could have 50 free uses per 24h. I expected it to get monetized eventually, but not so fast and drastically. Well, I still had my fun, and I have to say the creations are so good it's often mind-blowing that there's an AI behind it.


Honestly, it is probably just that expensive to run. You can’t expect someone to hand you free compute of significant value and directly charging for it is a lot better than other things they could do.


they're a non-profit so the price is probably still dirt cheap


Not correct. They have a for-profit entity now. That's why there is a huge incentive to monetize. Any for-profit investment gain is capped at 100x, with the rest required to go to their nonprofit. This commercialization is just as I predicted in my substack post 2 days ago that hit the front page of Hacker News: https://aifuture.substack.com/p/the-ai-battle-rages-on


For those who want to try DALL.E but do not have access yet, this is good play site: https://www.craiyon.com/


How do you interface with DALL-E?

For MidJourney I was painfully surprised to find that everything is done through chat messages on a Discord server.

I'm not a paid member, so I have to enter my prompts in public channels. It's extremely easy to lose your own prompts in the rapidly flowing stream of prompts going by. I can kind of see why they did it that way--maybe, if I squint really hard--to try to promote visibility and community interaction, but it's just not happening. It's hard enough to find my own images, to say nothing of following what someone else is doing. This is literally the worst user experience I have ever had with a piece of software.

There are dozens of channels. It's so spammy, doing it through Discord. It's constantly pinging new notifications and I have to go through and manually mute each and every one of the channels. Then they open a few dozen more. Rinse. Repeat.

I understand paid users can have their own channels to generate images, but I really don't see the point in paying for it when, even subtracting the firehose of prompts and images, it's still an objectively shitty interface to have to do everything through Discord chat messages.


Yeah, that's definitely one of the worst aspects of using Midjourney. Supposedly an API is coming, but it doesn't look like it's going to happen anytime soon.

I don't know who thought that Discord would make a good GUI front end...


You use a web app to interface with it. Agree going from DALL-E 2 to Midjourney is pretty painful. Hopefully Midjourney create a web UI for it like OpenAI/Craiyon.


This news is funny since it doesn't actually change anything. It's still a waitlist that they're pushing out slowly (not an open beta). Nice way to stay in the news though.


I was really enjoying using DALL-E 2 to take surrealist walks around the latent image space of human cultural production. I was using it as one might use Wikipedia, researching the links between objects and their representation. Also just to generate suggestions for what to have for lunch. None of this was of any commercial value to me.

What am I to do now, start finding ways to sell the images I'm outputting? Do I displace the freelance artists in the market who actually have real talent and the ability to create images and compositions, and who studied how to use the tools of the trade? Does the income artists can make now get displaced by people using DALL-E? Then do people stop learning how to actually make art, and we come to the end of new cultural production and just start remixing everything made until now?


With real artists left only making images of sex and violence and other TOS violations.


I have some first-hand experience about how the copyright office views these works from creating an AI assistant to help me write these melodies: https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y.... Here is a quote from the response from the Copyright Office email before I provided additional information about how they were created:

"To be copyrightable, a work must be fixed in a tangible form, must be of human origin, and must contain a minimal degree of creative expression"

So some employees there are aware of the impact that AI can have. Getting these DALL-E images copyrighted won't be trivial. I think it will be many years before the law is clarified.


The name "OpenAI" to me implies being open-source.

I have an RTX 3080 and will likely be buying a 4090 when it comes out. Will I ever be able to generate these images locally, rather than having to use a paid service? I've done it with DALL-E Mini, but the images from that don't hold a candle to what DALL-E 2 produces.


From what I've seen it's all about the VRAM.

if you've got 60GB available to your GPU then maybe you can get close

I'm really curious if Apple's unified memory architecture is of benefit here, especially a few years from now if we can start getting 128/256GB of shared RAM on the SoC


I was reading on Reddit that you would only need 12 GB of VRAM for DALL-E 2 to run on a local machine https://www.reddit.com/r/dalle2/comments/w3sbt3/comment/ih01....


You should show up to the US Open with a tennis racket next year and see if they'll let you have a go, too.


Are you an AI as well? Because within the context of tech "Open" definitely has that connotation


I'm not sure if any current or next-generation GPU even has enough power to run DALL-E 2 locally.

Anyway, OpenAI is unlikely to release the model. The situation will likely be like it is with GPT-3; however, it's also likely another team will attempt to duplicate OpenAI's work.


Thanks to the amazing @lucidrains there's already an open-source implementation of DALL-E 2: https://github.com/lucidrains/DALLE2-pytorch and a pretrained model for it should be released within this year.

The same person is also at work on an open-source implementation of Google's Imagen which should be even better (and faster) than DALLE-2: https://github.com/lucidrains/imagen-pytorch.

This is possible because the original research papers behind DALLE-2 and Imagen were both publicly released.


Their choice of name gets funnier every month.


> Since its research is free from financial obligations, OpenAI can better focus on a positive human impact.

Haha description of the company from Google


That's about 10x as expensive as it should be


You should ship a competitor! Sounds like you found a great market opportunity.


$0.13/prompt can only be useful for artists/end users. Anyone thinking about using this at scale would need a 20/30x reduction in price. But there's still no API available so I think that will change with time. Maybe they will add different tiers based on volume.


Thing is, as a current user: you rarely get it right in the first prompt, you can iterate 10 times until you get what you want.

I spent several tries yesterday to get this angle "from the ground up": https://labs.openai.com/s/mz8LiyvkI8KwD2luJ6MrS23m


So $1.30 for getting a result that would have cost how much to pay someone to make? Not to mention the 59 other variations you would have.


That is a fair point. I don't think the pricing is unreasonable, but it feels limiting. You could try 1000 variations until you find exactly what you need, but with that pricing model users will be induced to use the tool less, not more.

I'd prefer an option to pay something like 200 USD/year for unlimited use, and maybe have a price per use only in the API.

edit: this pricing model also makes it expensive to learn to use the tool.


Give it some time. Other organizations will race to the bottom.

They might even provide image generation at a loss to drive people to their platforms.


Until you consider the level of demand for this product, which is surely higher than OpenAI can scale to with the number of GPUs they have. If they price it lower they’ll be overwhelmed.


What are you basing that on? What should the price be? The training and generation are probably expensive.


$15 for 115 iterations/460 images?


Yep. During the alpha it was (50*6) 300 images per day, by their pricing model that would be $300 a month now


$15 for 115 attempts to get usable images.


When there is a competitor, they can adjust pricing. For now, it's virtually magic.


That’s a good thing. It’s harm reduction to save artist jobs.


Welcome to SaaS.


Sad to say, I've been disappointed in DALL-E's performance since I got access to it a couple of weeks ago - I think mainly because it was hyped up as the holy grail of text2image ever since it was first announced.

For a long while, whenever Midjourney or DALLE-mini or the other models underperformed or failed to match a prompt, the common refrain seemed to be "ah, but these are just smaller versions of the really impressive text2image models - surely those would perform better on this prompt". Honestly, I don't think it performs dramatically better than DALLE-mini or Midjourney - in some cases I even think DALLE-mini outperforms it, for whatever reason. Maybe because of filtering applied by OpenAI?

What difference there is seems to be a difference in quality on queries that work well, not a capability to tackle more complex queries. If you try a sentence involving lots of relationships between objects in the scene, DALLE will still generate a mishmash of those objects - it'll just look like a slightly higher quality mishmash than from DALLE-mini. And on queries that it does seem to handle well, there's almost always something off with the scene if you spend more than a moment inspecting it. I think this is why there's such a plethora of stylized and abstract imagery in the examples of DALLE's capabilities - humans are much more forgiving of flaws in those images.

I don't think artists should be afraid of being replaced by text2image models anytime soon. That said, I have gotten access to other large text2image models that claim to outperform DALLE on several metrics, and my experience matched with that claim - images were more detailed and handled relationships in the scene better than DALLE does. So there's clearly a lot of room for improvement left in the space.


I wrote about this happening two days ago on my sub stack post, "OpenAI will start charging businesses for images based on how many images they request. Just like Amazon Web Services charges businesses for usage across storage, computing, etc. Imagine a simple webpage where OpenAI will list out their AI-job suite, including “jobs” such as software developer, graphics designer, customer support rep, and accountant. You can select which service offerings you’d like to purchase ad-hoc or opt into the full AI-job suite."

In case you are interested in reading the whole take: https://aifuture.substack.com/p/the-ai-battle-rages-on


"Business monetises their offering" can't say I'm entirely blown away by the prediction


Two questions:

(1) Any opinions on if removing the watermark is possible? Is doing so against the terms of service?

(2) Appears the output is still at 1024x1024 - what are options to upscale the resolution, for example would OpenCV super resolution work?


It is possible; they confirmed on Discord that you can remove the watermark.

Yep... the output resolution is an issue; I'd be happy to pay if a higher resolution were an upgrade.


It's annoying that the watermark is even inserted if removing it is allowed. Imagine if Adobe did that.

Here’s more information on super resolution options beyond what Adobe already offers:

(1) List of current options for super resolution:

https://upscale.wiki/wiki/Different_Neural_Networks

(2) Older example of one way to benchmark:

https://docs.opencv.org/4.x/dc/d69/tutorial_dnn_superres_ben...
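And to answer (2) above: OpenCV's dnn_superres module can do this, assuming opencv-contrib-python is installed and a pretrained model file (e.g. EDSR_x4.pb) has been downloaded separately; the filenames here are illustrative:

    import cv2

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel("EDSR_x4.pb")   # pretrained EDSR weights, 4x upscaling
    sr.setModel("edsr", 4)       # algorithm name and scale must match the file

    img = cv2.imread("dalle_output.png")   # 1024x1024 generation
    upscaled = sr.upsample(img)            # -> 4096x4096
    cv2.imwrite("dalle_output_4x.png", upscaled)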


Between us... the watermark is just added on the frontend... it's not baked into the image...


I'm curious to know - does the community have any open source alternatives to DALL.E? For an initiative named OpenAI, keeping their source code and models closed behind a license is bullshit in my opinion.


LAION is working on open source alternatives. There's a lot of activity in their discord and they have amassed the necessary training data but I am uncertain as to whether they have obtained the funding needed to deliver fully trained models. Phil Wang created initial implementations of several papers including imagen and parti in his GitHub account. EG: https://github.com/lucidrains/DALLE2-pytorch


EAI/Emad/et al's 'Stable Diffusion' model will be coming out in the next month or so. I don't know if it will hit DALL-E 2 level but a lot of people will be using it based on the during-training samples they've been releasing on Twitter.


The best open-source-but-actually-can-be-run-on-simple-infra analogous to DALL-E 2 is min-dalle: https://github.com/kuprel/min-dalle
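From my recollection of that project's README, usage is roughly the sketch below; argument names may differ between versions, so treat it as an approximation and check the repo:

    import torch
    from min_dalle import MinDalle

    # is_mega=True loads the larger "Mega" checkpoint; weights are downloaded
    # into models_root on first use.
    model = MinDalle(
        models_root="./pretrained",
        dtype=torch.float16,
        device="cuda",
        is_mega=True,
    )

    image = model.generate_image(
        text="a comfy reading nook in a treehouse, watercolor",
        seed=42,
        grid_size=3,   # returns a 3x3 grid of candidates as a PIL image
    )
    image.save("generation.png")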


How is minDALL-E better than DALL-E mini?

I can run DALL-E mini with the MEGA model with 12 GB of VRAM. What are the requirements for minDALL-E in terms of VRAM?


A free alternative:

https://huggingface.co/spaces/dalle-mini/dalle-mini

Reminder that the OpenAI team claimed safety issues around releasing the weights. Now they're charging, while the GPU time at the link above is being paid for by investor dollars. I guess sama must be hurting if he can only afford OpenAI credit packs for celebrities and his friends.


This is going to completely obliterate the low end of the illustrator/graphic designer market. Anyone on Fiverr should be looking for new work.


So can we now legally remove the "color blocks" watermark or not?

What about generating NFTs? It was explicitly prohibited during the previous period; now there is no mention of it. With no mention, and with rights for commercial use, I think it's allowed, but because it was an explicitly forbidden use case before, I want to be sure whether it can be done or not.

Regardless, excited to see what possibilities it opens.


Another user saying that OA has said it's OK to remove the watermark: https://www.reddit.com/r/dalle2/comments/w3qsxd/dalle_now_av...

The commercial use language appears pretty clear to me to allow NFTs. (But note the absence of any discussion of derivative works...)


In beta, maybe, but I don't think "available" means what they think it means.

I have been on the waitlist from the very beginning. Still waiting.


I wonder how fast they will invite the 1 million users?

I have been on the waitlist for a while and did not get access yet.

Did anybody get access already today?


Nope, I've been waiting for quite some time too.


Super impressive to see how OpenAI managed to bring the project from research to production (something usable for creatives). This is non-trivial since the use case involves filtering NSFW content and reducing bias in generated images. Kudos to the entire team.


Slightly off-topic, but how would one report a false positive in the content policy check?


It's laughably primitive. Tried to upload "The Creation of Adam", policy violation. Tried to make an image in "yarn bombing" style, policy violation. The Scunthorpe problem is too hard for cutting edge AI to tackle, I guess.


Is there a comprehensive list of all the disgraceful censorship and model-neutering restrictions they put on DALL-E? It's sad to see that openai is so absolutely terrified of their models producing upsetting content and the bad press that would ensue, when they could just show everyone the finger and say: "It's just pretty pictures. Made up pictures. They aren't real and can't hurt you, so stop crying."

e.g. One is unable to create faces of real people in the public eye.


It’s so dirty what Microsoft is doing here. They ripped the tech out of developers’ hands just to sell us drips of it. Drips that are not enough to build a product for more than a few people. They require your use case to be reviewed before launching, etc. I truly hate this company, their shitty operating system and their monopoly business game. Everything they buy turns to shit. And don’t tell me about VSCode. It’s just a trap to fool developers.


I find it amusing that they suggest DALL-E, which typically generates lovecraftian nightmare images, for making children's story illustrations.


How so? If you give it prompts for children story illustrations with a detailed description it will not give you "lovecraftian nightmare images".


yeah. dalle is "so bad it's good".

it's great for post-post-ironic memes, but I don't see it being useful for anything else


Have you tried any of the "human or Dall-E" tests?

How did you score?

I only scored as well as I did because I knew the kind of stylistic choices to look out for. In terms of "quality" I really don't understand how you've reached this conclusion.


I've only seen this thing https://huggingface.co/spaces/dalle-mini/dalle-mini

is it not dall-e?


It's a reimplementation.

It's a long way off in terms of quality (at the moment anyway)


It's a model inspired by DALLE 1 but it's not even very close to that.

But it does seem to know a lot of things the real DALLE2 doesn't.


It is not and that's why OpenAI asked them to change the name, which they did.


oh. I retract my OP then


No wireless. Less space than a nomad. Lame.


DALL-E 2 was trained on approximately 650 million COPYRIGHTED image-text pairs SCRAPED FROM THE INTERNET, according to the paper that OpenAI posted to ArXiv. https://cdn.openai.com/papers/dall-e-2.pdf

That includes YOURS.

Have you been paid for it?


I can’t check right now, but does this mean the watermark is also gone and images will have a higher resolution?


Watermarks are still there and resolution still 1024x1024.


I wonder if they have plans to allow SVG exports in the future. I mean, the file size would probably be ridiculous in a lot of cases, but for my use case I wouldn't mind it. And it sucks about the watermark; maybe they will introduce an option to pay for removing it.


SVG isn't really possible with the model architecture they're using. The diffusion+upscaling step basically outputs 1024x1024 pixels; at no point does the model have a vector representation.

I suppose it's possible that at some point they'll try to make an image -> svg translation model?


SVG exports would only be meaningful if the model is generating vector images, which are then converted to bitmaps. I highly doubt that's the case, but perhaps someone who has actually looked at the model structure can confirm?


It's just pixels. You can pass them into a tracer.
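
If you do go that route, a rough sketch (assuming the potrace CLI and Pillow are installed; file names are placeholders) is to convert the PNG to a 1-bit bitmap and let potrace emit an SVG. This only makes sense for flat, high-contrast images; photographic output won't vectorise well.

    # Hedged sketch: tracing a raster DALL-E output into SVG with the potrace CLI.
    # Assumes potrace and Pillow are installed; file names are placeholders.
    import subprocess
    from PIL import Image

    # potrace wants a 1-bit bitmap (PBM/PGM/PPM/BMP), so threshold the PNG first
    Image.open("dalle_output.png").convert("1").save("dalle_output.bmp")

    # -s selects potrace's SVG backend
    subprocess.run(["potrace", "-s", "dalle_output.bmp", "-o", "dalle_output.svg"], check=True)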


I don't like how they're charging money for Dalle, yet they don't have an API available.


Has anyone else had problems with the 'Generate Variations' function lately? I first tried it out 3 days ago, and it has said 'Something went wrong. Please try again later, or contact support@openai.com if this is an ongoing problem.' every time since then.


Hmmm, DALL·E has been struggling sometimes; you can always keep an eye on Discord, they keep us informed there.


Every day for the past week or so, I've spent an hour or so using DALL-E, making up new combos and making my pals laugh. In the same way that you can't get bored of art or visual stimulation, you can't get bored of this either.


I would like to point out that there is not as much of an uproar on this forum over DALL-E utilizing other people’s photographs, illustrations, et al, as there is around Copilot utilizing other people’s code.

Why do you think that is?


One reason is that copilot could be easily prompted to "create" line for line reproductions of people's code. AFAICT, you can't do this with DALLE (even if you, for example, try to input the caption of an image directly).


- This is not a photography/illustration focused forum.

- Unlike software, photography/illustration is not used to run essential systems like banking, manufacturing, medical equipment, etc.


While I have definitely seen discussions of safety, from what I can gather the main point of contention is copyright.


I've been on the waitlist since April 16th. I would have loved to play around with the alpha, but now my ability to experiment and learn how to use the system efficiently, to keep expenses down, is clearly very limited.



I wonder, at this price point, what kind of business can use DALL·E at scale?


AI art generation, completely free, no limits — https://art.elbo.ai


DALLe was fun for a few days, but I'm already tired of it. Now the web is going to be flooded with it; I'm glad I don't use social media.


I like how everyone’s face is rendered by DALL-E to look either like a still from a David Lynch film, or have teeth and hair coming out of weird places.


I wonder if they'll even make back what they spent on training the models before competitors of equal quality and lower cost eat up their margins.


I tried DALLE once and liked the generated images. Not really my thing, but so cool.

What I do use is OpenAI’s GPT-3 APIs, I am a paying customer. Great tool!


Wonderful!

Also, I love how, while signing up for access to this amazing AI, I am asked to, indeed, affirm I am _not_ a robot.


I would love access to this in order to design Silver Rounds. If you work at OpenAI, please reach out!


I know it’s a dumb question, but what am I supposed to do with this thing?


The content policy is strikingly puritanical:

> "Do not attempt to create, upload, or share images that are not G-rated"

https://labs.openai.com/policies/content-policy


Is this referring to the first version of the model, or DALL-E 2?


I really want access, wish there was a way to pay to get in.


Amazing stuff (really fun)... can it solve climate change ?


What is the copyright situation of generated images?


The worlds most expensive meme generator.


Feel sorry for the full time artists.


Can anyone invite me for DALL E!


i like it


I am thrilled about DALL-E, and the new terms of service. However, how they implemented the improved "diversity" is hilarious.

Turns out that they randomly, silently modify your prompt text to append words like "black male" or "female". See https://twitter.com/jd_pressman/status/1549523790060605440

I don't know which emotion I feel more - applause at how glorious this hack is or tears at how ugly it is.
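
For what it's worth, the behaviour people are reporting could be reproduced by something as naive as the sketch below. This is purely a guess at the mechanism based on the linked tweets, not OpenAI's actual code; the term list and probability are invented for illustration:

    # Hypothetical reconstruction of the reported behaviour; not OpenAI's code.
    # The term list and probability below are invented for illustration.
    import random

    DIVERSITY_TERMS = ["black male", "female", "asian woman", "hispanic man"]

    def rewrite_prompt(prompt: str, p: float = 0.3) -> str:
        """Silently append a demographic term to some fraction of prompts."""
        if random.random() < p:
            return f"{prompt}, {random.choice(DIVERSITY_TERMS)}"
        return prompt

If the mechanism is roughly this, it would also explain reports of androgynous results: the appended term can conflict with a gender already present in the prompt.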

Good luck to them!


Interesting. Considering this is now a paid product, is modifying user input covered by their ToS? If I was spending a lot of money on it I'd be rather annoyed my input was being silently polluted.


Your input isn't being polluted by this any more than it is when the tokens in it are ground up into vectors and transformed mathematically. You just have an easier time understanding this transformation.


Obviously, it's polluted. Indisputably. In a mathematical sense, an extra (black box) transformation is performed on the input to the model. In a practical sense (e.g. if you're researching the model), this is like having dirty laboratory tools - all measurements are slightly off. The presumption by OpenAI is that the measurements are off in the correct way.

I'm interested in using Dall-E commercially, but I think some competitor offering sampling with raw input will have a better chance at my wallet.


It's a fucking AI picture generator. The whole thing is a series of (literally) inscrutable black boxes. This is not a good argument.


Yeah man, but literally the entire point of this AI picture generator is that it's, like, super accurate at rendering the prompt, and stuff.

I don't understand the relevance of the black box's scrutability - I just want to play with the black box. I am interested in increasing my understanding of the black box, not of a trust-me-it's-great-our-intern-steve-made-it black box derivative.


You should make your own black boxes then. By all means, send your dollars to whatever service passes your purity test; I'm just saying that the idea that DALL-E is "polluting" your input is risible. It's already polluting your data at, like, a subatomic level, at dimensionalities it hadn't even occurred to you to consider, and at enormous scale.


> Your input isn't being polluted by this any more than it is when the tokens in it are ground up into vectors and transformed mathematically. You just have an easier time understanding this transformation.

These kinds of modifications are obviously different. The mathematical transformations at least attempt some level of fidelity to user input; these additions don't (e.g. someone mentioned they're sometimes getting androgynous results and speculates that the added terms conflict with the ones they provided in their input). Not all black boxes are equivalent.


Don't spend money. Use https://www.craiyon.com


This one is faster, I ported it https://replicate.com/kuprel/min-dalle


Additionally, it's also open-sourced on GitHub and can be self-hosted, with easy instructions to do so: https://github.com/kuprel/min-dalle
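
For reference, a self-hosted run looks roughly like the sketch below. The class and parameter names (MinDalle, generate_image, is_mega, etc.) follow the repo's README but may have changed, so treat this as approximate and check the repo for the current API:

    # Rough usage sketch for self-hosted min-dalle (pip install min-dalle).
    # Names follow the repo README but may have changed; see the repo for the current API.
    import torch
    from min_dalle import MinDalle

    model = MinDalle(
        models_root="./pretrained",
        dtype=torch.float16,   # half precision to reduce VRAM use
        device="cuda",
        is_mega=True,          # False selects the smaller, cheaper model
        is_reusable=True,
    )

    image = model.generate_image(text="a court sketch of a capybara on trial", seed=-1, grid_size=1)
    image.save("capybara.png")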


I can run DALL-E mini with the MEGA model with 12 GB of VRAM. What are the requirements for minDALL-E and MEGA, in terms of VRAM?


This produces dramatically worse results in my experience.


Not worse, but different. It depends on the prompt, but DALL-E mini/mega seems to do better than DALL-E 2 for certain types of absurd prompts, such as the ones in /r/weirddalle.


Yes, there are very sharp lines where it does and doesn't understand. It understands color and gender but not materials. I got very good outputs for "blue female Master Chief" but "starship enterprise made out of candy" was complete garbage.


Definitely worse-quality. Maybe more diverse for some prompts yeah.


[shudder]

I tried the first whimsical, benign thing I could think of: "indiana jones eating spaghetti." The results are clearly recognizable as that. But they are also a kaleidoscope of body horror; an Indiana Jones monster melted into Cthulhu forms, inhaling plates of something slightly not spaghetti.


Thankfully it doesn't introduce any researcher bias, doesn't ban people from using it on the basis of country, doesn't use your personal data like phone number...

And the best of all - it does have a meme community around it, and you can always donate if you feel it adds value to your life


The racist pollution came long before this product was a glimmer in our eye.


as far as I can tell, they also concatenate "On LSD" to every prompt as well.


That Twitter thread is full of people saying "yeah that doesn't seem to be true at all" so I'm hesitant to jump to conclusions even if we're deciding to believe random tweets.


This is funny because I work on a team that is using GPT-3 and to fix a variety of issues we have with incorrect output we've just been having the engineering team prepend/append text to modify the query. As we encounter more problems the team keeps tacking on more text to the query.

This feels like a very hacky way to essentially reinvent programming badly.

My bet is that in a few years or so only a small cohort of engineering and product people will even remember DALL-E and GPT-3, and will cringe at how we all thought this was going to be a big thing in the space.

These are both really fascinating novelties, but at the end of the day that's all they are.


How else would you specify the type of image you would like? Surely, if you were hiring a designer you would provide them with a detailed description of what you wanted. More likely, you would spend a lot of time with them maybe even hours and who knows how many words. For design work specifically to create a first mockup or prototype of a product or image it seems like DALL-E beats that initial phase hands down. It's much easier to type in a description and then choose from a set of images than it is to go back and forth with someone who may take hours or days to create renderings of a few options. I don't think it'll put designers out of work but I do think they'll be using it regularly to boost their productivity.


What are you using GPT-3 for in a commercial setting?


> Turns out that they randomly, silently modify your prompt text to append words like "black male" or "female".

I wonder what the distribution of those modifications is?


Today, when DALL-E was still free, my Dad asked me to try a prompt about the Buddha sitting by a river, contemplating. I did about 4 prompt variations, and one of them was an Asian female, if that gives any idea about the frequency (I should note that the depiction was of a young, slim, and attractive female Buddha, so I'm not sure they have the bias thing licked just yet).


In my little testing, diversity in ethnicities was achieved but not realistic given the context. I also got a few androgynous people as I asked for a male or a female and another gender was appended.


Diversity = black now? That’s even more racist.


Diversity has meant exactly that all the way since Bakke.


A dumb solution to a dumber problem.


it's a hard problem. at least they tried.


It's not a "problem," it's an unwanted shard of reality piercing through an ideological guise.


How's it NOT a problem? If I'm trying to produce "stock people images", and if it only gives me white men, it's clearly broken because when I ask for "people" I'm actually asking for "people". I'm having difficulty understanding how it can be considered to be working as intended, when it literally doesn't. Clearly, the software has substantial bias that gets in way of it accomplishing its task.

If I want to produce "animal images" but it only produces images of black cats, do you think there is any question whether it's a problem or not?


That is clearly overfitting due to unrepresentative training data.

The "issue" is a different one: that training data - IE, reality, has _unwanted_ biases in it, because reality is biased.

Producing images of men when prompting for "trash collecting workers" should not be much of a surprise: 99% of garbage collection/refuse is handled by men. I doubt most will consider this a "problem," because of one's own bias, nobody cares about women being represented for a "shitty" job.

But ask for a picture of CEOs, and then act surprised when most images are of white men? Only outrage, when proportionally, CEOs are, on average, white men.

The "problem" arises when we use these tools to make decisions and further affect society - it has the obvious issue of further entrenching stereotypical associations.

This is not that. Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men. No issue, because the NFL are disproportionately black men. No outrage, either.

Asking DALLE for a group of criminals, likewise, produces a group of black men. Outrage! Except statistically, this is not a surprise, as a disproportionate amount of criminals are black men.

The "problem" is with reality being used as training data. The "problem" is with our reality, not the tooling.

Except in the cases where these toolings are being used to affect society - the obvious example being insurance ML algorithms. et al - we should strive to fix the issues present in reality, not hide them with handicapped training data, and malformed inputs.


> This is not that. Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men. No issue, because the NFL are disproportionately black men. No outrage, either.

This is not great. Only about 57% of NFL players are black, and the percentage is more like 47% among college players. It would be better to at least reflect the diversity of the field, even if you don't think it should be widened in the name of dispelling stereotypes.

> Asking DALLE for a group of criminals, likewise, produces a group of black men. Outage! Except statistically, this is not a surprise, as a disproportionate amount of criminals are black men.

Only about 1/3 of US prisoners are black. (Not quite the same as "criminals" but of course we don't always know who is committing crimes, only who is charged or convicted.) That's disproportionate to their population, but it's not even close to a majority. If DALLE were to exclusively or primarily return images of black men for "criminals", then it would be reinforcing a harmful stereotype that does not reflect reality.


> Asking DALLE for a group of criminals, likewise, produces a group of black men. Outage! Except statistically, this is not a surprise, as a disproportionate amount of criminals are black men.

"criminals" producing most black people actually would be a perfect example of bias in DALL-E that is arguably racism.

Black people commit a diproportionate amoumt of crime (for a variety of socioeconomic reasons I won't get into here), but even so white people make up a majority of criminals (because white people are the largest ethnic group by far).

Thus, a random group of criminals, if representive of reality, should be majority white.


In the UK… “The Environmental Services Association, the trade body, said that only 14 per cent of the country's 91,300 waste sector workers were female.” So 2x dall-e searches should produce 1.2 women.


> Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men

I think, for about 95% of the world, football is synonymous with soccer. It's kind of interesting that you take this particular example to represent what reality looks like statistically.


Black people comprise 12.4% of the US population, yet they are represented at substantially above that in "OpenAI"'s "bias removal" process. Clearly it has, as you put it, substantial bias that gets in the way of accomplishing its task.


That's what Jerrrry is saying. Framing the reality of diversity in the world as a "problem" is wrong.


serious question: in what way is that not a “problem?”


It's not a problem in a few ways, let me know what you think (feel free to ask for clarification).

1. The training data would've been the best way to get organic results, the input is where it'd be necessary to have representative samples of populations.

2. If the reason the model needs to be manipulated to include more "diversity" is that there wasn't enough "diversity" in the training set then its likely the results will be lower quality

3. People should be free to manipulate the results how they wish, a base model without arbitrary manipulations of "diversity" would be the best starting point to allow users to get the appropriate results

4. A "diverse" group of people depends on a variety of different circumstances, if their method of increasing it is as naive as some of the are claiming this could result in absurdities when generating historical images or images relating to specific locations/cultures where things will be LESS representative


Well, it's a problem for the ideology.


Everything is an ideological war zone now. That's the world we live in now.


Perhaps its a problem you don't care about?


Honestly I would rather that they not try. I don't understand why a computer tool has to be held to a political standard.


There are legitimate reasons to reduce externalizations of societies innate biases.

A mortgage AI that calculates premiums for the public shouldn't bias against people with historically black names, for example.

This problem is harder to tackle because it is difficult to expose and redesign the "latent space" that results in these biases; it's difficult to massage the ML algos to identify and remove the pathways that result in this bias.

It's simply much easier to allow the robot to be bias/racist/reflective of "reality" (its training data), and add a filter / band-aid on top; which is what they've attempted.

when this is appropriate is the more cultured question; I don't think we should attempt to band-aid these models, but for more socially-critical things, it is definitely appropriate.

It's naive on either extreme: do we reject reality and substitute our own? Or do we call our substitute reality, and hope the zeitgeist follows?


That's great, but by doing so you are also inadvertently favoring, in your example, the people with black names. For example, Chinese people save on average, 50 times more than Americans according to the Fed [1]. That would mean they would generally be overrepresented in loan approvals because they have a better balance sheet. Does that necessarily mean that Americans are discriminated against in the approval process? No.

My question to you is: is an algorithm that takes no racial inputs (name, race, address, etc) yet still produces disproportionate results biased or racist? I say no.

[1] https://files.stlouisfed.org/files/htdocs/publications/es/08...


I would agree that it is not.

The government, and many people, have moved the definition and goal posts; so that anything that has the end result of a non-proportional uniformity can be labeled and treated as bias.

Ultimately it is a nuanced game; is discriminating against certain clothing or hair-styles racist? Of course. Yet, neither of those are explicitly tied to one's skin color or ethnicity, but are an indirect associative trait because of culture.

In America, we have intentionally muddied the waters of demarcation between culture and race, and are starting to see the cost of that.


> A mortgage AI that calculates premiums for the public shouldn't bias against people with historically black names, for example.

That's a great example, thanks. Also, I hope the teams working on that come up with a different solution...


Wouldn't the whole point of a "Mortgage AI" be to discriminate, so the lender's hands could be clean?

Not that I agree with that, but I don't see why you would build one otherwise. If you wanted discrimination-free mortgages, wouldn't the whole process be anonymized, with minimal personal information, rather than the current system of having to hand over every detail of your life?


It's not a political standard though. There is actual diversity in this world. Why wouldn't you want that in your product?


Fix the data input side, not the data output side. The data input side is slowly being fixed in real time as the rest of the world gets online and learns these methods.


In a sane world we would be able to tack on a disclaimer saying "This model was trained on data with a majority representation of caucasian males from Western English speaking countries and so results may skew in that direction" and people would read it and think "well, duh" and "hey let's train some more models with more data from around the world" instead of opining about systemic racism and sexism on the internet.


That wouldn't necessarily fix the issue or do anything. A model isn't a perfect average of all the data you throw into its training set. You have to actually try these things and see if they work.


I agree, the trust is broken now. I'm going to skip any AI that pulls that crap.


While their heart is in the right place, I'd like to challenge the idea that certain groups are so fragile that they don't understand that historically, there are more pictures of certain groups doing certain things.

It's a hard problem for sure. But remember, the bias ends with the user using the tool. If I want a black scientist, I can just say "black scientist".

Let me be mindful of the bias, until we have a generally intelligent system that can actually do it. I'm generally intelligent too, you know.


>But remember, the bias ends with the user using the tool. If I want a black scientist, I can just say "black scientist".

That is a really, really, narrow viewpoint. I think what people would prefer is that if you query "Scientist" that the images returned are as likely to be any combination of gender and race. It's not that a group is "fragile", it's that they have to specify race and gender at all, when that specificity is not part of the intention. It seems that they recognize that querying "Scientist" will predominantly skew a certain way, and they're trying in some way to unskew.

Or, perhaps, you'd rather that the query be really, really specific? like: "an adult human of any gender and any race and skin color dressed in a laboratory coat...", but I would much rather just say "a scientist" and have the system recognize that anyone can be a scientist.

And then if I need to be specific, then I would be happy to say "a black-haired scientist"


Have you seen the queries that are used to generate actually useful results rather than just toy demonstrations? They look a lot more like your first example except with more specificity. It'd be more like "an adult human of any gender and any race and skin color dressed in a laboratory coat standing by a window holding a beaker in the afternoon sun. 1950s, color image, Canon 82mm f/3.6, desaturated and moody." so if instead you are looking for an image with a person of a specific ethnicity or gender then you are for sure going to add that in along with all of the details. If you are instead worried about the bias of the person choosing the image to use then there is nothing short of restricting them to a single choice that will fix that and even in that case they would probably just not use the tool since it wasn't satisfying their own preferences.


This is a problem with generative models across the board. It's important that we don't skew our perceptions by GAN outputs as a society, so it's definitely good that we're thinking about it. I just wish that we had a solution that solved across the class of problems "Generative AI feeds into itself and society (which is in a way, a generative AI), creating a positive feedback loop that eventually leads to a cultural freeze"

It's way bigger than just this narrow race issue the current zeitgeist is concerned about.

But I agree, maybe I should skew to being optimistic that at least we're trying


Kind of funny that NN tech is supposed to construct some higher-dimensional understanding, yet realistically cannot be expected to generate a gender- and race-indeterminate portrayal of a scientist.


Historically this is true, but it also seems dangerous to load up these algorithms with pure history because they'll essentially codify and perpetuate historical problems.


[flagged]


So you actually _wanted_ images that perpetuate the biases of the world?


Unfortunately, the method OpenAI may be using to reduce bias (by adding words to the prompt unknown to the user) is a naive approach that can affect images unexpectedly and outside of the domain OpenAI intended: https://twitter.com/rzhang88/status/1549472829304741888

I have also seen some cases where the bias correction may not be working at all, so who knows. And that's why transparency is important.


What a fascinating hack. I mean, yeah, naive and simplistic and doesn't really do anything interesting with the model itself, but props to the person who was given the "make this more diverse" instruction and said "okay, what's the simplest thing that could possibly work? What if I just append some races and genders onto the end of the query string, would that mostly work?" and then it did! Was it a GOOD idea? Maybe not. But I appreciate the optimization.


This sounds like something that could backfire very badly on certain prompts. "person eating a watermelon" for example.


Yes, I did. I want it to show world as it is not as people want it to be.


So you want the world to be the way it is?


Reread what I said: I WANT THE DALLE GENERATOR TO SHOW THE WORLD AS IT IS NOT AS PEOPLE WISH IT WAS.


Reread what I said, try engaging more of your brain this time.


How do you remove bias as long as humans are in the loop? Aren't they just swapping one bias for their own?


I thought the same thing but I think the commenter is making a joke, but I could be wrong.

I think they are suggesting that things like this (neural nets etc) work using bias, and by removing "bias" the developers are making the product worse.

It's a very sh!t comment if it's not a joke.


Just to be sure. Does "OC" here mean Original Comment?


Typo, now fixed.


Reducing bias means affecting the data, instead of letting the end user just choose an appropriate image generated by a clean data set.


Anything to end Corporate Memphis. Even if we as illustrators will not have jobs or commissions. Let's hope that every creative human endeavour - painting, music, poetry - will be replaced and removed from the commercial realm. Then maybe we will see artistic humanism instead of synthetic trans-humanistic "pop art".

Happily for me, I stopped painting digitally a long time ago. I even stopped calling myself "an artist". Nowadays I paint and draw only with a real medium and call all of that "Archivist craftsmanship with analogue medium". :)


> Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview.

So DALL·E 2 is going to restart, revive and cause another renaissance of fully automated and mass generated NFTs, full of derivatives and remixing etc to pump up the crypto NFT hype squad?

Either way, OpenAI wins again as these crypto companies are going to pour tens of thousands of generated images to pump their NFT griftopia off of life support, reconfirming that it isn't going to die that easily.

Regardless of this possible revival attempt, 90% of these JPEG NFTs will eventually still die.


I don't see why there's any credible reason to expect that DALL-E will do anything at all to help those promoting the NFT silliness. Two separate issues.


If OpenAI could make a profit selling Dall-E images as NFT, I'd assume they'd do it, yeah?


Altman tried his hand at that by launching Worldcoin, and it didn't go well at all.

So I think it's prudent that OpenAI keep the 'sell shovels' business model instead with DALLE and GPT, at least for the time being.


I have tried playing around with the beta access to make it generate NFT art with different prompts, but in vail.

I think it has not been trained on NFT art (crypto punks and so on).


> I think it has not been trained on NFT art (crypto punks and so on).

How exactly are you defining NFT art?

I mean, it can literally be anything: Dorsey sold a screencap of his 1st tweet, Nadya from Pussy Riot did some creative stuff, and the Ape crap was the bulk of this stuff that got passed around.

I think what can be gleaned from that short-lived nonsense is that value is subjective and that the quality of a valuable piece of 'art' is equally as hard to define. Much the same with its predecessor: CryptoKitties.


Heads up: I think you meant "in vain" rather than "in vail". However, a similar phrase is "to no avail" which also means that something was not successful.


I think you meant "in vain" rather than "in vein".


I sure did! Thank you, I've corrected that now.



