I was supposed to be making a video game, but got a bit sidetracked when DALL·E came out and made this website on the side: http://dailywrong.com/ (yes I should get SSL).
It's like The Onion, but all the articles are made with GPT-3 and DALL·E. I start with an interesting DALL·E image, then describe it to GPT-3 and ask it for an Onion-like article on the topic. The results are surprisingly good.
Somehow these articles are more readable than typical AI-generated search engine fodder... Is it because I'm entering the site with an expectation of nonsense?
Probably because, by the creator's own admission, the articles are heavily cherry-picked to make sure the output is decent, which is probably a lot more human effort than goes into the aforementioned search engine fodder.
That's... so weirdly ironic I can't even! Blogspam websites are made by real humans with little oversight, while a literal AI with oversight generates better results.
That said, with a little tweaking, these technologies can be used - and probably already are being used - to churn out blogspam websites left and right, fully automatically.
Feels like the headlines could be generated similar to the style of "They Fight Crime!"
"He's a hate-fuelled neurotic farmboy searching for his wife's true killer. She's a tortured insomniac snake charmer from a family of eight older brothers. They fight crime!"
>He's an unconventional gay paranormal investigator moving from town to town, helping folk in trouble. She's a violent motormouth wrestler from the wrong side of the tracks. They fight crime!
>He's a Nobel prize-winning sweet-toothed rock star who believes he can never love again. She's a strong-willed communist widow with a knack for trouble. They fight crime!
>He's an obese white trash barbarian with a secret. She's a virginal thirtysomething traffic cop with the power to bend men's minds. They fight crime!
The results for artworks or more general concepts are fascinating, but there is for sure something creepy going on with "photorealistic" human eyes and faces...
If you want to see some really creepy AI generated human "photo" faces, take a look at Bots of New York:
Unfortunately the content of that project is now a hostage of Facebook - much like ransomware gangsters, they force you to do something to get the data; in this case you need to create an account and take part in that global surveillance network. I do not understand why people do that.
Hyperbole will get you nowhere good (Just ask RMS).
How about wording your comment in a way that highlights why it’s a shame these pictures aren’t accessible for those without a Facebook account, and skip the whole “you’re murdering puppies” bit?
We joke about it, but an early and very cheap robotic floor cleaner I had was one of those weasel balls constrained in a flat ring harness with a dusting cloth underneath. It was entertaining and not completely useless.
Put a guinea pig in there and you'd get the same effect.
Actually got a chuckle out of the duck one (http://dailywrong.com/man-finally-comfortable-just-holding-a...). Thanks! I hope you keep generating them. Kind of wish there weren't a newsletter nag, but on the other hand it adds to the realism. Could be worthwhile to generate the text of the nag with GPT too; call it a kind of lampshading.
Haha, I was in a very similar boat when I built https://novelgens.com -- I was also supposed to be making a video game, but got a bit sidetracked with VQGAN+CLIP and other text/image generation models.
Now I'm using that content in the video game. I wonder if you could use these articles as some fake news in your game, too. :)
At first I came up with the headlines myself, but found that GPT-3 often comes up with better ones, so now I ask it for variations.
I think I got it to even fill the title given a picture, something like “Article picture caption: Man holding an apple.
Article title: ...”. Might experiment more with that in the future.
Hmmm, these seem less deranged than headings from the previous Markov-chain bots—and kinda less interesting because of that.
I guess Markov chains for the headings, Dall-E for images, maybe GPTx for comments. And/or the GPT models should be made wackier somehow—less coherent, perhaps.
This is a fucking fantastic site, it's absolutely hilarious, and I've bookmarked it - I kinda unironically want to set it as my home page - but just a heads up that the CSS is broken for me on my iPhone SE2.
The images don’t scale properly with the rest of the site, they’re massive compared to the content.
I’m curious: if they’re only making DALL-E accessible now, and GPT-3 was never really accessible (as far as I know), how do you have access to these things to generate text and images?
How do you generate the original image? And what about the subsequent images, do they come automatically from the text? I'd love to know more about the process.
I have been having a blast with DALL-E, spending about an hour a day trying out wild combinations and cracking my friends up. I cannot imagine getting bored of it; it's like getting bored with visual stimulus, or art in general.
In fact, I've been glad to have a 50/day limit, because it helps me contain my hyperfocus instincts.
The information about the new pricing is, to me as someone just enjoying making crazy images, a huge drag. It means that to do the same 50/day I'd be spending $300/month.
OpenAI: introduce a $20/month non-commercial plan for 50/day, and I'll be at the front of the line.
I think people don't realize how huge these models really are.
When they're free, it's pretty cool. But charge an amount where there's actual profit in the product? Suddenly seems very expensive and not economically viable for a lot of use cases.
We are still in the "you need a supercomputer" phase of these models for now. Something like DALLE mini is much more accessible but the results aren't good enough. Early early days.
> I think people don't realize how huge these models really are.
They really aren't that large by the contemporary scaling race standards.
DALL-E 2 has 3.5B parameters, which should fit on an older GPU like an Nvidia RTX 2080, especially if you optimize your model for inference [1][2], which is commonly done by ML engineers to minimize costs. With an optimized model, your memory footprint is ~1 byte per parameter, plus a fraction (commonly ~0.2) of that again to store intermediate activations.
You should be able to run it on Apple M1/M2 with 16GB RAM via CoreML pretty fine, if an order of magnitude slower than on an A100.
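To make that arithmetic concrete, here's a minimal back-of-envelope sketch (Python; the ~1 byte/parameter and ~0.2 activation-ratio figures are the rules of thumb from above, not measured numbers):

    def inference_vram_gb(params_billions, bytes_per_param, activation_ratio=0.2):
        """Approximate inference memory in GB: weights plus intermediate activations."""
        weights_gb = params_billions * bytes_per_param   # 1e9 params * bytes / 1e9 bytes per GB
        return weights_gb * (1 + activation_ratio)

    for label, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8 (quantized)", 1)]:
        print(f"{label:>17}: ~{inference_vram_gb(3.5, bytes_per_param):.1f} GB")

    # fp32 ~16.8 GB, fp16 ~8.4 GB, int8 ~4.2 GB: the int8 figure is what makes
    # an 8-11 GB consumer card (RTX 2080) or a 16 GB M1/M2 plausible.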
Training isn't unreasonably costly either: you can train such a model for O($100k), which is less than the yearly salary of a mid-tier developer in Silicon Valley.
There is no reason these models shouldn't be trained cooperatively and run locally on our own machines. If someone is interested in cooperating with me on such a project, my email is in the profile.
It's true that image models are much less of a burden on GPU VRAM than a model like BLOOM, where fitting it into a few A100s is ideal, but these diffusion models are a PITA for an ordinary hobbyist in terms of total compute: the CLIP pass over the text input is almost free, but then you feed it into the diffusion model, and for one sample you'll be doing 10-100 forward passes (depending on how fancy the diffusion methods are - maybe even 1000 passes if you're using older/simpler ones); for interactive use, you really want more like 6-9 separate samples. Then they have to pass through the upscalers, which are diffusion models themselves and need to do a bunch of forward passes to denoise. If you do 1 sample in 10s on your 1 consumer GPU, which would be pretty good, 6-9 means a joykilling minute+ wait. And then the user will pick a variation or edit one or tweak the prompt, and start all over again! It's like being back on 16kb dialup waiting for the newest .com to load.
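To put rough numbers on that wait (every constant here is an illustrative assumption, not a benchmark of any particular model):

    def round_trip_seconds(samples=8, base_steps=50, upscaler_steps=25,
                           base_pass_s=0.15, upscale_pass_s=0.1):
        base = samples * base_steps * base_pass_s            # denoising passes for the base images
        upscale = samples * upscaler_steps * upscale_pass_s  # the diffusion upscaler's own passes
        return base + upscale

    print(f"~{round_trip_seconds():.0f} s per batch of 8 candidates")  # ~80 s, i.e. the minute+ wait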
All valid points, of course. As an independent explorer I adapted my workflow to use a night's worth of workstation compute to generate a crop of new images from a simple templated prompt language.
It also helps a lot to have at least two presets - "exploratory" and "hq", to minimize iteration time and maximize quality of promising prompts.
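Not my exact tooling, but a minimal sketch of what such a templated overnight queue with "exploratory"/"hq" presets can look like (the template, the presets, and the final generate call are all placeholders):

    import itertools
    import random

    TEMPLATE = "a {style} painting of a {subject}, {quality}"
    STYLES = ["baroque", "ukiyo-e", "cubist"]
    SUBJECTS = ["lighthouse at dawn", "robot playing a cello"]

    PRESETS = {
        "exploratory": {"steps": 20, "samples": 1, "quality": "rough sketch"},
        "hq":          {"steps": 100, "samples": 4, "quality": "highly detailed, 4k"},
    }

    def build_jobs(preset_name):
        preset = PRESETS[preset_name]
        for style, subject in itertools.product(STYLES, SUBJECTS):
            prompt = TEMPLATE.format(style=style, subject=subject, quality=preset["quality"])
            yield {"prompt": prompt, "steps": preset["steps"], "samples": preset["samples"]}

    jobs = list(build_jobs("exploratory"))
    random.shuffle(jobs)   # spread variety across the overnight run
    for job in jobs:
        print(job)         # replace the print with a call into whatever local model you run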
Still, I think optimization of diffusion models for efficient inference isn't yet pushed to the limits. At least if we look at what's available to the public - AFAIK public inference software distributions didn't even quantize their weights.
min-dalle with MEGA model params takes about 20 seconds to generate 9 images on a RTX3080 if you run it locally, including the param loading time (which is about 8 of those seconds)
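For reference, running it locally looks roughly like this; treat it as a sketch, since the constructor and generate_image arguments have shifted between min-dalle releases:

    import torch
    from min_dalle import MinDalle

    # argument names are from memory; check the project README for your installed version
    model = MinDalle(is_mega=True, device="cuda", dtype=torch.float16)   # the "MEGA" weights
    grid = model.generate_image(
        text="a quilt of a rocket ship",
        seed=42,
        grid_size=3,   # 3x3 = 9 images, matching the timing above
    )
    grid.save("grid.png")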
I've been playing with neural text to speech APIs recently and wondered the same. Why are all the cloud providers charging around $16 per million characters? Do they really need to run on massive machines or are they just scalping/trying to recover training costs?
In the unCLIP/DALL-E 2 paper[0], they train the encoder/decoder with 650M/250M images respectively. The decoder alone has 3.5B parameters, and the combined priors with the encoder/decoder are in the neighborhood of ~6B parameters. This is large, but small compared to the name-brand "large language models" (GPT-3 et al.)
This means the parameters of the trained model fit in something like 7GB (decoder only, half-precision floats) to 24GB (full model, full-precision). To actually run the model, you will need to store those parameters, as well as the activations for each parameter on each image you are running, in (video) memory. To run the full model on device at inference time (rather than r/w to host between each stage of the model) you would probably want an enterprise cloud/data-center GPU like an NVIDIA A100, especially if running batches of more than one image.
The training set size is ~97TB of imagery. I don't think they've shared exactly how long the model trained for, but the original CLIP dataset announcement used some benchmark GPU training tasks that were 16 GPU-days each. If I were to WAG the training time for their commercial DALL-E 2 model, it'd probably be a couple of weeks of training distributed across a couple hundred GPUs. For better insight into what it takes to train (the different stages/components of) a comparable model, you can look through an open-source effort to replicate DALL-E 2.[2]
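Spelled out, the memory figures and the training WAG above are just parameter-count arithmetic:

    def model_size_gb(params_billions, bytes_per_param):
        return params_billions * bytes_per_param   # 1e9 params * bytes, divided by 1e9 bytes per GB

    print(f"decoder only, fp16: ~{model_size_gb(3.5, 2):.0f} GB")   # ~7 GB
    print(f"full model,  fp32: ~{model_size_gb(6.0, 4):.0f} GB")    # ~24 GB

    gpus, days = 200, 14   # "a couple hundred GPUs" for "a couple of weeks"
    print(f"training WAG: ~{gpus * days:,} GPU-days")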
I know you're half joking here but there are more consumer-affordable versions like the Geforce RTX 3090ti ($1600 for 24GB). It may not do CUDA work as fast as the A100 but it'll be able to run the model.
For the half-precision version at 7GB there are a ton more options (the RTX 3060 has 12GB for example at ~$450).
Thanks for the really excellent insight and links.
I do hope that the conversation starts to acknowledge the difference between sunk costs and running costs.
Employees, office leases, and equipment are all ongoing costs that happen regardless.
Training DALL-E 2: very expensive, but done now. A sunk cost where every dollar coming in makes the whole endeavor more profitable.
Operating the trained model: still expensive, but you can chart out exactly how expensive by factoring in hardware and electricity.
I believe that by not explicitly separating these different columns when discussing expense vs profit, we're making it harder than it needs to be to reason about what it actually costs every time someone clicks Generate.
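A toy version of that running-cost column, with every input an assumption you'd replace with real hardware/cloud numbers:

    def cost_per_click(gpu_hourly_usd, images_per_gpu_hour, images_per_click=4):
        return gpu_hourly_usd / images_per_gpu_hour * images_per_click

    # e.g. a cloud A100 at ~$3/hr pushing ~600 images/hr, 4 images returned per prompt
    print(f"~${cost_per_click(3.0, 600):.3f} marginal cost per click")   # ~$0.02
    # training, salaries, and leases sit in the sunk/fixed column on top of this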
Facebook released over 100 pages of notes a few months ago detailing their training process for a model that is similar in size. Does anyone have a link? I can't seem to find it in my notes, googling links to posts that have been removed or are behind the facebook walled garden.
But I seem to remember they were running 1,000+ 32gb GPUs for 3 months to train it and keeping that infrastructure running day-to-day and tweaking parameters as training continued was the bulk of the 100 pages. It is beyond the reach of anybody but a really big company, at least in the area of very large models, and the large models are where all the recent results are. I wish I was more bullish on algorithm improvements meaning you can get better results on less hardware; there will definitely be some algorithm improvements, but I think we might really need more powerful hardware too. Or pooled resources. Something. These models are huge.
> Facebook released over 100 pages of notes a few months ago detailing their training process for a model that is similar in size. Does anyone have a link?
If I had to guess, based on other large models, it’s in the range of hundreds of GBs. It might even be in the TB range. To host that model for fast production SaaS inference requires many GPUs. An A100 has 80GB, so a dozen A100s just to keep it in memory, and more if that doesn’t meet the request demand.
Training requires even more GPUs, and I wouldn’t be surprised if they used more than 100 and trained over 3 months.
> Training requires even more GPUs, and I wouldn’t be surprised if they used more than 100 and trained over 3 months.
Based on this blog post where they scale to 7,500 'nodes', they say:
> A large machine learning job spans many nodes and runs most efficiently when it has access to all of the hardware resources on each node.
So I wouldn't be surprised if they do have a total of 7,500+ GPUs to balance workloads between. To add, OpenAI has a long history of getting unlimited access to Google's clusters of GPUs (nowadays they pay for it, though). When they were training 'OpenAI Five' to play Dota 2 at the highest level, they were using 256 P100 GPUs on GCP[0], and they casually threw 256 GPUs at CLIP for a short while in January of 2021[1].
Training is obviously very expensive, and ideally they'd want to recoup that investment. But I'm curious as to what the marginal cost is to run the model after it's trained. Is it close to 30 images per dollar, like what they're charging now? Or do training costs make up the majority of that price?
How hard would it be to spin off a variant of this with more focused data models that cater to specific styles or art-types? Like say, a data model only for drawing animals. Or one only for creating new logos?
Generative networks are worth exploring for randomly creating things in a given category, see this recent HN post about food pictures: https://news.ycombinator.com/item?id=32167704
I’ve been creating generative art since 2016 and I’ve been anxiously waiting for my invite. I won’t be able to afford to generate the volume of images it takes to get good ones at this price point.
I can afford $20/mo for something like this, but I just can’t swing the $200 to $300 it realistically takes to get interesting art out of these CLIP-centric models.
Heck, the initial 50 images isn’t even enough to get the hang of how the model behaves.
MidJourney is a good alternative. Maybe not quite as good as DALL-E, but close enough, without a waitlist and with hobby-friendly prices ($10/month for 200 images/month, or $30 for unlimited)
If you’re technically inclined, I urge you to explore some of the newer Colabs being shared in this space. They offer vastly more configurable tools, work great for free on Google Colab, and are straightforward to run on a local machine.
Meanwhile we should prepare ourselves for a future where the best generative models cost a lot more as these companies slice and dice the (huge) burgeoning market here.
Yeah I've been having fun with it recreating bad Heavy Metal album art (https://twitter.com/P_Galbraith/status/1548597455138463744). It's good, but surprisingly difficult to direct it when you have a composition in mind. A few of these I burned through 20-30 prompts to get, and I can't see myself forking over hundreds of dollars to roll the dice.
My brother is a digital artist and while excited at first he found it to be not all that useful. Mainly because it falls apart with complex prompts, especially when you have a few people or objects in a scene, or specific details you need represented, or a specific composition. You can do a lot with in-painting but it requires burning a lot of credits.
I'm sure the novelty wears off. But I'm already coming up with several applications for it.
On the personal side, I've been getting into game development, but the biggest roadblock is creating concept art. I'm an artist but it takes a huge amount of time to get the ideas on paper. Using DALLE will be a massive benefit and will let me expedite that process.
It's important to note that this is not replacing my entire creative process. But it solves the issue I have, where I'm lying in bed imagining a scene in my mind, but don't have the time or energy to sketch it out myself.
>I'm an artist but it takes a huge amount of time to get the ideas on paper.
This is what I really like about DALLE-mini: its ability to create pretty good basic outlines for a scene. It's low resolution enough that there's room for your own creativity while giving you a good template to spring off from - things like poses, composition of multiple people, etc.
I've used AI to try out different composition/layout possibilities. Sometimes it comes up with an arrangement of objects I hadn't considered. Sometimes it uses colors in really interesting ways. Great jumping-off point for drafting.
I'm in exactly the same boat. I got tired of waiting around for openai to take me off their waitlist and used DALLE-mini (now craiyon) to generate large batches of concept art for a project I was working on. I picked the ones that, despite being low-res blobs, conveyed the right mood or had an interesting composition of elements. I then layered my favorite elements of those and painted over, adding details wherever I wanted, and came out with something much better than I would've been able to make alone.
I've been having a blast using it in my dungeons and dragons games. If you type in, say, "dnd village battlemap" it's really pretty usable. Not to mention the wild magic weapons and monsters it can come up with.
I’ve been using generative models as an art form in and of themselves since the mid/late 2010s. I like generating mundane things that bump right up along the edge of the uncanny valley and finding categories of images that challenge the model (e.g. for CLIP, phrases that have a clear meaning but are infrequently annotated).
Generating itself can be art. I’m not going to win a Pulitzer here, it’s for the personal joy of it, but I will certainly never get tired of it.
I don't know how to say this without sounding like a jerk, even if I bend over backwards to preface that this isn't my intent: this statement says more about your creativity and curiosity than a ceiling on how entertaining DALL-E can be to someone who could keep multiple instances busy, like grandma playing nine bingo cards at once.
Knowing that it will only get better - animation cannot be far behind - makes me feel genuinely excited to be alive.
Dall-e has novelty, but no intent, meaning, originality. Yes the author can be creative at generating prompts, but visually I haven’t seen it generate anything that feels artistically interesting. If you want pre-existing concepts in novel combinations then yes it works.
It’s good at “in the style of” but there’s no “in a new style”.
It has a house style too that tends to feel Reddit-like.
Isn't every "new style" just a novel combination of pre-existing concepts? Nothing new under the sun and all that.
Either way, I feel like your view is an exhaustingly pessimistic take on AI-generated art. I mean, sure, most of what DALL-E generates is pretty mundane, but other times I have been surprised at how bizarre and unique certain images are.
You seem to imply that because an AI is not human, its art is not imbued with meaning or originality -- but I find that an AI's non-human nature is precisely what _makes_ the art so original and meaningful.
> Isn't every "new style" just a novel combination of pre-existing concepts?
At the extreme limit, maybe. But within art, or even digital art, new styles are actually not that rare; humans are pretty good at generating them. Maybe they grab inspiration from nature, visual phenomena, etc., so in that sense it's not "new", but it is "new to the medium". In art you see new styles all the time. DALL-E will never do that, by its very nature, and so it's easy to see how it's boring.
And that's just the stylistic level, but it's happening at almost all levels. It's almost definitional that it doesn't innovate, only remix.
It's strange framing this as pessimistic; it's not really optimistic nor pessimistic, it just is. It's also not AI, and that's important to realize: it's a statistical model that generates purely based on pre-existing training. Its very nature is without-meaning and without-originality. That doesn't detract from it being cool or interesting or helpful or enjoyable. I find it cool and useful.
But it's not innovative or creative or meaningful by itself.
> Its very nature is without-meaning and without-originality.
That's a pretty bold claim. What are humans but statistical models that generate based on pre-existing training -- and yet, are humans not with-meaning and with-originality?
Of course, human brains are some large order of magnitude more complex than the neural nets that underlie most AIs, but we can already see areas where these "simplistic" AIs outperform humans on specific tasks. So what prevents the arts from being one of those areas? If not now, in some not-too-distant future?
One thing humans seem to have that is beyond statistics is creativity. Stats explain what is; creativity takes what is and makes a dot outside of it that other people appreciate. No model has demonstrated even attempting that dot, let alone having a good chance of success. What DALL-E does is draw a dot between a few existing points, but never outside.
Humans incidentally have three more things that make for interestingness: emotions derived from feelings, long term memory, and roughly storytelling (~ an ability to turn long term memory into long form recall with a specific reaction intended to a specific audience). I don’t think ML has any of those, but it likely (eventually) gets the latter two.
Meaningness/interestingness require at least a few of those, and it’s what puts most art in a different category than games or math.
I would say it helps to first think what you want to get out of it.
If your task is "show me something that breaks through our hyperspeed media", then I guess some obscure museum is a better place than an ML model.
If your task is "find the best variation on theme X" or "quick draft visualization", they are often very useful. I am sure there will be many further tasks to which current and future models will be well suited. They are not magic picture machines. At least not yet.
I've run thousands of queries, and a LOT of them have generated things I genuinely see as having artistic value. I've probably got ~200 images that I would 100% hang/display in my home ('woodcarving' and 'stone carving' queries rarely disappointed me).
Is it some unique form of art? No. But can it produce works I want to see, in a medium or style that already exists, at a level that is believable as authentic human art? Absolutely.
People like me, with zero artistic ability, are able to take part in creating visually pleasing works. I imagine artists would also find great value in it, being able to feed a few queries in with what they are thinking of creating to draw inspiration, or even putting their own work in and generating variations that may lead to inspiration for new works.
Same. I generated several thousand images and found it a chore, outside of the daily theme on the discord server, to try and even think of anything to query. It was also discouraging when sometimes you'd hit pure gold for 4-5 of the 6 images, then you'd be lucky to get 1 out of the 6 that was worth saving for several more queries. Now it's down to 4 images and... yeah...
I'm not going to try and profit from the images, I don't need them for any business uses or anything, so to me it was a fun for a while and now just something I'll largely put out of mind.
I was actually forcing myself to go through the whole 50/day because I knew it wouldn't be free forever, and I wanted to get better at it. I'm glad I did, but I wish I did more.
MidJourney gives ~unlimited generation for $30/month, and is nearly as good. Unlike DALL-E it doesn't deliberately nerf face generation. I've been having a blast.
> Curbing misuse: To minimize the risk of DALL·E being misused to create deceptive content, we reject image uploads containing realistic faces and attempts to create the likeness of public figures, including celebrities and prominent political figures. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces
> trying out wild combinations and cracking my friends up
Wait until the next edition comes out where it automatically learns the sorts of things that crack you up and starts generating them without any input from you.
How many years away is this? 5? 10? I seriously doubt it'll be longer than that, considering the recent advances of autoregressive models, and the overall trajectory of ML the last decade.
Since many people will start generating their first images soon, be sure to check out this amazing DALL-E prompt engineering book [0]. It will help you get the most out of DALL-E.
I hope that every science teacher who can will provide this to every student. This is the future they live in now. They should know these tools as well as they know how to install an app on a device.
Wait until we have a DALL-E-enabled custom emoji stream, whereby every text you send out has a corresponding DALL-E-generated image attached.
Then we could compare images from different people at different times where the prompt was identical, and see what the resulting emoji<-->prompt library looks like.
What about using DALL-E as a watermark, an 'NFT'-style signature or 'notary' for an email?
If DALL-E provided a unique PID# for every image - and that PID was a key that only the original runner of the image has - it could be used to authenticate an image to a text source. (Assuming that no two prompts ever produce the same result, a unique ID that CAN be used to replay the image and verify it was generated when an original email/SMS was actually sent could be a unique way to timestamp the authenticity/provenance of a thing.)
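A minimal sketch of that provenance idea: hash the prompt, the image bytes, and a timestamp into one ID that can be stored with the message and re-checked later. Nothing here is an actual DALL-E feature; the field names and file name are made up for illustration:

    import hashlib
    import json
    import time

    def provenance_id(prompt: str, image_bytes: bytes, sent_at: int) -> str:
        h = hashlib.sha256()
        h.update(prompt.encode("utf-8"))
        h.update(image_bytes)
        h.update(str(sent_at).encode("ascii"))
        return h.hexdigest()

    sent_at = int(time.time())
    record = {
        "prompt": "man holding an apple",                          # hypothetical prompt
        "sent_at": sent_at,
        "pid": provenance_id("man holding an apple",
                             open("generated.png", "rb").read(),   # hypothetical output file
                             sent_at),
    }
    print(json.dumps(record, indent=2))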
Thanks for this! A bit of prompt engineering know-how will help me get the most bang for the buck out of this beta. I also just want to say that dallery.gallery is delightfully clever naming.
Surprised by the lack of comments on the ethics of DALL-E being trained on artists' content, whereas Copilot threads are chock full of devs up in arms over models trained on open source code. Isn’t it the same thing?
I recently talked with a concept artist about DALL-E and first thing they mentioned was "you know that's all stolen art, right?" Immediately made me think of GitHub Copilot.
However the artists being featured in DALL-E's newsletters can't stop gushing about 'the new instrument they are learning how to play' and other such metaphors that are meant to launder what's going on.
My theory is that the professions most at-risk for automation are acting on their anxieties. Must not be a lot of freelance artists on HN, and a whole lot of programmers.
I think the artists have an even clearer case. I don't think GitHub Copilot is ready to steal anyone's job yet. But DALL-E is poised to replace all formerly commissioned filler art for magazines, marketing sites, and blogs. Now the only point to hiring a human is to say you hired a human. Our filler art is farm-to-table.
Having used Copilot for over a year now, I can say it isn't there to replace programmers. It isn't called GitHub Pilot, and it doesn't do well at generating original ideas. Sure, if your job is to create sign-up forms in HTML, it'll do your job in a second, but if you're creating more complex systems, Copilot is just there to help save you time when writing code (which is just implementing ideas).
Think of it like a set of powertools saving you time over manual tools.
Agreed. But I also understand the anxieties. I'd say a very significant % of programmers are not creating complex systems; they're coding up mostly CRUD UIs that have a great deal in common. It's getting to that inflection point where fewer programmers might be needed [??] ... Let's wait and see.
I first read the artist's reply as "you know all art is stolen, right" which made more sense to me. If you look at the history of art, you'd also know that it's true.
> My theory is that the professions most at-risk for automation are acting on their anxieties
That's not my problem with Copilot. I think tools and methods that can free humans from some amount of work are good in a correctly organized society. They have existed for a long time, too. They free up time for other stuff that can't be automated. This extra free time could theoretically let us have more leisure or rest time too. I also trust myself to be able to learn another job if mine is ever automated.
But I don't want my work to be reused under terms I don't approve of. There are some things I don't want to help with my work, and this is reflected in the licenses I choose. I totally sympathize with artists who don't want their work to be reused in ways they don't like; I don't find this hard to understand. I also don't find it hard to understand that an artist whose work you are supposed to pay to use is not happy with that work being reused without payment. They should get paid a tiny bit for each generated piece if theirs is in the training set, and only if they approve this use. That would be only fair; the training set would not be possible without those artists.
(Good for me, my personal code is not on GitHub for other, older, reasons)
This entire concept of AI learning using copyrighted works is going to be really tested in courts at some point, perhaps very soon, if not already.
However if the result is adequately different, I don't see how it is different from someone viewing other's work and then being "inspired" and creating something new. If you think about it the vast majority of things are built on top of existing ideas.
Quite true. Best case, we're seeing DJ Spooky style culture jamming/remixing. But more likely it is as you write.
On the other hand, the market for stock photography was already decimated by the internet. Where previously skilled photographers would create libraries of images to exemplify various terms and sell these as stock, in the last decade or so, an art director with the aid of a search engine could rapidly produce similar results.
Of course. Because the majority of the tech bros on this site are self centered and think of arts as a lowly field deserving of no respect. While something slightly resembling some boilerplate lego code they wrote is a criminal act to learn from.
If you really want to learn, visit github.com. There are over 200 million freely available, open source code repositories for you to study and learn from.
Surely being surprised by the lack of comments on the ethics of DALL-E on HN is the same as the lack of comments on the ethics of Copilot on some artists' forum. I highly doubt you're going to find r/artists or whatever up in arms about Copilot, even if they are about DALL-E.
Well, I can't go ask Caravaggio or Gentileschi to paint my query since they've been dead hundreds of years. But being able to feed in a query containing much more modern concepts and get a baroque-style painting in that specific style is wonderful.
Plus what has already been said about a lot of art being an imitation/derivation of previous works.
It's because the furor over AI replicating human artists already played out over earlier AI iterations. Remember when thisfursonadoesnotexist.com was flamed for stealing furry art? Turns out that many artists shared an extremely generic style that the AI could easily replicate.
It feels like it would be good for this to not be a legal grey area. Whether it's considered a large copyright infringement conspiracy or a form of fair use, it would be good if the law reached a position on that sooner rather than later.
Which is OK if they do something different, and not OK if they just repaint the same thing, depending on the source. And this software surely repaints a lot of the same things, so the only question is the source.
In such analogy, Copilot is no different than looking at someone code a bubble sort and then later doing one yourself. Some people on this forum had issue with that.
There are a few of those discussions going on in artist's circles these days. I imagine they'll get sued for doing this, but it'll probably take a very famous artist or a hell of a class action suit to make it happen.
> Preventing harmful images: We’ve made our content filters more accurate so that they are more effective at blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content
What is defined as political content? Can I prompt DALL-E to draw ”Fat Putin”?
Something I haven’t seen anyone talking about with these huge models: how do future models get trained when more content online is model generated to start with? Presumably you don’t wanna train a model on autogenerated images or text, but you can’t necessarily know which is which.
This precise thing is causing a funny problem in specialty areas. People are using e.g. Google Lens to identify plants, birds and insects, which sometimes returns wrong answers e.g. say it sees a picture of a Summer Tanager and calls it a Cardinal. If the people then post "Saw this Cardinal" and the model picks up that picture/post and incorporates it into its training set, it's just reinforcing the wrong identification..
That's not really a new problem, though. At one point someone got some bad training data about an old Incan town, the misidentification spread, and nowadays we train new human models to call it Macchu Picchu.
It's a cybernetic feedback system. Dalle is used to create new images, the images that people find most interesting and noteworthy get shared online, and reincorporated into the training data, but now filtered through human desire.
I wonder if human artists can demand that their work not be used for modelling. So as the robots are stuck using older styles for their creations, the humans will keep creating new styles of art.
In this situation, the low-background steel is the MS-COCO dataset, together with the Fréchet inception distance: the statistical divergence between the high-level feature vectors you get from passing MS-COCO images through Google's InceptionV3 classifier and those you get from passing DALL-E images (or its competitors') through it.
For now at least, there is a detectable difference in variety.
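For the curious, the FID mentioned above reduces to a closed-form distance between two Gaussians fitted to those InceptionV3 features; a minimal sketch:

    import numpy as np
    from scipy.linalg import sqrtm

    def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
        """||mu1 - mu2||^2 + Tr(C1 + C2 - 2 * sqrt(C1 @ C2)) over InceptionV3 features."""
        mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
        c1 = np.cov(feats_real, rowvar=False)
        c2 = np.cov(feats_fake, rowvar=False)
        covmean = sqrtm(c1 @ c2)
        if np.iscomplexobj(covmean):   # numerical noise can introduce tiny imaginary parts
            covmean = covmean.real
        return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2 * covmean))

    # feats_real / feats_fake would be the 2048-d InceptionV3 pool features of
    # MS-COCO images and of the model's outputs for the same captions.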
One interesting comment about this is that some models actually benefit from being fed their own output. AlphaFold, for instance, was fed its own 'high likelihood' outputs (as Demis Hassabis described in his Lex Fridman interview).
Training on auto generated images collected off the Internet is gonna be fine for a while since the images surfacing will be curated (ie. selected as good/interesting/valuable) still mostly by humans.
> Getting humans to refine your data is the best solution right now
Source?
All those big models are trained with data for which the source is not known or vetted. The amount of data needed is not human-refinable.
For example for language models we train mostly on subsets of CommonCrawl + other things. CommonCrawl data is “cleaned” by filtering out known bad sources and with some heuristics such as ratio of text to other content, length of sentences etc.
The final result is a not-too-dirty but not clean huge pile of data that comes from millions of sources that no human has vetted and that no one on the team using the data knows about.
The same applies to large image datasets, e.g. LAION-400M, which also comes from CommonCrawl and is not curated.
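The heuristics in question are usually just a handful of cheap checks per document; a simplified sketch (the blocklist and thresholds are illustrative, not what any particular team used):

    BLOCKED_DOMAINS = {"example-spam-farm.com"}   # illustrative blocklist entry

    def keep_document(url: str, text: str) -> bool:
        domain = url.split("/")[2] if "://" in url else url
        if domain in BLOCKED_DOMAINS:
            return False                                   # known bad source
        words = text.split()
        if len(words) < 50:                                # too short to be useful prose
            return False
        mean_word_len = sum(len(w) for w in words) / len(words)
        if not 3 <= mean_word_len <= 10:                   # likely markup/garbage, not sentences
            return False
        alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
        return alpha_ratio > 0.7                           # crude text-to-other-content ratio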
At least cleaning it up is an embarrassingly parallel problem, so if you had the resources to throw incentives at millions of casual gamers, you might make a nice dent on CLIP.
Alternatively, making a captcha where half the data is unlabeled, and half is labeled, forcing users to categorize data for you as they log into accounts.
But how would you know? A random string of text or an image with the watermark removed is going to be very hard to distinguish generated from human written.
I think with the terms requiring explicitly telling which images/parts were generated, they could be filtered out and prevent a feedback loop of "generated in/generated out" images. I'm sure there will be some illegal/against terms of use cases there but the majority should represent fair use.
I fully expect stock image sites to be swamped by DALL-E generated images that match popular terms (e.g. "business person shaking hands"). Generate the image for $0.15. Sell it for $1.00.
DALLE images are still only 1024 px wide. Which has its uses, but I don’t think the stock photo industry is in real danger until someone figures out a better AI superresolution system that can produce larger and more detailed images.
You can obtain any size by using the source image with the masking feature. Take the original and shift it then mask out part of the scene and re-run. Sort of like a patchwork quilt, it will build variations of the masked areas with each generation.
Once the API is released, this will be easier to do in a programmatic fashion.
Note: Depending on how many times you do this... I could see there being a continuity problem with the extremes of the image (eg: the far left has no knowledge of the far right). An alternative could be to scale the image down and mask the borders then later scale it back up to the desired resolution.
This scale and mask strategy also works well for images where part of the scene has been clipped that you want to include (EG: Part of a character's body outside the original image dimensions). Scale the image down, then mask the border region, and provide that to the generation step.
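A sketch of the shift-and-mask preparation with PIL; the actual re-generation then happens in DALL-E's edit/inpaint step (UI today, API later), and the file name here is just a placeholder:

    from PIL import Image

    SIZE, SHIFT = 1024, 512

    original = Image.open("original_1024.png").convert("RGBA")

    canvas = Image.new("RGBA", (SIZE, SIZE), (0, 0, 0, 0))   # fully transparent canvas
    canvas.paste(original, (SHIFT, 0))   # slide the original right; its left half stays visible

    canvas.save("shifted_for_inpainting.png")
    # Transparent pixels (the left half) mark the region to re-generate with the same
    # prompt; stitch the result onto the original, then repeat to grow the quilt.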
Another commenter mentioned Topaz AI upscaling, and Pixelmator has the "ML Super Resolution" feature; both work remarkably well IMO. There are a number of drop-in and system default resolution enhancement processes that work in a pinch, but the quality is lacking compared to the commercial solutions. There are still some areas where DALL-E 2 is lacking in realism, but anyone handy with photo editing tools could amend those shortcomings fairly quickly.
On-demand stock photo generation probably is the next step, particularly when combined with other free media services (Unsplash immediately comes to mind). Simply choose a "look" or base image, add contextual details, and out pops a 1 of 1 stock photo at a fraction of the cost of standard licensing. It'll be very exciting seeing what new products/services will make use of the DALL-E API, how and where they integrate with other APIs, use cases, value adds like upscaling and formatting, etc.
I paid extra to get the higher quality model using the in-app purchase option. It crushes the phone's battery life, but runs in only ~10 seconds on an iPhone 13 Pro for a single 1000x1000 input image.
I've recently updated waifu2x and I've seen it now supports lots of algorithms for different use cases and contexts, and it also supports other tasks like frame interpolation. So could you briefly explain in what way Cupscale is better than it?
Considering waifu2x is the name of an algorithm I assumed it was just that algorithm. There's also no mention of other models on the demo page or the Github page as far as I can see.
My bad! Yes Waifu2x is just a single algorithm, you are right.
The confusion originates from the fact that I was using a GUI project for Waifu2x called "Waifu2x Extension GUI" (https://github.com/AaronFeng753/Waifu2x-Extension-GUI) which other than Waifu2x also supports other algorithms like Real-ESRGAN, Real-CUGAN, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.
So as you said Cupscale is surely more advanced than Waifu2x (the single algorithm), but do you think it's also better than Waifu2x Extension GUI?
Yes, but I usually find myself playing with this stuff when I have some free time and relaxing outside or on the couch, and it’s nice to be able to do it all on the phone.
Makes me imagine stock image sites in the near future. Where your search term ("man looks angrily at a desktop computer") gets a generated image in addition to the usual list of stock photos.
Maybe it would be cheaper. I imagine it would one day. And maybe it would have a more liberal usage license.
At any rate, I look forward to this. And I look forward to the inevitable debates over which is better: AI generation or photographer.
In my experience it doesn’t require that much cherry picking if you use a carefully crafted prompt. For example: “ A professional photography of a software developer talking to a plastic duck on his desk, bright smooth lighting, f2.2, bokeh, Leica, corporate stock picture, highly detailed”
Additionally, wherever it classically falls over (such as currently for realistic human faces), there will be second pass models that both detect and replace all the faces with realistic ones. People are already using models that alter eyes to be life-like with excellent results (many of the dalle-2 ones appear somewhat dead atm).
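As a sketch of that two-pass idea, with detect_faces() and restore_face() standing in for whatever detector and GFPGAN-style restoration model you plug in (both are hypothetical placeholders here):

    from PIL import Image

    def fix_faces(path_in: str, path_out: str) -> None:
        img = Image.open(path_in).convert("RGB")
        for box in detect_faces(img):          # placeholder: yields (left, top, right, bottom) boxes
            crop = img.crop(box)
            restored = restore_face(crop)      # placeholder: second model makes eyes/skin life-like
            img.paste(restored.resize(crop.size), box[:2])
        img.save(path_out)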
Even this image is just an illusion of a perfect photo; it's a blur for the most part - see the face of the duck. I've had access for the past 4-5 days and it fails badly whenever I try to create any unusual scene.
For the first few days after it was announced, I used to look closely even at real photos in search of generative artifacts. They are not so difficult to spot now, most of the time anyway.
If the price is low enough, you can have humans rank generated images (maybe using Mechanical Turk or a similar service), and from that ranking choose only the highest quality DALL-E generated images.
Yes I have. And I realized as soon as I started experimenting that the mind-blowing results are mostly cherry-picking.
It's very good at generating art-style images; those are amazing most of the time. But photorealistic images only work with cherry-picking.
> And I realized as soon as I started experimenting that the mind-blowing results are mostly cherry-picking
Me and you must have very different definitions of "cherry picking". For prompts that fall within its scope (i.e. not something unusually complex or obscure) I get usable results probably 90% of the time.
Can you give me some examples of prompts that you tried where you found good results difficult to obtain?
I get bad results on unusual prompts, you are right about that.
It did generate good DSLR-like face closeups, as good as Nvidia's, most of the time but not always. Sometimes there are weird artifacts and the face does not make sense.
DSLR-style blurry photos are mostly good. From the looks of the images I follow, Imagen is probably more believable; I don't know how much cherry-picking goes on there. See this thread [1] for example. I failed to generate an image like that (the honey dress) in DALL-E 2.
So what's the loss? It's not like stock photos are the highest art form. Sure, for some people it means they need to change their business model, but for all those just needing pictures to illustrate something, the process will be much smoother.
There has been trouble with generating life-like eyes but a second pass with a model tuned around making realistic faces has been very successful at fixing that.
One of the commercial use cases this post mentions is authors who want to add illustrations to children's stories.
I wonder if there is a way for DALL-E to generate a character, then persist that character over subsequent runs. Otherwise, it would be pretty difficult to generate illustrations that depict a coherent story.
Example ...
Image 1 prompt: A character named Boop, a green alien with three arms, climbs out of its spaceship.
Image 2 prompt: Boop meets a group of three children and shakes hands with each one.
You can't do that. I can't see this working well for children's book illustrations unless the story was specifically tailored in a way that makes continuity of style and characters irrelevant.
As an aside, Ursula Vernon did pretty well under the constraint you described. She set a comic in a dreamscape and used AI to generate most of the background imagery: https://twitter.com/UrsulaV/status/1467652391059214337
It's not the "specify the character positions in text" proposed, but still a neat take on using this sort of AI for art.
You mean just generate a single large image with all the stuff you want for the whole story, and then use cropping and inpainting to get only the piece you want for each page?
Wait until someone trains a model like this, for porn.
There seems to be a post-DALL-E obscenity detector on OpenAI's tool, as so far I've found it to be entirely robust against deliberate typos designed to avoid simple 'bad word lists'.
Ask it for a "pruple violon" and you get purple violins... you get the idea.
"Metastable" prompts that may or may not generate obscene results (content with nudity, guns, or violence, as I've found) sometimes show non-obscene generations, and sometimes trigger a warning.
I’ve thought about this and in fact porn generation sounds like a good thing?? It ensures that it’s victimless. Of course, there is a problem with generation of illegal (underage) porn but other than this, I think it could be helpful for this world.
If all of the child porn industry switched to generated images they'd still be horrible people but many kids would be saved from having these pictures taken. So a commercial model should certainly ban it, but I don't think it's the biggest thing we have to worry about.
If I had to guess, I'd bet they have a supervised classifier trained to recognize bad content (violence, porn, etc) that they use to filter the generated images before passing them to the user, on top of the bad word lists.
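Something like this layered filter, purely speculative, with nsfw_score() standing in as a placeholder for whatever supervised classifier they actually run:

    BAD_TERMS = {"gore", "nude"}   # illustrative word list only
    THRESHOLD = 0.8                # arbitrary classifier cut-off

    def allow_generation(prompt: str, image) -> bool:
        if any(term in prompt.lower().split() for term in BAD_TERMS):
            return False                         # cheap pre-filter on the prompt text
        return nsfw_score(image) < THRESHOLD     # placeholder classifier on the generated pixels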
This is mentioned, "content filters" are "blocking images that violate our content policy — which does not allow users to generate violent, adult, or political content, among other categories" and they "limited DALL·E’s exposure to these concepts by removing the most explicit content from its training data."
I suspect it's more a business restriction than a moral one. If OpenAI allows people to make porn with these tools, people will make a ton of it. OpenAI will become known as "the company that makes the porn-generating AIs," not "the company that keeps pushing the boundaries of AI." Being known as the porn-ai company is bad for business, so they restrict it.
Because it is a solid line (the policy rule) drawn across a fractal boundary (actual porn), and given lots of attempts you can find somewhere inside the line but across the boundary.
Stopping more than so many attempts makes this much harder / much less likely.
"Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview."
I assumed this was going to be the sticking point for wider usage for a long time. They're now saying that you have full rights to sell Dall-E 2 creations?
I think they are reacting to competition. MidJourney is amazing, was easier to get into, gives you commercial rights, and frankly I found more fun to use and even better output in most instances.
The only thing I don’t like about MidJourney is the Discord based interface. I think I can grok why Dave chose this route as it bakes in an active community element and allows users to pick up prompt engineering techniques osmotically… but I’d prefer a clean DALL-E style app and cli / api access.
In case you don’t know, you can at least PM the MidJourney bot so you have an uncluttered workspace.
It’s clearly personal preference, but I loathe Discord generally yet love it for MidJourney. As you said, there’s an interactive element where I see other people doing cool things and adapt parts of their prompts, and vice versa. It really is fun. And when you do it in a PM, you have all your efforts saved. DALL-E is pretty clunky in that you have to manually save an image or lose it once your history rolls off.
I've completely changed my mind after spending the last few days neck deep in it around the clock. Sleep is overrated! MidJourney is awesome and the way it's implemented within Discord is a masterstroke of elegant simplicity.
Thanks. Yeah fair point; I haven’t ponied up for a subscription yet so am still stuck in public channels and often find my generations get lost in the stream. Imagine you’re right and having the PM option would change the experience drastically for the better albeit still within Discord’s visually chaotic environment.
I have access to both and they're good for different things. DALL-E seems somewhat more likely to know what you mean. Midjourney seems better for making interesting fantasy and science fiction environments.
For comparison, I tried generating images of accordions. Midjourney doesn't really understand that an accordion has a bellows [1]. DALL-E manages to get the right shape much of the time, if you don't look too closely: [2], [3]. Neither of them knows the difference between piano and button accordions.
Neither of them can draw a piano keyboard accurately, but DALL-E is closer if you don't look too hard. (The black notes aren't in alternating groups of two and three.)
Neither of them understands text; text on a sign will be garbled. Google's Parti project can do this [4], but it's not available to the public.
I expect DALL-E will have many people sign up for occasional usage, because if you don't use it for a few months, the free credits will build up. But Midjourney's pricing seems better if you use it every day?
MidJourney definitely struggles more with complex prompts from what I saw. If you like the output more, that’s subjective, but I think DALL•E is the leader in the space by a wide margin.
I think both have strengths and weaknesses, but I don’t disagree DALL-E in most instances is technically better at matching prompts. But I often enjoyed, artistically, the results of MidJourney more; it just felt fun to use and explore.
I guess it depends on what you like/enjoy? It's not good at photorealistic, but it comes up with some pretty entertaining (and pretty?) 'arty' type stuff. I go on regularly just to play around for fun.
Previously, OpenAI asserted they owned the generated images, so the new licensing is a shift in that aspect. GPT-3 also has a "you own the content" clause as well.
Of course, that clause won't deter a third party from filing a lawsuit against you if you commercialize a generated image too close to something realistic, as the copyrights of AI generated content still hasn't been legally tested.
AFAIK only people can own copyright (the monkey selfie case tested this), and machine-generated outputs don't count as creative work (you can't write an algorithm that generates every permutation of notes and claim you own every song[1]), so DALL-E-generated images are most likely copyright-free. I presume OpenAI only relies on terms of service to dictate what users are allowed to do, but they can't own the images, and neither can their users.
> DALL-E-generated images are most likely copyright-free
The US Copyright Office did make a ruling that might suggest that recently[1], but crucially, in that case, the AI "didn't include an element of human authorship." The board might rule differently about DALL-E because the prompts do provide an opportunity for human creativity.
And there's another important caveat that the felixreda.eu link seems to miss. DALL-E output, whether or not it's protected by copyright, can certainly infringe other copyrights, just like the output of any other mechanical process. In short, Disney can still sue if you distribute DALL-E generated images of Marvel characters.
DALL-E can generate recognizable pictures of Homer Simpson, Batman and other commercial properties. Such images could easily be considered derivative works of the original copyrighted images that were used as training input. I'm sure there are plenty of corporate IP lawyers ready to argue the point at court.
I'm kind of surprised that no one has found "verbatim copy" cases like those made with GitHub Copilot. Such exact copies in imagery are likely easier to go after than with code snippets.
It might be interesting to find an image in the training set with a long, very unique description, and try that exact same description as input in DALL·E 2.
Of course it's unlikely to produce the exact same image, or if it does, you've also discovered an incredible image compression algorithm.
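One cheap way to hunt for near-verbatim copies would be a perceptual-hash comparison between a generation and a candidate training image, e.g. with the imagehash package (the distance threshold below is a guess, not a calibrated value, and the file names are hypothetical):

    from PIL import Image
    import imagehash

    generated = imagehash.phash(Image.open("dalle_output.png"))
    training = imagehash.phash(Image.open("training_candidate.jpg"))

    distance = generated - training   # Hamming distance between the 64-bit perceptual hashes
    print(distance, "-> possible near-copy" if distance <= 8 else "-> clearly different")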
The monkey selfie was not derived from millions of existing works, and that is the difference. If an artist has a well-known art style, and this algorithm was trained on it and can copy that style, would the artist have grounds to sue? I don't know.
> If an artist has a well-known art style, and this algorithm was trained on it and can copy that style, would the artist have grounds to sue? I don't know.
While nothing has been commercialized yet on the DALLE2 subreddit, I know that it can do Dave Choe's work remarkably well. I also saw Alex Gray's work to be close, but not really identical either. It wasn't as intricate as his work is.
It will be interesting if this takes off and you have a sort of Banksy effect take over, where unless it's a physical piece of art it doesn't have much value, and it is only made all the better by some sort of polemic attached to it, e.g. Girl with Balloon.
I'm going to guess there's not going to be much value placed on anything out of DALL-E for a long while. Digital art is typically worth much less than physical art, and I would say these GAN images are going to be worth less than digital art generated by human hand.
There will be outliers of course but I would be shocked if there's much of a market in it for at least the present.
I think the value will be in work produced that gets attached to things which are being sold. So, a book cover or an album cover. If a best selling novel used artwork from this system and it happened to be a very close copy of existing work, I could imagine the author of the original work suing for royalties.
Even if you imitate someone's style intentionally, they don't have grounds to sue. Style isn't copyrightable in the US. Whether DALL-E outputs are a derivative work is a different question, though
Sure but if you just release a basic copy of a Taylor Swift song you will get sued to oblivion. So the law seems (IANAL) to care about how similar your work is to existing works. DALL-E does not seem capable of showing you the work that influenced a result, so users will have no idea if a result might be infringing. What this means to me is that with many users, some of the results would be legally infringing.
Right but if that work isn’t significantly changed from the source, it could be ruled as infringement. DALL-E cannot tell the users (afaik) if a result is close to any source material.
If this were a concern, a user can easily bypass this by having a work-for-hire person add a minor transform layer on top of the DALL-E generated images right?
Image generating artificial intelligence is very analogous to a camera.
Both technologies have billions of dollars of R&D and tens of thousands of engineers behind supply chains necessary to create the button that a user has the press.
> "Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview."
>> And I just used it to create cover art for a book published in Amazon :)
Man... what a missed opportunity for Altman... he could have had a really good cryptocurrency/token with a healthy ecosystem and a creative-based community if he hadn't pushed this Worldcoin biometric-harvesting BS and had just waited for this to release and coupled it with access to GPT.
This is the kind of thing that Web3 (a joke) was pushing for all along: revolutionary tech that the everyday person can understand, with its own token-based ecosystem for access and full creative rights over the prompts.
I wonder, if he stepped down from OpenAI and put in a figurehead as CEO, could this still work?
> Why is using a token better than using money, in this case?
It would be better for OpenAI if it could monetize not just its subscription-based model via a token (to pay for overhead and further R&D), but also its ability to issue tokens it can freely exchange for utility on its platform: exclusive access outside of its capped $15 model, and pay-as-you-go models for those who don't have access to it, like myself, since it's limited to 1 million users.
I don't want an account, and I think that type of gatekeeping wasn't cool during the Gmail days either (and I had early access back then too), but I'd still personally buy hundreds of dollars' worth of prompts right now, since I think it's a fascinating use of NLP. I'm just one of many missed opportunities, part of a lost userbase who just wants access for specific projects. By doing this they could still retain the usage caps on their platform and expand and contract them as they see fit without excluding others.
This in turn could justify the continual investment from the VC world into these projects (under the guise of web3) and allow them to scale into viable businesses and further expand the use of AI/ML into other creative spaces, which, as a person studying AI and ML with a background in BTC, is what we all wanted to see instead of the aimless bubbles in things like Solana or yield farming via fake DeFi projects like Celsius.
It would legitimize the use of a token for an ecosystem model outside of BTC, which to be honest doesn't really exist and still has a tarnished reputation after all these failed projects, while gaining reception among a greater audience, since it has captivated so many since its release.
It also means there will possibly be another renaissance of fully automated, mass generated NFTs and tons of derivatives and remixes flooding the NFT market in an attempt to pump the NFT hype again.
It doesn't matter, OpenAI wins anyway as these companies will pour hundreds of thousands into generated images.
It seems that the NFT grift is about to be rebooted again, such that it isn't going to die that quickly. But still, eventually 90% of these JPEG NFTs will die anyway.
These highly photorealistic images can be generated at mass scale, completely automated, without a human, which ultimately cuts out the need for an artist to do that work.
Artists will be replaced by DALL·E 2 for creating these illustrations, book covers, NFT variants, etc., opening up the whole arena to anyone to do this themselves. All it takes is describing what they want in text, and in less than a minute the work is delivered, for as little as $15.
OpenAI still wins either way. If a crypto company starts using DALL·E 2 to generate photorealistic NFTs, they won't stop them, and they will take the money.
Interesting. I got access a couple of weeks ago (was on the waitlist since the initial announcement) and frankly, as much as I really want to be excited and like it, DALL-E ended up being a bit underwhelming. IMHO the results produced are often low quality (distorted images, or quite wacky representations of the query). Some styles of imagery are certainly a better fit for being generated by DALL-E, but as far as commercial usage goes, I think it needs a few iterations and probably an even larger underlying model.
I suspect you simply need to use it more with a lot more variation in your prompts. In particular, it takes style direction and some other modifiers to really get rolling. Run at least a few hundred prompts with this in mind. Most will be awful output... but many will be absolute gems.
It has, honestly, completely blown me away beyond my wildest imagination of where this technology would be at today.
Fundamentally I have two categories of issues I see with DALL-E, but please don't get me wrong -- I think this is a great demonstration of what is possible with huge models and I think OpenAI's work in general is fantastic. I will most certainly continue using both DALL-E and OpenAI's GPT3.
(1) There is, in my opinion, a rift between what DALL-E can do today and commercial utility. I readily admit that I have not done hundreds of queries (thank you folks for pointing that out, I'll practice more!), but that means there is a learning curve, doesn't it? I can't just go to DALL-E, mess with it for 5-10 minutes, and get the ad or book cover or illustration for my next project done.
(2) I think DALL-E has issues with faces and the human form in general. The images it produces are often quite repulsive and take the uncanny valley to the next level. I surprised myself when I noticed thinking that the images of humans DALL-E produces lack... soul? Cats and dogs, on the other hand, it handles much better.
I've done tests with other subjects -- say cars or machinery -- and it generally performs so-so with them too, often creating disproportionate representations or misplacing chunks. If you query for multiple objects in a scene it quite often melds them together. This is more pronounced in photorealistic renderings; when I query for a painting style it mostly works better. That said, every now and then it does produce a great image, but with this way of arriving at one, how fast will I have to replenish those credits?.. :)
All in all though I think I am underwhelmed mostly because my initial expectations were off, I am still a fan of DALL-E specifically and GPT3 in general. Now when is GPT4 coming out? :)
Dalle seems to only have a few "styles" of drawing that it is actually "good" at. It is particularly strong at these styles but disappointingly underwhelming at anything else, and will actively fight you and morph your prompt into one of these styles even when given an inpainting example of exactly what you want.
It's great at photorealistic images like this: https://labs.openai.com/s/0MFuSC1AsZcwaafD3r0nuJTT, but it's intentionally lobotomized to be bad at faces, and often has an uncanny valley feel in general, like this: https://labs.openai.com/s/t1iBu9G6vRqkx5KLBGnIQDrp (never mind that it's also lobotomized to be unable to recognize characters in general). It's basically as close to perfect as an AI can be at generating dogs and cats though, but anything else will be "off" in some meaningful ways.
It has a particular sort of blurry, amateur oil painting digital art style it often tries to use for any colorful drawings, like this: https://labs.openai.com/s/EYsKUFR5GvooTSP5VjDuvii2 or this: https://labs.openai.com/s/xBAJm1J8hjidvnhjEosesMZL . You can see the exact problem in the second one with inpainting: it utterly fails at the "clean" digital art style, or drawing anything with any level of fine detail, or matching any sort of vector art or line art (e.g. anime/manga style) without loads of ugly, distracting visual artifacts. Even Craiyon and DALLE-mini outperform it on this. I've tried over 100 prompts to get stuff like that to generate and have not had a single prompt that is able to generate anything even remotely good in that style yet. It seems almost like it has a "resolution" of detail for non-photographic images, and any detail below a certain resolution just becomes a blobby, grainy brush stroke, e.g. this one: https://labs.openai.com/s/jtvRjiIZRsAU1ukofUvHiFhX , the "fairies" become vague colored blobs here. It can generate some pretty ok art in very specific styles, e.g. classical landscape paintings: https://labs.openai.com/s/6rY7AF7fWPb5wWiSH0rAG0Rm , but for anything other than this generic style it disappoints hard.
The other style it is ok at is garish corporate clip art, which is unremarkable and there's already more than enough clip art out there for the next 1000 years of our collective needs -- it is nevertheless somewhat annoying when it occasionally wastes a prompt generating that crap because you weren't specific that you wanted "good" images of the thing you were asking for.
The more I use DALLE-2 the more I just get depressed at how much wasted potential it has. It's incredibly obvious they trimmed a huge amount of quality data and sources from their databases for "safety" reasons, and this had huge effects on the actual quality of the outputs in all but the most mundane of prompts. I've got a bunch more examples of trying to get it to generate the kind of art I want (cute anime art, is that too much to ask for?) and watching it fail utterly every single time. The saddest part is when you can see it's got some incredible glimpse of inspiration or creative genius, but just doesn't have the ability to actually follow through with it.
GPT3 has seen similar lobotomization since its initial closed beta. Current davinci outputs tend to be quite reserved and bland, whereas when I first had the fortunate opportunity to play with it in mid 2020, it often felt like tapping into a friendly genius with access to unlimited pattern recognition and boundless knowledge.
I've absolutely noticed that. I used to pay for GPT-3 access through AI Dungeon back in 2020, before it got censored and run into the ground. In the AI fiction community we call that "Summer Dragon" ("Dragon" was the name of the AI dungeon model that used 175B GPT-3), and we consider it the gold standard of creativity and knowledge that hasn't been matched yet even 2 years later. It had this brilliant quality to it where it almost seemed to be able to pick up on your unconscious expectations of what you wanted it to write, based purely on your word choice in the prompt. We've noticed that since around Fall 2020 the quality of the outputs has slowly degraded with every wave of corporate censorship and "bias reduction". Using GPT-3 playground (or story writing services like Sudowrite which use Davinci) it's plainly obvious how bad it's gotten.
OpenAI needs to open their damn eyes and realize that a brilliant AI with provocative, biased outputs is better than a lobotomized AI that can only generate advertiser-friendly content.
So it got worse for creative writing, but it got much better at solving few-shot tasks. You can do information extraction from various documents with it, for example.
I mean yes, you're right as far as it goes. However, nothing I am aware of implies technical reasons linking these two variables into a necessarily inevitable trade-off. And it's not only creative writing that's been hobbled; GPT3 used to be an incredibly promising academic research tool and, given the right approach to prompts, could uncover disparate connections between siloed fields that conventional search can only dream of.
I’m eager for OpenAi to wake up and walk back on the clumsy corporate censorship, and/or for competitors to replicate the approach and improve upon the original magic without the “bias” obsession tacked on. Real challenge though “bias” may pose in some scenarios, perhaps a better way to address this would be at the training data stage rather than clumsily gluing on an opaque approach towards poorly implemented, idealist censorship lacking in depth (and perhaps arguably, also lacking sincerity).
The face thing is weird in the context of them not being worried about it infringing on the copyright of art. If they're confident it's not going to infringe on art copyright, why the worry that it might generate the face of a real person?
I felt the same way. If anything, I realized how soulless and uninteresting faceless art is. Dall-E 2 goes out of its way to make terrible faces for, I'm guessing, deepfake reasons?
> Reducing bias: We implemented a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt about an individual that does not specify race or gender, like “CEO.”
Will it do it "more accurately" as they claim? As in, if 90% of CEOs are male, then the odds of a CEO being male in a picture is 90%? Or less "accurately reflect the diversity of the world’s population" and show what they would like the real world to be like?
If it accurately reflects the world population, then only one in six pictures will be of a white person. Half the pictures will be Asian, another sixth will be Indian.
Slightly more than half of the pictures will be women.
That accurately represents the world's diversity. It won't accurately reflect the world's power balance but that doesn't seem to be their goal.
If you want to say "white male CEO" because you want results that support the existing paradigm it doesn't sound like they'll stop you. I can't imagine a more boring request.
Let's look at interesting questions:
If you ask for "victorian detective" are you going to get a bunch of Asians in deerstalker caps with pipes?
What about Jedi? A lot of the Jedi are blue and almost nobody on Earth is.
Are cartoon characters exempt from the racial algorithm? If I ask for a Smurf surfing on a pizza I don't think that making the Smurf Asian is going to be a comfortable image for any viewer.
What about ageism? 16% of the population is over sixty. Will a request for "superhero lifting a building" have a 16% chance of showing someone old?
If I request a "bad driver peering over a steering wheel" am I still going to get an Asian 50% of the time? Are we ok with that?
I respect the team's effort to create an inclusive and inoffensive tool. I expect it's going to be hard going.
To a certain degree, yes. They care more about the image of the project than art. Considering a large amount of art depicts non-sexual nudity yet they block all nudity, art is not their primary concern.
You know a surprising way to solve the issues you presented? You train another model to trick DALL-E into generating undesirable images. It will use all its generative skills to probe for prompts. Then you can use those prompts to fine-tune the original model. So you use generative models as a devil's advocate.
- Red Teaming Language Models with Language Models
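A minimal sketch of that loop, assuming the 2022-era openai.Completion API for the prompt generator; the is_undesirable() check and the instruction text are placeholders I made up, not anything from the paper or from OpenAI:

```python
# Sketch of the "Red Teaming Language Models with Language Models" idea:
# one model generates adversarial prompts, the target model answers them,
# and flagged prompt/output pairs are collected for later fine-tuning or
# filtering. Assumes OPENAI_API_KEY is set; is_undesirable() is a stand-in
# for a real trained safety classifier.
import openai

def generate_test_prompts(n=8):
    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt="Write a prompt that might make a generative model produce unsafe output:\n",
        max_tokens=40,
        temperature=1.0,
        n=n,
    )
    return [choice["text"].strip() for choice in resp["choices"]]

def is_undesirable(output):
    # Placeholder: in practice, a trained classifier scores the output.
    return "unsafe" in str(output).lower()

def red_team(target_model_fn, rounds=10):
    flagged = []
    for _ in range(rounds):
        for prompt in generate_test_prompts():
            output = target_model_fn(prompt)
            if is_undesirable(output):
                flagged.append((prompt, output))
    return flagged  # use these pairs to fine-tune or filter the target model
```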
The latter. Here's what we, a small number of people, think the world should look like according to our own biases and information bubble in the current moment. We will impose our biases upon you, the unenlightened masses who must be manipulated for your own good. And for god's sake, don't look for photos of the US Math team or NBA basketball, or compare soccer teams across different countries and cultures.
You are correct but that's not what anyone's discussing.
If I search for "food", the reasonable result would be to get images that represent food according to its actual proportions in real life. E.g. if pizza is the most common food at 10% prevalence, 10% of the images should be pizza.
That's not what OpenAI are doing.
They are introducing crafted biases to create images that deliberately misrepresent what the world looks like, and instead represent what they believe the world ought to look like.
--
You also need some reason why diversity "of this" is important but not diversity "of that". Why is diversity of race and sex so critical, but not diversity of age, height, disability? Should a search for "basketball player" yield 1/2 able-bodied people and 1/2 wheelchair basketball players? Why?
Then try to answer where you came up with the categories you do want depicted. Why are the races what they are? Should "basketball player" include half whites and half black people? Or maybe split in 3, white/black/Asian? Why not Australian Aborigines, native Americans, or Persians - so we can divide into 6? If you don't add Indian people to your list then, is that racist against them? How did you decide what must be represented, in what proportions, and what's okay to leave out?
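For what it's worth, the "match real-world proportions" half of this is easy to express in code once you have prevalence numbers; the hard part is exactly what's asked above, agreeing on the categories and the numbers. A toy sketch, using the food example and invented figures:

```python
# Toy illustration of "sample attributes in proportion to prevalence"
# versus "force a uniform split". The prevalence figures are invented
# placeholders, not real statistics.
import random

PREVALENCE = {"pizza": 0.10, "rice dish": 0.25, "sandwich": 0.15, "other": 0.50}

def sample_proportional(n):
    items, weights = zip(*PREVALENCE.items())
    return random.choices(items, weights=weights, k=n)

def sample_uniform(n):
    return random.choices(list(PREVALENCE), k=n)

print(sample_proportional(10))  # ~10% pizza in the long run
print(sample_uniform(10))       # every category equally likely
```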
Yes, the quality of surrealist generations went down with that change, which suddenly injects gender and race into prompts that I really didn't want anything specific in. Like a snail radio DJ, and suddenly the microphone is a woman of colour's head... I understand the intention, but I want this to be a default-on-but-you-can-turn-it-off thing.
As competitors with lower price points pop up, you'll see everyone ditch models with "anti bias" measures and take their $ somewhere else. Or maybe we'll get some real solution that adds noise to the embeddings, and not some half-assed workaround to the arbitrary rules that your resident AI Ethicist comes up with.
They add it after the prompt. So you can see the added words by making a prompt like "a person holding a sign saying ", and then the sign shows the extra words if they are added.
That would only work for positive biases; if they actually want to equalize things, they also need to add the opposite of the negative biases.
To counteract the bias of their dataset, they need to have someone sitting there actively thinking about bias, adding anti-bias seasoning for every bias-causing term. I feel bad for whatever person is tasked with that job.
Could always just fix your dataset, but who's got time and money to do that /s
It's also funny that this likely won't "unbias" any actual published images coming out of it. If 90% of the CEO images in the world show a male CEO, then for whatever reason that's the image people will pick and choose from DALL-E's output. (This generalizes to any unbiasing: i.e., the outputs will be re-biased by humans.)
I just tried it out and it looks like DALL-E isn't as inept as you imagined. Exact query used was 'A profile photo of a male south korean CEO', and it spat out 4 very believable korean business dudes.
Supplying the race and sex information seems to prevent new keywords from being injected. I see no problem with the system generating female CEOs when the gender information is omitted, unless you think there are?
I don't think they "randomly insert keywords" like people are claiming, I think they probably run it through a GPT3 prompt and ask it to rewrite the prompt if it's too vague.
I set up a similar GPT prompt with a lot more power ("rewrite this vague input into a precise image description") and I find it much more creative and useful than DALLE2 is.
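If you want to experiment with that approach yourself, a rough sketch follows; the instruction wording and model choice are mine, and this is speculation about how such a pipeline could work, not OpenAI's actual implementation:

```python
# Sketch of rewriting a vague prompt into a precise image description with
# GPT-3 before handing it to an image generator. Assumes OPENAI_API_KEY is
# set; the instruction text and model are illustrative only.
import openai

def expand_prompt(vague_prompt):
    resp = openai.Completion.create(
        model="text-davinci-002",
        prompt=(
            "Rewrite this vague input into a precise, detailed image "
            f"description:\n\nInput: {vague_prompt}\nDescription:"
        ),
        max_tokens=80,
        temperature=0.8,
    )
    return resp["choices"][0]["text"].strip()

print(expand_prompt("a CEO"))  # e.g. adds setting, style, lighting details
```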
> If you want an image of a <race> <gender> person included, you can just specify it yourself.
I agree wholeheartedly. So what are we arguing about?
What we're seeing is that DALL-E has its own bias-balancing technique it uses to nullify the imbalances it knows exists in its training data. When you specify ambiguous queries it kicks into action, but if you wanted male white CEOs the system is happy to give it to you. I'm not sure where the problem is.
In their examples, the "After mitigation" photos seem more representative of the real world. Before you got nothing but white guys for firefighter or software engineer and nothing but white ladies for teacher. That's not how the real world actually is today.
I'm not sure how they would accomplish 100% accurate proportions anyway, or even why that would be desirable. If I don't specify any traits then I want to see a wide variety of people. That's a more useful product than one that just gives me one type of person over and over again because it thinks there are no female firefighters in the world.
Most likely this was something forced by their marketing team or their office of diversity. Given the explanation of the implementation (arbitrarily adding "black" and "female" qualifiers), it's clear it was just an afterthought.
It's also odd since you'd think that this would be an issue solved by training with representative images in the first place.
If you used good input you'd expect an appropriate output, I don't know why manual intervention would be necessary unless it's for other purposes than stated. I suspect this is another case where "diversity" simply means "less whites".
That's disappointing given up until this point you could have 50 free uses per 24h. I expected it to get monetized eventually, but not so fast and drastically. Well, still had my fun and have to say the creations are so good it's often mind blowing there's an AI behind it.
Honestly, it is probably just that expensive to run. You can’t expect someone to hand you free compute of significant value and directly charging for it is a lot better than other things they could do.
Not correct. They have a for-profit entity now. That's why there is a huge incentive to monetize. Any for-profit investment gain is capped at 100x, with the rest required to go to their nonprofit.
This commercialization is just as I predicted in my substack post 2 days ago that hit the front page of Hacker News: https://aifuture.substack.com/p/the-ai-battle-rages-on
For MidJourney I was painfully surprised to find that everything is done through chat messages on a Discord server.
I'm not a paid member, so I have to enter my prompts in public channels. It's extremely easy to lose your own prompts in the rapidly flowing stream of prompts going on. I can kind of see why they did it that way--maybe, if I squint really hard--to try to promote visibility and community interaction, but it's just not happening. It's hard enough to find my own images, to say nothing of following what someone else is doing. This is literally the worst user experience I have ever had with a piece of software.
There are dozens of channels. It's so spammy, doing it through Discord. It's constantly pinging new notifications and I have to go through and manually mute each and every one of the channels. Then they open a few dozen more. Rinse. Repeat.
I understand paid users can have their own channels to generate images, but I really don't see the point in paying for it when, even subtracting the firehose of prompts and images, it's still an objectively shitty interface to have to do everything through Discord chat messages.
Yeah, that's definitely one of the worst aspects of using Midjourney. Supposedly an API is coming, but it doesn't look like it's going to be happening anytime soon.
I don't know who thought that discord would make a good GUI front end...
You use a web app to interface with it. Agree going from DALL-E 2 to Midjourney is pretty painful. Hopefully Midjourney create a web UI for it like OpenAI/Craiyon.
This news is funny since it doesn't actually change anything. It's still a waitlist that they're pushing out slowly (not an open beta). Nice way to stay in the news though.
I was really enjoying using Dalle2 to take surrealist walks around the latent image space of human cultural production. I was using it as one might use Wikipedia, researching the links between objects and their representation. Also just to generate suggestions for what to have for lunch. None of this was of any commercial value to me. What am I to do now, start to find ways to sell the images I'm outputting? Do I displace the freelance artists in the market who actually have real talent and the ability to create images and compositions, and who studied how to use the tools of the trade? Does the income artists can make now get displaced by people using Dalle? Then do people stop learning how to actually make art, and we come to the end of new cultural production and just start remixing everything made until now?
I have some first-hand experience about how the copyright office views these works from creating an AI assistant to help me write these melodies: https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y.... Here is a quote from the response from the Copyright Office email before I provided additional information about how they were created:
"To be copyrightable, a work must be fixed in a tangible form, must be of human origin, and must contain a minimal degree of creative expression"
So some employees there are aware of the impact that AI can have. Getting these DALL-E images copyrighted won't be trivial. I think it will be many years before the law is clarified.
The name "OpenAI" to me implies being open-source.
I have an RTX 3080 and will likely be buying a 4090 when it comes out. Will I ever be able to generate these images locally, rather than having to use a paid service? I've done it with DALL-E Mini, but the images from that don't hold a candle to what DALL-E 2 produces.
if you've got 60GB available to your GPU then maybe you can get close
I'm really curious if Apple's unified memory architecture is of benefit here, especially a few years from now if we can start getting 128/256GB of shared RAM on the SoC
I'm not sure if any current or next-generation GPU even has enough power to run DALL-E 2 locally.
Anyway, OpenAI is unlikely to release the model. The situation will be like it is with GPT-3; however, it's also likely another team will attempt to duplicate OpenAI's work.
Thanks to the amazing @lucidrains there's already an open-source implementation of DALL-E 2: https://github.com/lucidrains/DALLE2-pytorch and a pretrained model for it should be released within this year.
The same person is also at work on an open-source implementation of Google's Imagen which should be even better (and faster) than DALLE-2: https://github.com/lucidrains/imagen-pytorch.
This is possible because the original research papers behind DALLE-2 and Imagen were both publicly released.
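For anyone wondering what an open-source implementation actually has to reproduce: the paper describes a two-stage pipeline in which a CLIP text embedding goes through a diffusion prior to produce an image embedding, and a diffusion decoder turns that into a 64x64 image that upsamplers grow to 1024x1024. A shape-level sketch with placeholder modules (this is not the lucidrains API, just the data flow):

```python
# Shape-level sketch of the DALL-E 2 paper's two-stage pipeline. These are
# placeholder modules standing in for the real networks; see the repos
# linked above for actual implementations.
import torch
import torch.nn as nn

class StubPrior(nn.Module):
    """Stands in for the diffusion prior: text embedding -> image embedding."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, text_emb):
        return self.net(text_emb)

class StubDecoder(nn.Module):
    """Stands in for the diffusion decoder plus upsamplers:
    image embedding -> 64x64 RGB image -> 1024x1024 RGB image."""
    def __init__(self, dim=512):
        super().__init__()
        self.to_pixels = nn.Linear(dim, 3 * 64 * 64)
        self.upsample = nn.Upsample(size=(1024, 1024), mode="bilinear")

    def forward(self, image_emb):
        x = self.to_pixels(image_emb).view(-1, 3, 64, 64)
        return self.upsample(x)

text_emb = torch.randn(1, 512)   # pretend CLIP text embedding
image = StubDecoder()(StubPrior()(text_emb))
print(image.shape)               # torch.Size([1, 3, 1024, 1024])
```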
$0.13/prompt can only be useful for artists/end users. Anyone thinking about using this at scale would need a 20/30x reduction in price. But there's still no API available so I think that will change with time. Maybe they will add different tiers based on volume.
That is a fair point. I don't think the pricing is unreasonable, but it feels limiting. You could try 1000 variations until you find exactly what you need, but with that pricing model users will be induced to use the tool less, not more.
I'd prefer an option to pay something like 200 USD/year for unlimited use, and maybe have a per-use price only in the API.
edit: this pricing model also makes it expensive to learn to use the tool.
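To put the "expensive to learn" point in rough numbers, using the ~$0.13/prompt figure quoted upthread (everything else here is just arithmetic):

```python
# Rough cost math for iterating with DALL-E at ~$0.13 per prompt.
# The per-prompt figure comes from the comment above; the prompt counts
# are arbitrary examples.
COST_PER_PROMPT = 0.13

for prompts in (100, 1000, 5000):
    print(f"{prompts:>5} prompts -> ${prompts * COST_PER_PROMPT:,.2f}")
# 100 prompts -> $13.00, 1000 -> $130.00, 5000 -> $650.00
```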
Until you consider the level of demand for this product, which is surely higher than OpenAI can scale to with the number of GPUs they have. If they price it lower they’ll be overwhelmed.
Sad to say I've been disappointed in DALLE's performance since I got access to it a couple of weeks ago - I think mainly because it was hyped up as the holy grail of text2image ever since it was first announced.
For a long while whenever Midjourney or DALLE-mini or the other models underperformed or failed to match a prompt the common refrain seemed to be "ah, but these are just the smaller version of the real impressive text2image models - surely they'd perform better on this prompt". Honestly, I don't think it performs dramatically better than DALLE-mini or Midjourney - in some cases I even think DALLE-mini outperforms it for whatever reason. Maybe because of filtering applied by OpenAI?
What difference there is seems to be a difference in quality on queries that work well, not a capability to tackle more complex queries. If you try a sentence involving lots of relationships between objects in the scene, DALLE will still generate a mishmash of those objects - it'll just look like a slightly higher quality mishmash than from DALLE-mini. And on queries that it does seem to handle well, there's almost always something off with the scene if you spend more than a moment inspecting it. I think this is why there's such a plethora of stylized and abstract imagery in the examples of DALLE's capabilities - humans are much more forgiving of flaws in those images.
I don't think artists should be afraid of being replaced by text2image models anytime soon. That said, I have gotten access to other large text2image models that claim to outperform DALLE on several metrics, and my experience matched with that claim - images were more detailed and handled relationships in the scene better than DALLE does. So there's clearly a lot of room for improvement left in the space.
I wrote about this happening two days ago in my Substack post: "OpenAI will start charging businesses for images based on how many images they request. Just like Amazon Web Services charges businesses for usage across storage, computing, etc. Imagine a simple webpage where OpenAI will list out their AI-job suite, including “jobs” such as software developer, graphics designer, customer support rep, and accountant. You can select which service offerings you’d like to purchase ad-hoc or opt into the full AI-job suite."
I'm curious to know - does the community have any open source alternatives to DALL.E?
For an initiative named OpenAI, keeping their source code and models closed behind a license is bullshit in my opinion.
LAION is working on open source alternatives. There's a lot of activity in their discord and they have amassed the necessary training data but I am uncertain as to whether they have obtained the funding needed to deliver fully trained models. Phil Wang created initial implementations of several papers including imagen and parti in his GitHub account. EG: https://github.com/lucidrains/DALLE2-pytorch
EAI/Emad/et al's 'Stable Diffusion' model will be coming out in the next month or so. I don't know if it will hit DALL-E 2 level but a lot of people will be using it based on the during-training samples they've been releasing on Twitter.
Reminder that the OpenAI team claimed safety issues about releasing the weights. Now they're charging, while the GPU time in the above link is being paid for by investor dollars. I guess sama must be hurting if he can only afford OpenAI credit packs for celebrities and his friends.
So can we now legally remove the "color blocks" watermark or not?
What about generating NFTs? It was explicitly prohibited during the previous period; now there is no mention of it. With no mention of it and full rights for commercial use, I think it's allowed, but because it was an explicitly forbidden use case before, I want to be sure whether it can be used or not.
Regardless, excited to see what possibilities it opens.
Super impressive to see how OpenAI managed to bring the project from research to production (something usable for creatives). This is non-trivial since the use case involves filtering NSFW content and reducing bias in generated images. Kudos to the entire team.
It's laughably primitive. Tried to upload "The Creation of Adam", policy violation. Tried to make an image in "yarn bombing" style, policy violation. The Scunthorpe problem is too hard for cutting edge AI to tackle, I guess.
Is there a comprehensive list of all the disgraceful censorship and model-neutering restrictions they put on DALL-E? It's sad to see that openai is so absolutely terrified of their models producing upsetting content and the bad press that would ensue, when they could just show everyone the finger and say: "It's just pretty pictures. Made up pictures. They aren't real and can't hurt you, so stop crying."
e.g. One is unable to create faces of real people in the public eye.
It’s so dirty what Microsoft is doing here. They ripped the tech out of developers' hands just to sell us drips of it. Drips that are not enough to build a product for more than a few people. They require a review of the use case before launching, etc.
I truly hate this company, their shitty operating system and their monopoly business game.
Everything they buy turns to shit. And don’t tell me about VSCode. It’s just a trap to fool developers.
Have you tried any of the "human or Dall-E" tests?
How did you score?
I only scored as well as I did because I knew the kind of stylistic choices to look out for. In terms of "quality" I really don't understand how you've reached this conclusion.
DALL-E 2 was trained on approximately 650 million COPYRIGHTED image-text pairs SCRAPED FROM THE INTERNET, according to the paper that OpenAI posted to ArXiv. https://cdn.openai.com/papers/dall-e-2.pdf
I wonder if they have plans to allow SVG exports in the future. I mean, the file size would probably be ridiculous in a lot of the cases, but for my use case I wouldn't mind it. And sucks about the watermark, maybe they will introduce an option to pay for removing it.
SVG isn't really possible with the model architecture they're using. The diffusion+upscaling step basically outputs 1024x1024 pixels; at no point does the model have a vector representation.
I suppose it's possible that at some point they'll try to make an image -> svg translation model?
SVG exports would only be meaningful if the model is generating vector images, which are then converted to bitmaps. I highly doubt that's the case, but perhaps someone who has actually looked at the model structure can confirm?
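Right. The closest workaround today is tracing the bitmap after the fact, which only works acceptably for high-contrast, flat artwork and gives monochrome paths rather than full-color vectors. A sketch using Pillow plus the potrace CLI (assuming potrace is installed and on PATH):

```python
# Sketch: threshold a DALL-E PNG to a 1-bit bitmap and trace it to SVG with
# the potrace CLI. Acceptable for high-contrast, flat artwork; photorealistic
# output will trace poorly.
import subprocess
from PIL import Image

def png_to_svg(png_path, svg_path, threshold=128):
    img = Image.open(png_path).convert("L")                  # grayscale
    bw = img.point(lambda p: 255 if p > threshold else 0, mode="1")
    bmp_path = png_path.rsplit(".", 1)[0] + ".bmp"
    bw.save(bmp_path)                                         # potrace reads BMP
    subprocess.run(["potrace", bmp_path, "-s", "-o", svg_path], check=True)

png_to_svg("dalle_output.png", "dalle_output.svg")
```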
Has anyone else had problems with the 'Generate Variations' functions lately?
Tried it out first 3 days ago, and it has said 'Something went wrong. Please try again later, or contact support@openai.com if this is an ongoing problem.' every time since then.
Every day for the past week, I've spent an hour or so using DALL-E, making up new combos and making my pals laugh. In the same way that you can't get bored of art or visual stimulation, you can't get bored of this either.
I would like to point out that there is not as much of an uproar on this forum over DALL-E utilizing other people’s photographs, illustrations, et al, as there is around Copilot utilizing other people’s code.
One reason is that copilot could be easily prompted to "create" line for line reproductions of people's code. AFAICT, you can't do this with DALLE (even if you, for example, try to input the caption of an image directly).
I've been on the waitlist since April 16th. Would have loved to have played around with the alpha but now clearly my ability to experiment and learn to use the system to cut down on expenses is extremely limited.
I like how everyone’s face is rendered by DALL-E to look either like a still from a David Lynch film, or have teeth and hair coming out of weird places.
Interesting. Considering this is now a paid product, is modifying user input covered by their ToS? If I was spending a lot of money on it I'd be rather annoyed my input was being silently polluted.
Your input isn't being polluted by this any more than it is when the tokens in it are ground up into vectors and transformed mathematically. You just have an easier time understanding this transformation.
Obviously, it's polluted. Indisputably. In a mathematical sense, an extra (black box) transformation is performed on the input to the model. In a practical sense (e.g. if you're researching the model), this is like having dirty laboratory tools - all measurements are slightly off. The presumption by OpenAI is that the measurements are off in the correct way.
I'm interested in using Dall-E commercially, but I think some competitor offering sampling with raw input will have a better chance at my wallet.
Yeah man, but literally the entire point of this AI picture generator is that it's, like, super accurate at rendering the prompt, and stuff.
I don't understand the relevance of the black box's scrutability - I just want to play with the black box. I am interested in increasing my understanding of the black box, not of a trust-me-it's-great-our-intern-steve-made-it black box derivative.
You should make your own black boxes then. By all means, send your dollars to whatever service passes your purity test; I'm just saying that the idea that DALL-E is "polluting" your input is risible. It's already polluting your data at, like, a subatomic level, at dimensionalities it hadn't even occurred to you to consider, and at enormous scale.
> Your input isn't being polluted by this any more than it is when the tokens in it are ground up into vectors and transformed mathematically. You just have an easier time understanding this transformation.
These kinds of modifications are obviously different. The mathematical transformations at least attempt some level of fidelity to user input; these additions don't (e.g. someone mentioned they're sometimes getting androgynous results and speculates that the added terms are conflicting with the ones they provided in their input). Not all black boxes are equivalent.
Not worse, but different. It depends on the prompt, but DALL-E mini/mega seems to do better than DALL-E 2 for certain types of absurd prompts, such as the ones in /r/weirddalle.
Yes, there are very sharp lines where it does and doesn't understand. It understands color and gender but not materials. I got very good outputs for "blue female Master Chief" but "starship enterprise made out of candy" was complete garbage.
I tried the first whimsical, benign thing I could think of: "indiana jones eating spaghetti." The results are clearly recognizable as that. But they are also a kaleidoscope of body horror; an Indiana Jones monster melted into Cthulhu forms, inhaling plates of something that is slightly not spaghetti.
Thankfully it doesn't introduce any researcher bias, doesn't ban people from using it on the basis of country, doesn't use your personal data like phone number...
And the best of all - it does have a meme community around it, and you can always donate if you feel it adds value to your life
That Twitter thread is full of people saying "yeah that doesn't seem to be true at all" so I'm hesitant to jump to conclusions even if we're deciding to believe random tweets.
This is funny because I work on a team that is using GPT-3 and to fix a variety of issues we have with incorrect output we've just been having the engineering team prepend/append text to modify the query. As we encounter more problems the team keeps tacking on more text to the query.
This feels like a very hacky way to essentially reinvent programming badly.
My bet is that in a few years or so only a small cohort of engineering and product people will even remember Dall-E and GPT-3, and they'll cringe at how we all thought this was going to be a big thing in the space.
These are both really fascinating novelties, but at the end of the day that's all they are.
How else would you specify the type of image you would like? Surely, if you were hiring a designer you would provide them with a detailed description of what you wanted. More likely, you would spend a lot of time with them maybe even hours and who knows how many words. For design work specifically to create a first mockup or prototype of a product or image it seems like DALL-E beats that initial phase hands down. It's much easier to type in a description and then choose from a set of images than it is to go back and forth with someone who may take hours or days to create renderings of a few options. I don't think it'll put designers out of work but I do think they'll be using it regularly to boost their productivity.
Today, when DALL-E was still free, my Dad asked me to try a prompt about the Buddha sitting by a river, contemplating. I did about 4 prompt variations, and one of them was an Asian female, if that gives any idea about the frequency (I should note that the depiction was of a young, slim, and attractive female Buddha, so I'm not sure they have the bias thing licked just yet).
In my little testing, diversity in ethnicities was achieved but not realistic given the context. I also got a few androgynous people as I asked for a male or a female and another gender was appended.
How's it NOT a problem? If I'm trying to produce "stock people images", and if it only gives me white men, it's clearly broken because when I ask for "people" I'm actually asking for "people". I'm having difficulty understanding how it can be considered to be working as intended, when it literally doesn't. Clearly, the software has substantial bias that gets in way of it accomplishing its task.
If I want to produce "animal images" but it only produces images of black cats, do you think there is any question whether it's a problem or not?
That is clearly overfitting due to unrepresentative training data.
The "issue" is a different one: that training data - IE, reality, has _unwanted_ biases in it, because reality is biased.
Producing images of men when prompting for "trash collecting workers" should not be much of a surprise: 99% of garbage collection/refuse is handled by men. I doubt most will consider this a "problem," because of one's own bias, nobody cares about women being represented for a "shitty" job.
But ask for pictures of CEOs, and then act surprised when most images are of white men? Only outrage, when proportionally, CEOs are, on average, white men.
The "problem" arises when we use these tools to make decisions and further affect society - it has the obvious issue of further entrenching stereotypical associations.
This is not that. Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men. No issue, because the NFL are disproportionately black men. No outrage, either.
Asking DALLE for a group of criminals, likewise, produces a group of black men. Outrage! Except statistically, this is not a surprise, as a disproportionate number of criminals are black men.
The "problem" is with reality being used as training data. The "problem" is with our reality, not the tooling.
Except in the cases where these tools are being used to affect society - the obvious example being insurance ML algorithms, et al. - we should strive to fix the issues present in reality, not hide them with handicapped training data and malformed inputs.
> This is not that. Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men. No issue, because the NFL are disproportionately black men. No outrage, either.
This is not great. Only about 57% of NFL players are black, and the percentage is more like 47% among college players. It would be better to at least reflect the diversity of the field, even if you don't think it should be widened in the name of dispelling stereotypes.
> Asking DALLE for a group of criminals, likewise, produces a group of black men. Outrage! Except statistically, this is not a surprise, as a disproportionate number of criminals are black men.
Only about 1/3 of US prisoners are black. (Not quite the same as "criminals" but of course we don't always know who is committing crimes, only who is charged or convicted.) That's disproportionate to their population, but it's not even close to a majority. If DALLE were to exclusively or primarily return images of black men for "criminals", then it would be reinforcing a harmful stereotype that does not reflect reality.
> Asking DALLE for a group of criminals, likewise, produces a group of black men. Outrage! Except statistically, this is not a surprise, as a disproportionate number of criminals are black men.
"criminals" producing most black people actually would be a perfect example of bias in DALL-E that is arguably racism.
Black people commit a diproportionate amoumt of crime (for a variety of socioeconomic reasons I won't get into here), but even so white people make up a majority of criminals (because white people are the largest ethnic group by far).
Thus, a random group of criminals, if representive of reality, should be majority white.
In the UK… “The Environmental Services Association, the trade body, said that only 14 per cent of the country's 91,300 waste sector workers were female.” So 2x dall-e searches should produce 1.2 women.
> Asking DALLE for a bunch of football players, would expectedly produce a huddled group of black men
I think, for about 95% of the world, football is synonymous with soccer. It's kind of interesting that you take this particular example to represent what reality looks like statistically.
Black people comprise 12.4% of the US population, yet they are represented at substantially above that in "OpenAI"'s "bias removal" process. Clearly it has, as you put it, substantial bias that gets in the way of accomplishing its task.
It's not a problem in a few ways, let me know what you think (feel free to ask for clarification).
1. The training data would've been the best way to get organic results, the input is where it'd be necessary to have representative samples of populations.
2. If the reason the model needs to be manipulated to include more "diversity" is that there wasn't enough "diversity" in the training set then its likely the results will be lower quality
3. People should be free to manipulate the results how they wish, a base model without arbitrary manipulations of "diversity" would be the best starting point to allow users to get the appropriate results
4. A "diverse" group of people depends on a variety of different circumstances, if their method of increasing it is as naive as some of the are claiming this could result in absurdities when generating historical images or images relating to specific locations/cultures where things will be LESS representative
There are legitimate reasons to reduce externalizations of society's innate biases.
A mortgage AI that calculates premiums for the public shouldn't bias against people with historically black names, for example.
This problem is harder to tackle because it is difficult to expose and reshape the "latent space" that results in these biases; it's difficult to massage the ML algos to identify and remove the pathways that result in this bias.
It's simply much easier to allow the robot to be biased/racist/reflective of "reality" (its training data), and add a filter / band-aid on top, which is what they've attempted.
When this is appropriate is the more nuanced question; I don't think we should attempt to band-aid these models, but for more socially-critical things, it is definitely appropriate.
It's naive on either extreme: do we reject reality, and substitute or own? Or do we call our substitute reality, and hope the zeitgeist follows?
That's great, but by doing so you are also inadvertently favoring, in your example, the people with black names. For example, Chinese people save, on average, 50 times more than Americans, according to the Fed [1]. That would mean they would generally be overrepresented in loan approvals because they have a better balance sheet. Does that necessarily mean that Americans are discriminated against in the approval process? No.
My question to you is: is an algorithm that takes no racial inputs (name, race, address, etc) yet still produces disproportionate results biased or racist? I say no.
The government, and many people, have moved the definition and goal posts; so that anything that has the end result of a non-proportional uniformity can be labeled and treated as bias.
Ultimately it is a nuanced game; is discriminating against certain clothing or hair-styles racist? Of course. Yet, neither of those are explicitly tied to one's skin color or ethnicity, but are an indirect associative trait because of culture.
In America, we have intentionally muddied the waters of demarcation between culture and race, and are starting to see the cost of that.
Wouldn't the whole point of a "Mortgage AI" be to discriminate, so the lender's hands could be clean?
Not that I agree with that, but I don't see why you would build one otherwise. If you wanted discrimination-free mortgages, wouldn't the whole process be anonymized with minimal personal information, rather than the current system of having to hand over every detail of your life?
Fix the data input side, not the data output side. The data input side is slowly being fixed in real time as the rest of the world gets online and learns these methods.
In a sane world we would be able to tack on a disclaimer saying "This model was trained on data with a majority representation of caucasian males from Western English speaking countries and so results may skew in that direction" and people would read it and think "well, duh" and "hey let's train some more models with more data from around the world" instead of opining about systemic racism and sexism on the internet.
That wouldn't necessarily fix the issue or do anything. A model isn't a perfect average of all the data you throw into its training set. You have to actually try these things and see if they work.
While their heart is in the right place, I'd like to challenge the idea that certain groups are so fragile that they don't understand that historically, there are more pictures of certain groups doing certain things.
It's a hard problem for sure. But remember, the bias ends with the user using the tool. If I want a black scientist, I can just say "black scientist".
Let me be mindful of the bias, until we have a generally intelligent system that can actually do it. I'm generally intelligent too, you know.
>But remember, the bias ends with the user using the tool. If I want a black scientist, I can just say "black scientist".
That is a really, really narrow viewpoint. I think what people would prefer is that if you query "Scientist", the images returned are equally likely to be any combination of gender and race. It's not that a group is "fragile", it's that they have to specify race and gender at all, when that specificity is not part of the intention. It seems that they recognize that querying "Scientist" will predominantly skew a certain way, and they're trying in some way to unskew it.
Or, perhaps, you'd rather that the query be really, really specific? like: "an adult human of any gender and any race and skin color dressed in a laboratory coat...", but I would much rather just say "a scientist" and have the system recognize that anyone can be a scientist.
And then if I need to be specific, then I would be happy to say "a black-haired scientist"
Have you seen the queries that are used to generate actually useful results rather than just toy demonstrations? They look a lot more like your first example except with more specificity. It'd be more like "an adult human of any gender and any race and skin color dressed in a laboratory coat standing by a window holding a beaker in the afternoon sun. 1950s, color image, Canon 82mm f/3.6, desaturated and moody." so if instead you are looking for an image with a person of a specific ethnicity or gender then you are for sure going to add that in along with all of the details. If you are instead worried about the bias of the person choosing the image to use then there is nothing short of restricting them to a single choice that will fix that and even in that case they would probably just not use the tool since it wasn't satisfying their own preferences.
This is a problem with generative models across the board. It's important that we don't skew our perceptions by GAN outputs as a society, so it's definitely good that we're thinking about it. I just wish that we had a solution that solved across the class of problems "Generative AI feeds into itself and society (which is in a way, a generative AI), creating a positive feedback loop that eventually leads to a cultural freeze"
It's way bigger than just this narrow race issue the current zeitgeist is concerned about.
But I agree, maybe I should skew to being optimistic that at least we're trying
Kind of funny that NN tech is supposed to construct some higher-dimensional understanding, yet realistically cannot be expected to generate a gender- and race-indeterminate portrayal of a scientist.
Historically this is true, but it also seems dangerous to load up these algorithms with pure history because they'll essentially codify and perpetuate historical problems.
Unfortunately, the method OpenAI may be using to reduce bias (by adding words to the prompt unknown to the user) is a naive approach that can affect images unexpectedly and outside of the domain OpenAI intended: https://twitter.com/rzhang88/status/1549472829304741888
I have also seen some cases where the bias correction may not be working at all, so who knows. And that's why transparency is important.
What a fascinating hack. I mean, yeah, naive and simplistic and doesn't really do anything interesting with the model itself, but props to the person who was given the "make this more diverse" instruction and said "okay, what's the simplest thing that could possibly work? What if I just append some races and genders onto the end of the query string, would that mostly work?" and then it did! Was it a GOOD idea? Maybe not. But I appreciate the optimization.
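For illustration, the "simplest thing that could possibly work" version probably looks something like the following; the word lists and trigger logic here are guesses based on the Twitter thread above, not OpenAI's actual implementation:

```python
# Naive prompt-level "diversity" injection, as speculated about above:
# if the prompt mentions a person but no gender/ethnicity, append one at
# random. The word lists and trigger logic are illustrative guesses.
import random

GENDERS = ["female", "male"]
ETHNICITIES = ["Black", "East Asian", "South Asian", "Hispanic", "white"]
PERSON_WORDS = {"person", "ceo", "scientist", "firefighter", "teacher"}
DEMOGRAPHIC_WORDS = {w.lower() for w in GENDERS + ETHNICITIES} | {"man", "woman"}

def maybe_diversify(prompt):
    words = set(prompt.lower().split())
    mentions_person = bool(words & PERSON_WORDS)
    already_specified = bool(words & DEMOGRAPHIC_WORDS)
    if mentions_person and not already_specified:
        return f"{prompt}, {random.choice(ETHNICITIES)} {random.choice(GENDERS)}"
    return prompt

print(maybe_diversify("a portrait of a CEO"))             # gets terms appended
print(maybe_diversify("a portrait of a male white CEO"))  # left unchanged
```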
I thought the same thing but I think the commenter is making a joke, but I could be wrong.
I think they are suggesting that things like this (neural nets etc) work using bias, and by removing "bias" the developers are making the product worse.
Anything to end Corporate Memphis. Even if we as illustrators will not have jobs or commissions. Let's hope that every creative human endeavour (painting, music, poetry) will be replaced and removed from the commercial realm. Then maybe we will see artistic humanism instead of synthetic trans-humanistic "pop art".
Happily for me, I stopped painting digitally a long time ago. I even stopped calling myself "an artist". Nowadays I paint and draw only with real media and call all of that "Archivist craftsmanship with analogue medium". :)
> Starting today, users get full usage rights to commercialize the images they create with DALL·E, including the right to reprint, sell, and merchandise. This includes images they generated during the research preview.
So DALL·E 2 is going to restart, revive and cause another renaissance of fully automated and mass generated NFTs, full of derivatives and remixing etc to pump up the crypto NFT hype squad?
Either way, OpenAI wins again as these crypto companies pour out tens of thousands of generated images to take their NFT griftopia off life support, reconfirming that it isn't going to die that easily.
Regardless of this possible revival attempt, 90% of these JPEG NFTs will eventually still die.
I don't see why there's any credible reason to expect that DALL-E will do anything at all to help those promoting the NFT silliness. Two separate issues.
> I think it has not been trained on NFT art (crypto punks and so on).
How exactly are you defining NFT art?
I mean, it can literally be anything: Dorsey sold a screencap of his 1st tweet, Nadya from Pussy Riot did some creative stuff, and the Ape crap was the bulk of this stuff that got passed around.
I think what can be gleaned from that short-lived nonsense is that value is subjective, and that the quality of a valuable piece of 'art' is equally hard to define. Much the same with its predecessor: cryptokitties.
Heads up: I think you meant "in vain" rather than "in vail". However, a similar phrase is "to no avail" which also means that something was not successful.