On Friday, a team of researchers at the University of Chicago released a research paper outlining "Nightshade," a data poisoning technique aimed at disrupting the training process for AI models, report MIT Technology Review and VentureBeat. The goal is to help visual artists and publishers protect their work from being used to train generative AI image synthesis models, such as Midjourney, DALL-E 3, and Stable Diffusion.
The open source "poison pill" tool (as the University of Chicago's press department calls it) alters images in ways invisible to the human eye that can corrupt an AI model's training process. Many image synthesis models, with the notable exceptions of those from Adobe and Getty Images, largely use data sets of images scraped from the web without artist permission, including copyrighted material. (OpenAI licenses some of its DALL-E training images from Shutterstock.)
AI researchers' reliance on commandeered data scraped from the web, which many see as ethically fraught, has also been key to the recent explosion in generative AI capability. It took an entire Internet of images with annotations (through captions, alt text, and metadata) created by millions of people to build a data set with enough variety to create Stable Diffusion, for example. Hiring people to annotate hundreds of millions of images would be impractical in terms of both cost and time. Those with access to existing large image databases (such as Getty and Shutterstock) are at an advantage when using licensed training data.