Scrape is one of those wonderful words producing diminishing returns the minute the syllable leaves your mouth. In the real world a scrape is no big deal. No one ever went to the hospital for a scrape. A scrape is not a cut no matter how you slice it.
When OpenAI poohbahs insisted they were merely scraping the Internet to create generative artificial intelligence (AI), purveyors of original content might be forgiven if they believed the wound would never rise to the level of a copyright-protected Band-Aid.
Of course, if one eschews appropriate counter-measures then untreated scrapes can lead to infections serious enough to kill you. Sepsis comes to mind. So does the ancient torture known as the scaphe, whereby skin would be scraped down to the bone. Even Job, with all the tribulations visited upon him by The Lord, added scraping to his list of tortures.
“Then Job took a piece of broken pottery and scraped himself with it as he sat among the ashes,” as recounted by Tom Holland in the brilliant Dominion: How The Christian Revolution Remade The World.
“A criminal sentenced to the scaphe had no free hands with which to scrape himself, of course,” Holland writes, “and yet the power to make flesh rot on the bones was, in the age of Persian greatness, a peculiarly terrifying marker of royal power. What, though, of the claim made by Darius and his heirs, that when they put their victims to torture they did so in the cause of truth, and justice, and light?”
Consider the implications of scraping fatalities as you weigh the intellectual property (IP) news of the day: The New York Times is suing both Microsoft Corp. and OpenAI, in all its corporate and nonprofit permutations, in the U.S. District Court for the Southern District of New York, for copyright infringement based on what the suit calls, in bold italics: “A Business Model Based on Mass Copyright Infringement.”
“Defendants seek to free-ride on The Times’s massive investment in its journalism,” the lawsuit states, alleging that they use “The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”
On the table, according to the self-reporting in the newspaper of record, are “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.”
“We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models,” an OpenAI representative responded in a statement. “Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”
In fact, OpenAI has already cut deals with the Associated Press wire service and the German publisher Axel Springer, owner of both Politico and Business Insider in the United States. Other countermeasures have been launched. Microsoft, for its part, has indicated it would indemnify its customers against copyright claims and pay legal costs. The venture capital firm Andreessen Horowitz (a16z) went even further in comments filed with the U.S. Copyright Office, saying copyright claims of any kind would be a death knell for the nascent AI industry.
“Imposing infringement liability for the use of copyrighted works in AI model training, notwithstanding the case law that clearly demonstrates why such uses are fair, would be extremely misguided,” according to a16z. “Among other things, it would upset at least a decade’s worth of investment-backed expectations that were premised on the current understanding of the scope of copyright protection in this country. The bottom line is this: imposing the cost of actual or potential copyright liability on the creators of AI models will either kill or significantly hamper their development.”
The comedian Sarah Silverman grasped the threat of generative AI copyright infringement in a New York minute: she filed suit in 2023 against Meta and OpenAI, claiming their large language models (LLMs) “ingested” her memoir for training purposes. The Authors Guild weighed in with its own lawsuit backed by the likes of bestselling authors Jonathan Franzen and John Grisham. Getty Images, though not averse to cutting AI deals for its copyrighted images, also filed suit this year against an AI company using Getty’s copyrighted visual materials without authorization.
All of this scrapping about AI model training and internet scraping conceals the real IP opportunity for creators of original intellectual property. The Sarah Silvermans and John Grishams of the world have an opportunity to create walled gardens of their own original one-of-a-kind IP.
The AI copyright opportunity makes me think of the vibrancy and chaos on the street in cities like Marrakesh, Morocco—and the difference once one steps within the walls of a riad, there to find a curated respite at worst and an earthly paradise at best.
My guess is some authors will always want to license their original IP for generative AI purposes, but any compensation of that kind will be split with a gazillion other content creators worldwide. The real money for creators, both individual and corporate, will come when people use the IP they own and control in an AI environment. When people come inside the walled garden, they should have permission to be there and/or be willing to pay for the right to leverage the IP on hand. Of course, enforcement is sure to be an issue.
“Content creators actively should monitor digital and social channels,” according to an article in the Harvard Business Review, “for the appearance of works that may be derived from their own.”
Good luck with that. Better to adopt technological solutions, like watermarks and metadata, known to everyone in tech, so there can be no confusion about permission and origin when it comes to using intellectual property. The idea, after all, is for creators to generate revenue—not just generative AI—in a world of original IP.
Take it from Job: scrapes of any kind are best avoided.