An Immodest Proposal to OpenAI and the NYT
To be clear, this post is not a derivative work of Jonathan Swift.
As you know, Bob, the New York Times has sued Microsoft and OpenAI for copyright infringement. I would like to propose a solution that should satisfy both parties. Of course, I am no lawyer, and this is no legal proposal; it's much more sweeping than that.
The problem, according to the NYT, is that OpenAI's large language models can be used to generate near-copies of NYT articles. Can these models be coerced into doing so easily, or consistently? Doesn't matter! The issue is that, if they can — and, more generally, if AI models generate what lawyers call "derivative works," be they articles or images unmistakably inspired by Greg Rutkowski — then compensation is due to the original copyright holder.
As a novelist / journalist / copyright-holder myself, I approve … but as an AI enthusiast / practitioner, I (and ultimately, I expect, the world) would be saddened to see copyright concerns slow or shackle this remarkable new technology, even when it is not being used to create derivative works. How to resolve this dilemma?
Fortunately there is a precedent of sorts. I give you the American Society of Composers, Authors, and Publishers, a.k.a. ASCAP. For more than a century they have dealt with a very similar situation: they license the public performance rights of musical works to venues, broadcasters, streaming services, etc.
You may never have heard of them. This is good! Their whole thing is to be an invisible middle layer. The idea is that "when a song is played, the user does not have to pay the copyright holder directly, nor does the music creator have to bill a radio station for use of a song."
Instead, ASCAP collects license fees from their “users” (10,000+ radio stations, 100,000s of bars / restaurants / etc., billions of public performances annually) and pays them back to rightsholders as royalties. Since 1941 they've operated under a consent decree such that "anyone unable to negotiate satisfactory terms with ASCAP ... may go to the oversight court in the Southern District of New York and litigate, and the terms set by the court will be binding."
Perfect! Right? We set up an ASCAP-like entity for generative AI, call it the Terran Society for Available Generative AI, or TSAGA. Whenever someone uses AI to generate a derivative work for any kind of commercial use, they must purchase a license for that work. (This could even be enforced at the API level for private models; see the sketch below.) TSAGA collects license fees and pays out royalties to copyright holders, just like ASCAP, and everyone has only a single entity to deal with. Simple! Well, compared to the alternatives.
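To make the API-level enforcement concrete, here is a minimal sketch of how a model provider might gate its outputs, assuming the TSAGA of this proposal existed. Everything here is hypothetical: `detect_derivative`, `LicenseReceipt`, and the placeholder fee are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class LicenseReceipt:
    work_id: str       # ID of the matched original work
    rightsholder: str  # who the royalty is owed to
    fee_cents: int     # license fee collected at generation time

def detect_derivative(text: str) -> Optional[str]:
    """Stand-in for TSAGA's derivative-work check (entirely hypothetical).
    Returns the matched work's ID, or None if the output looks original."""
    return None  # the real check would query TSAGA's classifier model

def release_output(model_output: str) -> Tuple[str, Optional[LicenseReceipt]]:
    """Gate a model's output at the API level: if it's judged derivative,
    collect a license fee before releasing it; otherwise pass it through."""
    match = detect_derivative(model_output)
    if match is None:
        return model_output, None
    receipt = LicenseReceipt(
        work_id=match,
        rightsholder="(looked up in TSAGA's registry)",
        fee_cents=25,  # arbitrary placeholder fee
    )
    return model_output, receipt
```

The point of the sketch is only that the toll booth sits at generation time, inside the provider's API, rather than somewhere downstream where enforcement gets hard.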
"But wait, Jon," you say, Bob, "this analogy doesn't hold at all. It's relatively easy for ASCAP to find venues, bars, restaurants, etc., determine what music they're playing, and sue them if they aren't licensees. But how will we track down derivative works across the entire Internet?"
To which I say: here we just follow the DMCA model! If derivative works aren't published with a TSAGA license, they get promptly taken down. (And of course rightsholders can choose to never accept derivative works at all.) The DMCA isn't perfect, but people generally admit, however grumblingly, that it works tolerably well.
"But wait, Jon," you say, Bob, getting a little frustrated now, "that's not my point, my point is, how can you possibly determine what is and isn't a derivative work, on the fly, automatically, in a consistent way? That's impossible! You can't just say 'I know it when I see it.'"
To which I say: "It's not 'I know it when I see it.' It's 'it knows it when it sees it.'
The italicized *it* being, naturally, the open-source, open-weights AI model that TSAGA will train for the sole purpose of determining whether something is a derivative work, one which all parties, and ultimately the courts, can test and accept. One with a public API anyone can use before publishing anything. One with regular updates and a good, fast appeals process. Think of it as a better Turnitin for generative AI.
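What might calling that public API look like? A minimal sketch, assuming a TSAGA that does not exist: the endpoint URL, request fields, and response shape below are all invented for illustration.

```python
import requests  # third-party HTTP client

# Hypothetical endpoint: TSAGA does not exist, and this URL, the request
# fields, and the response shape are all invented for illustration.
TSAGA_CHECK_URL = "https://api.tsaga.example/v1/check"

def check_before_publishing(text: str) -> dict:
    """Ask the (imagined) public TSAGA classifier whether `text` is a
    derivative work, before it goes anywhere near publication."""
    resp = requests.post(TSAGA_CHECK_URL, json={"content": text}, timeout=30)
    resp.raise_for_status()
    # Imagined response: {"derivative": bool, "matched_work": str | None,
    #                     "confidence": float, "appeal_url": str}
    return resp.json()

verdict = check_before_publishing("Draft article text ...")
if verdict["derivative"]:
    print(f"License required; apparent source: {verdict['matched_work']}")
    print(f"Disagree? Appeal at {verdict['appeal_url']}")
```

An appeal URL in every response is the load-bearing detail: the classifier is a first pass, not a final word, which is exactly what would make it palatable to courts and rightsholders alike.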
Of course, the above only makes sense in a world where derivative works created by generative AI become common enough to justify the expense of creating TSAGA. Will this actually happen? …I have my doubts. Is it actually easy to coax OpenAI's models into reproducing NYT articles today? It is not.
Moreover, this is obviously more a dream than an immodest proposal. It may be technically possible, but getting the courts to accept an AI model as even a first-pass authority on whether something counts as a "derivative work" is, let's face it, a dream. Today. But in the long run I think we might actually get used to using AI (with human oversight and expeditious appeals!) to administer, monitor, and correct AI processes and outputs. Indeed, it might even be the best of all possible AI futures.