
The Government of India has proposed a hybrid model to address the copyright concerns around generative AI. While the government deserves credit for thinking outside the box, the solution it has come up with is fatally flawed.
This is a link-enhanced version of an article that first appeared in the Mint. You can read the original here. For the full archive of all the Ex Machina articles, please visit my website.
Last week, the Department for Promotion of Industry and Internal Trade (DPIIT) released a working paper on ‘Generative AI and Copyright,’ recommending a ‘hybrid model’ that it claims will balance the need to promote AI development with creator rights. It suggests that AI companies in India should pay a mandatory blanket licence fee (a percentage of their global revenue) for using copyrighted materials to train their models. It recommends the establishment of a body called the Copyright Royalties Collective for AI Training (CRCAT) that will collect licence fees from AI developers for distribution to registered creators through existing Collective Management Organisations (CMOs).
While at first blush this might seem like an elegant solution, not only is this approach deeply flawed, it is likely to do more harm than good to the small creators it is supposed to protect.
To better understand this, let’s identify who the winners and losers are in this proposal.
First, let’s consider the AI companies. They will now have to pay a licence fee, which, even if it is a small percentage, will likely be a significant amount since it will be based on their global revenue. However, since AI revenues are derived from subscription and usage-based fees, this is a cost that I expect will be largely passed on to consumers. Which means that, in the long run, this will probably have a minimal impact on their profitability. On the other hand, the proposal shields AI companies from copyright lawsuits, which means that, for a small fee (much of which they can pass on), AI firms can eliminate a significant legal risk to their business model. They are clearly net winners.
Then, there are the intermediaries—the CRCAT as well as all the other CMOs—whose sole purpose is to collect and distribute licence revenues to creators. Collective licensing schemes such as this are little more than value transfer mechanisms, with intermediaries taking their cut throughout the chain. CMOs typically retain 9% to 15% of all money collected to cover ‘administrative expenses.’
In addition, collective licensing schemes result in ‘black box’ revenues—unmatched funds where artists entitled to licence fees cannot be identified—which the CMO could dispose of at its discretion. All of which seems to suggest that whichever way you cut it, intermediaries are clear winners. To make matters worse, given India’s poor track record with CMOs in the past, the creation of a new super-CMO does not inspire confidence.
Which brings us to the creators, where the issue is more nuanced. After all, there are many different types of creators. The ones who first come to mind are superstars—A.R. Rahman, Taylor Swift and the like—who already earn a substantial income from their creative works. Under the DPIIT proposal, these artists are likely to receive a disproportionately large share of the AI licence money, as the distribution mechanism requires that additional weightage be given to the relative value of each registered work. This means that a tiny handful of major artists (who are already very wealthy) will receive substantial payouts, while small creators would have to settle for a pittance.
What’s worse is that the DPIIT proposal prohibits any opt-out, so copyright holders cannot withhold their works from being used for AI training. While this is beneficial for the AI industry, small creators who are, in any case, only likely to receive a few paise in compensation, will have no way to remove their works from the training data. All they can do is register to get whatever little payment the government-appointed committee decides they are entitled to.
But there is an even more fundamental problem with this approach. The structure of AI training means that traditional licensing mechanisms, such as those suggested in the working paper, function very differently from the way they do in the music or publishing industries. While the proposal provides ‘legal certainty’ to AI companies and guarantees administrative fees for CMOs, it simply cannot provide meaningful income to the average individual creator.
In music streaming, a song is the ‘unit’ of value. Each play of the song entitles the artist to remuneration. This is easily measurable, and while administratively complex if managed through layers of intermediaries, it enables a clear correlation between use of the work and the artist’s remuneration.
The unit of value in AI is a ‘token’ (a fragment of a word or a small part of an image). Unlike music or text, where it is possible to establish a direct correlation between use of the work and the payment that must be made for it, attribution breaks down in the case of AI. The training process disperses each token’s influence across billions of model parameters and the model’s outputs emerge from this dense statistical mixture, making it impossible to trace any output back to a single token or an individual’s work.
Since attribution is likely to be a significant problem, much (if not all) of the licence revenue will end up in a ‘black box,’ from which no individual artist will be able to claim a share. The only ones who could are those so well established that their works are overrepresented in the dataset.
What the DPIIT has proposed is a Reverse Robin Hood—a legally sanctioned transfer of wealth from small individual creators to established artists and the intermediaries that serve them. It is hard to imagine an outcome further removed from the desired objective.
Rahul Matthan