It’s ironic that artificial intelligence companies freely “borrow” content from publishers, photographers, and artists to train their models, yet strongly object when other AI models use their data.
A recent example of this double standard is Reddit’s lawsuit against Anthropic, which accuses the company of using bots to access Reddit’s content without authorization. According to Reddit, Anthropic’s bots accessed the platform more than 100,000 times after July 2024, even though Anthropic had earlier assured Reddit that its bots were blocked from doing so.
According to Reddit, the lawsuit exposes an inconsistency in Anthropic’s behavior: its public persona as a law-abiding company contrasts sharply with its private misconduct. Reddit’s chief legal officer, Ben Lee, emphasized that the unauthorized access could be worth “billions of dollars,” underscoring the financial stakes of such data scraping.
He added that Reddit hosts nearly two decades of rich human conversation, which is vital for training language models.
Anthropic’s legal troubles don’t stop with Reddit. Last August, the company was hit with a class-action lawsuit in California by three authors who accused it of building a multibillion-dollar business by unlawfully using hundreds of thousands of copyrighted books. Earlier, in October 2023, Universal Music sued Anthropic for allegedly infringing on its copyrighted song lyrics.
This reflects a broader trend in which many AI companies are embroiled in copyright disputes. As the technology develops, companies increasingly rely on content scraped from the web to train their models.
While some publishers have struck licensing deals allowing their content to be used, many artists, writers, and photographers are becoming more vocal about protecting their intellectual property.