This is funny.
This week, Reddit moved to block search engines other than Google from crawling the site by updating its robots.txt file to turn away those engines' crawlers.
Microsoft’s Bing stopped crawling Reddit after the platform updated its robots.txt file on July 1, a change that effectively denies access to all unapproved search engines and prevents Reddit results from appearing in them.
Except for Google, of course.
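For readers unfamiliar with how robots.txt rules work, here is a minimal sketch in Python. The rules below are a hypothetical illustration of an "allow one crawler, block the rest" policy, not a copy of Reddit's actual file, and the `allowed` helper and example.com URL are assumptions made for the example. The check uses the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's actual file.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = HYPOTHETICAL_ROBOTS_TXT) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/r/python/"
    for bot in ("Googlebot", "bingbot"):
        print(bot, "allowed" if allowed(bot, page) else "blocked")
```

Keep in mind that robots.txt is a request rather than an enforcement mechanism: well-behaved crawlers honor it, but nothing in the file itself stops a scraper that chooses to ignore it.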
Reddit signed a data deal with Google in February, worth $60 million per year, which has seen Google significantly increase traffic to Reddit’s pages. That deal also appears to empower Reddit to set precedents regarding data access as it expands its revenue potential.
However, Reddit says this isn’t specifically related to the Google deal.
According to Reddit:
“This is completely unrelated to our recent partnership with Google. We have been in discussions with multiple search engines and have not been able to reach agreements with all of them, as some search engines are unable or unwilling to make enforceable commitments regarding the use of Reddit content, particularly in their AI.”
AI training has been a big focus for Reddit and X (formerly Twitter), with many nascent AI projects scraping both platforms to source human-generated input for their LLMs. Both X and Reddit have increased the price of API access to stop AI projects from profiting from their users' insights, which gives them more control over which AI projects can use that information.
Reddit’s move to restrict access to search scrapers is similarly an attempt to exert more control over data in order to maximize profits.
That makes sense. Now a public company, Reddit is seeking to drive shareholder value wherever it can, and building the business through a variety of avenues will be key to its long-term viability.
Reddit’s data is highly valuable because the community covers a wide range of niche topics and provides human insights and answers to common web queries, which helps improve AI chatbots and systems. That’s why Google chose to pay Reddit for access.
Reddit is currently seeking similar deals with other search engines, and if they don’t reach an agreement, it appears Reddit will cut off their access. This will reduce Reddit’s traffic to some extent by reducing referral links, but Reddit evidently believes its data is valuable enough to make that impact worth the risk.
It will be interesting to see if other platforms follow suit, and if Google and others are forced to make data deals to maintain scraper access. The company with the most valuable data will win the AI race, and Reddit arguably provides the highest quality data inputs, so more platforms and publishers may try to put a higher value on access as well.
This could mean that larger companies will secure valuable data partnerships and other companies will be forced to repeatedly train models on AI-generated output, squeezing many smaller AI projects out of the market.
That would mean lower-quality results and reduced usage for those projects, and ultimately, platforms like Reddit, Meta and X, which have a steady stream of user input, are likely to come out on top in this race.
Let’s see how it plays out.