Apple AI Bot Blocked: Top News Sites Say No To Data Scraping

Published on September 2, 2024

In recent months, several major news organizations have opted to block Apple’s data collection efforts for AI training.

Per a WIRED report, The New York Times, The Financial Times, The Atlantic, the USA Today network, and other news outlets have taken steps to prevent Apple’s web-crawling bot, Applebot-Extended, from using their data for AI training.

Applebot-Extended: A Tool for Controlling Data Usage

Applebot-Extended is an extension of Apple’s original web crawler, Applebot, which was first launched in 2015 to support search functions in Siri and Spotlight. With the growing focus on AI, Apple expanded the use of Applebot to gather data for training its AI models.

The introduction of Applebot-Extended allows publishers to specifically block their content from being used for this purpose without affecting its inclusion in Apple’s search products.

Website owners can block Applebot-Extended by adding a rule to their robots.txt file, a long-standing convention that tells web crawlers which parts of a site they may access.
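As a minimal sketch of what such a rule can look like, the Python snippet below embeds a sample robots.txt and checks it with the standard library's urllib.robotparser module. The Applebot-Extended and Applebot user-agent tokens come from the article; the example.com URL and the catch-all rule are illustrative assumptions, not any publisher's actual file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of Applebot-Extended (AI training)
# while leaving every other crawler, including Applebot, unaffected.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder URL used only for this sketch.
url = "https://example.com/news/story.html"
extended_allowed = parser.can_fetch("Applebot-Extended", url)
search_allowed = parser.can_fetch("Applebot", url)

print("Applebot-Extended allowed:", extended_allowed)
print("Applebot allowed:", search_allowed)
```

Under these assumptions, the check should report that Applebot-Extended is refused while Applebot, the crawler behind Siri and Spotlight search, is still permitted.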

The Role of Robots.txt in AI Training

Although compliance with robots.txt is voluntary and not legally enforced, most web crawlers have traditionally honored it.

This protocol has become increasingly central in the broader discussion around AI training, as it allows websites to selectively permit or deny access to various bots.
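The selective permit-or-deny behavior can be illustrated with a short Python sketch that reports which AI bots a given robots.txt turns away. The helper name blocked_bots and the sample policy are hypothetical; GPTBot and Google-Extended are assumed here as the user-agent tokens for the OpenAI and Google bots, and the check relies on the standard library's own matching rules rather than any particular crawler's implementation.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens for AI-related bots. Applebot-Extended is named in the
# article; GPTBot and Google-Extended are assumed tokens for OpenAI and Google.
AI_BOTS = ["Applebot-Extended", "GPTBot", "Google-Extended"]


def blocked_bots(robots_txt: str, bots: list[str], path: str = "/") -> list[str]:
    """Return the bots that the given robots.txt text bars from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


# Hypothetical policy: deny two AI bots, permit everything else.
SAMPLE = """\
User-agent: GPTBot
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Disallow:
"""

print(blocked_bots(SAMPLE, AI_BOTS))  # ['Applebot-Extended', 'GPTBot']
```

A publisher could run a check like this against its own file to confirm that each bot is treated as intended before relying on the rules.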

A recent investigation revealed that major tech companies had used content from thousands of YouTube videos to train AI models without creators’ knowledge or consent. Consequently, YouTuber David Millette filed a lawsuit against OpenAI, alleging the unauthorized use of video content for AI training.

Analysis of Website Blocking Trends

Recent analyses show that a small but significant percentage of high-traffic websites are already blocking Applebot-Extended.

WIRED cites a study by the AI-detection startup Originality AI, which found that around 7% of the 1,000 websites analyzed had implemented such a block.

Another cited analysis by Dark Visitors, an AI agent watchdog, reported that approximately 6% of 1,000 high-traffic websites had taken similar action.

Data journalist Ben Welsh told WIRED that over a quarter of the 1,167 news websites he surveyed had blocked Applebot-Extended, indicating that awareness and usage of the tool are gradually increasing.

Strategic Considerations Behind Blocking AI Bots

Other AI-specific bots, like those from OpenAI and Google, face more widespread blocking.

Welsh’s research indicated that 53% of the news sites he analyzed have blocked OpenAI’s bot, while nearly 43% have blocked Google-Extended.

The decision to block these AI bots often relates to broader publisher strategies. Some news organizations may hold back their data in hopes of striking partnership deals with AI companies.

For instance, Condé Nast, the parent company of WIRED, initially blocked OpenAI’s bots but reversed this decision after entering into a partnership with the company.

AI’s Growing Role in Content Creation

WIRED describes a trend in which access to data is becoming a bargaining chip in negotiations between content creators and AI developers.

Billion Dollar Boy’s March report sheds light on the widespread adoption of generative AI among creators, with 90% using the technology to produce content. Moreover, 91% of creators said they use generative AI at least weekly.

The findings were equally striking for brands: 92% of marketers surveyed have commissioned creator content made fully or partially with generative AI.

Furthermore, Twicsy’s recent study revealed a disparity in earnings and engagement between human and AI influencers in social media marketing.

Human influencers earned an average of $78,777 per post, roughly 46 times the $1,694 average for their AI counterparts. Revenue streams differed as well: 37% of human influencers generate income through brand partnerships, while only 8% of AI influencers rely on this traditional model.

The Influencer Marketing Factory’s May report showed that 36% of surveyed Americans demand transparency from AI influencers, believing they should disclose their non-human nature in their social media profiles and bios.