ByteDance’s Web Scraper Is On A Massive Data Grab Spree, Outpacing OpenAI, Google
ByteDance, the parent company of TikTok, launched a web crawler called Bytespider that is collecting online data at a rate far exceeding that of other major tech companies.
Fortune cites research from bot-management firm Kasada showing that Bytespider, which began operating in April, is scraping web data roughly 25 times faster than GPTBot, the crawler OpenAI uses for its ChatGPT platform.
The ByteDance bot also outpaces similar tools from Google, Meta, Amazon, and Anthropic, collecting data at 3,000 times the rate of Anthropic’s ClaudeBot.
Kasada CEO Sam Crowther told Fortune that Bytespider’s activity has intensified over the past six weeks, with significant spikes in scraping observed.
The bot does not respect robots.txt, a file website publishers use to tell crawlers which content should not be collected.
Several major news organizations have used this file in recent months to block Apple’s data collection for AI training.
This aggressive data collection comes as ByteDance faces potential challenges in the U.S. market. Legislation recently signed by President Biden, citing national security concerns, requires ByteDance to sell TikTok or face a U.S. ban.
ByteDance’s AI Ambitions
The surge in data scraping suggests ByteDance is ramping up efforts to develop its AI capabilities.
Earlier this month, reports emerged that ByteDance plans to use Huawei chips to develop a new AI model as China intensifies efforts to reduce dependence on U.S. technology amid tightening export controls.
The company reportedly ordered more than 100,000 Huawei chips but had received fewer than 30,000 as of July.
Last year, the company reportedly used OpenAI’s technology to build its own large language model (LLM), a practice that violates OpenAI’s terms of service. Earlier this year, ByteDance released Doubao, a chat-based LLM.
According to Fortune, industry sources familiar with ByteDance’s strategy indicate the company is likely developing a new LLM.
Online Search War?
One potential application for this technology is enhancing TikTok’s search function. The platform recently updated its search capabilities for advertisers, allowing them to identify trending keywords in real time for ad targeting.
A source close to the company said that an improved AI model trained on current internet trend data could further expand TikTok’s search environment.
This development could challenge Google’s dominance in digital advertising by offering a “completely biddable space with keywords and topics” within TikTok’s popular platform.
Recent research by Forbes Advisor and Talker Research revealed a shift in online search behavior among Gen Z, with many opting for “social searching” on platforms like TikTok and Instagram over traditional search engines like Google.
According to the study, 45% of Gen Z are more likely to use social media for searches, compared to about 35% of millennials, 20% of Gen X, and less than 10% of Boomers.
ByteDance and TikTok representatives did not respond to Fortune’s requests for comment on the research findings or their AI development plans.