Webmasters are raising alarms over Facebook’s aggressive crawling practices, particularly the behavior of its user agent, “facebookexternalhit,” which is causing significant strain on web hosting servers.
Many site owners are reporting that these bots are generating overwhelming spikes in traffic, leading to performance issues that could jeopardise site reliability.
One webmaster described their experience, stating, “Our website gets hammered every 45 to 60 minutes with spikes of approximately 400 requests per second from 20 to 30 different IP addresses within Facebook’s netblocks.
Between these spikes, the traffic is manageable, but the sudden load is risky.” This webmaster, like many others, expressed a desire for Facebook’s bots to distribute their requests more evenly over time, akin to the behavior of Googlebot and other search engine crawlers.
The excessive requests from Facebook’s bots have not only disrupted the user experience but have also caused costly resource consumption for many site owners.
With smaller websites particularly affected, some have resorted to adding stricter rules to their robots.txt files in a bid to protect their servers. However, because Facebook's bot behaves as a scraper rather than a conventional crawler, it simply ignores those directives.
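For illustration, the rules site owners describe adding usually look something like the sketch below. The directives shown are examples only, and the non-standard Crawl-delay line is honored by some bots and ignored by others; as noted above, facebookexternalhit reportedly disregards them either way.

User-agent: facebookexternalhit
Crawl-delay: 60
Disallow: /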
The problem has sparked discussions within the web development community, with many experts urging Facebook to reconsider its crawling strategies.
To combat this, many webmasters are turning to Cloudflare, which offers robust tools to manage traffic and implement rate limiting.
By setting up rate limiting, webmasters can effectively throttle the number of requests coming from Facebook’s bots, helping to alleviate server strain during peak times.
“I don’t want to block the bot entirely, but the current pattern is unsustainable,” the webmaster added. “Using Cloudflare’s rate limiting has allowed us to protect our site while still enabling Facebook to access our content for link previews.”
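A rate-limiting rule for this scenario can be expressed in Cloudflare's rules language along the lines of the sketch below. The thresholds are illustrative rather than taken from any of the affected sites, and the fields are shown as they appear in the dashboard rather than as an exact API payload.

Expression:              (http.user_agent contains "facebookexternalhit")
Counting characteristic: source IP address
Requests per period:     100
Period:                  60 seconds
Action:                  Block for the remainder of the period

A rule of this shape leaves ordinary preview fetches untouched while cutting off the bursts of hundreds of requests per second that webmasters describe.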
In a Cloudflare forum post, one user wrote, “I am writing to express my concern about the excessive crawling activity of Facebook’s crawler. This excessive crawling is causing significant performance issues and potential downtime for our website.”
“Our web server logs indicate that Facebook’s crawler (facebookexternalhit/1.1 – 2a03:2880:22ff:7::face:b00c) is making multiple requests to our WordPress website every second, even during off-peak hours,” the user continued.
“During peak hours, the crawler’s activity spikes to tens of thousands of requests per minute. This excessive crawling is overwhelming our servers and causing them to slow down or even crash.”
“We understand that Facebook’s crawler is necessary to index our website and make our content available to its users. However, we believe that the current level of crawling is excessive and unreasonable.”
For now, webmasters remain vigilant, closely monitoring their server performance and adjusting settings as they navigate the challenges posed by Facebook's crawlers and other bad bots.
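One low-effort way to quantify the problem before changing any rules is to tally the crawler's hits per minute straight from the access log. The short Python sketch below assumes a standard combined-format log in which the timestamp sits in square brackets and the user agent appears in the request line; the log path is illustrative and should be adapted to the server in question.

#!/usr/bin/env python3
# Sketch: count facebookexternalhit requests per minute in an access log.
# Assumes combined log format, e.g. timestamps like [20/Aug/2024:14:03:07 +0000].
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # illustrative; adjust to your server
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2})")  # day/mon/year:HH:MM

counts = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "facebookexternalhit" not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1

# Print the busiest minutes first so the spikes stand out.
for minute, hits in counts.most_common(20):
    print(f"{minute}  {hits} requests")

Run against a day's log, this makes the reported pattern easy to verify: long quiet stretches punctuated by minutes containing thousands of requests.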
The outcome of this situation could set important precedents for how major tech companies manage web scraping and crawling in the future.