Digital memory at stake: Why news outlets block the Wayback Machine

Deutsche Welle | April 22, 2026 4:40 PM CST

The "Wayback Machine," custodian of digital memory, is fighting for its survival. An increasing number of media outlets are refusing to allow the Internet Archive to archive their content.

For 30 years, the archive.org internet platform has been archiving digital content. The "Wayback Machine" contains more than 1 billion archived web pages and is considered an indispensable tool for journalists, researchers, historians and lawyers who wish to view deleted or modified online content in its original form. However, this unique project, initiated by a San Francisco-based non-profit, is facing an existential crisis, and the most recent threat comes, of all places, from those who need the archive most urgently: the media themselves.

A growing number of major media outlets are denying the Internet Archive access to their content. According to research conducted by the Nieman Foundation for Journalism at Harvard University, at least 241 news outlets from nine countries are blocking the archive's web crawlers. They include the UK's Guardian, the New York Times, France's Le Monde and the largest US newspaper conglomerate, USA Today Co.

Media outlets deprive themselves of an important tool

USA Today itself recently published a sensational report on efforts by the US immigration authority ICE to systematically withhold information on its detention policy. USA Today's research was based on archive.org's Wayback Machine. The same corporation that could publish this story only with help from the archive is now blocking access to its own content.

Why, then, do media outlets refuse to contribute to a tool they use themselves? The answer is simple: fear of artificial intelligence. Publishing houses fear that AI firms such as OpenAI or Google will, via the archive, tap into their journalistic content on a massive scale in order to train their language models, without permission or compensation.

A spokesperson for the New York Times, Graham James, put it bluntly: "The issue is that Times content on the Internet Archive is being used by AI companies in violation of copyright law to directly compete with us."

AI bots sending up to tens of thousands of requests per second

Data does in fact show that bots were used on a massive scale on the archive.org website to search for content from media outlets in order to train AI models, thus gaining access to the very data that is being withheld from them. Speaking to Wired magazine, Mark Graham, Director of the Wayback Machine, confirmed that several companies had intermittently accessed the archives with tens of thousands of requests per second, until servers were temporarily overloaded.

Archive.org was not prepared for such a scenario, because the non-profit is committed to an open internet. Its motto is: "Like a paper library, we provide free access to researchers, historians, scholars, people with print disabilities, and the general public. Our mission is to provide Universal Access to All Knowledge." That mission rules out excluding bots and crawlers, and it has now led to sanctions from major publishers and media outlets.

The human rights organization Electronic Frontier Foundation (EFF), which specializes in digital issues, put it like this: "Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper."

History of the internet is in danger of being lost

In the meantime, more than 100 journalists have signed a petition in support of the Internet Archive. In their open letter they write: "In a digital media landscape where articles disappear due to link rot, corporate consolidation, or cost-cutting, reporters frequently rely on the Archive's Wayback Machine to recover pages that would otherwise be lost. Without that ongoing work to preserve the web, large parts of journalism's recent history would already be lost."

Mark Graham told Wired magazine that he was in talks with media outlets with the aim of restoring access. How this will turn out remains to be seen. His preliminary conclusion, however, sounds like a warning: "There's no question that the general locking-down of more and more of the public web is impacting society's ability to understand what's going on in our world."

'Web archives are part of public infrastructure'

Martin Fehrensen, a media journalist and the founder of the German website socialmediawatchblog.de, believes that archive.org is the only functioning chain of custody for the open web. If it were unable to perform its functions, there would be significant implications, he told DW.

"Millions of Wikipedia source notes lose their roots. Research on platform accountability (which terms and conditions applied when, changes to moderation rules) will become significantly more difficult, and digital evidence that can stand up in court ceases to exist," he said, adding that it was completely absurd for media outlets, in particular, to block access to the archive.

There are two ways to resolve the conflict, he believes. "We need a publisher dialogue with a clean technical separation between archiving and AI training, because that is the real conflict, not the archive." According to Fehrensen, a special legal status for web archives needs to be established in the medium term. And in the long term, "web archiving has to be treated like public infrastructure, not as a single project by a San Francisco-based NGO. The fact that, in 2026, it's still dependent on one single organization is the real structural failure."

One conflict among many, but the most dramatic

This is not the first time the Internet Archive has had to fight for its survival. During a cyberattack in September 2024, data was stolen from 31 million user accounts, which was a severe blow to the organization. That same year, the archive lost the copyright dispute "Hachette v. Internet Archive" in a US appeals court: publishing houses Hachette, Penguin Random House, HarperCollins and Wiley had successfully sued over the free e-book lending program the archive had launched during the COVID-19 pandemic. More than 500,000 books had to be removed from the program; nonetheless, archive.org faces damage claims amounting to millions of dollars.

Compared to those setbacks, the current threat posed by media blockades is structurally more serious, because it cannot be resolved by a court verdict or a software update. It is the result of numerous corporate decisions that jointly undermine the Wayback Machine's core mission: complete archiving of the public web.

This article was originally published in German.