Reddit says it will block the Internet Archive from indexing every page but its homepage, after catching AI companies scraping its data from the Wayback Machine

Related Coverage

Amanda Yeo / Mashable: Reddit is blocking Wayback Machine from archiving users' posts
AdExchanger: Price And Promo; Reddit Ditches The Wayback Machine
Rohit Singh / MEDIANAMA: Reddit Blocks Internet Archive's Wayback Machine from Accessing Most Of Its Content
Matt Schimkowitz / The A.V. Club: Reddit halts the Wayback Machine because of AI scrapers
Jibin Joseph / PCMag: Reddit Is Blocking Internet Archive to Halt Free Scraping of User Data
Florence Nightingale / Cyber Security News: Reddit to Block Internet Archive as AI Companies Have Scraped Data From Wayback Machine
Aabhas Sharma / Moneycontrol: Reddit restricts Internet Archive access over data scraping concerns
The Indian Express: Reddit blocks Internet Archive's Wayback Machine from scraping data: What is it?
Andrew Hutchinson / Social Media Today: Reddit Moves to Restrict The Internet Archive from Accessing its Communities
Mike Wheatley / SiliconANGLE: Reddit says its blocking the Internet Archive to stop sneaky AI scrapers accessing its content
Karissa Bell / Engadget: Reddit is restricting its availability to the Internet Archive's Wayback Machine
Forums
Beehaw: Reddit blocks Internet Archive to end sneaky AI scraping
Msmash / Slashdot: Reddit Will Block the Internet Archive
Ars OpenForum: Reddit blocks Internet Archive to end sneaky AI scraping
See also Mediagazer

Jay Peters / The Verge, 2025-08-12

Discussion

@carnage4life Dare Obasanjo on bluesky
While understandable given the rise of AI scrapers, it's sad to see the open web dying a little each day. — Future generations won't realize how much they've lost.
@natanael on bluesky
The dream of an open internet slowly dying to greed from both AI companies and platforms acting as gatekeepers
@cameronwilson Cameron Wilson on bluesky
we need to make a list of “open internet things that AI crawlers have ruined for all of us” www.theverge.com/new...
@mtrc Mike Cook on bluesky
Fascinating feedback loops going on here. It's easier to take action against the smaller public entity than the actual source of the problem, so everyone else pays the price.
@asura.dev on bluesky
Saw this coming from a thousand miles away — Similar news, Perplexity falls back to alternative sources if it can't reach primary — This is a fool's errand - not because I'm “pro AI” or whatever nonsense waffle party crap, but because it simply is. Reddit always tries to own user generated content. …
@jaubertmoniker.net on bluesky
today in ‘nobody who writes about tech knows how it works’. is reddit looking at IA's logs. how did they “catch” IA (they didn't, they are guessing or outright lying)
@ernie.tedium.co Ernie Smith on bluesky
AI scrapers ruining things, Internet Archive edition — www.theverge.com/news/757538/ ...
@xkeeper.net on bluesky
fondly reminded of when a gaming news website blocked IA in robots.txt with “Internet Archive provides us no value”, and then eventually shut down, leaving their entire site dead and gone — also i'm very curious how they would know if an AI company was scraping from IA. smells like some BS
@mikemcbrideonline.com Mike McBride on bluesky
AI data greed is ruining everything.
@thedextriarchy Adi Robertson on bluesky
I've been dreading this turn for months — the war over AI scraping data is undercutting a load-bearing part of the internet, and it will probably only get worse.
@quinnypig.com Corey Quinn on bluesky
“Front Page of the Internet” throws up a paywall.
@ednewtonrex Ed Newton-Rex on x
AI companies are scraping content from the Wayback Machine for AI training. So if your website isn't blocking the ia_archiver crawler, your work is being used to train AI models. Reddit is now blocking the Internet Archive because of this. Many others will do the same. Exploitative AI training is killing the open web.
@matthewphone Matt on x
@BlackenedEDGE93 @verge The current data has already been scraped and you can literally watch the AI browse through the internet in real time. GPT5 is already circumventing bot detection, Gemini and GPT are being baked in at the browser level as we speak. Nothing reddit does to the IA is going to stop…
@adbusters on x
@verge “Reddit says that it has caught AI companies scraping its data from the Internet Archive's Wayback Machine” Well that's a bullshit excuse for making the web less transparent.
@neolithicoffic1 on x
@Pirat_Nation Decades later, Reddit will die, and the Internet Archive will still be there. And nobody will remember Reddit anymore.
@pengwinpants on x
What the hell is happening to the internet? 🚫 Internet Archive/Wayback nuked 🚫 Book sites wiped 🚫 Forced ID checks They're killing the open web in real time and calling it “safety.” No wonder why us web3 nerds exist https://www.theverge.com/...
@drowbb on x
The Wayback Machine will no longer be able to crawl post detail pages, comments, or profiles. It will only be able to index Reddit's homepage, which means Internet Archive will only be able to archive insights into which news headlines and posts were most popular on a given day
@matthewphone Matt on x
@verge Something to do with the Gov taking Internet Archive I'm sure. https://blog.archive.org/... The AI can and will scrape Reddit in real time.
@natejhake Nate Hake on x
Generative AI is destroying the foundations of the Internet Welcome to the feudal walled garden era of the web ...
@shiburizu on x
Difficult to understate how massive this is. Many communities (FGC, and otherwise) use Reddit or have a popular associated sub which hosts information. Any information left behind in comments will likely be lost when deleted (intentionally or not) now.
r/technology on reddit
Reddit will block the Internet Archive
r/InternetBrasil on reddit
Reddit will start blocking the Internet Archive
r/LateStageCapitalism on reddit
Reddit will block the Internet Archive
r/SomeOrdinaryGmrs on reddit
Reddit will block the Internet Archive
r/Archivists on reddit
Reddit will block the Internet Archive
r/TwoBestFriendsPlay on reddit
Reddit will block the Internet Archive
r/Libraries on reddit
Reddit will block the Internet Archive
r/internetarchive on reddit
Reddit will block the Internet Archive | The company says that AI companies have scraped data from the Wayback Machine, so it's going to limit what the Wayback Machine can access.
r/DataHoarder on reddit
Reddit will block the Internet Archive
r/Archiveteam on reddit
Reddit will block the Internet Archive
r/gratefuldead on reddit
Reddit will block the Internet Archive
r/Fauxmoi on reddit
Reddit will block the Internet Archive
r/popculturechat on reddit
Reddit will block the Internet Archive
r/redditstock on reddit
Reddit will block the Internet Archive as it was used to train AI by circumventing RDDTs content policy
r/inthenews on reddit
Reddit will block the Internet Archive
