Archiving your own internet

May 23

You should have your own software to defend your own internet from linkrot


I've been eyeing ArchiveBox [ https://archivebox.io/ ] for a while now. When I first came across it I was worried about affording the storage space, but I was also very poor back then, and I *could* buy more storage now.

---

I *am* in the habit of using grab-site (and later browsertrix-crawler: the 2026 Internet is less friendly to scrapers than even 2024, and browsertrix-crawler is better at getting around this) on my own blogs and on sites that I otherwise particularly like, and I have the Internet Archive extension installed and keep an eye out for pages marked with a 0. (*Don't* turn on automatic uploads, or you'll get fucked whenever you find out too late that a site was still using URL security-by-obscurity. I speak from personal experience.)

Come to think of it, I wonder if the increased anti-scrape measures on the post-LLM Internet are giving your rot counter some false(-ish) positives. Like, you can't naively scrape Tumblr at all anymore: if you're not using anti-anti-scrape, it will appear as if the site no longer exists.
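To make the false-positive point concrete, here's a minimal Python sketch of one way a rot counter could avoid it: only mark a link dead when a browser-based recheck (the anti-anti-scrape step) also fails. The `classify` function, the `LinkState` names, the convention that status 0 means a connection-level failure, and the set of status codes treated as bot-blocking are all my own assumptions for illustration, not how any particular tool works.

```python
from enum import Enum
from typing import Optional


class LinkState(Enum):
    ALIVE = "alive"
    DEAD = "dead"
    SUSPECT = "suspect"  # broken-looking, but not confirmed as rot


# Assumption: codes that more often mean "you look like a bot" or
# "slow down" than "this page is gone".
BLOCKING_CODES = {401, 403, 429, 503}


def _looks_ok(status: int) -> bool:
    # Final status after following redirects; 0 stands in for a
    # connection-level failure (DNS error, timeout, refused).
    return 200 <= status < 400


def classify(naive_status: int, browser_status: Optional[int] = None) -> LinkState:
    """Classify a link from a plain HTTP fetch plus an optional
    browser-based recheck (None means no recheck was done)."""
    if _looks_ok(naive_status):
        return LinkState.ALIVE
    if browser_status is None:
        # A failed naive fetch alone could just be an anti-scrape wall.
        return LinkState.SUSPECT
    if _looks_ok(browser_status):
        # The naive failure would have been a false positive.
        return LinkState.ALIVE
    if browser_status in BLOCKING_CODES:
        # Still walled off even for a real browser: unknown, not dead.
        return LinkState.SUSPECT
    return LinkState.DEAD
```

This still wouldn't catch a site that serves a normal 200 page saying nothing is there; spotting those soft 404s would mean looking at the page content too.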

---

I'm surprised that I'm the first person to notify the Wayback Machine about this post.


Croissanthology

May 25

Hmm, that's strange and concerning: my own software linked above should be backing up this post automatically! Actually, wait, I think it's just that I've been feeding it individual links for troubleshooting and haven't yet plugged in the domain as fodder for automatic backups.

On false positives: yeah, definitely. I didn't bother looking at the links individually myself, but there's a good chance a large chunk of those yellow dots are open to humans but not to AIs. This is still kind of sad, and a way for the internet to die a little. The cozy web is probably not the best outcome (cf. thebes: https://x.com/voooooogel/status/2033713553841242212?s=20).