Discussion about this post


I've been eyeing ArchiveBox [ https://archivebox.io/ ] for a while now. When I first came across it, I was worried about affording the storage space, but I was also very poor back then, and I *could* buy more storage now.

---

I *am* in the habit of using grab-site (and later browsertrix-crawler: the 2026 Internet is less friendly to scrapers than even 2024, and browsertrix-crawler is better at getting around this) on my own blogs and on sites that I otherwise particularly like, and I have the Internet Archive extension installed and keep an eye out for pages marked with a 0. (*Don't* turn on automatic uploads, or you'll get fucked whenever you find out too late that a site was still using URL security-by-obscurity. I speak from personal experience.)

Come to think of it, I wonder if the increased anti-scrape measures on the post-LLM Internet are giving your rot counter some false(-ish) positives. Like, you can't naively scrape Tumblr at all anymore: if you're not using anti-anti-scrape, it will appear as if the site no longer exists.
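
A minimal sketch of how a rot counter might keep those two cases apart, assuming it records the HTTP status of each link check. The function name, the labels, and the choice of which statuses count as "possibly blocked" are all hypothetical, not anything the post's counter is known to do. It also can't catch a site that answers scrapers with a plain 404, which may be what the Tumblr case looks like.

```python
from typing import Optional

# Assumption: these groupings are a heuristic, not a standard.
GONE = {404, 410}                    # server says the page does not exist
MAYBE_BLOCKED = {401, 403, 429, 503}  # often returned to unwelcome scrapers


def classify_status(status: Optional[int]) -> str:
    """Label one link check; `status` is None when no HTTP response arrived."""
    if status is None:
        return "unreachable"
    if status in GONE:
        return "gone"
    if status in MAYBE_BLOCKED:
        # Not counted as rot: the page may be fine for a human in a browser.
        return "possibly_blocked"
    if 200 <= status < 400:
        return "alive"
    return "unknown"


if __name__ == "__main__":
    for code in (200, 404, 403, 429, None):
        print(code, classify_status(code))
```

Anything labelled "possibly_blocked" would want a second look with a real browser (or something like browsertrix-crawler) before it gets counted as rot.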

---

I'm surprised that I'm the first person to notify the Wayback Machine about this post.
