Does lemmy have any communities dedicated to archiving/hoarding data?
This post foreshadowed today’s AWS outage.
👀
Welcome to datahoarders.
We’ve been here for decades.
Also, follow 3-2-1, people: 3 copies, 2 storage media, 1 offsite.
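For anyone who wants to sanity-check their own setup, here is a minimal sketch of the 3-2-1 rule as a check. It reads the "3" as total copies of the data, which is how the rule is commonly stated. The `Copy` record and the example plan are made up for illustration, not anyone's real setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical record, for illustration only)."""
    label: str
    medium: str    # e.g. "ssd", "hdd", "tape", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True for >= 3 copies, on >= 2 distinct media, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Assumed example plan: desktop SSD, home NAS, and a drive kept elsewhere.
plan = [
    Copy("desktop", "ssd", offsite=False),
    Copy("nas", "hdd", offsite=False),
    Copy("drive at a friend's house", "hdd", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Dropping the offsite drive from the plan makes the check fail, which is the PSU story further down in a nutshell: two copies in the same box count as one when the box dies.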
“backups”? Pray tell, fine sir and or madam, what is that?
You know there are only two kinds of people: those who do backups and those who haven’t lost a hard drive/data before. Also: RAID is not a backup.
Still remember the PSU blast taking out my main drive plus my backup drive in like 2001. I thought I was so good because I at least had a backup 😑. Those were the days 🤷🏻♀️
That sounds like an adventure!
Ya, that was me learning that a dinky PSU is your worst enemy. I upgraded my SO’s old Duron to an Athlon for work, which used more energy…
My condolences! That said Athlons were late 90s (?) cool.
I have been archiving Linux builds for the last 20 years so I could effectively install Linux on almost any hardware since 1998-ish.
I have been archiving Docker images to my locally hosted GitLab server for the past 3-5 years (not sure when I started tbh). I’ve got around 100 GB of images ranging from core images like OS to full app images like Plex, ffmpeg, etc.
I also have been archiving foss projects into my gitlab and have been using pipelines to ensure they remain up-to-date.
the only thing I lack is packages from package managers like pip, bundler, npm, yum/dnf, apt. there’s just so much to cache it’s nigh impossible to get everything archived.
I have even set up my own local CDN for JS imports on HTML. I use rewrite rules in nginx to redirect them to my local sources.
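To make the idea concrete for readers: this is not nginx configuration, just a rough Python sketch of the mapping such rewrite rules express (public CDN URL in, local mirror URL out). The hostnames, the `cdn.home.lan` mirror and the `to_local` helper are all assumptions for illustration; the real rules will differ.

```python
from urllib.parse import urlsplit

# Hypothetical mapping: public CDN host -> base URL of a local mirror.
LOCAL_MIRRORS = {
    "cdn.jsdelivr.net": "http://cdn.home.lan/jsdelivr",
    "cdnjs.cloudflare.com": "http://cdn.home.lan/cdnjs",
}


def to_local(url: str) -> str:
    """Rewrite a known CDN URL to the local mirror; leave other URLs untouched."""
    parts = urlsplit(url)
    base = LOCAL_MIRRORS.get(parts.hostname or "")
    if base is None:
        return url
    # Keep the path as-is so the mirror can use the same directory layout.
    # Query strings are dropped in this sketch.
    return base + parts.path


print(to_local("https://cdn.jsdelivr.net/npm/vue@3/dist/vue.global.js"))
```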
my goal is to be as self-sustaining on local hosting as possible.
respectable level of hoarding 🏅
Everyone should have this mindset regarding their data. I always say to my friends and family, “If you like it, download it.” The internet is always changing and that piece of media that you like can be moved, deleted, or blocked at any time.
The pornhub collapse should have taught the average person that.
You’re awesome. Keep up the good work.
I would also add Openstreetmap to the list
I also recommend downloading “Flashpoint archive” to have flash games and animations to stay entertained.
There is a 4gb version and a 2.3TB version.
That’s quite the range
When I downloaded it years ago it was 1.8TB. It’s crazy how big the archive is. The smaller one is just so it’s accessible to most people.
Is that Flash exclusive or do they accept other games from that era?
I’m not sure, but I do think it’s just flash
Neither are that bad honestly. I have jigdo scripts I run with every point release of Debian and have a copy of English Wikipedia on a Kiwix mirror I also host. Wikipedia is a tad over 100 GB. The source, arm64 and amd64 complete repos (DVD images) for Debian Trixie, including the network installer and a couple live boot images, are 353 GB.
Kiwix has copies of a LOT of stuff, including Wikipedia on their website. You can view their zim files with a desktop application or host your own web version. Their website is: https://kiwix.org/
If you want (or if Wikipedia is censored for you) you can also look at my mirror to see what a web hosted version looks like: https://kiwix.marcusadams.me/
Note: I use Anubis to help block scrapers. You should have no issues as a human other than you may see a little anime girl for a second on first load, but every once in a while Brave has a disagreement with her and a page won’t load correctly. I’ve only seen it in Brave, and only rarely, but I’ve seen it once or twice so thought I’d mention it.
I rarely get bounced by Anubis, but oddly enough it has happened to me a couple times in FF, I suspect it’s the fingerprinting resistance settings that cause this to happen? Hasn’t happened in a while though
I can answer one part of your question. Yes, it’s not as big as you think it is.

does this include images?
No
With images, it is 111.08 GB
That’s still incredibly low, I’d have assumed an enormous increase.
Compressed or uncompressed? Can it be directly read?
Can be read directly, like normal Wikipedia.
That’s very nice. Does it also include other languages, or would that take more space?
This is English only. Other languages are downloaded separately, though they typically take less space.
Nice.
How about when previous versions of pages are included (excluding images)?
Not sure, I don’t have that option. I can imagine it’s not much more, if proper version history management is involved.
Yeah, seems like there’s nothing as simple as something similar to a git clone available.
One would probably have to download multiple full copies from different times and then merge them with deduplication, to get that answer.
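A rough sketch of what that merge-with-deduplication could look like, assuming each snapshot has already been extracted into a plain `title -> text` mapping (that extraction step is not shown, and the two tiny snapshots below are invented). Identical page texts are stored once, keyed by their SHA-256 digest.

```python
import hashlib


def merge_snapshots(snapshots):
    """Merge snapshots (oldest first), storing each distinct page text once.

    Returns (store, history):
      store   maps sha256 hex digest -> page text
      history maps page title -> list of digests, one per distinct revision
    """
    store = {}
    history = {}
    for snapshot in snapshots:
        for title, text in snapshot.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            store.setdefault(digest, text)  # identical text is kept only once
            revisions = history.setdefault(title, [])
            if not revisions or revisions[-1] != digest:
                revisions.append(digest)  # record only actual changes
    return store, history


# Invented example data: one page changed between snapshots, one did not.
snap_2023 = {"Debian": "Debian is a distro.", "Kiwix": "Offline reader."}
snap_2024 = {"Debian": "Debian is a Linux distro.", "Kiwix": "Offline reader."}

store, history = merge_snapshots([snap_2023, snap_2024])
print(len(store), len(history["Debian"]), len(history["Kiwix"]))
```

The size of `store` relative to the sum of the snapshots is the answer to "how much more space": unchanged pages cost nothing extra. Whole-page hashing is the crudest option; storing diffs between revisions would save more, but that is a different technique from the one sketched here.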
I thought the whole point of torrenting was to decentralise distribution. I use torrents to get my distros.
In my own little bubble, I thought that’s how most people got their distro.
What happens when they just cut the underwater cables? Torrent over carrier pigeon for a linux distro would take ages
Sneakernet to the rescue. Some of you are too young to know about walking around with boxes full of disks.
A wise man once said
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.
It was trading CD-Rs during my high school days… good times. Napster was just starting to take off by the time we had a CD-R trading network set up. Napster just increased the number of CDs that got passed around.
Pigeon latency is horrible, but the bandwidth is pretty great. You could probably load up an adult pigeon with at least 12TB of media.
https://en.wikipedia.org/wiki/IP_over_Avian_Carriers
Just gonna leave this here for whoever wants to read more on the methodology and potential risks.
Over a 30-mile (48 km) distance, a single pigeon may be able to carry tens of gigabytes of data in around an hour, which on an average bandwidth basis compared very favorably to early ADSL standards, even when accounting for lost drives.
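For the curious, the "average bandwidth" in that quote is just payload divided by flight time. A small sketch, where the 32 GB payload is an assumed stand-in for "tens of gigabytes" and the helper name is made up:

```python
def average_mbit_per_s(payload_gb: float, hours: float) -> float:
    """Average throughput in Mbit/s for a payload (decimal GB) delivered in `hours`."""
    bits = payload_gb * 1e9 * 8
    return bits / (hours * 3600) / 1e6


# Assumed payload: one 32 GB card, delivered in about one hour.
print(round(average_mbit_per_s(32, 1.0), 1))  # → 71.1
```

That is several times what early ADSL offered (on the order of 8 Mbit/s downstream), which is the comparison the article is making. Latency, as noted above, is another story.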
Compared to what I use at home now, this sounds great
A good way to see what the future of places like the U.S. looks like is to look at places like North Korea, where they do exactly this: move files around on flash media to avoid the state censors.
We need some more community wifi projects
Community WISPs are cool
Tiny jump drives on pigeons is low key excellent imo
@Maroon I thought torrent technology to be a godsend for package managers.
Why do none of them use it?
I mean, damn.
Turns out hosting a bunch of files is very cheap.
Torrents are often used for installers, but for packages it tends to be more trouble than it’s worth. Is creating a torrent for a 4 KB library worth it?
git and the lot are a lot better at this than people realize.
Did I miss something? What’s happening to Debian stable?
Debian stable became the go-to distro for long-term usage in case our FOSS support structure goes haywire due to wars
Is there a context to this or just random thought?
You can ignore politics, but politics will not ignore you.
Is there a political movement targeting Debian and Wikipedia?
Conservatives hate knowledge, learning is toxic to them. Also the people who start with burning books usually end up burning people eventually
Removing books about sucking cum out of anuses from public schools isn’t really “burning books.” You can still buy them whenever you want, just not putting them in taxpayer funded schools with children.
EDIT: Had to add some details of the “books being burned [but really just removed from public school]”:
During public comment, one woman read a passage from “Yolo” by Lauren Miracle which is found in Freedom High School.
“I climbed onto of him and started kissing him in a way that said very clearly here I am, I’m ready to have sex,” the speaker read.
Another title, “Anatomy of a Single Girl” by Daria Snadowsky, was also read by a speaker.
“Guy tries rubbing my clitoris with his fingers, he wiggles his pelvis back and forth,” another woman read from the book.
“This is ridiculous that this school – any school – has this book,” the woman said to the board.
Julie Gebhards, the woman seen in the first video of our story, is a Hillsborough County mom of six children.
Gebhards read an excerpt from the book “Invisible Monsters Remix” by Chuck Palahniuk. According to the district’s online book library, the title is found in Steinbrenner High School.
“He shoots his load, and then plants his mouth on your anus and sucks out his own warm sperm, plus whatever lubricant and feces are present. That’s felching. It may or may not, I add, include kissing you to pass the sperm and fecal matter into your mouth,” Gebhards said.
The comment you’re replying to didn’t mention one specific book. You did to try and portray this as some noble cause, but even books such as Fahrenheit 451 and To Kill a Mockingbird have been banned by conservatives, and they most definitely aren’t about “sucking cum out of anuses” as you so dumbly put it.
Nice attempt, but this type of dodging around never worked and will never work.
We should ban mention of Christianity in public. We should also make it illegal for anyone to teach their children Christianity. Practicing Christians should be declared mentally ill, and if they practice their faith in front of children, they should be put on the sex offender registry.
These freaks actually put giant statues of a naked bleeding man up on full public display in buildings. And they believe the most holy book in the world is one that features incest, murder, rape, genocide, and often fully endorses these horrors. Their main ritual is a form of public ritual cannibalism.
Christians are too dangerous to be allowed near children.
Oh no sex scenes!!! /s
Prude american?
As if kids aren’t finding shit way worse on the internet on a daily basis. Well… maybe not felching that’s pretty vile. But still.
Oh no! What have you done! Now, I want to go try felching because I saw a message about it online with no context and I just. Have. To. Try. It.
… Oh wait, no. No, I don’t. Pfew! So what was your point again?
You aren’t making the point you think you’re making.
https://gizmodo.com/elon-musks-wikipedia-competitor-is-going-to-be-a-disaster-2000665751
Debian? Not that I’m aware of.
Yeah, I’d heard about Wikipedia, but not Debian.
gestures at everything
I would add in some ROM collections and book repositories as well. The whole library of Nintendo games is under a gig and would go a long way for entertaining people.
Book repos? I didn’t know such a thing existed. Can you share more?
Project Gutenberg has a large collection of public domain books
Thank you kindly
FWIW :
fabien@debian2080ti:/media/fabien/slowdisk$ ls -lhS offline_prep/
total 341G
-rw-r--r-- 1 fabien fabien 103G Jul  6  2024 wikipedia_en_all_maxi_2024-01.zim
-rw-r--r-- 1 fabien fabien  81G Apr 22  2023 gutenberg_mul_all_2023-04.zim
-rw-r--r-- 1 fabien fabien  75G Jul  7  2024 stackoverflow.com_en_all_2023-11.zim
-rw-r--r-- 1 fabien fabien  74G Mar 10  2024 planet-240304.osm.pbf
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso
-rw-r--r-- 1 fabien fabien 2.6G May  7  2023 ifixit_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien 1.6G May  7  2023 developer.mozilla.org_en_all_2023-02.zim
-rw-r--r-- 1 fabien fabien 931M May  7  2023 diy.stackexchange.com_en_all_2023-03.zim
-rw-r--r-- 1 fabien fabien 808M Jun  5  2023 wikivoyage_en_all_maxi_2023-05.zim
-rw-r--r-- 1 fabien fabien 296M Apr 30  2023 raspberrypi.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien 131M May  7  2023 rapsberry_pi_docs_2023-01.zim
-rw-r--r-- 1 fabien fabien 100M May  7  2023 100r-off-the-grid_en_2022-06.zim
-rw-r--r-- 1 fabien fabien  61M May  7  2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien  45M May  7  2023 computergraphics.stackexchange.com_en_all_2022-11.zim
-rw-r--r-- 1 fabien fabien  37M May  7  2023 wordnet_en_all_2023-04.zim
-rw-r--r-- 1 fabien fabien  23M Jul 17  2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz
-rw-r--r-- 1 fabien fabien  16M Oct  6 21:32 be-stib-gtfs.zip
-rw-r--r-- 1 fabien fabien 3.8M Oct  6 21:32 be-sncb-gtfs.zip
-rw-r--r-- 1 fabien fabien 2.3M May  7  2023 termux_en_all_maxi_2022-12.zim
-rw-r--r-- 1 fabien fabien 1.9M May  7  2023 kiwix-firefox_3.8.0.xpi
But if you want the easier version just get Kiwix on whatever device in front of you right now (yes, even mobile phone assuming you have the space) then get whatever content you need.
If you need a bit of help, I recorded TechSovereignty at home, episode 11 - Offline Wikipedia, Kiwix and checksums with a friend just 3 weeks ago.
I also wrote, and randomly update, https://fabien.benetou.fr/Content/Vademecum and coded https://git.benetou.fr/utopiah/offline-octopus but tbh KDE-Connect is much better now.
The point though is that having such a repository takes minutes. If you don’t have the space, buy a 512 GB microSD for 50 EUR, put that on it, stuff it in a drawer, then move on. If you want to, update it every 3 months or whenever you feel like it.
TL;DR: takes longer to write such a meme than actually do it.
Watch out for flash data corruption. Lots of cheap flash (USB sticks, SD cards, SSDs) lose data after just a few years of offline storage. Something something quantum tunnel bullshit, iirc.
So either look for media that guarantee long cold storage retention (lots of businesses need to keep shit for 10 years for tax reasons), or occasionally plug it in and let it do the housekeeping.
It’s more that NAND flash stores data as a small electric charge held in each cell. Over time, that charge dissipates. If you power the storage device every once in a while, you reduce the risk of losing data.
Here’s a video explaining why it happens to Wii U’s after being powered off for a while. https://youtu.be/JHME4zLs6Qs
Using older flash tech can be useful here. You might not always need the highest density storage if you want to maintain files for a long time. Getting stuff built on a much larger process node makes for a much more stable form of storage.
Or look for industrial / business grade stuff with long retention times. Old flash also means less sophisticated controllers etc
Thanks but even though it’s on a plugged-in HDD I don’t even care for any of that data. What I mean is that none of that data is sensitive. It might be useful, potentially, but it’s not unique. If somehow my .zim file for Wikipedia was corrupted, I could download it again from https://library.kiwix.org/#lang=eng&category=wikipedia or elsewhere in ~30min (just checked).
What I’m trying to highlight here is more the process than the actual outcome.
TL;DR: yes, if one is actually serious about just getting and storing, they should verify periodically if the data is indeed fine. What I do want to highlight though is to first know how to do it at all. Anyway, you are right that for a proper solution in the long run one must understand how (cold) storage actually works. My heuristic is that it’s like canned food (which I don’t use much): it might last a while, but not forever.
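For the "verify periodically" part, a minimal sketch of one way to do it: record a SHA-256 digest per file once, then recompute and compare later. The function names and the throwaway demo file are assumptions for illustration; in practice the manifest would be saved next to the archive (or compared against checksums published by the download site, where available).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-GB archives don't need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file name in `folder` to its current SHA-256 digest."""
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def find_damaged(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return names that are missing or whose digest no longer matches."""
    damaged = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged


# Demo on a throwaway directory with a fake archive.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "example.zim").write_bytes(b"pretend this is an archive")
    manifest = build_manifest(folder)
    print(find_damaged(folder, manifest))  # → []
    (folder / "example.zim").write_bytes(b"pretend this is an archivf")  # simulate bit rot
    print(find_damaged(folder, manifest))  # → ['example.zim']
```

Reading every byte back also happens to be the "plug it in once in a while" routine suggested above, so one pass covers both.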
I thought the point of backing stuff up was to have things in case just downloading it again isn’t a viable option?
It can be but not to me. To me the point is to test what’s actually feasible and usable. It can be Wikipedia on my HDD but it could also be Stack Overflow on a microSD or a RPi … or it could be something totally different on another piece of hardware with another piece of storage. It will depend on the context.
So again, sure, having the data itself feels nice but in practice I never really needed it. If tomorrow my HDD would die I would shrug. If tomorrow Kiwix library wouldn’t work anymore, I’d be disappointed but I could rely on .zim files elsewhere, e.g. on torrent trackers.
IMHO the point isn’t files, the point is usable knowledge.
Edit : to be clear this isn’t philosophy, you can see exactly what I mean and even HOW I do it (and even when) with the edits of my public wiki or my git repositories.
Whoa, what are all those things you have?
Commenting inline :
-rw-r--r-- 1 fabien fabien 103G Jul  6  2024 wikipedia_en_all_maxi_2024-01.zim # encyclopedia Wikipedia English with images and more
-rw-r--r-- 1 fabien fabien  81G Apr 22  2023 gutenberg_mul_all_2023-04.zim # Project Gutenberg, book collection in multiple languages
-rw-r--r-- 1 fabien fabien  75G Jul  7  2024 stackoverflow.com_en_all_2023-11.zim # StackOverflow, programming questions and answers
-rw-r--r-- 1 fabien fabien  74G Mar 10  2024 planet-240304.osm.pbf # OpenStreetMap low resolution for the whole World
-rw-r--r-- 1 fabien fabien 3.8G Oct 18 06:55 debian-13.1.0-amd64-DVD-1.iso # Debian base ISO
-rw-r--r-- 1 fabien fabien 2.6G May  7  2023 ifixit_en_all_2023-04.zim # iFixit collection of guides to fix appliances
-rw-r--r-- 1 fabien fabien 1.6G May  7  2023 developer.mozilla.org_en_all_2023-02.zim # Web development documentation
-rw-r--r-- 1 fabien fabien 931M May  7  2023 diy.stackexchange.com_en_all_2023-03.zim # Do It Yourself Q&A
-rw-r--r-- 1 fabien fabien 808M Jun  5  2023 wikivoyage_en_all_maxi_2023-05.zim # WikiVoyage, the version of Wikipedia for traveling
-rw-r--r-- 1 fabien fabien 296M Apr 30  2023 raspberrypi.stackexchange.com_en_all_2022-11.zim # Raspberry Pi Q&A
-rw-r--r-- 1 fabien fabien 131M May  7  2023 rapsberry_pi_docs_2023-01.zim # Raspberry Pi documentation
-rw-r--r-- 1 fabien fabien 100M May  7  2023 100r-off-the-grid_en_2022-06.zim # Off the grid documents
-rw-r--r-- 1 fabien fabien  61M May  7  2023 quantumcomputing.stackexchange.com_en_all_2022-11.zim # Quantum computer Q&A
-rw-r--r-- 1 fabien fabien  45M May  7  2023 computergraphics.stackexchange.com_en_all_2022-11.zim # Computer graphics Q&A
-rw-r--r-- 1 fabien fabien  37M May  7  2023 wordnet_en_all_2023-04.zim # Graph of words in English
-rw-r--r-- 1 fabien fabien  23M Jul 17  2023 kiwix-tools_linux-armv6-3.5.0-1.tar.gz # Kiwix to read .zim files
-rw-r--r-- 1 fabien fabien  16M Oct  6 21:32 be-stib-gtfs.zip # public transport database in Brussels, Belgium
-rw-r--r-- 1 fabien fabien 3.8M Oct  6 21:32 be-sncb-gtfs.zip # train transport database in Belgium
-rw-r--r-- 1 fabien fabien 2.3M May  7  2023 termux_en_all_maxi_2022-12.zim # Termux, Linux tooling on Android, documentation in English
-rw-r--r-- 1 fabien fabien 1.9M May  7  2023 kiwix-firefox_3.8.0.xpi # Kiwix Web Extension for the Firefox browser
By the way, there’s now a Wikipedia 2025 snapshot.
I am currently trying to fit that on my phone somehow. I wish I could just omit the index database at the end, which it seems can’t be split. I have to keep it, but when it’s split up, it doesn’t work anyway (search is broken that way) (https://github.com/openzim/zim-tools/issues/295).
My phone can only do FAT32 for SD cards… For 2024 Wikipedia, that seems to be around 18GiB of wasted space.
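Some quick arithmetic on why FAT32 hurts here: a single file on FAT32 can be at most 4 GiB minus one byte, so anything larger has to be cut into parts. A small sketch; the helper names are made up, the 103G figure is read as GiB from the listing earlier in the thread, and the ~18 GiB index size is the estimate from this comment.

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1  # 4 GiB minus one byte, the FAT32 per-file limit
GIB = 2**30


def parts_needed(file_bytes: int, part_bytes: int = FAT32_MAX_FILE_BYTES) -> int:
    """Number of parts needed so that no part exceeds `part_bytes`."""
    if file_bytes < 0 or part_bytes <= 0:
        raise ValueError("sizes must be non-negative, part size positive")
    return max(1, -(-file_bytes // part_bytes))  # integer ceiling division


def fits_as_single_file(file_bytes: int) -> bool:
    """True if the file can live on FAT32 without being split."""
    return file_bytes <= FAT32_MAX_FILE_BYTES


print(parts_needed(103 * GIB))        # whole archive, assuming 103 GiB → 26
print(fits_as_single_file(18 * GIB))  # the ~18 GiB index on its own → False
```

So the archive itself is fine as a couple dozen parts, but anything that must stay in one piece and is larger than 4 GiB simply cannot be stored on that card as-is.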
Thanks, updating (~20min) accordingly.
FWIW I have a CMF Nothing 1 and I can put a 500 GB microSD in it.
I’ve got Ulefone Armor 24. It can take a 1TB Micro SD, but only FAT32. Why a Linux-based OS can only do FAT32, despite supporting other FSs on internal storage, is beyond me.
Weird, assuming you have Android 13 it should be usable at least as exFAT and thus can be large enough.
Unfortunately, this is rather dependent on manufacturer (or rather how much they can fuck up).
Android 14, but without exFAT support.
I tried multiple, exFAT, ext4, f2fs, NTFS, nothing else works.
This is just minor datahoarding. I do it, on an extreme level.
Years ago I bought a physical encyclopedia. I remember having one as a kid and using it for school reports. Also just looking through it can be cool. Learning about something you never knew existed is just a unique experience and doing it through a physical book just deepens the whole experience.
I also learned the practice of printing a physical encyclopedia is going out of fashion. I think there is only one company that still prints a yearly encyclopedia and it’s not Encyclopedia Britannica of all things. Might have changed since I bought my copy but go give some physical media some love if you can.
Okay so where do I find some cheap hard drives? Europe if possible :-)
Look for DVRs. They have huge HDDs in them and you can find them at thrift stores for cheap.