In light of the recent CrowdStrike crash revealing how weak points in IT infrastructure can have wide-ranging effects, I figured this might be an interesting one.
The entirety of Wikipedia is periodically uploaded here, along with many other useful wikis and how-to websites (e.g. iFixit tutorials and WikiHow): https://download.kiwix.org/zim
You select the archive you want, then the language and archive version (for example, you can get an archive with no pictures to save space). For the whole of the English Wikipedia you’d select “wikipedia_en_all_maxi_2024-01.zim”.
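For anyone scripting the download, here is a rough Python sketch of how that filename appears to break down. The field names (project, language, selection, flavour, date) and the guess that the file lives in a folder named after the project are my own reading of the naming pattern, not something stated in this thread, so check the directory listing before relying on it. It only handles names with exactly five underscore-separated parts.

```python
from typing import NamedTuple

BASE_URL = "https://download.kiwix.org/zim"  # the listing linked in the post


class ZimName(NamedTuple):
    # Field names are assumptions made for illustration only.
    project: str    # e.g. "wikipedia"
    language: str   # e.g. "en"
    selection: str  # e.g. "all"
    flavour: str    # e.g. "maxi"
    date: str       # e.g. "2024-01"


def parse_zim_name(filename: str) -> ZimName:
    """Split a five-part ZIM filename into its underscore-separated fields."""
    if not filename.endswith(".zim"):
        raise ValueError(f"not a .zim filename: {filename!r}")
    parts = filename[: -len(".zim")].split("_")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {filename!r}")
    return ZimName(*parts)


def guess_download_url(filename: str) -> str:
    """Build a URL, ASSUMING files sit under a folder named after the project."""
    name = parse_zim_name(filename)
    return f"{BASE_URL}/{name.project}/{filename}"


if __name__ == "__main__":
    example = "wikipedia_en_all_maxi_2024-01.zim"
    print(parse_zim_name(example))
    print(guess_download_url(example))
```

Archives whose names have fewer or more fields will raise a ValueError here, which seemed safer than guessing.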
The archives are packed as .zim files, which can be read with the Kiwix app completely offline.
I keep several USB drives that have some of these archives along with the app installer. In the event of some major catastrophe I’d at least be able to access some potentially useful information. I have no stake in Kiwix, and don’t know whether there are alternative apps and schemes; I just thought it was neat.
The text version of Wikipedia*
The images and other media are a hell of a lot more.
Without images Wikipedia is a “mere” 22.14 GB.
https://en.m.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia#:~:text=The total number of pages,about 22.14 GB without media.
The 100 GB version mentioned above only has thumbnails/low-res pictures, yeah. Better than nothing for some types of articles, but not everything. The true text-only version is actually only ~53 GB though.
Some of the high-res photos are ridiculous.
Like an 8000x9000 uncompressed image of someone’s hand that weighs about 22 MB.
I know that because I use a lot of royalty free images.
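As a quick sanity check on that 8000x9000 figure, here is a small Python calculation. It assumes 24-bit RGB (3 bytes per pixel) and decimal megabytes, neither of which is stated in the comment. Under those assumptions the raw pixel data would be far larger than 22 MB, which suggests the file was compressed to some degree.

```python
def raw_image_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of the raw pixel data, ignoring headers and metadata."""
    return width * height * bytes_per_pixel


MB = 10**6  # decimal megabyte (assumption)

raw = raw_image_bytes(8000, 9000)  # assumes 3 bytes per pixel (24-bit RGB)
stored = 22 * MB                   # file size quoted in the comment
ratio = raw / stored               # implied compression ratio

if __name__ == "__main__":
    print(f"raw pixel data: {raw / MB:.0f} MB")
    print(f"implied compression: about {ratio:.1f}:1")
```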
Is there an index of the images or something like that?
https://commons.wikimedia.org/
The images are categorised and there’s a search function.
Thank you very much!
It’s 102 GB with images, 53 GB without.
I presume this is images directly hosted on English Wikipedia and not the entirety of Commons where the vast majority of images are kept, right?
Wikimedia Commons is 373 TB of images. https://commons.m.wikimedia.org/wiki/Special:MediaStatistics
Kinda interesting at a broad level … that there’s still something to the efficiency of language.
Sure, storage is cheap now, but so much of the calculation of the utility of data in modern tech rests on the presumption of an internet connection and retrieval of information over the network.
With the internet going to shit in various ways, local or decentralised computing is making more sense, at least depending on your priorities and perspective. And so all of a sudden, storage tradeoffs become a bit more meaningful. Do I need all of the pictures and media … or would a simple textual description suffice for most instances, with high-res media available at a more centralised archive if I’m really interested? A picture is worth 1000 words, but takes a hell of a lot more digital storage space!
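To put rough numbers on that tradeoff, here is a back-of-the-envelope Python sketch using only figures quoted in this thread (53 GB text-only, 102 GB with images, 373 TB for Commons, a 22 MB photo). The 6 bytes per word is my own assumption (roughly five letters plus a space in plain ASCII), and all units are treated as decimal.

```python
MB = 10**6
GB = 10**9
TB = 10**12

# Figures quoted in the thread.
text_only = 53 * GB
with_images = 102 * GB
commons = 373 * TB
photo = 22 * MB

# Share of the 102 GB archive that is taken up by the images.
image_share = (with_images - text_only) / with_images

# How many text-only Wikipedias would fit in the space Commons uses.
commons_vs_text = commons / text_only

# ASSUMPTION: about 6 bytes per English word in plain text.
BYTES_PER_WORD = 6
thousand_words = 1000 * BYTES_PER_WORD
photo_in_words = photo // BYTES_PER_WORD

if __name__ == "__main__":
    print(f"images are about {image_share:.0%} of the 102 GB archive")
    print(f"Commons is about {commons_vs_text:,.0f}x the text-only archive")
    print(f"1000 words is about {thousand_words / 1000:.0f} KB of text")
    print(f"one 22 MB photo is about {photo_in_words:,} words of text")
```

By this estimate, that one photo costs as much storage as a few million words rather than a thousand.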