cross-posted to: https://lemmy.world/post/2499861
As I said, I made a lossy reformat of the database and a lossless one, at 6.0 GiB (6,477,905,920 bytes) compared to ~26 GiB from Reddit, where fields are almost intentionally anti-compressed to take up more room.
If there is somewhere I can host it, let me know.
Also, I couldn’t figure this out: do SQLite databases store any information on the creator or editor of a document?
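One way to check is to read the 100-byte file header directly. This is a minimal sketch, assuming the documented SQLite 3 file format: the header has no author or username field, but it does record the version of the SQLite library that last wrote the file, plus the optional `user_version` and `application_id` values. The function name `read_sqlite_header` is made up for illustration.

```python
import struct


def read_sqlite_header(path):
    """Return the identifying fields from an SQLite 3 file header."""
    with open(path, "rb") as f:
        header = f.read(100)  # the header is the first 100 bytes of the file
    if len(header) < 100 or header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database file")
    # All multi-byte header fields are stored big-endian.
    (user_version,) = struct.unpack(">i", header[60:64])
    (application_id,) = struct.unpack(">i", header[68:72])
    (library_version,) = struct.unpack(">I", header[96:100])
    return {
        "user_version": user_version,        # set via PRAGMA user_version
        "application_id": application_id,    # set via PRAGMA application_id
        "library_version": library_version,  # e.g. 3039004 for SQLite 3.39.4
    }
```

Anything else that could identify a person would have to come from the table contents or from filesystem metadata, not from this header.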
why it's lossy
It’s missing a large table of base64 urandom that is technically required to recreate the document fully.
!datahoarder@lemmy.ml looks active and seems like a good place for it
thanks, how do I crosspost or move this one?
Using the web-ui, on this post there is an icon made up of two squares. It’s right next to the star for saving the post. That’s the cross post button.
thanks, made updates to the post
Here are a few options that I’ve seen but never actually used.
- contact https://wiki.archiveteam.org/; they have some reddit related archival activities going on
- more academic options, which would be very helpful for researchers:
  - https://zenodo.org/
  - https://figshare.com/ (this one I’ve used, and it’s very easy)
  - https://datadryad.org/
  - https://academictorrents.com/ (pushshift archives are also on here)
  - https://socialmediaarchive.org/
Your data don’t seem to be massive compared to the types of data people store on there. So I don’t think it’s gonna be an issue. Plus, if you deposit your data in 1 archivist place + 1 research place, the data may be used by more people. Don’t forget about licenses btw.
EDIT: added https://socialmediaarchive.org/ to the list, just found out about that.
Is this derived directly from the data reddit stored/created or is it a reconstruction of some kind from observing the r/place output? I’m tempted to look at the table structures but not tempted enough to download 4 gigs of it just yet.
Rebuilt from Reddit’s official sources. Still messing with optimizations: is adding a color definitions table worth it?
Edit: YES, only 32 unique colors ever.
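For anyone curious what a color definitions table looks like in practice, here is a rough sketch using Python's built-in `sqlite3` module. The table and column names (`raw_placement`, `pixel_color`, `color`, `placement`) are hypothetical placeholders, not the actual schema of the database in this post. The idea is to store each distinct color string once and have every placement row carry a small integer ID instead.

```python
import sqlite3


def add_color_table(conn: sqlite3.Connection) -> int:
    """Move repeated color strings into a lookup table; return the palette size."""
    cur = conn.cursor()
    # One row per distinct color string.
    cur.execute("CREATE TABLE color (id INTEGER PRIMARY KEY, hex TEXT NOT NULL UNIQUE)")
    cur.execute(
        "INSERT INTO color (hex) "
        "SELECT DISTINCT pixel_color FROM raw_placement ORDER BY pixel_color"
    )
    # Placements keep a small integer reference instead of the full string.
    cur.execute(
        "CREATE TABLE placement ("
        "ts INTEGER, x INTEGER, y INTEGER, "
        "color_id INTEGER NOT NULL REFERENCES color(id))"
    )
    cur.execute(
        "INSERT INTO placement (ts, x, y, color_id) "
        "SELECT r.ts, r.x, r.y, c.id "
        "FROM raw_placement AS r JOIN color AS c ON c.hex = r.pixel_color"
    )
    conn.commit()
    (palette_size,) = cur.execute("SELECT COUNT(*) FROM color").fetchone()
    return palette_size


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory stand-in for the raw data (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_placement (ts INTEGER, x INTEGER, y INTEGER, pixel_color TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_placement VALUES (?, ?, ?, ?)",
        [(1, 0, 0, "#FF4500"), (2, 1, 0, "#FFFFFF"), (3, 0, 1, "#FF4500")],
    )
    return conn
```

Calling `add_color_table(demo_connection())` returns the number of distinct colors in the sample rows. The original strings can be recovered by joining `placement` to `color` on `color_id`, so the change stays lossless.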
Last I heard with large projects like this, people usually upload them to DEEZ