I will be stuck in areas with little or no internet, and having a way to save a whole website (such as a small community wiki) to browse while bored would be very nice. Ideally, features like search would keep working too. Any suggestions for a FOSS app that can do this?

  • u/lukmly013 💾 (lemmy.sdf.org)@lemmy.sdf.org · 9 hours ago

    I've used wget to download static sites, or at least ones with only simple JavaScript, but it won't download required files that are only referenced from JS code, so it probably won't work for many sites.
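    For context, this is roughly the kind of invocation I mean (a sketch; example.org is a placeholder):

        # Plain recursive grab: follows links found in the HTML, but URLs that
        # are only built or fetched from JavaScript are invisible to wget and
        # simply get skipped.
        wget --recursive --level=inf --page-requisites https://example.org/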

    You also need to be careful about a bunch of things:

    - spanning hosts, so that you don't accidentally (attempt to) download the entire internet
    - rate-limiting and the user agent
    - the robots file
    - filename restrictions, so it doesn't save files whose names contain characters that have special meaning in URLs, like # and ?
    - filename extensions, so the files can be served back with the correct MIME type
    - getting filenames from the server rather than from the URL, when appropriate
    - converting links (works in HTML files only)

    ...and I'm probably forgetting something else. Most of these map onto wget flags, as in the sketch below.
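    A hedged sketch of one such run; example.org, the wait/rate values and the user-agent string are placeholders, not recommendations:

        # --mirror                  recursion plus timestamping
        # --page-requisites         also fetch the images/CSS/JS each page needs
        # --convert-links           rewrite links for offline browsing (HTML only)
        # --adjust-extension        append .html etc. so extensions match MIME types
        # --restrict-file-names=windows
        #                           keeps characters like ? and # out of filenames
        # --content-disposition     take filenames from the server header when sent
        # --span-hosts --domains    allow other hosts, but only the listed ones
        # (wget honors robots.txt by default; -e robots=off overrides that)
        wget --mirror --page-requisites --no-parent \
             --convert-links --adjust-extension \
             --restrict-file-names=windows \
             --content-disposition \
             --wait=1 --random-wait --limit-rate=500k \
             --user-agent="Mozilla/5.0 (offline mirror)" \
             --span-hosts --domains=example.org,static.example.org \
             https://example.org/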

    Oh, and it’s a single process doing one request at a time, so even a single page with lots of images will take ages. E.g.: http://cyber.dabamos.de/88x31/ (currently offline).
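    One possible workaround (my own sketch, not something wget does for you): mirror the HTML first, then re-fetch the heavy assets from a URL list in parallel via xargs. Here urls.txt is a hypothetical file with one URL per line, e.g. grepped out of the downloaded pages; the newer wget2 can also download in parallel on its own.

        # Run up to 8 wget processes at once; --no-clobber skips files the
        # earlier mirror run already saved, --directory-prefix keeps
        # everything under one directory.
        xargs -P 8 -n 1 wget --no-clobber --directory-prefix=mirror/ < urls.txt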

    You can then easily serve the files using NGINX, or just browse them as plain files, though the latter may not work well on something like a phone. Oh, one more thing: image.jpg and Image.jpg would conflict on Android’s shared storage (it’s case-insensitive), and some websites really do have names differing only in case. So the mirror can only be stored within Termux’s own home directory, which is case-sensitive (and served using NGINX in Termux).
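    A minimal sketch of that Termux setup, assuming the nginx package from pkg; the port and the mirror path are placeholders:

        pkg install nginx

        # Point nginx at the mirror. $PREFIX is Termux's install root; the
        # root path below is Termux's home plus a hypothetical "mirror" dir.
        cat > "$PREFIX/etc/nginx/nginx.conf" <<'EOF'
        events {}
        http {
            include mime.types;   # correct Content-Type, given proper extensions
            server {
                listen 127.0.0.1:8080;
                root   /data/data/com.termux/files/home/mirror;
                index  index.html;
                autoindex on;     # directory listing where no index.html exists
            }
        }
        EOF

        nginx    # then open http://127.0.0.1:8080 in the phone's browser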