So, I’m self-hosting Immich. The issue is that we tend to take a lot of pictures of the same scene or thing so we can pick the best one later, and we end up with 5 to 10 photos that are basically duplicates, but not quite.
Some duplicate-finding programs rate those images at 95% similarity or more.

I’m wondering if there’s any way, probably at the file system level, for such near-identical images to be compressed together.
Maybe deduplication?
Have any of you handled a similar situation?

  • cizra@lemm.ee
    2 months ago

    You could store one “average” image plus a delta for each photo, much like Git stores your previous version with a bunch of branches on top.
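
A minimal sketch of the "average image plus deltas" idea, assuming the photos are already decoded to raw 8-bit pixel buffers of identical size and are perfectly aligned. The names `encode_burst` and `decode_image` are hypothetical, and this works on pixels, not on JPEG files as stored on disk. Real burst shots also differ through sensor noise and small camera movement, so the savings on actual photos may be much smaller than on the synthetic data used here.

```python
import random
import zlib
from array import array


def encode_burst(images):
    """Encode equal-length raw pixel buffers as one base image plus deltas.

    `images` is a list of bytes objects (hypothetical input format).
    Returns (base, deltas), where each delta is zlib-compressed.
    """
    n = len(images)
    # Per-pixel mean across the burst serves as the shared "average" image.
    base = bytes(sum(px) // n for px in zip(*images))
    deltas = []
    for img in images:
        # Signed difference from the base; always fits in a 16-bit integer.
        diff = array("h", (p - b for p, b in zip(img, base)))
        deltas.append(zlib.compress(diff.tobytes(), 9))
    return base, deltas


def decode_image(base, delta):
    """Rebuild one original buffer from the base image and its delta."""
    diff = array("h")
    diff.frombytes(zlib.decompress(delta))
    return bytes(b + d for b, d in zip(base, diff))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "scene" plus small per-shot noise, standing in for a burst.
    scene = bytes(rng.randrange(256) for _ in range(20_000))
    burst = [
        bytes(min(255, max(0, p + rng.randint(-2, 2))) for p in scene)
        for _ in range(5)
    ]
    base, deltas = encode_burst(burst)
    assert all(decode_image(base, d) == img for d, img in zip(deltas, burst))
    print("raw bytes:", sum(len(img) for img in burst))
    print("base + deltas:", len(base) + sum(len(d) for d in deltas))
```

The round trip is lossless for the pixel data, but re-encoding to the original JPEG bytes is a separate problem that this sketch does not address.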

    • WIPocket@lemmy.world
      2 months ago

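      Note that Git’s object model doesn’t store deltas. It reuses unchanged files, but stores a (zlib-compressed) snapshot of every version of every file that has existed in the whole history, addressed by its SHA-1 hash. (Packfiles, which Git creates when it runs `git gc`, do apply delta compression, but only as a storage optimization on top of those snapshots.)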

      • cizra@lemm.ee
        2 months ago

        Indeed! Interesting! I just ran an experiment with a non-compressible file (strings < /dev/urandom | head -n something), and it shows you’re right: the second commit, where I added one tiny line to that file, increased the repo size by almost the size of the whole file.

        Thanks for this bit.
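
The experiment above can be approximated without a repository. The sketch below models how Git writes a loose blob object: a `blob <size>\0` header followed by the content, hashed with SHA-1 and compressed with zlib. The helper name `loose_blob` is hypothetical, and the sketch covers loose objects only, so it says nothing about what the repo looks like after `git gc` has packed it.

```python
import hashlib
import os
import zlib


def loose_blob(content: bytes):
    """Return (object_id, stored_bytes) for content as a Git loose blob."""
    # Git hashes and stores the header together with the file content.
    store = b"blob %d\0" % len(content) + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


if __name__ == "__main__":
    # Incompressible file, then the same file with one tiny line appended.
    v1 = os.urandom(100_000)
    v2 = v1 + b"one tiny extra line\n"

    id1, obj1 = loose_blob(v1)
    id2, obj2 = loose_blob(v2)

    # Different content means a different object ID, hence a second object.
    print("same object?", id1 == id2)
    print("first object size: ", len(obj1))
    print("second object size:", len(obj2))
```

Because the two versions hash to different IDs, each is stored as its own full object, which matches the repo growing by roughly the size of the whole file.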