Hello,

I am going to upgrade my server, and since the new setup will let me install more hard disks, I want to use the opportunity to give my data a little more protection against loss.

Currently I have 2 hard drives formatted as ext4 that hold data, and I want to buy a third one (all three the same capacity) and put them in RAID5, so that in the future I can add more hard drives and increase the capacity.

For financial reasons, right now I can only afford the third disk, so I have no way to back up the data I currently have.

The data itself is not valuable: if a file gets corrupted, I can download it again. However, there are enough terabytes (about 20) that downloading everything again would be madness.

My initial plan was to install DietPi (a trimmed-down Debian) on this server (a PC) and build the RAID with mdadm. I have seen tutorials on how to do it (for example https://ruan.dev/blog/2022/06/29/create-a-raid5-array-with-mdadm-on-linux ).

My question is: is there any way to do this without formatting the drives that already hold data?

Thank you, and sorry for any mistakes; English is not my native language.

EDIT:

Thanks for your answers! I have several paths to investigate.

  • chiisana@lemmy.chiisana.net · 5 months ago

    Even if you could free up only 1 GB on each of the drives, you could start the process with a RAID5 of 1 GB per disk, migrate 2 GB of data into it (1 GB from each old disk), free up those 2 GB on the old disks, expand the RAID, and rinse and repeat. It would take a very long time and carry a lot of risk, due to the increased stress on the old drives, but it is certainly achievable in theory.
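    To put rough numbers on "a very long time": with three drives in RAID5, each pass can take in two steps' worth of data (one step from each old drive), so the number of copy passes is about the total data divided by twice the per-disk step, rounded up. A minimal sketch, assuming a fixed step every pass and no filesystem overhead; `passes_needed` is a hypothetical helper, and sizes are whole GB so plain shell arithmetic stays exact.

```bash
#!/usr/bin/env bash
# Rough count of copy/shrink/grow passes for the incremental RAID5 migration.
# Assumptions (illustrative only): each pass grows every RAID member by the same
# fixed step, so the array takes in (drives - 1) * step of data per pass.
# Sizes are whole GB so that shell integer arithmetic stays exact.

passes_needed() {
  local total_gb=$1   # data that has to end up on the array
  local step_gb=$2    # space freed on each disk per pass
  local drives=$3     # number of RAID5 members
  local per_pass=$(( (drives - 1) * step_gb ))
  # Ceiling division: a partial last pass still counts as a pass.
  echo $(( (total_gb + per_pass - 1) / per_pass ))
}

passes_needed 20000 1 3      # 1 GB per disk per pass -> prints 10000
passes_needed 20000 1000 3   # 1 TB per disk per pass -> prints 10
```

    With about 20 TB of data, a 1 GB step means on the order of ten thousand passes, while a 1 TB step brings it down to about ten.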

    • HamsterRage@lemmy.ca · 5 months ago

      Technically, he would have three drives but only two drives’ worth of data. So he could move 1/3 of the data off each of the two old drives onto the third, and then start off with RAID5 across the remaining 1/3 of each drive.
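      Plugging in the figures OP gives later in the thread (two 12 TB drives, about 19 TB used, assumed here to be split evenly at 9.5 TB per drive), the sketch below works out how large the first, equal-sized RAID5 members could be with this variant. It is a back-of-the-envelope calculation, not a tool: `initial_raid_layout` is a hypothetical name, sizes are whole GB, and filesystem overhead is ignored.

```bash
#!/usr/bin/env bash
# "Move a third of the data onto the new drive first" variant.
# Assumptions (illustrative only): equal drive sizes, data split evenly across
# the two old drives, no filesystem overhead. Sizes are whole GB.

initial_raid_layout() {
  local drive_gb=$1         # capacity of each drive
  local used_per_old_gb=$2  # data currently on each old drive
  local moved=$(( used_per_old_gb / 3 ))            # one third off each old drive
  local left_on_old=$(( used_per_old_gb - moved ))  # stays in each old ext4 partition
  local on_new=$(( 2 * moved ))                     # lands on the third drive
  local free_old=$(( drive_gb - left_on_old ))
  local free_new=$(( drive_gb - on_new ))
  # RAID5 members must be the same size, so the smallest free area wins.
  local member=$(( free_old < free_new ? free_old : free_new ))
  echo "member=${member} usable=$(( 2 * member ))"
}

initial_raid_layout 12000 9500   # OP's case: 2 x 12 TB, ~19 TB used
```

      That prints `member=5666 usable=11332`, i.e. a first array of roughly 11 TB, compared with about 5 TB (2 x 2.5 TB of free space) if the third drive held no data.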

      • chiisana@lemmy.chiisana.net · 5 months ago

        This is smart! It should help reduce the number of loops they’d need to go through, and could reduce the stress on the older drives.

    • just_another_person@lemmy.world · 5 months ago

      Not possible at all, whatsoever, though. If he has two drives that are nearly full, he would never be able to fit all the replicable data on a RAID5 of any kind.

      What you’re describing as a solution is the “3 jugs of water” problem. The difference is that you need one coherent set of data just to start a RAID array. If all the data can’t fit on one single drive, juggling it between disks would never get OP the solution they’re asking for, because of the smallest-drive-capacity limitation. You can’t just swap things around and eventually end up with a viable array if ALL the data can’t be in one place at one time.

      • chiisana@lemmy.chiisana.net · 5 months ago

        They’re going for RAID5, not RAID6, so with the third drive there’s no additional requirement.
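        The arithmetic behind this: mdadm RAID5 gives up one member’s worth of space to parity and RAID6 gives up two, so usable space is (n - 1) or (n - 2) times the member size. A quick check, ignoring metadata and filesystem overhead, with sizes in whole GB; `raid_usable_gb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Usable space of an array built from equal-sized members, in GB.
# Ignores md metadata and filesystem overhead.
raid_usable_gb() {
  local level=$1 drives=$2 size_gb=$3
  case $level in
    5) echo $(( (drives - 1) * size_gb )) ;;   # one member's worth of parity
    6) echo $(( (drives - 2) * size_gb )) ;;   # two members' worth of parity
    *) echo "unsupported RAID level: $level" >&2; return 1 ;;
  esac
}

raid_usable_gb 5 3 12000   # OP's planned array: 3 x 12 TB
raid_usable_gb 6 5 8000    # the 5 x 8 TB RAID6 mentioned further down
```

        Both print 24000: three 12 TB drives in RAID5 leave about 24 TB, enough for OP’s ~19 TB, and the 5x 8 TB RAID6 described later in the thread matches its stated 24 TB.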

        Say, for example, they have 2x 12 TB drives with 10 TB used on each (they mentioned they’ve got 20 TB of data currently). They can acquire a third 12 TB drive and create a RAID5 volume from 3x 1 TB partitions, giving them 2 TB of space on the RAID volume. They can then copy 2 TB of data into the RAID volume (1 TB from each existing drive), verify that the copy worked as intended, delete the copied data from the old drives, shrink the filesystem on each old drive by 1 TB, add the newly available 1 TB on each drive to the RAID, rebuild the array, and rinse and repeat.
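        The loop above can be traced pass by pass to check that it ends with nothing left outside the array. A rough simulation, assuming an even 10 TB / 10 TB split, a fixed 1 TB step, and no filesystem overhead; sizes are whole GB and `simulate_migration` is a hypothetical name. It only does arithmetic and touches no disks.

```bash
#!/usr/bin/env bash
# Trace of the fixed-step loop (sizes in GB, illustrative only): copy one step
# from each old drive into the array, delete it from the old ext4 filesystems,
# shrink them, then grow each RAID5 member by the same step.
# Assumes 3 equal drives, an even data split and no filesystem overhead.

simulate_migration() {
  local used_per_old=$1 step=$2
  local member=$step        # initial RAID5 member size on each disk
  local on_array=0 pass=0
  while (( used_per_old > 0 )); do
    local chunk=$(( used_per_old < step ? used_per_old : step ))
    on_array=$(( on_array + 2 * chunk ))       # one chunk from each old drive
    used_per_old=$(( used_per_old - chunk ))    # deleted, then FS shrunk
    pass=$(( pass + 1 ))
    echo "pass=${pass} array_capacity=$(( 2 * member )) on_array=${on_array} left_per_old=${used_per_old}"
    member=$(( member + step ))                 # grow partitions, then the array
  done
}

simulate_migration 10000 1000   # 2 x 12 TB drives, 10 TB used on each
```

        It prints one line per pass and finishes with `pass=10 array_capacity=20000 on_array=20000 left_per_old=0`: ten passes, after which the array holds all 20 TB and can be grown to its full size.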

        At the very end, there’d be no data left outside and the RAID volume can be expanded to the full capacity available… assuming the older drives don’t fail during this high stress maneuver.

        • LoboAureo@lemm.ee (OP) · 5 months ago

          That is a clever approach, and it’s exactly my case: two 12 TB drives, about 19 TB used.

          And it’s for a personal project, so I’m in no hurry.

          Just to clarify: would “several days” be 1 or 2 weeks, or are we talking about more time?

          • chiisana@lemmy.chiisana.net · 5 months ago

            I’m afraid I don’t have an answer for that.

            It depends heavily on drive speed and on the number of times you’d need to repeat the process. Each time you copy data into the RAID, the array needs to write the data plus compute the parity; then, when you expand the array, it needs to be rebuilt, which takes more time again.
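            One piece that can be estimated is the copying itself: every byte has to be written into the array once, so data size divided by sustained write speed gives a floor on the total time. A sketch only: the 150 MB/s figure below is a made-up placeholder, not a measurement, and the resyncs after each grow, the e2fsck runs and the resize2fs shrinks all come on top of it. `copy_hours` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Floor on the copy time only: data size divided by sustained write speed.
# The speed is an assumption to replace with a measurement of your own drives;
# resyncs, e2fsck and resize2fs runs are NOT included.
copy_hours() {
  local data_gb=$1 mb_per_s=$2
  # GB -> MB in decimal units, then round seconds and hours up.
  local seconds=$(( (data_gb * 1000 + mb_per_s - 1) / mb_per_s ))
  echo $(( (seconds + 3599) / 3600 ))
}

copy_hours 19000 150   # ~19 TB at a hypothetical 150 MB/s -> prints 36
```

            With those assumed numbers it prints 36, i.e. about a day and a half of pure copying before anything else is counted.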

            My only tangentially related experience at a similar scale is a RAID expansion of my RAID6 (so double parity, compared to single parity on yours) from 5x 8 TB, using 20 out of 24 TB, to 8x 8 TB. These are shucked white-label WD Red equivalents, so 5k RPM SATA drives with 256 MB cache. Since it was a direct expansion, I didn’t need to do multiple passes of shrinking and expanding, but I think the expansion itself still took my server a couple of days to rebuild.

            Someone else mentioned you could potentially move some data onto the third drive and start with a larger initial chunk… I think that could help reduce the number of passes you’d need to do as well; it may be worth considering.

          • chiisana@lemmy.chiisana.net · 5 months ago

            OP currently has 2 drives in their possession.

            OP has confirmed they’re 12TB each, and in total there is 19TB of data across the two drives.

            Assuming there is only one partition, each one might look something like this:

            Units: sectors of 1 * 512 = 512 bytes
            Sector size (logical/physical): 512 bytes / 4096 bytes
            I/O size (minimum/optimal): 4096 bytes / 4096 bytes
            Disklabel type: gpt
            Disk identifier: 12345678-9abc-def0-1234-56789abcdef0
            
            Device         Start        End            Sectors        Size      Type
            /dev/sda1      2048         23437499966    23437497919    12.0T     Linux filesystem
            

            OP wants to buy a new drive (also 12 TB) and make a RAID5 array without losing the existing data. It’s kind of madness, but it is achievable. OP buys a new drive and sets it up like this:

            Device         Start        End            Sectors        Size      Type
            /dev/sdc1      2048         3906252047     3906250000     2.0T      Linux RAID
            
            Unallocated space:
            3906252048      23437499966   19531247919    10.0T
            

            Then, OP must shrink the existing filesystem and partition on each old drive (sda1 and sdb1) to something smaller, say 10 TB, and use the rest of the space as part of their RAID5:

            Device         Start        End            Sectors        Size      Type
            /dev/sda1      2048         19531250000    19531247953    10.0T     Linux filesystem
            /dev/sda2      19531250001  23437499966    3906249966     2.0T      Linux RAID
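            The sector numbers in these listings follow from 512-byte logical sectors and decimal terabytes: a partition’s last sector is its start plus its length minus one. Two hypothetical helpers that reproduce the End values of the sdc1 and sda1 rows:

```bash
#!/usr/bin/env bash
# Sector arithmetic for the example partition tables.
# Assumes 512-byte logical sectors and decimal TB (10^12 bytes), as in the listings.
SECTOR_BYTES=512

tb_to_sectors() {   # decimal terabytes -> number of 512-byte sectors
  echo $(( $1 * 1000000000000 / SECTOR_BYTES ))
}

end_sector() {      # last sector of a partition from its start sector and length
  echo $(( $1 + $2 - 1 ))
}

end_sector 2048 "$(tb_to_sectors 2)"   # sdc1: 2 TB starting at sector 2048
end_sector 2048 23437497919            # sda1 in the first listing
```

            These print 3906252047 and 23437499966, matching the End column of the sdc1 and sda1 rows.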
            

            Now with the 3x 2TB partitions, they can create their RAID5 initially:

            sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sda2 /dev/sdb2 /dev/sdc1

            Make an ext4 filesystem on md0, copy 4 TB of data into it (2 TB from sda1 and 2 TB from sdb1), and verify that the RAID5 is working properly. Once OP is happy with the data on md0, they can delete the copied data from sda1 and sdb1, shrink the filesystems and then the partitions there (resize2fs), expand sda2 and sdb2, expand sdc1, and grow the RAID (mdadm --grow ...).
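            The steps of one pass, written out as commands. This is a dry-run sketch that only prints the commands and never runs them, so it can be read and adapted safely. Assumptions: the device names follow the layouts in this comment, while the mount points (/mnt/sda1, /mnt/sdb1, /mnt/raid), the directory names and the 7000G target size are hypothetical placeholders. The partition resizing itself is left as a comment, since how to do it depends on the partitioning tool.

```bash
#!/usr/bin/env bash
# DRY RUN: prints the commands for ONE pass of the loop; executes nothing.
# sda1/sdb1 = old ext4 data, sda2/sdb2/sdc1 = md0 members (as in the listings).
# Mount points, directory names and sizes are hypothetical placeholders.

plan_drive() {
  local d=$1 dir=$2 new_size=$3   # drive letter, directory to move, new ext4 size
  echo "rsync -aH /mnt/sd${d}1/${dir} /mnt/raid/"
  # Checksum comparison as a dry run: empty output means nothing differs.
  echo "rsync -aHcn --itemize-changes /mnt/sd${d}1/${dir} /mnt/raid/"
  echo "# delete only if the check above listed no differences"
  echo "rm -r /mnt/sd${d}1/${dir}"
  echo "umount /mnt/sd${d}1"                  # ext4 cannot be shrunk while mounted
  echo "e2fsck -f /dev/sd${d}1"               # resize2fs wants a fresh check before shrinking
  echo "resize2fs /dev/sd${d}1 ${new_size}"   # resize2fs reads G as 2^30 bytes
}

plan_pass() {
  local dir_a=$1 dir_b=$2 new_size=$3
  plan_drive a "$dir_a" "$new_size"
  plan_drive b "$dir_b" "$new_size"
  echo "# (not shown) shrink partitions sda1/sdb1, then enlarge sda2, sdb2 and sdc1"
  local d
  for d in a b; do
    echo "mount /dev/sd${d}1 /mnt/sd${d}1"
  done
  echo "mdadm --grow /dev/md0 --size=max"     # let md0 use the larger members
  echo "mdadm --wait /dev/md0"                # wait for the resync of the new space
  echo "resize2fs /dev/md0"                   # grow the ext4 on md0 (works while mounted)
}

plan_pass movies series 7000G
```

            Running it only prints the command list; each line still needs checking against the real devices before anything is executed by hand.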

            Rinse and repeat; at the end of the process, they’d end up with all their data in the newly created md0, which is a RAID5 volume spanning all three disks.

            Hope this is clear enough and that there is no more disconnect.