RAIC behaviour different from RAID behaviour #2106

Closed
opened 2013-11-14 21:03:27 +00:00 by sickness · 2 comments
sickness commented 2013-11-14 21:03:27 +00:00
Owner

Let's assume we have a local RAID5 set of 4 identical disks attached on a controller inside a computer.

RAID5 guarantees that if we lose 1 of the 4 disks, we can continue not only to read but also to write to the set, albeit in degraded mode.

When we replace the failed disk with a new one, the RAID repairs the set by syncing the data in the background, and the 4th disk gets populated again with chunks of our valuable data (not only parity, since in RAID5 parity is striped across all disks, but explaining that is beyond the scope of this ticket).

starting condition:

DISK1chunk1 DISK2chunk2 DISK3chunk3 DISK4chunk4

broken disk:

DISK1chunk1 DISK2chunk2 DISK3chunk3 DISK4XXXXXX

new disk is put in place:

DISK1chunk1 DISK2chunk2 DISK3chunk3 DISK4[ ]

repair rebuilds DISK4's chunk of data reading the other 3 disks:

DISK1chunk1 DISK2chunk2 DISK3chunk3 DISK4chunk4
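The RAID5 rebuild above can be sketched with a toy one-stripe model (hypothetical data and helper, not tahoe-lafs or md code): 3 data chunks plus one XOR parity chunk mean any single lost chunk is recoverable from the surviving three.

```python
# Toy RAID5 stripe: 3 data chunks + 1 XOR parity chunk on 4 "disks".
# Any one lost chunk can be rebuilt by XOR-ing the other three together.

def xor_parity(chunks):
    """XOR a list of equal-length byte chunks into one chunk."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

# starting condition: DISK1..DISK3 hold data, DISK4 holds parity
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
d4 = xor_parity([d1, d2, d3])

# broken disk: DISK2 fails; repair rebuilds its chunk from the other 3
rebuilt = xor_parity([d1, d3, d4])
assert rebuilt == d2   # the lost chunk is fully recovered
```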

Now let's assume we have a tahoe-lafs RAIC set of 4 identical servers on a LAN.

To mimic the RAID5 behaviour we configure it to write 4 shares for every file, needing only any 3 of them to successfully read the file.

So in this way we have a RAIC that should behave like a RAID5.

We can lose any 1 of these 4 servers and still be able to read the data, and to repair it.

But what happens if we actually lose 1 of those 4 servers and then try to read/repair the data? or maybe even write new data?

We will end up having ALL 4 shares on just 3 servers, and when we rebuild the 4th server
and put it back online, even repairing will not put shares on it, because the file will be seen as already healthy. But now what if we lose the one server which actually holds 2 shares of the same file?

starting condition:

SERV1share1 SERV2share2 SERV3share3 SERV4share4

broken server:

SERV1share1 SERV2share2 SERV3share3 SERV4XXXXXX

data is written, or a scheduled repair is attempted, and we get to this situation:

SERV1share1,share4 SERV2share2 SERV3share3 SERV4XXXXXX

new server is put in place:

SERV1share1,share4 SERV2share2 SERV3share3 SERV4[ ]

now if we try to repair, the situation remains the same, because as of now the repairer
DOESN'T know that it has to actually rebalance share4 onto SERV4; it just tells us the file is healthy

we can still read and write data, so far so good, right?

but what if SERV1 now suddenly gets broken?

SERV1XXXXXX SERV2share2 SERV3share3 SERV4[ ]

ok we can replace it:

SERV1[ ] SERV2share2 SERV3share3 SERV4[ ]

ok now we have a problem: how can we rebuild if we need 3 shares out of 4 but have just 2, even though we previously had 4 servers and the file was listed as "healthy" by the repairer?
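The failure story above can be simulated with a small placement model (my own hypothetical sketch, not tahoe-lafs internals): if health is judged only by counting distinct shares, the doubled-up layout looks fine right up until the overloaded server dies.

```python
# Model a 3-of-4 file whose "health" is judged by counting distinct
# shares, ignoring how many servers actually hold them.

K = 3  # need any 3 distinct shares to read the file

def distinct_shares(placement):
    return set().union(*placement.values())

def readable(placement):
    return len(distinct_shares(placement)) >= K

# after SERV4 broke and a repair doubled share4 onto SERV1:
placement = {"SERV1": {"share1", "share4"},
             "SERV2": {"share2"},
             "SERV3": {"share3"},
             "SERV4": set()}
print(readable(placement))   # True -- the file looks "healthy"

# now SERV1 breaks too: two shares vanish at once
placement["SERV1"] = set()
print(readable(placement))   # False -- only 2 of the 3 needed shares survive
```

This is exactly the gap the ticket describes: share count alone is a misleading health metric when shares pile up on one server.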

tahoe-lafs added the
code
normal
defect
1.10.0
labels 2013-11-14 21:03:27 +00:00
tahoe-lafs added this to the 1.10.1 milestone 2013-11-14 21:03:27 +00:00

sickness: thanks for the detailed description of the issue! I agree with you that it would be a problem if we got to the end of this story you've written and lost a file that way.

There are several improvements we can make.

improvement 1: let repair improve file health (#1382)

The last chance we have to avoid this fate is in the step where a repair is attempted when the placement is already:

 SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ] 

If we are ever in that state, and a repair (or upload) is attempted, then a copy of either share1 or share4 must be uploaded to SERV4 in order to improve the health of the file. The #1382 branch (by Mark Berger; currently in review — almost ready to commit to trunk!) fixes this, so that a repair or upload in that case would upload a share to SERV4.

Note that this improvement "let repair improve file health" is the same whether the state is:

 SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ] 

or:

 SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ] 

In either case, we want to upload a share to SERV4! The #1382 branch does this right.
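The rebalancing behaviour described here can be sketched as follows (a simplified model of the idea, not the actual #1382 branch code): whenever a server holds no share while another holds a surplus, copy one of the surplus shares over.

```python
# Sketch of "let repair improve file health": copy surplus shares
# onto empty servers so the file spreads back across all of them.

def rebalance(placement):
    """Return a new placement with surplus shares copied to empty servers."""
    placement = {s: set(shares) for s, shares in placement.items()}
    empty = [s for s, shares in placement.items() if not shares]
    surplus = [s for s, shares in placement.items() if len(shares) > 1]
    for dst in empty:
        if not surplus:
            break
        src = surplus.pop()
        share = sorted(placement[src])[-1]  # pick any surplus share
        placement[dst].add(share)           # "upload" a copy to the empty server
    return placement

placement = {"SERV1": {"share1", "share4"}, "SERV2": {"share2"},
             "SERV3": {"share3"}, "SERV4": set()}
print(rebalance(placement)["SERV4"])   # {'share4'}
```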

improvement 2: launch a repair job when needed (#614)

If a "check" job is running, and it detects a layout like:

 SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[ ] 

or:

 SERV1[share1] SERV2[share2] SERV3[share3] SERV4[ ] 

Then what should it do? Trigger a repair job, or leave well enough alone? That depends on the user's preferred trade-off between file health and bandwidth consumption. If the user has configured the setting that says "Try to keep the file spread across at least 4 servers", then it will trigger a repair. If the user has configured it to "Try to keep the file spread across at least 3 servers", then it will not. (Because to do so would annoy the user by using up their network bandwidth.)

This is the topic of #614. There is a patch from Mark Berger on that ticket, but I think there is disagreement or confusion over how it should work.
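The decision described above amounts to a simple threshold check, which could look something like this (a hypothetical helper, not the actual #614 patch):

```python
# Sketch of the improvement-2 decision: repair only when the file spans
# fewer distinct servers than the user's configured spread threshold.

def needs_repair(placement, min_servers):
    """True if fewer than min_servers servers hold at least one share."""
    servers_with_shares = sum(1 for shares in placement.values() if shares)
    return servers_with_shares < min_servers

placement = {"SERV1": {"share1", "share4"}, "SERV2": {"share2"},
             "SERV3": {"share3"}, "SERV4": set()}
print(needs_repair(placement, 4))  # True  -- user wants 4 servers: repair
print(needs_repair(placement, 3))  # False -- user accepts 3: leave it alone
```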

possible improvement 3: don't put multiple shares on a server (#2107)

Another possible change we could make is in the step where an upload-or-repair process was running and it saw this state:

 SERV1[share1] SERV2[share2] SERV3[share3] SERV4[XXXXXX]

and it decided to send an extra share to SERV1, resulting in this state:

 SERV1[share1,share4] SERV2[share2] SERV3[share3] SERV4[XXXXXX]

I used to think it was a good idea for the uploader/repairer to do this (provided we implement improvements 1 and 2 above!), but I've now changed my mind. I explained my current reasoning on #2107. Possible improvement 3 is not provided by the #1382 branch; as far as I understand, the #1382 branch will go ahead and upload an extra share in this case.


sickness: each of the three (possible) improvements listed in [comment:93909](/tahoe-lafs/trac-2024-07-25/issues/2106#issuecomment-93909) has a separate ticket to track that improvement. So, unless there are any other changes you think we should consider to help with this situation, we should close this ticket.
tahoe-lafs added the
duplicate
label 2013-11-14 23:58:48 +00:00
daira closed this issue 2013-11-14 23:58:48 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2106