Build intermittently-connected replication-only storage grid #2123
Reference: tahoe-lafs/trac-2024-07-25#2123
I'm trying to achieve the following Tahoe-LAFS scenario:
Assumptions
Requirements
Proposed setup
With this setup, the process would be as follows:
Current problem
Share placement, as it works today, cannot guarantee that a share is not placed on a node that already holds one. With k=1/N=2, if a cron-triggered repair runs while the node is isolated, we would be wasting space, since the local node already holds enough shares to retrieve the whole file.
Worse still: a single node holding both of the N shares would prevent arriving nodes from getting their replicas (their own local share), since the repairer would be satisfied with both shares being present in the grid, even on the same node. This could lead to shares never being replicated outside the creator node, making that node a SPOF for the data.
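To make the failure mode above concrete, here is an illustrative sketch (not actual Tahoe-LAFS repairer code) of why a grid-wide share count hides the single-node case. The `placements` mapping and function names are assumptions for illustration only:

```python
# Illustrative sketch, not the real repairer. `placements` maps
# share number -> the server id currently holding that share.

def is_healthy(placements, needed_n):
    # A naive health check: are all N shares present somewhere in the grid?
    # This is satisfied even if every share lives on the same server.
    return len(placements) >= needed_n

def has_spof(placements):
    # The actual risk described above: all shares sit on one server.
    return len(set(placements.values())) == 1

# With k=1/N=2: both shares end up on the creator node.
placements = {0: "node-A", 1: "node-A"}
print(is_healthy(placements, needed_n=2))  # True: repairer is satisfied
print(has_spof(placements))                # True: yet node-A is a SPOF
```

A repairer that only counts shares grid-wide never triggers, so the second node never receives its replica.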
Proposed solution
Add a server-side configuration option to storage nodes that makes them gently reject holding shares in excess of k. This would address the space wasting. Also, since the local storage node would refuse to store an extra/unneeded share, a newly arriving storage node would receive the remaining share at repair time to fulfill the desired N, thus achieving/increasing replication.
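A minimal sketch of the policy being proposed, assuming a hypothetical `max_shares` setting and a simplified `ShareStore` class (neither is the real Tahoe-LAFS storage-server API):

```python
# Hypothetical sketch of the proposed server-side policy: a storage node
# that refuses to accept more than `max_shares` (= k) distinct shares of
# any one file. Class and parameter names are illustrative assumptions.

class ShareStore:
    def __init__(self, max_shares):
        # Maximum number of shares of one file this node will hold (e.g. k).
        self.max_shares = max_shares
        # storage_index -> set of share numbers already held locally.
        self.shares = {}

    def allocate(self, storage_index, sharenum):
        """Return True if the share is (or already was) stored, False if
        the node gently rejects it because it already holds k shares."""
        held = self.shares.setdefault(storage_index, set())
        if sharenum in held:
            return True  # already holding this exact share
        if len(held) >= self.max_shares:
            return False  # storing more than k here would be wasted space
        held.add(sharenum)
        return True

# With k=1/N=2 as in the scenario: the node accepts one share of a file
# and rejects the second, leaving it free to land on an arriving node.
store = ShareStore(max_shares=1)
print(store.allocate(b"si-1", 0))  # True: first share accepted
print(store.allocate(b"si-1", 1))  # False: node already holds k shares
print(store.allocate(b"si-1", 0))  # True: re-offering a held share is fine
```

The key design point is that rejection is per-file and server-local: the repairer, seeing the rejection, would place the surplus share elsewhere, which is exactly what increases replication.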
Current or future placement improvements can't be relied on to achieve this easily, and since this looks more like a storage-server-side policy, it's unlikely they will. At least, that's as far as I'm currently able to understand share placement, or how this could be achieved with even minimal guarantees (sometimes it feels like quantum physics to me). I think it's too much to rely on upload-client behavior/decisions; clients will have a very limited knowledge window of the whole grid, IMO.
Apart from the described use case, this setting would be useful in other scenarios where the storage node operator needs to exercise some control for other reasons.
I've discussed this scenario already with Daira and Warner to ensure that the described solution would work for this scenario. As per Zooko's suggestion, I've done this writeup to allow some discussion before jumping into coding in my own branch as the next step. That's in a separate ticket (#2124), just to keep feature specs and implementation separate from this single use case, since I think other scenarios might come up that could benefit from implementing the proposed solution.
I've also collected implementation details while discussing this with Daira and Warner ~~but I'll leave that for the followup ticket~~; those can also be found at #2124.
Anyone else interested in this scenario? Suggestions/improvements?
For more info, this ticket is a subset of #1657. See also related issues: #793 and #1107.
> No wasted space. Here, "wasted space" is defined as "shares in excess of necessary to read the file locally" (>k). We want only to hold shares enough to have a full local replica of the grid readable, not any more.
This requirement is #2107, which, I must confess, has confused the hell out of me. Please visit #2107 and see if you can explain your requirement precisely enough that someone could implement it.
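One possible reading of the quoted requirement (my interpretation only, not a spec from #2107): any locally held shares beyond the k needed to reconstruct the file locally count as waste. The helper below is hypothetical:

```python
# Hypothetical interpretation of "wasted space": local shares beyond the
# k needed to read the file locally. `local_shares` is a set of share
# numbers held by this node; `k` is the erasure-coding "needed" parameter.

def wasted_shares(local_shares, k):
    return max(0, len(local_shares) - k)

print(wasted_shares({0, 1}, k=1))  # 1: one surplus share's worth of space
print(wasted_shares({0}, k=1))     # 0: exactly enough for a local replica
```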
Linking ticket #2124 with feature implementation.
Linking to related issues: #793, #1107, #1657 in issue summary.