The separate introducer servers represent unnecessary complexity in an overall Tahoe-LAFS deployment #3457
A useful Tahoe-LAFS deployment consists of:

* some number of client nodes
* some number of storage servers
* zero or more introducer servers

At least one client is required or no service is being consumed. At least one storage server is required or no service is being offered.
A client can be put in touch with a storage server either by:

* static configuration naming the storage server's fURL directly (sketched just below), or
* an announcement relayed through an introducer to which both the client and the storage server connect.
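The static form can already be expressed with the private/servers.yaml mechanism. A minimal sketch, assuming that mechanism's documented shape; the server ID, fURL, and nickname here are placeholders:

```yaml
storage:
  v0-aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa:
    ann:
      nickname: storage-1
      anonymous-storage-FURL: pb://tubid@tcp:storage1.example.com:3456/swissnum
```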
A deployment with no introducer servers has at least two advantages over a deployment with introducers:
A deployment with one or more introducer servers has at least one advantage over a deployment with none: clients automatically learn about new, changed, or removed storage servers without each client being manually reconfigured.
If we can provide the automatic client updates without operating any introducer servers then this would seem to be a clear win, picking up all of the advantages from both of the current deployment options.
Here are some tickets that are somewhat related as well:
Here's one idea. Take the introducer functionality of accepting announcements and delivering them to subscribers and allow it to be folded into a Tahoe-LAFS process of another sort - say, a storage server. Thus, operating a storage server would automatically provide introducer functionality. Since storage servers are already an essential component of a Tahoe-LAFS deployment, this does not add any new long-lived processes or operational components to a deployment. Since the introducer functionality is now available from storage servers, the dedicated introducer servers can now be removed. This reduces the overall complexity of a deployment.
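As a very rough sketch of what this could look like to an operator, in tahoe.cfg terms (the introducer option here is hypothetical and does not exist today):

```
[storage]
enabled = true
# Hypothetical option: also accept announcements on this storage server and
# deliver them to subscribers, i.e. act as an introducer.
introducer = true
```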
This does not solve all of the problems we have with introducers. For example, a client will still need a statically configured list of introducers. At least one of these introducers will have to remain online as long as clients with that configuration continue to operate.
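For reference, the statically configured introducer list is roughly what private/introducers.yaml holds today; a sketch (the pet name and fURL are placeholders):

```yaml
introducers:
  intro-1:
    furl: "pb://tubid@tcp:intro1.example.com:3456/swissnum"
```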
It also does not reduce the load on the introducer component. Each introducer must handle all announcements relevant to the deployment, so this adds a runtime cost of O(N) (N being the number of announcements in the deployment) to each storage server that also acts as an introducer.
It does not defend against unreliable storage being announced, since such storage can now simply be announced to a storage server acting as an introducer instead of to a stand-alone introducer server.
It may mitigate the privacy concerns since clients are already going to maintain a long-lived connection to a storage server.
It does not address behavior in the face of misbehaving introducer clients.
Here's another idea. An entity creates and maintains a mutable list of announcements on a grid. It sets the encoding parameters for this object so that a full copy exists on every storage server which appears in the list of announcements. Storage clients are given bootstrap configuration which consists of a recent-enough copy of this list. If any storage server in the list is still reachable then the current copy of the list can be retrieved and the client can update its persisted configuration with that copy. Over time the storage client can continue to retrieve updates to this list. As long as the client retrieves an update before all storage servers it knew about become unreachable it can always find the latest configuration.
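A rough sketch of the bootstrap configuration this implies, assuming the announcement list is a mutable object addressed by a readcap (the key names and values are illustrative, not a settled format):

```yaml
# Hypothetical bootstrap handed to a new client.
announcement-list-readcap: "URI:..."   # readcap for the mutable announcement list
recent-announcements:
  - anonymous-storage-FURL: "pb://tubid1@tcp:storage1.example.com:3456/swissnum1"
  - anonymous-storage-FURL: "pb://tubid2@tcp:storage2.example.com:3456/swissnum2"
```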
This mostly solves the problem of the static introducer list. There is still a bootstrap configuration but there is a mechanism to update it over time.
It has ... some ... impact on overall load on the system. Every storage server in a grid now has to maintain a copy of the list of announcements. However, the list doesn't have to be updated as frequently anymore. It only needs to be changed when the actual list of storage servers changes (compared to currently where an announcement is sent every time a storage server process starts). This probably means the storage requirements are higher (but still essentially negligible) and the runtime requirements are lower. Since there is no notification mechanism it will require clients to poll the storage object to find updates - however this can be quite low frequency and there are a number of reasons we want to add a notification mechanism to Tahoe-LAFS anyway, at which point it could be leveraged to remove the polling here.
It defends against unreliable storage being announced since now only "an entity" can control the list of announcements. It does this by completely denying open participation, of course. However, any entity could choose to maintain one of these lists containing any announcements that entity likes. Clients can pick the list (or lists!) they want to follow.
It should remove all privacy concerns since there is now no longer any difference between consuming storage for normal purposes and obtaining the storage announcement list (as it is the same as any other data on the grid).
It also addresses behavior in the face of misbehaving introducer clients. The entity managing the list might itself misbehave; in some ways that entity becomes the single point of failure in the system. For example, it might lose its keys and become unable to distribute further storage server updates. That would require reconfiguring all clients to follow a new mutable object.
I think the core of the latter idea above is that using Tahoe-LAFS storage itself to propagate this information represents the minimum complexity for solving the problem that the introducer servers currently solve. A solution which is simpler in isolation may exist but since the purpose is to let clients use the storage servers, total system complexity cannot fall below that required to make use of those storage servers.
Here's a more concrete elaboration of the latter idea above.
The introducer-v3 manages a mutable directory on a grid (typically the grid for which it is the manager). This is the grid service directory (GSD). It encodes the mutable directory so that any single share from any single server in the grid is sufficient to reconstruct its contents. It holds the writecap for the directory in secret and, for certain operations, shares the readcap.
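In Tahoe encoding terms this simply means k=1 with a share placed on every server. As an illustration, for a grid of ten storage servers the introducer-v3 might write the GSD with parameters like the following (the option names are the existing tahoe.cfg ones; the values are illustrative):

```
shares.needed = 1    # any single share reconstructs the GSD
shares.happy = 10    # require shares spread across all ten servers
shares.total = 10
```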
Into this directory, introducer-v3 will link readcaps for grid service announcements (GSAs) it wishes to share with any of its clients. These readcaps yield objects containing, for example, a storage service announcement.
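The contents of a storage-service GSA could look much like the announcements the current introducer carries; a sketch (the schema is not settled by this ticket, and the values are placeholders):

```json
{
  "service-name": "storage",
  "ann": {
    "nickname": "storage-1",
    "anonymous-storage-FURL": "pb://tubid@tcp:storage1.example.com:3456/swissnum"
  }
}
```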
A client is granted access to a grid by receiving two pieces of information. First, the readcap for the GSD. Second, one or more storage service fURLs. The client connects to any one storage service using the fURLs supplied and reads the GSD and all GSAs. Among the GSAs should be storage service announcements for all storage servers related to the introducer-v3 granting access. At this point the client can connect to and use all of the necessary storage servers. The client will monitor the GSD and GSAs to remain up to date with any reconfigurations which might take place (addition of new GSAs, removal of old GSAs, modifications to existing GSAs).
A manual CLI for this workflow might go something like this.
First, the operator of the introducer-v3 produces the two pieces of information required by the client:
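Something along these lines, perhaps (the export-grid-access subcommand and its output format are hypothetical; the capability and fURL values are placeholders):

```
$ tahoe admin export-grid-access
URI:DIR2-RO:aaaaaaaaaaaaaaaaaaaaaaaaaa:bbbbbbbbbbbbbbbbbbbbbbbbbb
pb://tubid1@tcp:storage1.example.com:3456/swissnum1
pb://tubid2@tcp:storage2.example.com:3456/swissnum2
```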
Then the information is shared out-of-band with the user, who enters it into their client:
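Perhaps something like this (tahoe create-client exists today, but the two options shown are hypothetical; the comma-separated fURL caveat noted at the end applies here):

```
$ tahoe create-client \
    --introducer-v3-readcap=URI:DIR2-RO:aaaaaaaaaaaaaaaaaaaaaaaaaa:bbbbbbbbbbbbbbbbbbbbbbbbbb \
    --bootstrap-storage-furls=pb://tubid1@tcp:storage1.example.com:3456/swissnum1,pb://tubid2@tcp:storage2.example.com:3456/swissnum2
```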
The given configuration is recorded for use when the node is running.
When the node is started it:

1. connects to one of the storage servers named in its bootstrap configuration,
2. reads the GSD using the configured readcap,
3. reads every GSA linked from the GSD,
4. connects to the storage servers described by those announcements.
Thereafter, it periodically repeats steps 2-4 to ensure it remains up-to-date with any changes made by its introducer.
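A rough Python sketch of that refresh loop, purely illustrative: the client object and its read_directory, read_object, and connect_storage helpers are hypothetical, not existing Tahoe-LAFS APIs.

```python
import time

POLL_INTERVAL = 15 * 60  # seconds; polling can be quite infrequent


def refresh(client, gsd_readcap):
    # Step 2: read the grid service directory via any reachable storage server.
    # Assume it maps child names to GSA readcaps.
    gsd = client.read_directory(gsd_readcap)
    # Step 3: read every grid service announcement linked from the GSD.
    announcements = [client.read_object(cap) for cap in gsd.values()]
    # Step 4: (re)connect to the storage servers those announcements describe.
    for gsa in announcements:
        if gsa.get("service-name") == "storage":
            client.connect_storage(gsa["ann"]["anonymous-storage-FURL"])
    return announcements


def run(client, gsd_readcap):
    # Step 1 (connecting to a bootstrap storage server) is assumed to have
    # happened already; thereafter the node just repeats steps 2-4.
    while True:
        refresh(client, gsd_readcap)
        time.sleep(POLL_INTERVAL)
```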
A storage server can be taken out of service by unlinking its GSA from the GSD.
A storage server can self-report configuration changes (such as its location hints) by rewriting its own GSA. It takes care to encode the GSA so that every storage server known to it receives a share that is sufficient to reconstruct the GSA.
A new storage server can be commissioned by publishing a GSA for it and linking that GSA into the GSD, then re-encoding the GSD (and, ideally, the existing GSAs) so that the new server also holds a share of each.
One thing I don't know is how easily a mutable object can be spread across N servers and then expanded to be available on N+1 servers.
Note that fURLs can contain commas, so the syntax examples above aren't literally possible.
Here's a draft PR that just has some docs in it with the grid introducer idea - https://github.com/tahoe-lafs/tahoe-lafs/pull/882