update webapi docs for distributed dirnodes #115
Our current (temporary) situation is to put all vdrive "directory node"
information into an encrypted data structure that lives on a specific server.
This was fairly easy to implement, but it lacks certain properties that we
want: in particular, that server is a single point of failure.
We want to improve the availability of dirnodes. There are a number of ways
to accomplish this, some cooler than others. One approach is to leave the
vdrive-server scheme in place but have multiple servers (each providing the
same TubID, using separate connection hints, or the usual sort of IP-based
load-balancer frontend box). This requires no change in code on the client
side, but puts a significant burden on the operators of the network: they
must run multiple machines.
A niftier approach would be to distribute the dirnode data in the same way we
distribute file data. This requires distributed mutable files (i.e. SSK
files), which will require a bunch of new code. It also opens up difficult
questions about synchronized updates when race conditions result in different
storage servers recording different versions of the directory.
The source:docs/dirnodes.txt file describes some of our goals and proposals.
I'm starting to think that a reasonable solution is to distribute the data
with SSK files, but have an optional central-coordinator node.
Small grids that don't want any centralization just don't use the coordinator.
They run the risk of two people changing the same dirnode in incompatible
ways, in which case they will have to revert to an earlier version or the
like. We'll need some tools to display the situation to the user, but not
tools to resolve it automatically.
Large grids that are willing to accept some centralization do use the
coordinator. Dirnode reads are still fully distributed and reliable; however,
the ability to modify a dirnode is contingent upon the coordinator being
available. In addition, dirnode modification may be vulnerable to an attacker
who just claims the lock all day long (however, we can probably rig this so
that only people with the dirnode's write-key can perform this attack, making
it a non-issue).
Each SSK could have the FURL of a coordinator in it, and clients who want to
change the SSK shares are supposed to first contact the coordinator and
obtain a temporary lock on the storage index. Then they're only supposed to
send the "SSK_UPDATE" message to the shareholders while they hold that lock.
The full sequence of events would look like:
Clients who are moving a file from one dirnode to another are allowed to
claim multiple locks at once, as long as they drop all locks while they wait
to retry.
If the coordinator is unavailable, the clients can proceed with the update
anyway, and just run the risk of conflicts.
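To make the flow concrete, here is a rough sketch of the claim-lock / update /
release cycle described above. This is illustrative only: the coordinator
methods (claim_lock, release_lock) and the way SSK_UPDATE is sent are
invented stand-ins, not the real protocol.

    # Illustrative sketch only: coordinator API and SSK_UPDATE delivery are
    # assumptions, not the actual Tahoe code.

    class CoordinatorUnavailable(Exception):
        pass

    def update_ssk_shares(coordinator, shareholders, storage_index, new_shares):
        lock = None
        if coordinator is not None:
            try:
                # obtain a temporary lock on this storage index
                lock = coordinator.claim_lock(storage_index)
            except CoordinatorUnavailable:
                # coordinator down: proceed anyway, accepting the risk of conflicts
                lock = None
        try:
            # only send updates to the shareholders while (nominally) holding the lock
            for server, share in zip(shareholders, new_shares):
                server.send("SSK_UPDATE", storage_index, share)
        finally:
            if lock is not None:
                coordinator.release_lock(lock)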
We have two current ideas about implementing SSKs. In the simplest form, we
store the same data on all shareholders (1-of-N encoding), and each
degenerate share has a sequence number. Downloaders look for the highest
sequence number they can find, and pick one of those shares at random.
Conflicts are expressed as two different shares with the same sequence
number.
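A minimal sketch of that 1-of-N download rule (illustrative only, not the real
downloader code): pick the highest sequence number seen, and flag a conflict
if two different share payloads claim that same sequence number.

    def choose_version(shares):
        # shares: list of (seqnum, data) pairs gathered from storage servers
        highest = max(seqnum for seqnum, _ in shares)
        candidates = {data for seqnum, data in shares if seqnum == highest}
        if len(candidates) > 1:
            # two different contents with the same sequence number: a conflict
            # to be surfaced to the user, not silently resolved
            raise RuntimeError("conflicting shares at seqnum %d" % highest)
        return highest, candidates.pop()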
In the more complex form, we continue to use k-of-N encoding, thus reducing
the amount of data stored on each host. In this form, it is important to add
a hash of the data (a hash of the crypttext is fine) to the version number,
because if there are conflicts, the client needs to make sure the k shares
they just pulled down are all for the same version (otherwise FEC will
produce complete garbage).
Personally, I'm not convinced k-of-N SSK is a good idea, but we should
explore it fully before dismissing it.
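A similar sketch for the k-of-N form, assuming a hypothetical fec_decode(k,
shares) helper; the point is only that shares are grouped by the full version
identifier (seqnum, crypttext-hash) before any FEC decoding is attempted, so
shares from different versions are never mixed.

    from collections import defaultdict

    def decode_latest(shares, k, fec_decode):
        # shares: list of ((seqnum, crypttext_hash), sharenum, data) tuples
        by_version = defaultdict(list)
        for version, sharenum, data in shares:
            by_version[version].append((sharenum, data))
        # prefer the highest seqnum for which we hold at least k matching shares
        for version in sorted(by_version, reverse=True):
            if len(by_version[version]) >= k:
                return version, fec_decode(k, by_version[version])
        raise RuntimeError("no single version has k matching shares")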
I'm working on a design for a large mutable versioned distributed SSK-style
data structure. This could be used for either mutable files or for mutable
dirnodes. It allows fairly efficient access (both read and write) of
arbitrary bytes, even inserts/deletes of byteranges, and lets you refer to
older versions of the file. The design is inspired by Mercurial's "revlog"
format.
In working on it, I realized that you want your dirnodes to have higher
reliability and availability than the files they contain. Specifically, you
don't want the availability of a file to be significantly impacted by the
unavailability of one of its parent directories. This implies that the root
dirnode should be the most reliable thing of all, followed by the
intermediate directories, followed by the file itself. For example, we might
require that the dirnodes be 20dBA better than whatever we pick for the CHK
files. One way to think about this: pretend we have a directory hierarchy
that is 10 deep, and a file at the bottom, like
/1/2/3/4/5/6/7/8/9/10/file.txt . Now if the file has 40dBA availability
(99.99%), that means that out of one million attempts to retrieve it, we'd
expect to see 100 failures. If each dirnode has 60dBA, then we'd expect to
see 110 failures: 10 failures because an intermediate dirnode was
unavailable, 100 because the CHK shares were unavailable.
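For reference, the arithmetic above works out as follows ("N dBA" here
meaning an unavailability of 10**(-N/10)):

    def failures_per_million(dbA):
        # N dBA of availability means a failure probability of 10**(-N/10)
        return 1e6 * 10 ** (-dbA / 10.0)

    file_failures    = failures_per_million(40)          # 100 per million (99.99%)
    dirnode_failures = 10 * failures_per_million(60)      # 10 dirnodes deep -> 10
    total            = file_failures + dirnode_failures   # ~110 per million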
Given the same expansion factor and servers that are mostly available, FEC
gets you much much much better availability than simple replication. For
example, 1-of-3 encoding (i.e. 3x replication) for 99% available servers gets
you 60dBA (i.e. 99.9999%), but 3-of-9 encoding for 99% servers gets you about
125dBA. The reason is easy to visualize: start killing off servers one at a
time; how many can you kill before the file is dead? 1-of-3 is a loss once
you've killed off 3 servers, whereas 3-of-9 is ok until you've lost 7
servers. If we use 1-of-6 encoding (6x replication), we get about 120dBA,
comparable to 3-of-9.
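Those figures can be reproduced with a quick binomial calculation (a sketch,
assuming independent server failures):

    import math

    def dbA(k, n, p_server=0.99):
        # a k-of-N file is lost when fewer than k of its N servers are available
        p_fail = sum(math.comb(n, i) * p_server**i * (1 - p_server)**(n - i)
                     for i in range(k))
        return -10 * math.log10(p_fail)

    print(dbA(1, 3))   # ~60 dBA   (3x replication)
    print(dbA(3, 9))   # ~125 dBA
    print(dbA(1, 6))   # ~120 dBA  (6x replication)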
Anyway, the design I'm working on is complicated by FEC, and much simpler to
implement with straight replication. To get comparable availability, we need
to use more replication. So maybe dirnodes using this design should be
encoded with 1-of-5 or so.
These will be implemented on top of Small Mutable Files (#197), which are mutable but replace-only.
As mentioned in #207:
These last two tasks were completed in changeset:3605354a952d8efd, but there are a few more things to do:
Also to do for v0.7.0:
update the docs to describe the new kind of directories. I have "XXX change this" marked in a few places in the docs in my sandbox, but I haven't started writing replacement text yet.
Things left to do for 0.7.0:
add a special kind of when_done flag that means "please redirect me to the directory page for the dirnode that I just created"
maybe for the future (post-0.7.0):
First priority is #231.
Then:
Oh, insert #232 as top-priority, even above #231.
add:
Finished the part about "If the client is configured to create no private directory, then do not put a link from the welcome page to the start.html page", in changeset:9848d2043df42bc3.
I bumped the part about showing the pending creation of the private directory into #234 -- "Nice UI for creation of private directory.".
#232 -- "peer selection doesn't rebalance shares on overwrite of mutable file" has been bumped out of Milestone 0.7.0 in favor of #233 -- "work-around the poor handling of weird server sets in v0.7.0".
Still to do in this ticket:
o return new URI in response body
o add a special kind of when_done flag that means "please redirect me to the directory page for the dirnode that I just created"
changeset:50bc0d2fb34d2018 finishes test+implement POST /uri?t=mkdir,
returning the new URI (soon to be called "cap") in the response body.

Still to do in this ticket:
o a ?redirect_to_result=true flag on POST /uri?t=mkdir to request an HTTP 303 See Other redirect to the resulting newly created directory

So currently there is a POST /uri/?t=mkdir which works and has unit tests, but
it is using the technique of encoding the arguments into the URL, and it needs
to switch to the technique of encoding the arguments into the request body,
which is the standard for POSTs. There is also a button (a form) in my local
sandbox, but that form produces POST queries with the arguments encoded into
the body, so it doesn't work with the current implementation.

I just pushed a change to make /uri look for the 't' argument in either the
queryargs or the form fields, using a utility function named get_arg() that
we could use to refactor other places that need args out of a request.
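For reference, a rough sketch of the kind of helper described above; the real
get_arg() lives in the Tahoe web frontend and its exact signature and request
attributes may differ.

    def get_arg(request, name, default=None):
        # look in the query string / urlencoded arguments first...
        if name in request.args:
            return request.args[name][0]
        # ...then in multipart form fields, so both ?t=mkdir and a form
        # field named "t" are accepted
        if getattr(request, "fields", None) and name in request.fields:
            return request.fields[name].value
        return default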
I think that "/uri" is the correct target of these commands. Note that
"/uri/" is a different place. Our current docs/webish.txt (section 1.g) says
that /uri?t=mkdir is the right place to do this, and the welcome page's form
(as rendered by Root.render_mkdir_form) winds up pointing at /uri, so I'm
going with "/uri" instead of "/uri/" .
To that end, I've changed the redirection URL that /uri?t=mkdir creates to
match: this redirection is emitted by the /uri page, and therefore needs to
be to "uri/$URI" instead of just "$URI". (The latter would work if we were
hitting /uri/?t=mkdir, but not when we hit /uri?t=mkdir.)
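A small illustration of the relative-URL resolution at issue (not Tahoe code;
NEWCAP stands in for the new directory's URI):

    from urllib.parse import urljoin

    # a relative redirect target is resolved against the page that emitted it
    print(urljoin("http://node/uri",  "NEWCAP"))       # http://node/NEWCAP      (wrong place)
    print(urljoin("http://node/uri",  "uri/NEWCAP"))   # http://node/uri/NEWCAP  (correct)
    print(urljoin("http://node/uri/", "NEWCAP"))       # http://node/uri/NEWCAP  (also correct)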
I've also changed the unit test to exercise "/uri?t=mkdir" instead of
"/uri/?t=mkdir", and to examine the redirection that comes back to make sure
it is correct.
See #233 -- "creation and management of "root" directories -- directories without parents".
Still to do:
I'm going to do this webapi.txt update on the plane tomorrow.
putting off updating webapi til after this release
Changed the summary from "distributed dirnodes" to "update webapi docs for
distributed dirnodes".

Brian: I think you might have finished this ticket.
yup, just pushing the final docs changes now.