converge same file, same K, different N #678
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#678
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The underlying erasure code is "systematic", which means the first
K
shares are simply the segments of the file (except that the last one, which contains the end of the file, is padded out to be the same size as the others). The erasure code also has the property (I don't know what it is called) that the "check shares" or "secondary shares" -- the ones after the firstK
-- are also the same regardless of whatN
is.Therefore if you upload a file with, e.g.
K=13
,N=16
and then you re-upload it withK=13
,N=26
, then the index 0 through the index 15 share that you upload would be exactly the same as they were in the original upload (if you used the same encryption key to encrypt the file before erasure-coding it).However, Tahoe currently doesn't take advantage of this coincidence at all, because it doesn't use the same encryption key for those two files. Instead it includes
N
in the generation of the encryption key, and in the resulting immutable-file read-cap, so that the two uploads of the same file with the sameK
and differentN
result in completely different sets of shares and different read-caps. specs: [file-encoding.txt]source:docs/specifications/file-encoding.txt@20090222054054-4233b-ce16f0d882f804485e792782c1a9527db25114d0; source code: [upload.py]source:src/allmydata/immutable/upload.py@20090304013715-4233b-fbbf44eba801fb41795d4421bbcb7a028d45e6ff#L1116To fix this ticket, figure out how to retain all of the safety and security properties that Tahoe immutable-file currently have, while also letting those two uploads share their first 16 shares.
(By the way, the reason I was reminded of this is that I'm currently doing an upload exactly like this: dogfood tasting report . :-))
The following clump of tickets might be of interest to people who are interested in this ticket: #711 (repair to different levels of M), #699 (optionally rebalance during repair or upload), #543 ('rebalancing manager'), #232 (Peer selection doesn't rebalance shares on overwrite of mutable file.), #678 (converge same file, same K, different M), #610 (upload should take better advantage of existing shares), #573 (Allow client to control which storage servers receive shares).
Also related: #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better).
One significant barrier to this is the share hash tree. When a
file is uploaded, we take the hash of all shares (well, the block
hash tree root for all shares) and put them into a list, then we
build another merkle tree over that list, and store the root of
that tree in the UEB (where it gets folded into the
integrity-checking part of the filecap).
This list is of length "N", the number of shares that were
created, padded up to a power-of-two boundary (for the merkle
tree). This makes it impossible to add more shares to the same
hash tree.
So if a file has already been uploaded with 3-of-10, and now you
decide you'd like to encode it with 3-of-16, then shares 0..9
will be the same, but the merkle tree that was distributed with
the first upload won't contain the hashes for shares 10..15, and
the UEB will be different for the two uploads, so the filecaps
will be different (to be specific, the readkey [therefore the
storage-index]and part will be the same, but the UEB hash that
provides all the integrity information will be different).
Let's analyze the compatibility grid. One one axis is the person
doing the download: either Sally holding the "short" 3-of-10
filecap, or Lucy holding the "long" 3-of-16 filecap. The other is
the class of share being used:
(in ideal circumstances, the 3-of-16 upload would skip the "B"
shares, but if that uploader didn't happen to see the "A" shares
out there already, they might generate "B" shares)
Sally handles "A" shares just fine, as usual. When she sees "B"
shares, the current code will reject them because the UEB is
different than the filecap. The shares don't contain a full share
hash tree, but rather just the minimal chain necessary to
validate that one share. If she has enough "A" shares to know the
expected value of the "B" share in question (i.e. that leaf of
the short tree), then she can use the "B" share with confidence
(she'd use the share, but ignore the long hash chain that comes
with it). This basically means she must get that share's sibling:
if she has A0, she can validate B1; if she has A3, she can
validate B2, etc.
Should she see "C" shares (say that 5 times fast!), she's stuck.
There is no hash chain that will get her from the C share's hash
to her filecap's UEB hash. She could download the C share anyways
and throw it into FEC, and then test the resulting ciphertext
segment against the ciphertext hash tree: if it matches, great,
the share was good. If it doesn't, then she won't even know which
share had a problem.
Now, what about Lucy? When she sees an "A" share, she'll be in
the same boat as Sally looking at a "B" share. If Lucy manages to
find a sibling "B" share for each potential "A" share, then she
can validate the share hash, and use the share. But if she can't
(such as if there are no "B" shares, because the uploader
didn't create any because it saw the "A" shares out there), then
she won't be able to validate the "A" shares, and then she'll be
in the Sally-vs-C-shares boat: try FEC and catch problems with
the ciphertext hash.
So, what could be done to improve this?
just the minimal chain: this would let Sally use arbitrary "B"
shares and Lucy use arbitrary "A" shares, without forcing them
to locate a sibling share. It still wouldn't let Sally use "C"
shares with confidence.
shares went into FEC, if the ciphertext hash fails then
iterate through the various combinations of shares, use some
sort of constraint-based logic to figure out which
combinations are still worth trying. The combinatorics will be
smaller with fewer non-validated shares, so the peer-selection
code should prioritize validatable shares.
or 3-of-32 or whatever, but then only actually upload the
first 10 shares. You still have to do all the encoding work,
however, because you have to compute the correct merkle tree
to put in those 10 shares, but you can save the bandwidth and
storage space for the shares that you'll create later. It's
not clear where the break-even point would be.
I'm not sure what else could be done.
For mutable files, it's a slightly different story. The encoding
scheme is basically the same, but if you're willing to modify all
the existing shares, you can increase the size of the share hash
tree later on (build a bigger tree, sign the root, then update
all shares with the new tree). Unfortunately I believe that the
share format puts the share hash tree before the share data, so
if you increase its size, you must also move the data (and the
remote API has no insert() method), so you'll basically have to
re-upload everything, including the old share data. So it's
possible, but expensive.
Tagging issues relevant to new cap protocol design.
See new related ticket #1340 (consider share-at-a-time uploader).
s/M/N/ to match our existing "k-of-N" terminology
converge same file, same K, different Mto converge same file, same K, different N