transverse block-hash-trees, plus segment-hash-tree #2387
While discussing #2386, Daira identified an alternate encoding scheme which might resolve the efficient-mutable-update problem.
The current encoding scheme starts with block-hash-trees (one per share, with one leaf per block, each block being an erasure-coded fragment of a different ciphertext segment) that culminate in the block-hash-tree-root (one per share). On top of that, a "transverse" share-hash-tree is built, which contains one leaf per share (leaf==block-hash-tree-root), culminating in the share-hash-tree-root. The share-hash-tree-root goes into the UEB (immutable) and the signed RSA block (mutable).
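As a rough illustration of that layout, here is a minimal sketch in Python (plain SHA-256 with naive padding, not Tahoe's actual tagged leaf/interior hashes or power-of-two padding; the helper names `h`, `merkle_root`, and `current_scheme_roots` are invented for this example):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    # Simplified binary Merkle tree: hash the leaves, then pair-and-hash upward.
    # (Tahoe's real hash-tree code distinguishes leaf vs. interior hashes and
    # pads to a power of two; those details are omitted here.)
    nodes = [h(leaf) for leaf in leaves]
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])        # naive padding for an odd-sized level
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

def current_scheme_roots(shares: list[list[bytes]]):
    # shares[share_num][seg_num] is the erasure-coded block of that segment.
    block_tree_roots = [merkle_root(blocks) for blocks in shares]  # one root per share
    share_hash_tree_root = merkle_root(block_tree_roots)           # transverse share-hash-tree
    return block_tree_roots, share_hash_tree_root                  # root goes into the UEB / signed block
```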
The drawback to this scheme is that modifying a single block causes new block-hash-tree-roots to be created for all shares, so we need block-hash-tree data from all shares to modify any of them. As #2386 describes, this means we can't efficiently modify unhealthy files.
An alternate encoding would rotate the trees. In this approach, the lower hash tree is the transverse one. It is still a block-hash-tree, but instead of one leaf per segment (and one block-hash-tree per share), it would contain one leaf per share (and we'd have one block-hash-tree per segment). The block-hash-tree for seg0 uses the first block of each share as its N leaves, and contains enough information to prove that any one block (for segment X and share Y, X in range(numsegs), Y in range(num_servers==N)) belongs to a given segment (block-hash-tree-root_X).

On top of that, we'd have a non-transverse "segment-hash-tree", with one leaf per segment. The segment-hash-tree-root would be included in the UEB as before. The entire segment-hash-tree would be replicated into each share (this is 64 bytes per segment).
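The rotated layout, in the same simplified terms (reusing the hypothetical `merkle_root` helper from the sketch above):

```python
def rotated_scheme_roots(shares: list[list[bytes]]) -> bytes:
    # shares[share_num][seg_num], as before. Now each *segment* gets a
    # transverse block-hash-tree whose N leaves are that segment's blocks,
    # one from each share.
    num_segments = len(shares[0])
    block_tree_roots = [
        merkle_root([share[seg] for share in shares])  # block-hash-tree-root for segment `seg`
        for seg in range(num_segments)
    ]
    # Non-transverse segment-hash-tree: one leaf per segment. Its root goes
    # into the UEB; the whole tree would be replicated into every share.
    segment_hash_tree_root = merkle_root(block_tree_roots)
    return segment_hash_tree_root
```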
Each block would be paired with the block-hash-tree Merkle chain needed to verify the block's membership in the block-hash-tree up to the block-hash-tree root (32 bytes * log(N)). Every time the client fetches a block, they also fetch this Merkle chain. Fetching this chain is so common that it might be best to interleave the hashes with the data blocks (to reduce seeks), rather than putting all the block-hash-tree data in a separate portion of the share (like #1687, but without the random-access requirement during upload).
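To make the per-block Merkle chain concrete, here is one way the chain could be produced and checked under the same simplified tree (reusing `h` from the first sketch; `uncle_chain` and `verify_block` are invented names for illustration, not Tahoe's API):

```python
def uncle_chain(leaves: list[bytes], index: int) -> list[bytes]:
    # Sibling ("uncle") hashes needed to re-derive the root from leaf `index`;
    # roughly log2(len(leaves)) hashes of 32 bytes each.
    nodes = [h(leaf) for leaf in leaves]
    chain = []
    while len(nodes) > 1:
        if len(nodes) % 2:
            nodes.append(nodes[-1])
        chain.append(nodes[index ^ 1])
        nodes = [h(nodes[i] + nodes[i + 1]) for i in range(0, len(nodes), 2)]
        index //= 2
    return chain

def verify_block(block: bytes, index: int, chain: list[bytes], root: bytes) -> bool:
    # Re-hash from the block up through its uncles and compare against the
    # block-hash-tree root for that segment.
    node = h(block)
    for sibling in chain:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Example: prove block 3 of segment 0 belongs to that segment's tree.
# blocks = [share[0] for share in shares]
# verify_block(blocks[3], 3, uncle_chain(blocks, 3), merkle_root(blocks))  # -> True
```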
If it weren't for this last interleaving trick, it might be possible to add this encoding to the current (immutable) scheme, which is somewhat open-ended. We can add new fields to the UEB (which is simple JSON-encoded), and the offsets table doesn't need to cover the entire file, so the UEB could also contain the offset of the new hash tree data. There's not much point to doing this, however, since fixing #2386 requires removing the non-transverse block-hash-trees.
Our encoding scheme is over eight years old; in fact the very first commit [4968ca9] was to calculate the storage/bandwidth/alacrity overhead of this scheme. It's hard to remember, but I don't think we considered rotating the trees back then. If we did, we might have dismissed it for the following reason: in the rotated scheme, the bandwidth overhead of downloading the whole file is O(filesize*k*ln(N)). For each segment, you need k blocks, and for each block, you need an uncle chain of length ln(N). The original encoding scheme lets you decide between alacrity and bandwidth overhead: if you know you're downloading the entire file and don't need to deliver the first bytes early, you can skip fetching the block-hash-tree and just recompute it entirely. The bandwidth overhead then reduces to the transverse share-hash-tree, which is O(k*ln(N)). Since N is generally small, and filesizes can be large, this is a win. Of course if you want alacrity, you need the block-hash-tree Merkle chain for each segment, so you must download O(k*ln(filesize)) bytes for the first segment, and somewhat less for subsequent ones, requiring something like O(filesize*k*ln(N))/2 for a complete linear read.

It might be fair to say we've optimized for single-segment reads, at the expense of mutable updates.