store copy of block-hash-chain with each block #1687
One idea that Zooko and I came up with at PyCon last week was to make
downloads more efficient by rearranging the shares to append a copy
of the necessary block-hash-tree nodes (the block's "uncle chain") to
each block.
The current share layout puts the whole block-hash-tree at the end of
the share. To read block 0, the client must fetch the block data,
plus log(N) hashes from the BHT. If it then reads block 1, it doesn't
need to get any new hashes (it already has the necessary internal
nodes). Reading block 2 requires it to get one new hash. The number
of hash nodes needed for each block of a sequential read follows a
funny sort of inverse-binary pattern: I think it equals the number of
bits that change between the binary representations of the current
segnum and the previous one, so it's log(N)-ish large at large
powers of two, and small for every other segment. This is pretty
visible in the download-timeline visualizer layout. This layout
minimizes the number of bytes needed to download any given segment
(download alacrity), and minimizes the storage overhead spent on
integrity data: 2*N hashes (where N is the number of segments:
ceil(filesize/128KiB)).
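
To make that fetch pattern concrete, here's a rough sketch (illustrative
only, not Tahoe code) that simulates a sequential reader against the
current layout and counts how many hash nodes it has to fetch for each
segment, assuming the reader caches every node it fetches or computes and
that the segment count is rounded up to a power of two:

```python
def new_hashes_per_segment(num_segments):
    """For each segment read in order, count the block-hash-tree nodes that
    must be fetched from the share, given that previously fetched or derived
    nodes are cached. Tree nodes use 0-based heap numbering: root is 0,
    children of i are 2i+1 and 2i+2, leaves start at n-1."""
    n = 1
    while n < num_segments:   # pad the leaf count up to a power of two
        n *= 2
    known = set()             # node indices whose hashes we already hold
    counts = []
    for seg in range(num_segments):
        leaf = n - 1 + seg
        known.add(leaf)       # the leaf hash is computed from the block itself
        fetched = 0
        node = leaf
        while node > 0:
            sibling = node + 1 if node % 2 == 1 else node - 1
            if sibling not in known:
                known.add(sibling)   # this uncle must be read from the share
                fetched += 1
            node = (node - 1) // 2
            known.add(node)          # the parent is now computable
        counts.append(fetched)
    return counts

print(new_hashes_per_segment(8))     # [3, 0, 1, 0, 2, 0, 1, 0]
```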
The overhead of fetching those extra nodes has a noticeable effect on
the speed of the new immutable downloader. It's not the number of
bytes being fetched, but rather some per-request overhead: either
Foolscap serialization time, the method-invocation overhead, or
perhaps the extra disk IO (although I expect the filesystem to cache
that part pretty effectively).
We could instead store an extra copy of each chain with each block. So
block 0 would immediately be followed by the log(N)-long uncle-chain
necessary to do an independent "cold" verification of block 0, block
1 would be followed by the chain for block 1, etc. The hashes would
be sorted to put the higher-level nodes at the end, so that you'd
only need to read to the end of the chain on the first block.
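
For reference, a sketch of what each block's appended chain might contain
under this scheme; the function name and the heap numbering are just for
illustration, not a proposed share format:

```python
def uncle_chain(segnum, num_segments):
    """Node indices of the hashes appended after block `segnum`, ordered with
    the leaf-level sibling first and the higher-level nodes last, so only a
    cold reader needs the whole chain and a warm reader can stop early."""
    n = 1
    while n < num_segments:   # pad the leaf count up to a power of two
        n *= 2
    chain = []
    node = n - 1 + segnum     # 0-based heap numbering: root=0, leaves at n-1..
    while node > 0:
        chain.append(node + 1 if node % 2 == 1 else node - 1)  # sibling
        node = (node - 1) // 2
    return chain

# e.g. with 8 segments, block 0's chain holds its sibling leaf plus two
# internal uncles: uncle_chain(0, 8) == [8, 4, 2]
```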
Then the downloader could make exactly one contiguous read for block
0, a shorter single read for block 1, etc. The storage overhead would
grow: I think it would need N*log2(N) hashes, maybe N*(1+log2(N)), not
quite sure. But I think this would be pretty reasonable; I need to
build a simulator and measure it.
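
As a first back-of-the-envelope check (not the real simulator), counting
stored hashes under both layouts, with the segment count rounded up to a
power of two:

```python
def overhead_hashes(num_segments):
    """Integrity-data overhead, in number of stored hashes, for the current
    layout (one shared block-hash-tree) vs. the proposed layout (a full
    log2(N)-long uncle chain duplicated after every block)."""
    n = 1
    while n < num_segments:
        n *= 2
    depth = n.bit_length() - 1        # log2(n) == uncle-chain length per block
    current = 2 * n - 1               # a complete Merkle tree: ~2*N hashes
    proposed = n * depth              # N*log2(N); +N more if the leaf hash
                                      # itself is also stored with each block
    return current, proposed

for n in (16, 256, 4096):
    print(n, overhead_hashes(n))      # (31, 64), (511, 2048), (8191, 49152)
```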
The biggest problem is that this requires random-access during the
write process: the hash nodes aren't available until after the whole
file has been processed. So we'd need to enhance the storage-server
protocol to let you seek backwards to an earlier position in the file
and overwrite some placeholder bytes that you put there earlier. Or
define shares to be made up of multiple chunks and allow uploaders to
write those chunks out-of-order. Either way complicates the
share-upload process, and probably makes certain backends (e.g. S3)
trickier.
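
Purely as an illustration of the first option (the names here are made up,
not the actual Foolscap storage interface), the upload side could look
something like a write-seek-write-close bucket:

```python
class ShareWriter:
    """Hypothetical write-seek-write-close share writer: the uploader streams
    each block followed by zero-filled placeholder space for its uncle chain,
    then seeks back and overwrites the placeholders once the whole
    block-hash-tree is known; only close() makes the share visible."""

    def __init__(self, expected_size):
        self._data = bytearray(expected_size)
        self._closed = False

    def write_at(self, offset, data):
        assert not self._closed
        self._data[offset:offset + len(data)] = data

    def close(self):
        # Readers never see the intermediate placeholder state; the share
        # only becomes available once the uploader declares it complete.
        self._closed = True
        return bytes(self._data)
```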
We could also have the client encode the entire file locally, fill in
the hash nodes, then upload the resulting shares. But our experience
with Tahoe's predecessor (the Allmydata derivative of the
Mnet/Hivecache that we retroactively named "Mountain View") was that
this gives horrible disk access patterns: it was like filling in a
grid along one axis, then reading it back out along the other, and
the disk went into a massive seek frenzy. To fix this in Tahoe, we
moved the matrix-transposition burden to the network, and stream each
segment as we finish encoding it. So I'd definitely go for an updated
write-seek-write-close storage protocol before trying to encode the
whole file on the client side.
Closely related to #1543; maybe we should merge these tickets.