got 'WrongSegmentError' during repair #1223
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1223
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
As I was working to improve the logging of 'tahoe deep-check' and 'tahoe check' (another ticket coming soon), I manually deleted shares from 22 different tahoe nodes to manually trigger a repair.
Encoding parameters of this file were N=66 and K=22.
The complete debug log as extracted by 'flogtool' is attached to this ticket.
Attachment debug.log (743646 bytes) added
(reformatted the 'tahoe check' output a bit for easier display)
Francois notes that the filesize was 135 bytes.
gleaned so far: the file has one segment. The repairer starts with a
get_segsize()
, which is currently lazily-implemented asget_segment(0)
. Log messages up through 2864211 are theget_segment(0)
, at which point the upload process starts, and spendsthrough 2864212 performing upload-share-placement.
The weird bit starts on message 2864212, where the repairer performs a 7-byte
read. It's as if the repairer is confused about the segment size (or the
repairer's uploader is confused about what a good chunksize should be), and
does a bunch of tiny reads instead of one whole segment. That's the first
problem, but it's merely a performance issue, not fatal.
The fatal problem is some sort of fencepost error. Grepping for "Segmentation
got data" shows a series of 7-byte reads that ends badly (remembering that
this is a 135-byte file):
The
[133-135)
should have been the last read, but for some reason itwent further and did that bogus
[140-135)
read. The "140" offset isbeyond the end of the file, and of course having a negative size is also a
problem.
The same file repair worked perfectly well with 1.7.1.
next observations: the Encoder (immutable/encode.py) is doing
input_piece_size = codec.get_block_size()
, which is then used as the read-size passed touploadable.read_encrypted
. I think this is much too small (by a factor of 'k').. we should probably be reading a full segment at a time.The docs for
CRSEncoder.encode
say that it takes a list of 'k' strings, each of length X where X*k=data_size, which should be the same ascodec.get_block_size
. The code inEncoder._gather_data
is doing a batch of 'k' reads, each of that same size. So even though we're doing a bunch of tiny reads, that code is doing the right thing.(idea for speed improvement: change
_gather_data
to do larger reads, merge the arbitrary-sized response chunks into one big string, then split intoget_block_size()
strings. If eachread_encrypted
is expensive, this will help a lot, although I don't think there are any places whereread_encrypted
turns into a network call, since the Helper transfers the whole file to disk first before uploading)So the next step is to investigate the upload process, to see why it appears to be overrunning the input file. Did it not invoke the tail-segment gathering code? Does it always overread, and we're only seeing a problem because the new-downloader's uploadable doesn't tolerate overreads?
oh, and a note about performance: each of those 7-byte reads is causing a whole (154-byte) segment to be fetched (and then, after decoding, most of the segment is discarded). The new-downloader caches the hash trees, so fetching a segment requires one round-trip. On Francois' grid, which seems pretty fast, it takes about 100ms for each segment. Repairing the whole 135-byte file with those tiny reads then takes about 5 seconds overall, which is horrible.
We could either add a one-segment cache to the downloader, so that you can do crazily-short sequential reads without so much overhead, or change the uploader to do larger (full-segment) reads, or both.
Replying to warner:
in [_encode_segment()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L299
in [_gather_data()]source:trunk/src/allmydata/immutable/encode.py?rev=4308&annotate=blame#L370
Agreed—it looks like:
should have been:
also the tests for correctness below need to be changed:
should be:
Or even better I guess we should put at the top of the function:
and then the test code would say:
Okay it is my bed-time so I'm stopping here for now. Still need to inspect the rest of Brian's notes and see if I can do the next step: investigate the upload process.
so I think there are two problems: one performance-harming, one crashing:
the read_encrypted(input_chunk_size) will read about segsize/k bytes at a time. When k=3 this isn't too bad, but when k=22, it starts to hurt (especially for a small file, or a short tail segment of a multiple-segment file). zfec wants data in pieces of that size (input_chunk_size comes from zfec), but that doesn't mean we have to do
read_encrypted()
in chunks of that size.we should read full segments at a time, and then split them into
input_chunk_size
pieces after hashing. If we do it right, I don't think this will increase the memory footprint, although it will add another brief window where there's an extra 1*segsize in use (during the split and before we free the original segsize-sized buffer).this performance problem is more significant with new-downloader than with old-downloader. I think old-downloader would have cached the ciphertext onto disk (the old
CacheFileManager
or something). A client doing sequential reads of small ranges would trigger a full download and then read tiny chunks off disk (really out of the kernel's dirty FS buffers) at RAM speeds. But new-downloader, which doesn't use a disk cache, responds to each read() call by fetching a single segment. But since it doesn't cache those segments anywhere, a client which does a lot of tiny reads will trigger a lot of segment fetches, taking a round-trip each.the Repairer has probably been reading past the end of the input file all along. The old-downloader tolerated this (because it was really just reading the cachefile from disk). But the new-downloader does not, and in fact crashes when you ask it to
read()
with a starting offset that is beyond the end of the filewe should first fix Repairer to not do this, then we should fix new-downloader to tolerate it or at least raise a sensible exception
Attachment 1223.diff (9769 bytes) added
potential fix
this patch adds a test and fixes the issue. I'd like another pair of eyeballs on it, specifically because it changes the way that
read_encrypted
is called (segment-at-a-time instead of inputchunksize-at-a-time), and I'm concerned about inducing a fencepost error. But I think this new approach is a lot better: fewer reads, about the same memory footprint, remove some recursion. The only tiny drawback is that Uploadables (and other things that provideread_encrypted()
, like the Repairer, and the Offloaded uploadable) are now constrained to act more like regular disk files: if you do aread_encrypted(123)
, they are obligated to give you exactly 123 bytes, unless you're at EOF, in which case they can give you less than you asked for exactly once. Previously we weren't imposing this requirement, allowing Uploadables to return random amounts of data on eachread_encrypted()
call.Your patch contains the performance improvements discussed in the first part of comment:80491, and makes the new downloader more able to handle read requests beyond EOF than it was before, as mentioned in the second part of comment:80491, but doesn't contain any fixes for whatever repairer code was responsible for the weird offsets to begin with, unless I'm missing something obvious.
Other than that, it looks good to me; I didn't see any newly-introduced fencepost errors, nor any other issues.
Brian, is the idea to apply this patch before actually fixing the repairer code? Or shall we wait?
we can go ahead and apply this patch. The overrun (which is actually a behavior of the uploader, in the way it calls
read_encrypted
, rather than the repairer, which just glues an uploader to a downloader) is ok, now that the interface has changed to require that Uploadables tolerate overrun. We could change the uploader to carefully track how much data they've requested and reduce their readsize when they get close to the end, but I don't think it's worth it (it would make the code more complex, and now it only overruns once, at the very end).Kevan: thanks for the review! I'll land it now.
In changeset:c18953c169fe9e7c:
This needs a source:NEWS entry.
Attachment news-1223.dpatch (11424 bytes) added
Replying to zooko:
Here is it, please note that the patch bundle also contains the NEWS entry for #1045 because of darcs dependency handling.
source:NEWS entry in changeset:99a6e63814226fc2.