corrupted filesize in CHK filecap causes unexpected "bad hash error" #1529

Open
opened 2011-09-07 04:35:33 +00:00 by warner · 4 comments

IRC user "joepie91" noticed that changing a filecap like:

`URI:CHK:wfnqcotc4ceokosmfobofjzoya:qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29680`

to

`URI:CHK:wfnqcotc4ceokosmfobofjzoya:qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29681`

(by changing the last character, raising the filesize by one) causes the following exception to be raised:

```
2011-09-06 21:24:27-0700 [-] Unhandled Error
Traceback (most recent call last):
  File "/Users/warner2/stuff/tahoe/tahoe-git-hanford/src/allmydata/immutable/downloader/node.py", line 492, in _check_ciphertext_hash
    raise BadCiphertextHashError(msg)
allmydata.immutable.downloader.common.BadCiphertextHashError: hash failure in ciphertext_hash_tree: segnum=0, SI=lfkjg4jkgzee
```

I was able to duplicate it locally against current trunk.
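For reference, the fields of a CHK filecap can be sketched like this (illustrative only; Tahoe-LAFS's real parser, in its `uri` module, does much more validation). The point is that the trailing filesize is plain text, so flipping the last character silently changes it:

```python
def parse_chk_cap(cap):
    # A CHK filecap has the shape
    #   URI:CHK:<readkey>:<UEB hash>:<k>:<N>:<filesize>
    # where k/N are the erasure-coding parameters and the final field
    # is the claimed filesize in bytes.
    prefix, scheme, readkey, ueb_hash, k, n, size = cap.split(":")
    assert (prefix, scheme) == ("URI", "CHK")
    return {"readkey": readkey, "ueb_hash": ueb_hash,
            "k": int(k), "N": int(n), "size": int(size)}

cap = ("URI:CHK:wfnqcotc4ceokosmfobofjzoya:"
       "qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29680")
print(parse_chk_cap(cap))
```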

It looks like (1) we're relying on something in the filecap that should
merely be a hint, and (2) we're throwing a particularly weird error
message. A hash mismatch in the ciphertext_hash_tree either
indicates a failure in zfec decoding, or a deliberately malformed file
(in which the original ciphertext_hash_tree was computed over the
wrong data, then included in the UEB). So my guess is that we're
computing the predicted segment size from the filecap hint, but then not
updating it when the shares' UEBs tell us that the hint was wrong, and
trying to decode with the wrong data.
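That guess can be illustrated with a simplified sketch of segment-geometry prediction (hypothetical and simplified: the real downloader also rounds segment sizes up for zfec; `MAX_SEGMENT_SIZE` is Tahoe's default immutable segment size). Everything below derives from the filecap's filesize field alone:

```python
import math

MAX_SEGMENT_SIZE = 128 * 1024  # Tahoe's default max segment size

def predicted_geometry(filesize):
    # Predict segment count, segment size, and tail-segment size
    # purely from the filecap's (unauthenticated) filesize field.
    num_segments = max(1, math.ceil(filesize / MAX_SEGMENT_SIZE))
    segment_size = min(filesize, MAX_SEGMENT_SIZE)
    tail_size = filesize - (num_segments - 1) * MAX_SEGMENT_SIZE
    return num_segments, segment_size, tail_size

# A one-byte lie in the filecap changes the predicted tail-segment
# length, so segment 0 gets hashed over the wrong number of bytes and
# the ciphertext_hash_tree check fails even though the shares are fine.
print(predicted_geometry(29680))  # -> (1, 29680, 29680)
print(predicted_geometry(29681))  # -> (1, 29681, 29681)
```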

The test case for this should be pretty straightforward: upload a file
with a couple of segments, produce a mangled filecap, attempt a
download. The download ought to succeed. If we want to be picky about
the filecap being the source of truth, then we're allowed to fail, but
we need a better error message than BadHashError (maybe something
about "corrupt filecap" or "file size mismatch"). But I think success is
arguably more correct. (note: the same argument may apply to mismatches
in k and N).
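A helper for producing the mangled filecap in such a test might look like the following (a hypothetical sketch, not part of the Tahoe test suite):

```python
def mangle_filesize(cap, delta=1):
    # Return a copy of a CHK filecap whose trailing filesize field
    # is off by `delta`, leaving every other field intact.
    head, size = cap.rsplit(":", 1)
    return "%s:%d" % (head, int(size) + delta)

good = ("URI:CHK:wfnqcotc4ceokosmfobofjzoya:"
        "qcof5dae7usfth4zvti7fmievpsofu25lgcmdq7q2cnebg3txyfq:3:10:29680")
print(mangle_filesize(good))  # ends in :29681
```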

warner added the code-encoding, major, defect, 1.9.0a1 labels 2011-09-07 04:35:33 +00:00
warner added this to the soon milestone 2011-09-07 04:35:33 +00:00

Brian: I think we *are* supposed to rely on that number in the cap; it is canonical and not just a hint.

Although actually, now that I think about it, that was one of those things that I "tightened" and that you, I think, changed back to some degree when you wrote Brian's New Downloader.

IIRC my "tightened" interpretation of all those fields is still intact in the verifier but not in your new downloader. Might be interesting to see what the verifier says about that cap.


I reproduced this by uploading a file to Test Grid and downloading it and then changing the file size and re-downloading it. Then I ran the verifier on it and got an informative error message:

```
$ tahoe put /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat\?.pdf
URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38424
$ tahoe get URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38424 > testdl
$ sha256sum testdl /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat\?.pdf 
4ec083f719a6c77fa46e7e668d39a74357e295d54120560ce44963d4dd8512be  testdl
4ec083f719a6c77fa46e7e668d39a74357e295d54120560ce44963d4dd8512be  /tmp/Hu-2010-Are_refined_carbohydrates_worse_than_saturated_fat?.pdf
$ tahoe get URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425 > testdl
Error during GET: 410 Gone
"NotEnoughSharesError: This indicates that some servers were unavailable, or that shares have been lost to server departure, hard drive failure, or disk corruption. You should perform a filecheck on this object to learn more.\x0a\x0aThe full error message is:\x0aran out of shares: complete= pending=Share(sh4-on-sroojqcx) overdue= unused= need 3. Last failure: None"
$ tahoe check URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425

Summary: Healthy
 storage index: krwpuniahfvg3oivybopc3h2eu
 good-shares: 5 (encoding is 3-of-5)
 wrong-shares: 0
$ tahoe check --verify URI:CHK:wrfacn3miglizt4zpleoxwe2de:qo52uzx5vvfq3p5ftenbwuy6h7ek2cgpf5afvrurg33ooug2qlka:3:5:38425
ERROR: 500 Internal Server Error
"Traceback (most recent call last):\x0a  File \"/usr/local/lib/python2.7/dist-packages/foolscap-0.6.1-py2.7.egg/foolscap/call.py\", line 674, in _done\x0a    self.request.complete(res)\x0a  File \"/usr/local/lib/python2.7/dist-packages/foolscap-0.6.1-py2.7.egg/foolscap/call.py\", line 60, in complete\x0a    self.deferred.callback(res)\x0a  File \"/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py\", line 361, in callback\x0a    self._startRunCallbacks(result)\x0a  File \"/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py\", line 455, in _startRunCallbacks\x0a    self._runCallbacks()\x0a--- <exception caught here> ---\x0a  File \"/usr/local/lib/python2.7/dist-packages/Twisted-11.0.0-py2.7-linux-x86_64.egg/twisted/internet/defer.py\", line 542, in _runCallbacks\x0a    current.result = callback(current.result, *args, **kw)\x0a  File \"/usr/local/lib/python2.7/dist-packages/allmydata/immutable/checker.py\", line 154, in _parse_and_validate\x0a    self.segment_size, self._verifycap.needed_shares))\x0aallmydata.immutable.checker.BadURIExtension: inconsistent erasure code params: utcpss: 38424 != self.tail_segment_size: 3, self._verifycap.size: 38425, self.segment_size: 38424, self._verifycap.needed_shares: 3\x0a"
```

The informative part on the end there says:

`allmydata.immutable.checker.BadURIExtension: inconsistent erasure code params: utcpss: 38424 != self.tail_segment_size: 3, self._verifycap.size: 38425, self.segment_size: 38424, self._verifycap.needed_shares: 3`
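The arithmetic behind that message can be reconstructed with a simplified sketch (an assumption-laden reconstruction, not the actual `checker.py` code; it assumes Tahoe's ceiling-division and pad-tail-to-a-multiple-of-k conventions):

```python
def div_ceil(n, d):
    # ceiling division without floating point
    return -(-n // d)

def next_multiple(n, k):
    # round n up to the next multiple of k (zfec pads the tail segment
    # so it splits evenly into k shares)
    return k * div_ceil(n, k)

def expected_tail_segment_size(size, segment_size, k):
    # the verifier predicts the tail-segment size from the verifycap's
    # claimed filesize and compares it with the UEB's tail codec params
    num_segments = div_ceil(size, segment_size)
    tail = size - (num_segments - 1) * segment_size
    return next_multiple(tail, k)

# The mangled cap claims size=38425 while the UEB was built for 38424:
# the verifier predicts a padded 3-byte tail segment, the UEB's tail
# codec params (utcpss) say 38424, and BadURIExtension is raised.
print(expected_tail_segment_size(38425, 38424, 3))  # -> 3
print(expected_tail_segment_size(38424, 38424, 3))  # -> 38424
```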

Author

After discussing it with Zooko, I agree with his description: the filecap is authoritative/canonical. If we fetch a share that contains a UEB with a filesize that disagrees with the filecap, we should treat that share as invalid, and throw a useful error (`FilecapFilesizeMismatchError`?, although `BadURIExtension` is even better). The filecap's filesize is not just a hint.

The "new" immutable downloader currently ignores all the share fields it doesn't need, including the UEB's filesize field. In this case, I think it's better that the downloader explicitly (one might say gratuitously) compare the UEB's filesize against the filecap's filesize and throw `FilecapFilesizeMismatchError`, rather than pretending that the UEB's filesize is a mistake and attempting to decode the file with the filecap's filesize (which will result in the unhelpful `ciphertext_hash_tree` error in this ticket). Think of it as bringing the error forward to a better place.
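A minimal sketch of that proposed check (hypothetical: `FilecapFilesizeMismatchError` is the error type suggested above, and the helper name is invented for illustration):

```python
class FilecapFilesizeMismatchError(Exception):
    """Hypothetical error type proposed in this ticket."""

def check_ueb_filesize(ueb_size, filecap_size):
    # Once a share's UEB has validated against the filecap's UEB hash,
    # the UEB's filesize field is trustworthy; a disagreement therefore
    # means the filecap itself was mangled, which deserves its own
    # early, descriptive error instead of a downstream hash failure.
    if ueb_size != filecap_size:
        raise FilecapFilesizeMismatchError(
            "filecap says %d bytes, validated UEB says %d"
            % (filecap_size, ueb_size))
```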

Note that the root issue here is that the filesize is redundant: the file can be completely specified by just the readkey and the UEB hash. But it's awfully handy to include the filesize, k, and N in there, both to speed up downloads (parallelizing an appropriate number of DYHB queries) and to let deep-size run without actually fetching file data. Ideally, filecaps would only contain the minimal set of data necessary to retrieve the file. Having extra data there means we have to specify what happens when the extra data in the filecap disagrees with the data in a validated share, which means doing more work during download that can only possibly cause a failure.

davidsarah commented 2012-03-12 19:08:08 +00:00
Owner

Replying to warner:

The "new" immutable downloader currently ignores all the share fields it doesn't need, including the UEB's filesize field. In this case, I think it's better that the downloader explicitly (one might say gratuitously) compares the UEB's filesize against the filecap's filesize and throw FilecapFilesizeMismatchError, rather than pretending that the UEB's filesize is a mistake and attempting to decode the file with the filecap's filesize (which will result in the unhelpful ciphertext_hash_tree error in this ticket). Think of it as bringing the error forward to a better place.

+1.

Reference: tahoe-lafs/trac-2024-07-25#1529