store less validation information in each share, to lower overhead #87

Closed
opened 2007-07-12 18:53:09 +00:00 by warner · 4 comments

Once we have confidence in our FEC and decryption code, we may feel comfortable removing the extra validation data from the shares. This would reduce our per-share storage overhead, and slightly reduce the per-file transmission overhead.

This would remove the plaintext hash (32B), the plaintext hash tree (roughly 32B * 2 * ceil(filesize/2MB)), the crypttext hash tree (same), and the crypttext hash (32B). For small files (less than 2MB), this would reduce the per-share overhead from 846 bytes to 718 bytes.

We would certainly want to implement #86 if we did this, to retain the ability to detect a mis-typed URI (using the wrong decryption key), since without a plaintext hash we'd have no other way to detect such corruption.
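
The overhead figures above can be checked with a short sketch. It assumes a 2 MiB segment size for the ticket's "2MB", and it assumes each hash tree is a plain binary tree with 2n - 1 nodes over n segments; the function names are hypothetical and not taken from the Tahoe code. Under those assumptions a single-segment file has a one-node tree, which gives the 128-byte saving implied by 846 - 718, while the 32B * 2 * ceil(filesize/2MB) expression acts as a slight overestimate.

```python
import math

HASH_SIZE = 32                   # bytes per hash, the ticket's "32B"
SEGMENT_SIZE = 2 * 1024 * 1024   # assumption: "2MB" means 2 MiB


def num_segments(filesize):
    """Number of segments; even an empty file counts as one segment here."""
    return max(1, math.ceil(filesize / SEGMENT_SIZE))


def tree_bytes_estimate(filesize):
    """The ticket's expression: 32B * 2 * ceil(filesize/2MB)."""
    return HASH_SIZE * 2 * num_segments(filesize)


def tree_bytes_exact(filesize):
    """Assumption: a binary hash tree over n leaves stores 2n - 1 nodes."""
    return HASH_SIZE * (2 * num_segments(filesize) - 1)


def savings(filesize, tree_bytes=tree_bytes_exact):
    """Bytes saved per share by dropping both flat hashes and both hash trees."""
    flat_hashes = 2 * HASH_SIZE            # plaintext hash + crypttext hash
    trees = 2 * tree_bytes(filesize)       # plaintext tree + crypttext tree
    return flat_hashes + trees


if __name__ == "__main__":
    small = 1000  # any file under one segment
    print(savings(small))                        # 128, matching 846 - 718
    print(savings(small, tree_bytes_estimate))   # 192 with the rough expression
```

For multi-segment files the two expressions differ by only 32 bytes per tree, so the rough form is fine for estimating overhead on large files.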

warner added the code, minor, enhancement, 0.4.0 labels 2007-07-12 18:53:09 +00:00
warner added this to the eventually milestone 2007-07-12 18:53:09 +00:00
warner added code-encoding and removed code labels 2007-08-14 19:00:17 +00:00
zooko added 0.6.0 and removed 0.4.0 labels 2007-09-25 04:36:19 +00:00

We've resolved #86 as wontfix (there isn't any danger of getting ciphertext back from tahoe if you start with the wrong encryption key, since the storage index is derived from the encryption key).
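
To illustrate the reasoning in the previous paragraph, here is a minimal sketch of a lookup keyed by a value derived from the encryption key. The derivation shown (a tagged SHA-256 truncated to 16 bytes), the tag string, and the in-memory `shares` dict standing in for storage servers are all assumptions made for illustration; they are not the actual tahoe derivation or protocol.

```python
import hashlib


def storage_index(encryption_key: bytes) -> bytes:
    """Hypothetical derivation: hash the key under a tag and truncate."""
    tag = b"storage-index-sketch:"  # made-up tag, for illustration only
    return hashlib.sha256(tag + encryption_key).digest()[:16]


right_key = bytes(16)                  # the key the file was uploaded with
wrong_key = bytes([1]) + bytes(15)     # a mis-typed key, one byte different

# Stand-in for the storage servers: shares are filed under the storage index.
shares = {storage_index(right_key): b"ciphertext shares"}


def fetch(encryption_key: bytes):
    """Look up shares by the index derived from the key; None if absent."""
    return shares.get(storage_index(encryption_key))


if __name__ == "__main__":
    print(fetch(right_key) is not None)  # shares found under the right index
    print(fetch(wrong_key) is None)      # wrong key asks for a different index
```

A wrong key changes the index that is requested, so the download fails at the lookup step instead of returning ciphertext that would then be decrypted into garbage.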

There is another issue: it would be nice to have validation of ciphertext, separately from validation of shares, so that someone could write a client that cannot do erasure decoding but can validate the ciphertext, and that connects to a server which does the erasure decoding for it and hands it the ciphertext.

Personally, I'm not motivated by this need. I want to make the tahoe client itself efficient, well-packaged, and well-behaved enough that people who want to download data from tahoe while retaining confidentiality of their files simply run a tahoe client.

Furthermore, even if we are going to support a non-erasure-decoding-but-ciphertext-validating (and perhaps therefore also ciphertext-decrypting) client in the future, I suspect it will be okay to add validation on the ciphertext back in when we know that we'll need it.

So I'd be happy at this point to move ahead with this and leave in only the parts that we currently need.

warner modified the milestone from eventually to undecided 2008-06-01 20:52:53 +00:00

We have already removed the plaintext hash and plaintext hash tree in order to avoid a failure of confidentiality.

davidsarah commented 2009-12-13 01:36:54 +00:00
Owner

Replying to zooko:

> We have already removed the plaintext hash and plaintext hash tree in order to avoid a failure of confidentiality.

(#453 asks to put back a per-file (not per-share) plaintext hash, in order to improve integrity in case of any problem with the FEC decoding or decryption. Also ticket:658#comment:60874 points out how this can be used to avoid redundant uploads/downloads.)

So the remaining part of this ticket asks to remove the per-share ciphertext hashes. However, I don't agree that it is a good idea to remove those: until we have #453, they provide useful additional robustness in case of implementation error. Also, the saving for small files from removing them is only 64 bytes. Furthermore, without these hashes, how would a share be fully verified by a verify cap holder, or by the storage server? I suggest resolving this as wontfix.


Thanks, David-Sarah.

zooko added the wontfix label 2009-12-13 02:33:23 +00:00
zooko closed this issue 2009-12-13 02:33:23 +00:00
Reference: tahoe-lafs/trac-2024-07-25#87