store less validation information in each share, to lower overhead #87
Reference: tahoe-lafs/trac-2024-07-25#87
Once we have confidence in our FEC and decryption code, we may feel comfortable removing the extra validation data from the shares. This would reduce our per-share storage overhead, and slightly reduce the per-file transmission overhead.
This would remove the plaintext hash (32B), the plaintext hash tree (32B * 2 * ceil(filesize/2MB)), the crypttext hash tree (same), and the crypttext hash (32B). For small files (less than 2MB), this would reduce the per-share overhead from 846 bytes to 718 bytes.
We would certainly want to implement #86 if we did this, to retain the ability to detect a mis-typed URI (using the wrong decryption key), since without a plaintext hash we'd have no other way to detect such corruption.
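The arithmetic above can be checked with a short sketch. Assuming the hash trees are binary Merkle trees padded to a power-of-two leaf count (so a one-segment file's tree is just its 32-byte root), the small-file saving works out to exactly the 846 − 718 = 128 bytes quoted; the segment size and tree layout here are taken from the ticket's figures, not from the current share format:

```python
import math

SEGMENT_SIZE = 2 * 1024 * 1024  # 2MB segments, per the ticket
HASH = 32                       # SHA-256 hash size in bytes

def next_pow2(n):
    # smallest power of two >= n (leaf count is padded up to this)
    return 1 << (n - 1).bit_length()

def merkle_tree_bytes(leaves):
    # a full binary tree over p = next_pow2(leaves) leaves has 2p - 1 nodes;
    # for a single leaf that is just the 32-byte root
    return HASH * (2 * next_pow2(leaves) - 1)

def removed_bytes(filesize):
    """Per-share bytes freed by dropping the four validation fields."""
    segments = max(1, math.ceil(filesize / SEGMENT_SIZE))
    return (HASH                          # plaintext hash
            + merkle_tree_bytes(segments)  # plaintext hash tree
            + merkle_tree_bytes(segments)  # crypttext hash tree
            + HASH)                        # crypttext hash

# a small (<2MB) file is one segment: 32 + 32 + 32 + 32 = 128 bytes,
# i.e. per-share overhead drops from 846 to 718 bytes
assert removed_bytes(1_000_000) == 128
```

For larger files the trees grow with the segment count, so the per-share saving grows too, but the small-file case is where the fixed 846-byte overhead is most noticeable.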
We've resolved #86 as wontfix (there is no danger of getting the wrong ciphertext back from tahoe when you start with the wrong encryption key, since the storage index is derived from the encryption key).
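To make the wrong-key argument concrete: because the storage index is a one-way function of the encryption key, a reader holding the wrong key derives a different storage index and never fetches the right shares in the first place. A minimal sketch in the style of Tahoe's tagged SHA-256d hashes — the tag string and key length here are illustrative, not Tahoe's actual constants:

```python
import hashlib

def tagged_hash(tag: bytes, data: bytes) -> bytes:
    # double SHA-256 over a netstring-prefixed tag plus the data,
    # in the style of Tahoe's hashutil (tag shown is not the real constant)
    prefix = b"%d:%s," % (len(tag), tag)
    inner = hashlib.sha256(prefix + data).digest()
    return hashlib.sha256(prefix + inner).digest()

def storage_index(encryption_key: bytes) -> bytes:
    # truncated tagged hash of the key: a client with the wrong key
    # computes a different index and asks servers for different shares
    return tagged_hash(b"example_storage_index_tag", encryption_key)[:16]

right = storage_index(b"\x01" * 16)
wrong = storage_index(b"\x02" * 16)
assert right != wrong  # wrong key never locates the right shares
```

This is why a plaintext hash isn't needed to catch a mis-typed read cap: the lookup itself fails before any decryption happens.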
There is another issue: it would be nice to validate the ciphertext separately from validating the shares, so that someone could write a client which isn't capable of erasure decoding but can verify the ciphertext itself; such a client would connect to a server that does the erasure decoding on its behalf and hands back the ciphertext.
Personally, I'm not motivated by this need. I want to make the tahoe client itself efficient, well-packaged, and well-behaved enough that people who want to download data from tahoe while retaining confidentiality of their files simply run a tahoe client.
Furthermore, even if we are going to support a non-erasure-decoding-but-ciphertext-validating (and perhaps therefore also ciphertext-decrypting) client in the future, I suspect it will be okay to add validation on the ciphertext back in when we know that we'll need it.
So I'd be happy at this point to move ahead with this and leave in only the parts that we currently need.
We have already removed the plaintext hash and plaintext hash tree in order to avoid a failure of confidentiality.
Replying to zooko:
(#453 asks to put back a per-file (not per-share) plaintext hash, in order to improve integrity in case of any problem with the FEC decoding or decryption. Also ticket:658#comment:60874 points out how this can be used to avoid redundant uploads/downloads.)
So the remaining part of this ticket asks to remove the per-share ciphertext hashes. However, I don't agree that it is a good idea to remove those: until we have #453, they are providing useful additional robustness in case of implementation error. Also, the saving for small files from removing them is only 64 bytes. Furthermore, without these hashes how would a share be fully verified by a verify cap holder, or the storage server? I suggest resolving wontfix.
Thanks, David-Sarah.