verifierid as storage index: not the whole story #5
Reference: tahoe-lafs/trac-2024-07-25#5
We've talked on and off about what key we should be using when looking up shares (the index sent to RIStorageServer.get_buckets). We're currently using the VerifierId. I'm wondering if we should be using some combination of the verifierid and the encoding parameters to make sure that this index consistently maps to the same set of shares, rather than merely shares generated from the same data.
The peer selection algorithm forces us to pick exactly one index value.
Pros and cons of different index values:
FileId:
VerifierId:
So I'm thinking that the share index needs to be verifierid plus a serialized representation of the encoding parameters. The serialized parameters can be compressed by just saying "v1" and having that imply a certain algorithm applied to the filesize, but that should still give us the ability to change encoding parameters in the future and not wind up with incompatible shares that appear identical from the perspective of get_buckets().
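A minimal sketch of that idea, not the actual Tahoe code: the function names, the use of SHA-256, the choice to hash the pair down to a fixed length, and the (needed, total) parameter layout are all assumptions made for illustration.

```python
import hashlib
import struct


def serialize_params(needed_shares: int, total_shares: int) -> bytes:
    """Hypothetical explicit serialization: two big-endian 16-bit integers."""
    return struct.pack(">HH", needed_shares, total_shares)


def share_index(verifierid: bytes, params: bytes) -> bytes:
    """Derive a fixed-length index from the verifierid and serialized parameters.

    Each field is length-prefixed so that two different (verifierid, params)
    pairs can never produce the same hash input.
    """
    h = hashlib.sha256()
    for field in (verifierid, params):
        h.update(struct.pack(">I", len(field)))
        h.update(field)
    return h.digest()


# Stand-in verifierid; in the ticket it identifies the crypttext.
verifierid = hashlib.sha256(b"example crypttext").digest()

idx_3_of_10 = share_index(verifierid, serialize_params(3, 10))
idx_25_of_100 = share_index(verifierid, serialize_params(25, 100))
# The compressed form: a version tag that implies the parameters.
idx_v1 = share_index(verifierid, b"v1")
```

Under these assumptions the same crypttext encoded two different ways yields two different indexes, so get_buckets() would not mix incompatible shares.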
Certain information needs to go into peer selection (depending upon the algorithm): the verifierid is one input, and the number of shares that were uploaded is another (at least for PeerSelection/TahoeThree and PeerSelection/DenverAirport; PeerSelection/TahoeTwo does not need it). Other information can affect the shares being generated without influencing peer selection (like segment size): this data could be stored on the peers and retrieved at download time. Peers could store shares from multiple encoded forms of the same crypttext. The download process would involve the downloader asking a set of likely peers about a verifierid and learning of a set of encoded forms, such that each peer has buckets for some forms and not others. The response that provides a list of encoded forms includes the encoding parameters, so the downloader could learn how many buckets for that form it needs to recover the file. The second step would be to pick one form and retrieve references to sufficient buckets for that form; then finally the data could be fetched and decoded.
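The two-step lookup described above could be sketched roughly as follows. This is an in-memory toy, not the RIStorageServer interface: the `Form` tuple, the `Peer` class, and the `list_forms` and `locate` names are hypothetical, and real peers would be remote.

```python
from collections import defaultdict
from typing import NamedTuple


class Form(NamedTuple):
    """Hypothetical description of one encoded form of a crypttext."""
    needed_shares: int
    total_shares: int
    segment_size: int


class Peer:
    def __init__(self):
        # verifierid -> form -> {share number: share bytes}
        self.store = defaultdict(lambda: defaultdict(dict))

    def put(self, verifierid, form, sharenum, data):
        self.store[verifierid][form][sharenum] = data

    def list_forms(self, verifierid):
        """Step 1 response: each form held here, with its share numbers."""
        forms = self.store.get(verifierid, {})
        return {form: sorted(buckets) for form, buckets in forms.items()}

    def get_buckets(self, verifierid, form):
        """Step 2 response: the buckets held for one chosen form."""
        return dict(self.store.get(verifierid, {}).get(form, {}))


def locate(peers, verifierid):
    """Return (form, buckets) for a recoverable form, or None."""
    # Step 1: learn which forms exist and which shares are reachable.
    available = defaultdict(set)
    for peer in peers:
        for form, sharenums in peer.list_forms(verifierid).items():
            available[form].update(sharenums)
    # Step 2: pick a form with enough distinct shares and collect its buckets.
    for form, sharenums in available.items():
        if len(sharenums) >= form.needed_shares:
            buckets = {}
            for peer in peers:
                buckets.update(peer.get_buckets(verifierid, form))
            return form, buckets
    return None


vid = b"example-verifierid"
form_a = Form(needed_shares=2, total_shares=4, segment_size=1024)
form_b = Form(needed_shares=3, total_shares=6, segment_size=1024)

peer1, peer2 = Peer(), Peer()
peer1.put(vid, form_a, 0, b"share-a0")
peer2.put(vid, form_a, 1, b"share-a1")
peer1.put(vid, form_b, 0, b"share-b0")  # only 1 of the 3 shares needed

result = locate([peer1, peer2], vid)
```

Here `form_a` is recoverable because two distinct shares are reachable, while `form_b` is skipped because only one of its three required shares exists.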
fix some wikinames
We can probably put this one off for a little while. If the storage index is randomly generated (or derived from something randomly generated, like the readkey), then this isn't a problem. We could also say that the storage index should be the hash of (readkey, encoding parameters).
Currently (in, say, source:src/allmydata/upload.py@1000) the Uploadable is responsible for generating the readkey, and it is suggested that convergent uploads use a hash of the file's contents and the desired encoding parameters. We don't do that quite yet, but if we did, then the readkey would be different for different encodings of the same file, and we'd have the properties that we want.
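As a rough illustration of that suggestion, and not the code in upload.py: the tag string, the parameter serialization, the use of SHA-256, and the 16-byte truncation below are all assumptions.

```python
import hashlib


def convergent_readkey(data: bytes, needed: int, total: int,
                       segment_size: int) -> bytes:
    """Hypothetical convergent key: hash of encoding parameters plus contents."""
    params = f"{needed},{total},{segment_size}".encode("ascii")
    h = hashlib.sha256()
    h.update(b"hypothetical_convergent_readkey_v1:")  # made-up domain tag
    h.update(len(params).to_bytes(4, "big"))  # length prefix avoids ambiguity
    h.update(params)
    h.update(data)
    return h.digest()[:16]  # truncated to a 128-bit key (assumption)


contents = b"the same file contents"
key_3_of_10 = convergent_readkey(contents, 3, 10, 1 << 20)
key_25_of_100 = convergent_readkey(contents, 25, 100, 1 << 20)
key_again = convergent_readkey(contents, 3, 10, 1 << 20)
```

With this construction, identical files uploaded with identical parameters converge on one readkey, while a different encoding of the same file gets a different readkey, and therefore a different value for anything derived from it.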
Nowadays the storage index is the secure hash of the encryption key. Closing as fixed.