let the get_buckets() response include the first block #1109
Reference: tahoe-lafs/trac-2024-07-25#1109
We could optimize out a round trip in download if, when asking a storage server whether it has blocks by sending it a get_buckets(), you could optionally request that, if it does have one of them, it also send you one. The reply would then be a set of buckets (each a remote reference to an object on the server) plus the full data of one block.

Brian: what do you think?
Yeah, in general, I think a more stateless immutable share-read interface would be better. The mutable share interface (which was written about 6 months later) is stateless, and that makes life a bit easier. That interface takes a read vector and a set of share numbers, with an empty set meaning "all shares that you are holding", and I think the same interface would work for immutable files.
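As a rough illustration of the read-vector idea, the sketch below models a stateless immutable-share read in plain Python. The function name read_shares, the in-memory dict of shares, and the (offset, length) tuple form are assumptions made for this example, not the actual Tahoe-LAFS storage API. Only the "empty set means all shares that you are holding" convention comes from the description above.

```python
def read_shares(held_shares, wanted_sharenums, readv):
    """Return {sharenum: [bytes, ...]} for each requested share that is held.

    held_shares: dict mapping share number -> share contents (bytes)
    wanted_sharenums: iterable of share numbers; empty means "all held shares"
    readv: list of (offset, length) pairs applied to every selected share
    """
    wanted = set(wanted_sharenums)
    if not wanted:
        # An empty request set selects every share this server holds.
        wanted = set(held_shares)
    result = {}
    for sharenum in sorted(wanted):
        if sharenum not in held_shares:
            continue  # silently skip shares this server does not have
        data = held_shares[sharenum]
        result[sharenum] = [data[offset:offset + length] for offset, length in readv]
    return result
```

Because each call carries everything the server needs (which shares, which byte ranges), nothing has to stay open between calls.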
Such an interface assumes that the server can efficiently open+seek+read the same file several times in quick succession (closing the filehandle between each call), whereas the current stateful interface keeps the filehandle open for the entire download. I suspect that most modern OS/filesystems cache recently-opened files and make this fairly quick, but it's worth doing some benchmarks to be certain. Also, we'd want to consider the interaction with GC and/or external tools which delete shares: keeping the filehandle open means a download will survive the share being deleted in the middle, whereas a stateless interface would not.
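One way to get a first feel for the open+seek+read cost is a small local timing comparison like the sketch below. It is only an assumption-laden starting point: the function names, file size, and read pattern are made up for the example, it uses a scratch file rather than a real share, and it measures a warm cache on whatever filesystem holds the temporary directory.

```python
import os
import tempfile
import time


def read_reopening(path, reads):
    """Stateless style: open, seek, read, and close for every request."""
    results = []
    for offset, length in reads:
        with open(path, "rb") as f:
            f.seek(offset)
            results.append(f.read(length))
    return results


def read_keeping_open(path, reads):
    """Stateful style: one filehandle stays open across all requests."""
    results = []
    with open(path, "rb") as f:
        for offset, length in reads:
            f.seek(offset)
            results.append(f.read(length))
    return results


def benchmark(size=1 << 20, chunk=4096, count=200):
    """Time both styles on a scratch file; returns (reopen_secs, keep_open_secs)."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))
        # Walk through the file in chunk-sized steps, wrapping around at the end.
        reads = [((i * chunk) % (size - chunk), chunk) for i in range(count)]
        start = time.perf_counter()
        reopened = read_reopening(path, reads)
        reopen_secs = time.perf_counter() - start
        start = time.perf_counter()
        kept = read_keeping_open(path, reads)
        keep_open_secs = time.perf_counter() - start
        assert reopened == kept  # both styles must return identical data
        return reopen_secs, keep_open_secs
    finally:
        os.remove(path)
```

Calling benchmark() returns the two elapsed times in seconds. A real measurement would need actual share files, a cold-cache run, and the storage server's own request overhead.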
A stateless interface would also make us slightly more resistant to a DoS attack in which the attacker opens lots of shares at once and tries to fill the file-descriptor table.
I'd want to leave the signature of get_buckets() alone, and add a new method instead. The server-version-information dictionary could be used to advertise the availability of such a method.
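A client-side check against the version dictionary might look roughly like the sketch below. The dictionary layout and both key names ("storage-v1" and "immutable-readv") are hypothetical placeholders invented for this example; the real keys would be whatever the server ends up advertising.

```python
# Hypothetical key names, chosen only for illustration.
PROTOCOL_KEY = "storage-v1"
FEATURE_KEY = "immutable-readv"


def supports_stateless_read(version_info):
    """Return True if the server's version dictionary advertises the new method.

    Servers that predate the feature simply lack the key, so they are
    treated as not supporting it and the client falls back to get_buckets().
    """
    params = version_info.get(PROTOCOL_KEY, {})
    return bool(params.get(FEATURE_KEY, False))
```

Old servers need no changes under this scheme, since a missing key reads as "not supported".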
And overall, yeah, I'd like to optimize out that extra round trip, because my new downloader (#798) can currently retrieve a small file in just two roundtrips, and with this fix we could get that down to just a single roundtrip, which would be great.
I was probably a bit over-enthusiastic about using Foolscap remote references when I wrote the immutable interface... incidentally, one conceivable benefit of the stateful interface could come up in server-driven share migration. That code (living in server A) could talk to server B and send it a "please copy my share" message, passing it the remote reference to the share's BucketReader, or a client could use Foolscap's third-party-reference ("Gifts") feature to let A and B move the data directly between themselves without requiring client-side bandwidth for the copy. Of course, since all shares are publicly readable, there's no authority-reducing benefit to doing it with bucket references over simply telling someone the storage-index and having them do the reads themselves. But when Accounting shows up, it might wind up being handy to have a bucket-plus-read-authority object available to pass around.

So perhaps for v1.8.0 we would add a new method called something like get_buckets_data() which implements the stateless interface, like slot_readv() (source:src/allmydata/interfaces.py@4410#L165). Then new downloaders could use that, perhaps invoking it on the first K servers that they know of which support it, and invoking get_buckets() on the remaining N-K servers, which they will need to use only if some of the first K servers fail.
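The server-selection idea in the last paragraph could be sketched as follows. The function name plan_requests and the (server_id, supports_stateless) pair format are assumptions for illustration only; a real downloader would take its server list and capability flags from its own peer-selection code.

```python
def plan_requests(servers, k):
    """Split servers into (stateless, fallback) lists.

    servers: list of (server_id, supports_stateless) pairs in preference order
    k: number of shares needed to reconstruct the file

    The first k servers that support the stateless method get the combined
    "do you have it, and send me the data" request. Every other server gets
    a plain get_buckets() query and is only read from if one of the first
    k fails.
    """
    stateless = []
    fallback = []
    for server_id, supports_stateless in servers:
        if supports_stateless and len(stateless) < k:
            stateless.append(server_id)
        else:
            fallback.append(server_id)
    return stateless, fallback
```

For example, with five servers of which three support the new method and K = 2, the first two supporting servers receive the stateless request and the remaining three are queried with get_buckets().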