resumption of incomplete transfers #218
Peter mentioned to me that an important operational issue is resumption of large file transfers that are interrupted by network flapping.
To do this, we change storage servers so that, when they detect a connection break, they no longer delete the "incoming" data that was incompletely uploaded. Then we extend the upload protocol so that uploaders learn which blocks of a share are already present on the server and don't re-upload those blocks.
Likewise on download.
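A minimal sketch of that server-side change, with invented class and method names (this is not the real storage-server interface, just the shape of the idea: keep the partial share around and let a reconnecting uploader ask which ranges survived):

```python
# Hypothetical sketch only.  The server remembers partially written share
# data across connection breaks and can report which byte ranges it holds,
# so an uploader can resume instead of starting over.

class IncomingShare:
    def __init__(self, total_size):
        self.total_size = total_size
        self.blocks = {}            # offset -> bytes, stands in for the on-disk file

    def get_present_ranges(self):
        """Called by an uploader after a reconnect: which ranges survived?"""
        return sorted((off, len(data)) for off, data in self.blocks.items())

    def write(self, offset, data):
        """Idempotent write: re-sending an already-present block is harmless."""
        self.blocks[offset] = data

    def is_complete(self):
        return sum(len(d) for d in self.blocks.values()) >= self.total_size
```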
That's an important user-facing feature. There are a couple of different
places where it might be implemented, some more appropriate than others. What
matters to the user is that their short-lived network link be usable to
upload or download large files; they don't really care how exactly this
takes place.
The three places where I can see this happening (on upload) are:

* In the storage server protocol: a client connects to a server and sends some of its shares to it, then stops. At some point later, another node (possibly the same one) does the same thing. It might be nice to have the second node learn about the partial share and avoid re-uploading that data.

* In the webapi: the basic PUT interface accepts the entire file of data and then returns a URI. We could say that PUTs to a child name (PUT /uri/$DIRURI/foo.jpg) respond to early termination by uploading the partial data anyway (and adding the resulting URI to the directory); then a later PUT with some Content-Range header that signals we want to modify (append to) the existing data means the client node should download that data, append the new data to it, re-upload the whole thing, then finally replace the partial child URI with the whole one. Ick.

* In some layer above PUT: create an upload handle of some sort, then do PUTs to that handle, then close it, similar to the xmlrpc-based webfront API we use on MV right now. (A rough sketch follows this list.)
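To make that third option concrete, here is a rough client-side sketch of a handle-based upload. None of these URLs or query arguments exist in the webapi; they are placeholders for "open a handle, push chunks, close it", and the requests library is used only for brevity:

```python
import requests  # third-party HTTP client, used here only for brevity

def resumable_put(gateway, path, chunk_size=1024 * 1024):
    # 1. open an upload handle (hypothetical endpoint)
    handle = requests.post(f"{gateway}/uri?t=start-upload").text.strip()

    # 2. ask the handle how much it already holds, so a retry after a dropped
    #    connection continues where the last attempt stopped (hypothetical)
    offset = int(requests.get(f"{gateway}/upload-handle/{handle}?t=size").text)

    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            requests.put(f"{gateway}/upload-handle/{handle}?offset={offset}",
                         data=chunk)
            offset += len(chunk)

    # 3. close the handle; the node finishes the upload and returns the URI
    return requests.post(f"{gateway}/upload-handle/{handle}?t=close").text
```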
For download, things are a bit easier, since we can basically do
random-access reads from CHK files, and an HTTP GET can carry a
Range header that tells us which part of the file the client wants to read. We just have to implement support for that.
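For illustration, a partial read through the webapi would look roughly like this, assuming the gateway honors the standard Range request header (the gateway URL and file cap below are placeholders):

```python
import urllib.request

def read_range(gateway, filecap, start, end):
    # Request bytes [start, end] of the file; a server that supports ranges
    # answers "206 Partial Content" with a Content-Range response header.
    req = urllib.request.Request(
        f"{gateway}/uri/{filecap}",
        headers={"Range": f"bytes={start}-{end}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# e.g. read_range("http://127.0.0.1:3456", "URI:CHK:...", 0, 65535)
```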
I'm probably leaning towards the third option (something above PUT), but it
depends a lot upon what sort of deployment options we're looking at and which
clients are stuck behind the flapping network link.
I believe (correct me if I'm wrong) the current thinking is that this feature
will be provided through the Offloaded Uploader (#116), operating in a
spool-to-disk-before-encode mode.
The idea is that the client (who has a full copy of the file and has done one
read pass to compute the encryption key and storage index) sends the SI to
the helper, which checks the appropriate storage servers and either says
"it's there, don't send me anything", "it isn't there, send me all your
crypttext", or "some of it is here on my local disk, send me the rest of the
crypttext". In the latter case, the helper requests the byte-range that it
still needs, repeating as necessary until it has the whole (encrypted) file
on the helper's disk. Then the helper encodes and pushes the shares. We
assume that the helper is running in a well-managed environment and neither
gets shut down frequently nor does it lose network connectivity to the
storage servers frequently. The helper is also much closer to the storage
servers, network-wise, so it is ok if an upload must be restarted as long as
the file doesn't have to be transferred over the home user's (slow) DSL line
multiple times.
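In code form, that negotiation amounts to something like the following. The helper method names are invented for illustration; they are not the real remote interface:

```python
CHUNK = 64 * 1024  # arbitrary chunk size for this sketch

def push_ciphertext(helper, storage_index, ciphertext, total_size):
    # Ask the helper what it knows about this storage index (hypothetical call).
    status = helper.query(storage_index)
    if status == "already-in-grid":
        return  # "it's there, don't send me anything"

    # "some of it is here on my local disk, send me the rest": the helper
    # reports how many bytes of ciphertext it already holds (possibly zero).
    offset = helper.ciphertext_bytes_held(storage_index)
    ciphertext.seek(offset)
    while offset < total_size:
        data = ciphertext.read(min(CHUNK, total_size - offset))
        helper.write_ciphertext(storage_index, offset, data)
        offset += len(data)

    # The helper now has the whole encrypted file and can encode + push shares.
    helper.upload_complete(storage_index)
```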
This provides the resume-interrupted-upload behavior for home users who
are running their own node (when using the Offloaded Uploader helper). This
does not help users who are running a plain web browser (and thus uploading
files with HTTP POSTs to an external web server); to help a web browser,
we'd need an ActiveX control or perhaps Flash or something. It also
doesn't help friendnet installations that do not have a helper node running
closer to the storage servers than the client. This seems like an acceptable
tradeoff.
As I read the milestones, this belongs in 0.8.0.
Ok, this is now complete in the CHK upload helper. Clients which use the
helper will send their ciphertext to the helper, where it gets stored in a
holding directory (BASEDIR/helper/CHK_incoming/) until it is complete. If the
client is lost, the partial data is retained for later resumption. When the
incoming data is complete, it is moved to a different directory
(CHK_encoding/) and then the encode+push process begins.
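The resumption bookkeeping behind this is essentially "how big is the partial file in CHK_incoming/". A simplified sketch (the directory names come from the description above; the functions themselves are illustrative, not the actual helper code):

```python
import os

def resume_offset(basedir, storage_index):
    """Bytes of ciphertext already received for this upload (0 if none)."""
    incoming = os.path.join(basedir, "helper", "CHK_incoming", storage_index)
    return os.path.getsize(incoming) if os.path.exists(incoming) else 0

def ciphertext_complete(basedir, storage_index):
    """Move finished ciphertext into the encoding queue."""
    incoming = os.path.join(basedir, "helper", "CHK_incoming", storage_index)
    encoding = os.path.join(basedir, "helper", "CHK_encoding", storage_index)
    os.rename(incoming, encoding)  # encode+push starts from the new location
```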
The #116 helper is not complete (it still does not have support for avoiding
uploads of files which are already present in the grid), but this portion of
it is, so I'm closing out this ticket.
I think we still need some sort of answer for incomplete downloads, so I'm
opening a new ticket for the download side (#288).