consider share-at-a-time uploader #1340
Reference: tahoe-lafs/trac-2024-07-25#1340
As discussed in this thread:
http://tahoe-lafs.org/pipermail/tahoe-dev/2011-January/005963.html
(in response to discussion about UIs for #467 static-server-selection),
Chris Palmer described Octavia's is-it-uploaded-yet UI. I liked it, and
I'd like to explore how we might achieve something similar in Tahoe.
This ticket is to make sure we don't forget the idea.
The basic goal is to give each upload a sort of "red/yellow/green-light"
status indicator, showing how healthy/robust the file is. When the
upload first starts, it stays in the "red" state until enough shares
have been pushed to recover the file even if it were deleted from the
original computer; it then transitions into the "yellow" state. In that
state, more shares are uploaded until we've achieved the desired
diversity, at which point it goes "green".
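A minimal sketch of how such a status light might be computed, assuming we track 'k', the number of distinct servers currently holding shares, and some target diversity. The function and parameter names are invented for illustration; none of this is Tahoe code.

```python
# Hypothetical sketch (not Tahoe-LAFS code): map upload progress to a
# red/yellow/green status light. Assumes we know 'k' (shares needed to
# recover the file), how many distinct servers hold shares so far, and the
# diversity we are aiming for (e.g. "shares on at least 7 servers").

def upload_status(k: int, servers_with_shares: int, desired_servers: int) -> str:
    """Return 'red', 'yellow', or 'green' for a single upload."""
    if servers_with_shares < k:
        return "red"      # not yet recoverable without the original copy
    if servers_with_shares < desired_servers:
        return "yellow"   # recoverable, but not yet diverse enough
    return "green"        # desired share distribution achieved

# Example: 3-of-something encoding, aiming for shares on 7 distinct servers.
assert upload_status(k=3, servers_with_shares=2, desired_servers=7) == "red"
assert upload_status(k=3, servers_with_shares=4, desired_servers=7) == "yellow"
assert upload_status(k=3, servers_with_shares=7, desired_servers=7) == "green"
```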
The key differences from our current uploader (a rough sketch of the
resulting pass loop follows this list):
* a new value "Nup" (closer to our old N value) which indicates how many
  shares we want to upload. The idea is that we could create more shares
  if we wanted to, without changing the encoding parameters or the
  filecaps.
* store the file locally and make multiple passes over it, instead of
  the almost-streaming approach we have now
* the first pass only uploads 'k' shares (discarding the rest)
* subsequent passes encode the file again, but only upload some other
  subset of the shares, maybe 'k' at a time, or maybe more
* keep going until we've achieved the desired diversity or placed Nup
  shares
* if the node is shut down, resume generating and uploading shares when
  it comes back
* the status display shows the red/yellow/green state of all current
  uploads. This lets users know when it's safe to close their laptops.
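Here is a minimal, runnable sketch of that multi-pass loop, under stated assumptions: encode_all() is a toy stand-in for zfec (it produces all Nup "shares" and the loop discards the ones not needed this pass, matching the create-and-discard limitation discussed below), and push() is supplied by the caller in place of a real storage-server connection. None of these names are Tahoe APIs.

```python
# Minimal runnable sketch of the multi-pass, share-at-a-time loop (not Tahoe
# code). encode_all() is a toy placeholder for the real encoder; it produces
# all n_up shares, and we discard the ones we do not need on this pass.

def encode_all(data: bytes, k: int, n_up: int) -> list[bytes]:
    # Placeholder "encoding": each share is just a copy of the data tagged
    # with its share number -- only enough structure to drive the loop.
    return [bytes([i]) + data for i in range(n_up)]

def upload_in_passes(data: bytes, k: int, n_up: int, push, batch_size=None):
    """Push shares a batch at a time; push(sharenum, share) is caller-supplied
    (in real life it would talk to a storage server)."""
    batch_size = batch_size or k
    placed = set()
    remaining = list(range(n_up))
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        shares = encode_all(data, k, n_up)   # re-encode on every pass
        for sharenum in batch:
            push(sharenum, shares[sharenum])
            placed.add(sharenum)             # a real uploader would persist
                                             # this so a restart can resume
    return placed

# Example: a 3-of-10 upload pushed 3 shares at a time.
sent = []
upload_in_passes(b"some file contents", k=3, n_up=10,
                 push=lambda num, share: sent.append(num))
assert sent[:3] == [0, 1, 2] and len(sent) == 10
```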
The general notion is to sacrifice some efficiency to reduce the time
needed to get to a mostly-safe upload. The original uploader generates
and uploads all shares in parallel, which eliminates wasted CPU cycles
and almost doesn't need the disk (and #320 plus some new CHK format
could make it fully streaming). But if the process is interrupted before
the very last close() is sent to each share, the whole upload fails and
must be started again from scratch. This new uploader would need more
local disk storage (to handle multiple passes) and would waste some CPU
(encoding shares that are then discarded, unless they too were stored on
local disk, and we learned from Tahoe's predecessor that disks don't do
matrix transpositions well), but it would get the file to at least a
recoverable state in about the same time a normal non-encoded FTP upload
would have finished, and then improve after that.
(caching the hashtrees would save some of the hashing time on the second
pass, which may or may not be a win, since hashing is generally pretty
quick too)
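A quick back-of-the-envelope illustration of the bandwidth claim (the 100 MB and 3-of-10 figures are illustrative, not from the ticket): each share is filesize/k, so the first pass, which pushes only k shares, moves about one filesize of data, roughly what a plain FTP upload would move.

```python
# Illustrative arithmetic only (the numbers below are made up):
filesize_mb = 100
k, n_up = 3, 10
share_mb = filesize_mb / k            # ~33.3 MB per share
first_pass_mb = k * share_mb          # ~100 MB: file is recoverable after this
full_upload_mb = n_up * share_mb      # ~333 MB once all passes have finished
print(first_pass_mb, full_upload_mb)
```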
Some assumptions that must be tested before this scheme is at all
realistic:
* generating only some of the shares (rather than all of them) is cheap
* creating additional shares later (i.e. keeping the encoding space in
  reserve for the future) is cheap
At present, zfec doesn't have an API to create fewer than N shares: you
have to make all of them and then throw some away. It might be possible
to enhance zfec to allow this (and of course to do it faster than
create-and-discard: ideally, the CPU time would be proportional to the
number of shares we retain), but I haven't looked at the Reed-Solomon
encoding scheme enough to tell. If so, then the second and subsequent
passes could avoid the encoding waste (we'd still generate each share
twice: once on the first pass to build the hash trees, and a second time
on some subsequent pass when we actually push that share).
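To make the wished-for enhancement concrete, here is a toy illustration of why per-share cost could be proportional: in a matrix-based erasure code each output share is an independent combination of the k input blocks. Everything here is invented for illustration (toy_share, encode_some, the fake "matrix row"); it is not real Reed-Solomon and not zfec.

```python
# Toy illustration only: each "share" is computed independently from the k
# input blocks, so an API that accepts "which share numbers do you want"
# could in principle spend CPU proportional to the shares it returns.

def toy_share(blocks, sharenum):
    # A fake "row of the encoding matrix": byte-wise weighted sum mod 256.
    # Real zfec uses GF(2^8) arithmetic; this only shows the shape.
    return bytes(sum((sharenum + 1) ** j * b for j, b in enumerate(col)) % 256
                 for col in zip(*blocks))

def encode_some(blocks, wanted_share_nums):
    """Generate only the requested shares (the API this ticket wishes for)."""
    return {num: toy_share(blocks, num) for num in wanted_share_nums}

blocks = [b"aaaa", b"bbbb", b"cccc"]                      # k=3 input blocks
subset = encode_some(blocks, wanted_share_nums=[3, 7])    # 2 shares, not N
```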
Octavia uses pure replication (k=1), which removes the question of
encoding overhead, so it's a bit easier to use this scheme in Octavia
than in Tahoe.
The big big win of this approach is the UI. The k-of-N encoding
parameters don't matter quite so much when you can keep creating more
shares until you're happy with their distribution, so there might be
less need for users to choose k/N instead of sticking with the defaults.
And the "red/yellow/green" status light is a dead-simple UI indicator,
like the way Dropbox tells you when it's done moving data around.
The next step is probably to do some zfec performance tests, to find out
what setting N=25 (and then discarding the extra shares) would do to our
upload speed.
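A rough sketch of such a measurement, assuming zfec's Python Encoder(k, m) with an encode(blocks, share_nums) call as wrapped by Tahoe's codec module; treat the exact signature as an assumption to check against the installed zfec before relying on the numbers.

```python
# Rough benchmark sketch for "what does N=25 cost us?": encode all m shares
# (create-and-discard, as today) and time it for m=10 vs m=25.
import os
import time

import zfec

def encode_time(k, m, block_size=4 * 1024 * 1024, repeats=5):
    blocks = [os.urandom(block_size) for _ in range(k)]   # k input blocks
    enc = zfec.Encoder(k, m)
    start = time.perf_counter()
    for _ in range(repeats):
        enc.encode(blocks, list(range(m)))   # create all m shares
    return (time.perf_counter() - start) / repeats

for m in (10, 25):   # compare the current default N=10 against N=25
    print(f"k=3, N={m}: {encode_time(3, m):.3f}s per 12 MiB of input")
```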
I like it! See related tickets #678 (converge same file, same K, different M) and #711 (repair to different levels of M).