make SFTP frontend handle updates to MDMFs without downloading and uploading the entire file #1496
It appears that the current version of the #393 branch, in the SFTPD frontend, [downloads the entire MDMF file and then uploads the entire new version of it]source:ticket393-MDMF-2/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5151#L815, even if the SFTP client has overwritten only a portion of it. This isn't a regression: Tahoe-LAFS v1.8 didn't have MDMFs at all, but did [the same download-entire-file-and-upload-entire-new-version]source:trunk/src/allmydata/frontends/sftpd.py?annotate=blame&rev=5127#L828 in order to let an SFTP client appear to "overwrite" a portion of an immutable file.
However, I think this should probably be considered a blocker for 1.9 final. Users could legitimately expect that the performance benefits of MDMFs -- namely spending only approximately A network usage to overwrite A bytes out of a B-byte MDMF -- will apply when they edit the file through SFTPD as well as when they [edit it through the WAPI]source:ticket393-MDMF-2/docs/frontends/webapi.rst?rev=5138#writing-uploading-a-file.
We should update [performance.rst]source:ticket393-MDMF-2/docs/performance.rst to state what the performance of MDMFs is in addition to the performance of SDMFs and immutables. If we were going to ship Tahoe-LAFS v1.9 with the current behavior (which seems like a bad idea to me at the moment), then we would need to add another section for MDMF as edited through SFTPD in addition to MDMF as edited through the WAPI.
Replying to zooko:
There are two applicable optimizations.
a) for immutable and MDMF files: download segments out-of-order, i.e. if the client tries to read from a segment beyond the last downloaded segment so far, schedule that segment to be downloaded next.
b) for MDMF files: when the SFTP file handle is closed, overwrite only segments that have changed.
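As a rough illustration of a) and b), here is a minimal sketch of the segment arithmetic involved. The function names and the `segment_size` parameter are hypothetical and are not taken from sftpd.py or the MDMF code; the sketch only assumes that a file is divided into fixed-size segments.

```python
def segments_for_range(offset, length, segment_size):
    """Return the indices of the segments touched by [offset, offset+length)."""
    if length <= 0:
        return range(0)
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return range(first, last + 1)


def next_segment_to_fetch(read_offset, downloaded, num_segments, segment_size):
    """Optimization a): prefer the segment the client is currently waiting on.

    `downloaded` is a set of segment indices already fetched (hypothetical
    bookkeeping). Falls back to the lowest-numbered missing segment.
    """
    wanted = read_offset // segment_size
    if wanted < num_segments and wanted not in downloaded:
        return wanted
    for seg in range(num_segments):
        if seg not in downloaded:
            return seg
    return None  # everything has been downloaded


def dirty_segments(writes, segment_size):
    """Optimization b): segments that would need rewriting on close.

    `writes` is an iterable of (offset, length) pairs recorded while the
    handle was open.
    """
    dirty = set()
    for offset, length in writes:
        dirty.update(segments_for_range(offset, length, segment_size))
    return sorted(dirty)
```

With a segment size of 128 bytes, a 20-byte write at offset 120 straddles the boundary between segments 0 and 1, so both would be marked dirty even though only 20 bytes changed.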
I think you're talking about b). An [OverwriteableFileConsumer]source:src/allmydata/frontends/sftpd.py@5179#L294 instance already keeps track of regions that have been overwritten, but it currently discards information about regions that have also been fully downloaded, and it's slightly inconvenient to change that (because we use a heap to provide efficient access to the first remaining region that has not yet been downloaded). It's feasible to implement b) within the 1.9 schedule, but it does require some non-trivial code changes, so we'd probably want to do it before the beta.
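To make the bookkeeping problem concrete, here is a sketch of one way to keep the heap for "first region not yet downloaded" while also retaining every overwritten region for the final upload. `RegionTracker` and its methods are hypothetical names for illustration; this is not the existing OverwriteableFileConsumer code, only an outline of the idea under the assumption that regions are half-open `(start, end)` byte ranges.

```python
import heapq


class RegionTracker:
    """Hypothetical helper: tracks overwritten regions in two ways."""

    def __init__(self):
        self._pending = []  # heap of (start, end), ordered by start
        self._all = []      # every overwritten region, never discarded

    def record_overwrite(self, start, end):
        if end <= start:
            return
        heapq.heappush(self._pending, (start, end))
        self._all.append((start, end))

    def download_reached(self, offset):
        """Pop regions at the front of the heap that end at or before offset."""
        while self._pending and self._pending[0][1] <= offset:
            heapq.heappop(self._pending)

    def first_pending(self):
        """Lowest-starting region still on the heap, or None."""
        return self._pending[0] if self._pending else None

    def changed_regions(self):
        """All overwritten regions, merged where they overlap or touch."""
        merged = []
        for start, end in sorted(self._all):
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged
```

The cost of this approach is the second list, which grows with the number of writes; merging on insertion instead of on close would bound it, at the price of more code in an already complicated class.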
I don't think this should be considered a blocker, though. Remember that the SFTP frontend never creates mutable files, even though it can read and write existing ones. So someone using SFTP as their main interface would rarely, if ever, be affected by the performance of MDMF as seen through SFTP.
Also, OverwriteableFileConsumer already has a fairly complicated implementation. I had planned to improve its test coverage before making any further optimizations. Currently it is not as well-tested as the rest of sftpd.py, partly because its behaviour depends nondeterministically on the timing of the download relative to the timing of requests from the SFTP client, which is more difficult to test (although it's possible to make the test deterministic by mocking the downloader).
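For what it's worth, the "mock the downloader" approach could look roughly like the sketch below. `ScriptedDownloader` and `TinyConsumer` are hypothetical stand-ins written for this example (the real consumer is considerably more involved); the point is only that the test, not the network, decides when each chunk arrives, so every interleaving can be checked deterministically.

```python
class ScriptedDownloader:
    """Hypothetical mock: delivers chunks only when the test asks for one."""

    def __init__(self, data, chunk_size):
        self._chunks = [data[i:i + chunk_size]
                        for i in range(0, len(data), chunk_size)]
        self._next = 0

    def deliver_next(self, consumer):
        if self._next >= len(self._chunks):
            return False
        consumer.write(self._chunks[self._next])
        self._next += 1
        return True


class TinyConsumer:
    """Simplified stand-in for a consumer whose overwrites win over downloads."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.downloaded = 0
        self.overwritten = []  # list of (start, end) byte ranges

    def overwrite(self, offset, data):
        self.buf[offset:offset + len(data)] = data
        self.overwritten.append((offset, offset + len(data)))

    def write(self, chunk):
        # Downloaded bytes must not clobber bytes the client already overwrote.
        for i, byte in enumerate(chunk):
            pos = self.downloaded + i
            if not any(s <= pos < e for s, e in self.overwritten):
                self.buf[pos] = byte
        self.downloaded += len(chunk)


def run_interleaving(overwrite_before_chunk):
    """Apply one client overwrite just before the given download step."""
    original = b"abcdefgh"
    consumer = TinyConsumer(len(original))
    downloader = ScriptedDownloader(original, chunk_size=4)
    for step in range(3):  # two chunks, plus one step after the download ends
        if step == overwrite_before_chunk:
            consumer.overwrite(2, b"XYZW")
        downloader.deliver_next(consumer)
    return bytes(consumer.buf)
```

A test would then assert that `run_interleaving(k)` gives the same final contents for every `k`, i.e. whether the overwrite lands before, during, or after the download.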
If we don't implement this optimization for 1.9, we would just need to add a note that the SFTP frontend does not have any MDMF-specific optimizations, so its performance for MDMF is the same as for SDMF.
Replying to [davidsarah]comment:2:
As explained in /tahoe-lafs/trac-2024-07-25/issues/5455#comment:38 and /tahoe-lafs/trac-2024-07-25/issues/5455#comment:135, the MDMF uploader always uploads whole shares, even if its client tells it the regions that have changed. So making the SFTP frontend tell it which regions have changed should not be a blocker for 1.9.
(I'm not sure I agree with the reasoning in /tahoe-lafs/trac-2024-07-25/issues/5455#comment:38, but that's a separate issue.)
Actually, the memory usage for downloads should be better than for SDMF.
Replying to [davidsarah]comment:3:
I was mistaken about that; Kevan clarified in /tahoe-lafs/trac-2024-07-25/issues/5455#comment:152 :
I still think this should not be a blocker for 1.9, though, since it's too big a change.