cloud backend: redundant reads of chunks from cloud when downloading large files #1885

Closed
opened 2012-12-05 03:32:23 +00:00 by davidsarah · 8 comments
davidsarah commented 2012-12-05 03:32:23 +00:00
Owner

I uploaded a 7.7 MiB video as an MDMF file using the cloud backend on S3 (as of [1819-cloud-merge/022796fb](https://github.com/davidsarah/tahoe-lafs/commit/022796fb7813c2f42d668a0ee3de9abae869deb5)), and then downloaded it. From `flogtool tail`ing the storage server, I saw that it was reading the same chunks multiple times during the download. That suggests that the chunk cache is not operating well enough.

The file was being downloaded by playing it as a video in Chromium; I don't think that makes a difference.

Update: this also applies to immutable files if they are large enough.
tahoe-lafs added the code-storage, normal, defect, 1.9.2 labels 2012-12-05 03:32:23 +00:00
tahoe-lafs added this to the 1.11.0 milestone 2012-12-05 03:32:23 +00:00
davidsarah commented 2012-12-05 03:37:28 +00:00
Author
Owner

During the upload and download, the server memory usage didn't go above 50 MiB according to the statmover graph.

davidsarah commented 2012-12-05 03:43:09 +00:00
Author
Owner

Same behaviour for a straight download, rather than playing a video. Each chunk seems to get read 5 times, and the first chunk (containing the header) many more times.

tahoe-lafs changed title from cloud backend: redundant reads of chunks from S3 when downloading large MDMF file to cloud backend: redundant reads of chunks from cloud when downloading large MDMF file 2013-05-24 22:12:10 +00:00
daira commented 2013-05-28 16:01:45 +00:00
Author
Owner

I changed `ChunkCache` to use a true LRU replacement policy, and that seems to have fixed this problem. (LRU is not often used because keeping track of ages can be inefficient for a large cache, but here we only need a cache of a few elements. In practice 5 chunks seems to be sufficient for the sizes of files I've tested; will investigate whether it's enough for larger files later.)
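
For readers unfamiliar with the policy, here is a minimal sketch of a small LRU chunk cache. It is not the `ChunkCache` code from the cloud backend branch: the class name `LRUChunkCache`, the `fetch_chunk` callable, the `max_chunks` parameter and the `backend_reads` counter are all assumptions made for illustration. Only the capacity of 5 comes from the comment above.

```python
from collections import OrderedDict


class LRUChunkCache:
    """Illustrative LRU cache keyed by chunk number (hypothetical, not Tahoe-LAFS code)."""

    def __init__(self, fetch_chunk, max_chunks=5):
        self._fetch_chunk = fetch_chunk  # assumed callable: chunknum -> bytes
        self._max_chunks = max_chunks
        self._chunks = OrderedDict()     # ordered from least to most recently used
        self.backend_reads = 0           # counts reads that actually hit the backend

    def get(self, chunknum):
        if chunknum in self._chunks:
            # Cache hit: mark this chunk as the most recently used.
            self._chunks.move_to_end(chunknum)
            return self._chunks[chunknum]
        # Cache miss: read from the backend and remember the result.
        data = self._fetch_chunk(chunknum)
        self.backend_reads += 1
        self._chunks[chunknum] = data
        if len(self._chunks) > self._max_chunks:
            # Evict the least recently used chunk.
            self._chunks.popitem(last=False)
        return data


if __name__ == "__main__":
    cache = LRUChunkCache(lambda n: b"chunk-%d" % n, max_chunks=5)
    # Chunk 0 (the header chunk in the report above) is requested repeatedly.
    for chunknum in [0, 1, 0, 2, 0, 1, 3, 0]:
        cache.get(chunknum)
    print(cache.backend_reads)  # 4: one backend read per distinct chunk
```

With only a handful of entries, the ordering bookkeeping is cheap, which matches the reasoning given for choosing true LRU here.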
tahoe-lafs changed title from cloud backend: redundant reads of chunks from cloud when downloading large MDMF file to cloud backend: redundant reads of chunks from cloud when downloading large files 2013-05-28 16:01:45 +00:00
daira commented 2013-05-28 16:17:19 +00:00
Author
Owner

Hmm, that's an improvement, but the immutable downloader is not able to max out my downstream bandwidth -- each HTTP request is finishing before the next can be started, so we're not getting any pipelining. (I am getting ~ 1 MiB/s and should be getting ~ 1.8 MiB/s.)

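
The difference between waiting for each request to finish and keeping several in flight can be sketched as follows. This is only an illustration: Tahoe-LAFS is built on Twisted, whereas this sketch uses `asyncio`, and `fetch_chunk`, its simulated `latency`, and `max_in_flight` are hypothetical names standing in for real HTTP requests to the cloud store.

```python
import asyncio


async def fetch_chunk(chunknum, latency=0.05):
    # Hypothetical stand-in for one HTTP GET; the sleep simulates the round trip.
    await asyncio.sleep(latency)
    return b"chunk-%d" % chunknum


async def fetch_sequential(chunknums):
    # Each request completes before the next one starts (the behaviour described above).
    return [await fetch_chunk(n) for n in chunknums]


async def fetch_pipelined(chunknums, max_in_flight=4):
    # Keep up to max_in_flight requests outstanding at once.
    limit = asyncio.Semaphore(max_in_flight)

    async def fetch_one(n):
        async with limit:
            return await fetch_chunk(n)

    # gather() returns results in the order the requests were given.
    return await asyncio.gather(*(fetch_one(n) for n in chunknums))


if __name__ == "__main__":
    chunks = asyncio.run(fetch_pipelined(range(8)))
    print(len(chunks))
```

With overlapping requests, the total time is bounded by bandwidth and the in-flight limit rather than by the sum of per-request latencies.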
tahoe-lafs modified the milestone from 1.11.0 to 1.12.0 2013-07-22 20:48:41 +00:00

Milestone renamed

warner modified the milestone from 1.12.0 to 1.13.0 2016-03-22 05:02:25 +00:00

renaming milestone

warner modified the milestone from 1.13.0 to 1.14.0 2016-06-28 18:17:14 +00:00

Moving open issues out of closed milestones.

exarkun modified the milestone from 1.14.0 to 1.15.0 2020-06-30 14:45:13 +00:00

The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.

If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).

If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.

Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.

exarkun added the wontfix label 2020-10-30 12:35:44 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1885