cloud backend uses lots of expensive LIST requests #2346

Closed
opened 2014-12-03 04:07:00 +00:00 by cloud_trouble · 4 comments
cloud_trouble commented 2014-12-03 04:07:00 +00:00
Owner

The cloud backend makes many expensive LIST requests against an Amazon S3 bucket because it relies heavily on GET Bucket. A GET Bucket request is billed as a LIST request, which is 10 times more expensive than a GET Object request.

These LIST requests can be a large portion of the cost of running a storage node with an S3 backend. For example, my logs show 1.5 times as many GET Bucket requests as GET Object requests (with two storage nodes, one S3 bucket and one desktop computer), and the cost of those requests exceeds the storage, transfer, and EC2 costs.
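To put rough numbers on that, here is a small sketch of the arithmetic. It uses only the two figures above (LIST costs 10 times a GET Object request, and there are 1.5 GET Bucket requests per GET Object request). It deliberately ignores every other request type, and the function and constant names are made up for illustration.

```python
# Relative prices only: one GET Object request is the unit of cost, and a
# LIST (GET Bucket) request costs 10 units, as stated in the report above.
GET_UNIT_COST = 1.0
LIST_UNIT_COST = 10.0 * GET_UNIT_COST


def list_share_of_request_cost(list_per_get: float) -> float:
    """Fraction of the GET + LIST request bill that is due to LIST requests.

    list_per_get is the number of LIST requests made per GET Object request.
    Other request types (PUT, DELETE, ...) are ignored in this sketch.
    """
    list_cost = list_per_get * LIST_UNIT_COST
    get_cost = GET_UNIT_COST
    return list_cost / (list_cost + get_cost)


# 1.5 LIST requests per GET request, as seen in the logs.
share = list_share_of_request_cost(1.5)
print(f"LIST share of GET+LIST request cost: {share:.1%}")
```

With these inputs the LIST requests cost 15 units for every 1 unit of GET Object requests, so they account for 15/16 of the combined GET and LIST request bill.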

Here is some relevant code:
https://github.com/LeastAuthority/tahoe-lafs/blob/cloud-rebased/src/allmydata/storage/backends/cloud/cloud_common.py#L426

And relevant chat on IRC:

<daira1> the list of shares is stored in a local database called the leasedb. that was added recently on the cloud branch, so I suspect we're not making optimal use of it yet
<daira1> ISTR that zooko was arguing for treating the leasedb as authoritative as to whether a share exists, and I was arguing against for a reason that I can't remember right now. there's a ticket about it
<zooko> Yes, the arguments about the trade-offs of treating leasedb as authoritative vs. advisory are encoded into tickets.
<zooko> I seem to recall that treating leasedb as authoritative gets nice performance, including for this particular aspect, while trading off some other values.
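
To illustrate the authoritative vs. advisory distinction from the chat, here is a minimal, self-contained sketch. It is not the leasedb code from the cloud branch: the `shares` table layout, the `CountingStore` stand-in for the S3 bucket, and the `ShareIndex` class are all hypothetical, and the "advisory" mode is simplified to always asking the bucket.

```python
import sqlite3


class CountingStore:
    """Hypothetical stand-in for the cloud container; counts simulated LIST requests."""

    def __init__(self):
        self.keys = set()
        self.list_calls = 0

    def list_shares(self, storage_index):
        # Each call models one billed LIST (GET Bucket) request.
        self.list_calls += 1
        prefix = storage_index + "/"
        return sorted(int(k[len(prefix):]) for k in self.keys if k.startswith(prefix))


class ShareIndex:
    """Answers "which shares exist?" from a local database or from the bucket."""

    def __init__(self, store, authoritative):
        self.store = store
        self.authoritative = authoritative
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE shares ("
            "storage_index TEXT, shnum INTEGER, "
            "PRIMARY KEY (storage_index, shnum))"
        )

    def add_share(self, storage_index, shnum):
        # Record the share locally and write it to the (simulated) bucket.
        self.db.execute(
            "INSERT OR IGNORE INTO shares VALUES (?, ?)", (storage_index, shnum)
        )
        self.store.keys.add(f"{storage_index}/{shnum}")

    def get_shnums(self, storage_index):
        if not self.authoritative:
            # Simplified advisory mode: the bucket decides, at the price of a LIST.
            return self.store.list_shares(storage_index)
        # Authoritative mode: trust the local database, no LIST request.
        rows = self.db.execute(
            "SELECT shnum FROM shares WHERE storage_index = ? ORDER BY shnum",
            (storage_index,),
        )
        return [row[0] for row in rows]


store = CountingStore()
index = ShareIndex(store, authoritative=True)
index.add_share("si1", 0)
index.add_share("si1", 3)
shnums = index.get_shnums("si1")
calls_after_authoritative = store.list_calls

advisory = ShareIndex(store, authoritative=False)
advisory_shnums = advisory.get_shnums("si1")
calls_after_advisory = store.list_calls
```

In this sketch the authoritative lookup issues no LIST requests, while each advisory lookup issues one. One plausible cost of the authoritative mode, which is an assumption here and not something the chat spells out, is that a share present in the bucket but missing from the local database would not be found.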
tahoe-lafs added the code-storage, normal, defect, cloud-branch labels 2014-12-03 04:07:00 +00:00
tahoe-lafs added this to the undecided milestone 2014-12-03 04:07:00 +00:00
tahoe-lafs modified the milestone from undecided to 1.12.0 2014-12-03 23:00:38 +00:00
tahoe-lafs changed title from cloud backend uses losts of expensive LIST requests to cloud backend uses lots of expensive LIST requests 2014-12-03 23:00:38 +00:00

Milestone renamed

warner modified the milestone from 1.12.0 to 1.13.0 2016-03-22 05:02:25 +00:00

renaming milestone

warner modified the milestone from 1.13.0 to 1.14.0 2016-06-28 18:17:14 +00:00

Moving open issues out of closed milestones.

exarkun modified the milestone from 1.14.0 to 1.15.0 2020-06-30 14:45:13 +00:00

The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.

If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).

If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.

Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.

exarkun added the wontfix label 2020-10-30 12:35:44 +00:00
Reference: tahoe-lafs/trac-2024-07-25#2346