S3 backend: either handle or avoid truncated get_bucket responses #1678

Closed
opened 2012-02-16 18:04:45 +00:00 by davidsarah · 14 comments
davidsarah commented 2012-02-16 18:04:45 +00:00
Owner

The GET Bucket AWS call may return a truncated response, by default after 1000 objects [(doc)](http://docs.amazonwebservices.com/AmazonS3/latest/API/RESTBucketGET.html). Currently we don't take that into account (actually I forgot that we didn't :-( ), which might be causing some of the 410 Gone errors.

In the meantime, here is a patch to log this case as WEIRD, so that it will trigger an incident.
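
A minimal sketch of the "handle" option from the ticket title, not the code from any patch attached here: keep requesting pages and pass the last key of each truncated page as the `marker` for the next request. `Page`, `list_all_keys`, and `make_fake_bucket` are hypothetical names, and the fake bucket only stands in for the real GET Bucket call.

```python
from typing import Callable, List, NamedTuple, Optional


class Page(NamedTuple):
    """One GET Bucket response, reduced to what pagination needs (hypothetical type)."""
    keys: List[str]
    is_truncated: bool


def list_all_keys(get_page: Callable[[Optional[str]], Page]) -> List[str]:
    """Follow truncated listings until the service reports the final page."""
    keys: List[str] = []
    marker: Optional[str] = None
    while True:
        page = get_page(marker)
        keys.extend(page.keys)
        if not page.is_truncated:
            return keys
        if not page.keys:
            # Without a last key there is nothing to use as the next marker.
            raise RuntimeError("truncated listing returned no keys")
        # S3 lists keys in sorted order, starting after the marker.
        marker = page.keys[-1]


def make_fake_bucket(all_keys, max_keys=1000):
    """Stand-in for the real S3 call: returns at most max_keys keys per request."""
    ordered = sorted(all_keys)

    def get_page(marker: Optional[str]) -> Page:
        remaining = [k for k in ordered if marker is None or k > marker]
        return Page(remaining[:max_keys], len(remaining) > max_keys)

    return get_page
```

With a fake bucket of 2500 keys, `list_all_keys` makes three requests and returns every key, whereas a single request would silently return only the first 1000.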
tahoe-lafs added the
code-storage
major
defect
1.9.0-s3branch
labels 2012-02-16 18:04:45 +00:00
tahoe-lafs added this to the soon milestone 2012-02-16 18:04:45 +00:00
davidsarah commented 2012-02-16 18:15:24 +00:00
Author
Owner

Attachment 1678-incident-on-truncate.darcs.patch (73295 bytes) added

S3 backend: make truncated GET Bucket responses trigger an incident. Does not include tests. refs #1678 [depends on the patch for #1589 due to an import in s3_common.py needed by both.]
davidsarah commented 2012-02-16 18:17:52 +00:00
Author
Owner

For some reason that patchfile doesn't include the change I recorded. Maybe a side-effect of using --ask-deps. Will fix.

davidsarah commented 2012-02-16 18:49:39 +00:00
Author
Owner

Attachment 1678-incident-on-truncate-v2.darcs.patch (49975 bytes) added

S3 backend: make truncated GET Bucket responses trigger an incident. Does not include tests. refs #1678 [depends on the patch for #1589 due to an import in s3_common.py needed by both.]
davidsarah commented 2012-02-16 18:51:34 +00:00
Author
Owner

1678-incident-on-truncate-v2.darcs.patch (recorded without --ask-deps) seems to include the change. Odd.


I reviewed 1678-incident-on-truncate-v2.darcs.patch. I saw no errors, but as David-Sarah mentioned, it needs a test.
davidsarah commented 2012-02-20 18:03:49 +00:00
Author
Owner

Attachment 1678-incident-on-truncate-v3.darcs.patch (64604 bytes) added

S3 backend: make truncated GET Bucket responses trigger an incident. Includes tests and patches for #1589.


Okay, I reviewed the added tests in 1678-incident-on-truncate-v3.darcs.patch and saw no problem!

davidsarah commented 2012-03-05 20:24:22 +00:00
Author
Owner

On secorp's LAE storage server, a 500 error (#1590) occurred after a sequence of truncated responses. With a bit of luck, the 500 might be a side effect of the truncated responses so that fixing the latter will also fix #1590.

tahoe-lafs added
critical
and removed
major
labels 2012-03-05 20:24:22 +00:00
davidsarah commented 2012-03-09 00:15:56 +00:00
Author
Owner

Attachment s3-implement-prefix-queries.darcs.patch (142174 bytes) added

Implementation of prefix queries, for information only (doesn't fix the problem yet). Depends on txaws 0.2.1.post4, diff from 0.2.1 at https://leastauthority.com/static/patches/txAWS-0.2.1-to-post4.diff

davidsarah commented 2012-03-09 00:27:08 +00:00
Author
Owner

I implemented prefix queries (so we no longer list *all* objects in the bucket and filter them, which is something that needed to be fixed anyway). That change seems to be working, and has given a measurable performance improvement of ~0.44 seconds per DYHB on secorp's server, but it didn't stop the truncated responses as I expected. We no longer get lots of truncated responses with at or near 1000 objects, but we do still get occasional truncated responses with 0 or 1 objects. This makes no sense and is contrary to the S3 API documentation. Frustrating.
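
To illustrate why a prefix query is much less likely to hit the 1000-object limit than listing the whole bucket, here is a small simulation. It is only a sketch: `list_with_prefix` and the `shares/NN/` key layout are hypothetical and are not taken from the patch or from the backend's actual key scheme.

```python
from bisect import bisect_left
from typing import List, Tuple

MAX_KEYS = 1000  # S3's default page size for GET Bucket


def list_with_prefix(sorted_keys: List[str], prefix: str,
                     max_keys: int = MAX_KEYS) -> Tuple[List[str], bool]:
    """Simulate a server-side prefix listing over an already sorted key list.

    Returns (matching keys, truncated flag), capped at max_keys like S3.
    """
    # Keys sharing a prefix are contiguous in sorted order, starting here.
    start = bisect_left(sorted_keys, prefix)
    matches: List[str] = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        if len(matches) == max_keys:
            # More matching keys exist than fit in one response.
            return matches, True
        matches.append(key)
    return matches, False
```

Listing everything (an empty prefix) in a bucket of 5000 objects comes back truncated at 1000 keys, while a listing restricted to one hypothetical `shares/07/` prefix returns only that prefix's objects in a single untruncated response.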
davidsarah commented 2012-03-09 05:37:54 +00:00
Author
Owner

Actually the patch is basically correct. It appeared not to be because we were incorrectly reporting all queries as truncated. (I thought that `BucketListing.is_truncated` was a boolean rather than a string, and the string `"false"` is truthy. Down with implicit conversions!)

However, it turns out that truncated queries are *not* the cause of #1590 :-(

I'll post an updated patch for this ticket, fixing the incorrect detection of truncated queries, tomorrow.
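
The pitfall described above can be reproduced in plain Python: any non-empty string is truthy, so `"false"` passes an `if` test. The helper below is a minimal sketch of an explicit conversion; `parse_is_truncated` is a hypothetical name and is not the fix that was committed for this ticket.

```python
def parse_is_truncated(value) -> bool:
    """Convert an IsTruncated value ("true"/"false" text or a real bool) to bool.

    Raises ValueError for anything else rather than guessing.
    """
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError("unexpected IsTruncated value: %r" % (value,))


# The bug: a non-empty string is always truthy, whatever it says.
assert bool("false") is True
# The explicit conversion gives the intended answer.
assert parse_is_truncated("false") is False
```

Rejecting unknown values instead of defaulting to either answer means a malformed response surfaces as an error rather than as a silently wrong truncation report.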
davidsarah commented 2012-05-17 00:24:03 +00:00
Author
Owner

Fixed mainly in [5634/ticket999-S3-backend].

tahoe-lafs added the
fixed
label 2012-05-17 00:24:03 +00:00
tahoe-lafs modified the milestone from soon to 1.12.0 2014-11-27 04:08:46 +00:00

Milestone renamed

warner modified the milestone from 1.12.0 to 1.13.0 2016-03-22 05:02:25 +00:00

renaming milestone

warner modified the milestone from 1.13.0 to 1.14.0 2016-06-28 18:17:14 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1678