S3 backend: either handle or avoid truncated get_bucket responses #1678
Reference: tahoe-lafs/trac-2024-07-25#1678
The GET Bucket AWS call may return a truncated response, by default after 1000 objects (doc). Currently we don't take that into account (actually I forgot that we didn't :-( ), which might be causing some of the 410 Gone errors.
In the meantime, here is a patch to log this case as WEIRD, so that it will trigger an incident.
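For reference, handling truncation (rather than only logging it) usually means re-issuing the listing with a marker until the response is no longer truncated. The following is a minimal sketch of that loop, not the backend's actual code: `Page`, `FakeBucket`, `list_page` and `list_all_keys` are hypothetical names, and the in-memory bucket only imitates the documented behaviour of returning keys in sorted order, strictly after the marker, up to a page limit.

```python
from collections import namedtuple

# Hypothetical stand-in for one parsed GET Bucket response page.
Page = namedtuple("Page", ["keys", "is_truncated"])


class FakeBucket:
    """In-memory stand-in for S3, used only to exercise the loop below."""

    def __init__(self, keys, max_keys=1000):
        self._keys = sorted(keys)
        self._max_keys = max_keys

    def list_page(self, prefix="", marker=""):
        # Return keys strictly after `marker`, in sorted order, one page at a time.
        matching = [k for k in self._keys if k.startswith(prefix) and k > marker]
        page = matching[:self._max_keys]
        return Page(page, len(matching) > len(page))


def list_all_keys(bucket, prefix=""):
    """Follow truncated listings until the service reports completion."""
    keys = []
    marker = ""
    while True:
        page = bucket.list_page(prefix=prefix, marker=marker)
        keys.extend(page.keys)
        if not page.is_truncated:
            return keys
        if not page.keys:
            # A truncated page with no keys gives no marker to resume from.
            raise RuntimeError("truncated listing with no keys")
        marker = page.keys[-1]


bucket = FakeBucket(["shares/aa/%04d" % i for i in range(2500)])
all_keys = list_all_keys(bucket, prefix="shares/aa/")
```

With the default page size of 1000, the 2500 keys above are collected over three requests. The explicit error for an empty truncated page is a design choice for the sketch: without it the loop would repeat the same request forever.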
Attachment 1678-incident-on-truncate.darcs.patch (73295 bytes) added
S3 backend: make truncated GET Bucket responses trigger an incident. Does not include tests. refs #1678 [depends on the patch for #1589 due to an import in s3_common.py needed by both.]
For some reason that patchfile doesn't include the change I recorded. Maybe a side-effect of using --ask-deps. Will fix.
Attachment 1678-incident-on-truncate-v2.darcs.patch (49975 bytes) added
S3 backend: make truncated GET Bucket responses trigger an incident. Does not include tests. refs #1678 [depends on the patch for #1589 due to an import in s3_common.py needed by both.]
1678-incident-on-truncate-v2.darcs.patch (recorded without --ask-deps) seems to include the change. Odd.
I reviewed 1678-incident-on-truncate-v2.darcs.patch. I saw no errors, but as David-Sarah mentioned, it needs a test.
Attachment 1678-incident-on-truncate-v3.darcs.patch (64604 bytes) added
S3 backend: make truncated GET Bucket responses trigger an incident. Includes tests and patches for #1589.
Okay, I reviewed the added tests in 1678-incident-on-truncate-v3.darcs.patch and saw no problem!
On secorp's LAE storage server, a 500 error (#1590) occurred after a sequence of truncated responses. With a bit of luck, the 500 might be a side effect of the truncated responses so that fixing the latter will also fix #1590.
Attachment s3-implement-prefix-queries.darcs.patch (142174 bytes) added
Implementation of prefix queries, for information only (doesn't fix the problem yet). Depends on txaws 0.2.1.post4, diff from 0.2.1 at https://leastauthority.com/static/patches/txAWS-0.2.1-to-post4.diff
I implemented prefix queries, so we no longer list all objects in the bucket and then filter them (something that needed to be fixed anyway). That change seems to be working, and has given a measurable performance improvement of ~0.44 seconds per DHYB on secorp's server, but it didn't stop the truncated responses as I had expected. We no longer get lots of truncated responses with at or near 1000 objects, but we do still get occasional truncated responses with 0 or 1 objects. This makes no sense and is contrary to the S3 API documentation. Frustrating.
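To illustrate the difference between listing everything and filtering versus asking the service for a prefix, here is a small sketch. It is not taken from the patch: `CountingBucket`, `list_keys` and the `shares/<storage index>/<share number>` key layout are hypothetical, chosen only to show how many keys each approach has to transfer.

```python
class CountingBucket:
    """In-memory stand-in for a bucket that counts how many keys it returns."""

    def __init__(self, keys):
        self._keys = sorted(keys)
        self.keys_returned = 0

    def list_keys(self, prefix=""):
        # The prefix is applied on the "server" side, before anything is returned.
        result = [k for k in self._keys if k.startswith(prefix)]
        self.keys_returned += len(result)
        return result


def shares_by_filtering(bucket, prefix):
    """Old approach: list the whole bucket, then filter on the client."""
    return [k for k in bucket.list_keys() if k.startswith(prefix)]


def shares_by_prefix_query(bucket, prefix):
    """New approach: let the service return only the matching keys."""
    return bucket.list_keys(prefix=prefix)


# Hypothetical key layout: shares/<storage index>/<share number>
keys = ["shares/%s/%d" % (si, n) for si in ("aa", "bb", "cc") for n in range(10)]

filtering_bucket = CountingBucket(keys)
filtered = shares_by_filtering(filtering_bucket, "shares/bb/")

prefix_bucket = CountingBucket(keys)
queried = shares_by_prefix_query(prefix_bucket, "shares/bb/")
```

Both calls produce the same ten keys, but the filtering version transfers all thirty keys in the bucket while the prefix query transfers only the ten that match. With a real bucket this also means far fewer listings ever reach the page limit.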
Actually the patch is basically correct. It appeared not to be because we were incorrectly reporting all queries as truncated. (I thought that BucketListing.is_truncated was a boolean rather than a string, and the string "false" is truthy. Down with implicit conversions!)
However, it turns out that truncated queries are not the cause of #1590 :-(
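The truthiness pitfall with is_truncated is easy to reproduce in plain Python. The helper below is a hypothetical sketch (`parse_is_truncated` is not a txaws or Tahoe-LAFS function) of converting the value explicitly, assuming it arrives either as a real boolean or as the text "true" or "false":

```python
def parse_is_truncated(value):
    """Convert an IsTruncated value (bool, or "true"/"false" text) to a bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # Refuse to guess: an unexpected value should be noticed, not coerced.
    raise ValueError("unexpected IsTruncated value: %r" % (value,))


# The bug: any non-empty string is truthy, including "false".
naive_result = bool("false")

# The explicit conversion gives the intended answer.
fixed_result = parse_is_truncated("false")
```

Here `naive_result` is True while `fixed_result` is False, which is exactly the difference between reporting every query as truncated and reporting only the ones that really are.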
I'll post an updated patch for this ticket, fixing the incorrect detection of truncated queries, tomorrow.
Fixed mainly in [5634/ticket999-S3-backend].
Milestone renamed
renaming milestone