gateway won't serve any page; variety of interesting error messages in twistd.log #1278
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
4 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1278
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Some servers on the public test grid are misbehaving -- I can tell because updating my blog takes about ten minutes sometimes. Having updated my blog a few times this morning, my gateway got into a broken state where it will not even load the recent uploads and downloads page...
Oh, wait, now it works again. Hm, I will attach
twistd.log
, and here are a few of the interesting messages:There is one flog incident file. I'll attach it.
Attachment twistd.log (519229 bytes) added
Attachment 27Z-cgqrfqa.flog.bz2 (44340 bytes) added
Possibly fixed by the patch on #1366. However, to close this ticket we should have a reproducible test.
Hm, the #1366 patch would probably address the second weird message. But I don't think it could have been causing the web server to be unavailable forever. So yeah, let's leave this ticket open a bit longer until we understand what was going on.
not making it into 1.9
I've seen similar behavior with respect to the second exception, but suspect my symptoms are distinct from #1366, so I filed #1664.
I have not seen the
UploadUnhappinessError
. I am using an LAE account where thereN = K = 1
.I have a consulting client (codename "WAG") who is reporting an issue with very similar symptoms. They say that they have hundreds of thousands of people inconvenienced by this failure, so it is urgent for them to figure out what happened.
Zooko, what's the evidence that this is the same issue? In my experience, two problem reports are only likely to be the same issue if they result in similar error reporting/tracebacks (in this case "
exceptions.ValueError: I/O operation on closed file
", for example).Note that #1366 already fixed the "
exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
" issue.Replying to daira:
First of all the observed behavior of the gateway is similar — goes slower and slower and then "locks ups" entirely, refusing to display even its own Welcome Page. But, like you say, actual stack traces are a more reliable indicator of a common cause, and we have them…
WAG got these messages, in an incident report file named
incident-2014-11-18--08-59-32Z-nchatry.flog.bz2
:and:
And these messages in an incident report file named
incident-2014-10-16--14-09-34Z-twxj47a.flog.bz2
:and:
Replying to daira:
Hm… could that issue have been causing a file descriptor leak?
#1664 was similar to #1366, but is still open, because for some reason nejucomo suspected that they were actually different issues, so that the fix to #1366 wouldn't fix #1664. Could #1664 be affecting WAG and could it cause a file descriptor leak? (I doubt it.)
Replying to [zooko]comment:8:
OK, I'm convinced.