upload failed -- "I/O operation on closed file" #1794
Reference: tahoe-lafs/trac-2024-07-25#1794
A call to `tahoe backup` on the command line ended with this:

The `twistd.log` file had:

The versions are:

I did a darcs pull and got only a few minor patches, so the version that exhibited this error is almost identical to the 1.9.2 release:
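For context, "I/O operation on closed file" is the standard `ValueError` message Python raises when a file object is used after it has been closed. A minimal, ticket-independent reproduction:

```python
# Minimal reproduction of the error in the tracebacks above:
# writing to a file object after it has been closed raises
# ValueError("I/O operation on closed file...").
import tempfile

f = tempfile.TemporaryFile()
f.write(b"data")
f.close()

try:
    f.write(b"more data")
except ValueError as e:
    print(e)  # the same message seen in twistd.log
```

So whatever closed the file did so while some other code still held a reference to it and tried to keep using it.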
My consulting client (codename "WAG", see comment:81505) has this symptom as well. This is from their storage node's twistd.log:
I think that this is caused by file descriptor resource exhaustion, because I've seen this in that twistd.log:
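To test the descriptor-exhaustion idea on a running node, one could compare the process's open-descriptor count against its limit. A sketch (not part of the ticket; `/proc/self/fd` is Linux-specific):

```python
# Sketch: check how close the process is to its file-descriptor
# limit.  The soft limit comes from RLIMIT_NOFILE; on Linux the
# currently open descriptors are listed under /proc/self/fd.
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
try:
    in_use = len(os.listdir("/proc/self/fd"))  # Linux only
except OSError:
    in_use = None  # elsewhere, use `lsof -p <pid>` instead

print("fd soft limit:", soft)
print("fds in use:   ", in_use)
```

If `in_use` sits at or near the soft limit, exhaustion is a plausible cause of the failures above.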
A customer (now a former customer) of LeastAuthority's S4 service, named AB, reported a problem, and his `twistd.log` contains this:

(plus much more of the same)

Erstwhile customer AB reports this result from `lsof`:

with this `twistd.log`:
AB pressed the `Report An Incident` button, and mailed me the resulting incident report file. I'll attach it to this ticket, but here is the first evidence of weirdness in it:

Okay, here's a hypothesis. I need someone to verify this for me — warner, daira, dreid, ??

The hypothesis is that Nevow is `close()`'ing the temp file in stopProducing: Nevow is done producing, but Tahoe-LAFS isn't yet done consuming!

Replying to zooko:
That seems plausible to me. To test it, we should instrument Nevow to log when it is closing the temp file; if this log entry comes just before the "I/O operation on closed file", then we've confirmed the hypothesis.
It also seems plausible that we can work around this by `os.dup`'ing the file handle when we get it (in `render_{PUT,POST}`), and closing our duplicate when we are finished. (I am assuming that Nevow cannot already have closed the file when we first get it; that would be a Nevow bug that we can't fix.)
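The proposed workaround can be sketched like this, using plain temp files in place of the actual Nevow/Tahoe objects. `os.dup` gives us an independent descriptor onto the same open file, so Nevow closing its own file object no longer invalidates ours:

```python
# Sketch of the os.dup workaround: duplicate the underlying file
# descriptor as soon as we receive the file.  Even after the
# original file object is close()d (e.g. by stopProducing), the
# duplicated descriptor stays valid until we close it ourselves.
import os
import tempfile

nevow_file = tempfile.TemporaryFile()   # stands in for Nevow's temp file
nevow_file.write(b"uploaded bytes")
nevow_file.flush()

# Duplicate the descriptor and wrap it in our own file object.
our_file = os.fdopen(os.dup(nevow_file.fileno()), "rb")

nevow_file.close()                      # Nevow is done producing

our_file.seek(0)
print(our_file.read())                  # still readable: b'uploaded bytes'
our_file.close()
```

Note that the duplicated descriptor shares the file offset with the original, hence the explicit `seek(0)` before reading.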