log more info about Foolscap disconnections on storages nodes (to twistd.log) #896
Reference: tahoe-lafs/trac-2024-07-25#896
All the storage nodes in my grid are logging the following exception a few times a day in logs/twistd.log.
It'd be great to have a little more detail about those disconnections, especially the peerid of the node with which the connection was lost.
hm, is there anything else in twistd.log that might help us figure out which piece of code is emitting that? We probably have dozens of bare log.err calls that could be producing it.
OTOH, we should probably do a comprehensive audit and add at least a msgid to all of our log.err calls.
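As a rough starting point for such an audit, here is a minimal sketch that flags log.err calls passing fewer than two positional arguments, i.e. calls with no identifying string. The function name find_bare_log_err is hypothetical, and the sketch assumes the call is spelled literally as log.err(...) and that the file parses with the Python version running the script.

```python
import ast


def find_bare_log_err(source, filename="<string>"):
    """Return sorted line numbers of log.err(...) calls with < 2 positional args.

    Hypothetical helper for illustration; it only recognises calls written
    literally as `log.err(...)`.
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_log_err = (
            isinstance(func, ast.Attribute)
            and func.attr == "err"
            and isinstance(func.value, ast.Name)
            and func.value.id == "log"
        )
        # Zero or one positional argument means no identifying string was given.
        if is_log_err and len(node.args) < 2:
            hits.append(node.lineno)
    return sorted(hits)


if __name__ == "__main__":
    sample = 'log.err(f)\nlog.err(f, "OMG kaboom")\n'
    print(find_bare_log_err(sample))
```

This will miss places where log.err is handed over as a callable rather than called directly (for example as an errback), so it complements a plain grep rather than replacing it.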
Title changed from "Foolscap disconnections on storages nodes" to "log more info about Foolscap disconnections on storages nodes (to twistd.log)"
Replying to warner:
Unfortunately not.
I can take care of it, but how do you generate these msgids?
In the context of Twisted's log.err, it means adding a second positional
argument to all invocations of twisted.python.log.err, a string with a unique
name. For example, turning log.err(f) into log.err(f, "OMG kaboom")
will turn this:
into this:
The source tree has a simple tool for creating (probably-) unique message ids
(UMIDs): just run python misc/make_umid. Putting that in the log.err() call
will then let us grep the source tree for the UMID later. It would also
be appropriate to include a short description of the code path that led to
the log.err call, but in practice it's usually more precise to use the UMID,
since the descriptions frequently end up saying stuff like "got unhandled
error from peer", without also saying "while we were trying to download the
share hash tree for an immutable file in
src/allmydata/immutable/downloader.py line 1234". But if you're feeling
wordy, go for it :)
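To give a feel for what a "probably-unique" id could look like, here is a minimal sketch of a generator for short random tags. It is not the actual misc/make_umid implementation; the function name make_umid_sketch and the choice of six alphanumeric characters are assumptions made for illustration.

```python
import base64
import os


def make_umid_sketch(nbytes=4):
    """Return a short random alphanumeric tag (hypothetical stand-in for a UMID).

    4 random bytes base64-encode to 6 characters once the '=' padding is
    stripped. Candidates containing '+' or '/' are rejected so the tag is
    easy to grep for.
    """
    while True:
        candidate = base64.b64encode(os.urandom(nbytes)).decode("ascii").rstrip("=")
        if candidate.isalnum():
            return candidate


if __name__ == "__main__":
    print(make_umid_sketch())
```

With 4 random bytes there are about 4 billion possible values before filtering, so collisions within one source tree are unlikely but not impossible, which matches the "probably-" qualifier.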
For foolscap log calls, we usually pass this UMID as the msgid= argument,
because it's pretty easy to extract the extra keyword arguments
from the logged message later on. But Twisted's log.err doesn't make this so
easy (kwargs are passed to observers but the twisted.log writer ignores
them), so the msgid needs to be passed as the second positional argument
(which does get written to twistd.log). Foolscap's log.err (and the tahoe
allmydata.util.log.err wrapper) accept the same second posarg.
find src -name '*.py' | xargs grep -n log.err
will tell you about all the calls to log.err. There are three forms to be aware of:
thanks!