Tahoe reports catch-up incidents to a log gatherer with a Unicode filename, which results in them being dropped #1725
Reference: tahoe-lafs/trac-2024-07-25#1725
At source:src/allmydata/node.py@5469#L349, we have:
(ignore the comment; it's not relevant to this ticket).
Since self.basedir is Unicode, so is incident_dir. foolscap mostly tolerates this, but sometimes ends up sending a Unicode filename to the log gatherer, which causes a type Violation, e.g.:

The code in foolscap that creates the Unicode filenames is LogPublisher.list_incident_names in foolscap/logging/publish.py. Due to Python 2.x's implicit unicode<->str conversions (booo!) and "do what I thought you wanted" behaviour of the filesystem APIs, there is no Python type error.
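The way a text-typed directory path leaks into file names can be illustrated with os.listdir, which returns names of the same type as its argument. This is only a sketch, not foolscap's code: it is written for Python 3, where str plays the role of Python 2's unicode and bytes the role of Python 2's str, and names_in and the scratch directory are hypothetical.

```python
import os
import tempfile

def names_in(directory):
    """List directory entries; the result type follows the argument type."""
    return sorted(os.listdir(directory))

# Scratch directory holding one placeholder incident file.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "incident-TIMESTAMP-UNIQUE"), "w"):
    pass

text_names = names_in(tmp)               # text path in, text names out
byte_names = names_in(os.fsencode(tmp))  # bytes path in, bytes names out

print(type(text_names[0]).__name__, type(byte_names[0]).__name__)
```

Anything that later sends these names over the wire inherits whichever type the directory path had.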
The effect is that if a log-gatherer was down when incidents occurred and subsequently tries to catch up, those incidents will be dropped.
This is a regression that was introduced with the Unicode basedir changes released in 1.8 (specifically changeset:618db4867c68a6f9).
Attachment fix-and-test-1725.darcs.patch (96657 bytes) added
Make sure that foolscap.logging.log.setLogDir is called with a str (not unicode) path. Includes test. fixes #1725
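The idea behind the patch description can be sketched as follows. This is not the contents of the attached patch: to_bytes_path and set_log_dir are hypothetical stand-ins, the directory layout is made up for illustration, and the code targets Python 3, where bytes corresponds to Python 2's str.

```python
import os

def to_bytes_path(path):
    """Return path as a bytestring, encoding text with the filesystem encoding."""
    if isinstance(path, bytes):
        return path
    return os.fsencode(path)

def set_log_dir(path):
    # Stand-in for a logging API that expects a bytestring directory.
    # It only mimics the implicit contract discussed in this ticket.
    if not isinstance(path, bytes):
        raise TypeError("log directory must be a bytestring")
    return path

# Hypothetical node layout; the base directory is a text path.
basedir = "node-basedir"
incident_dir = os.path.join(basedir, "logs", "incidents")

# Convert once, at the boundary, so everything derived from it is bytes too.
result = set_log_dir(to_bytes_path(incident_dir))
```

Converting at the call site keeps the rest of the node free to use text paths internally.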
One possible change would be to extend RILogObserver.new_incident's type-checking to allow unicode in addition to str. The old way of thinking is that things which are only ever going to be ASCII should be str, and things which might have non-ASCII chars should be unicode. The new way of thinking (exemplified by Python 3) is that things which contain non-human-meaningful binary data should be str (soon to be known as bytestring) and things which contain human-meaningful characters should be unicode. (Even if those human-meaningful characters will never be any but the characters found in ASCII.)

So, if you feel like playing along with the Python way of doing things, it makes sense to define the name variable (which looks like 'incident-TIMESTAMP-UNIQUE') as unicode.

Well, I reviewed the patch -- fix-and-test-1725.darcs.patch -- and I agree that it will cause setLogDir to be called with a str argument. I don't know what all the effects are of making that argument be str on all platforms. Presumably it works fine, because that's the old way of doing things and foolscap and Twisted know how to handle it. So, +0. I see no bug.

Replying to zooko:
Well, maybe, but it's a Tahoe bug that it failed to adhere to the implicit contract of setLogDir as taking a str. If we wanted to pass a Unicode path, we'd need to update foolscap to accept that (also for the "logport-furlfile" tub option), then change Tahoe to depend on that version of foolscap. And then foolscap would probably still end up converting it to a str to preserve wire protocol compatibility with log gatherers running an earlier version. Too much hassle IMHO.

Thanks for the review.
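The alternative discussed above (accept a text name, but still put a bytestring on the wire) might look roughly like this. It is a sketch under assumptions, not foolscap's schema machinery: incident_name_for_wire is a hypothetical helper, and the code is Python 3, where bytes corresponds to Python 2's str.

```python
def incident_name_for_wire(name):
    """Accept a text or bytes incident name and return the bytes form."""
    if isinstance(name, bytes):
        return name
    if isinstance(name, str):
        # Names look like 'incident-TIMESTAMP-UNIQUE', so ASCII is assumed;
        # a non-ASCII name raises UnicodeEncodeError here.
        return name.encode("ascii")
    raise TypeError("incident name must be text or bytes")

wire_name = incident_name_for_wire("incident-TIMESTAMP-UNIQUE")
```

Older gatherers would then keep receiving the same type they already expect.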
In changeset:a5553369105d6c9f:
In changeset:5646/ticket999-S3-backend: