Replace os.path (etc.) with twisted.python.filepath #1437
Reference: tahoe-lafs/trac-2024-07-25#1437
Here is Glyph's advertisement for why you should use filepath:
http://glyph.twistedmatrix.com/2008/02/highlighting-buried-treasure-in-twisted.html
When I looked at the [code for FilePath](http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.1.0/twisted/python/filepath.py) recently, it wasn't clear to me whether it properly supported Unicode paths. (It might to the extent that the `os.path` operations it's using do, but it didn't seem to have been explicitly considered, i.e. if the underlying `path` can be a Unicode string then that would only work by coincidence.) Unicode path support is critical for Windows; if this is not supported then it would be a regression to switch from `unicode` strings to `FilePath`.
To clarify my previous comment, Windows filesystems -- and the Python APIs -- are capable of representing characters in paths that are not representable as a string in `sys.getfilesystemencoding()` (which is always `"mbcs"` on Windows, i.e. the system's "ANSI" encoding). Currently we do handle such paths correctly in most cases (except that a Tahoe-LAFS source tree won't build in a Unicode directory).
I really want to use `filepath`. It results in more readable and less error-prone code than using the smorgasbord of `open()`, `os.whatever`, `shutil.whatever`, and so on. I think we should do it. However, there are several issues that the maintainers of `filepath` are not going to fix anytime soon, or even one critical issue that they refuse to fix in the way that I want it done, so I think we'll need to do it by forking or embedding filepath into our codebase.
The critical issue about which they and I disagree is: what do you do if you go to read a filename from the operating system, and you get a sequence of bytes which cannot be decoded in the nominal encoding? Their answer is: store the bytes, because if you write them back out to the same filesystem later, later users will presumably treat them as having the same encoding, whatever it was, even though our code doesn't know what it was. My answer is: raise an error, because we can't safely transport these bytes out to other systems, or process these bytes, and anyway this is probably a rare condition or even an outright error in the user's system. (That is what we long ago decided to do in Tahoe-LAFS, and we've never had a complaint about it. There are quite extensive discussion threads in old closed tickets and in the tahoe-dev email archive about this design decision.)
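
To make the two policies concrete, here is a minimal sketch in plain Python 3, not Twisted's or Tahoe-LAFS's actual code. The function names are hypothetical, UTF-8 stands in for the "nominal encoding", and the `surrogateescape` error handler is only an assumed stand-in for the "store the bytes" approach.

```python
def decode_name_strict(raw: bytes, encoding: str = "utf-8") -> str:
    """"Raise an error" policy: refuse names that do not decode cleanly.

    Hypothetical helper; the name and the UTF-8 default are assumptions.
    """
    try:
        return raw.decode(encoding, "strict")
    except UnicodeDecodeError as e:
        raise ValueError(f"filename {raw!r} is not valid {encoding}") from e


def decode_name_preserving(raw: bytes, encoding: str = "utf-8") -> str:
    """"Store the bytes" policy, approximated with surrogateescape.

    Undecodable bytes become lone surrogates, so the original bytes can be
    recovered later, but the resulting string is not safe to send elsewhere.
    """
    return raw.decode(encoding, "surrogateescape")


good = "caf\u00e9.txt".encode("utf-8")
bad = b"caf\xe9.txt"  # Latin-1 bytes; not valid UTF-8

# A well-formed name decodes under either policy.
assert decode_name_strict(good) == "caf\u00e9.txt"

# The preserving policy round-trips the bad name on the same filesystem...
preserved = decode_name_preserving(bad)
assert preserved.encode("utf-8", "surrogateescape") == bad

# ...while the strict policy rejects it up front.
try:
    decode_name_strict(bad)
except ValueError as e:
    print("rejected:", e)
```

The preserved string cannot be encoded as ordinary UTF-8, which is the "can't safely transport these bytes out to other systems" concern in miniature.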
That discussion is https://twistedmatrix.com/trac/ticket/5203 . Other tickets are:
I tried to post this at https://twistedmatrix.com/trac/ticket/5203 but couldn't get past the spam filter:
It doesn't particularly surprise me that this issue has gone for three years without comments, because the Description is quite rambling and doesn't focus on what Zooko wants done. (In fact I'm not clear on what Zooko is asking for either.)
What I would like in order to be able to use `FilePath` in Tahoe-LAFS is either:

* `FilePath(unicode_path)` to be specified to work, and to return Unicode strings from methods/attributes that currently return path components; or
* a `UnicodeFilePath` class that works that way.

Currently I think the status is that `FilePath(unicode_path)` mostly works by coincidence of the stdlib file APIs accepting Unicode paths, but that doesn't fill me with confidence.
Note that representing paths as byte strings can't possibly work correctly in general on Windows. It is not sufficient to only be able to represent paths that have characters in the "ANSI" encoding.
The behaviour around undecodable paths is a secondary issue that we could work around; I wouldn't care if we needed wrapper functions to get the behaviour we wanted there.
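
A small sketch of why byte-string paths fall short on Windows. It assumes `cp1252` as a stand-in for the system "ANSI" code page (the real `"mbcs"` codec exists only on Windows), and `representable_in` is a hypothetical helper, not part of Twisted or Tahoe-LAFS.

```python
def representable_in(path: str, encoding: str) -> bool:
    """Return True if every character of `path` can be encoded in `encoding`."""
    try:
        path.encode(encoding, "strict")
        return True
    except UnicodeEncodeError:
        return False


ansi = "cp1252"  # assumption: a typical Western European "ANSI" code page

ascii_path = "C:\\Users\\alice\\notes.txt"
greek_path = "C:\\Users\\alice\\\u03a9.txt"  # file name is GREEK CAPITAL LETTER OMEGA

# Both names are legal on NTFS, which stores names as UTF-16 code units...
assert representable_in(ascii_path, "utf-16-le")
assert representable_in(greek_path, "utf-16-le")

# ...but only one of them survives a trip through the ANSI encoding.
assert representable_in(ascii_path, ansi)
assert not representable_in(greek_path, ansi)

# Forcing the conversion silently changes the name, so the bytes no longer
# identify the original file.
lossy = greek_path.encode(ansi, "replace").decode(ansi)
assert lossy != greek_path
```

Any path API that stores such a name as ANSI bytes has either to fail or to refer to a different file, which is why returning Unicode strings matters here.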
Oh, it seems like what I want is https://twistedmatrix.com/trac/ticket/2366. (If anyone pastes my comment above to https://twistedmatrix.com/trac/ticket/5203, please mention that as well.)
https://twistedmatrix.com/trac/ticket/7805 seems to address my concerns, at least at the API level. It will be a while before we depend on a version of Twisted that has this, though.