Replace os.path (etc.) with twisted.python.filepath #1437

Open
opened 2011-07-19 04:24:03 +00:00 by arch_o_median · 7 comments
arch_o_median commented 2011-07-19 04:24:03 +00:00
Owner

The filepath file system interface offers a single consistent object oriented interface for the various file system operations we need. Initially we just wanted to know if a file was the descendant of some directory, but this seemingly trivial question is not easily answered with the traditional python file system libraries (e.g. shutil, os.path, the open builtin, ... etc). Moreover these functions are not to be found in a single place, and lack a consistent interface.
Additionally we think filepath is unique in offering atomic overwrite, (through setContent). So this ticket is to replace all the older ways of manipulating the filesystem with shiny new FilePath.

> The filepath file system interface offers a single consistent object oriented interface for the various file system operations we need. Initially we just wanted to know if a file was the descendant of some directory, but this seemingly trivial question is not easily answered with the traditional python file system libraries (e.g. shutil, os.path, the open builtin, ... etc). Moreover these functions are not to be found in a single place, and lack a consistent interface. > Additionally we think filepath is unique in offering atomic overwrite, (through setContent). So this ticket is to replace all the older ways of manipulating the filesystem with shiny new FilePath.
tahoe-lafs added the
unknown
minor
defect
1.8.2
labels 2011-07-19 04:24:03 +00:00
tahoe-lafs added this to the undecided milestone 2011-07-19 04:24:03 +00:00

Here is Glyph's advertisement for why you should use filepath:

http://glyph.twistedmatrix.com/2008/02/highlighting-buried-treasure-in-twisted.html

Here is Glyph's advertisement for why you should use filepath: <http://glyph.twistedmatrix.com/2008/02/highlighting-buried-treasure-in-twisted.html>
davidsarah commented 2011-07-19 16:18:29 +00:00
Author
Owner

When I looked at the [code for FilePath](http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.1.0/twisted/python/filepath.py) recently, it wasn't clear to me whether it properly supported Unicode paths. (It might to the extent that the os.path operations it's using do, but it didn't seem to have been explicitly considered, i.e. if the underlying path can be a Unicode string then that would only work by coincidence.) Unicode path support is critical for Windows; if this is not supported then it would be a regression to switch from unicode strings to FilePath.

When I looked at the [code for [FilePath](wiki/FilePath)](http://twistedmatrix.com/trac/browser/tags/releases/twisted-10.1.0/twisted/python/filepath.py) recently, it wasn't clear to me whether it properly supported Unicode paths. (It might to the extent that the `os.path` operations it's using do, but it didn't seem to have been explicitly considered, i.e. if the underlying `path` can be a Unicode string then that would only work by coincidence.) Unicode path support is critical for Windows; if this is not supported then it would be a regression to switch from `unicode` strings to `FilePath`.
tahoe-lafs added
code
and removed
unknown
labels 2011-07-19 16:18:29 +00:00
davidsarah commented 2011-07-19 16:25:33 +00:00
Author
Owner

To clarify my previous comment, Windows filesystems -- and the Python APIs -- are capable of representing characters in paths that are not representable as a string in sys.getfilesystemencoding() (which is always "mbcs" on Windows, i.e. the system's "ANSI" encoding). Currently we do handle such paths correctly in most cases (except that a Tahoe-LAFS source tree won't build in a Unicode directory).

To clarify my previous comment, Windows filesystems -- and the Python APIs -- are capable of representing characters in paths that are not representable as a string in `sys.getfilesystemencoding()` (which is always `"mbcs"` on Windows, i.e. the system's "ANSI" encoding). Currently we do handle such paths correctly in most cases (except that a Tahoe-LAFS source tree won't build in a Unicode directory).
zancas removed their assignment 2011-08-11 04:42:35 +00:00
zancas self-assigned this 2011-08-11 04:42:35 +00:00

I really want to use filepath. It results in more readable and less error-prone code than using the smorgasbord of open(), os.whatever, shutil.whatever, and so on. I think we should do it. However, there are several issues that the maintainers of filepath are not going to fix anytime soon, or even one critical issue that they refuse to fix in the way that I want it done, so I think we'll need to do it by forking or embedding filepath into our codebase.

The critical issue about which they and I disagree is: what do you do if you go to read a filename from the operating system, and you get a sequence of bytes which cannot be decoded in the nominal encoding? Their answer is: store the bytes, because if you write them back out to the same filesystem later, later users will presumably treat them as having the same encoding, whatever it was, even though our code doesn't know what it was. My answer is: raise an error, because we can't safely transport these bytes out to other systems, or process these bytes, and anyway this is probably a rare condition or even an outright error in the user's system. (That is what we long ago decided to do in Tahoe-LAFS, and we've never had a complaint about it. There are quite extensive discussion threads in old closed tickets and in the tahoe-dev email archive about this design decision.)

That discussion is https://twistedmatrix.com/trac/ticket/5203 . Other tickets are:

I really want to use `filepath`. It results in more readable and less error-prone code than using the smorgasbord of `open()`, `os.whatever`, `shutil.whatever`, and so on. I think we should do it. However, there are several issues that the maintainers of `filepath` are not going to fix anytime soon, or even one critical issue that they refuse to fix in the way that I want it done, so I think we'll need to do it by forking or embedding filepath into our codebase. The critical issue about which they and I disagree is: what do you do if you go to read a filename from the operating system, and you get a sequence of bytes which cannot be decoded in the nominal encoding? Their answer is: store the bytes, because if you write them back out to the *same* filesystem later, later users will presumably treat them as having the same encoding, whatever it was, even though our code doesn't know what it was. My answer is: raise an error, because we can't safely transport these bytes out to *other* systems, or process these bytes, and anyway this is probably a rare condition or even an outright error in the user's system. (That is what we long ago decided to do in Tahoe-LAFS, and we've never had a complaint about it. There are quite extensive discussion threads in old closed tickets and in the tahoe-dev email archive about this design decision.) That discussion is <https://twistedmatrix.com/trac/ticket/5203> . Other tickets are: * <https://twistedmatrix.com/trac/ticket/2366> * <https://twistedmatrix.com/trac/ticket/5279> * <https://twistedmatrix.com/trac/ticket/5280>
daira commented 2014-12-20 04:17:12 +00:00
Author
Owner

I tried to post this at https://twistedmatrix.com/trac/ticket/5203 but couldn't get past the spam filter:


It doesn't particularly surprise me that this issue has gone for three years without comments, because the Description is quite rambling and doesn't focus on what Zooko wants done. (In fact I'm not clear on what Zooko is asking for either.)

What I would like in order to be able to use FilePath in Tahoe-LAFS is either:

  • for FilePath(unicode_path) to be specified to work, and to return Unicode strings from methods/attributes that currently return path components;
  • or alternatively, a UnicodeFilePath class that works that way.

Currently I think the status is that FilePath(unicode_path) mostly works by coincidence of the stdlib file APIs accepting Unicode paths, but that doesn't fill me with confidence.

Note that representing paths as byte strings can't possibly work correctly in general on Windows. It is not sufficient to only be able to represent paths that have characters in the "ANSI" encoding.

The behaviour around undecodable paths is a secondary issue that we could work around; I wouldn't care if we needed wrapper functions to get the behaviour we wanted there.

I tried to post this at <https://twistedmatrix.com/trac/ticket/5203> but couldn't get past the spam filter: ---- It doesn't particularly surprise me that this issue has gone for three years without comments, because the Description is quite rambling and doesn't focus on what Zooko wants done. (In fact I'm not clear on what Zooko is asking for either.) What I would like in order to be able to use `FilePath` in Tahoe-LAFS is either: * for `FilePath(unicode_path)` to be specified to work, and to return Unicode strings from methods/attributes that currently return path components; * or alternatively, a `UnicodeFilePath` class that works that way. Currently I think the status is that `FilePath(unicode_path)` mostly works by coincidence of the stdlib file APIs accepting Unicode paths, but that doesn't fill me with confidence. Note that representing paths as byte strings can't possibly work correctly in general on Windows. It is not sufficient to only be able to represent paths that have characters in the "ANSI" encoding. The behaviour around undecodable paths is a secondary issue that we could work around; I wouldn't care if we needed wrapper functions to get the behaviour we wanted there.
daira commented 2014-12-20 23:05:53 +00:00
Author
Owner

Oh, it seems like what I want is https://twistedmatrix.com/trac/ticket/2366. (If anyone pastes my comment above to https://twistedmatrix.com/trac/ticket/5203, please mention that as well.)

Oh, it seems like what I want is <https://twistedmatrix.com/trac/ticket/2366>. (If anyone pastes my comment above to <https://twistedmatrix.com/trac/ticket/5203>, please mention that as well.)
daira commented 2015-04-05 15:36:40 +00:00
Author
Owner

(https://twistedmatrix.com/trac/ticket/7805) seems to address my concerns, at least at the API level. It will be a while before we depend on a version of Twisted that has this, though.

(https://twistedmatrix.com/trac/ticket/7805) seems to address my concerns, at least at the API level. It will be a while before we depend on a version of Twisted that has this, though.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1437
No description provided.