rename stringutils.py to encodingutil.py and/or move contents into fileutil.py #1072

Closed
opened 2010-06-08 04:17:22 +00:00 by zooko · 13 comments

See also #47 (use pyutil as a separate package and contribute src/allmydata/util/* into pyutil)

See also #47 (use pyutil as a separate package and contribute src/allmydata/util/* into pyutil)
zooko added the
code
minor
enhancement
1.6.1
labels 2010-06-08 04:17:22 +00:00
zooko added this to the undecided milestone 2010-06-08 04:17:22 +00:00
tahoe-lafs modified the milestone from undecided to 1.7.1 2010-06-21 03:15:07 +00:00
davidsarah commented 2010-07-12 00:51:52 +00:00
Owner

Attachment rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch (32851 bytes) added

Rename stringutils to encodingutil, and drop listdir_unicode and open_unicode (since the Python stdlib functions work fine with Unicode paths). Also move some utility functions to fileutil.

**Attachment** rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch (32851 bytes) added Rename stringutils to encodingutil, and drop listdir_unicode and open_unicode (since the Python stdlib functions work fine with Unicode paths). Also move some utility functions to fileutil.
davidsarah commented 2010-07-12 00:54:43 +00:00
Owner

The patch bundle is dependent on one of the patches for #1051, because that added a reference to stringutils that needed to be renamed.

The patch bundle is dependent on one of the patches for #1051, because that added a reference to stringutils that needed to be renamed.
tahoe-lafs changed title from rename stringutils.py to unicodeutil.py and/or move contents into fileutil.py to rename stringutils.py to encodingutil.py and/or move contents into fileutil.py 2010-07-12 00:55:40 +00:00
Author

I would have done the renaming with darcs replace. That way if there is a different patch that adds a use of stringutils then when it is combined with this patch it will automatically be changed ("commuted") to use encodingutil instead.

I'm pretty skeptical of the part about dropping listdir_unicode(). Did you confirm that the builtin os.listdir() passes the unit tests that François wrote for listdir_unicode()? If os.listdir() does pass those tests then I think this shows a hole in the tests. :-)

os.listdir(someunicodeobj) is specified to return plain str containing the bytes of a filename if the filename doesn't decode correctly with the sys.getfilesystemencoding(). That's probably not what we want, and in any case it is definitely not what listdir_unicode() does.

My summary of the behavior of os.listdir() is at the end of this letter:

http://tahoe-lafs.org/pipermail/tahoe-dev/2009-March/001379.html

(Note that in Python 3 os.listdir() is changed to behave in a way that is, in my humble opinion, even worse... But nevermind Python 3 for now.)

Here is my latest and greatest idea about how Tahoe-LAFS ''could'' handle ill-encoded filenames in a byte-oriented filesystem (i.e. in Unix not Mac OS X):

http://tahoe-lafs.org/pipermail/tahoe-dev/2009-May/001670.html

It is worth considering the five possible Requirements in that message. With our current unicode support as of Tahoe-LAFS v1.7.0 we have achieved Requirement 1 (unicode), Requirement 2 (faithful if unicode). We have not achieved Requirement 3 (no file left behind), Requirement 4 (faithful bytes if not unicide), or Requirement 5 (no loss of information).

Nowadays I am pretty skeptical of the value of Requirement 4.

I would have done the renaming with `darcs replace`. That way if there is a different patch that adds a use of stringutils then when it is combined with this patch it will automatically be changed ("commuted") to use encodingutil instead. I'm pretty skeptical of the part about dropping `listdir_unicode()`. Did you confirm that the builtin `os.listdir()` passes the unit tests that François wrote for `listdir_unicode()`? If `os.listdir()` *does* pass those tests then I think this shows a hole in the tests. :-) `os.listdir(someunicodeobj)` is specified to return plain str containing the bytes of a filename if the filename doesn't decode correctly with the `sys.getfilesystemencoding()`. That's probably not what we want, and in any case it is definitely not what `listdir_unicode()` does. My summary of the behavior of `os.listdir()` is at the end of this letter: http://tahoe-lafs.org/pipermail/tahoe-dev/2009-March/001379.html (Note that in Python 3 `os.listdir()` is changed to behave in a way that is, in my humble opinion, even worse... But nevermind Python 3 for now.) Here is my latest and greatest idea about how Tahoe-LAFS ''could'' handle ill-encoded filenames in a byte-oriented filesystem (i.e. in Unix not Mac OS X): http://tahoe-lafs.org/pipermail/tahoe-dev/2009-May/001670.html It is worth considering the five possible Requirements in that message. With our current unicode support as of Tahoe-LAFS v1.7.0 we have achieved Requirement 1 (unicode), Requirement 2 (faithful if unicode). We have not achieved Requirement 3 (no file left behind), Requirement 4 (faithful bytes if not unicide), or Requirement 5 (no loss of information). Nowadays I am pretty skeptical of the value of Requirement 4.
Author

P.S. Of course I don't really think we should try to get any more of those Requirements satisfied in v1.7.1! Even if we could do it in time, our users don't expect shiny new improvements in their point releases, just bugfixes. :-)

P.S. Of course I don't really think we should try to get any more of those Requirements satisfied in v1.7.1! Even if we could do it in time, our users don't expect shiny new improvements in their point releases, just bugfixes. :-)
Author

Oh sorry, the mailing list message that I linked to in comment:77783 as my latest and greatest idea is not actually my latest and greatest. After I wrote that message I subsequently realized that a good behavior would be that if you load an ill-encoded filename into Tahoe-LAFS then its representation looks identical to or similar to the representation of that file when you view it with Nautilus, GNU ls, or whatever other tools would have the same problem with ill-encoded filenames. I think this should be added as Requirement 6 (familiar gibberish): "If you copy an ill-encoded filename into Tahoe-LAFS, its filename looks identical to or similar to what you see when you view it with other tools (e.g. Nautilus, GNU ls, etc.)".

Oh sorry, the mailing list message that I linked to in [comment:77783](/tahoe-lafs/trac-2024-07-25/issues/1072#issuecomment-77783) as my latest and greatest idea is *not* actually my latest and greatest. After I wrote *that* message I subsequently realized that a good behavior would be that if you load an ill-encoded filename into Tahoe-LAFS then its representation looks identical to or similar to the representation of that file when you view it with Nautilus, GNU ls, or whatever other tools would have the same problem with ill-encoded filenames. I think this should be added as Requirement 6 (familiar gibberish): "If you copy an ill-encoded filename into Tahoe-LAFS, its filename looks identical to or similar to what you see when you view it with other tools (e.g. Nautilus, GNU ls, etc.)".
davidsarah commented 2010-07-13 04:38:25 +00:00
Owner

Replying to zooko:

I would have done the renaming with darcs replace.

Sorry, don't trust darcs replace. I prefer to do replaces manually.

That way if there is a different patch that adds a use of stringutils then when it is combined with this patch it will automatically be changed ("commuted") to use encodingutil instead.

s/automatically be changed/scarily be mangled/g :-)

I'm pretty skeptical of the part about dropping listdir_unicode(). Did you confirm that the builtin os.listdir() passes the unit tests that François wrote for listdir_unicode()?

Those are tests of how listdir_unicode() is implemented in terms of os.listdir, rather than its functional behaviour.

os.listdir(someunicodeobj) is specified to return plain str containing the bytes of a filename if the filename doesn't decode correctly with the sys.getfilesystemencoding(). That's probably not what we want, and in any case it is definitely not what listdir_unicode() does.

Ah, I hadn't realized it did that. You're right, we can't drop it in that case. I will revert those changes.

Discussion of ill-encoded filenames more generally should go in ticket #731.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1072#issuecomment-77783): > I would have done the renaming with `darcs replace`. Sorry, don't trust `darcs replace`. I prefer to do replaces manually. > That way if there is a different patch that adds a use of stringutils then when it is combined with this patch it will automatically be changed ("commuted") to use encodingutil instead. `s/automatically be changed/scarily be mangled/g` :-) > I'm pretty skeptical of the part about dropping `listdir_unicode()`. Did you confirm that the builtin `os.listdir()` passes the unit tests that François wrote for `listdir_unicode()`? Those are tests of how `listdir_unicode()` is implemented in terms of `os.listdir`, rather than its functional behaviour. > `os.listdir(someunicodeobj)` is specified to return plain str containing the bytes of a filename if the filename doesn't decode correctly with the `sys.getfilesystemencoding()`. That's probably not what we want, and in any case it is definitely not what `listdir_unicode()` does. Ah, I hadn't realized it did that. You're right, we can't drop it in that case. I will revert those changes. Discussion of ill-encoded filenames more generally should go in ticket #731.
davidsarah commented 2010-07-13 04:44:46 +00:00
Owner

Attachment rename-stringutils-drop-open_unicode.dpatch (31441 bytes) added

Rename stringutils to encodingutil, and drop open_unicode (since the Python 'open' function works fine with Unicode paths).

**Attachment** rename-stringutils-drop-open_unicode.dpatch (31441 bytes) added Rename stringutils to encodingutil, and drop open_unicode (since the Python 'open' function works fine with Unicode paths).
Author

Replying to [davidsarah]comment:9:

Did you confirm that the builtin os.listdir() passes the unit tests that François wrote for listdir_unicode()?

Those are tests of how listdir_unicode() is implemented in terms of os.listdir, rather than its functional behaviour.

I don't understand. For example this test: test_listdir_unicode(). Wouldn't it have noticed that the listdir function was failing to raise error on an undecodable entry (when the mock_getfilesystemencoding was set to 'ascii')? Wouldn't that have shown that your patch was breaking something?

Replying to [davidsarah]comment:9: > > Did you confirm that the builtin `os.listdir()` passes the unit tests that François wrote for `listdir_unicode()`? > > Those are tests of how `listdir_unicode()` is implemented in terms of `os.listdir`, rather than its functional behaviour. I don't understand. For example this test: [test_listdir_unicode()](http://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_stringutils.py?annotate=blame&rev=4490#L130). Wouldn't it have noticed that the listdir function was failing to raise error on an undecodable entry (when the `mock_getfilesystemencoding` was set to 'ascii')? Wouldn't that have shown that your patch was breaking something?
Author

Okay I've read rename-stringutils-drop-open_unicode.dpatch and it looks good.

Okay I've read [rename-stringutils-drop-open_unicode.dpatch](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-6b41-42e2-e97b-c2d766b77bba) and it looks good.
Author

Replying to davidsarah:

The patch bundle is dependent on one of the patches for #1051, because that added a reference to stringutils that needed to be renamed.

Interestingly, if you used darcs replace then this wouldn't depend on that one. I'm not sure whether that would be better or worse. :-)

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/1072#issuecomment-77781): > The patch bundle is dependent on one of the patches for #1051, because that added a reference to stringutils that needed to be renamed. Interestingly, if you used `darcs replace` then this wouldn't depend on that one. I'm not sure whether that would be better or worse. :-)
tahoe-lafs modified the milestone from 1.7.1 to 1.8β 2010-07-17 06:05:57 +00:00
davidsarah commented 2010-07-17 22:40:02 +00:00
Owner

Applied in changeset:11077ea74de4d59a. (Was that intentional?)

Applied in changeset:11077ea74de4d59a. (Was that intentional?)
tahoe-lafs added the
fixed
label 2010-07-17 22:40:02 +00:00
tahoe-lafs modified the milestone from 1.8β to 1.7.1 2010-07-17 22:40:02 +00:00
davidsarah commented 2010-07-18 02:00:04 +00:00
Owner

It wasn't intentional, but we decided to commit this for 1.7.1 anyway.

The version that was applied was the older rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch. changeset:a8161c915a30e18c updates this to the equivalent of the rename-stringutils-drop-open_unicode.dpatch that zooko had reviewed.

It wasn't intentional, but we decided to commit this for 1.7.1 anyway. The version that was applied was the older rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch. changeset:a8161c915a30e18c updates this to the equivalent of the rename-stringutils-drop-open_unicode.dpatch that zooko had reviewed.
davidsarah commented 2010-07-18 03:50:56 +00:00
Owner

This caused test failures on some platforms, which were fixed in changeset:bdb10553eb4a461c.

This caused test failures on some platforms, which were fixed in changeset:bdb10553eb4a461c.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1072
No description provided.