rename stringutils.py to encodingutil.py and/or move contents into fileutil.py #1072
Reference: tahoe-lafs/trac-2024-07-25#1072
See also #47 (use pyutil as a separate package and contribute src/allmydata/util/* into pyutil)
Attachment rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch (32851 bytes) added
Rename stringutils to encodingutil, and drop listdir_unicode and open_unicode (since the Python stdlib functions work fine with Unicode paths). Also move some utility functions to fileutil.
The patch bundle is dependent on one of the patches for #1051, because that added a reference to stringutils that needed to be renamed.
Title changed from "rename stringutils.py to unicodeutil.py and/or move contents into fileutil.py" to "rename stringutils.py to encodingutil.py and/or move contents into fileutil.py".
I would have done the renaming with darcs replace. That way if there is a different patch that adds a use of stringutils then when it is combined with this patch it will automatically be changed ("commuted") to use encodingutil instead.
I'm pretty skeptical of the part about dropping listdir_unicode(). Did you confirm that the builtin os.listdir() passes the unit tests that François wrote for listdir_unicode()? If os.listdir() does pass those tests then I think this shows a hole in the tests. :-)
os.listdir(someunicodeobj) is specified to return plain str containing the bytes of a filename if the filename doesn't decode correctly with the sys.getfilesystemencoding(). That's probably not what we want, and in any case it is definitely not what listdir_unicode() does.
My summary of the behavior of os.listdir() is at the end of this letter:
http://tahoe-lafs.org/pipermail/tahoe-dev/2009-March/001379.html
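To make the contrast above concrete, here is a minimal sketch of the two behaviours. It is not the Tahoe-LAFS code: both function names are hypothetical, the directory entries are passed in as raw bytes instead of being read from disk, and the "lenient" variant only imitates the os.listdir() behaviour described in the comment above (names that fail to decode come back as undecoded bytes).

```python
def lenient_entries(raw_names, encoding):
    """Imitate the behaviour described above: names that decode are
    returned as text, names that do not decode are returned as bytes.
    (Hypothetical helper, for illustration only.)"""
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # The caller silently receives a mix of text and bytes.
            result.append(raw)
    return result


def strict_entries(raw_names, encoding):
    """Assumed shape of a stricter wrapper: refuse the whole listing if
    any name cannot be decoded. (Hypothetical helper, for illustration.)"""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            raise ValueError("filename %r is not valid %s" % (raw, encoding))
    return decoded


names = [b"readme.txt", b"caf\xe9.txt"]  # second name is Latin-1, not UTF-8
mixed = lenient_entries(names, "utf-8")  # one text name, one bytes name
```

With the same input, strict_entries(names, "utf-8") raises instead of returning a mixed list, which is the difference the comment above is pointing at.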
(Note that in Python 3 os.listdir() is changed to behave in a way that is, in my humble opinion, even worse... But never mind Python 3 for now.)
Here is my latest and greatest idea about how Tahoe-LAFS could handle ill-encoded filenames in a byte-oriented filesystem (i.e. in Unix not Mac OS X):
http://tahoe-lafs.org/pipermail/tahoe-dev/2009-May/001670.html
It is worth considering the five possible Requirements in that message. With our current unicode support as of Tahoe-LAFS v1.7.0 we have achieved Requirement 1 (unicode) and Requirement 2 (faithful if unicode). We have not achieved Requirement 3 (no file left behind), Requirement 4 (faithful bytes if not unicode), or Requirement 5 (no loss of information).
Nowadays I am pretty skeptical of the value of Requirement 4.
P.S. Of course I don't really think we should try to get any more of those Requirements satisfied in v1.7.1! Even if we could do it in time, our users don't expect shiny new improvements in their point releases, just bugfixes. :-)
Oh sorry, the mailing list message that I linked to in comment:77783 as my latest and greatest idea is not actually my latest and greatest. After I wrote that message I subsequently realized that a good behavior would be that if you load an ill-encoded filename into Tahoe-LAFS then its representation looks identical to or similar to the representation of that file when you view it with Nautilus, GNU ls, or whatever other tools would have the same problem with ill-encoded filenames. I think this should be added as Requirement 6 (familiar gibberish): "If you copy an ill-encoded filename into Tahoe-LAFS, its filename looks identical to or similar to what you see when you view it with other tools (e.g. Nautilus, GNU ls, etc.)".
Replying to zooko:
Sorry, don't trust darcs replace. I prefer to do replaces manually.
s/automatically be changed/scarily be mangled/g :-)
Those are tests of how listdir_unicode() is implemented in terms of os.listdir, rather than its functional behaviour.
Ah, I hadn't realized it did that. You're right, we can't drop it in that case. I will revert those changes.
Discussion of ill-encoded filenames more generally should go in ticket #731.
Attachment rename-stringutils-drop-open_unicode.dpatch (31441 bytes) added
Rename stringutils to encodingutil, and drop open_unicode (since the Python 'open' function works fine with Unicode paths).
Replying to davidsarah (comment:9):
I don't understand. For example this test: test_listdir_unicode(). Wouldn't it have noticed that the listdir function was failing to raise error on an undecodable entry (when the mock_getfilesystemencoding was set to 'ascii')? Wouldn't that have shown that your patch was breaking something?
Okay I've read rename-stringutils-drop-open_unicode.dpatch and it looks good.
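The kind of check asked about above (mock the filesystem encoding as 'ascii', then expect an error on an undecodable entry) could look roughly like the following. This is a sketch under stated assumptions, not the actual test_listdir_unicode(): UndecodableNameError and listdir_strict are hypothetical names, and the directory listing is replaced by a list of byte strings so the example runs without touching the filesystem.

```python
import sys
from unittest import mock


class UndecodableNameError(Exception):
    """Hypothetical error raised for an entry that cannot be decoded."""


def listdir_strict(raw_names):
    # Hypothetical stand-in for a strict listdir wrapper: decode each raw
    # name with whatever sys.getfilesystemencoding() currently reports.
    encoding = sys.getfilesystemencoding()
    try:
        return [name.decode(encoding) for name in raw_names]
    except UnicodeDecodeError as e:
        raise UndecodableNameError(str(e))


def undecodable_entry_is_rejected():
    # Pretend the platform encoding is ASCII, then feed in a UTF-8 name.
    with mock.patch("sys.getfilesystemencoding", return_value="ascii"):
        try:
            listdir_strict([b"good", b"l\xc3\xa9"])
        except UndecodableNameError:
            return True
    return False


def utf8_entries_are_decoded():
    # With a UTF-8 platform encoding the same names decode cleanly.
    with mock.patch("sys.getfilesystemencoding", return_value="utf-8"):
        return listdir_strict([b"good", b"l\xc3\xa9"])
```

A test written against behaviour in this way would fail if the wrapper were swapped for a function that quietly returns undecoded names, which is the point of the question above.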
Replying to davidsarah:
Interestingly, if you used darcs replace then this wouldn't depend on that one. I'm not sure whether that would be better or worse. :-)
Applied in changeset:11077ea74de4d59a. (Was that intentional?)
It wasn't intentional, but we decided to commit this for 1.7.1 anyway.
The version that was applied was the older rename-stringutils-drop-listdir_unicode-and-open_unicode.dpatch. changeset:a8161c915a30e18c updates this to the equivalent of the rename-stringutils-drop-open_unicode.dpatch that zooko had reviewed.
This caused test failures on some platforms, which were fixed in changeset:bdb10553eb4a461c.