unicode arguments on the command-line #565
Reference: tahoe-lafs/trac-2024-07-25#565
How do we know what encoding was used to encode the filenames or other arguments that are passed in via Python 2's sys.argv? If we don't know, do we assume that it is utf-8, thus making it incompatible with platforms that don't encode arguments with utf-8? Or do we leave it undecoded, thus making it impossible to correctly inspect the string for the presence of '/' chars?
As a data point, here's how it is handled in Python 3.0.
Some system APIs like os.environ and sys.argv can also present problems when the bytes made available by the system is not interpretable using the default encoding. Setting the LANG variable and rerunning the program is probably the best approach.
Source: What's new in Python 3.0
We should probably implement something that works in a similar way for Python 2.
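To make the trade-off in the question above concrete, here is a minimal Python sketch (written for Python 3, where byte strings and text are distinct types). It shows why searching undecoded bytes for '/' can give a false positive, and one way to decode with a known encoding and fail with a hint about LANG rather than guess. The names decode_args and ArgumentEncodingError, and the choice of ISO-2022-JP as the example encoding, are assumptions made for illustration only; none of them come from Tahoe-LAFS.

```python
class ArgumentEncodingError(Exception):
    """Hypothetical error type: an argument could not be decoded."""


def decode_args(raw_args, encoding):
    """Decode a list of byte-string arguments using a known encoding.

    Raises ArgumentEncodingError (with a hint about LANG) instead of
    guessing when an argument is not valid in that encoding.
    """
    decoded = []
    for raw in raw_args:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            raise ArgumentEncodingError(
                "argument %r is not valid %s; try setting LANG and rerunning"
                % (raw, encoding))
    return decoded


# In ISO-2022-JP the hiragana letter KU (U+304F) is encoded using a byte
# that equals ASCII '/', so a byte-level search for '/' is misleading.
raw = u"\u304f".encode("iso2022_jp")
slash_in_bytes = b"/" in raw
slash_in_text = u"/" in decode_args([raw], "iso2022_jp")[0]
```

The byte-level check and the text-level check disagree for this input, which is the reason the arguments need to be decoded before they are inspected for path separators.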
Title changed from "unicode arguments on the command-line" to "unicode arguments on the command-line Windows-only".
http://bugs.python.org/issue2128 suggests that on Python 2.6.x for Windows, any non-ASCII characters will have been irretrievably mangled to question marks in sys.argv. Unfortunately win32api.GetCommandLine seems to call GetCommandLineA, not GetCommandLineW. The bzr project solved this problem by using ctypes to call GetCommandLineW: https://bugs.launchpad.net/bzr/+bug/375934 . (bzr is GPL'd, so we can use that code.)
Note that this would require passing the correct unicode argv into twisted.python.usage.Options.parseOptions from source:src/allmydata/scripts/runner.py , i.e. change source:windows/tahoe.py to do so (assuming that twisted.python.usage.Options handles Unicode correctly, which I haven't tested).
Needed for #534 which has milestone 1.7.0.
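As a rough sketch of the ctypes approach mentioned above (this is not the bzr code), the wide-character command line can be fetched with GetCommandLineW and split into arguments. Using CommandLineToArgvW from shell32 for the splitting is an assumption on my part, as is the function name get_windows_unicode_argv; on any platform other than native Windows the function simply returns None.

```python
import sys
import ctypes


def get_windows_unicode_argv():
    """Return the process command line as a list of Unicode strings.

    Returns None when not running on native Windows. The list covers
    the whole command line, including the interpreter executable.
    """
    if sys.platform != "win32":
        return None

    from ctypes import POINTER, byref, c_int, c_void_p, windll
    from ctypes.wintypes import LPCWSTR, LPWSTR

    GetCommandLineW = windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = LPCWSTR

    CommandLineToArgvW = windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [LPCWSTR, POINTER(c_int)]
    CommandLineToArgvW.restype = POINTER(LPWSTR)

    LocalFree = windll.kernel32.LocalFree
    LocalFree.argtypes = [c_void_p]
    LocalFree.restype = c_void_p

    argc = c_int(0)
    argv_ptr = CommandLineToArgvW(GetCommandLineW(), byref(argc))
    if not argv_ptr:
        raise ctypes.WinError()
    try:
        # Copy the strings out before releasing the Windows-owned buffer.
        return [argv_ptr[i] for i in range(argc.value)]
    finally:
        LocalFree(ctypes.cast(argv_ptr, c_void_p))


result = get_windows_unicode_argv()
```

Because the returned list starts with the interpreter and any interpreter options, a caller would still need to drop those leading entries before handing the rest to an option parser.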
Here's some code to get Unicode argv that should work on both Windows (including cygwin) and Unix. On Unix, it assumes that arguments are encoded according to the current locale encoding (or UTF-8 if that could not be determined by Python).
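The following is not the code referred to above, only a minimal sketch of the Unix-side behaviour it describes: decode the arguments with the current locale encoding, falling back to UTF-8 if Python cannot determine a usable one. It is written for Python 3 and takes the byte-string arguments as a parameter; the function names argv_encoding and unicode_args are hypothetical.

```python
import codecs
import locale


def argv_encoding():
    """Return the locale's preferred encoding, or UTF-8 if it is unusable."""
    try:
        name = locale.getpreferredencoding(False)
        codecs.lookup(name)  # raises LookupError for unknown or empty names
        return name
    except (LookupError, TypeError, locale.Error):
        return "utf-8"


def unicode_args(raw_args, encoding=None):
    """Decode byte-string arguments with the given (or locale) encoding."""
    if encoding is None:
        encoding = argv_encoding()
    return [raw.decode(encoding) for raw in raw_args]
```

Under Python 2 the byte strings would come straight from sys.argv. Under Python 3 on Unix, where sys.argv is already decoded, something like [os.fsencode(a) for a in sys.argv] should recover the original bytes first.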
I really want to see this patch in trunk in the next 48 hours for Tahoe-LAFS v1.7, but I can't contribute to it myself right now.
Getting this working on Windows is more difficult than I thought. I have successfully got it to work by hacking the setuptools-generated entry script like this:
but only by invoking this script directly from the command line, not via the tahoe.exe wrapper. The latter mangles the arguments beyond hope of recovery.
It isn't necessary for the extra code to be in the entry script; it could be in source:allmydata/scripts/runner.py . However, Zooko and I decided that changing how the CLI entry works on Windows would be too disruptive for 1.7, so we're dropping support for Unicode args on Windows until the next release.
This ticket is fixed for other platforms in 1.7.
Attachment back-out-windows-specific-unicode-argv.dpatch (47775 bytes) added
Back out Windows-specific Unicode argument support for v1.7.
The patch looks correct.
back-out-windows-specific-unicode-argv.dpatch was applied in changeset:32d9deace3d82637.
See #1074 for a patch that reenables Unicode argument support on Windows (but requires further discussion and refinement).
The #1074 patch is now finished.
In [4627/ticket798]:
Fixed; see ticket:1074#comment:29 for changesets.