reduce unnecessary escaping in quote_output #1135
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1135
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The [quote_output]source:src/allmydata/util/encodingutil.py@4543#L140 function, which is used to output possibly-Unicode strings to
stdout
andstderr
, does more backslash-escaping than is necessary for the output to be interpreted unambiguously. In particular, it will double any backslashes in the string, which makes Windows paths look ugly.We can reduce this by making it follow POSIX-like escaping rules: within single quotes, backslashes are not special, and the only printable character that cannot appear is a single quote. If we do have a single quote or control characters in the string, then we use double quotes with backslash-escaping.
Please hold off reviewing this; the
_double_quote
function is broken for some inputs.Attachment encodingutil-reduce-quote-escaping.dpatch (10514 bytes) added
util.encodingutil: change quote_output to do less unnecessary escaping, and to use double-quotes more consistently when needed. This version avoids u-escaping for characters that are representable in the output encoding, when double quotes are used; and includes tests. fixes #1135
This seems like a utility which is not at all Tahoe-LAFS-specific and could be useful to lots of other Python programmers.
Do you have any ideas about how to make it available to them?
I've been trying to do that sort of thing by publishing the utility in my pyutil library. (See also #47 use pyutil as a separate package and contribute src/allmydata/util/ into pyutil*) I'm not sure if this has helped anyone! Nobody uses pyutil as far as I am aware. (Hm, although http://pypi.python.org/pypi/pyutil says that it has been download more than a thousand times. I wonder.) Recently when I added a new feature to pyutil that I wanted people to be aware of I also made a new separate package named jsonutil and separately uploaded it to pypi and announced it on the python mailing list: http://pypi.python.org/pypi/jsonutil .
Maybe the Right thing to do for this is to package up everything into pyutil (per #47) and then to open requests of the Python upstream maintainers that they consider some (or all) of those functions as additions to the Python Standard Library.
Over the years (almost ten years, now) of my maintaining pyutil, I've seen many of the features of pyutil get added as standard features of the Python Standard Library, but in every case because someone else independently discovered this idea, independently implemented it, and pushed it into the Python Standard Library. I guess if we want the Python folks to notice features like this in pyutil we have to actually tell them. :-)
For reviewers: the [test_quote_output_mock]attachment:encodingutil-reduce-quote-escaping.dpatch#L107 method (which tests
quote_output
with the default output encoding) has been simplified on the ticket1074 branch to not use mocks:It's probably easier to review this code [as it is on the ticket1074 branch]source:ticket1074/src/allmydata/util/encodingutil.py#L164 ([tests here]source:ticket1074/src/allmydata/test/test_encodingutil.py#L294), rather than the patch.
In [4600/ticket798]:
Replying to david-sarah@…:
This ticket should not have been auto-closed yet, because the commit was on a branch.
Applied to trunk in changeset:28e6ad51a794067f. Still could do with review.
For the benefit of reviewers, the POSIX shell escaping that this is approximately trying to emulate is specified here, except that that spec doesn't discuss Unicode. Generally when a Unicode character can be represented in the output encoding as itself, we do so, otherwise we use an \x, \u or \U escape (whichever is shortest). Interesting questions are:
Minimalizing, or omitting, the quoteing of unicode characters that can be represented in the output encoding is pretty neat.
However regarding backslash quoting behaviour: wont this break other stuff due to things such as "C:\new folder\fav_pet\targh42" being interpreted as "C:" followed by a newline, followed by "ew folder", followed by formfeed, followed by "av_pet", followed by tab and finally ending with "argh42"
I think doubling the backslashes in Windows pathnames is unavoidable.
Perhaps we (probably David-Sarah) could add documentation (in the form of doc file, docstring, or comment) to the patch addressing Zarutian's concern.
In changeset:6ce3ec6d0d363e68:
Replying to Zarutian:
No, because
single-quote escapingsingle-quoting doesn't give a special meaning to backslashes.This is described in the comment at source:src/allmydata/util/encodingutil.py@4616#L195, but not in any end-user documentation. I've now added some in source:docs/frontends/CLI.rst.
Replying to davidsarah:
-snip-
I say that having consistant handling of backslashes and having to deal with some doubling of them is preferable to other issues such as subtle modal context changes. Because I know in the future it will happen that some poor sod will miss the subtle distinction between single- and double-quotes in the quoted output. (I know I have, in the past elsewhere)
However if you want to use single quotes to turn of backslash escaping when it might not be warrented then feel free to do so. (But dont blame me when the poor sod comes along ;)
Replying to [Zarutian]comment:19:
I guess that's a -0 on this change (which was made in 1.8) :-/. FWIW, I think that casual users will probably not notice that the single-quoted strings are quoted at all, and so they won't be confused by them being different to double-quoted strings.
(Previous to
quote_output
being added changeset:80252debcd94fc28, in many cases we just had names being printed to stdout/err using"... '%s' ..." % (name,)
, which was definitely wrong.)I think we can probably close this ticket now, unless anyone wants to do a more technical code review.
A quick note: I cant see any obvious bugs (such as off by one, fence posting or logic ones) in this patch so I think it is most propably okay.
(Please note that I do not know Python and have to infer what I have seen in other similiar languages both syntactically and semantically.)