Unicode bug in grid to grid copies #1224
Reference: tahoe-lafs/trac-2024-07-25#1224
A grid to grid copy involving non-ASCII filenames fails. This is likely another occurrence of bug #534.
I had assumed that urllib.quote was supposed to UTF-8-then-percent-encode Unicode strings, but it's not documented as doing so, so that was probably wishful thinking.
This seems to be http://bugs.python.org/issue1712522. Apparently you have to convert to UTF-8 manually.
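To illustrate the "convert to UTF-8 manually, then percent-encode" step, here is a minimal sketch. It assumes Python 3, where the function lives at urllib.parse.quote rather than the Python 2 urllib.quote discussed in this ticket, and quote_unicode_name is a hypothetical helper name, not the unicode_to_url function from encodingutil.py.

```python
from urllib.parse import quote, unquote


def quote_unicode_name(name):
    """Hypothetical helper: UTF-8 encode a text name, then percent-encode it.

    This is not Tahoe-LAFS code; it only shows the two explicit steps.
    """
    if not isinstance(name, str):
        raise TypeError("expected a text string, got %r" % type(name))
    # Step 1: convert the text to UTF-8 bytes explicitly.
    utf8_bytes = name.encode("utf-8")
    # Step 2: percent-encode the bytes. safe="" also escapes "/", so a
    # single child name can never be mistaken for several path segments.
    return quote(utf8_bytes, safe="")


encoded = quote_unicode_name("Fran\u00e7ois.txt")
print(encoded)
# Decoding reverses both steps (unquote decodes as UTF-8 by default).
print(unquote(encoded) == "Fran\u00e7ois.txt")
```

Doing the encoding explicitly keeps the result independent of whatever default a given Python version applies when it is handed a Unicode string.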
Note that we have a unicode_to_url method in source:src/allmydata/util/encodingutil.py that should probably be used for this (or maybe we should add a quote_unicode_url method, if it turns out that we normally need to convert and percent-escape at the same time).
This isn't actually a regression from v1.7.1 to v1.8.0, is it?
(Maybe we should fix it in v1.8.1 anyway, just because it is easy to fix, impacts actual users like François, the fix is unlikely to cause other problems, and it is "unfinished business" from the new Unicode support in v1.7.0.)
A patch to fix this bug and add a test has been pushed to my git repository, which is available here:
http://github.com/ctrlaltdel/tahoe-lafs/tree/ticket/1224
There are other instances of urllib.quote with a name (as opposed to a cap URI) as argument, in tahoe_backup.py, tahoe_mkdir.py, tahoe_put.py, and web/directory.py I think.
Replying to davidsarah:
I already did a grep in the whole tree to find other occurrences of this bug. Here's what I came up with.
tahoe_backup.py
The function put_child only gets called with path="Latest" or path=now, which are both ASCII strings. But you're right, it is probably safer to use unicode_to_url there as well. I pushed a new commit to my git branch with this change.
tahoe_mkdir.py
The path variable comes from the get_alias function, which already returns a UTF-8 encoded string.
tahoe_put.py
It uses the get_alias function as well.
web/directory.py
In this file, the name is always encoded as a UTF-8 string before use.
I reviewed the git commit and it looks good.
Brian, could you merge this patch into trunk and push it into the darcs repo at dev.allmydata.org:/home/darcs/tahoe-lafs/trunk? Thanks!
I reviewed the change to tahoe_backup.py and that also looks good.
Okay, Brian could you also push that one from comment:80520 into trunk then? :-)
Oh, do these need a NEWS entry?
In changeset:14ee763c542b61c5:
In changeset:2610f8e0aa6e2221: