does cp -r work as expected? #104

Closed
opened 2007-08-15 19:33:53 +00:00 by zooko · 9 comments

It would be good if the command-lines

allmydata-tahoe get

and

allmydata-tahoe put

supported the --recursive or -r option so that you could upload or download an entire collection of files with one command-line.

There are actually a host of issues that arise in implementing this, such as those mentioned in the "names versus identifiers" section of webapi.txt, and quoted here:

For example, suppose you are writing code which recursively downloads the
contents of a directory. The first thing your code does is fetch the listing
of the contents of the directory. For each child that it fetched, if that
child is a file then it downloads the file, and if that child is a directory
then it recurses into that directory. Now, if the download and the recurse
actions are performed using the child's name, then the results might be
wrong, because for example a child name that pointed to a sub-directory when
you listed the directory might have been changed to point to a file, in which
case your attempt to recurse into it would result in an error and the file
would be skipped, or a child name that pointed to a file when you listed the
directory might now point to a sub-directory, in which case your attempt to
download the child would result in a file containing HTML text describing the
sub-directory!

These problems can be avoided by traversing identifiers instead of names, but the next problems can't. The next problems are that dirnodes can recurse (a dirnode can contain an entry pointing to another dirnode which contains an entry pointing to the first), or can converge (two entries in the same or different dirnodes can point to the same object). We could implement a recursive download of such things by (perhaps arbitrarily) choosing one path to be a real link and the other to be a symlink. But Windows doesn't have symlinks. Another option would be to abort and print an error message if such a pattern is encountered.
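
To make the two classes of problem concrete, here is a minimal sketch of a traversal that follows identifiers rather than names and reports cycles and convergent links instead of looping. It is not Tahoe code: `Node`, `ident`, and `walk` are hypothetical names, and the in-memory graph only stands in for dirnodes and their caps.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a file or directory node."""
    ident: str        # immutable identifier (plays the role of a cap)
    is_dir: bool
    children: dict = field(default_factory=dict)  # child name -> Node (dirs only)


def walk(root):
    """Yield (path, node, status) for everything reachable from root.

    status is "new" the first time an identifier is seen, "cycle" when the
    identifier belongs to an ancestor on the current path, and "converged"
    when the identifier was already reached by another path.
    """
    seen = set()

    def visit(node, path, ancestors):
        if node.ident in ancestors:
            yield path, node, "cycle"
            return
        if node.ident in seen:
            yield path, node, "converged"
            return
        seen.add(node.ident)
        yield path, node, "new"
        if node.is_dir:
            # Take the listing once and recurse on the nodes captured in it,
            # never by looking the child's name up a second time.
            for name, child in sorted(node.children.items()):
                yield from visit(child, path + (name,), ancestors | {node.ident})

    yield from visit(root, (), frozenset())
```

A caller could then decide per status whether to download, create a link, or abort with an error, which are the options discussed above.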

zooko added the code-frontend, major, enhancement, 0.4.0 labels 2007-08-15 19:33:53 +00:00
zooko added this to the undecided milestone 2007-08-15 19:33:53 +00:00
zooko self-assigned this 2007-08-15 19:33:53 +00:00
zooko modified the milestone from undecided to 0.6.0 2007-08-15 21:34:26 +00:00
zooko modified the milestone from 0.6.0 to 0.7.0 2007-09-19 22:55:45 +00:00
zooko changed title from recursive get and recursive put to command-line: recursive get and recursive put 2007-10-01 18:17:13 +00:00
Author

Promoting this to Milestone 0.6.1 because my favorite customer, Peter, wants it.

zooko added 0.6.0 and removed 0.4.0 labels 2007-10-01 19:25:42 +00:00
zooko modified the milestone from 0.7.0 to 0.6.1 2007-10-01 19:25:42 +00:00
Author

bumping this to v0.7

zooko modified the milestone from 0.6.1 to 0.7.0 2007-10-13 06:50:48 +00:00
Author

We're focussing on an imminent v0.7.0 (see the roadmap: http://allmydata.org/trac/tahoe/roadmap) which hopefully has #197 (Small Distributed Mutable Files) and also a fix for #199 (bad SHA-256). So I'm bumping less urgent tickets to v0.7.1.
zooko added 0.6.1 and removed 0.6.0 labels 2007-11-01 18:14:13 +00:00
Author

We need to choose a manageable subset of desired improvements for v0.7.1 (http://allmydata.org/trac/tahoe/milestone/0.7.1), scheduled for two weeks hence, so I'm bumping this one into v0.7.2 (http://allmydata.org/trac/tahoe/milestone/0.7.2), scheduled for mid-December.
zooko added 0.7.0 and removed 0.6.1 labels 2007-11-13 18:22:08 +00:00
zooko added code-frontend-cli and removed code-frontend labels 2008-01-15 21:36:41 +00:00
zooko added this to the undecided milestone 2008-01-23 04:19:03 +00:00

this is being replaced by "cp -r", and might be sufficiently done by now (although we may wish to put off closing this until "cp -r" works a bit better). Moving this to 1.2.0 with the idea that it might be closed by the 1.1.0 release.

warner modified the milestone from eventually to 1.2.0 2008-06-01 21:02:33 +00:00
Author

I don't understand why you put it into Milestone 1.2.0 if you think it is ready to be closed as a feature added to 1.1.0.

Also, what did you do about convergent links (as mentioned in the initial note on this ticket), and what did you do about link cycles? And did you avoid the weirdness of race conditions, as described in the initial note of this ticket, by using caps instead of names as the "next links"?

Thanks!

zooko modified the milestone from 1.2.0 to 1.1.0 2008-06-07 19:34:48 +00:00
Author

Okay, there is a complete implementation of cp -r, but we haven't analyzed some of the potential issues mentioned in this ticket, or whether this UI is sufficient, or whether it is not actually completely complete. So, later we'll consider these questions, and we're leaving this ticket open to remind us to do that.

zooko modified the milestone from 1.1.0 to 1.2.0 2008-06-09 18:30:16 +00:00
zooko modified the milestone from 1.5.0 to eventually 2009-06-30 12:39:27 +00:00
tahoe-lafs changed title from command-line: recursive get and recursive put to does cp -r work as expected? 2009-12-13 03:55:23 +00:00
tahoe-lafs added task and removed enhancement labels 2009-12-13 03:56:16 +00:00
tahoe-lafs modified the milestone from eventually to 1.7.0 2010-02-02 03:17:39 +00:00
tahoe-lafs modified the milestone from 1.7.0 to soon 2010-06-16 03:59:31 +00:00
davidsarah commented 2012-11-26 00:36:58 +00:00
Owner

This ticket is way too vague.

TahoeDirectorySource and TahoeDirectoryTarget in [source:git/src/allmydata/scripts/tahoe_cp.py] have cache dictionaries that seem as though they might have the effect of copying cycles correctly between two Tahoe directories, but I don't see a unit test for that in allmydata.test.test_cli.Cp.

#712 is one way in which tahoe cp -r does not do the right thing. I also don't think it will do the right thing when copying a cyclic Tahoe directory to a local disk, although perhaps #712 obscures that.

I filed #1878 to add tests for both cyclic cases.

OTOH, TahoeDirectorySource does not have the following bug:

> Now, if the download and the recurse actions are performed using the child's name, then the results might be wrong, because for example a child name that pointed to a sub-directory when you listed the directory might have been changed to point to a file, [...] or a child name that pointed to a file when you listed the directory might now point to a sub-directory...

Is there anything more to do on this ticket, or is it covered by #712 and #1878?
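
As an illustration of why a cache dictionary could make cyclic copies come out right, here is a minimal sketch. It is not the code in tahoe_cp.py: `Dir`, `File`, and `copy_tree` are hypothetical names, and the only assumption is that each source node has a stable identifier to key the cache on.

```python
class Dir:
    """Hypothetical directory node: an identifier plus named children."""
    def __init__(self, ident):
        self.ident = ident
        self.children = {}  # name -> Dir or File


class File:
    """Hypothetical file node."""
    def __init__(self, ident, data=b""):
        self.ident = ident
        self.data = data


def copy_tree(src, cache=None):
    """Copy src, creating exactly one target node per source identifier."""
    if cache is None:
        cache = {}
    if src.ident in cache:
        # Already copied (or being copied): reuse the same target node.
        return cache[src.ident]
    if isinstance(src, File):
        dst = File("copy-of-" + src.ident, src.data)
        cache[src.ident] = dst
        return dst
    dst = Dir("copy-of-" + src.ident)
    # Register the target before descending, so a link back to src
    # resolves to dst instead of recursing forever.
    cache[src.ident] = dst
    for name, child in src.children.items():
        dst.children[name] = copy_tree(child, cache)
    return dst
```

A test for the cyclic case could then assert that the copied back-link points at the copied root, and that two names for the same source object end up sharing one target object.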
daira commented 2013-08-28 16:47:41 +00:00
Owner

Closed for vagueness.

tahoe-lafs added the invalid label 2013-08-28 16:47:41 +00:00
daira closed this issue 2013-08-28 16:47:41 +00:00
Reference: tahoe-lafs/trac-2024-07-25#104