'tahoe cp' copying to a mutable file should replace the contents #1304

Closed
opened 2011-01-13 08:25:43 +00:00 by davidsarah · 18 comments
davidsarah commented 2011-01-13 08:25:43 +00:00
Owner

When tahoe cp copies onto an existing mutable file (in a writable directory), it replaces the directory entry. This is not consistent with SFTP, which would replace the contents of the file. Replacing the contents is more similar to a conventional filesystem, and I think more likely to be expected by users who are using mutable files.

Notice that TahoeFileTarget in source:src/allmydata/scripts/tahoe_cp.py@4768#L184 never uses the mutable variable passed into its constructor.

Although the comment there mentions #835, this is not the same bug. #835 is about adding options to allow specifying whether directories are copied as mutable or immuutable.

When `tahoe cp` copies onto an existing mutable file (in a writable directory), it replaces the directory entry. This is not consistent with SFTP, which would replace the contents of the file. Replacing the contents is more similar to a conventional filesystem, and I think more likely to be expected by users who are using mutable files. Notice that TahoeFileTarget in source:src/allmydata/scripts/tahoe_cp.py@4768#L184 never uses the `mutable` variable passed into its constructor. Although the comment there mentions #835, this is not the same bug. #835 is about adding options to allow specifying whether *directories* are copied as mutable or immuutable.
tahoe-lafs added the
code-frontend-cli
major
defect
1.8.1
labels 2011-01-13 08:25:43 +00:00
tahoe-lafs added this to the 1.9.0 milestone 2011-01-13 08:25:43 +00:00
davidsarah commented 2011-01-13 08:53:15 +00:00
Author
Owner

There's also a case for making tahoe cp pay attention to the no-write flag introduced in #1063. It's only a fairly weak case though, because that flag was added for the benefit of clients who are expecting permissions semantics as close as possible to a POSIX filesystem (e.g. clients using sshfs), and users of tahoe cp are probably not expecting that.

There's also a case for making `tahoe cp` pay attention to the `no-write` flag introduced in #1063. It's only a fairly weak case though, because that flag was added for the benefit of clients who are expecting permissions semantics as close as possible to a POSIX filesystem (e.g. clients using sshfs), and users of `tahoe cp` are probably not expecting that.
Author
Owner

I would expect copy with an existing target to write a new file in the same directory with a temporary name and then call the rename() system call. rename guarantees that either the old target or the new will still exist, even during a system crash (obviously those are local semantics). However, I see that NetBSD cp opens the file for writing. mv behaves as I expect, and I am pretty sure rsync does too.

I would expect copy with an existing target to write a new file in the same directory with a temporary name and then call the rename() system call. rename guarantees that either the old target or the new will still exist, even during a system crash (obviously those are local semantics). However, I see that NetBSD cp opens the file for writing. mv behaves as I expect, and I am pretty sure rsync does too.
kevan commented 2011-07-16 20:58:51 +00:00
Author
Owner

I've been working on this, and I hope to have it fixed by 1.9.

I've been working on this, and I hope to have it fixed by 1.9.
kevan commented 2011-07-26 19:24:16 +00:00
Author
Owner

Attachment 1304.darcspatch (27893 bytes) added

**Attachment** 1304.darcspatch (27893 bytes) added
kevan commented 2011-07-26 19:25:03 +00:00
Author
Owner

In the weekly phonecall of two weeks ago (I think), I mentioned that I thought this was a bug in the webapi. That is wrong -- I misunderstood how tahoe cp assigns sources to targets.

I think that the fact that tahoe cp will replace a mutable file with an immutable file is pretty unintuitive, and that's what it does now. Replacing the old contents with new contents also helps tahoe cp preserve use cases that rely on mutable files in directories having consistent caps after updates.

How do we want to behave when tahoe cp's authority is inadequate to replace the contents of the target? I like failure in that case; I don't like the idea of falling back to directory modification from content replacement for mutable files, since they're pretty different behaviors.

I've attached a patch. I'll set 'review-needed' once we've decided what to do when the target mutable file isn't writable and I've written a test for that.

In the weekly phonecall of two weeks ago (I think), I mentioned that I thought this was a bug in the webapi. That is wrong -- I misunderstood how `tahoe cp` assigns sources to targets. I think that the fact that `tahoe cp` will replace a mutable file with an immutable file is pretty unintuitive, and that's what it does now. Replacing the old contents with new contents also helps `tahoe cp` preserve use cases that rely on mutable files in directories having consistent caps after updates. How do we want to behave when `tahoe cp`'s authority is inadequate to replace the contents of the target? I like failure in that case; I don't like the idea of falling back to directory modification from content replacement for mutable files, since they're pretty different behaviors. I've attached a patch. I'll set 'review-needed' once we've decided what to do when the target mutable file isn't writable and I've written a test for that.
davidsarah commented 2011-07-27 16:29:41 +00:00
Author
Owner

Replying to kevan:

How do we want to behave when tahoe cp's authority is inadequate to replace the contents of the target? I like failure in that case; I don't like the idea of falling back to directory modification from content replacement for mutable files, since they're pretty different behaviors.

+1, that case should be an error.

I've attached a patch. I'll set 'review-needed' once we've decided what to do when the target mutable file isn't writable and I've written a test for that.

I'll review what you have so far.

Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/1304#issuecomment-81949): > How do we want to behave when `tahoe cp`'s authority is inadequate to replace the contents of the target? I like failure in that case; I don't like the idea of falling back to directory modification from content replacement for mutable files, since they're pretty different behaviors. +1, that case should be an error. > I've attached a patch. I'll set 'review-needed' once we've decided what to do when the target mutable file isn't writable and I've written a test for that. I'll review what you have so far.
davidsarah commented 2011-07-28 23:19:45 +00:00
Author
Owner

Excellent tests! Very comprehensive. Now this just needs the test for when the target file isn't writeable.

Excellent tests! Very comprehensive. Now this just needs the test for when the target file isn't writeable.
kevan commented 2011-07-29 17:10:00 +00:00
Author
Owner

Thanks for the review, davidsarah. I've attached another patch with a new test.

The error message that we test for is a webapi error message. That seems a bit lazy -- it didn't require any code changes, for example -- and it's not the most informative error message I've ever seen. The only way I've thought of to do better is to teach tahoe cp to inspect the error response from the webapi if it gets one so it can append an explanation to the output, and I'm not sure I like the idea of coupling tahoe cp to the exception messages printed by the webapi in that way.

(another initially attractive option is a sanity check performed on tahoe cp's targetmap before copying starts. Based on this, we could either abort the copy operation or print a warning if we found a readonly mutable target. All of that would rely upon JSON gathered from a different HTTP request than the request(s) actually modifying data, though, so any check would be inherently unreliable. Given that, I don't think I'd want to stop the copy operation if a readonly target was found, since I wouldn't want users to get the idea that we could do that reliably and I wouldn't want to annoy users if that fact got in their way, but I'm not as against printing a warning and then continuing, though at that point we might as well just figure out a way to write a better error message after the fact, since that solves the problem in more cases)

Thanks for the review, davidsarah. I've attached another patch with a new test. The error message that we test for is a webapi error message. That seems a bit lazy -- it didn't require any code changes, for example -- and it's not the most informative error message I've ever seen. The only way I've thought of to do better is to teach `tahoe cp` to inspect the error response from the webapi if it gets one so it can append an explanation to the output, and I'm not sure I like the idea of coupling `tahoe cp` to the exception messages printed by the webapi in that way. (another initially attractive option is a sanity check performed on `tahoe cp`'s targetmap before copying starts. Based on this, we could either abort the copy operation or print a warning if we found a readonly mutable target. All of that would rely upon JSON gathered from a different HTTP request than the request(s) actually modifying data, though, so any check would be inherently unreliable. Given that, I don't think I'd want to stop the copy operation if a readonly target was found, since I wouldn't want users to get the idea that we could do that reliably and I wouldn't want to annoy users if that fact got in their way, but I'm not as against printing a warning and then continuing, though at that point we might as well just figure out a way to write a better error message after the fact, since that solves the problem in more cases)
kevan commented 2011-07-29 17:10:54 +00:00
Author
Owner

Attachment 1304.2.darcspatch (34562 bytes) added

added error test

**Attachment** 1304.2.darcspatch (34562 bytes) added added error test
davidsarah commented 2011-07-29 19:07:16 +00:00
Author
Owner

Typo in comment: "# Store the URIs for later user." -> "later use".

When running the tests I get:

FAIL]: allmydata.test.test_cli.Cp.test_cp_overwrite_readonly_mutable_file

Traceback (most recent call last):
  File "/home/davidsarah/tahoe/trunk/src/allmydata/test/test_cli.py", line 2150, in _check_error_message
    self.failUnlessIn("need write capability to publish", err)
twisted.trial.unittest.FailTest: 'need write capability to publish' not in 'Error during PUT: 400 Bad Request\nPUT to a mutable file: replace or update requested with read-only cap\n'

The 400 response has a reasonable error message -- is it just the test that needs changing, or did you get a different message?

Typo in comment: "# Store the URIs for later user." -> "later use". When running the tests I get: ``` FAIL]: allmydata.test.test_cli.Cp.test_cp_overwrite_readonly_mutable_file Traceback (most recent call last): File "/home/davidsarah/tahoe/trunk/src/allmydata/test/test_cli.py", line 2150, in _check_error_message self.failUnlessIn("need write capability to publish", err) twisted.trial.unittest.FailTest: 'need write capability to publish' not in 'Error during PUT: 400 Bad Request\nPUT to a mutable file: replace or update requested with read-only cap\n' ``` The 400 response has a reasonable error message -- is it just the test that needs changing, or did you get a different message?
davidsarah commented 2011-07-29 19:11:00 +00:00
Author
Owner

Replying to kevan:

(another initially attractive option is a sanity check performed on tahoe cp's targetmap before copying starts. Based on this, we could either abort the copy operation or print a warning if we found a readonly mutable target. All of that would rely upon JSON gathered from a different HTTP request than the request(s) actually modifying data, though, so any check would be inherently unreliable.

Yes, that's a general problem with trying to make distributed operations have all-or-nothing semantics. Anyway, even if this could be improved, we don't need to do so for 1.9.

Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/1304#issuecomment-81952): > (another initially attractive option is a sanity check performed on `tahoe cp`'s targetmap before copying starts. Based on this, we could either abort the copy operation or print a warning if we found a readonly mutable target. All of that would rely upon JSON gathered from a different HTTP request than the request(s) actually modifying data, though, so any check would be inherently unreliable. Yes, that's a general problem with trying to make distributed operations have all-or-nothing semantics. Anyway, even if this could be improved, we don't need to do so for 1.9.
kevan commented 2011-07-29 20:31:28 +00:00
Author
Owner

I got a different error message:

Error during PUT: 500 Internal Server Error
[...]
exceptions.AssertionError: need write capability to publish

Are you testing with my #393 patches applied? It looks like the message you see is something that I added in #393, but which isn't in trunk yet. I like the #393 400 more than the 500 error, though, and we'll have to change the test anyway once #393 is landed. I could change it in my patch now and we could land #1304 once #393 is landed so there isn't a spurious test failure. Alternatively, we could leave the patch as-is, land it now, and then change the test once #393 is landed.

Thanks for spotting the typo; I'll upload a fresh patch.

I got a different error message: ``` Error during PUT: 500 Internal Server Error [...] exceptions.AssertionError: need write capability to publish ``` Are you testing with my #393 patches applied? It looks like the message you see is something that I added in #393, but which isn't in trunk yet. I like the #393 400 more than the 500 error, though, and we'll have to change the test anyway once #393 is landed. I could change it in my patch now and we could land #1304 once #393 is landed so there isn't a spurious test failure. Alternatively, we could leave the patch as-is, land it now, and then change the test once #393 is landed. Thanks for spotting the typo; I'll upload a fresh patch.
kevan commented 2011-07-29 20:32:20 +00:00
Author
Owner

Attachment 1304.3.darcspatch (34561 bytes) added

fixed typo

**Attachment** 1304.3.darcspatch (34561 bytes) added fixed typo
davidsarah commented 2011-07-29 20:38:24 +00:00
Author
Owner

Replying to kevan:

I got a different error message:

Error during PUT: 500 Internal Server Error
[...]
exceptions.AssertionError: need write capability to publish

Are you testing with my #393 patches applied?

Yes, that was the problem.

Alternatively, we could leave the patch as-is, land it now, and then change the test once #393 is landed.

+1.

Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/1304#issuecomment-81955): > I got a different error message: > > ``` > Error during PUT: 500 Internal Server Error > [...] > exceptions.AssertionError: need write capability to publish > ``` > > Are you testing with my #393 patches applied? Yes, that was the problem. > Alternatively, we could leave the patch as-is, land it now, and then change the test once #393 is landed. +1.
davidsarah commented 2011-07-30 03:19:36 +00:00
Author
Owner

Fixed in changeset:448278e807e489e4.

Fixed in changeset:448278e807e489e4.
tahoe-lafs added the
fixed
label 2011-07-30 03:19:36 +00:00
davidsarah closed this issue 2011-07-30 03:19:36 +00:00
davidsarah commented 2011-07-30 04:09:09 +00:00
Author
Owner

This seems to have caused a test failure on Windows: http://tahoe-lafs.org/buildbot/builders/FreeStorm%20WinXP-x86%20py2.6/builds/539/steps/test/logs/stdio

It seems that a command line URI argument is being passed as Unicode when it should be a str. I can't immediately see why it only fails on Windows.

This seems to have caused a test failure on Windows: <http://tahoe-lafs.org/buildbot/builders/FreeStorm%20WinXP-x86%20py2.6/builds/539/steps/test/logs/stdio> It seems that a command line URI argument is being passed as Unicode when it should be a str. I can't immediately see why it only fails on Windows.
tahoe-lafs removed the
fixed
label 2011-07-30 04:09:09 +00:00
davidsarah reopened this issue 2011-07-30 04:09:09 +00:00
davidsarah commented 2011-07-30 04:14:35 +00:00
Author
Owner

Oh, I see. simplejson.loads doesn't consistently return the same type, it returns either unicode or str depending on its version and the phase of the moon. For example,

self.rw_uri = data["rw_uri"]

should be

self.rw_uri = to_str(data["rw_uri"])

where to_str is defined in source:src/allmydata/util/encodingutil.py (similarly for all the URIs read from JSON).

Oh, I see. `simplejson.loads` doesn't consistently return the same type, it returns either `unicode` or `str` depending on its version and the phase of the moon. For example, ``` self.rw_uri = data["rw_uri"] ``` should be ``` self.rw_uri = to_str(data["rw_uri"]) ``` where `to_str` is defined in source:src/allmydata/util/encodingutil.py (similarly for all the URIs read from JSON).
david-sarah@jacaranda.org commented 2011-07-30 04:24:04 +00:00
Author
Owner

In changeset:16fd14a78ad86536:

test_cli.py: use to_str on fields loaded using simplejson.loads in new tests. refs #1304
In changeset:16fd14a78ad86536: ``` test_cli.py: use to_str on fields loaded using simplejson.loads in new tests. refs #1304 ```
tahoe-lafs added the
fixed
label 2011-07-30 04:39:38 +00:00
davidsarah closed this issue 2011-07-30 04:39:38 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1304
No description provided.