add 'tahoe mirror' command, use backupdb #597

Open
opened 2009-01-28 21:05:59 +00:00 by warner · 6 comments

It would be nice to have a CLI tool which does a minimum-effort copy, from
local disk into a Tahoe directory. This tool should behave like "rsync -a
--delete": when done, the target directory should look exactly like the local
disk's directory. By running the tool on a periodic basis, the Tahoe
directory will contain a single most-recent-version backup of the local disk.

tahoe mirror ~/music tahoe:my-music

To make this as fast as possible, the default mode will use two assumptions:

  • the target Tahoe directory has not been changed by other parties since the
    last invocation of 'tahoe mirror'
  • any file changes will modify either their timestamp or filesize

Both assumptions can be disabled with argv flags, at the expense of doing
more work. If both assumptions are accepted, then a null backup should
require no network traffic and no file reads (only directory reads).

This command would use the 'backupdb', stored in
~/.tahoe/private/backupdb, as discussed in this thread:
http://allmydata.org/pipermail/tahoe-dev/2008-May/000620.html . The backupdb
would allow the client to quickly determine which files and directories have
already been copied.

The proposed "tahoe backup" command (#598) would use the same backupdb.
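
The second assumption (any change alters the timestamp or the size) can be sketched as below. This is only an illustration: the backupdb is modelled as a plain dict, and `is_unchanged` and `record` are hypothetical names, since the ticket does not specify the real storage format or API.

```python
import os

def is_unchanged(path, backupdb):
    """Return True if size and mtime match what the (hypothetical) backupdb recorded.

    Only os.stat() is used, so the file contents are never read.
    """
    st = os.stat(path)
    return backupdb.get(os.path.abspath(path)) == (st.st_size, st.st_mtime_ns)

def record(path, backupdb):
    """Remember the current size and mtime of path after a successful upload."""
    st = os.stat(path)
    backupdb[os.path.abspath(path)] = (st.st_size, st.st_mtime_ns)
```

A file whose `is_unchanged()` check passes can be skipped without being read or uploaded, which is what would let a null backup avoid file reads.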

warner added the code-frontend-cli, major, enhancement, 1.2.0 labels 2009-01-28 21:05:59 +00:00
warner added this to the undecided milestone 2009-01-28 21:05:59 +00:00
Author

It looks like #598 ("tahoe backup": versioned shared backups) is more
interesting right now, so I'll be working on it instead of "tahoe sync". To
checkpoint my work so far, here's my pseudocode and the stub of the CLI
code.

create target directory

loop(localdir, target tahoe dir):
 fetch targetdir
 delete any:
  children that don't exist locally
  files that should be dirs
  dirs that should be files
 create missing dirs
 for each file in localdir:
  chk = upload(file) # uses backupdb to short-circuit
  if chk,metadata == targetdir[child]:
   continue
  else:
   set_child(child, chk+metadata)
 for each subdir in localdir:
  loop(subdir, targetdir[subdir])

assuming upload() uses a backupdb successfully, a null backup with this
algorithm will read all targetdirs but will not upload or modify anything.
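
The pseudocode above could be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the stub from the attachment: the Tahoe directory is modelled as a nested in-memory dict (a dict is a directory, a `(cap, metadata)` tuple is a file), and `upload` is a hypothetical stand-in that hashes the file rather than talking to a grid or a backupdb.

```python
import hashlib
import os

def upload(path):
    """Hypothetical upload: returns a fake cap derived from the file contents.

    A real version would consult the backupdb to avoid reading the file.
    """
    with open(path, "rb") as f:
        return "CHK:" + hashlib.sha256(f.read()).hexdigest()

def local_children(localdir):
    """Map child name -> (kind, path) for regular files and directories only."""
    children = {}
    with os.scandir(localdir) as entries:
        for entry in entries:
            if entry.is_dir():
                children[entry.name] = ("dir", entry.path)
            elif entry.is_file():
                children[entry.name] = ("file", entry.path)
    return children

def mirror(localdir, targetdir):
    """Make targetdir (a dict) look like localdir; return the number of changes."""
    changes = 0
    local = local_children(localdir)
    # Delete children that no longer exist locally or whose kind has changed.
    for name in list(targetdir):
        target_kind = "dir" if isinstance(targetdir[name], dict) else "file"
        if name not in local or local[name][0] != target_kind:
            del targetdir[name]
            changes += 1
    for name in sorted(local):
        kind, path = local[name]
        if kind == "dir":
            if name not in targetdir:  # create missing dirs
                targetdir[name] = {}
                changes += 1
            changes += mirror(path, targetdir[name])
        else:
            st = os.stat(path)
            child = (upload(path), {"mtime": st.st_mtime, "size": st.st_size})
            if targetdir.get(name) != child:  # set_child only when different
                targetdir[name] = child
                changes += 1
    return changes
```

Running `mirror()` a second time on an unchanged tree returns 0, which corresponds to the null backup described above: every target directory is read, but nothing is modified.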
Author

Attachment 597-tahoesync-stub.diff (2966 bytes) added

starting point, just a stub of the CLI command

Author

after some discussion, we decided that "tahoe mirror" was a better name for this than "tahoe sync". "sync" implies bidirectionality, whereas "mirror" has a definite real-world side and looking-glass-world side.

So this ticket is about "tahoe mirror", which does whatever is necessary to make a target directory look like a source directory, without modifying the source directory. The "tahoe sync" idea (which makes the directories look the same, but is allowed to modify both directories) has been moved to #601.

Author

oops, forgot to modify the ticket description

warner changed title from add 'tahoe sync' command, use backupdb to add 'tahoe mirror' command, use backupdb 2009-01-31 02:24:48 +00:00
stockrt commented 2009-03-11 02:14:08 +00:00
Owner

Warner, isn't this ticket about the functionality already provided by 'tahoe backup' #598?

Wouldn't it be good to close this one?

Author

stockrt: nope, "tahoe backup" is defined to create successive timestamped snapshots, whereas "tahoe mirror" is defined to create/modify a single snapshot.

After you've used "tahoe backup ... alias:Backups" daily for a few days, you'll have:

  • Backups/Latest/...
  • Backups/Archives/2009-03-10/...
  • Backups/Archives/2009-03-11/...
  • Backups/Archives/2009-03-12/...

After you've used "tahoe mirror ... alias:Backups" daily for a few days (or a month, or just once), you'll have:

  • Backups/...

If you used "tahoe backup ... alias:Backups" and then ignored the Backups/Archives/ directory, you'd get the same thing as you'd get with "tahoe mirror ... alias:Backups/Latest". But someone who wants just the latest copy would be annoyed 1) by the old archives piling up and 2) by the extra "Latest/" subdirectory that they didn't ask for. That's why it seems like a separate command would be useful (but not as useful as "tahoe backup", which is why "tahoe mirror" got de-prioritized).

Reference: tahoe-lafs/trac-2024-07-25#597