write-only caps #796

Open
opened 2009-08-22 23:55:34 +00:00 by warner · 2 comments

Daira Hopwood points out an even more interesting
direction to take in a recent tahoe-dev posting:

http://allmydata.org/pipermail/tahoe-dev/2009-August/002653.html

The goal is to have one cap (used frequently and stored online)
to do write-only backups, and a different cap (used only for
recovery and stored offline) to perform the reads. The effect
would be close to that of the Mac OS X shared public "Drop
Box" folder, or of GPG-encrypting a piece of data to a public
key whose private half is held offline: normally a one-way
operation, but when you need to, you open up the vault and
pull out the decryption key.

This would be pretty cool. This ticket is to sketch out what the
crypto layout would look like. #795 (append-only files) will be
a starting point, and there will certainly be an asymmetric
encryption/decryption keypair involved.
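
To make the GPG analogy concrete, here is a minimal sketch of the
split, using PyNaCl's SealedBox purely as a stand-in (the real cap
format would derive these keys from the cap strings; every name here
is hypothetical): the backup machine holds only the public half and
can add entries it cannot read back.

```python
# Sketch only, assuming PyNaCl as a stand-in for whatever keypair the
# real cap format would embed; all names here are hypothetical.
from nacl.public import PrivateKey, SealedBox

# Done once, when the cap is created. The private half goes in the
# offline vault; the public half lives in the online write-only cap.
read_key = PrivateKey.generate()    # offline "read" half
write_key = read_key.public_key     # online "write-only" half

# The backup machine holds only write_key: it can add (encrypt) new
# entries but cannot decrypt anything, not even what it just wrote.
ciphertext = SealedBox(write_key).encrypt(b"backup entry contents")

# Recovery: pull the private key out of the vault and decrypt.
plaintext = SealedBox(read_key).decrypt(ciphertext)
assert plaintext == b"backup entry contents"
```

The open question for this ticket is how such a keypair composes with
the append-only machinery from #795.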

From the UI point of view, you'd have some sort of magic
append-only no-reading directory cap, which you keep in your
private/aliases table. There would be a corresponding
read-everything cap (or maybe just the full-fledged writecap;
these could be stored separately), which you keep in a vault and
only type in to test the system and to recover data. Then you
type "tahoe backup ~ backup-appendonlycap:", and you expect that
this unreadable "backup-appendonlycap:" object will acquire
another child, with a timestamp name that is hopefully (but
not guaranteed to be) unique.
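
For illustration, the timestamp child name might be generated like
this (a hypothetical sketch, not the actual "tahoe backup" code):

```python
# Hypothetical sketch: name the appended child with a UTC timestamp,
# which is unique in practice but not guaranteed (two backups started
# in the same second would collide).
from datetime import datetime, timezone

child_name = datetime.now(timezone.utc).strftime("%Y-%m-%d_%H:%M:%SZ")
# e.g. "2009-08-22_23:55:34Z"
```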

You might also like the unchanged-directory-sharing properties
of "tahoe backup" to keep working, so that you don't spend a lot
of time or disk on things that haven't changed. I don't know if
it's possible to accomplish this without recording some
information which would violate the no-reading properties of the
parent. This would probably be easier to pull off if we have
immutable directories (#607). I suspect that you'll still have
to read and hash your whole disk, and generate the CHK
identifiers, and then discover that they're already uploaded. So
you might save the storage space and the upload bandwidth, but
not the local disk IO.
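
To see why the disk IO survives, a sketch (SHA-256 standing in for
Tahoe's CHK derivation, which also mixes in a convergence secret and
runs over the encrypted file):

```python
# Sketch: a content-derived identifier for dedup. SHA-256 stands in
# for Tahoe's CHK derivation, which also mixes in a convergence
# secret and runs over the encrypted file.
import hashlib

def content_id(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

# Even when the grid reports the shares as already present, the whole
# file was read and hashed locally first: dedup saves storage and
# upload bandwidth, not local disk IO.
```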

(hm, so the current backupdb would record the uploaded filecaps,
which starts to violate the goals once the original file gets
deleted and the backupdb doesn't also delete the stored filecap.
But if your local filesystem allows you to attach metadata to
the files you're backing up, then just attach the tahoe filecap
and a ctime/mtime/filesize snapshot to the original file, so the
filecap dies with the file. The backup process would look for
this metadata, compare the ctime/mtime/size snapshot to decide
if the cached filecap is stale, then upload or not. This would
be pretty slick, actually, and I think several modern
filesystems let you attach this sort of metadata (HFS+ for one).
If you can attach metadata to directories, then you write the
verifycap of the immutable dirnode last used for that directory:
on each new backup, you figure out the new dirnode contents,
hash them into the CHK key, then hash *that* and compare it
against the verifycap. If they match, then boom: you have the
dirnode readcap for going up to the parent; if they don't
match, you must upload the new version of that dirnode. This
avoids keeping
the old dircap cleartext around. The only remaining security
issue is that you'd be keeping the individual filecaps around
for old versions, until the next "tahoe backup" process came
along and replaced them, but this is a much smaller exposure
than the dirnodes. It would leak the following information: if
an attacker gets a copy of your disk at time T=2, they might be
able to learn the contents of modified-but-not-deleted files
that we previously backed up at time T=1.)
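
A sketch of the attach-the-filecap-to-the-file idea, using Linux
extended attributes (`os.setxattr`/`os.getxattr` are Linux-specific;
the attribute name and record format here are hypothetical):

```python
# Sketch of caching the filecap on the file itself via Linux extended
# attributes. The attribute name and record format are hypothetical.
import json, os

XATTR = "user.tahoe.backup"

def snapshot(path):
    st = os.stat(path)
    return {"ctime": st.st_ctime, "mtime": st.st_mtime, "size": st.st_size}

def cached_filecap(path):
    """Return the stored filecap if its snapshot still matches, else None."""
    try:
        record = json.loads(os.getxattr(path, XATTR))
    except OSError:            # no attribute recorded yet
        return None
    if record["snapshot"] == snapshot(path):
        return record["filecap"]
    return None                # stale: the file changed since last backup

def remember_filecap(path, filecap):
    record = {"filecap": filecap, "snapshot": snapshot(path)}
    os.setxattr(path, XATTR, json.dumps(record).encode())
```

Deleting the file deletes the cached filecap with it, which is exactly
the property the backupdb lacks. The directory variant would store the
dirnode's verifycap instead, recomputing and comparing it on each run
rather than caching a readcap.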

It's probably ok for the "tahoe backup" process to upload files
and create directories, generating temporary caps which it is
obligated to forget after the top-level append operation. If the
whole backup is created out of immutable objects, the only
mutable slot is the top-most timestamped-version holding
directory, and that's where the append-only operation would be
used.
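
The per-run flow might look like this (a hypothetical sketch;
`upload_immutable_tree` and `append_child` are made-up helpers):

```python
# Hypothetical flow for one "tahoe backup" run: everything created is
# immutable, and the single mutable operation is the final append to
# the top-most timestamped-version holding directory.
def run_backup(root_path, append_cap, child_name):
    snapshot_dircap = upload_immutable_tree(root_path)  # temporary readcap
    append_child(append_cap, child_name, snapshot_dircap)
    del snapshot_dircap  # obligated to forget it after the append
```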

I'm trying to imagine if it would make sense to add an
"append-only" or "write-only-no-reading" column to the dirnode
table (to provide something like "transitive append-only-ness").
I'm not even sure if that's sane, so I'll put off thinking about
it until later. (If you can't read, is "transitive" even
defined?)

warner added the code-mutable, major, enhancement, 1.5.0 labels 2009-08-22 23:55:34 +00:00
warner added this to the undecided milestone 2009-08-22 23:55:34 +00:00
davidsarah commented 2009-10-28 03:31:01 +00:00
Owner

Tagging issues relevant to new cap protocol design.

See also the notes about append-only caps in [wiki/NewCapDesign](wiki/NewCapDesign).
zooko changed title from write-only backup caps to write-only caps 2013-08-13 18:53:44 +00:00
Reference: tahoe-lafs/trac-2024-07-25#796