encrypted cloud database #2472

Open
opened 2015-07-21 18:53:00 +00:00 by zooko · 0 comments

Tahoe-LAFS does a reasonable job of flat-file storage, and of directories structure. But, kids these days (for the last few decades, I mean) are really into structured storage, i.e. relational databases, queriable nosql databases, etc.

Here's a proposal for a stab at a "Minimum Viable Product" for an end-to-end encrypted cloud database. It's extremely simple: store a sqlite db in Tahoe-LAFS.

This would immediately give off-site storage (possibly even peer-to-peer if the underlying Tahoe-LAFS grid is a peer-to-peer grid), erasure-coding for redundancy, and it would also immediately give Tahoe-LAFS's nice access-control semantics: you can give people read-only access to your sqlitedb.

A potentially interesting use for this would be to store Tahoe-LAFS caps in the sqlitedb so that you can query them out. ☺

There are a few important details about how to map sqlite's storage needs to Tahoe-LAFS's storage offerings for best performance and to retain Tahoe-LAFS's guarantees about access control and atomicity and so forth. I looked into it at one point about a year ago, and unfortunately didn't post notes to the trac so I don't remember precisely, what I decided, but I think it was that the sqlitedb should be in write-ahead-logging WAL mode (https://www.sqlite.org/wal.html), and with exclusive locking mode, and should be stored a single MDMF file with its segment size set to be the same as the sqlitedb's page size.

The -wal file should probably also be an MDMF, although it would be cool if sqlite happened to use it in write-once mode, in which case maybe it could be an immutable.

There's an open issue about whether read-only access to such a DB would work without PRAGMA journal_mode=DELETE. Read https://www.sqlite.org/wal.html#readonly to see what I mean, and keep in mind that because we're telling the user that they have to set exclusive locking mode: https://www.sqlite.org/wal.html#noshm

With this setup, the cap to the database has to be a cap to the directory containing the sqlitedb file, not a cap to the sqlitedb file itself. That's because sqlite needs to access the -wal file adjacent to the sqlitedb file itself.

A different approach would be to use the older rollback-log functionality of sqlite instead of WAL. The trade-offs listed in https://www.sqlite.org/wal.html make it sound like maybe that would fit better into Tahoe-LAFS. It might require experimentation and benchmarking to understand.

But also it requires careful study of things like https://www.sqlite.org/lockingv3.html#how_to_corrupt and https://www.sqlite.org/atomiccommit.html#sect_9_0 to figure out if Tahoe-LAFS could provide the guarantees that sqlite needs. I think we can, and I tentatively think that the WAL is easier to guarantee than the rollback journal, because with the rollback journal there is a positive requirement to preserve and make available any hot journal, or else corruption can result, whereas with a WAL a failure of preservation or availability of the -wal just results in rollback, not corruption. I think.

Tahoe-LAFS does a reasonable job of flat-file storage, and of directories structure. But, kids these days (for the last few decades, I mean) are really into *structured storage*, i.e. relational databases, queriable nosql databases, etc. Here's a proposal for a stab at a "Minimum Viable Product" for an end-to-end encrypted cloud database. It's extremely simple: store a sqlite db in Tahoe-LAFS. This would immediately give off-site storage (possibly even peer-to-peer if the underlying Tahoe-LAFS grid is a peer-to-peer grid), erasure-coding for redundancy, and it would also immediately give Tahoe-LAFS's nice access-control semantics: you can give people read-only access to your sqlitedb. A potentially interesting use for this would be to store Tahoe-LAFS caps in the sqlitedb so that you can query them out. ☺ There are a few important details about how to map sqlite's storage needs to Tahoe-LAFS's storage offerings for best performance and to retain Tahoe-LAFS's guarantees about access control and atomicity and so forth. I looked into it at one point about a year ago, and unfortunately didn't post notes to the trac so I don't remember precisely, what I decided, but I think it was that the sqlitedb should be in write-ahead-logging `WAL` mode (<https://www.sqlite.org/wal.html>), and with exclusive locking mode, and should be stored a single MDMF file with its segment size set to be the same as the sqlitedb's page size. The `-wal` file should probably also be an MDMF, although it would be cool if sqlite happened to use it in write-once mode, in which case *maybe* it could be an immutable. There's an open issue about whether read-only access to such a DB would work without `PRAGMA journal_mode=DELETE`. Read <https://www.sqlite.org/wal.html#readonly> to see what I mean, and keep in mind that because we're telling the user that they have to set exclusive locking mode: <https://www.sqlite.org/wal.html#noshm> With this setup, the cap to the database has to be a cap to the directory *containing* the sqlitedb file, not a cap to the sqlitedb file itself. That's because sqlite needs to access the `-wal` file adjacent to the sqlitedb file itself. A different approach would be to use the older rollback-log functionality of sqlite instead of WAL. The trade-offs listed in <https://www.sqlite.org/wal.html> make it sound like maybe that would fit better into Tahoe-LAFS. It might require experimentation and benchmarking to understand. But also it requires careful study of things like <https://www.sqlite.org/lockingv3.html#how_to_corrupt> and <https://www.sqlite.org/atomiccommit.html#sect_9_0> to figure out if Tahoe-LAFS could provide the guarantees that sqlite needs. I think we can, and I tentatively think that the `WAL` is easier to guarantee than the rollback journal, because with the rollback journal there is a positive requirement to *preserve* and make *available* any hot journal, or else corruption can result, whereas with a `WAL` a failure of preservation or availability of the `-wal` just results in rollback, not corruption. I think.
zooko added the
unknown
normal
defect
1.10.1
labels 2015-07-21 18:53:00 +00:00
zooko added this to the undecided milestone 2015-07-21 18:53:00 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#2472
No description provided.