add caching to tahoe proper? #316

Open
opened 2008-02-14 02:40:04 +00:00 by warner · 6 comments

We might want to add caching to the Tahoe codebase.

I'll start by saying that I'm not excited about the idea, because in my
experience, "transparent" caches are rarely all that transparent, and it is
easy to get into a state where you're sure that you've made some change, but
you aren't seeing it happen, and it turns out that it's because there's some
stale cache in the way that you didn't previously know about.

That said, since our various FUSE-like projects are making Tahoe vdrives
visible to applications that were not designed with this sort of filesystem
in mind, it might be a good idea to determine a common set of goals that a
vdrive cache would fulfill, and then implement them in a central location:
the tahoe codebase.

One such problem has been seen in the windows-FUSE plugin (which, despite the
name, is really a local SMB server). The application does
open/write/write/close, but the SMB client code expects the writes to take a
long time and the close() to be quick. If the plugin is delivering data to a
tahoe node, then the writes may push all that data directly to local disk,
and not start the upload until the close() is called. This is mostly a
consequence of the fact that we use immutable files for data: it is hard to
upload in a streaming fashion, because the application might do a seek() at
any moment and invalidate the data we've written so far.

The SMB client code times out when the close() takes a long time, so one
trick Mike has been forced to pull is to lie to the application: respond to
the close() quickly, and allow the real upload to continue in the background.
This has a whole host of problems, the most dangerous being that the file
isn't really uploaded yet (so if the user turns off their computer, the file
is not really stored: I'm told that this is what prompted Apple to disable
the use of network drives for their Time Machine backup application). The
most obvious problem is that many windows programs follow the close() call
with an immediate open() and read back the data they just wrote. To deal with
this, Mike's plugin must also spoof the directory entry, and pretend that the
file is really there (with contents that come from the temp file being used
for the upload).
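
The workaround described above can be sketched as follows. This is a hypothetical illustration, not Mike's actual plugin code: `SpoofedUpload`, its `upload` callable, and the temp-file handling are all assumed names for the pattern of acknowledging `close()` immediately, continuing the real upload in the background, and answering reads from the local temp file in the meantime.

```python
import threading

class SpoofedUpload:
    """Sketch of the workaround: acknowledge close() right away so the
    SMB client doesn't time out, keep uploading in the background, and
    serve reads from the local temp file until the grid copy exists.
    `upload` is a hypothetical callable that pushes the file to a
    tahoe node."""

    def __init__(self, temp_path, upload):
        self.temp_path = temp_path
        self.done = threading.Event()  # set once the real upload finishes
        self._thread = threading.Thread(
            target=self._run, args=(upload,), daemon=True)

    def close(self):
        # Return immediately; the upload continues in the background.
        # Danger: until self.done is set, the file is NOT really stored.
        self._thread.start()
        return 0

    def _run(self, upload):
        upload(self.temp_path)
        self.done.set()

    def read(self):
        # The spoofed directory entry answers open()/read() from the
        # temp file, so a close-then-reopen readback sees the data.
        with open(self.temp_path, "rb") as f:
            return f.read()
```

The sketch makes the danger concrete: everything between `close()` returning and `done` being set is a window in which the user believes the file is stored but it is not.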

So, it is a real problem, and I don't yet see a good answer. But some form of
caching is likely to be thrown about in the search for an answer, and there's
a remote chance that it'd be better to do it inside Tahoe than outside.

My vague thoughts are:

  • tahoe nodes do not cache by default. A config knob is used to enable
    caching, and to control the retention policy.
  • tahoe nodes create a cache/ directory, in which they can do their work.
  • the immutable-file cache policy is purely a question of how much storage
    space we're willing to consume. I.e. the only reason to not cache an
    immutable file is to avoid using the disk space. This cache can be
    implemented by using the URI as a filename, something like
    $BASEDIR/cache/immutable/$URI
  • the mutable-file cache policy is more complex, because mutable files
    change, and may be changed by other people. The goals are:
    • many applications (the Mac Finder) examine a dirnode hundreds of
      times within the same second, and we want this to be fast
    • when we change a dirnode, we want to see our changes right away
    • when somebody else changes a dirnode, we want to see those changes
      soon
  • I think the mutable-file cache could be implemented by putting the file
    contents in $BASEDIR/cache/mutable/$URI, with a rule that says we ignore
    (and delete) any entry that has been there for more than 10 seconds.
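
The mutable-file rule in the last bullet could look something like this. A minimal sketch, not an implementation proposal: the function names are made up, the 10-second TTL is the number suggested above, and real code would need to escape the URI (Tahoe URIs contain `:`, which is not filename-safe everywhere) before using it as a cache filename.

```python
import os
import time

CACHE_TTL = 10  # seconds, per the rule suggested above

def get_cached_mutable(basedir, uri):
    """Return cached contents for a mutable-file URI, or None on a miss.

    Entries older than CACHE_TTL are ignored and deleted. Note: `uri`
    is used directly as a filename here; real code must escape it.
    """
    path = os.path.join(basedir, "cache", "mutable", uri)
    try:
        age = time.time() - os.stat(path).st_mtime
    except FileNotFoundError:
        return None
    if age > CACHE_TTL:
        os.remove(path)  # stale: ignore and delete
        return None
    with open(path, "rb") as f:
        return f.read()

def put_cached_mutable(basedir, uri, contents):
    """Store freshly fetched mutable-file contents in the cache."""
    path = os.path.join(basedir, "cache", "mutable", uri)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(contents)
```

This satisfies the three goals above to differing degrees: repeated Finder-style reads within 10 seconds are fast, our own changes are seen immediately if a write also calls `put_cached_mutable`, and other people's changes are seen within 10 seconds.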

It might also be worthwhile to allow the application (via the web API) to
influence the caching: GET /uri/$DIRURI?t=json&cache-for=180 . There are
several HTTP headers to control this behavior too, which may be more
appropriate (but possibly harder to use) than query args.
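
For concreteness, a client-side sketch of what the proposed query argument would look like. The `cache-for` parameter is the hypothetical knob floated above (it does not exist in the webapi), and the gateway URL is an assumption; only `t=json` is real.

```python
from urllib.parse import quote, urlencode

def dirnode_json_url(gateway, dir_uri, cache_for=None):
    """Build a webapi URL for a dirnode's JSON listing.

    `cache_for` is the *proposed* cache-for query argument (seconds the
    client is willing to accept a cached answer); it is hypothetical.
    """
    params = {"t": "json"}
    if cache_for is not None:
        params["cache-for"] = str(cache_for)
    # dirnode URIs contain ':' and '/', so they must be fully escaped
    return "%s/uri/%s?%s" % (gateway, quote(dir_uri, safe=""),
                             urlencode(params))
```

The HTTP-header alternative would presumably reuse the standard `Cache-Control: max-age=180` request header instead of a query argument, which is more conventional but harder to set from some clients.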

This ticket is intended to gather discussion and come to an implementation
decision.

warner added the code, major, task, 0.7.0 labels 2008-02-14 02:40:04 +00:00
warner added this to the eventually milestone 2008-02-14 02:40:04 +00:00

As a matter of division of labor, and layering of design, I would rather that Tahoe proper (and Brian) concentrate on improved file semantics, i.e. "Medium-Sized Mutable Distributed Files/Archives", and other layers/other authors, e.g. Mike Booker, Nathan Wilcox, FaceItLabs, etc. add caching (if needed for particular apps) separately.

More flexible and faster semantics can reduce the need for caching. For example, either "Small Distributed Mutable Files Plus Incremental Upload" or "Medium-Sized Distributed Mutable Files" could satisfy the current need that Mike reported Windows backup applications require: the ability to open(), then write();write();write();write(), then call close(), assert that the close() returned quickly, then call open() again, then read(), and get back the data just written.

Obviously SMDF+IncrementalUpload and MDMF/A don't solve all needs that all users of filesystems have. There is an infinite progression of such needs, and we hope to support the easiest ones first. (Neither, as Brian points out above, can caching layers satisfy all such needs.)


Brian wrote: "This is mostly a consequence of the fact that we use immutable files for data: it is hard to upload in a streaming fashion, because the application might do a seek() at any moment and invalidate the data we've written so far."

This is not why the current Tahoe CHK files fail to support this use case. Observe that the use case never tries to seek. A possible design point to aim at would be "CHK Files + Incremental Upload", and would support the Windows backup app in question without supporting seek(). This would be easier than a "Medium-Sized Mutable Distributed File/Archive" which supported seek(). I don't know whether it would be worth spending time to implement CHK+IncrementalUpload when that time could instead be spent supporting MDMF/A instead, though.

warner modified the milestone from eventually to undecided 2008-06-01 20:48:24 +00:00
Author

I had an idea that I wanted to get down before forgetting it: we could add a pubsub mechanism to the storage servers (at least the current generation which is reached via foolscap connections), to let clients be quickly notified about changes to mutable shares for which they're holding a cached copy. They would then be allowed to hold on to their cached value until the pubsub channel sends an "invalidated" message. We'd need to limit the number of subscriptions, to bound memory usage on the servers. And it wouldn't get us closer to our goal of fewer active TCP connections. And it wouldn't work with the proposed HTTP-based storage servers.
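
The server side of that idea might be sketched like this. Everything here is hypothetical (class name, the subscription cap, the one-shot delivery choice); it only illustrates the shape: subscribe per storage index, notify on modification, and bound the subscription count to protect server memory.

```python
from collections import defaultdict

MAX_SUBSCRIPTIONS = 1000  # assumed cap to bound server memory

class ShareChangePubSub:
    """Hypothetical sketch of the pubsub idea: clients subscribe to a
    mutable file's storage index; when a share under that index is
    modified, each subscriber gets an 'invalidated' message."""

    def __init__(self):
        self._subs = defaultdict(set)  # storage_index -> callbacks
        self._count = 0

    def subscribe(self, storage_index, notify):
        if self._count >= MAX_SUBSCRIPTIONS:
            raise RuntimeError("subscription limit reached")
        self._subs[storage_index].add(notify)
        self._count += 1

    def share_modified(self, storage_index):
        # One-shot delivery: a notified client's cache is now invalid,
        # so its subscription is dropped; it re-subscribes after it
        # fetches the new version.
        for notify in self._subs.pop(storage_index, ()):
            self._count -= 1
            notify("invalidated")
```

One-shot delivery is a design choice that keeps the server's bookkeeping trivial at the cost of a re-subscribe round trip per invalidation; a persistent subscription would trade the opposite way.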

swillden commented 2009-08-31 17:00:32 +00:00
Owner

I think caching would be very valuable, even if only for immutable files. If figuring out how to handle caching for mutable files would delay implementation of immutable caching, I'd say to defer the mutable caching.

OTOH, caching of dirnodes would really be nice, if it could be done safely. I think a cache with a fast timeout, as originally suggested, would accomplish this almost as effectively and much more simply than a pubsub mechanism, especially with a small adjustment: Don't delete the "timed out" mutable caches. Instead, just make Tahoe check them by looking to see if there's a newer version. For small mutable files, that's probably not a big win over retrieving the latest, but it might help a little, and would be a win for larger mutable files.
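
That adjustment could be sketched as follows. The cache layout and the two fetch callables are hypothetical; the point is only the control flow: serve fresh entries directly, re-validate timed-out entries with a cheap version check instead of deleting them, and refetch contents only when the version actually changed.

```python
import time

CHECK_AFTER = 10  # seconds before an entry must be re-validated

def read_dirnode(cache, fetch_contents, fetch_version, uri):
    """Sketch of the keep-and-revalidate variant described above.

    `cache` maps uri -> (version, contents, stored_at). `fetch_version`
    and `fetch_contents` are hypothetical callables hitting the storage
    servers; fetch_contents returns (version, contents).
    """
    entry = cache.get(uri)
    if entry is not None:
        version, contents, stored_at = entry
        if time.time() - stored_at <= CHECK_AFTER:
            return contents  # fresh enough: no network I/O at all
        if fetch_version(uri) == version:
            # cheap check: unchanged, so reuse the cached contents
            cache[uri] = (version, contents, time.time())
            return contents
    version, contents = fetch_contents(uri)
    cache[uri] = (version, contents, time.time())
    return contents
```

As the comment says, for small mutable files the version check costs nearly as much as a full fetch, but for larger ones the saved contents transfer is the win.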

One other thought: You could trade off a little performance for some security and, perhaps simplicity of implementation, by caching the file shares under the SID, rather than the reassembled and decrypted file. You could use a structure similar (identical?) to that used by storage servers to store shares, and, in fact, the cache could even be a secondary source for the storage server to get shares from, and even to deliver to other peers that request them.

With that approach, retrieval of any file involves looking for shares first in the local cache and storage directories. If there's not enough local data to reconstruct the file, retrieve enough additional shares from remote peers, keeping the downloaded shares in the cache, which could use a typical LRU policy for replacement when it gets full.
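
A minimal in-memory sketch of that share-level LRU cache, assuming shares are keyed by (storage index, share number); the class name and capacity are made up, and a real cache would be on disk (e.g. in a structure like the storage servers' share layout, as suggested above) rather than in a dict.

```python
from collections import OrderedDict

class ShareCache:
    """Sketch of the share-level cache idea: store ciphertext shares
    keyed by (storage_index, shnum) and evict the least-recently-used
    entry when the cache is full."""

    def __init__(self, max_shares=1000):
        self.max_shares = max_shares
        self._shares = OrderedDict()  # insertion/access order = LRU order

    def put(self, storage_index, shnum, data):
        key = (storage_index, shnum)
        self._shares[key] = data
        self._shares.move_to_end(key)  # mark as most recently used
        while len(self._shares) > self.max_shares:
            self._shares.popitem(last=False)  # evict the LRU entry

    def get(self, storage_index, shnum):
        data = self._shares.get((storage_index, shnum))
        if data is not None:
            self._shares.move_to_end((storage_index, shnum))
        return data
```

On the download path, the node would first `get()` each needed share from this cache (and from its own storage directory), then fetch only the shortfall from remote peers, `put()`-ing whatever it downloads.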


If you like this ticket, you might also like #606 (backupdb: add directory cache), #465 (add a mutable-file cache), and #300 (macfuse: need some sort of caching).


Also, I'd like to remind everyone of ticket #280. The purpose of #280 was to provide an API call specifically for caching. I believe it can be implemented with a very small change to Tahoe, no changes to the storage format, and moderate complexity in the clients.

Otherwise, my USD 0.02 on caching design is to leave it out of Tahoe proper. If the community really wants it, we can make a standard caching component that looks like the wapi on the outside, but lives separate from the main node codebase.

I prefer implementation simplicity: Minimize feature count and number of configuration states per component with high test coverage, then hook components together.

tahoe-lafs added the enhancement label and removed the task label 2010-02-11 03:45:47 +00:00
Reference: tahoe-lafs/trac-2024-07-25#316