Web gateway should avoid caching plaintext of downloads #990

Closed
opened 2010-03-11 19:37:57 +00:00 by jsgf · 14 comments
Owner

The web gateway will (on occasion) locally cache files in unencrypted form, for example when handling ranged GET requests.

Now in normal use that's perfectly OK because web gateways are trusted with our unencrypted data and so having the data present in that form should be OK.

But my mental model of a gateway machine is that it's just a stateless waypoint which doesn't store anything locally. If I have a setup where there's a gateway machine within my network serving several machines, I would expect it to not have any persistent memory of my data, so that when it comes time to replace the HD I don't need to worry about scrubbing the disk, at least for Tahoe's sake. (Let's assume swap has been dealt with.)

Therefore, I think the gateway should keep the cache files encrypted and only decrypt them on the fly as they're being sent to its clients. I'm not sure what the key should be, but it should be per-file and transient (derived from the cap/root hash/something else?) rather than some local state (which would defeat the purpose of encrypting in the first place).
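To make the idea concrete, here is a minimal sketch of a cache file that only ever holds ciphertext on disk and decrypts on the fly when read. Everything in it is an assumption for illustration: the class name `ToyEncryptedTempFile` is hypothetical, the key is a random per-file value held only in memory (one possible answer to the "what should the key be" question, not the only one), and the keystream is a toy SHA-256 counter construction standing in for a real cipher such as AES-CTR. It is not Tahoe's code and should not be used as real cryptography.

```python
import hashlib
import os
import tempfile


class ToyEncryptedTempFile:
    """Illustrative only: the disk sees ciphertext, the key lives in RAM."""

    BLOCK = 32  # SHA-256 digest size, used as the keystream block size

    def __init__(self):
        # Per-file, transient key: never written to disk, lost on exit.
        self._key = os.urandom(32)
        self._file = tempfile.TemporaryFile()

    def _keystream(self, offset, length):
        # Toy keystream: SHA-256(key || block counter), sliced to the range.
        first = offset // self.BLOCK
        last = (offset + length - 1) // self.BLOCK
        stream = b"".join(
            hashlib.sha256(self._key + n.to_bytes(8, "big")).digest()
            for n in range(first, last + 1)
        )
        start = offset - first * self.BLOCK
        return stream[start:start + length]

    def _xor(self, offset, data):
        keystream = self._keystream(offset, len(data))
        return bytes(a ^ b for a, b in zip(data, keystream))

    def write_at(self, offset, plaintext):
        # Encrypt before the bytes reach the filesystem.
        self._file.seek(offset)
        self._file.write(self._xor(offset, plaintext))

    def read_at(self, offset, length):
        # Decrypt on the fly, only for the range being served.
        self._file.seek(offset)
        return self._xor(offset, self._file.read(length))

    def raw_bytes(self):
        # What someone inspecting the disk would see.
        self._file.seek(0)
        return self._file.read()

    def close(self):
        self._file.close()
```

Because the keystream depends on the byte offset, a ranged read can decrypt just the requested slice without touching the rest of the file. Overwriting the same offset twice with different data would reuse keystream, which is one of the reasons this is only a sketch.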

Could possibly be handled as part of the downloader rewrite of #798?

tahoe-lafs added the
unknown
major
defect
1.6.0
labels 2010-03-11 19:37:57 +00:00
tahoe-lafs added this to the undecided milestone 2010-03-11 19:37:57 +00:00
davidsarah commented 2010-03-11 22:45:16 +00:00
Author
Owner

There are three options that would fix this:

  1. Use encrypted temporary files as suggested above
  2. Stop using temporary files
  3. Securely overwrite temporary files before closing them

However, option 2 may require too much memory, and option 3 may leave plaintext accessible if the gateway crashes.

Note that the SFTP and FTP frontends also use unencrypted temporary files to handle write requests (perhaps that should be split into a separate ticket, but this one will do for the time being).

#991 was a duplicate.

tahoe-lafs added
code-frontend
and removed
unknown
labels 2010-03-11 22:45:16 +00:00
tahoe-lafs modified the milestone from undecided to 1.7.0 2010-03-11 22:45:16 +00:00
tahoe-lafs changed title from Web gateway should keep its caches encrypted to Web gateway should avoid caching plaintext 2010-03-11 22:45:16 +00:00
davidsarah commented 2010-03-11 22:49:45 +00:00
Author
Owner

Replying to davidsarah:

  2. Stop using temporary files
    ...
    However 2. may require too much memory, ...

More precisely, too much address space on 32-bit machines. The amount of virtual memory used by this option would not be a problem per se if we had unlimited address space and an OS with a well-designed virtual memory subsystem (but we don't).

Author
Owner

I think the downloader's use of temp files is fairly unfortunate (downloading a 200MB file to satisfy a small range), so option 2 isn't out of the question. (There may be some value in caching for performance reasons, but that isn't why the downloader caches now.)

Secure deletion is pretty much impossible unless the filesystem and storage subsystem support it. It certainly isn't sufficient to just overwrite the file and hope that the overwrite hits all the same blocks the original data hit. So I think 3 is out.


My favorite solution to this would be to implement #320 (add streaming (on-line) upload to HTTP interface) so that the gateway doesn't use the disk at all. #320 would offer great improvements, IMO, in performance and flexibility.

You have to give up on convergent encryption whenever you choose streaming upload (although I wonder if we could get some of it back by deriving an encryption key from the secure hash of each segment in turn, including the added convergence secret, and using that key to encrypt the next segment...).

Author
Owner

Replying to zooko:

My favorite solution to this would be to implement #320 (add streaming (on-line) upload to HTTP interface) so that the gateway doesn't use the disk at all. #320 would offer great improvements, IMO, in performance and flexibility.

However, in this case we're talking about cache files for downloading. When you do a byte-range GET request, you lose streaming downloads.

tahoe-lafs changed title from Web gateway should avoid caching plaintext to Web gateway should avoid caching plaintext of downloads 2010-03-11 23:08:10 +00:00
jack.lloyd commented 2010-03-11 23:20:52 +00:00
Author
Owner

As a workaround of sorts, you could set the tempdir to point to an encrypted partition (or, on Linux, a tmpfs (an in-memory filesystem) backed by encrypted swap). And using encrypted swap is generally desirable for a host of other reasons; it's a shame nobody but OpenBSD (IIRC) actually does it by default.

Obviously this is impractical in many situations and undesirable in others, but I thought it would be good to point it out, for the subset of users for whom it might be useful.

I think option (3) is a non-starter: dealing with journaling filesystems, SSDs, etc. makes this nearly impossible in the general case. You can probably get it to work for, say, 9 in 10 users, but leaving 1 in 10 silently vulnerable is not good, and I'd think the effort would be much better spent on eliminating temp files where possible, or encrypting them where that isn't feasible.


Replying to jsgf:

Replying to zooko:

My favorite solution to this would be to implement #320 (add streaming (on-line) upload to HTTP interface) so that the gateway doesn't use the disk at all. #320 would offer great improvements, IMO, in performance and flexibility.

However, in this case we're talking about cache files for downloading. When you do a byte-range GET request, you lose streaming downloads.

Oh, right, downloads when you are doing a range-request. My favorite solution to that is #798 (improve random-access download to retrieve/decrypt less data) so that the web gateway downloads only the segments needed to satisfy your range request and just keeps them in RAM until it has satisfied you.

Author
Owner

Replying to zooko:

Oh, right, downloads when you are doing a range-request. My favorite solution to that is #798 (improve random-access download to retrieve/decrypt less data) so that the web gateway downloads only the segments needed to satisfy your range request and just keeps them in RAM until it has satisfied you.

Yes, that's why I mentioned #798 in the report as possibly the best way of solving the problem ;)


yeah, the new downloader won't touch the disk at all. It fetches exactly the segment required to satisfy the first part of the range request, delivers the plaintext, then forgets about that segment and moves on to the next one.
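As a rough illustration of why a small range no longer needs the whole file, the segments a range request touches can be worked out from the offset and length alone. The function name `segments_for_range` is hypothetical, and the 128 KiB default segment size is an assumption made for the example rather than something stated in this ticket.

```python
def segments_for_range(offset, length, segment_size=128 * 1024):
    """Return the segment numbers covering bytes [offset, offset + length).

    segment_size is an assumed value for illustration; a real downloader
    would take it from the file's own encoding parameters.
    """
    if length <= 0:
        return []
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    return list(range(first, last + 1))
```

Under that assumed segment size, a 1000-byte range in the middle of a 200MB file touches one or two segments, so only those need to be fetched and held in RAM, one at a time.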

davidsarah commented 2010-05-16 03:21:16 +00:00
Author
Owner

The new SFTP implementation in #1037 uses an encrypted temp file for uploads. So that will be fixed as well in 1.7.

davidsarah commented 2010-05-16 03:23:49 +00:00
Author
Owner

Replying to davidsarah:

The new SFTP implementation in #1037 uses an encrypted temp file for uploads. So that will be fixed as well in 1.7.

Not "as well", because the new downloader has been deferred to 1.8.

tahoe-lafs modified the milestone from 1.7.0 to 1.8.0 2010-05-16 03:23:49 +00:00

The #798 new immutable downloader has landed, and does not touch the disk
(the mutable downloader doesn't touch the disk either). The webapi interface
to it uses the correct read(consumer,offset,length) interface. The
upload-side webapi server will still put large (>100kB) plaintext files on
disk (in an anonymous tempfile), and I don't know what the FTP/SFTP code
does.
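For readers unfamiliar with the consumer style of interface, here is a simplified, synchronous sketch of what a `read(consumer, offset, length)` call does conceptually: it pushes the requested bytes to the consumer one segment at a time and never writes to disk. This is not the Tahoe implementation. `CollectingConsumer`, the `fetch_segment` callback and the explicit `segment_size` parameter are all hypothetical names added for the example, and the real interface is asynchronous.

```python
class CollectingConsumer:
    """Hypothetical stand-in for a streaming consumer: it only needs write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


def read(consumer, offset, length, fetch_segment, segment_size):
    """Push bytes [offset, offset + length) to consumer, segment by segment.

    fetch_segment(n) is an assumed callback returning the plaintext of
    segment n (shorter than segment_size, or empty, at end of file).
    """
    end = offset + length
    pos = offset
    while pos < end:
        segnum = pos // segment_size
        segment = fetch_segment(segnum)  # only this segment is held in RAM
        seg_start = segnum * segment_size
        lo = pos - seg_start
        hi = min(end - seg_start, len(segment))
        if hi <= lo:
            break  # the request runs past the end of the file
        consumer.write(segment[lo:hi])
        pos = seg_start + hi  # the previous segment can now be forgotten
    return consumer
```

The point of the design is that the caller never gets a file handle to seek in, so the downloader has no reason to stage the whole file anywhere.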

Is this ticket narrowly-scoped enough that we can now close it?

davidsarah commented 2010-08-14 20:47:03 +00:00
Author
Owner

Replying to warner:

The #798 new immutable downloader has landed, and does not touch the disk.
(the mutable downloader doesn't touch the disk either). The webapi interface
to it uses the correct read(consumer,offset,length) interface. The
upload-side webapi server will still put large (>100kB) plaintext files on
disk (in an anonymous tempfile),

Perhaps it should be using EncryptedTemporaryFile (source:src/allmydata/util/fileutil.py@4609#L118)? That would be a new ticket, though.

and I don't know what the FTP/SFTP code does.

Both the FTP and SFTP code only do full downloads. They sometimes use EncryptedTemporaryFiles, but don't store plaintext on disk (see changeset:05022dca36780b3b).

Is this ticket narrowly-scoped enough that we can now close it?

Yes.

tahoe-lafs added the
fixed
label 2010-08-14 20:47:03 +00:00
davidsarah commented 2010-08-21 03:17:22 +00:00
Author
Owner

Replying to davidsarah:

Replying to warner:

The upload-side webapi server will still put large (>100kB) plaintext files on
disk (in an anonymous tempfile),

Perhaps it should be using EncryptedTemporaryFile (source:src/allmydata/util/fileutil.py@4609#L118)? That would be a new ticket, though.

This is #1176.

Reference: tahoe-lafs/trac-2024-07-25#990