Web gateway should avoid caching plaintext of downloads #990
Reference: tahoe-lafs/trac-2024-07-25#990
The web gateway will (on occasion) locally cache files in unencrypted form, such as when handling ranged GET requests.
Now in normal use that's perfectly OK because web gateways are trusted with our unencrypted data and so having the data present in that form should be OK.
But my mental model of a gateway machine is that it's just a stateless waypoint which doesn't store anything local. If I have a setup where there's a gateway machine within my network serving several machines, I would expect it to not have any persistent memory of my data, and so when it comes time to replace the HD I don't need to worry about scrubbing the disk, at least for Tahoe's sake. (Let's assume swap has been dealt with.)
Therefore, I think the gateway should keep the cache files encrypted and only decrypt them on the fly as they're being sent to its clients. I'm not sure what the key should be, but it should be per-file and transient (derived from the cap/root hash/something else?) rather than some local state (which would defeat the purpose of encrypting in the first place).
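To make the per-file transient key idea concrete, here is a minimal sketch, not a proposal for the actual implementation. It assumes the key is derived from the cap string with a tagged SHA-256 (the ticket leaves the derivation open), and it uses a toy SHA-256 counter keystream purely as a stand-in for a real cipher such as AES in CTR mode. The names derive_cache_key and keystream_xor are hypothetical.

```python
import hashlib

BLOCK = 32  # bytes of keystream produced per SHA-256 call


def derive_cache_key(cap: str) -> bytes:
    """Derive a per-file key from the cap (assumed scheme, not Tahoe's)."""
    return hashlib.sha256(b"gateway-cache-key-v1:" + cap.encode("utf-8")).digest()


def keystream_xor(key: bytes, offset: int, data: bytes) -> bytes:
    """XOR data with a keystream positioned at the given file offset.

    Toy construction for illustration only: a real gateway would use a
    vetted cipher. The same call encrypts and decrypts, and because the
    keystream is seekable, a byte range can be decrypted on the fly
    without touching the rest of the cache file.
    """
    out = bytearray(len(data))
    pos = 0
    while pos < len(data):
        abs_pos = offset + pos
        counter, skip = divmod(abs_pos, BLOCK)
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[skip:]
        n = min(len(pad), len(data) - pos)
        for i in range(n):
            out[pos + i] = data[pos + i] ^ pad[i]
        pos += n
    return bytes(out)
```

The gateway would write keystream_xor(key, 0, plaintext) to the cache file and, when serving a range, read the ciphertext bytes at that offset and pass them through keystream_xor with the same offset. Nothing on disk is enough to recover the key, since it is recomputed from the cap supplied with each request.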
Could possibly be handled as part of the downloader rewrite of #798?
There are three options that would fix this:
However, option 2 may require too much memory, and option 3 may leave plaintext accessible if the gateway crashes.
Note that the SFTP and FTP frontends also use unencrypted temporary files to handle write requests (perhaps that should be split into a separate ticket, but this one will do for the time being).
#991 was a duplicate.
Title changed from "Web gateway should keep its caches encrypted" to "Web gateway should avoid caching plaintext".
Replying to davidsarah:
More precisely, too much address space on 32-bit machines. The amount of virtual memory used by this option would not be a problem per se if we had unlimited address space and an OS with a well-designed virtual memory subsystem (but we don't).
I think the downloader's use of temp files is fairly unfortunate (downloading a 200MB file to satisfy a small range), so option 2 isn't out of the question. (There may be some value in caching for performance reasons, but that isn't why the downloader caches now.)
Secure deletion is pretty much impossible unless the filesystem and storage subsystem support it. It certainly isn't sufficient to just overwrite the file and hope that the write hits all the same blocks the original data hit. So I think 3 is out.
My favorite solution to this would be to implement #320 (add streaming (on-line) upload to HTTP interface) so that the gateway doesn't use the disk at all. #320 would offer great improvements, IMO, in performance and flexibility.
You have to give up on convergent encryption whenever you choose streaming upload (although I wonder if we could get some of it back by deriving an encryption key from the secure hash of each segment in turn, including the added convergence secret, and using that key to encrypt the next segment).
Replying to zooko:
However, in this case we're talking about cache files for downloading. When you do a byte-range GET request, you lose streaming downloads.
Title changed from "Web gateway should avoid caching plaintext" to "Web gateway should avoid caching plaintext of downloads".
As a workaround of sorts, you could set the tempdir to point to an encrypted partition (or, on Linux, a tmpfs, i.e. an in-memory filesystem, backed by encrypted swap). Using encrypted swap is generally desirable for a host of other reasons; it's a shame nobody but OpenBSD (IIRC) actually does it by default.
Obviously this is impractical in many situations and undesirable in others, but I thought it would be good to point it out, for the subset of users for whom it might be useful.
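As a rough illustration of what the workaround amounts to on the code side, the sketch below opens an anonymous temp file inside a caller-chosen directory using Python's standard tempfile module. The function name open_cache_file and the idea of passing the directory explicitly are assumptions for the example; how the gateway itself picks up its tempdir setting is not shown here.

```python
import os
import tempfile


def open_cache_file(secure_dir: str):
    """Open an anonymous temp file inside secure_dir.

    secure_dir is assumed to be a mount the operator already trusts,
    e.g. an encrypted partition or a tmpfs backed by encrypted swap.
    This function cannot verify that; it only checks the path exists.
    """
    if not os.path.isdir(secure_dir):
        raise ValueError("temp directory does not exist: %r" % (secure_dir,))
    # TemporaryFile has no visible name where the platform supports it,
    # and is removed automatically when closed.
    return tempfile.TemporaryFile(dir=secure_dir)
```

Whether the plaintext is actually protected depends entirely on what backs the directory, which is why this only helps the subset of users who can set up such a mount.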
I think option (3) is a non-starter: dealing with journaling filesystems, SSDs, etc. makes this nearly impossible in the general case. You can probably get it to work for, say, 9 in 10 users, but leaving 1 in 10 silently vulnerable is not good, and I'd think the effort would be much better spent on eliminating temp files where possible, or encrypting them where that's not feasible.
Replying to jsgf (comment:6):
Oh, right, downloads when you are doing a range-request. My favorite solution to that is #798 (improve random-access download to retrieve/decrypt less data) so that the web gateway downloads only the segments needed to satisfy your range request and just keeps them in RAM until it has satisfied you.
Replying to zooko (comment:9):
Yes, that's why I mentioned #798 in the report as possibly the best way of solving the problem ;)
Yeah, the new downloader won't touch the disk at all. It fetches exactly the segment required to satisfy the first part of the range request, delivers the plaintext, then forgets about that segment and moves on to the next one.
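The segment-at-a-time behaviour described above can be sketched roughly as follows. This is not the #798 downloader's code: segments_for_range, read_range, and the fetch_segment callback (assumed to return the decrypted bytes of one segment) are hypothetical names, and the segment size is just a parameter.

```python
def segments_for_range(offset, length, segment_size, file_size):
    """Return the segment numbers that overlap [offset, offset+length)."""
    if offset < 0 or length < 0 or segment_size <= 0:
        raise ValueError("invalid range or segment size")
    end = min(offset + length, file_size)  # clamp the range at end of file
    if offset >= end:
        return []
    first = offset // segment_size
    last = (end - 1) // segment_size
    return list(range(first, last + 1))


def read_range(fetch_segment, offset, length, segment_size, file_size):
    """Yield the plaintext for a byte range, one segment at a time.

    Only one segment is referenced at any moment, so nothing needs to
    be written to disk; each segment is dropped once its slice has
    been delivered.
    """
    end = min(offset + length, file_size)
    for segnum in segments_for_range(offset, length, segment_size, file_size):
        seg = fetch_segment(segnum)  # hypothetical: decrypted segment bytes
        seg_start = segnum * segment_size
        lo = max(offset, seg_start) - seg_start
        hi = min(end, seg_start + len(seg)) - seg_start
        yield seg[lo:hi]
```

A small range touches at most two segments, so the memory held is bounded by the segment size rather than by the file size, in contrast to the old behaviour of pulling the whole file into a temp file.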
The new SFTP implementation in #1037 uses an encrypted temp file for uploads. So that will be fixed as well in 1.7.
Replying to davidsarah:
Not "as well", because the new downloader has been deferred to 1.8.
The #798 new immutable downloader has landed, and does not touch the disk (the mutable downloader doesn't touch the disk either). The webapi interface to it uses the correct read(consumer,offset,length) interface. The upload-side webapi server will still put large (>100kB) plaintext files on disk (in an anonymous tempfile), and I don't know what the FTP/SFTP code does.
Is this ticket narrowly-scoped enough that we can now close it?
Replying to warner:
Perhaps it should be using EncryptedTemporaryFile (source:src/allmydata/util/fileutil.py@4609#L118)? That would be a new ticket, though.
Both the FTP and SFTP code only do full downloads. They sometimes use EncryptedTemporaryFiles, but don't store plaintext on disk (see changeset:05022dca36780b3b).
Yes.
Replying to davidsarah (comment:17):
This is #1176.