compression (e.g. to efficiently store sparse files) #1354
I was just re-reading the Fossil/Venti tech docs and realized that their method of omitting trailing zeroes from each block might be adapted to Tahoe-LAFS to efficiently store sparse files. (Tahoe-LAFS would do this per Tahoe-LAFS "segment" where Venti did it per Venti "block".)
This is of course a very specific form of compression and might be subsumed by more general compression instead.
On the other hand, compression -- either the specific "remove runs of zeroes" compression from Venti or more general compression -- is often a loss at this layer, since the biggest files are almost always incompressible (because they are already compressed).
Are sparse files common?
Neat trick!
No, from what I've seen, sparse files are not very common. The only things that come to mind are coredumps and database files, and I suspect that most modern (cross-platform compatible) DB files are not sparse anymore.
It shouldn't be too hard to rig up a tool to test that claim: os.walk the tree, use stat to count the number of blocks, and compare that against st_size/blocksize; if they're too far off, you've probably got a sparse file (a rough sketch follows below).

The question of compression is an interesting one. To retain our low alacrity, we'd want to compress each segment separately, which would then result in variable-sized segments, so we'd need a new share layout (with a start-of-each-block table). Compressing the whole file would let us squeeze it down further, of course, but you can't generally get random access that way. There may be some clever trick wherein we save a copy of the compressor's internal state between segments to allow both random access and good whole-file compression, but I'd be afraid of the complexity/fragility of that approach.
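A minimal sketch of that sort of checker, assuming a POSIX filesystem where st_blocks is reported in 512-byte units (the function name and the 0.5 threshold are made up for illustration, not from any existing tool):

```python
import os
import stat
import sys

def find_sparse_files(root, slack=0.5):
    """Yield (path, apparent_size, allocated_bytes) for regular files whose
    allocated blocks cover less than `slack` of their apparent size."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                continue
            allocated = st.st_blocks * 512  # bytes actually allocated on disk
            if allocated < st.st_size * slack:
                yield path, st.st_size, allocated

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, size, allocated in find_sparse_files(root):
        print("%s: size=%d allocated=%d" % (path, size, allocated))
```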
Note that this leaks information about how much of each segment is sparse. You'd have to do compression before encryption to avoid that (but then how would range requests know which segments to get?)
Hm. You could compress data a chunk at a time, watching the output size until it grew above some min-segment-size threshold, then flush the compression stream and declare end-of-segment. Then start again with the remaining data, and repeat. Now you've got a list of compressed segments and a table with the amount of plaintext that went into each one. You encrypt the table with the readcap and store it in each share. You also store (unencrypted) the table of ciphertext segment sizes. (Unlike the uncompressed case, the plaintext-segment-size table will differ significantly from the ciphertext-segment-size table.)
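A rough sketch of that scheme using Python's zlib (the names, chunk size, and 1 MB threshold are illustrative, not Tahoe-LAFS code). Each yielded segment is an independently decompressible zlib stream, and the caller can build the two tables from the yielded lengths:

```python
import zlib

def compress_into_segments(read, chunk_size=64 * 1024, min_segment=1024 * 1024):
    """Yield (plaintext_len, compressed_segment) pairs.  `read(n)` must return
    b"" at EOF (e.g. a file object's read method)."""
    comp = zlib.compressobj()
    pieces, plaintext_len = [], 0
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break
        plaintext_len += len(chunk)
        pieces.append(comp.compress(chunk))
        if sum(len(p) for p in pieces) >= min_segment:
            pieces.append(comp.flush())        # finish off this segment's stream
            yield plaintext_len, b"".join(pieces)
            comp = zlib.compressobj()          # fresh compressor for the next segment
            pieces, plaintext_len = [], 0
    tail = comp.flush()
    if plaintext_len:
        pieces.append(tail)
        yield plaintext_len, b"".join(pieces)

# The plaintext lengths form the table to encrypt with the readcap; the lengths
# of the compressed segments form the table stored in the clear, e.g.:
#   pairs = list(compress_into_segments(open("some-file", "rb").read))
#   plaintext_table = [p for p, c in pairs]
#   ciphertext_table = [len(c) for p, c in pairs]
```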
Alacrity would rise: you'd have to download the whole encrypted-segment-size table (which is O(filesize), although the multiplier is very small, something like 8 bytes per segment). There's probably a clever O(log(N)) scheme lurking in there somewhere, but I expect it'd involve adding roundtrips (you store multiple layers of offset tables: first you fetch the coarse one that tells you which parts of the next-finer-grained table you need to fetch, then the last table you fetch has actual segment offsets).
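To make the range-request question concrete, here is a hypothetical lookup a downloader could do once it has both tables: map a plaintext byte range to the compressed segments (and their byte offsets within the share's data area) that need to be fetched. Names are illustrative only.

```python
import bisect
from itertools import accumulate

def segments_for_range(plaintext_lens, ciphertext_lens, start, length):
    """Return (segment_index, ciphertext_offset, ciphertext_len) for every
    compressed segment overlapping plaintext bytes [start, start+length)."""
    assert length > 0
    plain_ends = list(accumulate(plaintext_lens))       # cumulative plaintext end offsets
    cipher_starts = [0] + list(accumulate(ciphertext_lens))  # start offset of each segment
    first = bisect.bisect_right(plain_ends, start)
    last = min(bisect.bisect_right(plain_ends, start + length - 1),
               len(plaintext_lens) - 1)
    return [(i, cipher_starts[i], ciphertext_lens[i]) for i in range(first, last + 1)]
```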
This scheme requires a compression library that either avoids deep pipelining or is willing to tell you how much more compressed output would be emitted if you did a flush() right now. I don't think most libraries have this property. You'd declare a segment to be finished as soon as you've emitted, say, 1 MB of compressed data, then tell the compressor to flush the pipeline and add whatever else it gives you to the segment. The concern is that you could wind up with a segment significantly larger than 1 MB if the pipeline is deep.
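With Python's zlib specifically, one workaround might be to flush a copy of the compressor to see how much output a flush would emit right now, without disturbing the real stream (assuming the per-chunk cost of Compress.copy() is acceptable):

```python
import zlib

def pending_flush_size(comp):
    """Bytes that comp.flush() would emit at this point, measured on a copy so
    the real compressor's state is left untouched."""
    return len(comp.copy().flush(zlib.Z_SYNC_FLUSH))
```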
#994 is about supporting compression at the WAPI layer, rather than the storage layer. That approach has the advantage that you also get compression between the gateway and HTTP client, if the HTTP client supports it.