mutable: implement MDMF #393
Reference: tahoe-lafs/trac-2024-07-25#393
Our so-called "Medium-sized Distributed Mutable Files" scheme should be implemented at some point. This will use the same protocol and format as the existing SDMF files, but with the following new features:
When we implement this, revisit the notion of 'container size' in the mutable file API; it's a bit fuzzy right now, and will become more important with the random-access writes. It might be useful to allow the user-facing API to seek relative to the end of the file, in which case a storage API that references the container size might be valuable.
I think MDMF is going to require a separate AES key per segment, since it isn't safe to reuse a whole-file AES key when replacing individual segments. That means we'll need to store a table of salts, one per segment, and compute the per-segment AES key by hashing the whole-file key (derived from the writecap) with the salt. We'll need to replace the salt with a new random value each time we overwrite a segment, and we'll need to overwrite a whole segment at a time.
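The derivation described above can be sketched as follows. This is a shape-only illustration under stated assumptions: the function name, the 128-bit salt size, and the use of a bare SHA-256 are all placeholders (Tahoe's real code uses tagged SHA-256d hashes), not the actual KDF.

```python
import hashlib
import os

SALT_SIZE = 16  # assumed 128-bit per-segment salt

def derive_segment_key(filekey: bytes, salt: bytes) -> bytes:
    # Hash the whole-file AES key together with the per-segment salt to
    # get that segment's AES key. (Illustrative only; the real code uses
    # tagged SHA-256d, not bare SHA-256.)
    return hashlib.sha256(filekey + salt).digest()[:16]

filekey = b"\x01" * 16
salt_a, salt_b = os.urandom(SALT_SIZE), os.urandom(SALT_SIZE)
# Distinct salts yield distinct per-segment keys, so overwriting one
# segment with a fresh random salt never reuses a (key, plaintext) pair:
assert derive_segment_key(filekey, salt_a) != derive_segment_key(filekey, salt_b)
```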
Tagging issues relevant to new cap protocol design.
Replying to warner:
Either that or use a per-segment IV for CTR mode.
(Don't you mean 'derived from the readcap'?)
Yes, you'd still need to do that with the per-segment IV.
Random-access writes will leak information about which parts of the file have been updated, to any attacker who sees the ciphertext before and after the update. We already have this issue for random-access reads (#879), but it seems more serious to me in the case of writes.
I posted a copy of my proposal on tahoe-dev. You can read it here.
There are a few design questions outstanding in there, and I'd like to get them dealt with ASAP, so I can hit the ground running and have a better chance of getting MDMF trunk-ready by August.
Since the existing code doesn't quite (AFAICT) know how to process multi-segment mutable files, MDMF introduces forward compatibility issues in that older versions of Tahoe-LAFS won't know how to read MDMF mutable files uploaded by newer versions of Tahoe-LAFS. When I was writing my proposal, I could think of two solutions, which I write about more thoroughly in the proposal:
I favor the first solution.
Caps, as they are now, encapsulate a combination of the authority given by the capability to its holder and some information about the resource designated by the cap (e.g., is it a file? is it a directory? can it be modified?). The difference between MDMF and SDMF is (unless I'm missing something) basically an implementation detail, and not really any of those things. A protocol version number seems like a much better way of signaling a change in implementation than a different cap format.
(this assumes that the existing mutable file caps will be sufficient to ensure us of the authenticity of mutable files using MDMF; I think that they are, but I may have missed something)
The first hurdle here is to decide what MDMF files will look like at rest on storage servers; we need to do this before we can start writing or reading MDMF files.
Storage servers provide a content-independent container for storing mutable share files; this means that the MutableShareFile in mutable.py should not need to be changed. Similarly, the idea of test vectors, write vectors, and read vectors to detect and react to uncoordinated writes applies to MDMF just as it did to SDMF. Given this, I think that we can leave storage servers alone.
The code that describes the SDMF layout is in mutable/layout.py (source:src/allmydata/mutable/layout.py). As implemented, shares are preceded by this header (byte-indexed):
which is followed by a verification key, a signature of everything before the offsets in the header, the share hash chain, the block hash tree, the share data, and the encrypted private key.
Unless there's something I'm missing, we can ditch the per-file salt in favor of per-segment salts. Like the per-file salt, each per-segment salt will be 128 bits, unless there's a good argument for something different. I think it makes sense to designate a separate part of the share for the per-segment salts -- we could also store them interleaved into the shares, but that makes access patterns like:
(where o is a dictionary of offsets and blocksize is the size of blocks generated by the encoder, given the encoding parameters for the upload)
harder and less clear to read, since you would also need to take into account the size of the salt in addressing. The disadvantage of this approach is that we need to add a new offset to the header, but I don't see anything terribly wrong with that.
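The two addressing schemes being compared can be sketched like this (the offset-dictionary key and the 128-bit salt size are assumptions for illustration, not the real layout.py names):

```python
SALT_SIZE = 16  # assumed 128-bit salts

def block_offset(o, blocksize, segnum):
    # With salts in their own region, block addressing stays simple:
    return o['share_data'] + blocksize * segnum

def interleaved_block_offset(o, blocksize, segnum):
    # If salts were instead interleaved in front of each block, every
    # block access would also have to account for the salts before it:
    return o['share_data'] + (SALT_SIZE + blocksize) * segnum + SALT_SIZE

o = {'share_data': 1000}
assert block_offset(o, 2048, 3) == 1000 + 2048 * 3
assert interleaved_block_offset(o, 2048, 3) == 1000 + (16 + 2048) * 3 + 16
```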
Then the MDMF on-server file format would have a header like this (byte-indexed again):
This would be followed by the verification key, a signature of everything before the offsets in the header, the share hash chain, the block hash tree, the per-segment salts, the share data, and the encrypted private key.
The per-segment salts would be written out with no separators, like the shares are, and you'd access them like:
(again where o is a dictionary of offsets)
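A possible sketch of that access pattern (again, the 'salts' offset key and salt size are placeholders):

```python
SALT_SIZE = 16  # assumed 128-bit salts

def salt_offset(o, segnum):
    # Per-segment salts are written out contiguously with no separators,
    # so the salt for segment `segnum` occupies a fixed-size slot in the
    # salt region.
    return o['salts'] + SALT_SIZE * segnum

o = {'salts': 500}
assert salt_offset(o, 0) == 500
assert salt_offset(o, 7) == 500 + 16 * 7
```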
Like the current per-file salt, they could be randomly generated, and updated on every write.
Thoughts? Am I missing anything here?
If we do the above, the salts aren't covered by the signature (either directly or otherwise), and there isn't even an unsigned way to detect whether the salts are corrupted.
A really simple way to allow readers to detect corruption would be to hash the concatenation of all of the salts, then stick this in the signed header. We could update a hasher as we processed segments without consuming more than constant memory. This means that to validate the authenticity of one salt, we'd need to download all of them, but salts will be tiny when compared to the data that they key, and we only need to download and validate them once for the whole file, so maybe that's not as big of a deal.
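The constant-memory hashing described above is straightforward to sketch; the function name is hypothetical and a plain SHA-256 stands in for whatever tagged hash would actually be used:

```python
import hashlib

def salts_hash(salts):
    # Feed each per-segment salt into a single incremental hasher, so
    # the writer never holds more than one salt in memory; the digest
    # goes into the signed header.
    h = hashlib.sha256()
    for salt in salts:
        h.update(salt)
    return h.digest()

salts = [bytes([i]) * 16 for i in range(6)]
# Incremental hashing matches hashing the full concatenation:
assert salts_hash(salts) == hashlib.sha256(b"".join(salts)).digest()
```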
Then the new design would look something like:
"signed part" in the diagram above should extend to incorporate everything up to and including the data length, since that's how it is implemented. Also, after the header comes the verification key, signature, share hash chain (not signature hash chain; that's me not proofreading well enough), block hash tree, AES salts, share data, and private key, as before.
Attachment 393progress.dpatch (113466 bytes) added
progress on MDMF
I've attached my progress so far, which includes proxies to write and read the MDMF data format, and tests for those proxies. If you're interested in seeing what the format evolved into, check out line 1375 of the patch.
Next up: reworking the mutable uploader and downloader to use these proxies.
393status2.dpatch contains a snapshot of my work so far.
Attachment 393status2.dpatch (199182 bytes) added
393status3.dpatch contains more work on the servermap. The servermap now passes all of the existing tests (and all of my new tests) with the exception of four tests that test how Tahoe-LAFS reacts to hung servers. Once I have that worked out, I'll start work on the downloader.
Attachment 393status3.dpatch (249417 bytes) added
393status4.dpatch contains a revised servermap that passes the hung server tests. After checking out test coverage reports, I'm going to start working on the downloader next.
Attachment 393status4.dpatch (249883 bytes) added
393status5.dpatch contains a mostly-complete segmented downloader. It has only a few major regressions, and lacks tests. The new downloader is much prettier and more complete than the new uploader, which is much more of a hack and which I will probably rewrite after I'm done with the downloader.
Attachment 393status5.dpatch (346720 bytes) added
393status6.dpatch contains an improved segmented downloader. It passes all but 3 regression tests, and is essentially complete aside from that. It still needs tests, which I will write tomorrow.
Attachment 393status6.dpatch (362172 bytes) added
as we just discussed briefly on IRC, I'd like to avoid O(N) transfers, so I'd rather go with a merkle structure of salts rather than a flat hash. I think the best approach would be to put the salt in front of each encrypted block, and compute the merkle tree over the (salt+ciphertext) pairs. The one block-tree remains the same size, but is computed over slightly different data. Downloading a single block requires downloading the (salt+ciphertext) for that one block, plus the merkle hash chain to validate it, plus the (constant-length) top-level UEB-like structure (which includes the root hash of the merkle chain, and the version number), and the signature of that structure.
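The structure being proposed here can be sketched as follows. This is an illustration only: the helper names are invented, and Tahoe's real hash trees use tagged hashes and pad the leaf count to a power of two, unlike this naive reduction.

```python
import hashlib

def _h(data):
    return hashlib.sha256(data).digest()

def leaf_hashes(salts, blocks):
    # Each leaf of the block hash tree covers a (salt + ciphertext)
    # pair, so validating one block requires only that block, its salt,
    # and a logarithmic-length hash chain -- not all N salts.
    return [_h(salt + block) for salt, block in zip(salts, blocks)]

def merkle_root(layer):
    # Naive pairwise reduction, duplicating the last node on odd layers.
    while len(layer) > 1:
        if len(layer) % 2:
            layer = layer + [layer[-1]]
        layer = [_h(layer[i] + layer[i + 1])
                 for i in range(0, len(layer), 2)]
    return layer[0]

salts = [bytes([i]) * 16 for i in range(4)]
blocks = [bytes([i]) * 64 for i in range(4)]
root = merkle_root(leaf_hashes(salts, blocks))
# Corrupting one salt changes that leaf, and therefore the root:
bad = merkle_root(leaf_hashes([b"\xff" * 16] + salts[1:], blocks))
assert root != bad
```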
Using version=2 in the share to indicate MDMF sounds fine. readcaps/writecaps will remain the same as for SDMF: clients will need to fetch a share before they can discover whether the file is SDMF or MDMF.
One thing to keep in mind, which I learned while writing the #798 new immutable downloader, is to try and structure the fields so that downloaders can fetch everything in a single request. This means avoiding as many round-trip-inducing decisions as possible. Having an offset table makes the share nicely flexible, but it also means that you have to fetch it (and burn a roundtrip) before you can precisely fetch anything else. The new-downloader makes a best-guess about the offset table, and optimistically fetches both the table and the data it points to at the same time, with fallbacks in case it guessed wrong.
One down side of moving anything in the share format, or of increasing the size of the data blocks, is that these sorts of guesses are more likely to be wrong, or that there will be less overlap between the wrong guess and the real location of the data. The #798 downloader unconditionally fetches the first few kilobytes of the file, without being too precise about what it grabs, because that can save a roundtrip in some cases of bad guesses.
I'm not suggesting that we do a lot of changes to the mutable downloader, but it might be worth walking through what a #798-style mutable downloader would look like, and see how many roundtrips are required, and if there's something simple we can do to the share format now, it might give us room to improve performance in the future.
393status7.dpatch contains minor modifications to the downloader that cause the last two seemingly related tests to pass. For whatever reason, tests that test the bucket counter and lease crawler sporadically fail -- I didn't touch that part of the code, and I'm not sure why the parts of the code that I did touch would cause those failures, but I guess that's software. I'll look at that after I get some tests written. Also, the mutable slot proxy now allows callers to tell it how to batch requests. This is helpful in the case where we want to fetch a block, salt, and the parts of the block hash and share hash trees necessary to validate those things from the remote storage server, since it allows for us to do that in one remote read operation with four read vectors. There is also a test for this behavior.
Attachment 393status7.dpatch (376146 bytes) added
Replying to warner:
I like that; it's much nicer than having a separate salt block and a merkle tree over just the salt block.
I'm using version=1, since SDMF is version 0, IIRC.
Looking at the #798 downloader and figuring out how the #393 downloader could borrow some of its ideas would definitely be time well spent.
393status8.dpatch contains improvements to code coverage, and tests for MDMF files. Aside from the need to extend tests for various sorts of corruption to specifically test MDMF files, the downloader is done.
Note that I unrecorded and re-recorded my changes into a more coherent set of patches. If you've applied any of my earlier bundles, you will want to apply this bundle in a new tree to avoid issues caused by that.
On my system, I have noticed intermittent test failures in a few places that shouldn't be failing as a result of my modifications -- notably the bucket and lease crawlers (allmydata.test.test_storage.BucketCrawler and LeaseCrawler), and, more recently, the immutable test code (allmydata.test.test_immutable.Test). These typically fail with DirtyReactorErrors. I'm not sure why anything I've done would cause these to fail, since, to my knowledge, none of the code I have modified relates to any of these areas, and I can't reliably duplicate the failures. Setting t.i.b.DelayedCall.debug = True doesn't reveal anything of interest. If you do happen to apply these patches, and do happen to run the tests, and notice that these tests either fail or don't fail for you, please post a note here -- I would appreciate it. There should be no failing tests at this point.
Attachment 393status8.dpatch (265988 bytes) added
393status9.dpatch adds the tests that I talked about yesterday -- there is now 98% test coverage in mutable/retrieve.py, with the only uncovered statements in the Retrieve class being related to looking for keywords in logging statements. Over the weekend, I'm going to change the salts to be stored alongside the share data, then I'll get to work finishing up the uploader, and looking at the checker/repairer code to see what will be involved in changing that.
Attachment 393status9.dpatch (281580 bytes) added
393status10.dpatch modifies the salts to be stored alongside the share data and integrity checked as part of the block hash tree, as suggested by Brian. I will be moving on to the uploader, checker, and repairer logic come Monday.
Attachment 393status10.dpatch (269591 bytes) added
393status11.dpatch modifies the checker/verifier and repairer code to work with MDMF files. This turned out to be very straightforward -- I tweaked the new downloader to know how to perform the verification, which eliminated a bunch of duplicate code from the verifier itself. I also added tests to ensure that checking, verifying, and repairing an MDMF file works as expected. I'll start polishing the uploader tomorrow.
Attachment 393status11.dpatch (298778 bytes) added
393status13.dpatch cleans up the uploader. It is now fairly free of special-case logic (that is, SDMF versus MDMF), and is easier to read and shorter as a result. There are still a couple of test failures in checking, though.
(393status12.dpatch got skipped because it was a bunch of very minor changes)
(also, kevan_ == kevan -- Trac has convinced itself that the latter account needs email address verification, but fails to send mail to allow that)
Attachment 393status13.dpatch (358752 bytes) added
393status14.dpatch revises the cleaned-up uploader to pass all of the tests, and makes some minor code coverage improvements. It needs more tests, but I'm going to wait until I get started on IVersion before I write them; otherwise, I'll have to rewrite them in a week anyway.
Attachment 393status14.dpatch (368324 bytes) added
393status15.dpatch lays the groundwork for not converting every mutable file uploaded to the web gateway into a string before processing it. This was a major memory pain point with SDMF, and is something that MDMF will address.
Attachment 393status15.dpatch (383447 bytes) added
393status16.dpatch alters various mutable filenode APIs to have callers use MutableFileHandles and MutableData objects instead of strings when uploading things to the grid. This allows the mutable file publishing code to use memory-saving measures that Nevow performs when dealing with a lot of data (writing it to a tempfile, if I understand correctly), and is modeled on the FileHandle and Data classes that immutable files use for the same purpose. Almost all tests pass right now; the SFTP frontend tests are failing, as I need to go in there and see what (if anything) needs to be done to fix that. After that, I'll address memory concerns that crop up when downloading huge files by cloning the download target pattern used by immutable files -- as a side effect, this will give users streaming downloads of mutable files.
Attachment 393status16.dpatch (441278 bytes) added
393status17.dpatch alters the SFTP frontend to work with yesterday's API changes. I'm going to start working on the streaming download later today/tomorrow, and then I'll be releasing a beta with the memory consumption pain point fixed. In testing, uploading a large (~700MiB) video file uses about the same amount of memory with the mutable upload code as it does with the immutable upload code, which is a good thing.
Attachment 393status17.dpatch (444604 bytes) added
Replying to kevan:
Because self.consumer.get_file() always returns the same file object that is set in the consumer's __init__, the fact that it is called synchronously (rather than in a callback for d2) does not cause a problem -- but that's dependent on the implementation of OverwriteableFileConsumer and is not immediately obvious. I would suggest changing it to

Cool -- and nice that this improvement should also be inherited by the SFTP frontend.

Will do -- thanks for that. :-)
393status18.dpatch addresses davidsarahs' comment, and starts laying the groundwork for the download changes, which will be based on the ideas and suggestions in #993 (I will post feedback to that ticket as I get farther along). The changes I made today make immutable files (both literal and CHK) conform to the interface changes; I've also added tests for the modifications. Next will be changing mutable files to also conform to the new interfaces.
Attachment 393status18.dpatch (462736 bytes) added
393status18.dpatch addresses davidsarahs' comment ...
(We like your placement of the apostrophe ;-)
MutableFileHandle's close method closes its underlying file, whereas FileHandle (source:src/allmydata/immutable/upload.py@4313#L1298) doesn't. Is this difference intentional?

MutableFileHandle's read method (attachment:393status18.dpatch#L8651) says that:

but it just delegates to the underlying file's read method, which might not uphold this contract if the underlying file is not a regular file. For example, the Python docs say that read for a stdlib file object "is simply a wrapper for the underlying fread() C function", so it will return partial reads for stream-like files. (FileHandle also has this problem, if it is one.)

attachment:393status18.dpatch#L8640: The os.SEEK_* constants were added in Python 2.5, but we still support 2.4.x. Use

Review needed for GSoC mid-term evaluations.
Brian pointed out on IRC today that writing a mutable file segment-by-segment leads to regressions relative to SDMF behavior. If a writer dies in the middle of a mutable file update (i.e.: before writing all of the blocks), we can have a situation where there are no recoverable versions of the mutable file on the grid. In SDMF, each share is written all at once, so we have the property that each storage server with a share for a particular mutable file has a complete share (barring disk errors, network problems, etc), which makes SDMF more resilient to writer death than MDMF as currently implemented.
Brian suggested that this could be resolved by making MDMF write shares all at once, as SDMF does. This preserves the semantics described above from SDMF, but means that modifying a mutable file would consume memory proportional to the size of the modification.
Most of the user-level tools (at least the ones I'm familiar with) only modify mutable files by completely overwriting them, so many/most user-level mutable file operations will still consume memory proportional to the size of the file, as with SDMF. We can still use (unless I'm missing another regression) the block-by-block downloader that I'm working on now, so MDMF would give users streaming and memory-friendly downloads of large files, just not uploads.
(part of MDMF is writing webapi functions to allow outside code to easily modify portions of mutable files, which would be easier on memory than modifying all of the mutable file, but many command-line tools have no easy way of taking advantage of such functionality)
I'd really like to keep the memory efficient uploads, but I can't see a way to do it without that regression. Any ideas, audience at home?
393status19.dpatch includes work on IVersion (ticket #993). IVersion is about 75% complete. Existing tests pass, though I need to write some additional tests to test more specific aspects that the existing tests miss. I also need to make some improvements to POLA as applied to MutableFileVersion. Finally, I need to implement the read call. I'm hoping to have all of that done by tomorrow, along with davidsarahs' suggestions.
Attachment 393status19.dpatch (494428 bytes) added
no patch today, but I have a rough and mostly complete take on IVersion implemented and working, and have the webapi and sftpd happily using the uniform download interface. As a pleasant side effect of this, downloads of MDMF mutable files through the webapi are now streaming, and at least fast enough to stream a FLAC file from shares stored on a remote storage server so that VLC can play it. They also use constant memory space -- a benefit which, unlike the constant memory space uploads, will probably stick around. I need to do some cleanup, write some tests, and fix some odds and ends tomorrow, then I'll update this ticket with a fresh patch, and post feedback to #993.
393status20.dpatch contributes a working and reasonably well-tested IVersion implementation, a streaming downloader, and some other things. I still have some cleanup to do and some tests to write, but it seems to work without issue. In this patch, I've altered the mutable publisher to upload files larger than 128 KiB as MDMF. For the moment, this will use constant memory, though that will go away soon as I change the uploader to write MDMF files in a single write. Random-access reads are implemented, but not used by anything yet. The webapi knows how to use the streaming downloader, though.
There are some failing tests in the webapi, because I removed some restrictions in the test code that incorrectly raised a FileTooLargeError when a test attempted to upload a mutable file larger than 3.5 MiB (an SDMF restriction that was eliminated a while ago). There are also some failing tests in test_mutable.py, because I need to tweak the Version class to publish multiple versions of files to the grid.

davidsarah: Could you look at my changes to sftpd.py, specifically at the area where you left a reference to #993 as a TODO, and see if you agree with them? I think I understand self.async and think that I am not appropriating it for something bad, but I may be wrong.

Next up: More cleanup, then partial updating of MDMF files as they sit on the grid.
Attachment 393status20.dpatch (572762 bytes) added
Replying to kevan:
They look correct as far as I can see from the darcs patch -- I'll need to apply them later to check more thoroughly. Very minor issue: the logging for _read should probably be at the start of the callback.
(people who have been diligent in following this ticket -- if there are any such people -- will be disappointed to know that I will be at a security training seminar for the week of July 19th through July 23rd. I'll resume progress on this after that)
I've been thinking about what we want the partial-file upload and modification interface to look like. The one that I'm leaning toward looks something like this:
(iwritable.diff is this, essentially)
There were suggestions from (IIRC) Brian that we should make the interface encapsulate some notion of segments -- like def set_segments({1: "some data", 2: "some other data"}). I like the idea of having something like this, but I worry that an abstraction that makes callers worry about segments is either leaky or too low-level, while treating modifications as data to be appended at an offset in the file is more intuitive and less leaky. Further, it is much easier to map (offset, data) semantics to SDMF in a meaningful and useful way. Thoughts?

We may also want a way to delete segments from the middle of an MDMF file, which that function doesn't yet provide.
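The proposed (offset, data) semantics can be modeled on an in-memory byte string. This is a sketch of the semantics only, not the proposed interface itself; whether a write past EOF should zero-fill or raise an error is an open design question, and zero-fill is merely assumed here.

```python
def apply_update(contents: bytes, data: bytes, offset: int) -> bytes:
    # Model of the proposed (data, offset) write: overwrite in place,
    # extending the file when the write runs past the current end.
    # (Zero-filling a gap past EOF is an assumption, not settled design.)
    if offset > len(contents):
        contents = contents + b"\x00" * (offset - len(contents))
    return contents[:offset] + data + contents[offset + len(data):]

assert apply_update(b"hello world", b"HELLO", 0) == b"HELLO world"
assert apply_update(b"hello", b", world", 5) == b"hello, world"
```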
Attachment iwritable.diff.txt (1677 bytes) added
Unsetting review-needed. This patch is not ready to be reviewed and then applied to trunk. However, it would probably be a good help and encouragement to Kevan if anyone would look at his code, docs, or comments and give him your thoughts. :-)
No patch today, but I've been working out how a partial file update will work.
The mutable filenode code, if we use a (data, offset) change representation like the one I suggest above, will be responsible for mapping the new data into segments, and then replacing those segments. For segments which can be replaced in whole, this isn't difficult -- we just upload new segments in place of old segments, and then update the integrity checks to match the new data. But it is likely that the new data will not start or end on precise segment boundaries. In these cases, we will have to fetch two segments of data in order to do an update, so we can pad the existing data appropriately. In addition, we'll need parts of the block hash trees for the shares that we're updating, so that they can be regenerated to reflect the new content. At the moment, I'm thinking about fetching this information (boundary segments + block hash tree information) during the server map update step, then presenting it to the uploader. This yields a nice separation between parts of the code that upload files and parts of the code that download files. I haven't quite worked out all the details yet, but I plan to start working on it tomorrow and see what more ideas come from that.
I'm hoping to have the low level update functionality done by the end of this week, along with the restoration of the single-write uploads that we have with SDMF. After that, I can work out how we tie all of that into the WebAPI so client code can use it.
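The boundary-segment reasoning above reduces to a little arithmetic. This is a sketch of the planning step, not the actual servermap code; the function name and return shape are invented for illustration.

```python
def update_plan(offset, length, segment_size):
    # Which segments does an (offset, length) update touch, and do the
    # first/last ones need to be fetched so partial segments can be
    # merged with existing data before re-encoding?
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    fetch_first = offset % segment_size != 0
    fetch_last = (offset + length) % segment_size != 0
    return first, last, fetch_first, fetch_last

# A 20-byte write at offset 10 into 16-byte segments spans segments 0-1
# and is ragged on both ends, so both boundary segments must be fetched:
assert update_plan(10, 20, 16) == (0, 1, True, True)
# A perfectly aligned write needs no boundary fetches at all:
assert update_plan(16, 32, 16) == (1, 2, False, False)
```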
393status21.dpatch includes some initial work on the modifications to the servermap update that I described yesterday. It also changes MDMF publishing to write each share all at once, as SDMF currently does (and includes a test to verify that this is indeed the case).
Attachment 393status21.dpatch (633657 bytes) added
393status22.dpatch adds a mostly complete but subject to change implementation of the servermap update modifications I talked about earlier. It also fixes a sporadic test failure that I inadvertently introduced with yesterday's changes. Tomorrow, I'll work on altering the uploader to use the information provided by the servermap to perform in-place file updates.
Attachment 393status22.dpatch (641761 bytes) added
393status23.dpatch implements most of the rest of the changes needed for partial file update. Some of the knobs still need to be connected, and it doesn't handle the edge case where enough segments are being added to push the segment count over a power-of-two boundary, but it should hopefully work aside from that. To come are thorough tests, and edge case coverage. Then I'll hook it up to the WebAPI somehow. I'm really hoping I can have something more or less complete by next friday, so I can spend the last week of the official GSoC period reviewing my older code and tying up loose ends. We'll see how that goes, though.
Attachment 393status23.dpatch (665974 bytes) added
393status24.dpatch includes tests for the update behavior (none of which pass at the moment) and some modifications to fix minor bugs that were causing unrelated test failures.
Attachment 393status24.dpatch (679723 bytes) added
393status25.dpatch fixes many bugs in the update behavior; 3 of the 5 tests I wrote yesterday pass now.
Attachment 393status25.dpatch (687437 bytes) added
393status26.dpatch fixes more bugs in the update behavior; all of the tests I wrote on Monday now pass, though I'm going to write more tests tomorrow to see if some edge cases that have been bothering me are going to be problematic. After that, it'll be time to figure out a way to integrate update into the rest of Tahoe-LAFS; into the WebAPI, at least.
It also occurs to me that we could get rid of the re-encoding update case (where we need to re-encode and upload the file because the block hash tree grows, shifting everything else forward into where the block and salt data was before the update) by juggling the layout of MDMF files. If we put the block and salt data after the offsets and signature but before the integrity data, then we no longer have to re-upload the file when the block hash tree grows larger than it was originally, since there wouldn't be any share data to be upset beyond the block hash tree. The disadvantage of this approach is that reading the first 1000 or so bytes of an MDMF file will have a smaller chance of fetching the encrypted private key and verification key, which would make the servermap update step use more roundtrips, but this could be addressed by putting those (and whatever else the servermap update step is likely to want) before the blocks and salts. We'd probably be just fine if we stuck only the block and share hash trees after the block and salt data, since that gets re-written on an update anyway. Then we'd have O(change_size) updates in general without any real regression with what is there now.
If testing goes well tomorrow, I'll start working on that.
Attachment 393status26.dpatch (693869 bytes) added
So it turns out that the files are already implemented in a way that allows power-of-two updates like I described above. 393status27.dpatch fixes the update behavior to take advantage of this, includes some new tests, and fixes some bugs revealed by those new tests. The mutable filenode code can now do O(update size) updates to MDMF mutable files in general.
Attachment 393status27.dpatch (695966 bytes) added
393status28.dpatch hooks the webapi up to the update behavior by adding an optional offset parameter to PUT requests to a filecap. If the filecap is mutable, it will write the content of the HTTP request into the file at the given offset. I'm not sure if this is the best way to go about this, and I'm not sure if it makes sense to add a corresponding option to the POST handler (which I think is used by the WUI, which might not have much use for the ability to do this). It also adds tests for these behaviors.
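A client could target that PUT endpoint along these lines. The gateway address and cap string below are placeholders, and the URL shape is an assumption based on the description above (a filecap under /uri/ plus the new offset query parameter), not confirmed webapi documentation.

```python
from urllib.parse import quote, urlencode

def update_url(gateway, filecap, offset):
    # Build the PUT target for an in-place mutable-file update; the
    # `offset` query parameter is the one this patch adds. The body of
    # the PUT would be written into the file starting at `offset`.
    return "%s/uri/%s?%s" % (gateway, quote(filecap, safe=":"),
                             urlencode({"offset": offset}))

url = update_url("http://127.0.0.1:3456", "URI:SSK:placeholder", 65536)
assert url.endswith("offset=65536")
```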
I'm going to spend next week tidying up the code and polishing up some odds and ends. Before I start working on that, I'll re-record my patchset into something more coherent for review purposes. Once I've done that, I'll set the review-needed keyword, so people can hopefully review my changes. I also need to sync these patches with the trunk so that they apply cleanly.
Attachment 393status28.dpatch (702345 bytes) added
393status29.dpatch fixes all of the failing tests, so all unit tests should pass now. It syncs up my changes with trunk. It also represents a fairly substantial reorganization of the patch set, which will (hopefully!) make it a lot easier to review. I'm going to be reviewing it myself and fixing a few rough edges this week, but I'm going to set the review-needed keyword so that others can start reviewing it too.
Rough areas that I know about: `_loop` in the mutable file downloader isn't hooked up to anything. I took it out of the path of execution when we were still doing multiple-write uploads, but it makes sense to use it again now that we're back to single-write uploads, as in SDMF, since it will allow us to place more shares than we would otherwise in certain circumstances. I'm going to hook it back up tomorrow.
Attachment 393status29.dpatch (523537 bytes) added
Replying to kevan:
We want to be able to test uploading of MDMF files, and enable their use on grids where all clients are known to be of a version that understands them, without breaking backward compatibility for grids with a mix of client versions.
I can think of several ways to do that:
`tahoe.cfg`
`tahoe.cfg`
Option 1 is okay for testing, but then there's no obvious way to configure the WUI and CLI (or other frontends) to say that you only have newer clients and so want to use MDMF. That problem is solved by either 2 or 3; the choice between them depends on whether we want to be able to specify this on a per-file basis.
393status30.dpatch deals with a few things. `_loop` turned out not to do anything that wasn't done elsewhere (which explains why the tests passed with it removed), so I just removed it. Like the patch yesterday, it should apply cleanly and the tests should run cleanly (i.e., they should pass).
I'll be working on the compatibility issues tomorrow.
Attachment 393status30.dpatch (528266 bytes) added
393status31.dpatch introduces the `_turn_barrier`-style fix of #237 into the deferred chains in the publisher, downloader, and servermap updater to avoid running into recursion limit errors. It also removes the size-based MDMF vs. SDMF criterion, and lays the groundwork for hooking that up to the webapi instead, something I'll be looking at tomorrow. I like option 3 in davidsarah's comment, since it is fairly simple but also flexible. I'm hoping that I'll have time to alter the WUI and the CLI to support that feature in an intelligent way after I make that change, but we'll see.
Attachment 393status31.dpatch (530707 bytes) added
393status32.dpatch teaches the webapi how to upload new files as MDMF and SDMF, based on a new `mutable-type` parameter. Thinking about it, I'm fairly sure that I don't like the way I chose to do that for the POST verb, so I'll probably change that tomorrow. I also introduced a `mutable.format` configuration knob in `tahoe.cfg`, which can be set to `sdmf` or `mdmf` and configures the default behavior of new mutable file creation as suggested by davidsarah in comment:58. Once I rework the POST verb, or convince myself that the implementation won't make the WUI-side part of this as clunky as I think it will now, I'll start working on hooking up the WUI and the CLI to this behavior. I'm really hoping to have that done tomorrow. I'll then work on documentation over the weekend.
Attachment 393status32.dpatch (549363 bytes) added
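The knob described here would presumably look something like this in `tahoe.cfg` (the section placement is my assumption from the option name; the comment above only names the knob and its two values):

```ini
[client]
# Default format for newly-created mutable files: "sdmf" for
# compatibility with older clients, or "mdmf" for the new format.
mutable.format = mdmf
```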
Kevan: by the way you are doing a great job of keeping us updated on the progress on this work. Thank you very much!
Replying to zooko:
I'm glad you find the obsessive updates helpful. :-) Talking about what I do and posting frequent updates helps me make the most of my time, which is why I try to update this ticket frequently.
393status33.dpatch hooks the WUI and the CLI up to the webapi behavior that I worked on earlier. You can now upload MDMF mutable files on the root page of the WUI, the directory page of the WUI, and using the `tahoe put` command; the first two of those should be pretty straightforward despite my brilliant web design skills, while the third is documented in `tahoe put --help`. On my to-do list for the weekend are documentation, code review, making it so that the default from `tahoe.cfg` is also the default on the WUI MDMF radio buttons, and making sure that I haven't introduced any regressions in the JSON representation of things from the webapi.
Attachment 393status33.dpatch (557631 bytes) added
393status34.dpatch documents the new webapi functions and the new `tahoe.cfg` parameter, fixes a few annoyances in the JSON representation of things, improves code coverage, improves the WUI as I wanted to yesterday, and adds some other tests that I wanted to add. I need to write up a better spec of MDMF at some point, need to distill my work on #993 into a coherent comment on that ticket, and need to see what other tickets this patch may have solved, but there aren't any major TODOs that I'm aware of left in the code, so it should be pretty safe to test.
Attachment 393status34.dpatch (568708 bytes) added
393status35.dpatch fixes some little things revealed by pyflakes, and removes some dead code. It is functionally equivalent to the previous patch.
Attachment 393status35.dpatch (573586 bytes) added
I'm on vacation (and reconditioning my new old car to not break down on said vacation) until early September, so if you leave a review comment, I probably won't get to it until then. :-)
#393 now has a branch; you can browse it through Trac, or check it out with darcs:
I've pushed a slightly updated version of the 393status35.dpatch from this ticket to that branch. It is functionally identical, except that it fixes a few places where the original patchset had bitrotted.
I'm bringing this up-to-date for trunk and (finally!) reviewing it. Amazing work! I'm building up a list of minor cleanups that can wait until after we land it, but one item came up that I wanted to ask about.
The big new webapi method is a PUT/POST operation to a mutable file that includes an `offset=X` argument, allowing you to modify the middle of an existing mutable file. But it looks like negative offsets are used to mean "do the old-style replace-the-whole-file behavior". Should we reserve negative offsets for something more useful in the future (by having them raise an error for now)? I'm thinking that it'd be nice to have an easy API to append to a mutable file, and right now it seems like the only way to accomplish that is to first do a `t=json` query to learn the size of the file, then do a PUT with `offset=size`. I can imagine wanting to express an append with `offset=-1` or `offset=append` or something like that, and I don't want to teach webapi users that using `offset=-X` means "replace", making it unsafe for us to reclaim that portion of the argument space later. Thoughts?
Another issue: in `MutableFileVersion._modify_once`, the old code repeated the servermap update step at the beginning of each loop, whereas the new code appears to rely upon a mapupdate being done earlier (and doesn't repeat it). The original reason for re-updating was to handle the UCWE-triggered retry case: if we're racing with somebody else to modify the file, and detected contention, then we need to update our servermap to properly update on top of what somebody else finished writing. If we don't update our servermap, we'll be using a stale one, and we'll probably keep hitting UCWE forever (or until the retry counter hits the limit).
There might be some other way your code deals with this that I'm not seeing, though. For example, in some sense, if we observe UCWE, then we should really create a new `MutableFileVersion` instance for the new state of the file, and then delegate to `.modify` on it, except that we need to impose the retry limit (maybe add a `retries_permitted=` argument to `.modify`, and when we delegate, we pass in a value that's one lower than the one we were given? limited recursion?).
Attachment 393status36.dpatch (573738 bytes) added
updated to current trunk, conflicts and test failures fixed
I got about halfway through the review this weekend. Nice work! I've uploaded the merge work that I did; the -36 dpatch should apply cleanly to trunk and pass all tests. I have a bunch of tiny cleanup issues that can wait until after we land this. Apart from the two things mentioned above, I haven't seen any major blockers. Really, this is an excellent patch.
I can't quite figure out how the MDMF salts are handled, though. I can see that they're stored right in front of each data block, but the code (at least layout.py, and the parts of publish.py that I've read so far) doesn't make it clear how they're validated. There's talk of a "salt hash tree", but I can't find any code that manages it. Any hints?
Replying to warner:
They're validated alongside the block text that they correspond to. The leaves of the block hash tree are of the form `hash(salt + block)`. See around line 854 of mutable/retrieve.py to see that in action.
(The idea for that is due to your comment:21. Before that, there was a salt hash tree. I saw some references to it when I was looking at the code to answer your question, which are definitely confusing and should be changed.)
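In other words, each per-segment salt is authenticated together with its ciphertext block. A minimal sketch of that leaf computation, using plain SHA-256 (Tahoe's real `hashutil` uses tagged SHA-256d, so these digests won't match real shares):

```python
from hashlib import sha256

def block_hash_leaf(salt, block):
    # Leaf of the MDMF block hash tree: H(salt + block). Hashing the
    # per-segment salt together with the ciphertext block binds each
    # salt to its block, so no separate salt hash tree is needed.
    return sha256(salt + block).digest()

# Overwriting a segment means choosing a fresh random salt, writing
# salt+block, and recomputing just this leaf (and the path above it).
old_leaf = block_hash_leaf(b"\x00" * 16, b"ciphertext block")
new_leaf = block_hash_leaf(b"\x01" * 16, b"ciphertext block")
assert old_leaf != new_leaf  # a new salt changes the leaf hash
assert len(old_leaf) == 32
```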
Replying to warner:
Agreed; we shouldn't deny ourselves that part of the argument space, especially not for the sake of making the fairly unintuitive "`offset=-1` means replace the whole thing" behavior work. How about `offset=replace` for the current whole-file replacement behavior, `offset >= 0` to replace part of the file, and `offset < 0` throwing an exception for now? We'll make `offset=replace` the default (i.e., if the caller doesn't specify their own offset) so that callers get the whole-file replacement behavior that they're accustomed to.
Ah, ok, so the block-hash-tree is really a (salt+block)-hash-tree? That sounds fine. Yeah, if you could scan through for references to block-hash-tree or salt-hash-tree and make everything consistent, that'll help future readers. Maybe use `saltandblock_hash_tree` for an attribute name. That `offset=` behavior sounds great!
Replying to [kevan]comment:74:
Why have two ways to express that? I'd suggest just requiring the offset parameter to be omitted to get whole-file replacement.
Replying to warner:
Good catch. My code doesn't do anything tricky there, so it needs the servermap update step. I'll post a patch that includes that modification.
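The retry shape discussed above (a fresh servermap update before every attempt, with a bounded retry count) can be sketched roughly like this. The class and method names here are hypothetical stand-ins, not the actual `MutableFileVersion` API:

```python
class UncoordinatedWriteError(Exception):
    """Stand-in for Tahoe's UncoordinatedWriteError (UCWE)."""

def modify_with_retries(node, modifier, retries_permitted=3):
    # Refresh the servermap before *every* attempt, so that a retry
    # triggered by UCWE writes on top of whatever the competing
    # writer managed to publish, instead of reusing a stale map.
    while True:
        servermap = node.update_servermap()   # hypothetical helper
        try:
            return node.apply_modification(servermap, modifier)
        except UncoordinatedWriteError:
            if retries_permitted <= 0:
                raise
            retries_permitted -= 1            # "limited recursion", unrolled

# A tiny fake node: the first two writes collide, the third succeeds.
class FakeNode:
    def __init__(self, collisions):
        self.collisions = collisions
        self.mapupdates = 0
    def update_servermap(self):
        self.mapupdates += 1
        return "map-%d" % self.mapupdates
    def apply_modification(self, servermap, modifier):
        if self.collisions > 0:
            self.collisions -= 1
            raise UncoordinatedWriteError()
        return modifier(servermap)

node = FakeNode(collisions=2)
result = modify_with_retries(node, lambda sm: "wrote with " + sm)
assert result == "wrote with map-3"
assert node.mapupdates == 3  # one fresh servermap per attempt
```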
Attachment 393status37.dpatch (583394 bytes) added
Replying to [davidsarah]comment:76:
I worry about inconsistent behavior if we later allow `offset=append` or `offset=some-other-thing-that-isn't-really-an-offset-but-a-behavior`, since replacement would then behave differently than other operations that would otherwise be treated similarly by the API. That's not really well-founded, though, since we can always start to support `offset=replace` at that point (and in any case we would want to think carefully about allowing `offset` to take strings that describe behaviors, and have little to do with offsets, as values). At the moment, what you suggest is a consistency improvement, and gives us more freedom in the future to add other parameters for high-level behaviors instead of overloading the `offset` parameter. Good suggestion; I'll work it into a patch. Thanks.
393status37.dpatch changes the mutable file modification code to do a servermap update before attempting (or retrying) file updates, increasing the robustness of the process in the presence of uncoordinated writes, per comment:66038. It also changes the webapi as discussed in comment:66037 and comment:74, but I find myself agreeing with davidsarah's comment:76 after replying, so that behavior will be changed in a future patch.
I noticed another curiosity in the ServermapUpdater(update_range=) code (which fetches the segments that live on the two ends of the span being modified, which are only partially changed and thus need to copy the old contents). There are lots of situations which cause `start_segment` and `end_segment` to be the same, such as modifying a small range in the middle of a segment. In those cases, it looks like `ServermapUpdater._got_results` will request the same data twice, and the cache in `MDMFSlotReadProxy` doesn't help unless the second request arrives after the first has been resolved. So I think we're fetching twice as much data as we need to.
Also, if the span we're replacing exactly aligns with the start or end of a segment (which is very common if we're modifying the start or end of the file), then we don't need to fetch the old segment at all. I think (but haven't confirmed) that the current code always fetches two segments, even in the aligned case.
If so, then after we land this stuff, let's do a cleanup pass. I think the `update_range=` code could be changed from thinking about start/end segments to just getting a set of segnums, of length 0 (if the span aligns on both ends), length 1 (if only one end is aligned, or if the span falls within a single segment), or length 2 (if the two ends of the span fall in different segments). Then the `fetch_update_data` code can make exactly as many `get_block_and_salt` calls as necessary, instead of always making 2.
Attachment 393status38.dpatch (624793 bytes) added
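A sketch of that suggested cleanup (a hypothetical helper, not the actual `ServermapUpdater` code): compute the set of segments whose old contents must be fetched, which naturally comes out as 0, 1, or 2 segnums:

```python
def segments_needing_fetch(offset, length, segsize, filesize):
    # Segments that the update span only partially covers must have
    # their old contents fetched; fully-covered segments and exactly
    # aligned edges need nothing. Returns a set of 0, 1, or 2 segnums.
    needed = set()
    end = offset + length
    if offset % segsize != 0:
        needed.add(offset // segsize)   # left edge falls mid-segment
    if end % segsize != 0 and end < filesize:
        needed.add(end // segsize)      # right edge falls mid-segment
    return needed

SEGSIZE = 131073          # example segment size (128 KiB rounded up for k=3)
FILESIZE = 10 * SEGSIZE

assert segments_needing_fetch(0, SEGSIZE, SEGSIZE, FILESIZE) == set()         # aligned both ends
assert segments_needing_fetch(100, 50, SEGSIZE, FILESIZE) == {0}              # within one segment
assert segments_needing_fetch(SEGSIZE - 10, 20, SEGSIZE, FILESIZE) == {0, 1}  # straddles a boundary
```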
393status38.dpatch alters the `offset` behavior to be similar to that described in comment:76 and comment:78, and removes misleading comments about the salt hash tree from the mutable file layout code.
Replying to warner:
I think you're right on that; I don't remember designing for either of those scenarios (I was apparently firmly in the two-segment mindset when I wrote that), and a quick glance at the code doesn't reveal that they're given special consideration.
Sounds good.
Attachment fencepost-test.dpatch (13108 bytes) added
exercise fencepost-update bug
The patch I just attached is against -36, I think, but should still apply. Something funny is going on in the update() method when it decides which segments to modify: I'm seeing file corruption when small updates are made near the start of the second or third segment. The test should trigger the problem I'm seeing, but you can also hit it manually by using 'curl' to PUT a few bytes into a large file (I'd suggest 3 or more segments) with offset=128*1024+1. What I'm seeing is that the bytes wind up at the start of the file instead.
In the patch for tahoe put, throw `twisted.python.usage.UsageError` to indicate that the `mutable-type` argument is wrong. (This will print a usage string as well as the error message.)
Attachment 393status39.dpatch (633428 bytes) added
fix fencepost error, improve tahoe put option parsing
393status39.dpatch fixes the fencepost error exposed in Brian's tests (which was the publisher not accounting for offsets that lie on segment boundaries when interpreting the results of `div_ceil`, combined with the filenode code assuming that the segment size would be exactly `DEFAULT_MAX_SEGMENT_SIZE` instead of the actual segment size, which is possibly a little larger so that it is a multiple of `k`). I also moved the `mutable-type` check to the `PutOptions` class (since the option parser seems like a good place for option parsing and validation :-) and made invalid values throw a `usage.UsageError`, as suggested by davidsarah in comment:66051.
I found another bug in the handling of `offset=0` which, when used to replace the first few bytes of a file through the webapi, caused the file to be truncated instead. Here's a test:
In web/filenode.py I'd suggest using `None` instead of `False` to mean "no offset= argument was provided", and I'd strongly recommend using `x is None` instead of e.g. `x == None`. I'm pretty sure the bug hit by that test_web.py test is the result of comparing `0 == False` (which is true, whereas `0 is False` is not). The test_mutable code passes, but the test_web code fails, so I think the problem is in the webapi layer and not in the `MutableFileNode` layer.
Also, the code in `Publish.setup_encoding_parameters` which calculates `self.starting_segment`: could you replace that `div_ceil`-plus-corrections with a simple integer divide, `self.starting_segment = offset // segmentsize`? I think that's equivalent to what it's currently trying to do, and probably easier to maintain.
Done and done. I replaced what was essentially the same calculation in
`mutable/filenode.py` with integer division as well. In the case where `div_ceil` is appropriate, I added a comment explaining why, for future maintainability. I also removed a comment about the power-of-two boundary that wasn't relevant anymore; these changes (along with your tests) are in attachment:393status40.dpatch.
Attachment 393status40.dpatch (641647 bytes) added
correct poor choice of internal offset representation, add tests, clarify some calculations
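The `0 == False` pitfall Brian described is easy to demonstrate (a standalone illustration, not the actual web/filenode.py code):

```python
# bool is a subclass of int, so equality comparison treats 0 and False
# as equal -- which made a legitimate offset=0 look like "no offset
# given" and triggered whole-file replacement (i.e. truncation).
offset = 0
assert (offset == False) is True   # the buggy comparison fires
assert (offset is False) is False  # the identity check does not

def effective_action(offset):
    # Using None as the "not provided" sentinel keeps offset=0 valid.
    if offset is None:
        return "replace-whole-file"
    return ("write-at-offset", offset)

assert effective_action(None) == "replace-whole-file"
assert effective_action(0) == ("write-at-offset", 0)
```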
Hm, so I'd like to jump in and look at the "latest version" of TransformingUploadable because Brian said he was having trouble digesting it. If we were still maintaining the source:ticket393 branch then I could `darcs get` that and read the current version. Since we're apparently not, do I need to apply all the patches attached to this ticket in order to see what Brian is seeing? Or apply only some of them? Or get a git (ugh) branch from Brian? :-)
The easiest way (where easiest is defined to be "the fewest steps" :-) is probably to pull a fresh copy of trunk and apply attachment:393status40.dpatch. There weren't any conflicts when I made that patch, and it's the latest copy of the code.
Cool, thanks!
That was easy.
Attachment more-tests.diff.patch (19966 bytes) added
Here is a patch that Brian and I worked on at PyCon to test boundary conditions in the mutable filenode `update()` method. These tests don't exhaust the boundary conditions; Brian and I were thinking about others to test, but at the moment I don't remember what they were. Maybe they had to do with whether the end of the new file fell on or near a segment boundary, and whether the end of the old file fell on or near a segment boundary.
This also contains a whole bunch of incomplete hacks, debug printouts, and "notes to self". I intend to clean this up later tonight or tomorrow, but just in case someone else wants it, here it is.
Attachment 393status41.dpatch (717051 bytes) added
begin work on layout changes, add MDMF caps, tweak nodemaker and filenode to use MDMF caps
393status41.dpatch contains the current state of my work on the layout changes, cap changes, and other improvements we've come up with. In particular:
There are a few stub tests that I wrote as a sort of executable design decision todo list; these fail at the moment. We still need to decide on whether it makes sense to make MDMF directory caps, and to write more tests to ensure that other aspects of Tahoe-LAFS interact in a sane way with MDMF caps (particularly the WUI, webapi, and cli). We also need to write some tests to ensure that pausing an MDMF downloader works correctly, and to ensure interoperability with older clients, and to examine some edge cases and questionable error handling in the uploader and downloader, as noted by Brian, Zooko, and other reviewers.
393status42.dpatch adds several webapi and WUI tests for MDMF caps (not all of which pass at the moment). It also changes some CLI tests to test for MDMF caps. It applies cleanly against trunk as of about 20 minutes ago. I still need to assess the various CLI commands to find other places that need testing. Then I'll work on making all of the failing tests pass.
Attachment 393status42.dpatch (738192 bytes) added
add tests for MDMF caps
Attachment 393status43.dpatch (761142 bytes) added
add more tests, fix failing tests, fix broken pausing in mutable downloader
393status43.dpatch includes yet more tests for MDMF caps. Most tests (with the exception of a test that measures the number of writes performed by MDMF operations, which Brian is looking at, and three tests that I still need to write) pass. I've also fixed an error in the mutable downloader that Brian pointed out: without the fix, pausing a download would break the downloader. There's a test for that, also.
393status44.dpatch includes still more tests. Functional changes included with this patch include some internal tweaks to make MDMF caps come with downloader hints by default (so our future smart downloader will have a large number of MDMF caps to handle intelligently when it gets written :-), and a core implementation of MDMF directories (which turned out to be very easy to add). Next will come even more tests, as well as frontends that know how to create MDMF directories when asked to do so.
Attachment 393status44.dpatch (868284 bytes) added
add more tests; append hints to caps when we have hints to give; initial work on MDMF directories
Attachment 393status45.dpatch (915267 bytes) added
teach webapi, WUI, and CLI how to deal with MDMF caps
Someone needs to review this for Tahoe-LAFS v1.9.0! This is one of the major features that would make 1.9 shine. We think that the deadline for getting patches into trunk for v1.9.0 is July 21 -- Thursday. Or maybe we should move the deadline to the following weekend, say Sunday, July 24. When Brian returns from Paris (tonight or tomorrow), he will hopefully weigh in on that.
There's a trivial conflict with trunk: the line that starts "`if self['mutable-type']`" in class `MakeDirectoryOptions` of `scripts/cli.py` should be part of the `parseArgs` method, not the `getSynopsis` method.
393status46.dpatch sets tighter bounds on signature key, verification key, share hash chain, and signature size; fixes a bug I found in situations where `n = k` (and adds a test for that bug); resolves the conflict with trunk identified by davidsarah; fixes a test from #1304 that breaks when run against MDMF; and adds tests to ensure that the new downloader can handle old SDMF shares. I've also performed other interoperability testing between the old and new mutable file code, confirming that the old downloader can read SDMF files written by the new publisher, that the new downloader can read files published by the new publisher and modified by the old code, and that the old downloader can read files modified by the new publisher.
warner: During one of the first phone calls, you mentioned that you were worried that the update process could fetch more data than necessary to update a segment near the end of the file, I think. My notes on that issue are sketchy; could you elaborate on your concerns there, if you remember what they are?
The signature, verification key, and signing key bounds are kind of iffy, in that they come from simulation (repeating the key generation and signature operations as the publisher would perform them thousands of times and observing the size of the output) rather than something more concrete (e.g., a property of the cryptosystem, or something in cryptopp that we don't think will change). Unfortunately, I haven't had time to dig deeply into the issue to find more reassuring explanations for those bounds yet. I'll probably have some time over the next week for that, but if someone else wants to give it a try in the meantime, feel free.
Attachment 393status46.dpatch (981456 bytes) added
add tighter bounds in MDMF shares, resolve bugs
I'm going to start looking at this, but I suspect it will take me a while to get enough context for a proper review. Fortunately zooko is sitting next to me, so I'll try to finish this before he gets away.
393status47.dpatch is equivalent to attachment:393status46.dpatch, but I've refactored the patches to be a little more coherent, so 393status47.dpatch should hopefully be easier to review.
Attachment 393status47.dpatch (804689 bytes) added
rework #393 patches to be more comprehensible
Thanks for refactoring the patches, that's really helpful.
Applying 393status47.dpatch to trunk (r5103) gives these test failures:
Attachment fix-test-failures-393.darcs.patch (806633 bytes) added
Fix for test failures in test_immutable.py caused by 393status47.dpatch
The last patch in fix-test-failures-393.darcs.patch fixes the failures in `test_immutable`, but not the one in `test_mutable.Filenode.test_mdmf_write_count`. Maybe the tests touched by that patch should be combined so that they're not doing the setup multiple times, for speed (they're not slow tests, but every little helps with #20).
I applied 393status47.dpatch and fix-test-failures-393.darcs.patch and now have `test_mutable.Filenode.test_mdmf_write_count` as the only error/failure, with the same error as comment:66069.
Here's my analysis so far: it appears as if this one unit test is relying on a unittest-specific interface to `allmydata.storage_client.StorageFarmBroker` which is not present in the trunk state with the above patches. Specifically, `StorageFarmBroker` does provide a unittest-specific interface, with methods called `test_add_rref` and `test_add_server`. These modify a member called `servers`.
I suspect that the version of `StorageFarmBroker` which the test was written against had separate `servers` and `test_servers` members, whereas the version in this repo now has only `servers`.
I attempted to change the attribute `test_servers` to `servers` in the test, and it proceeds to iterate over the values, which are instances of `NativeStorageServer`, in this loop:
However, the test then fails when evaluating the assertion, because the `NativeStorageServer` instances do not have `.queries` members.
The interface changes between the test expectation and the actual code are larger than my guess-and-check approach accommodates for now.
I'm going to sleep on this. My next step will be to see if I can get darcs to show me which patches have modified the `storage_client.py` APIs away from the unittest code, in order to see how to bring the test up-to-date.
I thought the test failures might have been introduced by the patch refactoring, but no, they were present in 393status46.dpatch.
It looks like changeset:0605c77f08fb4b78 removed `common.ShareManglingMixin`. The broken tests relied on a side effect of `common.ShareManglingMixin`'s setup functionality (the `self.n` variable) to work, which is why they broke when it was removed.
I've created a branch hosted on http://tahoe-lafs.org named `ticket393-MDMF`. With this you can browse the patches through trac and browse builds of that branch. To trigger builds of that branch, enter `ticket393-MDMF` as the "branch name" on the "Force Build" form.
One thing I've learned already is that the current ticket393-MDMF branch isn't pyflakes-clean.
393status48.dpatch fixes the pyflakes warnings and the mdmf write count test.
Attachment 393status48.dpatch (810953 bytes) added
fix broken test, fix pyflakes warnings
I just applied 393status48.dpatch successfully to trunk. Running `python ./setup.py test` succeeds, ending with this output:
I'm about to embark on an international flight, and the state of my internet access tomorrow will be unknown. Therefore, if anyone wants to increase the chance of MDMF successfully landing in the 1.9 release, feel free to review this ticket in parallel!
Here is what I'd like to accomplish on the flight:
Learn the format and cryptographic derivations of the MDMF cap. There is a good deal of discussion in these comments about the share serialization, but not the MDMF cap. I'll first check the repository and the wiki for documentation on this; otherwise I'll fall back to reading the code.
Think carefully about the share file format. I have only skimmed the proposed layout without reviewing which fields are authenticated and how, or what the authentication tells the reader of a share file.
Brainstorm how I would implement the protocol, given what I know about the MDMF cap and the share file format. Jot down some notes. I'll use this as a reference point in the code review to notice divergences between my brain-stormed implementation and the real one.
Read the actual code, noticing any confusing bits.
After I land, I'll attempt to get internet access to dump my feedback, especially an approval for release or please-fix-X comment.
I'm reassigning this ticket to Zooko. I still plan to work on this review, but I want to explicitly get more review coverage on this feature, both because of my schedule/connectivity uncertainty and "more eyes, shallower bugs".
I applied 393status48.dpatch to source:ticket393-MDMF, but there were conflicts. Apparently Kevan re-recorded one of the patches and changed nothing:
I created a new branch named `ticket393-MDMF-2`. [Browse the source]source:ticket393-MDMF-2; view the builds. It now passes pyflakes and unit tests!
Oh, actually there is a unit test failure on the Windows buildslave. Looks like a URI in the test is a Python str instead of a Python unicode object. I guess our caps should always be Python unicode objects, even though they don't use non-ASCII characters. (Or actually, I guess they could use non-ASCII characters in the path part after a dir cap. But that doesn't matter -- they should always be unicode objects and never strs, regardless of what characters they use.)
I wonder why this happens only on Windows?
Replying to zooko:
Fixed in [5147/ticket393-MDMF-2].
No, they are UTF-8 `str` objects (when not decoded into a `*URI` object).
Versions of the `simplejson` library differ in whether `simplejson.loads` always uses `unicode` objects, or sometimes uses `unicode` and sometimes `str` (the latter for strings that are representable as ASCII). The version installed on the Windows buildslave returns `unicode` more often. Using [to_str]source:src/allmydata/util/encodingutil.py@5073#L125 on each string in the output makes the behaviour consistent. (Note that this isn't necessary for keys, only values.)
Applying status48 to current trunk, fixing up the conflicts that resulted (including deleting an extra call to `d.addCallback(_created)`), then attempting a simple WUI MDMF upload, I got an exception. After adding a line to dump the `self._offsets` table, here's what it looked like:
It looks like the pubkey is overrunning the subsequent share data,
as if it was larger than the space originally reserved for it.
Ah. The actual problem is in the code which pre-allocates space in the
share for the various fields. In particular, layout.py line 554 is
counting hashes, not bytes:
This is meant to reserve space for a worst-case N=256 share hash chain,
which would require 8 nodes (the root is stored separately, so we don't
actually need the +1).
In my 3-of-10 upload, the hash chain is length 4 (there are 16 leaves on
the bottom layer {one per share}, then 8, then 4, then 2). Each node is
stored with a two-byte node number and a 32-byte SHA256d output. So the
hash chain is a total of 136 bytes long.
Huh, looking more carefully at that equation, there are other problems:
- `math.log` is base e, not base 2
- `HASH_SIZE*` should go outside the `int()`, not inside
- it should be `(HASH_SIZE+2)*`, since each node is stored with a 2-byte node number
- the `+1` is unnecessary because the roothash is stored elsewhere
- it lacks the `ceil()` that'd be necessary if the 256 weren't hard-coded
(this sort of problem reinforces my belief that it's better to add up
the sizes of the actual fields than to attempt to predict them or
calculate maximum values. Kevan pointed out that he might have been
advised to do it this way to enable parallelism between RSA key
generation and encryption/FEC/upload of share data, but I think I'd have
advised a more conservative approach)
So I think this ought to be:
at least that gives me (32+2)*8, which feels right.
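As a sanity check on that arithmetic (a standalone sketch, not the layout.py code): reserving `HASH_SIZE+2` bytes per node along the path from leaf to root (root excluded) gives 272 bytes for the worst-case N=256 tree, and 136 bytes for the 16-leaf tree in the 3-of-10 example above:

```python
import math

HASH_SIZE = 32  # SHA256d output

def share_hash_chain_space(num_leaves):
    # Path length from a leaf up to (but not including) the root of a
    # binary Merkle tree with num_leaves leaves; each node is stored
    # as a 2-byte node number plus a HASH_SIZE-byte hash.
    depth = int(math.ceil(math.log2(num_leaves)))  # log2, not math.log!
    return (HASH_SIZE + 2) * depth

assert share_hash_chain_space(256) == (32 + 2) * 8  # worst case: 272 bytes
assert share_hash_chain_space(16) == 136            # 3-of-10 example: 4 nodes
```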
Kevan: could you update this (and add a new patch with both this and the
merge conflicts resolved)?
Now, the real question is why the unit tests didn't catch this.
There's a little bit of slack between the estimated maximums and the
actual RSA field sizes, but only about 9 bytes, not enough to store an
entire hash. So at first glance it seems like this could only have worked
when N=1 (i.e. the share_hash_chain was empty).
Oh, unit tests tend to use artificially small RSA keys (522 bits) for speed. That might create enough slack (in the encprivkey, pubkey, and signature) to allow a short share_hash_chain to still fit... maybe that's why the unit tests were passing.
I think the high-level `test_system` uses full-sized keys; maybe we need an MDMF file-creation step in there. (In general, we should have a single high-level, non-detailed, non-exhaustive test that behaves as much as possible like normal human-initiated operations.)
Replying to warner:
Yes -- many tests fail if the test key size is 2048 bits (but all tests still pass if it is 1024 bits). I'll attach a patch that makes the test key size easily changeable.
In [5171/ticket393-MDMF-2]:
In [5172/ticket393-MDMF-2]:
In [5173/ticket393-MDMF-2]:
Replying to [davidsarah]comment:118:
Oops, I said I was going to attach it but I actually committed it to the branch. Sorry about that.
Although it's not obvious from the trac view of the changeset, [5171/ticket393-MDMF-2] actually uses `darcs replace`.

Notes on [5124/ticket393-MDMF-2]:
 * `self._paused_deferred = None` is a typo, it should be `_pause_deferred`.
 * `self._paused` seems to be `True` whenever `self._pause_deferred is not None`, except that there is one place in the `download` method that sets `self._paused` to `False` without setting `self._pause_deferred` to `None`.
 * `Unusued` -> `Unused` at retrieve.py line 305.
 * Set `self._block_hash_trees` to `None` in `decode()`, and make its initialization in `_setup_encoding_parameters` conditional on `if self._block_hash_trees is not None:`.

My preferred whitespace conventions:
(this makes it harder to mistake the code as part of the comment, and makes the comment
A couple of possible refactorings in [5124/ticket393-MDMF-2] (these don't need to be done for the beta):
Rather than using tuples such as `(success, message)` or `(seqnum, root_hash, IV, segsize, datalength, k, N, known_prefix, offsets_tuple)`, it might be clearer to use struct-like objects. (This might be an unpleasant reminder of Java, but actually, making the code that uses a struct less ugly and easier to read is more important than the overhead of the struct-like class definition, IME.)
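For example, a minimal sketch using `collections.namedtuple` (the field names are copied from the tuples quoted above; the class names are illustrative, not taken from the branch):

```python
from collections import namedtuple

# Struct-like replacements for the bare tuples quoted above; attribute access
# (status.success) is clearer than positional indexing (status[0]).
PublishStatus = namedtuple("PublishStatus", ["success", "message"])
ShareHeader = namedtuple("ShareHeader",
                         ["seqnum", "root_hash", "IV", "segsize", "datalength",
                          "k", "N", "known_prefix", "offsets_tuple"])

status = PublishStatus(success=True, message="all shares placed")
```

Namedtuples stay compatible with existing tuple-unpacking call sites, which eases an incremental migration.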
`_remove_reader` in `Retrieve` is only called from `_mark_bad_share`, so it could be inlined. There's a comment about this, but it's asking about whether there might be a reason to use `_remove_reader` independently in future. My advice would be: don't worry about trying to predict the future, just simplify the current code quite aggressively.

Replying to davidsarah:
Good observations.
Indeed. Both this and the similar end segment calculation are wrong.
This code gets exercised for partial reads, and doesn't behave correctly for reads that start at an offset that is evenly divisible by the segment size. For example, suppose that the segment size is 131073, and that we want to read starting at offset 131073. Since bytes are zero-indexed, we actually want to read the 131074th byte, which is the first byte in the second segment (or segment 1, since segments are also zero-indexed).
`div_ceil(131073, 131073) = 1`, but the unconditional `start -= 1` means that we'll actually start reading from segment 0, which is wrong. Floor division handles this case correctly.

The end segment calculation is wrong in a similar way, but manages to work anyway because of a different bug. Specifically, it expects to read one more byte of data than is actually necessary to fulfill the read. For example, suppose that we want to read 5 bytes starting at offset 131073. Then we'll read bytes 131073, 131074, 131075, 131076, and 131077, or the range 131073-131077 inclusive. The code doesn't calculate this range correctly, because it uses `offset + read_size` to calculate the endpoint; in this case, it produces the range 131073-131078.

I've attached a patch that adds tests for both of these issues. The case where the read starts on a segment boundary fails without modification on the ticket393-MDMF-2 branch. To see the test where a read ends on a segment boundary fail, you need to add `end_data -= 1` after `end_data` is first calculated around line 400 of `retrieve.py` to fix the off-by-one error there.

Both tests pass with floor division. I've also fixed similar equations for partial updates of MDMF files in `filenode.py`; these already used floor division, but still read more data than necessary to perform an update.

Fixed in the updated patch.
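The boundary behavior described above can be sketched as follows (function names are illustrative, not the branch's actual code): floor division maps each byte offset to its containing segment, and the endpoint must use the inclusive last byte, `offset + read_size - 1`.

```python
# Segment size from the examples above.
SEGSIZE = 131073

def first_segment(offset, segsize=SEGSIZE):
    # Offset 131073 is the first byte of segment 1, not part of segment 0,
    # so floor division is correct and no "start -= 1" adjustment is needed.
    return offset // segsize

def last_segment(offset, read_size, segsize=SEGSIZE):
    # The last byte actually read is offset + read_size - 1 (inclusive),
    # so a 5-byte read at 131073 ends at byte 131077, still in segment 1.
    return (offset + read_size - 1) // segsize
```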
Noted; I'll start fixing this.
Replying to warner:
Good catch; thanks.
Is there a compelling reason to keep posting patches that encompass the entire MDMF feature instead of using the ticket393-MDMF-2 branch? I find the latter much easier to work with.
Also, would it be worthwhile to make it so the buildbots can easily run tests with 2048-bit keys? Or to add a target to the Makefile to do that? Might make these sorts of issues easier to find in the future.
I'll attach a patch against ticket393-MDMF-2 that addresses davidsarah's review comments (it looks like warner and davidsarah have addressed warner's comments in comment:66085, at least in that branch). If anyone wants one, I can also make a big patch bundle against trunk that contains the whole MDMF feature.
Attachment fix-393-boundary-issues.darcspatch (49914 bytes) added
address review comments, test for and fix boundary conditions (meant for application to ticket393-MDMF-2 branch)
+1 on fix-393-boundary-issues.darcspatch. The "`# XXX: Make it so that it won't set this if we're just decoding.`" comment is no longer necessary, though.

It's fine by me if you just post patches against the ticket393-MDMF-2 branch.
Is there a ticket for the "robust, configurable tahoe-lafs simulator" mentioned in [this comment]source:ticket393-MDMF-2/src/allmydata/mutable/retrieve.py@5124#L430? I don't think there is.
Also delete the commented-out "`#needed.discard(0)`" [here]source:ticket393-MDMF-2/src/allmydata/mutable/retrieve.py@5124#L908.

In the `_failed` method [here]source:ticket393-MDMF-2/src/allmydata/mutable/retrieve.py@5124#L1100, the "`eventually(self._done_deferred.callback, ret)`" on the last line looks wrong. You have to use `.callback` in the `if self._verify:` branch and `.errback` in the `else:` branch, I think.

At source:ticket393-MDMF-2/src/allmydata/interfaces.py@5126#L452, `put_encprivey` -> `put_encprivkey` (or `put_encrypted_private_key` might be better).

`tahoe debug dump-cap` does not seem to understand MDMF cap URIs:

In [5188/ticket393-MDMF-2]:
In [5191/ticket393-MDMF-2]:
Attachment 131073.html (4400 bytes) added
Check results for an MDMF file on the testgrid that is causing UncoordinatedWriteErrors
I'm reviewing changeset 5125. It's a very substantial rewrite of the original code, which feels risky -- can you explain why this much of a rewrite was needed to support multiple segments? The new code is used for both SDMF and MDMF, which means there is a significant risk of regressions in SDMF support.
Also, it seems like there's a lot of near-duplicated code between the `update` and `publish` methods of the `Publish` class (`update` is new).

Attachment partialoutputfromflogtooltailduringuncoordinatedwriteerror.txt (798038 bytes) added
The name pretty much says it all. This attachment is associated with the elusive UCWE...
Replying to davidsarah:
The short answer is that it probably isn't necessary to support multiple segments (and that I agree with you about the riskiness of a rewrite).
When I wrote the publisher, part of the goal of MDMF was to publish MDMF files segment-by-segment (that is, to write the encrypted+encoded blocks to the grid as they are produced on the client), as immutable files are published. The patch attached to comment:66008 contains a publisher that handles this sort of publishing, which is heavier on deferreds/callbacks and coordination logic than the current mutable publisher. I can't speak precisely to my reasoning at the time I wrote that; probably I felt that there wasn't a good/clean/maintainable way to add that sort of coordination logic to the current uploader without a substantial rewrite (maybe that was hasty; I recall exploring smaller, less invasive changes before the rewrite and not liking them for various reasons, but it's possible that I missed something simple). Anyway, comment:66007 describes an issue with segment-by-segment publishing for mutable files, and is ultimately why we don't publish segment-by-segment now. I guess at that point I felt that it was easier to tweak the revised publisher to work with the relatively minor `layout.py` changes induced by comment:66007 than to go back to the drawing board and re-evaluate the existing publish code in the context of the modified publishing semantics (a significant oversight, in retrospect, since so many of the publisher changes were due to the segment-by-segment publishing semantics).

I can take another look at the existing publisher to see if it can be made to handle MDMF with fewer/smaller changes. I can't say how long that would take (or whether I'd be done by the 09/01/2011 deadline for 1.9). It does seem like a good idea, though.
Yes, that setup logic seems like a good target for refactoring.
In [5126/ticket393-MDMF-2], interface `IWritable`, the comment for `update` should just say "mutable file" rather than "MDMF file". Also we spell "writeable" with an "e" (that's not the most common spelling, but we use it consistently elsewhere).

`IWritable` doesn't support truncating a file to a given length. I don't think this is needed immediately, but most file APIs provide this functionality.

In [MutableFileNode]source:ticket393-MDMF-2/src/allmydata/mutable/filenode.py@5187#L682: Is this comment correct? I thought `upload` (which calls `_upload`) was supposed to work whether or not there is an existing version.

Also, the message in the assertion should be "_update_servermap must be called before _upload", I think.
I think I agree with comment:66097, comment:66098, comment:66099 and comment:66105; I've made those fixes in my tree.
That comment on `_upload` is a bit confusing. You're correct that there's no restriction (either in the code or in the semantics prescribed by IMutableFileNode) that upload be used only for new mutable files. I'll remove that. I probably put it there because the only internal caller of `_upload` is associated with mutable file creation.

I'll attach a patch with those fixes and a few others (removing some TODOs in the mutable filenode code that no longer apply, removing SDMF-specific comments from interfaces.py, and removing a commented-out method from the mutable filenode code).
On the call yesterday, we had some concerns about fencepost errors in segment boundary calculations, and in particular their potential impact on SDMF mutable files. In the case of SDMF files, most of these calculations are easy: there's only one segment, and we always want to fetch or upload it. It would be fairly easy to use that knowledge to short-circuit the possibly wrong segment calculations so that they're only used for MDMF mutable files, which would reduce the potential for regression if they are in fact wrong. Does that seem like a good idea? I can prepare a patch with those changes if so.
In [5187/ticket393-MDMF-2], is it possible that `offset` and `data.get_size()` are both zero, in which case `end_segment` will end up as -1?

I looked for other possible fencepost errors and couldn't see any. I don't think that we need to short-circuit the segment calculations for SDMF.
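The zero-length edge case asked about above can be reproduced in a few lines (a sketch with illustrative names, not the branch code): an inclusive-last-byte calculation goes negative when both offset and size are zero, so a guard is needed.

```python
def end_segment_naive(offset, size, segsize):
    # (0 + 0 - 1) // segsize == -1 when offset == size == 0
    return (offset + size - 1) // segsize

def end_segment_guarded(offset, size, segsize):
    if size == 0:
        # nothing to transfer; stay in the segment containing the offset
        return offset // segsize
    return (offset + size - 1) // segsize
```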
Minor nit: the CLI accepts `--mutable-type=mdmf` (for example), but not `--mutable-type=MDMF`. It should probably be case-insensitive.

Also, specifying `--mutable-type` should probably imply `--mutable`. Currently there's an assert if the former is given without the latter; asserts should not be used to report usage errors. (Also, usage errors should be reported first, before other possible errors -- but I'm suggesting that this shouldn't be an error in any case.)

These changes aren't needed for the alpha.
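A sketch of the suggested behavior (not the actual tahoe_put.py code; the accepted values "sdmf"/"mdmf" and the function names are assumptions for illustration): normalize the value case-insensitively, report bad values as usage errors rather than asserting, and let `--mutable-type` imply `--mutable`.

```python
def parse_mutable_type(value):
    value = value.lower()  # accept "MDMF" as well as "mdmf"
    if value not in ("sdmf", "mdmf"):
        # a usage error, not an assert
        raise ValueError("--mutable-type must be 'sdmf' or 'mdmf'")
    return value

def effective_mutable(mutable_flag, mutable_type):
    # specifying --mutable-type implies --mutable
    return mutable_flag or (mutable_type is not None)
```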
I added the following tests to `test_mutable.py`, to try to catch fencepost errors with zero-length reads. They both fail, although I think they're correct:

Also, the test named `test_partial_read_ending_on_segment_boundary` is actually testing a read ending one byte after a segment boundary. I don't know whether that's intentional; if it is then the test should be renamed.
Additional tests for zero-length partial reads and updates to mutable versions. Also refactors some tests to reduce duplicated code.
I've been searching for any critical bugs in SDMF writes in the branch. I haven't found any yet. I've found some minor bugs that shouldn't be blockers for the 1.9 alpha release.
Opened #1496 about an issue that I think should be a blocker for 1.9 final but possibly not for 1.9 alpha.
Opened #1497 about an issue that I think should be a blocker for 1.9 final but not for 1.9 alpha.
It is bed time and unfortunately I haven't finished examining the patch in search of ways that it could make us write out ill-formatted SDMF files, or leak the write-authorization secrets, or succumb to an attacker's attempt to make us write out (and digitally sign) an SDMF file that we didn't mean to (including making us use a wrong sequence number). Those are the three possibilities that I consider to be the "worst case scenarios", so they are what I started looking for first. I'll resume looking for those possibilities tomorrow.
Kevan has caught some places which say "if the file is mutable, load all of its contents into RAM", for example in 20110802021207-b8d28-b0c006c43f8efc1c3d89484ed7d6ff037f07774a `GeneralSFTPFile._close()`. This worked in Tahoe-LAFS <= v1.8, because mutable files were necessarily smaller than RAM, but this assumption is no longer valid in Tahoe-LAFS >= v1.9.

If he didn't catch one, then the fact that a file is mutable but can have large contents (even larger than RAM) would be a critical bug. How can we be sure we've caught all of them? Can we write unit tests which detect if the entire file has been loaded into RAM? (Somehow... by examining the change in the RAM usage? By scanning the Python strings for a special pattern which we've put into the MDMF file? Or better: by using a mock object that observes how many reads of what size the client performs and when the resulting objects get garbage collected, and adds up the high-water mark.)
Failing that, we should have someone else do what Kevan has presumably already done and search for all uses of mutable files and examine each use to see if it reads the entire contents into RAM.
(Should this comment turn into a new ticket?)
Attachment 393-davidsarah.darcs.patch (66659 bytes) added
Additional tests for MDMF. Also, make the immutable/read-only constraint checking for MDMF URIs identical to that for SSK URIs. refs #393
Attachment misc-docs-davidsarah.darcs.patch (96276 bytes) added
Miscellaneous documentation updates (including some for MDMF)
Replying to zooko:
[GeneralSFTPFile._close]source:src/allmydata/frontends/sftpd.py@5179#L813 does:
where `self.consumer.get_file()` is a file object for an open temporary file.

Does this read the whole file into memory? If it does, I think the bug is in `MutableFileHandle` or `MutableFileVersion.overwrite`. The SFTP frontend itself is careful not to assume that files fit into memory.

(I agree with the more general point that we need tests to detect whether we're making that assumption. Manual smoke-testing with large files would also help, and could be done with the alpha.)
At source:src/allmydata/mutable/publish.py@5168#L342, `ConsistencyError` -> `UncoordinatedWriteError`. (This bug predates MDMF.)

Replying to [davidsarah]comment:148:
Oh, you meant that Kevan fixed it in this case. Sorry for being dense.
I'm confused by this code in [publish.py Publish.update]source:src/allmydata/mutable/publish.py#L134:
If I understand correctly, `self.datalength` is the length that we're going to update. Because of the issue in comment:66007, we don't do partial updates yet, which might explain why this code sets `self.datalength` to at least `self._node.get_size()` (the previous size of the file). However, I don't see how this works when `offset > 0`.

If (`offset`, `self.datalength`) is supposed to be the offset and length of the share contents we're going to write, then I would expect:

If (`offset`, `self.datalength`) is supposed to be only the offset and length of the data the client provided, I would expect just:

Attachment 393-fix-interfaces.darcs.patch (66903 bytes) added
Fix interfaces related to MDMF. refs #393 Also some grammar corrections, and removal of 'self' from interface method declarations in interfaces.py
Attachment 393-zero-length-reads-todo.darcs.patch (48216 bytes) added
test_mutable.py: mark zero-length read tests as TODO. refs #393
Replying to davidsarah:
`self.datalength` is perhaps confusingly named -- it is the size of the MDMF file if the update operation completes successfully, and is used to calculate segment boundaries and to initialize the tail segment encoder (if the tail segment differs in length from the other segments).

`self.datalength` should be calculated as in your first example. Actually, due to a bug in `TransformingUploadable`, it is calculated that way -- for whatever reason, I wrote `TransformingUploadable.get_size` to return `offset + get_size`. That's inconsistent with `MutableFileHandle.get_size`, and not what I would expect a method named `get_size` to do; I'll fix that. AFAICT, the update method works despite relying on buggy behavior because it is always passed a `TransformingUploadable` -- full-file updates use `MutableFileHandle` or `MutableData` instances.

As an aside, we still do partial updates -- we just do them in such a way as to ensure that all of the changes are written at once, in that we retain all of the updated segments in memory as write vectors, then send those all at once to the storage server. So, IIUC, it isn't necessarily correct to just set offset to zero.
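The intended semantics described above (the size of the file if the update completes successfully) can be sketched in one line; the function and parameter names here are illustrative, not the branch's actual code.

```python
def updated_file_size(offset, data_size, current_size):
    # A write extending past the current end grows the file; otherwise the
    # previous size is retained.
    return max(current_size, offset + data_size)
```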
Replying to davidsarah:
Actually [the assert at tahoe_put.py line 72]source:src/allmydata/scripts/tahoe_put.py@5180#L71 never triggers because the `mutable_type` variable is only set if `mutable` is set. So as #1506 says, `--mutable-type` is silently ignored if `--mutable` is not present.

Replying to [kevan]comment:152:
I suggest `new_length` for this.

I'm not surprised I was confused! ;-) Thanks for the explanation.
Attachment code-cleanup-and-datalength-fixes.dpatch (10823 bytes) added
fixes described in comment:-1 and comment:152
In changeset:ac7b8400d4680a64:
In changeset:88989a4ea260dec9:
In changeset:3c92b832f2958ba9:
I've landed 393-davidsarah.darcs.patch and 393-zero-length-reads-todo.darcs.patch .
Attachment verifier-uri-rename.darcs.patch (50747 bytes) added
Replace *URIVerifier -> *VerifierURI, which is more logical and consistent with existing class names. refs #393
I reviewed verifier-uri-rename.darcs.patch and it looks good to me. By the way, I'm glad this patch is a set of `darcs replace` instructions and not a much larger set of hunk edits. That made it easier to review, and easier to be sure that applying this patch wouldn't accidentally omit a necessary change. However, I would still be just as happy leaving this patch out of trunk until after 1.9, in order to minimize disruption to the code review and testing work that Zancas and I (hopefully among others) are doing for 1.9.

In source:src/allmydata/test/test_cli.py, `test_dump_cap_sdmf_directory` and `test_dump_cap_mdmf_directory` are fairly long test methods that differ only in which classes they use. Most of their code should probably be factored out into a helper method.

Brian asked me on IRC to investigate the first-segment/last-segment logic to make sure we don't have off-by-one errors, etc., in there. I started reading [TransformingUploadable]source:trunk/src/allmydata/mutable/publish.py?annotate=blame&rev=5231#L1266. Then I decided that I needed to understand what the behavior should be, so I stopped reading `TransformingUploadable` and started writing down what I thought the result should be for the first-segment/last-segment logic. These notes turned into Python code, which I will attach under the name "choose_segs.py"...

I'll come back and look at this some more tomorrow.
Attachment choose_segs.py (2860 bytes) added
Attachment choose_segs2.py (3197 bytes) added
Attachment choose_segs3.py (3575 bytes) added
No sooner had I posted choose_segs.py than I realized I had forgotten an important case—the case where your write doesn't overwrite the entire last segment of your write, but it is also the last segment of the current version of the file and your write overwrites all of the bytes in the current version of the file.
So then no sooner had I posted attachment:choose_segs2.py, which fixed that, than I realized I had also forgotten the case that your write of the first segment is not overwriting the entire segment but is overwriting all of the bytes in that segment in the current version of the file (because it is also the last segment of your write and the last segment of the current version of the file). So here is attachment:choose_segs3.py. This sort of arithmetic is annoyingly hard, and I'm sleepy, so I wouldn't be surprised if there are more mistakes like that in there.
Still no tests. Perhaps choose_segs3.py could be the tests for [TransformingUploadable]source:trunk/src/allmydata/mutable/publish.py?annotate=blame&rev=5231#L1266...
Replying to davidsarah:
This is #1527.
In changeset:5d3d0dc33656c0ad:
At source:src/allmydata/mutable/publish.py@5280#L428, there is a comment about the `self.outstanding` set: "the second table is our list [sic] of outstanding queries: those which are in flight and may or may not be delivered, accepted, or acknowledged. Items are added to this table when the request is sent, and removed when the response returns (or errbacks)."

However nothing in publish.py removes entries from `self.outstanding`. In finish_publishing at source:src/allmydata/mutable/publish.py@5280#L855, the loop adds entries to `self.outstanding`, and registers an errback to `self._connection_problem` if `writer.finish_publishing` fails. But `self._connection_problem` removes the writer from `self.writers`; it doesn't remove `(writer.peerid, writer.shnum)` from `self.outstanding`. So either the comment or the logic is wrong. (Also there's a reference to `_loop` at line 872, and I don't see any `_loop` in that file.)

(Don't know how that version field got changed.)
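The bookkeeping the quoted comment describes can be sketched as follows (a minimal stand-alone illustration, not the actual publish.py code): an entry is added when a query is sent and removed when the response arrives -- on the error path as well as the success path.

```python
class QueryTracker:
    """Track in-flight queries as (peerid, shnum) pairs."""

    def __init__(self):
        self.outstanding = set()

    def sent(self, peerid, shnum):
        self.outstanding.add((peerid, shnum))

    def got_response(self, peerid, shnum):
        self.outstanding.discard((peerid, shnum))

    def got_error(self, peerid, shnum):
        # removal must happen on the errback path too,
        # or entries leak and the set never drains
        self.outstanding.discard((peerid, shnum))
```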
In changeset:f94eb86fc9232ce6:
In changeset:1fa5c729b758776b:
Replying to davidsarah:
These were fixed in changeset:f94eb86fc9232ce6.
In changeset:bbb6e5d25e62eaed:
In changeset:de00b277cc9adfb0:
In [5418/ticket999-S3-backend]:
In [5420/ticket999-S3-backend]:
I need this ticket to be closed.. it's too big to be useful, and MDMF landed weeks ago. Can the interested parties create (a finite number of) new tickets for specific issues? Like, one for docs, one for the segmentation question, and one for recommended refactoring/code-cleanups?
I'm closing this one, the code landed forever ago, and I've seen at least a few specific tickets for MDMF issues found since then. If there are comments here that describe problems that are still relevant, please go ahead and file additional bugs on them.