A variety of CPU and memory resource limits for the HTTP storage client and server #1203
Fixes https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3872
Thanks. Only got through part of this today but I left some comments inline for that. I'll finish tomorrow morning.
I also recently learned that `... = Factory(dict)` works.
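(For readers unfamiliar with that shorthand, a minimal sketch; only `Factory(dict)` itself comes from the comment above, while the class and field names are made up.)

```python
from attrs import define, Factory

@define
class ClientState:
    # Each instance gets its own fresh dict; Factory(dict) is the
    # shorthand for field(factory=dict) mentioned above.
    pending_requests: dict = Factory(dict)
```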
The maximum number of shares supported by ZFEC is 256, so we could probably come down quite a bit from here if we wanted.

Also, hm. Is the implication that a bare `uint` is unbounded? I tried to read the spec but I couldn't find a definitive answer one way or the other. If it is unbounded, do we want to impose a bound on all uints in the schema? If we don't, then presumably the server can be made to accept a 100GB uint value, right?
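(As a sketch of what such a bound could look like: pycddl is the library behind the `validate_cbor` calls in this diff, but the rule names and the 256 limit here are illustrative, not the PR's actual schema.)

```python
from pycddl import Schema

# `.le` is the CDDL control operator that imposes an upper bound on an
# integer, so validation rejects absurdly large uints up front.
SHARE_SCHEMA = Schema(
    """
    message = {
        share-numbers: [* share-number]
    }
    share-number = uint .le 256
    """
)

def validate(cbor_bytes: bytes) -> None:
    # Raises an exception if the serialized CBOR violates the schema.
    SHARE_SCHEMA.validate_cbor(cbor_bytes)
```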
Typo: "Callabale".

This condition might be because `read_data` ran out of bytes to return prematurely. In that case, the `Content-Range` header that was written at the beginning will mismatch the actual data returned. What does HTTP have to say about such a case?

It could also be that the previous iteration correctly read the last of the data, or that this iteration is reading the last of it. In those cases it's perfectly correct to do an orderly completion of the response.
If the bookkeeping at the bottom were done first, then `self.remaining == 0` would be a correct condition to use for orderly completion, and `len(data) < to_read` would signal the error case. Maybe it's worth doing something extra in the error case? The only things I can think of are finishing the request (and maybe this will signal the error to the client? is there another way to signal an error?) and logging something.
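(A sketch of that reordering; `read_data`, `remaining`, and `to_read` come from the comment, while the class, the logging, and using `loseConnection` to signal the error are illustrative.)

```python
from twisted.logger import Logger

_log = Logger()

class ShareStreamer:
    def __init__(self, read_data, total_size: int):
        self.read_data = read_data  # callable: byte count -> bytes
        self.remaining = total_size

    def send_chunk(self, request, to_read: int) -> None:
        data = self.read_data(to_read)
        # Bookkeeping first, so the conditions below see updated state.
        self.remaining -= len(data)
        request.write(data)
        if self.remaining == 0:
            # Orderly completion: every promised byte was sent.
            request.finish()
        elif len(data) < to_read:
            # Short read: the Content-Range header already written will
            # not match the body, so log it and drop the connection to
            # signal the error to the client.
            _log.error("short read while streaming share data")
            request.loseConnection()
```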
Can this be `fixtures.MonkeyPatch` instead? I am trying to get rid of all uses of mocking from the test suite. I see that your use of `MockPatch` doesn't actually do mocking, so I hope it will work just fine with `MonkeyPatch` instead.

I bet the test report doesn't look great when this comparison fails.
I think you could use Hypothesis to give you a number of bytes to generate and a number of bytes of "headroom" to allow. If the `millionbytes` endpoint took the size of its response as an argument, then inside the Hypothesis-enabled test you could make one request and do one assertion about the result (`limited_content(..., size + headroom)` returns `size` bytes). And if there's a problem, Hypothesis will minimize the case to the smallest size that produces the problem, which might be less than 1 MB of data.
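(A sketch of such a test; `make_request` and this `limited_content` signature are assumptions for illustration, not the PR's actual helpers.)

```python
from hypothesis import given, strategies as st

@given(
    size=st.integers(min_value=1, max_value=2**20),
    headroom=st.integers(min_value=0, max_value=1024),
)
def test_limited_content_returns_exact_size(size, headroom):
    # Ask the endpoint for exactly `size` bytes...
    response = make_request(size)
    # ...and check that reading with `headroom` to spare returns them all.
    body = limited_content(response, size + headroom)
    assert len(body.read()) == size
```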
Thanks! A few more inline comments. I guess there's enough that I'd like to take another look once those are addressed.

I'd probably leave the default value out here. If there's a default, it probably makes more sense for it to come from the tahoe.cfg-reading code (and allow it to be configured in tahoe.cfg).
I wonder if it would be a nicer interface to have this result be `typing.BinaryIO`? `BinaryReadIO` would be nice if it existed.
@@ -117,0 +148,4 @@
        d.addCallback(lambda _: treq.collect(response, collector))

        def done(_):
            collector.f.seek(0)
There should be some manipulation of `self.remaining_length` here too, I think.
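(A sketch of that bookkeeping: a collector whose per-chunk callback decrements `remaining_length` and fails on overrun. `_LimitedCollector` is an illustrative name, not the PR's class.)

```python
import io

class _LimitedCollector:
    """Collect a treq response body while enforcing a size limit."""

    def __init__(self, remaining_length: int):
        self.remaining_length = remaining_length
        self.f = io.BytesIO()

    def __call__(self, data: bytes) -> None:
        # treq.collect(response, collector) calls this once per chunk.
        self.remaining_length -= len(data)
        if self.remaining_length < 0:
            raise ValueError("response body exceeded the size limit")
        self.f.write(data)
```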
@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
Sometimes the `response` should indicate how large the response will be, right? Either `Content-Length` or `Content-Range` will dictate it? So this function could probably fail early in those cases instead of reading up to the limit and then failing.
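(A sketch of that early failure, using the `length` attribute Twisted sets on responses from Content-Length; it is `UNKNOWN_LENGTH` for, e.g., chunked bodies. `max_length` is illustrative.)

```python
from twisted.web.iweb import UNKNOWN_LENGTH

def check_declared_length(response, max_length: int) -> None:
    # Fail before reading any body bytes if the server has already
    # declared more data than we are willing to accept.
    if response.length is not UNKNOWN_LENGTH and response.length > max_length:
        raise ValueError(
            "declared Content-Length %d exceeds limit %d"
            % (response.length, max_length)
        )
```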
Related: does the response object need to be told to clean up if we don't read the whole response body? Or does `treq.collect` deal with that automatically?

The return annotation needs to be updated.
!
If you dig deep enough, ints are implicitly 64-bit (it's kinda implied by CBOR, and explicitly stated in the JSON section...). So it's not unlimited-size integers and that's fine.
As far as whether 30 shares is too high or too low, we can probably revisit; for now I just wanted some minimal constraints.
Handy
I'll open a ticket for that; messing with configuration in this branch seems excessive (it would also need docs...).
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3906
Also note this doesn't affect how much data the client requests; it's more of a defense against malicious servers. So it also needs further thought (see the comment in the ticket).
Oh hm, I have a vague memory that the IStorageServer API actually allows reading beyond the end, which likely conflicts with the HTTP protocol spec. So that's, I think, part of what's going on. So probably I need to make IStorageServer smarter and somehow get the length of the share first? Which is not currently supported.
I guess "I returned what's available, and no more" is one possible behavior, and... probably valid.
A third option: make sure IStorageServer never reads beyond the end of a share. That may already be the case, but I have a distinct memory of some hack used to check existence or something...
`test_istorageserver` has some tests that I wrote for reading beyond the edge, so I could just... delete those, and if I detect such usage in the client, just fix the client? But I swear I had that in for a reason.

OK, did some instrumentation, and immutables and mutables do get reads past the end sometimes. So option 3 is no good, and we need to either make the HTTP client caller get the length first to ensure a good range, or just return truncated data.
So I think what I will do is, in this branch, just return truncated data (effectively already implemented) with the wrong Content-Range header in that edge case. A follow-up will add two HEAD HTTP endpoints for getting the length of a share; then the HTTP server and client will get logic to not read beyond the end of a share, and the `IStorageServer` layer can use the length endpoints to make sure it doesn't request too much.

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3907
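(A rough sketch of what such a HEAD endpoint could look like with plain twisted.web; `get_share_size` and this resource are hypothetical, since the PR's server has its own routing layer.)

```python
from twisted.web.resource import Resource

class ShareLength(Resource):
    """Report a share's length via Content-Length, sending no body."""

    isLeaf = True

    def __init__(self, get_share_size):
        Resource.__init__(self)
        self._get_share_size = get_share_size

    def render_HEAD(self, request):
        request.setHeader(b"content-length", b"%d" % self._get_share_size())
        return b""  # HEAD responses carry headers only
```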
Resolved, but leaving open for context.
@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
For the second point, if an exception is raised it... might end up in a Deferred (which I think is what I have tests for), or in some cases it might end up raising an exception directly from `content_length`. It depends! So, not so good... And that doesn't even answer the question... So this clearly needs a bit more work.

Coverage decreased (-0.02%) to 94.727% when pulling a24aefaebf on 3872-resource-limits-http-storage into a78f50470f on master.
on master.@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
I don't think there's any way to tell a response to close... so I think we're stuck with garbage collection.
Yep. And... `tox -e typechecks` didn't catch it.

OK, I believe I have either addressed each issue, explained why it's not solvable, or, in one case, opened a follow-up issue.
As long as you mean a 30-bit integer representing the number of shares, agreed.
Thanks! This looks good. Just a couple minor points inline (one accidentally detached from this review). Please do with those what you will and then merge.
@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
Does the first point (failing early based on the value of Content-Length or Content-Range) seem valid and useful? I don't mind if this is a follow-up; I just don't want to lose the point.
@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
OK, I actually figured out how to close the connection. So I filed https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3908#ticket for the first point, and will go deal with the second point now.
@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
Looks like this has to happen inside `treq`; I filed https://github.com/twisted/treq/issues/347 and will go implement a PR for it, but for the purposes of this PR we're done.

@@ -121,2 +161,4 @@
    def got_content(f: BinaryIO):
        data = f.read()
        schema.validate_cbor(data)
        return loads(data)
OK, PR submitted. If it gets merged, this will be handled automatically by treq; if the status quo continues, there's not much we can do without a lot more work.
That's... just the number 30. In decimal. So that's too low.