A variety of CPU and memory resource limits for the HTTP storage client and server #1203

Merged
itamarst merged 22 commits from 3872-resource-limits-http-storage into master 2022-07-15 16:44:08 +00:00
itamarst commented 2022-06-30 17:56:05 +00:00 (Migrated from github.com)
Fixes https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3872
exarkun (Migrated from github.com) reviewed 2022-06-30 20:15:10 +00:00
exarkun (Migrated from github.com) left a comment

Thanks. Only got through part of this today but I left some comments inline for that. I'll finish tomorrow morning.

Thanks. Only got through part of this today but I left some comments inline for that. I'll finish tomorrow morning.
exarkun (Migrated from github.com) commented 2022-06-30 20:05:03 +00:00

I also recently learned that ... = Factory(dict) works.

I also recently learned that `... = Factory(dict)` works.
exarkun (Migrated from github.com) commented 2022-06-30 19:58:45 +00:00

The maximum number of shares supported by ZFEC is 256 so we could probably come down quite a bit from here if we wanted.

Also hm. Is the implication that bare uint is unbounded? I tried to read the spec but I couldn't find a definitive answer one way or the other. If it is unbounded, do we want to impose a bound on all uints in the schema? If we don't then presumably the server can be made to accept a 100GB uint value, right?

The maximum number of shares supported by ZFEC is 256 so we could probably come down quite a bit from here if we wanted. Also hm. Is the implication that bare `uint` is unbounded? I tried to read the spec but I couldn't find a definitive answer one way or the other. If it is unbounded, do we want to impose a bound on all uints in the schema? If we don't then presumably the server can be made to accept a 100GB uint value, right?
exarkun (Migrated from github.com) commented 2022-06-30 20:04:17 +00:00

Typo - "Callabale"

Typo - "Callabale"
exarkun (Migrated from github.com) commented 2022-06-30 20:14:07 +00:00

This condition might be because read_data ran out of bytes to return prematurely. In this case, the content-range header that was written at the beginning will mismatch the actual data returned. What does HTTP have to say about such a case?

It could also be because that the previous iteration correctly read the last of the data or this iteration is reading the last of it. In those cases it's perfectly correct to do an orderly completion of the response.

If the bookkeeping at the bottom were done first then self.remaining == 0would be a correct condition to use for orderly completion and len(data) < to_read would signal the error case. Maybe it's worth doing something extra in the error case? The only things I can think of are finishing the request (and maybe this will signal error to the client? is there another way to signal error?) and logging something.

This condition might be because `read_data` ran out of bytes to return prematurely. In this case, the `content-range` header that was written at the beginning will mismatch the actual data returned. What does HTTP have to say about such a case? It could also be because that the previous iteration correctly read the last of the data or this iteration is reading the last of it. In those cases it's perfectly correct to do an orderly completion of the response. If the bookkeeping at the bottom were done first then `self.remaining == 0`would be a correct condition to use for orderly completion and `len(data) < to_read` would signal the error case. Maybe it's worth doing something extra in the error case? The only things I can think of are finishing the request (and maybe this will signal error to the client? is there another way to signal error?) and logging something.
exarkun (Migrated from github.com) commented 2022-06-30 19:33:43 +00:00

Can this be fixtures.MonkeyPatch instead? I am trying to get rid of all uses of mocking from the test suite. I see that your use of MockPatch doesn't actually do mocking so I hope it will work just fine with MonkeyPatch instead.

Can this be `fixtures.MonkeyPatch` instead? I am trying to get rid of all uses of mocking from the test suite. I see that your use of `MockPatch` doesn't actually do mocking so I hope it will work just fine with `MonkeyPatch` instead.
exarkun (Migrated from github.com) commented 2022-06-30 19:40:54 +00:00

I bet the test report doesn't look great when this comparison fails.

I think you could use Hypothesis to give you a number of bytes to generate and a number of bytes of "headroom" to allow. If the millionbytes endpoint took the size of its response as an argument then inside the Hypothesis-enabled test you could make one request and do one assertion about the result (limited_content(..., size + headroom) returns size bytes). And if there's a problem Hypothesis will minimize the case to the smallest size that produces the problem, which might be less than 1 MB of data.

I bet the test report doesn't look great when this comparison fails. I think you could use Hypothesis to give you a number of bytes to generate and a number of bytes of "headroom" to allow. If the `millionbytes` endpoint took the size of its response as an argument then inside the Hypothesis-enabled test you could make one request and do one assertion about the result (limited_content(..., size + headroom) returns size bytes). And if there's a problem Hypothesis will minimize the case to the smallest size that produces the problem, which might be less than 1 MB of data.
exarkun (Migrated from github.com) requested changes 2022-07-01 14:20:15 +00:00
exarkun (Migrated from github.com) left a comment

Thanks! A few more inline comments. I guess there's enough that I'd like to take another look once those are addressed.

Thanks! A few more inline comments. I guess there's enough that I'd like to take another look once those are addressed.
exarkun (Migrated from github.com) commented 2022-07-01 12:57:05 +00:00

I'd probably leave the default value out here. If there's a default, it probably makes more sense for it to come from the tahoe.cfg-reading code (and allow it to be configured in tahoe.cfg).

I'd probably leave the default value out here. If there's a default, it probably makes more sense for it to come from the tahoe.cfg-reading code (and allow it to be configured in tahoe.cfg).
exarkun (Migrated from github.com) commented 2022-07-01 13:31:17 +00:00

I wonder if it would be a nicer interface to have this result be ... typing.BinaryIO? BinaryReadIO would be nice if it existed.

I wonder if it would be a nicer interface to have this result be ... `typing.BinaryIO`? `BinaryReadIO` would be nice if it existed.
@ -117,0 +148,4 @@
d.addCallback(lambda _: treq.collect(response, collector))
def done(_):
collector.f.seek(0)
exarkun (Migrated from github.com) commented 2022-07-01 13:00:57 +00:00

There should be some manipulation of self.remaining_length here too I think.

There should be some manipulation of `self.remaining_length` here too I think.
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
exarkun (Migrated from github.com) commented 2022-07-01 13:00:33 +00:00

Sometimes the response should indicate how large the response will be, right? Either Content-Length or Content-Range will dictate it? So this function could probably fail early in those cases instead of reading up to the limit and then failing.

Related, does the response object need to be told to clean up if we don't read the whole response body? Or does treq.collect deal with that automatically?

_Sometimes_ the `response` should indicate how large the response will be, right? Either `Content-Length` or `Content-Range` will dictate it? So this function could probably fail early in those cases instead of reading up to the limit and then failing. Related, does the response object need to be told to clean up if we don't read the whole response body? Or does `treq.collect` deal with that automatically?
exarkun (Migrated from github.com) commented 2022-07-01 14:13:56 +00:00

The return annotation needs to be updated.

The return annotation needs to be updated.
itamarst (Migrated from github.com) reviewed 2022-07-05 15:26:13 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:26:13 +00:00

!

!
itamarst (Migrated from github.com) reviewed 2022-07-05 15:29:24 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:29:24 +00:00

If you dig deep enough, ints are implicitly 64-bit (it's kinda implied by CBOR, and explicitly stated in the JSON section...). So it's not unlimited-size integers and that's fine.

If you dig deep enough, ints are implicitly 64-bit (it's kinda implied by CBOR, and explicitly stated in the JSON section...). So it's not unlimited-size integers and that's fine.
itamarst (Migrated from github.com) reviewed 2022-07-05 15:29:45 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:29:44 +00:00

As far as whether 30 shares is too high or too low, can probably revisit, for now just wanted some minimal constraints.

As far as whether 30 shares is too high or too low, can probably revisit, for now just wanted some minimal constraints.
itamarst (Migrated from github.com) reviewed 2022-07-05 15:30:55 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:30:55 +00:00

Handy

Handy
itamarst (Migrated from github.com) reviewed 2022-07-05 15:34:10 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:34:09 +00:00

I'll open a ticket for that, messing with configuration in this branch seems excessive (it also would need docs...).

I'll open a ticket for that, messing with configuration in this branch seems excessive (it also would need docs...).
itamarst (Migrated from github.com) reviewed 2022-07-05 15:37:16 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 15:37:16 +00:00

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3906

Also note this doesn't affect how much data the client requests, it's more of a defense against malicious servers. So it also needs further thought (see comment in ticket).

https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3906 Also note this doesn't affect how much data the client _requests_, it's more of a defense against malicious servers. So it also needs further thought (see comment in ticket).
itamarst (Migrated from github.com) reviewed 2022-07-05 19:30:44 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 19:30:44 +00:00

Oh hm, I have a vague memory that the IStorageServer API actually allows reading beyond the end, which likley conflicts with HTTP protocol spec. So that's I think part of what's going on. So probably I need to make IStorageServer smarter and somehow get the length of the share first? Which is not currently supported.

Oh hm, I have a vague memory that the IStorageServer API actually allows reading beyond the end, which likley conflicts with HTTP protocol spec. So that's I think part of what's going on. So probably I need to make IStorageServer smarter and somehow get the length of the share first? Which is not currently supported.
itamarst (Migrated from github.com) reviewed 2022-07-05 19:35:33 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 19:35:32 +00:00

I guess "I returned what's available, and no more" is one possible behavior, and... probably valid.

I guess "I returned what's available, and no more" is one possible behavior, and... probably valid.
itamarst (Migrated from github.com) reviewed 2022-07-05 19:49:08 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 19:49:08 +00:00

A third option: make sure IStorageServer never reads beyond the end of a share. Which may already be the case but I have distinct memory of some hack used to check existence or something...

A third option: make sure IStorageServer never reads beyond the end of a share. Which may already be the case but I have distinct memory of some hack used to check existence or something...
itamarst (Migrated from github.com) reviewed 2022-07-05 19:52:20 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 19:52:20 +00:00

test_istorageserver has some tests that I wrote for reading beyond the edge, so I could just... delete those, and when if I detect usage in client just fix the client? But I swear I had that in for a reason.

`test_istorageserver` has some tests that I wrote for reading beyond the edge, so I could just... delete those, and when if I detect usage in client just fix the client? But I swear I had that in for a reason.
itamarst (Migrated from github.com) reviewed 2022-07-05 19:59:27 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 19:59:27 +00:00

OK, did some instrumentation, and immutables and mutables do get reads past the end sometimes. So option 3 is no good, and so need to either make HTTP client caller get length first to ensure good range, or just return truncated data.

OK, did some instrumentation, and immutables and mutables do get reads past the end sometimes. So option 3 is no good, and so need to either make HTTP client caller get length first to ensure good range, or just return truncated data.
itamarst (Migrated from github.com) reviewed 2022-07-05 20:04:43 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 20:04:43 +00:00

So I think what I will do is, in this branch just return truncated (effectively already implemented) with the wrong Content-Range header in that edge case. And follow up will add two HEAD HTTP endpoints for getting the length of the share, and then HTTP server and client will get logic to not read beyond end of share, and IStorageServer layer can use length endpoints to make sure it doesn't request too much.

So I think what I will do is, in this branch just return truncated (effectively already implemented) with the wrong Content-Range header in that edge case. And follow up will add two HEAD HTTP endpoints for getting the length of the share, and then HTTP server and client will get logic to not read beyond end of share, and `IStorageServer` layer can use length endpoints to make sure it doesn't request too much.
itamarst (Migrated from github.com) reviewed 2022-07-05 20:12:51 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 20:12:51 +00:00
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3907
itamarst (Migrated from github.com) reviewed 2022-07-05 21:17:56 +00:00
itamarst (Migrated from github.com) commented 2022-07-05 21:17:56 +00:00

Resolved, but leaving open for context.

Resolved, but leaving open for context.
itamarst (Migrated from github.com) reviewed 2022-07-05 21:27:31 +00:00
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
itamarst (Migrated from github.com) commented 2022-07-05 21:27:31 +00:00

For the second point, if an exception is raised it... might end up in a Deferred (which I think is what I have tests for), or in some cases it might end up raising an exception directly from content_length. It depends! So, not so good... And that doesn't even answer the question... So clearly need a bit more work here.

For the second point, if an exception is raised it... might end up in a Deferred (which I think is what I have tests for), or in some cases it might end up raising an exception directly from `content_length`. It depends! So, not so good... And that doesn't even answer the question... So clearly need a bit more work here.
coveralls commented 2022-07-05 22:06:43 +00:00 (Migrated from github.com)

Coverage Status

Coverage decreased (-0.02%) to 94.727% when pulling a24aefaebf on 3872-resource-limits-http-storage into a78f50470f on master.

[![Coverage Status](https://coveralls.io/builds/50908659/badge)](https://coveralls.io/builds/50908659) Coverage decreased (-0.02%) to 94.727% when pulling **a24aefaebf8f0487b4a8cc981c7cb238d0aca1d2 on 3872-resource-limits-http-storage** into **a78f50470f0f4a1be301d9b278288b9e5c6d3757 on master**.
itamarst (Migrated from github.com) reviewed 2022-07-06 13:38:13 +00:00
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
itamarst (Migrated from github.com) commented 2022-07-06 13:38:13 +00:00

I don't think there's any way to tell a response to close... so I think we're stuck with garbage collection.

I don't think there's any way to tell a response to close... so I think we're stuck with garbage collection.
itamarst (Migrated from github.com) reviewed 2022-07-06 13:48:56 +00:00
itamarst (Migrated from github.com) commented 2022-07-06 13:48:56 +00:00

Yep. And... tox -e typechecks didn't catch it.

Yep. And... `tox -e typechecks` didn't catch it.
itamarst commented 2022-07-06 14:02:23 +00:00 (Migrated from github.com)

OK I believe I have either addressed all issues, explained why it's not solvable, or opened follow-up issue in one case.

OK I believe I have either addressed all issues, explained why it's not solvable, or opened follow-up issue in one case.
exarkun (Migrated from github.com) reviewed 2022-07-12 18:57:23 +00:00
exarkun (Migrated from github.com) commented 2022-07-12 18:57:23 +00:00

As far as whether 30 shares is too high or too low, can probably revisit, for now just wanted some minimal constraints.

As long as you mean 30 bit integer representing number of shares, agreed.

> As far as whether 30 shares is too high or too low, can probably revisit, for now just wanted some minimal constraints. As long as you mean 30 bit integer representing number of shares, agreed.
exarkun (Migrated from github.com) approved these changes 2022-07-12 19:15:31 +00:00
exarkun (Migrated from github.com) left a comment

Thanks! This looks good. Just a couple minor points inline (one accidentally detached from this review). Please do with those what you will and then merge.

Thanks! This looks good. Just a couple minor points inline (one accidentally detached from this review). Please do with those what you will and then merge.
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
exarkun (Migrated from github.com) commented 2022-07-12 19:13:23 +00:00

Does the first point - failing early based on value of Content-Length or Content-Range seem valid and useful? I don't mind if this is a follow-up, just don't want to lose the point.

Does the first point - failing early based on value of Content-Length or Content-Range seem valid and useful? I don't mind if this is a follow-up, just don't want to lose the point.
itamarst (Migrated from github.com) reviewed 2022-07-15 14:05:35 +00:00
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
itamarst (Migrated from github.com) commented 2022-07-15 14:05:35 +00:00

OK, I actually figured out how to close the connection. So I filed https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3908#ticket for the first point, and will go deal with the second point now.

OK, I actually figured out how to close the connection. So I filed https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3908#ticket for the first point, and will go deal with the second point now.
itamarst (Migrated from github.com) reviewed 2022-07-15 14:34:32 +00:00
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
itamarst (Migrated from github.com) commented 2022-07-15 14:34:32 +00:00

Looks like this has to happen inside treq; I filed https://github.com/twisted/treq/issues/347, and will go implement PR for it, but for purposes of this PR we're done.

Looks like this has to happen inside `treq`; I filed https://github.com/twisted/treq/issues/347, and will go implement PR for it, but for purposes of this PR we're done.
itamarst (Migrated from github.com) reviewed 2022-07-15 15:33:22 +00:00
@ -121,2 +161,4 @@
def got_content(f: BinaryIO):
data = f.read()
schema.validate_cbor(data)
return loads(data)
itamarst (Migrated from github.com) commented 2022-07-15 15:33:22 +00:00

OK, PR submitted. If it gets merged, this will be handled automatically by treq; if status quo continues, not much we can do without a lot more work.

OK, PR submitted. If it gets merged, this will be handled automatically by treq; if status quo continues, not much we can do without a lot more work.
itamarst (Migrated from github.com) reviewed 2022-07-15 15:34:11 +00:00
itamarst (Migrated from github.com) commented 2022-07-15 15:34:11 +00:00

That's... just the number 30. In decimal. So that's too low.

That's... just the number 30. In decimal. So that's too low.
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: tahoe-lafs/tahoe-lafs#1203
No description provided.