HTTP protocol is significantly slower than Foolscap protocol #3939

New Issue

itamarst · 2022-11-03T17:17:46Z

itamarst commented

2022-11-03 17:17:46 +00:00

Specifically, in https://github.com/tahoe-lafs/tahoe-lafs/pull/1225, test_filesystem (from test_system.py) is 40 seconds with HTTP, compared to 20 with Foolscap.

Of those extra 20 seconds:

7 seconds are from repeated calls to get_spki_hash() in the HTTP client validation code. Or rather, the real bottleneck is the call to get_public_bytes(), which is a little surprising but OK.
Another 7 seconds or so are just the TLS handshake?!
http_client's getContext() method to create a TLS context is also quite expensive, and there's some expense in creating context objects on the server.

The test setup doesn't use persistent connections, so this arguably unrealistic, and maybe tests should use persistent connections. This still seems... excessive. Perhaps we're doing way too many HTTP requests. E.g. maybe chunk sizes that are OK for Foolscap are too small for HTTP.

Specifically, in <https://github.com/tahoe-lafs/tahoe-lafs/pull/1225>, `test_filesystem` (from test_system.py) is 40 seconds with HTTP, compared to 20 with Foolscap. Of those extra 20 seconds: * 7 seconds are from repeated calls to `get_spki_hash()` in the HTTP client validation code. Or rather, the _real_ bottleneck is the call to get_public_bytes(), which is a little surprising but OK. * Another 7 seconds or so are just the TLS handshake?! * http_client's getContext() method to create a TLS context is also quite expensive, and there's some expense in creating context objects on the server. The test setup doesn't use persistent connections, so this arguably unrealistic, and maybe tests should use persistent connections. This still seems... excessive. Perhaps we're doing way too many HTTP requests. E.g. maybe chunk sizes that are OK for Foolscap are too small for HTTP.

itamarst added the

labels 2022-11-03 17:17:46 +00:00

itamarst added this to the HTTP Storage Protocol milestone 2022-11-03 17:17:46 +00:00

itamarst commented

2022-11-03 17:19:34 +00:00

If I switch back to persistent HTTP connections, the test takes 20 seconds. So this is perhaps not a blocker if I can figure out how to make the tests not get dirty reactor with persistent HTTP.

Still seems worth fixing though, it suggests the HTTP protocol is doing way too much requests.

If I switch back to persistent HTTP connections, the test takes 20 seconds. So this is perhaps not a blocker if I can figure out how to make the tests not get dirty reactor with persistent HTTP. Still seems worth fixing though, it suggests the HTTP protocol is doing way too much requests.

itamarst commented

2022-11-22 18:09:49 +00:00

Looking at an immutable, here are some writes for a (presumably small) object, as recorded via layout.py: 36 bytes, 1400, 32, 32, 32, 170, 320. This will get batched via pipeline.py (see #3787) and for Foolscap maybe that's fine. But HTTP/1.1 has higher overhead per query than Foolscap I suspect, and even with HTTP 2.0 I imagine it's rather higher.

So possibly one strategy is doing an alternative to pipeline.py where logic isn't generic "batch API queries" but instead it relies on the fact we're doing writes without any holes, so we can coalesce writes semantically. (Previously it would have holes so this would've been harder.)

Looking at an immutable, here are some writes for a (presumably small) object, as recorded via layout.py: 36 bytes, 1400, 32, 32, 32, 170, 320. This will get batched via pipeline.py (see #3787) and for Foolscap maybe that's fine. But HTTP/1.1 has higher overhead per query than Foolscap I suspect, and even with HTTP 2.0 I imagine it's rather higher. So possibly one strategy is doing an alternative to `pipeline.py` where logic isn't generic "batch API queries" but instead it relies on the fact we're doing writes without any holes, so we can coalesce writes semantically. (Previously it _would_ have holes so this would've been harder.)

itamarst commented

2022-11-22 18:14:31 +00:00

There is the problem that the API currently doesn't force the code to do writes in order, but that's solvable.

There is the problem that the API currently doesn't force the code to do writes _in order_, but that's solvable.

itamarst commented

2022-11-22 18:17:44 +00:00

In practice immutables path does actually have the writes happening in the correct order, so possibly that should just work.

itamarst commented

2022-11-22 20:23:41 +00:00

TODO for branch-in-progress:

Unit tests for new data structure
Validate there is still backpressure
Update #3787 to note new place that needs tuning
See if equivalent improvements are possible for mutables

TODO for branch-in-progress: * Unit tests for new data structure * Validate there is still backpressure * Update #3787 to note new place that needs tuning * See if equivalent improvements are possible for mutables

itamarst commented

2022-11-23 15:50:51 +00:00

Did all the above (mutable uploads seem more reasonable apriori? they at least don't do writes per tiny bit of metadata). Some quantitative results: for allmydata.test.test_system.HTTPSystemTest.test_filesystem, number of writes during immutable upload goes from 530 to 60. So that's good! It does not however make a meaningful dent in run time... so there are likely other overly chatty interactions.

Initial thought is that downloads are maybe using too small of a chunk size, so will investigate that next.

Did all the above (mutable uploads seem more reasonable apriori? they at least don't do writes per tiny bit of metadata). Some quantitative results: for `allmydata.test.test_system.HTTPSystemTest.test_filesystem`, number of writes during immutable upload goes from 530 to 60. So that's good! It does not however make a meaningful dent in run time... so there are likely other overly chatty interactions. Initial thought is that downloads are maybe using too small of a chunk size, so will investigate that next.

GitHub <noreply@github.com> commented

2022-12-05 19:06:01 +00:00

In 1eba202c/trunk:

Merge pull request #1231 from tahoe-lafs/3939-faster-http-protocol

Faster http protocol, part 1 (and maybe faster Foolscap too, while we're at it)

Fixes ticket:3939

In [1eba202c/trunk](/tahoe-lafs/trac-2024-07-25/commit/1eba202c08590588bf9ec9e3e04375286f4a4327): ``` Merge pull request #1231 from tahoe-lafs/3939-faster-http-protocol Faster http protocol, part 1 (and maybe faster Foolscap too, while we're at it) Fixes ticket:3939 ```

tahoe-lafs added the

fixed

label 2022-12-05 19:06:01 +00:00

GitHub <noreply@github.com> closed this issue

2022-12-05 19:06:01 +00:00

Sign in to join this conversation.