HTTP protocol is significantly slower than Foolscap protocol #3939

Closed
opened 2022-11-03 17:17:46 +00:00 by itamarst · 7 comments

Specifically, in https://github.com/tahoe-lafs/tahoe-lafs/pull/1225, test_filesystem (from test_system.py) is 40 seconds with HTTP, compared to 20 with Foolscap.

Of those extra 20 seconds:

  • 7 seconds are from repeated calls to get_spki_hash() in the HTTP client validation code. Or rather, the real bottleneck is the call to get_public_bytes(), which is a little surprising but OK.
  • Another 7 seconds or so are just the TLS handshake?!
  • http_client's getContext() method to create a TLS context is also quite expensive, and there's some expense in creating context objects on the server.

The test setup doesn't use persistent connections, so this arguably unrealistic, and maybe tests should use persistent connections. This still seems... excessive. Perhaps we're doing way too many HTTP requests. E.g. maybe chunk sizes that are OK for Foolscap are too small for HTTP.

Specifically, in <https://github.com/tahoe-lafs/tahoe-lafs/pull/1225>, `test_filesystem` (from test_system.py) is 40 seconds with HTTP, compared to 20 with Foolscap. Of those extra 20 seconds: * 7 seconds are from repeated calls to `get_spki_hash()` in the HTTP client validation code. Or rather, the _real_ bottleneck is the call to get_public_bytes(), which is a little surprising but OK. * Another 7 seconds or so are just the TLS handshake?! * http_client's getContext() method to create a TLS context is also quite expensive, and there's some expense in creating context objects on the server. The test setup doesn't use persistent connections, so this arguably unrealistic, and maybe tests should use persistent connections. This still seems... excessive. Perhaps we're doing way too many HTTP requests. E.g. maybe chunk sizes that are OK for Foolscap are too small for HTTP.
itamarst added the
unknown
normal
defect
n/a
labels 2022-11-03 17:17:46 +00:00
itamarst added this to the HTTP Storage Protocol milestone 2022-11-03 17:17:46 +00:00
Author

If I switch back to persistent HTTP connections, the test takes 20 seconds. So this is perhaps not a blocker if I can figure out how to make the tests not get dirty reactor with persistent HTTP.

Still seems worth fixing though, it suggests the HTTP protocol is doing way too much requests.

If I switch back to persistent HTTP connections, the test takes 20 seconds. So this is perhaps not a blocker if I can figure out how to make the tests not get dirty reactor with persistent HTTP. Still seems worth fixing though, it suggests the HTTP protocol is doing way too much requests.
Author

Looking at an immutable, here are some writes for a (presumably small) object, as recorded via layout.py: 36 bytes, 1400, 32, 32, 32, 170, 320. This will get batched via pipeline.py (see #3787) and for Foolscap maybe that's fine. But HTTP/1.1 has higher overhead per query than Foolscap I suspect, and even with HTTP 2.0 I imagine it's rather higher.

So possibly one strategy is doing an alternative to pipeline.py where logic isn't generic "batch API queries" but instead it relies on the fact we're doing writes without any holes, so we can coalesce writes semantically. (Previously it would have holes so this would've been harder.)

Looking at an immutable, here are some writes for a (presumably small) object, as recorded via layout.py: 36 bytes, 1400, 32, 32, 32, 170, 320. This will get batched via pipeline.py (see #3787) and for Foolscap maybe that's fine. But HTTP/1.1 has higher overhead per query than Foolscap I suspect, and even with HTTP 2.0 I imagine it's rather higher. So possibly one strategy is doing an alternative to `pipeline.py` where logic isn't generic "batch API queries" but instead it relies on the fact we're doing writes without any holes, so we can coalesce writes semantically. (Previously it _would_ have holes so this would've been harder.)
Author

There is the problem that the API currently doesn't force the code to do writes in order, but that's solvable.

There is the problem that the API currently doesn't force the code to do writes _in order_, but that's solvable.
Author

In practice immutables path does actually have the writes happening in the correct order, so possibly that should just work.

In practice immutables path does actually have the writes happening in the correct order, so possibly that should just work.
Author

TODO for branch-in-progress:

  • Unit tests for new data structure
  • Validate there is still backpressure
  • Update #3787 to note new place that needs tuning
  • See if equivalent improvements are possible for mutables
TODO for branch-in-progress: * Unit tests for new data structure * Validate there is still backpressure * Update #3787 to note new place that needs tuning * See if equivalent improvements are possible for mutables
Author

Did all the above (mutable uploads seem more reasonable apriori? they at least don't do writes per tiny bit of metadata). Some quantitative results: for allmydata.test.test_system.HTTPSystemTest.test_filesystem, number of writes during immutable upload goes from 530 to 60. So that's good! It does not however make a meaningful dent in run time... so there are likely other overly chatty interactions.

Initial thought is that downloads are maybe using too small of a chunk size, so will investigate that next.

Did all the above (mutable uploads seem more reasonable apriori? they at least don't do writes per tiny bit of metadata). Some quantitative results: for `allmydata.test.test_system.HTTPSystemTest.test_filesystem`, number of writes during immutable upload goes from 530 to 60. So that's good! It does not however make a meaningful dent in run time... so there are likely other overly chatty interactions. Initial thought is that downloads are maybe using too small of a chunk size, so will investigate that next.
GitHub <noreply@github.com> commented 2022-12-05 19:06:01 +00:00
Owner

In 1eba202c/trunk:

Merge pull request #1231 from tahoe-lafs/3939-faster-http-protocol

Faster http protocol, part 1 (and maybe faster Foolscap too, while we're at it)

Fixes ticket:3939
In [1eba202c/trunk](/tahoe-lafs/trac-2024-07-25/commit/1eba202c08590588bf9ec9e3e04375286f4a4327): ``` Merge pull request #1231 from tahoe-lafs/3939-faster-http-protocol Faster http protocol, part 1 (and maybe faster Foolscap too, while we're at it) Fixes ticket:3939 ```
tahoe-lafs added the
fixed
label 2022-12-05 19:06:01 +00:00
GitHub <noreply@github.com> closed this issue 2022-12-05 19:06:01 +00:00
Sign in to join this conversation.
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#3939
No description provided.