URL formats for HTTP-based storage server #1565


warner · 2011-10-17T20:43:53Z

warner commented

2011-10-17 20:43:53 +00:00

Ticket #510 is about speaking to storage servers with mostly-plain HTTP. One
piece of this is deciding what the URLs should look like. Downloading a share
from the storage server should be a simple HTTP "GET", using a Range:
header to fetch less than the whole share. But we also need ways to discover
which shares are available for download, and eventually ways to upload data
to the server too.

Here's the starting point that I implemented in my prototype (which still
uses Foolscap and get_buckets() to discover shares):

* `GET /storage/imm/SI/%(storage_index)s/share/%(shnum)d`: retrieves data
  from the given share. Normal downloads use e.g.
  `Range: bytes=87418-131108,422601-422664,423593-423656` to fetch a bunch of
  spans.
* `GET /storage`: this currently returns a human-readable page describing
  the state of the storage server.
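To make the share-download URL and the multi-span `Range:` header concrete, here is a minimal sketch of how a client might build such a request. The function names, the base URL, and the example storage index string are all made up for illustration, and the spans are assumed to be `(offset, length)` pairs; nothing here is taken from the prototype's code.

```python
import urllib.request


def immutable_share_url(base_url, storage_index, shnum):
    """Build the share-download URL from the template in this ticket."""
    return "%s/storage/imm/SI/%s/share/%d" % (
        base_url.rstrip("/"), storage_index, shnum)


def range_header(spans):
    """Turn (offset, length) spans into a Range header value.

    HTTP byte ranges use inclusive end offsets, so a span that starts at
    `offset` and covers `length` bytes ends at offset + length - 1.
    """
    parts = []
    for offset, length in spans:
        if offset < 0 or length <= 0:
            raise ValueError("bad span: %r" % ((offset, length),))
        parts.append("%d-%d" % (offset, offset + length - 1))
    return "bytes=" + ",".join(parts)


def build_share_request(base_url, storage_index, shnum, spans):
    """Return an unsent GET request for the given spans of one share."""
    url = immutable_share_url(base_url, storage_index, shnum)
    return urllib.request.Request(url, headers={"Range": range_header(spans)})
```

A server that honors a multi-span request would normally answer with `206 Partial Content` and a `multipart/byteranges` body, so the client also has to be prepared to split the response back into spans.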

The next steps:

* `GET /storage/imm/SI/%(storage_index)s/shares`: return a JSON list of
  share numbers
* `GET /storage/imm/SI/%(storage_index)s/all_shares`: return a JSON
  dictionary mapping share number to a read data vector. The same spans are
  returned for all shares. This collapses the Do-You-Have-Block query with
  the initial data fetch, allowing one-round-trip downloads.
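As a rough illustration of what a client would do with these two responses, here is a sketch of decoding them. The wire details are assumptions, not something this ticket decides: JSON object keys are always strings, so share numbers are converted back to integers, and since JSON cannot carry raw bytes, each chunk of the read data vector is assumed to be base64-encoded. The function names are hypothetical.

```python
import base64
import json


def decode_share_list(body):
    """Parse the `shares` response: a JSON list of share numbers."""
    return sorted(int(shnum) for shnum in json.loads(body))


def decode_all_shares(body):
    """Parse the `all_shares` response into {shnum: [bytes, ...]}.

    Assumes (hypothetically) that each span's data arrives as a base64
    string, in the same order as the spans that were requested.
    """
    raw = json.loads(body)
    return {
        int(shnum): [base64.b64decode(chunk) for chunk in chunks]
        for shnum, chunks in raw.items()
    }
```

With a mapping like this, the downloader learns which shares the server holds and gets its first blocks of data from the same response.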

I put "imm" into the URL because the current storage server treats immutable
and mutable shares very differently (they have different container formats).
It's not trivial to take an SI and switch on the type of share that it points
to. It might be cleaner to fix the server to handle this well, and then
remove the "imm" from the URL. OTOH, it might be better to leave them
distinct.

We need similar URLs for reading from mutable shares; they can probably be
the same but with "mut" instead of "imm".

We'll need POST URLs for uploading files and modifying mutable shares, as
well as adding/renewing leases and other storage server methods. The request
bodies will be more complicated since they'll need authorization signatures
or something. But the basic URL target could be:

* `POST /storage/imm/SI/%(storage_index)s/shares/%(shnum)d`: start
  uploading the given share. Return 302 FOUND if the share already exists.
  The upload can be spread across multiple requests, with a "finished" flag
  on the last request. This might involve returning an "upload token" which
  subsequent requests must reference.
* `POST /storage/mut/SI/%(storage_index)s/shares/%(shnum)d`: modify the
  given mutable share. The body will probably be a signed serialized JSON
  modification request, basically a write-vector, along with a test-vector or
  other collision-avoidance scheme.
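The test-vector plus write-vector idea for mutable shares can be sketched as a test-and-set over the share's bytes. This is only an illustration under stated assumptions: the JSON field names (`test`, `write`), the base64 encoding, and the equality-only test are invented here, and signing is left out entirely.

```python
import base64
import json


def encode_modification(testv, writev):
    """Serialize (offset, bytes) test and write vectors as JSON.

    The "test"/"write" field names and base64 encoding are hypothetical.
    """
    def b64(data):
        return base64.b64encode(data).decode("ascii")
    return json.dumps({
        "test": [[offset, b64(specimen)] for offset, specimen in testv],
        "write": [[offset, b64(data)] for offset, data in writev],
    })


def decode_modification(body):
    """Inverse of encode_modification()."""
    request = json.loads(body)
    testv = [(offset, base64.b64decode(s)) for offset, s in request["test"]]
    writev = [(offset, base64.b64decode(d)) for offset, d in request["write"]]
    return testv, writev


def apply_modification(share, testv, writev):
    """Apply writev to share only if every test in testv matches.

    Returns (True, new_share) on success, or (False, share) unchanged if
    any tested region differs from its expected specimen.
    """
    for offset, specimen in testv:
        if share[offset:offset + len(specimen)] != specimen:
            return False, share
    buf = bytearray(share)
    for offset, data in writev:
        end = offset + len(data)
        if end > len(buf):
            # Writes past the current end grow the share, zero-filling gaps.
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data
    return True, bytes(buf)
```

The test vector is what provides collision avoidance: a writer that read an older version of the share fails the test and must re-read before trying again.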

All of this presumes that Accounting is not being enforced on read access. At
least one of the designs I've drawn up offers read=False control, as a
stick for the storage operator to use against a client who doesn't pay their
bills (but still less drastic than store=False, which deletes all their
data). To enforce read=False, the GETs would need to be authorized,
which either involves adding an extra signature header, or implementing them
with a POST instead (and putting the signature in the request body).
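For the "extra signature header" option, here is a minimal sketch of what authorizing a GET could look like. It substitutes a shared-secret HMAC for whatever signature scheme an Accounting design would really use, and the header name and the signed message layout (method plus path) are invented for illustration.

```python
import hashlib
import hmac

# Hypothetical header name; this ticket does not define one.
SIGNATURE_HEADER = "X-Storage-Signature"


def sign_request(secret, method, path):
    """Return a hex HMAC-SHA256 over the request method and path."""
    message = ("%s %s" % (method.upper(), path)).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def is_authorized(secret, method, path, headers):
    """Server-side check that the request carries a valid signature."""
    presented = headers.get(SIGNATURE_HEADER, "")
    expected = sign_request(secret, method, path)
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected.encode("utf-8"))
```

A scheme like this keeps reads as plain GETs with `Range:` support, at the cost of one more header per request; a real design would also need to guard against replay.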


warner added the

labels 2011-10-17 20:43:53 +00:00

warner added this to the eventually milestone 2011-10-17 20:43:53 +00:00

exarkun commented

2018-07-30 14:33:08 +00:00

This has been resolved as part of https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2925


exarkun added the