implement new publish/subscribe introduction scheme #271

Closed
opened 2008-01-11 01:30:00 +00:00 by zooko · 7 comments

Implement the new publish/subscribe introduction scheme we've been discussing
recently:

  • enumerate the services which can be published and queried for:

    • upload storage server (ones which will accept new shares)
    • download storage server (ones which will let you read shares)
      • (soon-to-be-decommissioned storage servers will be download-only)
    • helpers and other introducers may be added to this list, but we need
      to talk about that more first.. I'm not sure about it.
  • all nodes should have an IntroducerClient, as an attribute of the Node
    instance.

  • to publish a service, do e.g.:

    if self.get_config("offer_storage"):
        ss = StorageServer()
        ss.setServiceParent(self)
        self.introducer.publish(ss, "upload_storage")
        self.introducer.publish(ss, "download_storage")

  • if the node cares about a particular service, it must register that intent
    at startup:

    if want_storage_servers:
        self.introducer.subscribe_to("upload_storage")
        self.introducer.subscribe_to("download_storage")

  • then, to access a service, there are two APIs: one that does permutation
    (for upload/download) and one which just returns a flat list (mostly for
    the welcome page):

    ppeers = self.introducer.get_permuted_peers("download_storage", storage_index)
    # ppeers is a list of (permuted_peerid, peerid, RemoteReference)
    all_peers = self.introducer.get_peers("upload_storage")

  • add config flags to disable upload, and to disable storage completely.
    Client installs (i.e. those created by py2exe) will disable storage
    service by default. Storage-only nodes won't subscribe to hear about other
    storage nodes.
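
The API outlined in the list above can be sketched as a small in-memory stand-in. This is only an illustration under stated assumptions, not the real IntroducerClient: the class name, the `announce` hook, the dict-based storage, and the choice of SHA-1 over `peerid + storage_index` as the permutation are all hypothetical, and a plain string stands in for the RemoteReference.

```python
import hashlib


class SketchIntroducerClient:
    """In-memory stand-in for the proposed IntroducerClient (hypothetical)."""

    def __init__(self):
        self.published = {}        # service_name -> local service object
        self.subscriptions = set()
        self.peers = {}            # service_name -> {peerid: rref}

    def publish(self, service, service_name):
        # Record that this node offers `service` under `service_name`.
        self.published[service_name] = service

    def subscribe_to(self, service_name):
        # Register interest at startup; only subscribed services accumulate peers.
        self.subscriptions.add(service_name)
        self.peers.setdefault(service_name, {})

    def announce(self, service_name, peerid, rref):
        # Hypothetical hook: invoked when the introducer tells us about a peer.
        # Announcements for services we never subscribed to are ignored.
        if service_name in self.subscriptions:
            self.peers[service_name][peerid] = rref

    def get_peers(self, service_name):
        # Flat list of (peerid, rref), e.g. for the welcome page.
        known = self.peers.get(service_name, {})
        return sorted(known.items(), key=lambda item: item[0])

    def get_permuted_peers(self, service_name, storage_index):
        # List of (permuted_peerid, peerid, rref), ordered by permuted_peerid.
        # Assumption: the permutation is SHA-1 of peerid + storage_index (bytes).
        results = []
        for peerid, rref in self.peers.get(service_name, {}).items():
            permuted = hashlib.sha1(peerid + storage_index).digest()
            results.append((permuted, peerid, rref))
        return sorted(results, key=lambda item: item[0])
```

In this sketch the peer ordering depends on the storage index, so different files see the same peers in different orders, while `get_peers` always returns the same flat list.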

Other things to think about:

  • get_permuted_peers could return a Deferred (which would make it easier for
    us to create a special kind of helper which knows about peers for you), or
    return an iterator, or both, somehow. To actually make this useful is
    non-trivial (to reduce the memory footprint, you'd want an iterator that
    yields Deferreds, but that might also impose a stupidly large number of
    roundtrips to a query). We should probably wait until we identify a need
    for this before implementing any part of it.
  • This API implies a publish/subscribe model in which the subscription
    accumulates knowledge about peers, and the actual point of use (i.e.
    upload or download) samples whatever peers have been acquired by that
    time. This might not be the best approach.
zooko added the code-network, major, enhancement, 0.7.0 labels 2008-01-11 01:30:00 +00:00
zooko added this to the 0.8.0 (Allmydata 3.0 Beta) milestone 2008-01-11 01:30:00 +00:00
zooko self-assigned this 2008-01-11 01:30:00 +00:00

In a separate but related topic, we were talking about the possible utility
of different "classes" of introduction: a node could publish some object in
one category ("storage servers") and a different object in some other
category ("upload helpers").

It occurred to me that it might be useful to have "storage servers for
upload" and "storage servers for download" to be separate categories. One use
would be a way to deal with the #269 mistake (in which I accidentally caused
most of our storage servers to generate new keys and therefore change
nodeids). We could resurrect the old nodeids in a different place, and move
all their old shares to be served by those nodes, thus making the mutable
slots available once more. But we'd like those nodes to only stick around
long enough to allow clients to migrate their data onto the real servers, so
we'd want to prevent new shares from being uploaded to them. The only tool we
have at the moment is to set size_limit=0, but sizes aren't being enforced
for mutable slots yet. But, if these "read-only" nodes were published as
download storage servers (and not upload storage servers), then the upload
and download code could use slightly different peersets, and we'd get the
desired behavior.

Likewise, if we have a storage server which is scheduled to be decommissioned
(say, the hard drive is starting to have soft errors, and we've begun the
process of migrating shares off of it but have not yet finished the job), it
might be nice to allow it to be available for reading but not accept any new
shares. Not publishing it as an upload server would be the most efficient way
to prevent clients from trying to send shares to it.
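
A minimal sketch of how the two categories would yield different peersets. The announcement table and the `peerset` helper are hypothetical names for illustration; the service names are the ones proposed in the ticket description.

```python
# Hypothetical announcements: peerid -> set of service names it publishes.
announcements = {
    "server1": {"upload_storage", "download_storage"},
    "server2": {"upload_storage", "download_storage"},
    # A resurrected or soon-to-be-decommissioned server: readable only.
    "old-nodeid": {"download_storage"},
}


def peerset(announcements, service_name):
    """Return the set of peerids that publish the given service."""
    return {
        peerid
        for peerid, services in announcements.items()
        if service_name in services
    }


upload_peers = peerset(announcements, "upload_storage")
download_peers = peerset(announcements, "download_storage")
```

Here the download code can still reach `old-nodeid`, while the upload code never learns about it and so never tries to place new shares there.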

Author

Rob pointed out that this generalized pubsub mechanism might be a good way to meet upload helpers.

While scrubbing the kitchen floor with Amber on Saturday, I figured out that this might be a good way to meet other introducers, leading to #68 -- "implement distributed introduction, remove Introducer as a single point of failure".

Author

merging in #168


Updated summary and description to specify the new introduction scheme we're planning to implement.

warner changed title from subscriber-only introducer client to implement new publish/subscribe introduction scheme 2008-01-23 17:49:29 +00:00

we've finished the first step, in changeset:7421d99f186ac96d and changeset:3aceb6be1e797e50

  • allow clients to send a hello() to the introducer with my_furl=None to indicate that they
    do not wish to publish anything
  • if BASEDIR/no_storage is present, do not publish anything

Rob will change the config-wizard (used by the Windows installer) to touch this file at config time, and that should be enough to accomplish the primary goal: make customer nodes not offer storage servers.

The next step will be to actually split the introducer into separate publish+subscribe methods.
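
The first step described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the changesets: the function name `furl_to_publish` is hypothetical, and the only behavior taken from the comment is that a `no_storage` file in BASEDIR means nothing is published and that `my_furl=None` signals this to the introducer.

```python
import os


def furl_to_publish(basedir, storage_furl):
    """Return the value to send as my_furl in hello().

    If BASEDIR/no_storage exists, return None so that nothing is published;
    otherwise return the node's storage FURL unchanged.
    """
    if os.path.exists(os.path.join(basedir, "no_storage")):
        return None
    return storage_furl
```

Touching the file at config time, as the config-wizard change would do, is then enough to switch a node to subscribe-only behavior without any other configuration.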

zooko was unassigned by warner 2008-02-02 02:13:59 +00:00
warner self-assigned this 2008-02-02 02:13:59 +00:00

I've implemented the next step: splitting the introducer into separate publish and subscribe methods. The new introducer is more service-centric: you publish specific services (like "storage") rather than publishing the client as a whole.

This will cause a compatibility bump, so I haven't quite pushed it yet, but I ought to by the end of the day.


changes are pushed, and the test grid has been upgraded. The only remaining issue is what to do with the old introducer-ish functionality of distributing the default encoding parameters, and I'm ok with zooko's suggestion to just leave this out.

closing this ticket, yay!

warner added the fixed label 2008-02-06 01:23:57 +00:00
Reference: tahoe-lafs/trac-2024-07-25#271