refactor storage_client.py, use IServer objects instead of rrefs #1363

Closed
opened 2011-02-20 21:01:01 +00:00 by warner · 38 comments

There's an internal cleanup I've been meaning to do for a while. I started it in source:src/allmydata/storage_client.py a few years ago (some of the TODO notes at the top indicate my plans), but didn't follow through. The goal is for the client to manage a collection of "IServer" objects, each of which represents a storage server. These objects each hold a RemoteReference and track metadata about the server (like nickname, versions, etc).

As we start to handle other kinds of servers, these objects will be a place to abstract out the common behavior. The first change will be to support #466, as signed announcements result in a different notion of "server id" than unsigned ones. The IServer object will get some methods that tell the caller what write-enabler seed or lease-renewal seed or peer-selection seed to use.

The next change will be to move the details of interacting with the share into IServer, such as the actual callRemote method names. Then, when we add an HTTP-based server (which would use GET with a Range: header), the uploader/downloader doesn't need to know quite so much about the server type.

This ticket is to track the refactoring progress and host the patches for review.
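
As a rough illustration of the shape described above, here is a minimal sketch of a server object that holds a remote reference plus metadata and hands out seeds. The class name and the three seed method names are hypothetical, chosen only for this example; they are not taken from storage_client.py, and returning the serverid for every seed is just one possible policy.

```python
class SketchServer:
    """Illustrative stand-in for an IServer-style object (not the real class)."""

    def __init__(self, serverid, rref, nickname="", version=None):
        self._serverid = serverid        # bytes identifying the server
        self._rref = rref                # stands in for the RemoteReference
        self._nickname = nickname
        self._version = version or {}

    def get_rref(self):
        return self._rref

    def get_nickname(self):
        return self._nickname

    def get_version(self):
        return self._version

    # Hypothetical seed accessors: callers ask the server object which
    # seed to use instead of deriving one from the serverid themselves.
    def get_selection_seed(self):
        return self._serverid

    def get_lease_seed(self):
        return self._serverid

    def get_write_enabler_seed(self):
        return self._serverid
```

With accessors like these, a server announced in a different way could return different seeds without the uploader or downloader changing.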

warner added the code, major, task, 1.8.2 labels 2011-02-20 21:01:01 +00:00
warner added this to the undecided milestone 2011-02-20 21:01:01 +00:00
warner self-assigned this 2011-02-20 21:01:01 +00:00
Author

Attachment 1363-patch1.diff (40400 bytes) added

first refactoring step

warner was unassigned by zooko 2011-02-20 21:40:00 +00:00
zooko self-assigned this 2011-02-20 21:40:00 +00:00

Let's avoid the word "peer". It usually means both more and less than we really mean in this code, and we've already changed most or all of our documentation to use "server" or some other more specific word instead of what we used to call "peer".

So peer_selection_index should probably be renamed server_selection_index. This doesn't have to be a blocker for this patch or for this branch, but please let's get consensus on terminology whenever possible for future convergence.


Why are we using sha1 for testing permutation instead of sha256? This patch didn't introduce this behavior so it isn't a blocker for this patch, but it is weird.

Author

cool, I'll add "server_selection_index" to my TODO list for this ticket. My only problem with it is the acronym collision: we use "storage index" and "SI" in lots of places, and it'd be nice to have something for this that didn't have quite so many Ss and Is in it. But yeah, let's talk it over on the list.

SHA1: hmm, good question. We started out using stdlib sha.new (which is SHA1), and never changed it (because to change it would mess up server-selection ordering for existing filecaps). It doesn't need to be secure, as it's just a load-balancing tool, but it would be nice to use the same hash function everywhere. Maybe when #466 gets us to the point of defining a new server-selection-seed (which could be a new name for peer-selection-index), we could say that old SSSs use SHA1 and new SSSs use SHA256.
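
For readers unfamiliar with the permutation being discussed, here is a minimal sketch of hash-based server ordering. It assumes the order comes from sorting servers by SHA1 of the selection index concatenated with the serverid; the function name and the exact input layout are assumptions for illustration, not a copy of the real code.

```python
import hashlib


def permuted_order(selection_index, serverids):
    """Return serverids sorted by SHA1(selection_index + serverid).

    Both arguments are bytes (or lists of bytes). The hash only spreads
    load across servers, so it does not need to be collision-resistant.
    """
    return sorted(
        serverids,
        key=lambda sid: hashlib.sha1(selection_index + sid).digest(),
    )
```

Because the order depends only on the hash inputs, swapping SHA1 for SHA256 would reorder the servers for every existing file, which is the compatibility concern raised above.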


Okay I've now read through all the changes to test files and didn't see anything wrong.


get_known_servers() returns a new copy of a list, sorted (using the sorted function), but the only callers of get_known_servers() are:

  • get_connected_servers(), which makes a new frozenset containing a copy of the list, and
  • the welcome page in [web/root.py]source:trunk/src/allmydata/web/root.py?annotate=blame&rev=4529#L248.

So, I suggest that get_known_servers() should return a list (or a frozenset for added safety -- David-Sarah and I once spent a long, miserable night tracking down an elusive bug just before a major release which turned out to be due to a function having side-effects on a mutable list of servers that had been passed into it), and that web/root.py should sort it itself.

(Also that in the future web/root.py offer the user controls to sort the list of servers by different columns. :-))

This doesn't need to be a blocker for this patch, but it does look like the kind of thing that Brian might want to tweak.
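
A minimal sketch of the suggestion, with the accessor returning an immutable set and the display code doing its own sorting. SketchBroker, add_server and welcome_page_rows are hypothetical names for this example, and servers are reduced to (serverid, nickname) pairs.

```python
class SketchBroker:
    """Illustrative stand-in for the object that tracks known servers."""

    def __init__(self):
        self._servers = {}  # serverid -> nickname

    def add_server(self, serverid, nickname):
        self._servers[serverid] = nickname

    def get_known_servers(self):
        # A frozenset cannot be mutated by a caller, so no function can
        # have side effects on the broker's view of the servers.
        return frozenset(self._servers.items())


def welcome_page_rows(broker):
    # The display code chooses its own ordering (here: by nickname).
    return sorted(broker.get_known_servers(), key=lambda pair: pair[1])
```

Sorting in the caller also leaves room for the page to offer other sort columns later.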


Okay, I've reviewed 1363-patch1.diff ! Modulo the comments above, +1.

(If you don't mind, remove the reviewed keyword after you land it. I seem to recall that you had some different protocol for signalling whether a patch was ready to land or not or landed or not -- I forget.)

zooko removed their assignment 2011-02-20 23:38:10 +00:00
warner was assigned by zooko 2011-02-20 23:38:10 +00:00
Author

Ok, 1363-patch1.diff landed in changeset:ffd296fc5ab8007f. Thanks for the quick review!

I'll attach a -patch2 once I'm ready for the next stage of refactoring.

Author

Attachment 1363-patch2.dpatch (167890 bytes) added

bundle of refactoring patches

Author

ok, -patch2 is ready for review. This one is a darcs patch bundle with 20 individual patches, intended to isolate each change for easier review. Many of them are improving internal names, like referring to "servers" instead of "peers", or fixing the uploader to clearly distinguish between a Server object and a ServerTracker (which were sufficiently confusing before that we had a bunch of assert isinstance(server, ServerTracker) checks). There's also some dead-code removal, which made subsequent refactoring easier.

The bulk of the changes are intended to reduce the use of get_serverid(). Previously, a lot of the code has been passing around (tubid, rref) tuples: the goal is to pass around IServer objects instead. The first step is to replace those tuples with (s.get_serverid(), s.get_rref()), but the second step (which this patch starts to implement) is to push that change further down into the code, delaying the conversion from IServer to serverid until the last possible moment, and in many cases not doing it at all. This means that many data structures which were previously indexed by serverid are now indexed by IServer instance.

This patch doesn't complete the job, but it gets a significant amount of the way there. It doesn't touch the mutable code at all: I'm hoping to review and land #393 before attempting any refactoring of mutable/*.py, to make life easier.

The tree should pass all tests and be pyflakes clean after applying each patch in this series.

note to self: I still need to implement zooko's recommendations from comment:82727 in a later patch.
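
To make the direction concrete, here is a minimal sketch of a data structure indexed by server instance instead of by serverid. get_serverid(), get_rref() and get_name() are names used in this ticket, but the class body, the helper functions, and the name format are assumptions for illustration.

```python
class SketchServer:
    """Illustrative stand-in for an IServer-style object."""

    def __init__(self, serverid, rref=None):
        self._serverid = serverid
        self._rref = rref

    def get_serverid(self):
        return self._serverid

    def get_rref(self):
        return self._rref

    def get_name(self):
        # Short printable form for logs; the format is an assumption.
        return self._serverid.hex()[:8]


def record_share(sharemap, server, shnum):
    # The map is keyed by the server object itself; plain instances
    # hash by identity, so no serverid is needed as a key.
    sharemap.setdefault(server, set()).add(shnum)


def describe(sharemap):
    # Conversion to a printable name happens only at the display edge.
    return {s.get_name(): sorted(shnums) for s, shnums in sharemap.items()}
```

Code in the middle of the pipeline then never asks for a serverid at all; only logging and display code turn the object into a string.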

warner removed their assignment 2011-02-27 01:26:27 +00:00
zooko was assigned by warner 2011-02-27 01:26:27 +00:00
davidsarah commented 2011-02-27 03:56:37 +00:00
Owner

Reviewing:

  • +1 on "test_client.py, upload.py:: remove KiB/MiB/etc constants, and other dead code". I like negative-code patches :-)
  • +1 on "storage_client.py: clean up test_add_server/test_add_descriptor, remove .test_servers"
  • +1 on "upload.py: more tracker-vs-server cleanup", with a nitpick that "to a set of serverids which claim to already have the share" should be "to a set of serverids for servers that claim to already have the share".

"already_servers" should be "already_serverids".

"contacted_servers" and "contacted_servers2" aren't very good variable names. I suggest s/contacted_servers/worth_asking_servers/ and s/contacted_servers2/have_asked_servers/, and similarly for trackers.

Will look at the rest of the patch tomorrow.

davidsarah commented 2011-02-28 02:37:46 +00:00
Owner
  • +1 on "happinessutil.py: server-vs-tracker cleanup", "test_upload.py: server-vs-tracker cleanup", "test_upload.py: factor out FakeServerTracker"

There are still lots of instances of "peer" in the source after applying these patches. Many of these are in the mutable code which I know you haven't got to yet, but some others in the following files look like they might sensibly be changed first:

  • interfaces.py
  • immutable/{encode.py, layout.py, offloaded.py}
  • happinessutil.py
  • hashutil.py
  • storage_client.py
davidsarah commented 2011-02-28 02:49:35 +00:00
Owner
  • +1 on "happinessutil.py: finally rename merge_peers to merge_servers"
  • +1 on "upload.py: rearrange _make_trackers a bit, no behavior changes"
  • "add remaining get_* methods to storage_client.Server, NoNetworkServer, and ...":
    • I'd use get_name() and get_longname() instead of name() and longname().
    • Are there references to the .serverid attribute of NativeStorageServer from elsewhere, or can it be deleted?
Author

heh, one step at a time. I'll add those other peer->server items to the TODO list, though.

Yeah, I don't particularly like name/longname either. I'm using name() as a short placeholder until I figure out what the method really wants to be called: my goal was to turn base32.b2a(serverid) and idlib.shortnodeid_b2a(serverid) into something like s.name(), and I was previously using s.get_short_description() which didn't exactly roll off the tongue. s.get_name() sounds better, but I'm wondering if something even more descriptive might show up once it's only ever being used in log.msg and webapi-display contexts.

But I'll add a patch to use get_name()/get_longname() for now, I only see about 30 uses of it.

And on .serverid, I don't think so, but I'm putting off removing that until I remove get_serverid too, since the goal is to redefine the concept of "serverid" altogether:

  • step one: change as much as possible to use more accurate properties like "server permutation seed", "lease secret seed", and human-display-friendly names.
  • step two: remove .serverid/.get_serverid() and fix what breaks.
  • step three: enjoy brief moment of peace while "serverid" is safely banished
  • step four: re-introduce the term to mean "public key which signed the server's Introducer announcement", since I think that's the best claimant to the term "serverid", and I don't want to switch the semantics until I'm sure there are no remaining users of the old form.
Author

Incidentally, if you use https://github.com/warner/tahoe-lafs to grab a copy of my "pass-server" branch, and compare its tip against the "1363-p2" tag, you'll see the changes I've made since the -p2 attachment which implement your recommendations. I'll hold off on adding another patch bundle to this ticket until you've reviewed the existing one and I've landed it.

I'm trying to not go too crazy with the refactoring/renaming, because that'll induce a lot of merge work with the many other project branches I (and others) have hanging around, but the stuff in this ticket ties directly into the #466 work, so I wanted to do them in the right order. So I'm going to be conservative about what I change until some of that other stuff gets landed (especially including #393).

davidsarah commented 2011-02-28 18:43:52 +00:00
Owner

(https://github.com/warner/tahoe-lafs/commit/cfdbf66ffd28bdd679e6f1fc5caf3385ac5d2385):

  • "We assign each servers/trackers into one three lists." -> "We assign the tracker for each server into one of three lists."
  • The doc comment for set_shareholders no longer corresponds to the formal parameters. It should say something like "@param holders: a pair (upload_trackers, already_serverids), where".
Author

what about -p2? can I land it?

davidsarah commented 2011-03-23 22:49:38 +00:00
Owner

Yes, fine to land -p2. Some nitpicks:

  • Add a blank line between make_server and make_servers in test_download.py
  • What are the two XXX's added in allmydata/immutable/downloader/fetcher.py (patch lines 3340 and 3367)?
Author

Great, thanks, -p2 has been landed. I'll let you know when I've got a -p3 to review (probably after landing #393 MDMF), and I'll incorporate your suggestions.

Author

Attachment 1363-p3.dpatch (90258 bytes) added

next batch of refactoring patches

Author

ok, here's the next bundle. I'm getting close to the limit of what I can clean up without overlapping with MDMF, but there are a few more I might try to work on. Please review so I can land this puppy.

warner was unassigned by zooko 2011-06-23 20:47:28 +00:00
zooko self-assigned this 2011-06-23 20:47:28 +00:00
tahoe-lafs modified the milestone from undecided to 1.9.0 2011-07-16 20:32:54 +00:00

Still working on this! Will prioritize it.


Worked on this in the car on the way here last week. Planning to work on this and #1385 on the car ride home tomorrow (about ten hours, with one co-driver and two children in the car), in order to make the deadline for new-feature patches for v1.9, which is tomorrow.


1363-p3.dpatch reviewed. This is all really good stuff; I'm glad to see this sort of clean-up branch. I'm sorry it took me so long to review it. I intend to really elevate the priority of reviewing patches in my day to day life so that whenever Brian posts a new patch `review-needed`, I drop everything and review it right away.

Patches that get +1 from me and I intend to commit them to trunk soon:

  • [test_immutable.Test: rewrite to use NoNetworkGrid, now takes 2.7s not 97s](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L548) By the way, on my Macbook Pro, `allmydata.test.test_immutable.Test` takes 8s, not 97s! After the patch "test_immutable.Test: rewrite to use NoNetworkGrid" then it takes about 2s on my system. This isn't an issue with the patch, but it may indicate there is an issue with your system. Improving the speed of the tests from 8s to 2s is valuable even if your system could be changed to run the old tests in a mere 8s. You could look at the timings of the buildslaves for comparison, e.g.: [FranXois lenny-armv5tel](http://tahoe-lafs.org/buildbot/builders/FranXois%20lenny-armv5tel/builds/486/steps/test/logs/timings), [Brian ubuntu-linode](http://tahoe-lafs.org/buildbot/builders/Brian%20ubuntu-i386%20linode/builds/252/steps/test/logs/timings), [Arthur lenny-c7-32bit](http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/739/steps/test/logs/timings), [FreeStorm WinXP-x86](http://tahoe-lafs.org/buildbot/builders/FreeStorm%20WinXP-x86%20py2.6/builds/543/steps/test/logs/timings).
  • [remove now-unused ShareManglingMixin](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L842); Would have been better included in the previous patch. I'll rerecord them together.
  • [apply zooko's advice: storage_client get_known_servers() returns a frozenset, caller sorts](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L40)
  • [DownloadStatus.add_known_share wants to be used by Finder, web.status](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1320); Wouldn't it be better to open a ticket than to put an `XXX` comment in the source code? Otherwise +1 on this patch.
  • [replace IServer.name() with get_name(), and get_longname()](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L253) and [upload.py: apply David-Sarah's advice rename (un)contacted(2) trackers to first_pass/second_pass/next_pass](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L76); +1, but I rerecorded them to use `darcs replace` instead of hunk editing, because that avoided a lot of unnecessary merge conflicts with Kevan's #1382 branch. I'll attach my rerecorded patch to this ticket. I also took notes during the process of merging these with #1382, which notes I'll post to tahoe-dev later, explaining why `darcs replace` was so valuable in this case. Also [in set_shareholders()](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L209) the docstring is now wrong in describing the "upload_trackers" and "already_serverids". I'll change my rerecord of the patch to fix that.

Patches with issues:

  • [remove get_serverid from DownloadStatus.add_request_sent and customers](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1145) and [remove get_serverid from DownloadStatus.add_dyhb_sent and customers](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1013); These two had lots of merge conflicts with trunk. I rebased these patches onto the current head of trunk and will attach them to this ticket for Brian (if available) to review.
  • [web/status.py: remove spurious whitespace, no code changes](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1272); merge conflicts; an unimportant patch; I ran `M-x whitespace-cleanup` on this file and emacs made only one change (removing a trailing blank line at end of file).
  • [remove nodeid from WriteBucketProxy classes and customers](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1334); Shouldn't the `rref` parameter to `make_write_bucket_proxy()` be removed in favor of using the `.get_rref()` method of the `server` argument? Otherwise +1.
  • [remove get_serverid() from ReadBucketProxy and customers, including Checker](http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/1363/1363-p3.dpatch#L1515); likewise, shouldn't the `ReadBucketProxy` stop accepting an `rref` argument now that it has a `server` argument to its constructor? Otherwise +1.

If Brian (or someone) tells me that these two patches should go in as-is without removing the `rref` parameter, I'm okay with that.

Attachment remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch (29063 bytes) added


Okay, of the five patches with issues: remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch is my rebase of the first two, the whitespace one we can skip, and I want to hear from Brian or someone that the last two are okay as-is, or else get an updated version that removes the rref param.

review-needed!
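
To spell out the rref question, here is a minimal sketch of a proxy that takes only the server object and obtains the remote reference from it. The real make_write_bucket_proxy() takes more arguments than shown; this simplified signature and both class bodies are assumptions for illustration.

```python
class SketchServer:
    """Illustrative stand-in for an IServer-style object."""

    def __init__(self, rref):
        self._rref = rref

    def get_rref(self):
        return self._rref


class SketchWriteBucketProxy:
    """Simplified proxy: no separate rref parameter."""

    def __init__(self, server):
        self._server = server
        # The remote reference is derived from the server object, so the
        # two can never disagree.
        self._rref = server.get_rref()

    def get_rref(self):
        return self._rref


def make_write_bucket_proxy(server):
    # Hypothetical reduced signature: the caller passes only the server.
    return SketchWriteBucketProxy(server)
```

Passing both rref and server leaves room for a caller to hand in a mismatched pair; passing only the server removes that possibility.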

zooko removed their assignment 2011-08-01 19:03:32 +00:00
warner was assigned by zooko 2011-08-01 19:03:32 +00:00
zooko@zooko.com commented 2011-08-01 19:07:10 +00:00
Owner

In changeset:880758340fb827f6:

upload.py: apply David-Sarah's advice rename (un)contacted(2) trackers to first_pass/second_pass/next_pass
This patch was written by Brian but was re-recorded by Zooko (with David-Sarah looking on) to use darcs replace instead of editing to rename the three variables to their new names.
refs #1363
warner@lothar.com commented 2011-08-01 19:07:14 +00:00
Owner

In changeset:0f11d35f855ed7c0:

replace IServer.name() with get_name(), and get_longname()

This patch was originally written by Brian, but was re-recorded by Zooko to use
darcs replace instead of hunks for any file in which it would result in fewer
total hunks.
refs #1363
warner@lothar.com commented 2011-08-01 19:07:14 +00:00
Owner

In changeset:b07af5e1a2e35320:

DownloadStatus.add_known_share wants to be used by Finder, web.status
refs #1363
warner@lothar.com commented 2011-08-01 19:07:15 +00:00
Owner

In changeset:0605c77f08fb4b78:

test_immutable.Test: rewrite to use NoNetworkGrid, now takes 2.7s not 97s
remove now-unused ShareManglingMixin
refs #1363
warner@lothar.com commented 2011-08-01 19:07:15 +00:00
Owner

In changeset:feca907499070bc1:

apply zooko's advice: storage_client get_known_servers() returns a frozenset, caller sorts
refs #1363
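
A minimal sketch of the pattern this changeset describes: the broker hands back an unordered, immutable set, and each caller imposes whatever order it needs. `FakeServer` and `FakeBroker` are hypothetical stand-ins written for this illustration (in Python 3), not the classes in storage_client.py.

```python
class FakeServer:
    """Hypothetical stand-in for an IServer; only two accessors are modeled."""

    def __init__(self, serverid, name):
        self._serverid = serverid
        self._name = name

    def get_serverid(self):
        return self._serverid

    def get_name(self):
        return self._name


class FakeBroker:
    """Hypothetical stand-in for the storage broker."""

    def __init__(self, servers):
        self._servers = list(servers)

    def get_known_servers(self):
        # Unordered and immutable: callers cannot rely on (or mutate) an order.
        return frozenset(self._servers)


broker = FakeBroker([
    FakeServer(b"\x03", "c"),
    FakeServer(b"\x01", "a"),
    FakeServer(b"\x02", "b"),
])

# The caller picks the sort key it cares about.
ordered = sorted(broker.get_known_servers(), key=lambda s: s.get_serverid())
names = [s.get_name() for s in ordered]
```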
zooko@zooko.com commented 2011-08-01 19:07:16 +00:00
Owner

In changeset:dc668754793087a9:

remove get_serverid from DownloadStatus.add_block_request and customers
This is a rebase of a patch Brian originally wrote. I haven't changed the intent of that patch, just ported it to trunk.
refs #1363
zooko@zooko.com commented 2011-08-01 19:07:16 +00:00
Owner

In changeset:6b2e7985955fb312:

remove get_serverid from DownloadStatus.add_dyhb_request and customers
This patch is a rebase of a patch originally written by Brian. I didn't change any of the intent of Brian's patch, just ported it to current trunk.
refs #1363
Author

Your modified patches in
remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch
look fine, except for two hunks in the second patch that have a typo
(possibly one that I introduced in one patch and then fixed in a
subsequent one).

hunk ./src/allmydata/test/test_web.py 88
-    serverid_a = hashutil.tagged_hash("foo", "serverid_a")[:20]
-    serverid_b = hashutil.tagged_hash("foo", "serverid_b")[:20]
+    serverA = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])
+    serverB = FakeIServer(hashutil.tagged_hash("foo", "serverid_a")[:20])

that last line needs to use serverid_b, not serverid_a.

hunk ./src/allmydata/test/test_web.py 117
-    e = ds.add_block_request(serverid_a, 1, 120, 30, now+1) # left unfinished
+    e = ds.add_block_request(serverB, 1, 120, 30, now+1) # left unfinished

same issue, it needs to be "serverA".
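
To illustrate why the first typo matters, here is a rough sketch (Python 3, not the test_web.py code). `tagged_hash` below is a simplified stand-in built on hashlib, not the real `hashutil.tagged_hash` construction, and this `FakeIServer` models only `get_serverid()`. With the typo, both fake servers end up with the same serverid, so anything keyed by serverid cannot tell them apart.

```python
import hashlib


def tagged_hash(tag, value):
    # Simplified stand-in for hashutil.tagged_hash (assumption: the real
    # construction differs; only "distinct inputs give distinct ids" matters).
    return hashlib.sha256(tag + b":" + value).digest()


class FakeIServer:
    """Hypothetical minimal fake; only get_serverid() is modeled."""

    def __init__(self, serverid):
        self._serverid = serverid

    def get_serverid(self):
        return self._serverid


def count_distinct(servers):
    # Number of different serverids among the given servers.
    return len({s.get_serverid() for s in servers})


# As written in the hunk: both servers are derived from "serverid_a".
buggy_a = FakeIServer(tagged_hash(b"foo", b"serverid_a")[:20])
buggy_b = FakeIServer(tagged_hash(b"foo", b"serverid_a")[:20])

# With the fix: the second server is derived from "serverid_b".
fixed_a = FakeIServer(tagged_hash(b"foo", b"serverid_a")[:20])
fixed_b = FakeIServer(tagged_hash(b"foo", b"serverid_b")[:20])

buggy_count = count_distinct([buggy_a, buggy_b])
fixed_count = count_distinct([fixed_a, fixed_b])
```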

As for changing make_write_bucket_proxy() to take an IServer
instead of a (rref, IServer) pair: nope, the rref passed into
make_write_bucket_proxy() is an RIBucketWriter (bound to a
specific share), whereas IServer.get_rref() returns the server's
RIStorageServer (on which you use allocate_buckets() to get
an RIBucketWriter). I suppose it'd have been more obvious if the
parameter name was "bucket_rref" instead of just "rref".
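
A rough sketch of the distinction (Python 3). All classes here are hypothetical fakes, and the signatures are heavily simplified assumptions: the real `allocate_buckets()` and `make_write_bucket_proxy()` take more arguments and return more than shown. The point is only that the per-share reference and the per-server reference are different objects, so the proxy needs both.

```python
class FakeBucketWriter:
    """Stands in for an RIBucketWriter reference: bound to one share."""

    def __init__(self, storage_index, sharenum):
        self.storage_index = storage_index
        self.sharenum = sharenum


class FakeStorageServer:
    """Stands in for an RIStorageServer reference: one per server."""

    def allocate_buckets(self, storage_index, sharenums):
        # Simplified: one bucket writer per requested share number.
        return {n: FakeBucketWriter(storage_index, n) for n in sharenums}


class FakeIServer:
    """Hypothetical IServer fake; get_rref() returns the server-level reference."""

    def __init__(self, storage_rref):
        self._rref = storage_rref

    def get_rref(self):
        return self._rref


def make_write_bucket_proxy(bucket_rref, server):
    # Simplified: the proxy needs the per-share reference *and* the server.
    return {"bucket": bucket_rref, "server": server}


server = FakeIServer(FakeStorageServer())
buckets = server.get_rref().allocate_buckets(b"storage-index", [0, 1])
proxies = {n: make_write_bucket_proxy(b, server) for n, b in buckets.items()}
```

Passing only the `IServer` would leave the proxy with no way to know which share's bucket writer to talk to.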

warner@lothar.com commented 2011-08-01 23:54:24 +00:00
Owner

In changeset:550d67f51f7ebd45:

remove get_serverid() from ReadBucketProxy and customers, including Checker
and debug.py dump-share commands
refs #1363
warner@lothar.com commented 2011-08-01 23:54:25 +00:00
Owner

In changeset:3668cb3d068b7f3a:

remove nodeid from WriteBucketProxy classes and customers
refs #1363
zooko added the fixed label 2011-08-02 00:00:36 +00:00
zooko closed this issue 2011-08-02 00:00:36 +00:00
Author

note: changeset:5bf1ffbc879cf082 has some more work along these lines

Brian Warner <warner@lothar.com> commented 2016-08-26 21:48:40 +00:00
Owner

In 54f974d/trunk:

make IServer.get_serverid() use pubkey, not tubid

This is a change I've wanted to make for many years, because when we get
to HTTP-based servers, we won't have tubids for them. What held me back
was that there's code all over the place that uses the serverid for
various purposes, so I wasn't sure it was safe. I did a big push a few
years ago to use IServer instances instead of serverids in most
places (in #1363), and to split out the values that actually depend upon
tubid into separate accessors (like get_lease_seed and
get_foolscap_write_enabler_seed), which I think took care of all the
important uses.

There are a number of places that use get_serverid() as dictionary key
to track shares (Checker results, mutable servermap). I believe these
are happy to use pubkeys instead of tubids: the only thing they do with
get_serverid() is to compare it to other values obtained from
get_serverid(). A few places in the WUI used serverid to compute display
values: these were fixed.

The main trouble was the Helper: it returns a HelperUploadResults (a
Copyable) with a share->server mapping that's keyed by whatever the
Helper's get_serverid() returns. If the uploader and the helper are on
different sides of this change, the Helper could return values that the
uploader won't recognize. This is cosmetic: that mapping is only used to
display the upload results on the "Recent and Active Operations" page.
I've added code to StorageFarmBroker.get_stub_server() to fall back to
tubids when looking up a server, so this should still work correctly
when the uploader is new and the Helper is old. If the Helper is new and
the uploader is old, the upload results will show unusual server ids.

refs ticket:1363
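
The fallback described in this commit message could look roughly like the sketch below (Python 3). This is not the actual `StorageFarmBroker` code: `FakeBroker`, `FakeServer`, `StubServer`, and the `get_tubid()` accessor are hypothetical names, and only the lookup order (pubkey-based serverid first, then tubid, then a placeholder) is taken from the description above.

```python
class FakeServer:
    """Hypothetical server fake with a pubkey-based id and a legacy tubid."""

    def __init__(self, pubkey_id, tubid):
        self._pubkey_id = pubkey_id
        self._tubid = tubid

    def get_serverid(self):
        # After the change: the pubkey-derived id.
        return self._pubkey_id

    def get_tubid(self):
        # Hypothetical accessor for the old tubid-based id.
        return self._tubid


class StubServer:
    """Placeholder for an id that matches no known server."""

    def __init__(self, serverid):
        self.serverid = serverid


class FakeBroker:
    def __init__(self, servers):
        self._by_serverid = {s.get_serverid(): s for s in servers}
        self._by_tubid = {s.get_tubid(): s for s in servers}

    def get_stub_server(self, serverid):
        if serverid in self._by_serverid:
            return self._by_serverid[serverid]
        # An old Helper may report tubids: fall back to those.
        if serverid in self._by_tubid:
            return self._by_tubid[serverid]
        return StubServer(serverid)


s1 = FakeServer(pubkey_id=b"pubkey-1", tubid=b"tubid-1")
broker = FakeBroker([s1])

by_new_id = broker.get_stub_server(b"pubkey-1")
by_old_id = broker.get_stub_server(b"tubid-1")
unknown = broker.get_stub_server(b"something-else")
```

This covers the "new uploader, old Helper" direction only; an old uploader has no such fallback, which matches the "unusual server ids" caveat above.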
Reference: tahoe-lafs/trac-2024-07-25#1363