refactor storage_client.py, use IServer objects instead of rrefs #1363
Reference: tahoe-lafs/trac-2024-07-25#1363
There's an internal cleanup I've been meaning to do for a while. I started it in source:src/allmydata/storage_client.py a few years ago (some of the TODO notes at the top indicate my plans), but didn't follow through. The goal is for the client to manage a collection of `IServer` objects, each of which represents a storage server. These objects each hold a `RemoteReference` and track metadata about the server (like nickname, versions, etc).

As we start to handle other kinds of servers, these objects will be a place to abstract out the common behavior. The first change will be to support #466, as signed announcements result in a different notion of "server id" than unsigned ones. The `IServer` object will get some methods that tell the caller what write-enabler seed or lease-renewal seed or peer-selection seed to use.

The next change will be to move the details of interacting with the share into `IServer`, such as the actual `callRemote` method names. Then, when we add an HTTP-based server (which would use `GET` with a `Range:` header), the uploader/downloader doesn't need to know quite so much about the server type.

This ticket is to track the refactoring progress and host the patches for review.
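To make the goal concrete, here is a minimal sketch of what such a server object and the client-side collection might look like. The class names `ServerSketch` and `StorageClientSketch`, the seed accessor names, and the choice to return the serverid as every seed are all assumptions for illustration; they are not the code in storage_client.py.

```python
class ServerSketch:
    """Hypothetical stand-in for an IServer-style object (not the real class)."""

    def __init__(self, serverid, rref, nickname="", version=None):
        self._serverid = serverid        # binary server identifier
        self._rref = rref                # stand-in for a RemoteReference
        self._nickname = nickname
        self._version = version or {}

    def get_serverid(self):
        return self._serverid

    def get_rref(self):
        return self._rref

    def get_nickname(self):
        return self._nickname

    def get_version(self):
        return self._version

    # Assumed seed accessors: in this sketch every seed is just the serverid.
    # A signed-announcement server could return something different here
    # without the caller having to know.
    def get_lease_seed(self):
        return self._serverid

    def get_write_enabler_seed(self):
        return self._serverid

    def get_server_selection_seed(self):
        return self._serverid


class StorageClientSketch:
    """Hypothetical client-side collection of server objects."""

    def __init__(self):
        self._servers = {}

    def add_server(self, server):
        self._servers[server.get_serverid()] = server

    def get_known_servers(self):
        # Hand out an immutable snapshot so callers cannot mutate our state.
        return frozenset(self._servers.values())
```

Callers would ask the object for whichever seed they need rather than deriving it from a raw serverid, which is the piece that lets the notion of "server id" change later.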
Attachment 1363-patch1.diff (40400 bytes) added
first refactoring step
Let's avoid the word "peer". It usually means both more and less than we really mean in this code, and we've already changed most or all of our documentation to use "server" or some other more specific word instead of what we used to call "peer".
So `peer_selection_index` should probably be renamed `server_selection_index`. This doesn't have to be a blocker for this patch or for this branch, but please let's get consensus on terminology whenever possible for future convergence.

Why are we using sha1 for testing permutation instead of sha256? This patch didn't introduce this behavior so it isn't a blocker for this patch, but it is weird.
cool, I'll add "server_selection_index" to my TODO list for this ticket. My only problem with it is the acronym collision: we use "storage index" and "SI" in lots of places, and it'd be nice to have something for this that didn't have quite so many Ss and Is in it. But yeah, let's talk it over on the list.
SHA1: hmm, good question. We started out using stdlib `sha.new` (which is SHA1), and never changed it (because to change it would mess up server-selection ordering for existing filecaps). It doesn't need to be secure, as it's just a load-balancing tool, but it would be nice to use the same hash function everywhere. Maybe when #466 gets us to the point of defining a new server-selection-seed (which could be a new name for peer-selection-index), we could say that old SSSs use SHA1 and new SSSs use SHA256.

Okay I've now read through all the changes to test files and didn't see anything wrong.
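For readers following the SHA1 question, hash-based server permutation can be sketched as below. The function name `permute_servers`, the `hash_name` parameter, and the exact concatenation order are assumptions for illustration, not necessarily what the real code does.

```python
import hashlib


def permute_servers(serverids, selection_index, hash_name="sha1"):
    """Order serverids by the hash of (selection_index + serverid).

    The hash only spreads load across servers, so it does not need to be
    collision-resistant. Switching hash_name changes the ordering, which is
    why existing filecaps would have to keep using the old hash.
    """
    def sort_key(serverid):
        return hashlib.new(hash_name, selection_index + serverid).digest()

    return sorted(serverids, key=sort_key)
```

Each selection index yields its own stable ordering of the same server set, so different files tend to start at different servers.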
`get_known_servers()` returns a new copy of a list, sorted (using the `sorted` function), but the only callers of `get_known_servers()` are:

- `get_connected_servers()`, which makes a new frozenset containing a copy of the list, and
- web/root.py.

So, I suggest that `get_known_servers()` should return a list (or a frozenset for added safety -- David-Sarah and I once spent a long, miserable night tracking down an elusive bug just before a major release which turned out to be due to a function having side-effects on a mutable list of servers that had been passed into it), and that web/root.py should sort it itself.

(Also that in the future web/root.py offer the user controls to sort the list of servers by different columns. :-))
This doesn't need to be a blocker for this patch, but it does look like the kind of thing that Brian might want to tweak.
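A rough sketch of that suggestion: the broker hands out an immutable, unsorted snapshot and the display layer decides the order. `BrokerSketch` and `render_server_rows` are hypothetical names, and servers are reduced to `(serverid, nickname)` pairs for brevity.

```python
class BrokerSketch:
    """Hypothetical broker that tracks servers as serverid -> nickname."""

    def __init__(self, servers):
        self._servers = dict(servers)

    def get_known_servers(self):
        # A frozenset cannot be mutated by a caller, so there are no
        # surprising side effects on the broker's own state.
        return frozenset(self._servers.items())


def render_server_rows(broker, sort_key=lambda row: row[0]):
    """Display-side helper: sorting happens here, not in the broker."""
    return sorted(broker.get_known_servers(), key=sort_key)
```

Passing a different `sort_key` (for example `lambda row: row[1]` to sort by nickname) is the hook a future sort-by-column control would use.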
Okay, I've reviewed 1363-patch1.diff ! Modulo the comments above, +1.
(If you don't mind, remove the `reviewed` keyword after you land it. I seem to recall that you had some different protocol for signalling whether a patch was ready to land or not or landed or not -- I forget.)

Ok, 1363-patch1.diff landed in changeset:ffd296fc5ab8007f. Thanks for the quick review!
I'll attach a -patch2 once I'm ready for the next stage of refactoring.
Attachment 1363-patch2.dpatch (167890 bytes) added
bundle of refactoring patches
ok, -patch2 is ready for review. This one is a darcs patch bundle with 20 individual patches, intended to isolate each change for easier review. Many of them are improving internal names, like referring to "servers" instead of "peers", or fixing the uploader to clearly distinguish between a Server object and a ServerTracker (which were sufficiently confusing before that we had a bunch of `assert isinstance(server, ServerTracker)` checks). There's also some dead-code removal, which made subsequent refactoring easier.

The bulk of the changes are intended to reduce the use of `get_serverid()`. Previously, a lot of the code has been passing around `(tubid, rref)` tuples: the goal is to pass around `IServer` objects instead. The first step is to replace those tuples with `(s.get_serverid(), s.get_rref())`, but the second step (which this patch starts to implement) is to push that change further down into the code, delaying the conversion from `IServer` to `serverid` until the last possible moment, and in many cases not doing it at all. This means that many data structures which were previously indexed by serverid are now indexed by `IServer` instance.

This patch doesn't complete the job, but it gets a significant amount of the way there. It doesn't touch the mutable code at all: I'm hoping to review and land #393 before attempting any refactoring of `mutable/*.py`, to make life easier.

The tree should pass all tests and be pyflakes clean after applying each patch in this series.
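The two steps can be illustrated with a small sketch. `ServerSketch`, `as_legacy_tuples`, and `sharemap_by_server` are hypothetical names used only for this example; the real patches touch many call sites rather than adding helpers like these.

```python
class ServerSketch:
    """Hypothetical stand-in for an IServer object."""

    def __init__(self, serverid, rref):
        self._serverid = serverid
        self._rref = rref

    def get_serverid(self):
        return self._serverid

    def get_rref(self):
        return self._rref


def as_legacy_tuples(servers):
    # Step one: callers that still expect (serverid, rref) tuples get them
    # built from server objects at the boundary.
    return [(s.get_serverid(), s.get_rref()) for s in servers]


def sharemap_by_server(placements):
    # Step two: index by the server object itself, so no serverid
    # conversion is needed until something has to be displayed or logged.
    sharemap = {}
    for server, shnum in placements:
        sharemap.setdefault(server, set()).add(shnum)
    return sharemap
```

Because the sketch relies on default object identity for hashing, two distinct objects for the same server would be distinct keys; the client has to hand out one object per server for this to work.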
note to self: I still need to implement zooko's recommendations from comment:82727 in a later patch.
Reviewing:
"already_servers" should be "already_serverids".
"contacted_servers" and "contacted_servers2" aren't very good variable names. I suggest s/contacted_servers/worth_asking_servers/ and s/contacted_servers2/have_asked_servers/, and similarly for trackers.
Will look at the rest of the patch tomorrow.
There are still lots of instances of "peer" in the source after applying these patches. Many of these are in the mutable code which I know you haven't got to yet, but some others in the following files look like they might sensibly be changed first:

`get_name()` and `get_longname()` instead of `name()` and `longname()`.

`.serverid` attribute of NativeStorageServer from elsewhere, or can it be deleted?

heh, one step at a time. I'll add those other peer->server items to the TODO list, though.
Yeah, I don't particularly like name/longname either. I'm using `name()` as a short placeholder until I figure out what the method really wants to be called: my goal was to turn `base32.b2a(serverid)` and `idlib.shortnodeid_b2a(serverid)` into something like `s.name()`, and I was previously using `s.get_short_description()` which didn't exactly roll off the tongue. `s.get_name()` sounds better, but I'm wondering if something even more descriptive might show up once it's only ever being used in log.msg and webapi-display contexts.

But I'll add a patch to use `get_name()/get_longname()` for now, I only see about 30 uses of it.

And on `.serverid`, I don't think so, but I'm putting off removing that until I remove `get_serverid` too, since the goal is to redefine the concept of "serverid" altogether: `.serverid/.get_serverid()` and fix what breaks.

Incidentally, if you use https://github.com/warner/tahoe-lafs to grab a copy of my "pass-server" branch, and compare its tip against the "1363-p2" tag, you'll see the changes I've made since the -p2 attachment which implement your recommendations. I'll hold off on adding another patch bundle to this ticket until you've reviewed the existing one and I've landed it.
I'm trying to not go too crazy with the refactoring/renaming, because that'll induce a lot of merge work with the many other project branches I (and others) have hanging around, but the stuff in this ticket ties directly into the #466 work, so I wanted to do them in the right order. So I'm going to be conservative about what I change until some of that other stuff gets landed (especially including #393).
(cfdbf66ffd): `set_shareholders` no longer corresponds to the formal parameters. It should say something like "@param holders: a pair (upload_trackers, already_serverids), where".

what about -p2? can I land it?
Yes, fine to land -p2. Some nitpicks:
Great, thanks, -p2 has been landed. I'll let you know when I've got a -p3 to review (probably after landing #393 MDMF), and I'll incorporate your suggestions.
Attachment 1363-p3.dpatch (90258 bytes) added
next batch of refactoring patches
ok, here's the next bundle. I'm getting close to the limit of what I can clean up without overlapping with MDMF, but there are a few more I might try to work on. Please review so I can land this puppy.
Still working on this! Will prioritize it.
Worked on this in the car on the way here last week. Planning to work on this and #1385 on the car ride home tomorrow (about ten hours, with one co-driver and two children in the car), in order to make the deadline for new-feature patches for v1.9, which is tomorrow.
1363-p3.dpatch reviewed. This is all really good stuff. I'm glad to see this sort of clean-up branch. I'm sorry it took me so long to review it. I intend to really elevate the priority of reviewing patches in my day to day life so that whenever Brian posts a new patch `review-needed`, I drop everything and review it right away.

Patches that get +1 from me and I intend to commit them to trunk soon:
By the way, on my Macbook Pro, `allmydata.test.test_immutable.Test` takes 8s, not 97s! After the patch "test_immutable.Test: rewrite to use NoNetworkGrid" then it takes about 2s on my system. This isn't an issue with the patch, but it may indicate there is an issue with your system. Improving the speed of the tests from 8s to 2s is valuable even if your system could be changed to run the old tests in a mere 8s. You could look at the timings of the buildslaves for comparison, e.g.: FranXois lenny-armv5tel, Brian ubuntu-linode, Arthur lenny-c7-32bit, FreeStorm WinXP-x86.

`XXX` comment in the source code? Otherwise +1 on this patch.

`darcs replace` instead of hunk editing, because that avoided a lot of unnecessary merge conflicts with Kevan's #1382 branch. I'll attach my rerecorded patch to this ticket. I also took notes during the process of merging these with #1382, which notes I'll post to tahoe-dev later, explaining why `darcs replace` was so valuable in this case. Also in set_shareholders() the docstring is now wrong in describing the "upload_trackers" and "already_serverids". I'll change my rerecord of the patch to fix that.

Patches with issues:
`M-x whitespace-cleanup` on this file and emacs made only one change (removing a trailing blank line at end of file).

`rref` parameter to `make_write_bucket_proxy()` be removed in favor of using the `.get_rref()` method of the `server` argument? Otherwise +1

`ReadBucketProxy` stop accepting an `rref` argument now that it has a `server` argument to its constructor? Otherwise +1. If Brian (or someone) tells me that these two patches should go in as-is without removing the `rref` parameter, I'm okay with that.

Attachment remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch (29063 bytes) added

Okay of the five patches with issues, remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch is my rebase of the first two, the whitespace we can skip, and I want to hear from Brian or someone that the last two are okay as-is, or else get an updated version that removes the `rref` param.

`review-needed`!
!In changeset:880758340fb827f6:
In changeset:0f11d35f855ed7c0:
In changeset:b07af5e1a2e35320:
In changeset:0605c77f08fb4b78:
In changeset:feca907499070bc1:
In changeset:dc668754793087a9:
In changeset:6b2e7985955fb312:
Your modified patches in remove_get_serverid_from_DownloadStatus.add_request_sent_and_add_block_request-rebased.darcs.patch look fine, except for two hunks in the second patch that have a typo (possibly one that I introduced in one patch and then fixed in a subsequent one).

that last line needs to use serverid_b, not serverid_a.

same issue, it needs to be "serverA".

As for changing `make_write_bucket_proxy()` to take an `IServer` instead of a `(rref, IServer)` pair: nope, the rref passed into `make_write_bucket_proxy()` is an `RIBucketWriter` (bound to a specific share), whereas `IServer.get_rref()` returns the server's `RIStorageServer` (on which you use `allocate_buckets()` to get an `RIBucketWriter`). I suppose it'd have been more obvious if the parameter name was "bucket_rref" instead of just "rref".
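The distinction between the two references can be sketched with local stand-ins. Everything here is hypothetical: `FakeStorageServer` and `FakeBucketWriter` only mimic the roles of `RIStorageServer` and `RIBucketWriter`, the `allocate_buckets()` signature is simplified, and `WriteBucketProxySketch` is not the real proxy class.

```python
class FakeBucketWriter:
    """Stand-in for a per-share bucket-writer reference."""

    def __init__(self, shnum):
        self.shnum = shnum


class FakeStorageServer:
    """Stand-in for the server-wide storage reference."""

    def allocate_buckets(self, storage_index, sharenums):
        # Simplified: the real call takes more arguments and is remote.
        return {shnum: FakeBucketWriter(shnum) for shnum in sharenums}


class ServerSketch:
    """Hypothetical IServer-style wrapper around the server-wide reference."""

    def __init__(self, rref):
        self._rref = rref

    def get_rref(self):
        return self._rref


class WriteBucketProxySketch:
    def __init__(self, bucket_rref, server):
        self.bucket_rref = bucket_rref   # bound to one specific share
        self.server = server             # the server holding that share


def make_write_bucket_proxy(bucket_rref, server):
    # Both arguments are needed: server.get_rref() is the server-wide
    # reference, which cannot stand in for the per-share bucket_rref.
    return WriteBucketProxySketch(bucket_rref, server)
```

With the parameter named `bucket_rref`, it is clearer that one server object can be paired with many different bucket references, one per allocated share.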
In changeset:550d67f51f7ebd45:
In changeset:3668cb3d068b7f3a:
note: changeset:5bf1ffbc879cf082 has some more work along these lines
In 54f974d/trunk: