implement distributed introduction, remove Introducer as a single point of failure #68
Reference: tahoe-lafs/trac-2024-07-25#68
I am quite sure you are aware of the problem of an introducer [...] crash bringing everything down. I read your roadmap.txt but I didn't find anything specific to address this. May I suggest using introducers.furl [...] where multiple entries can be used and the information is updated to all introducers at the same time when a peer makes an update.
Also, upon adding a new introducer, there should be a way to discover all the info currently on the existing introducer. I think I am making this part sound more trivial than it is.
Thanks
Lu
I would like to make a fully decentralized introduction scheme, such as the one I had in Mnet. Basically, every node would be an Introducer. This doesn't scale up in terms of number of nodes in the network unless we add some cleverness to it, but currently Tahoe networks are neither capable of scaling up to more than 100 nodes nor required to scale up to more than 100 nodes. (See UseCases.)
I will add to source:roadmap.txt about this issue.
I updated source:roadmap.txt . I'm tempted to think we should go directly to connection management v4 and not stop at v3.
Sam Stoller mentioned that it would be cool if peer nodes were discoverable through Bonjour. I agree!
This would be nice, but I think it's a lower priority than the connection management. I think we'll hit scaling problems earlier because of the number of connections held open by client nodes (windows boxes with minimal memory and python-vs-windows limitations) than because of the number of connections held open by a central Introducer (which will be running on a well-provisioned unix box, with plenty of memory and bandwidth, in a professionally-run colo facility). We can also introduce multiple central Introducers without too much effort, which would make them even more available.
Also note that an Introducer failure will prevent new clients from seeing the mesh, but will not prevent already-running clients from continuing to use each other, so such a failure is somewhat graceful.
I'm also thinking that relay is higher priority than distributed introduction.
Also, I'm thinking that we may want to provide for some sort of private mesh in the future, which will mean creating some sort of "membership badge" credentials, which would need to be checked at connection establishment (or lease-request) time, and we might want to at least lay out some requirements for that before building the distributed introduction scheme.
That said, if we choose to build a single global mesh, I very much like the gossip approach to learning about other nodes, and zeroconf/Bonjour would also be pretty slick (although I can only see it being useful when there is already a tahoe node on your local LAN), so I would like to see those implemented sooner or later.
-Brian
Title changed from "One introducer = single point of failure" to "implement distributed introduction, remove Introducer as a single point of failure"
Zooko and I hashed out a good scheme to do this while I was in Boulder this
week. Here's the plan:
internal setup and emits a FURL to email/IM/paste to your friend
accepts this FURL.
to be merged.
mesh is fully connected
With this approach, there is no single introducer. In addition, it enables
the following interesting properties:
pet name path from themselves to every other node in the mesh.
and options to impose individual quotas, or cut them off entirely
The vdrive server is still an outstanding question: until we get distributed
dirnodes (#115), each dirnode will still be attached to a single host, which
needs to be visible to anyone who's interested in reading the directory. So
our first release that removes the Introducer will probably retain the vdrive
server, and we'll have to figure out a reasonable UI that handles this.
In one of our current designs, the API for the PersonalIntroducer held by
each node on each other node (not necessarily reified as a distinct
Foolscap-Referenceable object, but that would be an easy implementation)
would have the following API:
and RIPersonalStorageServer would have the same API as the current
RIStorageServer, with allocate_buckets, etc.
tell_me_about_peers would use the provided list to filter out all peers
that the asker already knows about, and would then go to all of the remaining
peers with a please_meet message to produce new RIPersonalIntroducer
facets for the asker, then return a list of these
facets. This is the place where Horton will go: until we get that, each node
that does an introduction gets to take advantage of the facet that ought to
be reserved for the asker (i.e. the introductee is vulnerable to the
introducer). With Horton, the same attack exists, but the two nodes will see
different identifiers for the MitM, so that if Bob ever comes to learn about
Carol through a different path, he will perceive her as being different than
the pseudoCarol that Alice gave him.
With a maximally transparent Horton built in to Foolscap, the
tell_me_about_peers method just returns a list of Alice's existing
RIPersonalIntroducer proxies, and Foolscap will do the Horton work to
transform them into Bob-oriented proxies. Also, the please_meet method would
move into the customized Stub class (where it would behave much the same
way).
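The filtering behaviour described for tell_me_about_peers can be sketched in plain Python. This is only an illustration under stated assumptions: the Node and PersonalIntroducer classes below are hypothetical stand-ins, not Foolscap Referenceables, and the method signatures are guesses rather than the interface from the design discussion.

```python
class PersonalIntroducer:
    """Hypothetical stand-in for the facet one node holds on another."""

    def __init__(self, owner, holder):
        self.owner = owner    # name of the node that minted this facet
        self.holder = holder  # name of the node the facet was minted for


class Node:
    """Hypothetical node; real nodes would talk over Foolscap."""

    def __init__(self, name):
        self.name = name
        self.peers = {}       # peer name -> Node that this node already knows

    def please_meet(self, asker_name):
        # Mint a fresh facet on this node, reserved for the asker.
        return PersonalIntroducer(owner=self.name, holder=asker_name)

    def tell_me_about_peers(self, asker_name, already_known):
        # Drop peers the asker already knows about (and the asker itself).
        unknown = [peer for name, peer in sorted(self.peers.items())
                   if name not in already_known and name != asker_name]
        # Ask each remaining peer to produce a facet for the asker.
        return [peer.please_meet(asker_name) for peer in unknown]


alice, bob, carol, dave = (Node(n) for n in ("alice", "bob", "carol", "dave"))
alice.peers = {"bob": bob, "carol": carol, "dave": dave}

# Bob already knows Dave, so Alice only introduces him to Carol.
facets = alice.tell_me_about_peers("bob", already_known={"dave"})
print([(f.owner, f.holder) for f in facets])
```

Without Horton, nothing in this sketch stops Alice from keeping or substituting the facet she obtains on Bob's behalf, which is the vulnerability the comment describes.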
This is an important feature, but I don't think we are going to get it done in the next six weeks, so I'm putting it in Milestone 1.0.
Here is a simple scheme for decentralized introduction. It probably scales up at least as well as the rest of our current network architecture does — i.e. it is scalable enough for now.
First, implement #271 — "subscriber-only introducer client". DONE
Second, make announcement idempotent in the introducer — i.e. make it so that if a node announces themselves when they are already in the introducer's set of announced nodes, that the introducer ignores the announcement. (This makes sense, anyway, because the introducer doesn't need to inform any subscribers about the re-announcement since the clients will already have heard the earlier announcement from the introducer, and the only time a node would announce itself redundantly would be if that node were buggy.) DONE
Third, make "introducers" a class of publishable, subscribable thing like storage server and read-only storage server and upload helper (as per #271).
Fourth, make all publishers (introducer clients that send announcements) send their announcements to an evenly distributed subset of the introducers, namely the "Chord fingers": the introducer halfway around the circle from the publisher, plus the one a quarter of the way around the circle, etc.
Fifth, tell each introducer to subscribe to the introducer-announcements of a small set of other introducers — again choosing the Chord fingers — and whenever the introducer hears announcements from the introducers that it subscribes to, then it announces those announcements themselves, just as if it had just heard them from a client. (Of course, it still ignores any announcements of nodes which it has previously announced, as above.)
Now the load of handling introductions is evenly spread among all introducers, and there is no Single Point of Failure/Single Point of Load.
Each introducer receives log(N) redundant announcements of each new node, where N is the total number of introducers in the system.
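The "Chord fingers" selection in the fourth and fifth steps can be sketched as follows. This is a minimal illustration, assuming the introducers are arranged on the circle simply by sorting their identifiers; the function name chord_fingers and the identifier format are made up for the example and do not come from the Tahoe-LAFS code.

```python
def chord_fingers(ring, me):
    """Return the members 1/2, 1/4, 1/8, ... of the way around `ring` from `me`.

    `ring` is a list of identifiers in circle order (an assumption here:
    sorted order stands in for whatever placement the real system uses).
    """
    n = len(ring)
    start = ring.index(me)
    fingers = []
    step = n // 2
    while step >= 1:
        candidate = ring[(start + step) % n]
        # Skip ourselves and duplicates, which can occur on very small rings.
        if candidate != me and candidate not in fingers:
            fingers.append(candidate)
        step //= 2
    return fingers


ring = sorted("introducer-%02d" % i for i in range(16))
fingers = chord_fingers(ring, "introducer-00")
print(fingers)
```

With 16 introducers each one picks 4 fingers, which matches the log(N) fan-out mentioned above.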
See #295 for how to add access control for the authority to act as a server and the authority to act as a client on top of distributed introduction.
This review of Tahoe-LAFS on arstechnica.com reminds me that while issue #68 seems relatively non-urgent to me because I know how little the grid relies on the introducer and how easy it is to replicate introducers, it would be much better if we could simply say "the grid is fully decentralized" and then introductory articles like this one could optimize out a whole paragraph describing the introducer.
http://allmydata.org/pipermail/tahoe-dev/2009-August/002509.html
Also, fixing this ticket would be fun. Someone should do it. :-)
To get started on this, see source:src/allmydata/introducer/client.py and source:src/allmydata/introducer/server.py. Each of those files is fairly small and you should be able to read through them both and understand the current implementation. See also source:src/allmydata/introducer/interfaces.py which defines the interfaces between the components.
Attachment DualInroducerScenario1.jpeg (76784 bytes) added
Dual Introducer Scenario1 (26/5/10)
Attachment DualInroducerScenario1-Modified.png (99503 bytes) added
Dual Introducer Scenario 1 Modified (27/5/10)
Snapshot A: Client1 and Client2 are connected to Introducer X and Introducer Y, but Client3 is connected only to Introducer Y.
Snapshot B: Introducer Y goes down. Client4 joins, configured to talk to Introducer X and Introducer Y. Client3 has no knowledge of Client4 and vice versa.
Snapshot C: Introducer X goes down and Introducer Y comes back up, so all clients come to know about each other.
The main target of this scenario is to enable clients to talk to multiple introducers.
Attachment client(can-subscribe-to-multi-introducer-backward-compat).dpatch (5205 bytes) added
Given a file "introducers" in the client basedir, with each line containing a single introducer_furl, this patch subscribes to all of them while keeping backward compatibility
Backward compatibility is maintained by:
Note this patch does not update client's webui with all connected introducers.
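Reading such a file might look roughly like the sketch below. It is not the code from the attached patch: the function name read_introducer_furls, the legacy_furl parameter, and the example FURLs are all assumptions made for illustration.

```python
import os
import tempfile


def read_introducer_furls(basedir, legacy_furl=None):
    """Collect introducer FURLs: the legacy single FURL (if any) plus one per line."""
    furls = []
    if legacy_furl:
        furls.append(legacy_furl)
    path = os.path.join(basedir, "introducers")
    if os.path.exists(path):
        with open(path) as f:
            for line in f:
                furl = line.strip()
                # Ignore blank lines and entries that were already seen.
                if furl and furl not in furls:
                    furls.append(furl)
    return furls


with tempfile.TemporaryDirectory() as basedir:
    with open(os.path.join(basedir, "introducers"), "w") as f:
        f.write("pb://aaa@example.invalid:1/introducer\n"
                "\n"
                "pb://bbb@example.invalid:2/introducer\n")
    furls = read_introducer_furls(
        basedir, legacy_furl="pb://old@example.invalid:3/introducer")

print(len(furls))
```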
Faruq: glad to see this patch! Okay here are my comments.
What does this comment mean? Do you mean keep it in order not to break any reference to it?
It can't be equal to '\n' after a.split('\n'). Maybe change this to:
Now this code needs tests. Let's save this code aside, write a unit test which turns red, and then put this patch back into place and see if it turns the unit test green. So the unit test could, for example, populate the "introducer" file with two introducers, then instantiate the Client object (from source:src/allmydata/client.py), then invoke some method of that Client object which it will handle correctly only if it knows about both of the introducers.
Oh, I've got to go to lunch. I'll look at this more later!
Thanks for corrections. Regarding the reference, that's my intent, not to break any reference to it. If this code is fine, I'd like to add another patch that changes web/root.py and web/welcome.xhtml to show the connected introducers etc.
Attachment connected_to_introducers.png (30184 bytes) added
Client's welcome page shows a list of connected introducers.
Attachment client(can-show-connected-introducers-in-welcome-page).dpatch (5883 bytes) added
Serving the connection status to multiple introducers, still backward compatible
Attachment root(can-show-connected-introducers-in-welcome-page).dpatch (1070 bytes) added
Attachment welcome(can-show-connected-introducers-in-welcome-page).dpatch (1385 bytes) added
These patches (probably one patch would be better) fetch the connection status of multiple introducers in a somewhat crude way. Tested with enabling and disabling introducers. These patches are also backward compatible, not breaking any reference to the old connected_to_introducer(), but new code should call connected_to_introducers(), which also supplies the status of the single introducer.
Nice work! Next, please write a unit test of these patches. One unit test should verify that the client learns about a server when that server is announced to one introducer and also when that server is announced to the other introducer. The unit test should use a "mock IntroducerClient class" to test that code that your patch changed in source:src/allmydata/client.py@4193#L173. The idea is that the code in source:src/allmydata/client.py thinks that it is instantiating an instance of [IntroducerClient]source:src/allmydata/introducer/client.py@3931#L13, but actually the test code has set it up so that when the code under test instantiates IntroducerClient() then instead it gets an instance of the mock introducer client.
You can accomplish this using the Python mock library's mock.patch decorator. You can copy the way we use mock.patch in other places in our tests if you like to learn by code copying (I like to learn that way).
http://www.voidspace.org.uk/python/mock/
Attachment test_multi_introducers.py (1209 bytes) added
Demo test file that checks if the number of introducer_clients is same as the number of introducers_furls found in "introducers" cfg file
Nice work! Now that there is a unit test we can start thinking about actually committing these patches to trunk.
This test would notice if the code under test failed to read the .tahoe/introducers config file correctly or failed to create an IntroducerClient for each one, right?
Now can you write a test (or extend the test you already wrote) to notice if the code under test failed to subscribe to all of the introducers that it knew about? For example, maybe the test would configure two introducers in the .tahoe/introducers file, mock.patch() the IntroducerClient class, then instantiate the src/allmydata/client.py Client class, then check that two mock IntroducerClients got created and that each of them had their .subscribe_to() method called.
After that, I can't think of any way that your patch to allmydata/client.py would have a bug which would not be caught by these tests. Can you?
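The shape of such a test can be sketched with the standard library's unittest.mock (the successor of the mock library linked above). Everything here is a self-contained stand-in: the Client class, the introducer namespace and the "storage" service name are hypothetical and only imitate the structure of allmydata.client, so that the example runs without Tahoe-LAFS installed.

```python
import types
import unittest
from unittest import mock


class RealIntroducerClient:
    """Stand-in for the real class; the test must never reach it."""

    def __init__(self, furl):
        self.furl = furl

    def subscribe_to(self, service_name):
        raise RuntimeError("would open a network connection")


# Namespace playing the role of the module that Client looks the class up in.
introducer = types.SimpleNamespace(IntroducerClient=RealIntroducerClient)


class Client:
    """Hypothetical code under test: one introducer client per FURL."""

    def __init__(self, furls):
        self.introducer_clients = []
        for furl in furls:
            ic = introducer.IntroducerClient(furl)
            ic.subscribe_to("storage")
            self.introducer_clients.append(ic)


class MultiIntroducerTest(unittest.TestCase):
    def test_subscribes_to_every_introducer(self):
        furls = ["pb://a@example.invalid:1/i", "pb://b@example.invalid:2/i"]
        with mock.patch.object(introducer, "IntroducerClient") as fake_cls:
            # Hand out a distinct mock instance for each instantiation.
            fake_cls.side_effect = lambda furl: mock.Mock(furl=furl)
            client = Client(furls)
        self.assertEqual(fake_cls.call_count, 2)
        for ic in client.introducer_clients:
            ic.subscribe_to.assert_called_once_with("storage")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(MultiIntroducerTest)
result = unittest.TextTestRunner().run(suite)
```

If Client skipped the subscribe_to() call for one of the introducers, assert_called_once_with would turn the test red.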
Replying to writefaruq:
Instead of doing this, please search the codebase for any other reference to the self.introducer_furl attribute and change that code to reference the new self.introducer_furls attribute instead. Note also that any such code will have unit tests that will turn red if your patch which removes self.introducer_furl breaks that code, so run the unit tests after you have removed self.introducer_furl and after you have searched the codebase for other code that uses introducer_furl.
Likewise in an earlier comment you mentioned:
This is not the sort of "backward compatibility" that we want. If you are adding a new feature in the code or changing a feature in the code then instead of leaving the old feature in place in the code in case anyone is calling it, we prefer to find all callers and update them.
On the other hand the things that you said about backward compatibility of the tahoe.cfg file is the sort of "backward compatibility" that we want. That has to do with users who might be using an older version of Tahoe-LAFS and then upgrade to a newer version which has your patch. We want the behavior of the new version to be some good behavior that they expected even if they do not make any change to their config files.
allmydata.client.Client.self.introducer_furl is called from allmydata.web.root.Root for fetching the list of introducer furls. But that can be replaced by new code that is tested by test_root.py.
self.introducer_furl is also called from various testing modules, e.g. test/common.py (line 471). I'm not sure if they need to be patched at this moment.
Replying to writefaruq:
As I mentioned on IRC, I want you to do "test-driven development" on this part. Step 1 is to remove the attribute introducer_furl from the allmydata.client.Client class. Step 2 is to run the complete (current) test suite and see which tests, if any, go red. Step 3 is to think about the places that you know of in the code that refer to the old, now-removed attribute, and think about whether the tests that are currently red are the right tests to exercise those places of the code. If they are not the right way to test that code (they test that code only "by accident", in some sense, or you think it is a bad way to test that code for some reason) then write a new test that tests that code. Now for the important point in "test-driven development": you are not allowed to fix the bad code which refers to the now-deleted introducer_furl attribute until you have a red test which you think is a good test for that code! Step 4: fix the code. :-)
Attachment test_root.py (1009 bytes) added
corrected test for checking the use of introducer_furl by root.py
Looks good! Except, heh heh heh. Isn't this test testing that data_introducer_furl() queries the client object's .introducer_furl method? Maybe you should now change the test to say that if the data_introducer_furl() method queries the client object's .introducer_furl method then it fails the test, but if it queries the client object's .introducer_furls attribute instead then it passes the test?
Then run it and confirm that it fails the test.
Then fix it!
:-)
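One way to make a test fail whenever the code under test touches the old attribute is a mock with a restricted spec. The sketch below uses unittest.mock; data_introducer_furls and old_accessor are hypothetical stand-ins for the web root's accessor before and after the change, not the functions in web/root.py.

```python
from unittest import mock


def data_introducer_furls(client):
    # Hypothetical new accessor: reads only the plural attribute.
    return list(client.introducer_furls)


def old_accessor(client):
    # Hypothetical old accessor: reads the removed singular attribute.
    return client.introducer_furl


# spec=[...] makes access to any attribute not listed raise AttributeError,
# so the mock behaves like a Client that no longer has .introducer_furl.
client = mock.Mock(spec=["introducer_furls"])
client.introducer_furls = ["pb://a@example.invalid:1/i",
                           "pb://b@example.invalid:2/i"]

new_result = data_introducer_furls(client)

try:
    old_accessor(client)
except AttributeError:
    old_code_fails = True
else:
    old_code_fails = False

print(new_result, old_code_fails)
```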
Attachment enable_client_with_multi_introducer.dpatch (10264 bytes) added
Revised patch for client.py web/root.py web/welcome.xhtml
I have some questions about how decentralized (gossip-based) introduction is supposed to work. Faruq (and everyone who cares about decentralized introduction!) please tell me if my assumptions are wrong.
Assumption 1: there will be a flat text file in your ".tahoe" base dir named "introducers" containing a list of introducer furls that the node will read at start-up.
Assumption 2: whenever the node learns about new introducers it will write the furl of that new introducer into the file.
Assumption 3: if there is no "introducers" file at startup then it will instead look into the .tahoe/tahoe.cfg file to find the "introducer.furl" entry (which is how introducer was configured up until Tahoe-LAFS v1.7.0), and if it finds it then it will write it into the ".tahoe/introducers" file and use it.
Assumption 4: if there is an "introducers" file at startup then it will not look into the .tahoe/tahoe.cfg file to find the "introducer.furl" entry, and any entry which is in there will be ignored.
Question 1: is this what you are trying to implement, Faruq?
Question 2: is this what people want to use in Tahoe-LAFS v1.8?
Regards,
Zooko
Attachment test-run-after-client_py-web-root_py-welcome_xhtml-patched.log (81805 bytes) added
Test results after applying the previous enable-client-* patch
Assumption 1 is implemented and tested.
Regarding assumption 2 and later part of 3:
Assumption 4 was not considered before.
Review needed for GSoC mid-term evaluations.
Faruq: hey we're making progress! Maybe we could even finish assumption 1, the latter part of 3, assumption 4, and Terrell's comment that it should warn if it is ignoring an old setting:
http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004636.html
If we finished that set of behaviors, including tests (which I think you have already done a pretty good job of) and docs, then we could commit that to trunk and people could start using it even before we implement assumption 2. What do you think?
Combining assumption 1, 3-4 and Terrell's comment the following strategy can be coded into Client.
Step 1: Try to load "basedir/introducers"
Step 2A: If "basedir/introducers" found: a) load introducer furls from this file b) warn if there is any introducer_furl entry in tahoe.cfg
Step 2B: If no "basedir/introducers" found: a) create one "basedir/introducers" b) write introducer_furl entry from tahoe.cfg to this file.
If this is fine, I can proceed to implement this strategy.
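The steps above can be sketched as a single function. This is an illustration only, not the patch: the function name init_introducer_furls, the warn callback, and the assumption that the legacy entry lives in the [client] section of tahoe.cfg are all choices made for the example.

```python
import configparser
import os
import tempfile


def init_introducer_furls(basedir, warn):
    """Return the introducer FURLs for `basedir`, following steps 1, 2A and 2B."""
    cfg = configparser.ConfigParser()
    cfg.read(os.path.join(basedir, "tahoe.cfg"))
    # Assumed location of the legacy single-introducer setting.
    legacy = cfg.get("client", "introducer.furl", fallback=None)

    path = os.path.join(basedir, "introducers")
    if os.path.exists(path):
        # Step 2A: the file wins; warn about a setting that is being ignored.
        with open(path) as f:
            furls = [line.strip() for line in f if line.strip()]
        if legacy:
            warn("ignoring introducer.furl in tahoe.cfg; "
                 "using the 'introducers' file instead")
        return furls

    # Step 2B: create the file and seed it from tahoe.cfg.
    furls = [legacy] if legacy else []
    with open(path, "w") as f:
        f.writelines(furl + "\n" for furl in furls)
    return furls


with tempfile.TemporaryDirectory() as basedir:
    with open(os.path.join(basedir, "tahoe.cfg"), "w") as f:
        f.write("[client]\n"
                "introducer.furl = pb://old@example.invalid:3/introducer\n")
    warnings = []
    first = init_introducer_furls(basedir, warnings.append)
    second = init_introducer_furls(basedir, warnings.append)

print(first == second, len(warnings))
```

In this sketch the first run takes step 2B silently, and the second run takes step 2A and emits one warning, even though the user never edited either file.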
Replying to writefaruq:
For an existing basedir, 2B b) would cause the introducer_furl to be written to basedir/introducers on the first run, and then 2A b) would cause a warning on subsequent runs. The warning seems unnecessary in this case, since there's no reason to believe the user was confused about the config settings; they were changed automatically.
Replying to [davidsarah]comment:39:
That's a good point, but how could we do better? I don't think it is a good idea to automatically edit the tahoe.cfg file (to delete the old introducer.furl). Currently Tahoe-LAFS never edits that file -- it is for humans to edit only. I think it should still be a warning because we don't want the human to look into the tahoe.cfg file, see the introducer.furl there, and think that they have now seen the introducer config. We could suppress the warning in the case that tahoe.cfg's introducer.furl and the "introducers" file are the exact same thing (i.e. there is only one entry in "introducers" and it is this one).
Any other ideas?
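The suppression rule suggested here is small enough to state as a predicate. A minimal sketch, with the hypothetical name should_warn: warn only when tahoe.cfg still carries an introducer.furl and the "introducers" file is not exactly that one entry.

```python
def should_warn(legacy_furl, file_furls):
    """Decide whether to warn that tahoe.cfg's introducer.furl is being ignored."""
    if not legacy_furl:
        # Nothing in tahoe.cfg, so nothing can mislead the reader.
        return False
    # Stay quiet when the file is just the automatic copy of the old setting.
    return file_furls != [legacy_furl]


print(should_warn("pb://old", ["pb://old"]))
print(should_warn("pb://old", ["pb://old", "pb://new"]))
```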
Faruq: your strategy in comment:60515 sounds perfect to me. Except for the open question about whether or how to indicate warnings to the user, then the only other outstanding issue is that this change needs docs.
All of the following docs need to be updated to accept this into trunk:
I think you are close to getting this first working version completely implemented, doc'ed, tested, and ready for inclusion in trunk.
Replying to [zooko]comment:40:
I think we should do this.
I've drafted the following text. Please correct me!
For configuration.txt:
If a Tahoe grid has multiple introducers, each introducer's FURL must be placed in "BASEDIR/introducers" file. Each line of this file contains exactly one FURL entry. Any FURL entry found in tahoe.cfg will be copied to that file.
For architecture.txt:
By deploying multiple introducers in a Tahoe grid, the above SPoF challenge can be overcome. In that case, if one introducer fails, clients are still able to get announcements about new servers from the remaining introducers. This is our first step towards implementing fully distributed introduction.
For future releases, we have plans to enhance our distributed introduction, allowing any server to tell a new client about all the others.
For running.html:
To use multiple introducers, write all introducers' FURLs in "BASEDIR/introducers" file, one FURL per line.
Faruq:
Great! Please go ahead and take my suggestions below then write documentation patches like these and attach a darcs patch to this ticket for just the documentation patches.
The current plan is to finish the strategy from comment:60515, except that for
change it to:
(This is as described in my comment:40 and davidsarah's comment:42.)
Also about your docs: consider that once your patches land in trunk then configuring the "introducers" file will be the preferred way to do it and the "introducer.furl" entry in tahoe.cfg will be supported only for backward-compatibility reasons and will not be recommended to new users. So the documentation should describe the "introducers" file as the way to configure it and mention the "introducer.furl" entry in tahoe.cfg only when explaining that such an entry, if it exists, will be automatically written into the "introducers" file.
Replying to writefaruq:
Don't say "If" here, just say that this is the way to configure any introducers (regardless of if it is one or more). It is necessary to mention the automatic copying of the FURL entry from tahoe.cfg so that readers of configuration.txt will have a complete understanding and understand the backward-compatibility implications.
Also, please call it "Tahoe-LAFS" instead of "Tahoe" in docs. (For one thing, I don't want to have a name collision with http://sourceforge.net/projects/tahoe/ . For another thing, I think of "LAFS" as the protocol and the data formats and specification, and "Tahoe-LAFS" as the current Python implementation.)
Nice!
Again, edit running.html so that the "BASEDIR/introducers" is the only method of configuring introducers. It is not necessary to mention the automatic copying of introducer.furl from tahoe.cfg in running.html.
Please for each patch that you submit write a descriptive patch name and description like these ones: changeset:8ba536319689ec8e, changeset:1de4d2c594ee64c8, changeset:d0706d27ea2624b5, changeset:63b28d707b12202f, changeset:c18b934c6a8442f8, changeset:7cadb49b88c03209, changeset:be6139dad72cdf49.
Okay, good work on this! I'm hoping that by the time I have to write a mid-term review for Google (which I guess I have to do by Friday), that I will be able to say that you've completed a working subset of your summer goal.
Please post the doc patch as a darcs patch and I will review it right away. Now what about test patches. You've already posted test_root.py and test_multi_introducers.py . Are those the complete set of tests for the "comment:60515" strategy?
Oh no, looking at them I see that test_root.py is asking the code-under-test to look at the old .introducer_furl attribute. That is not right, it should instead be requiring the code-under-test to not look at the old .introducer_furl attribute and instead to look only at the .introducer_furls attribute.
I see that test_multi_introducers.py is requiring the code-under-test to have 1 introducer for the "introducer.furl" entry in tahoe.cfg plus however many are in the "introducers" file. But what "introducers" file is used for this test? When this test code runs it will be inside a temporary directory (named "_trial_temp") which will not already have any "introducers" file present.
Let's make the test code provide an introducers file to the code-under-test, something like this:
That test would be testing that the Client discovers the two furls in the "introducers" file. Then we also need the following tests of the "comment:60515" strategy:
Client object, and then check that it has an introducer client object for the furl entry from the tahoe.cfg file, and then check that a new "basedir/introducers" file has been created with that furl in it.
Attachment multiple-introducers-changes-in-architecture-configuration-running.dpatch (13563 bytes) added
doc changes for multiple introducers
I have kept the multiple introducers config file name as usual. But "introducers.cfg" can be another alternative. Another question: is this file initially to be generated for the user, like tahoe.cfg?
To implement the modified comment:60515 strategy, I restructured the code in Client's init_introducer_clients like this:
But is the warning to be sent somewhere else? Which one should be called: self.log() or log.msg()?
Attachment test_root.2.py (877 bytes) added
corrected test for checking the use of introducer_furls by root.py (multiple introducer version)
This test counts the number of furls loaded by the Client and checks that it equals the response of the query made in root.py. Tested with 0-2 introducers (in the cfg file) and found working.
Replying to writefaruq:
You should use self.log() for logging (if the object in question subclasses from some class so that it has a self.log() method. In this case it does because Client's parent class Node defines a log() method.).
I wonder if there is a better way to communicate to the user than just logging a message. Not sure.
I'm really not sure that I agree with Brian's comment in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004663.html . The way Brian proposed and Faruq agreed to do it means that there are "two ways to do it"--you can either edit your tahoe.cfg's introducer.furl or you can edit your introducer.furls file. Users who see one of them may assume that it is the only one and then be surprised when they get different behavior than they expected (due to the existence of the other one). I guess I'm too sleepy to go into detail right now, but I want Faruq to know that I looked at this ticket tonight. :-)
See also my reply to Brian on tahoe-dev:
http://tahoe-lafs.org/pipermail/tahoe-dev/2010-July/004713.html
Replying to [zooko]comment:49:
Yes. But for displaying a warning to the user, I would print >>sys.stderr. (For tests, sys.stderr can be captured; see the existing tests in source:src/allmydata/test/test_runner.py .)
Replying to [davidsarah]comment:52:
That works for cli scripts, but for the Tahoe-LAFS node itself (unless it launched with tahoe run or a possible future tahoe start --nodaemon), where would lines written to stderr go? I would hope that they would be logged, but it is possible they would be silently dropped.
Replying to [zooko]comment:53:
Good point. But the config files are only read at startup, so perhaps tahoe start could read and parse them just in order to display any warnings, before launching the node.
(I realize this doesn't guarantee that the contents of the files haven't changed between when tahoe start reads them and when the node does, but that would be very unusual.)
Alternatively, a solution to #71 ("client node probably started") might allow the node to communicate messages to the runner process at startup.
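Writing a warning to stderr and capturing it in a test can be sketched as below. The thread's print >>sys.stderr is Python 2 syntax; this sketch uses the Python 3 equivalent. The function name warn_ignored_setting and the message text are assumptions, and this is not how test_runner.py captures output.

```python
import contextlib
import io
import sys


def warn_ignored_setting(legacy_furl):
    # Python 3 spelling of the thread's `print >>sys.stderr, ...`.
    print("warning: ignoring introducer.furl from tahoe.cfg: %s" % legacy_furl,
          file=sys.stderr)


# In a test, temporarily swap sys.stderr for an in-memory buffer.
captured = io.StringIO()
with contextlib.redirect_stderr(captured):
    warn_ignored_setting("pb://old@example.invalid:3/introducer")

output = captured.getvalue()
print(repr(output))
```

This only shows the capture side; where stderr goes for a daemonized node is the open question raised in the replies above.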
Attachment test_introducers_cfg.py (1122 bytes) added
Check if a new "introducers" cfg file can be created and tahoe.cfg's introducer_furl can be written in this file
Unsetting review-needed. This patch is not ready to be reviewed and then applied to trunk. However, it would probably be a good help and encouragement to Faruq if anyone would look at his code, docs, or comments and give him your thoughts. :-)
Attachment test_multi_introducers.2.py (640 bytes) added
Check if Client's number of introducer_clients equals to the number of furls in "introducers" file
test_multi_introducers.2.py looks like a good test of whether the allmydata.client.Client correctly reads all of the entries from the "introducers" config file. Please run pyflakes on it (you can just run python setup.py flakes) and fix any warnings that pyflakes reports.
Re: test_introducers_cfg.py please add a docstring to the test_introducer_clients_count() method saying what this test is looking for in the behavior of the code under test. The comment that comes with the attachment on trac says:
Check if a new "introducers" cfg file can be created and tahoe.cfg's introducer_furl can be written in this file
But of course a file can be created! I guess from looking at the code and the name of test_introducer_clients_count() that it is intended to do something like this:
The basedir variable is unnecessary; remove it and replace os.path.join(basedir, "tahoe.cfg") with just "tahoe.cfg". The line at the end that reads MULTI_INTRODUCERS_CFG doesn't do anything; remove it.
Otherwise this looks like a good test.
Attachment test_introducers_cfg.2.py (1045 bytes) added
code refined by pyflakes
Attachment test_multi_introducers.3.py (544 bytes) added
code refined by pyflakes
Attachment test_root.3.py (850 bytes) added
code refined by pyflakes
Faruq:
Please merge all the tests into one file named test_multi_introducer.py.
Here is a branch to hold your work:
http://tahoe-lafs.org/trac/tahoe-lafs/browser/ticket68-multi-introducer
Here is a view of the buildbot which shows the history of builds of your branch (only showing the Supported Builders):
http://tahoe-lafs.org/buildbot/waterfall?builder=hardy-amd64&builder=windows&builder=Kyle+OpenBSD-4.6+amd64&builder=Arthur+lenny+c7+32bit&builder=David+A.+OpenSolaris+i386&builder=Ruben+Fedora&builder=Eugen+lenny-amd64&builder=Zooko+zomp+Mac-amd64+10.6+py2.6&builder=tarballs&branch=ticket68-multi-introducer
Please attach your most recent patches to this ticket and I will apply them to that branch and then trigger the buildbot to run the tests on all of our buildslaves.
Attachment test_multi_introducers.4.py (3871 bytes) added
Merged all tests
Attachment multiple-introducer-client-side-002.dpatch (5410 bytes) added
multi-introducers doc patch
The last three files (multiple-introducer-client-side-001.dpatch, multiple-introducer-client-side-002.dpatch, and test_multi_introducers.4.py) should be applied/added to the test repo; sending the patch failed for some unknown reason.
Okay I applied the two patches and I copied test_multi_introducers.4.py into src/allmydata/test/test_multi_introducers.py . Then I ran these tests with this command:
The output from that command ended with this message:
Have you tried this yourself? I would have expected you to get the same error.
Attachment multiple-introducer-client-side-001.dpatch (11357 bytes) added
Client side code changes combined together, fixed warn_flag error
This error can be avoided by undoing the last patch, multiple-introducer-client-side-001.dpatch, and applying the latest one. I've now replaced it with the correct version.
Faruq: now that we've started storing your patches in this branch: source:ticket68-multi-introducer, there is no longer a good way to undo the old patches. So would you please provide a patch which gets added on top of the patches that are already in your branch? One way to do this would be to get a new repo from your branch, like this:
Then cd into the ticket68-multi-introducer repository and change the code there to make the tests pass. But do not use darcs unrecord, darcs obliterate, or darcs amend-record in that repository, because those commands work by removing patches from the repository, and we can't (or don't want to) remove patches from the repository http://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer on the server.
Okay I merged trunk (which is currently 1.8.0rc1) into the source:ticket68-multi-introducer branch and ran a full build: here are the results. Then I applied your three patches from comment:60535 and ran a full build again: here are the results.
Replying to writefaruq:
I can't undo the last patch multiple-introducer-client-side-001.dpatch because, as described in comment:60538, we are going to maintain a history of all patches on source:ticket68-multi-introducer. For example, here is the history of such patches: http://tahoe-lafs.org/trac/tahoe-lafs/log/ticket68-multi-introducer/ and the one that you attached as the last multiple-introducer-client-side-001.dpatch I have now applied to that branch as [20100801142304-e2516-411e80c14e29287e8d9ce700e7b359e23fb45105].
Attachment multiple-introducer-client-side-001-x1.dpatch (2034 bytes) added
Fixed warn_flag error
Faruq: did you run the tests after you fixed the warn_flag error? If you did, what do you think of the results? If you did not, please run the tests and paste the results in here.
My note in comment:60536 tells you how to run the tests.
I've tested after applying this patch. The test results are at http://pastebin.com/1Ac3b6Jk and a summary is given below.
Okay, good, now also please run more of the other tests to see if your patches broke anything else.
Attachment multiple-introducer-client-side-001-x2.dpatch (5056 bytes) added
tweaks to pass the full-tests
Just for reference, here is a hyperlink that shows you the most recent results of building the source:ticket68-multi-introducer branch on all of our Supported Builders: buildbot link
Faruq: I committed your latest patches and triggered the buildbot to test them. Use the buildbot link to see the results (I committed them just now, so look for the builds that started at 22:41:21 PDT on 2010-08-11).
You can see the patches that are on the branch here: http://tahoe-lafs.org/trac/tahoe-lafs/log/ticket68-multi-introducer/
The builds haven't finished yet so I don't know whether all the tests passed on all platforms, but I'm going to sleep now. :-)
Okay, as you can see from the buildbot link that shows Supported Builders testing this branch, the tests pass on the buildbot. Adding the review-needed tag to this ticket.
I added a question about multiple introducers to the FAQ wiki page.
So after closing this ticket, please edit the FAQ page.
Full source available at
http://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer/
The final GSoC code is here
http://code.google.com/p/google-summer-of-code-2010-tahoe-lafs/downloads/detail?name=MOFaruque_Sarker.tar.gz&can=2&q=#makechanges
Some hints on using it:
A second file "BASEDIR/introducers" configures introducers. It is necessary to
write all FURL entries into this file. Each line in this file contains exactly
one FURL entry. For backward compatibility reasons, any "introducer.furl"
entry found in tahoe.cfg file will automatically be copied into this file. Keeping
any FURL entry in tahoe.cfg file is not recommended for new users.
Edit BASEDIR/introducers and add a FURL for each introducer. Of course you need to run the introducers before you can get their FURLs.
Play with them as you like.
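As an illustration of the file format described in these hints, here is a minimal sketch of how a node might read BASEDIR/introducers (one FURL per line) and fold in a legacy introducer.furl value from tahoe.cfg. The function name, the skipping of blank lines and duplicates, and the example FURLs are assumptions made for illustration; this is not the code from the patch.

```python
def load_introducer_furls(introducers_text, legacy_furl=None):
    """Return the introducer FURLs found in the text, one per non-blank line.

    legacy_furl stands in for an "introducer.furl" value found in tahoe.cfg;
    it is appended if the file does not already list it.
    """
    furls = []
    for line in introducers_text.splitlines():
        furl = line.strip()
        # Assumption: blank lines and repeated entries are ignored.
        if furl and furl not in furls:
            furls.append(furl)
    if legacy_furl and legacy_furl.strip() not in furls:
        furls.append(legacy_furl.strip())
    return furls


# Made-up FURLs, for illustration only.
example = (
    "pb://abc@host1.example:1234/introducer\n"
    "pb://def@host2.example:1234/introducer\n"
)
furls = load_introducer_furls(example, "pb://ghi@host3.example:1234/introducer")
```

Here `furls` ends up with the two entries from the file followed by the legacy entry, which mirrors the backward-compatibility behaviour described in the hints.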
Attachment ticket68-multi-introducer.tar.gz (1309681 bytes) added
A snapshot of working repository
I've installed the snapshot on 2 systems. Started an introducer on both systems. Started a storage node on both systems with both furls. I can see the storage nodes appearing in the web interface of both introducers and the storage node web interface contains both introducers.
After shutting down one system, the web interface still shows the off-line system as active for both introducer and storage.
Trying to create a new directory causes the node to contact the off-line system, and it stays busy with that (no time-out?). The request stays in the "active operations" list, even after stopping the request.
Hm. Myckel: Could you please reproduce this and then about 10 seconds after you shutdown one storage server, click the "Report an Incident" button on the welcome page. Then again when you attempt to mkdir, please click the "Report an Incident" button a few seconds after you've done so.
Each time you click "Report an Incident" it creates a file in the logs/incidents directory. Please attach those files to this ticket.
Faruq: we should write a unit test of this workflow: create two introducers, create a storage server pointing at both introducers, create a storage client pointing at both introducers, shut down one of the servers, then, um, then initiate an operation in the storage client, such as mkdir (which is what Myckel did manually) or any other operation that uses storage servers.
Attachment incident-2010-10-31-082948-tx5qoxy.flog.bz2 (7846 bytes) added
First incident report (after shutdown, before making dir)
Attachment incident-2010-10-31-083037-4o3degq.flog.bz2 (8570 bytes) added
2nd incident log (after mkdir)
Replying to zooko:
Ok, files are attached. I hope they are useful, because after making the incident report I noticed that the storage server recovered. This might also not be related to the multiple introducer situation, because I've had it also happening when trying with volunteer grid (one storage node went off-line, I couldn't do anything any more, until restarting my storage node).
I've restarted the storage node and introducer that I shut down. It took a few minutes before the other storage node and introducer noticed the new storage node and introducer.
Is there some heartbeat or small time out in place?
Replying to Myckel (comment:76):
Wait, what? I'm confused. You created two introducers and two storage nodes, right? And then were you using one of the storage nodes to also be a gateway (== a storage client)? And then did you shut down the other one by running tahoe stop $BASEDIR on it?
Replying to Myckel:
Yes, you retry to open a connection to each peer periodically, in an exponential back-off pattern (until you have backed off to trying only once per hour, at which point you keep trying at that rate indefinitely).
So if the peer was down for 5 minutes then it might take up to 5 minutes after it is brought back up before you reconnect to it.
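The retry pattern described in this reply can be sketched as follows. The initial delay, the doubling factor, and the function name are assumptions chosen for illustration (the actual reconnection code may use different constants and add jitter); only the one-hour cap comes from the comment above.

```python
def backoff_delays(initial=1.0, factor=2.0, max_delay=3600.0, attempts=15):
    """Return the waits (in seconds) before each successive reconnection attempt."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        # Grow the wait geometrically, but never beyond the cap (one hour).
        delay = min(delay * factor, max_delay)
    return delays


delays = backoff_delays()
```

With a doubling factor, each wait is about as long as all the earlier waits combined, which is consistent with the remark that a peer that was down for 5 minutes may take up to roughly 5 more minutes to be reconnected.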
Replying to zooko (comment:78):
Guess I was not so clear. This is what I did:
2 computers:
Computer 1:
Run both an introducer and storage client (access it through the web interface).
Computer 2:
Run both an introducer and a storage client (can access it through the web interface, but don't bother with that).
Both introducers see both storage clients. Both storage clients say they are connected to the introducers. All fine so far.
Then I shut down system 2, so NO tahoe stop $BASEDIR (I could also pull the power plug or do a hard reset).
Then on system 1 I try to make a dir through the web interface, and then everything stays busy while it tries to contact the storage node/client on system 2.
Faruq: as per comment:60551, we should add a test for this case. Removing the review-needed tag and adding the test-needed flag.
There's been some discussion of this ticket on the mailing list here and here and in the Tahoe-LAFS Weekly News.
Out of time for v1.9.0! But anyone who loves this, please jump in. There's no time like the present! Do some manual testing of Faruq's patch, write a new patch, write unit tests, etc. :-)
I've been using this patch with the ones in #1007 and #1010 (and foolscap tickets 150 and 151) on I2P with v1.8.3 and so far there haven't been any issues with the functionality.
It seems, however, that comments aren't allowed in $TAHOENODE/introducers. At least # doesn't work as a comment character. Having the ability to add comments would be a very welcome addition.
Just to give a heads up: Most of the 18 storage nodes on our smallish grid on I2P have been using the multiple introducer patch since late November and things are still working well for us.
Also one of our users made some modifications that add colors to the introducer list as can be seen at http://i.imgur.com/aPbaY.png. After I refactor the patch for the current git revision I'll add it to this ticket.
killyourtv: cool! Thank you for the note. If I recall correctly, Faruq's patch didn't have a thorough unit test.
I've noticed several good contributions to Tahoe-LAFS that are blocked on not having unit tests. I think a lot of people know how to write Python code but aren't sure what we expect in terms of testing, or don't know how to use trial's features to test results that are deferred until a subsequent event. I've been thinking that having a "unit test tutorial" party could be fun, where everyone who has a patch for Tahoe-LAFS that needs tests comes to the IRC channel and we pick one and walk through how to write tests for it...
For anyone who wants to contribute to this ticket, the patches are available through darcs from this repo https://tahoe-lafs.org/trac/tahoe-lafs/browser/ticket68-multi-introducer , i.e. darcs get --lazy https://tahoe-lafs.org/source/tahoe-lafs/ticket68-multi-introducer . killyourtv probably has them available in another form (unified diff?). It would be cool to port the darcs repo to be a git branch. If you do that, please add a comment to this ticket pointing to the git branch.
In case it's of use: https://github.com/kytvi2p/tahoe-lafs.
I made two branches, one for what I think should be close to 1.8.3 (it's not tagged) and one for 1.9.x (current git).
The patchset has been refactored to apply on top of the current build at https://github.com/kytvi2p/tahoe-lafs/tree/68-multi
Although everything seems to work, the unit tests are (unfortunately) still broken.
I've been working on restructuring the new IntroducerClient so that we can implement multi-introducer grids without losing announcement deduplication logic in the client. My work so far is here: https://github.com/lebek/tahoe-lafs/compare/master...68-multi-introducer
The way configuration works is a new [client]introducer.furls option which takes multiple values (whitespace or line break separated). If both [client]introducer.furl and [client]introducer.furls are set, the values are appended.
All introducer tests are passing at the moment, so theoretically this might work already. I'm still working on new tests specific to the multi-introducer setting. I also still need to make announcements idempotent in the introducer. Finally, I'll import Faruq's patches to the WUI and documentation; they shouldn't require much modification.
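To make the configuration rule in this comment concrete, here is a minimal sketch using Python's standard configparser, assuming both options live in the [client] section of tahoe.cfg. The function name, the ordering (single value first), and the example FURLs are assumptions for illustration, not the code from the branch.

```python
import configparser


def introducer_furls_from_config(cfg_text):
    """Collect FURLs from introducer.furl and introducer.furls in [client]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    furls = []
    single = parser.get("client", "introducer.furl", fallback="").strip()
    if single:
        furls.append(single)
    # str.split() with no argument splits on any whitespace, including the
    # newlines that configparser keeps in multi-line values.
    furls.extend(parser.get("client", "introducer.furls", fallback="").split())
    return furls


# Made-up FURLs, for illustration only.
example_cfg = """
[client]
introducer.furl = pb://abc@host1.example:1234/introducer
introducer.furls = pb://def@host2.example:1234/introducer
    pb://ghi@host3.example:1234/introducer
"""
furls = introducer_furls_from_config(example_cfg)
```

The sketch does no deduplication; whether a FURL listed in both options should appear once or twice is left open here.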
This isn't ready for Tahoe-LAFS v1.10, but as recently discussed (//pipermail/tahoe-dev/2012-November/007867.html), we've decided we'd like to try integrating it into trunk ASAP! Lebek, or anyone else who wants to help, please see that mailing list discussion and reply on tahoe-dev, or this ticket, or join us at the next Weekly Dev Chat.
Removing obsolete reference to vdrive servers in the Description.
I'd like to get this into trunk ASAP, so it can get thoroughly tested out for Tahoe-LAFS v1.11. If I understand correctly, lebek's notes at comment:60567 and our discussion (//pipermail/tahoe-dev/2012-November/007867.html) from a weekly dev chat are telling us what next steps to take.
#1402 was a duplicate, and there was a patch attached to it by socrates:
attachment:relay.py:ticket:1402
Replying to killyourtv:
david415 and I began updating this patch to work with post-1.10 versions of tahoe:
https://github.com/leif/tahoe-lafs/commits/ticket68 (tests do not pass yet, but it is connecting to multiple introducers).
Hopefully we'll have a cleaned up patch soon.
I'm cross-posting this comment to #68 and #467.
Here is a squashed commit of the multi-introducer and introducerless patches on top of the current master:
https://github.com/leif/tahoe-lafs/compare/master...introless-multiintro-squashed
And here is a 3-way merge combining the history of both feature branches with master in such a way that git log and git blame can still find the original commits: https://github.com/leif/tahoe-lafs/compare/master...introless-multiintro-with-history (creating this was a git adventure; I ended up doing the 3-way merge using -s ours and then doing another squash merge followed by git commit --amend)
I'm going to write more tests before submitting a pull request with one of these. But, if anyone wants to review or test it now I'd appreciate it!
Here is the latest introducerless/multi-introducer patch: https://github.com/leif/tahoe-lafs/commit/1ae5aaecbb68f13019b6bc2ba4632bb4a5623aaa (that is a squash merge on top of two other commits which will hopefully land on master soon).
It should perhaps have some more tests, but testing/review/feedback would be welcomed.
I gave some feedback, although it's a huge diff and probably needs more eyes on it.
I'm optimistically putting this in the 1.10.3 milestone; it may well get booted out to 1.11.
Here is the new version, after addressing daira's comments: https://github.com/leif/tahoe-lafs/commit/8fc8cd9151d4dc4c041867bac98aefff6a105729
I think this is nearly ready to merge, so more review and/or testing would be appreciated.
The one thing remaining that I think needs to be done is to add some tests to test_web.
Here is the latest introless-multiintro branch (with full history) with a few more commits since the squashed commit in my previous comment.
I posted a comment about my next steps for this branch on ticket #467.
Out of time for 1.10.3.
Milestone renamed
moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders
i've got this dev branch where i added init_introducer_clients: https://github.com/david415/tahoe-lafs/tree/68.multi_intro.0
in the above dev branch i've gotten all the unit tests to pass... so i opened this pull-request here:
https://github.com/tahoe-lafs/tahoe-lafs/pull/338
please review
In 3b24e7e/trunk:
In d802135/trunk:
In 2e3ec41/trunk:
Ok, at long last, this ticket is done. We didn't implement the cool "gossip" approach, or the limited-flood thing, or the invitation thing. But nodes can now be configured with zero/1/many introducer FURLs (via a combination of tahoe.cfg introducer.furl= and the new NODEDIR/private/introducers.yaml), and servers will announce themselves to all introducers, and clients will merge announcements from all introducers.
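As a rough picture of what "clients will merge announcements from all introducers" can mean, here is a minimal sketch. The announcement layout (the "server_id", "seqnum" and "furl" keys), the rule that the highest sequence number wins, and all names are hypothetical; the sketch does not reproduce the merged Tahoe-LAFS code.

```python
def merge_announcements(per_introducer):
    """Merge announcements received from several introducers into one view.

    per_introducer maps an introducer name to a list of announcements, each a
    dict with the hypothetical keys "server_id", "seqnum" and "furl". For each
    server, the announcement with the highest seqnum is kept.
    """
    merged = {}
    for announcements in per_introducer.values():
        for ann in announcements:
            current = merged.get(ann["server_id"])
            if current is None or ann["seqnum"] > current["seqnum"]:
                merged[ann["server_id"]] = ann
    return merged


# Made-up data: server "s1" is announced through both introducers.
received = {
    "introducer-a": [{"server_id": "s1", "seqnum": 1, "furl": "pb://old@example/s1"}],
    "introducer-b": [
        {"server_id": "s1", "seqnum": 2, "furl": "pb://new@example/s1"},
        {"server_id": "s2", "seqnum": 1, "furl": "pb://x@example/s2"},
    ],
}
servers = merge_announcements(received)
```

In this picture, losing one introducer only removes one source of announcements; a server that also announced itself to another introducer stays visible to the client.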