immutable peer selection refactoring and enhancements #1382
I've been working on refactoring and improving immutable peer selection. I have several immediate goals for this project.
These improvements correspond roughly to issues presented in tickets #610, #614, #1115, #1124, #1130, and #1293. Unifying mutable and immutable peer selection is ticket #1057, though I don't expect to address that until after MDMF (#393) is merged and until we're done with this ticket.
First up is a draft of `IPeerSelector`, an abstract peer selection class. The draft is contained in attachment:interface.dpatch.

A significant annoyance with servers of happiness (and a source of many of its shortcomings) is the fact that it wasn't used to motivate share placement or server selection in a well-thought-out way; rather, it was just used to evaluate the job that the existing share placement strategy did. So, with `IPeerSelector` (and with the implicit assumption that the peer selection and share placement strategies you'd want to use when uploading a file are probably generally motivated by how you define file health), I wanted to combine both file health and share placement in one abstraction. I still have a few small issues to fix with the uploader changes that actually use the interface, so you'll have to wait to see the code that exercises and implements the interface, but the interaction is basically: the uploader calls `add_peer_with_share` to tell the peer selector about each peer/share relationship it learns of, and reports successes (via `add_peer_with_share`) and failures (via `mark_full_peer` and `mark_bad_peer`) as appropriate.

I have an immutable uploader that works pretty well with this sort of interaction.
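For readers skimming this ticket, here is a minimal, hypothetical sketch of what an interface along these lines could look like. Only the names `IPeerSelector`, `add_peer_with_share`, `mark_full_peer`, and `mark_bad_peer` come from the discussion above; the signatures, docstrings, and the `get_share_placements` method are illustrative guesses, not the contents of interface.dpatch.

```python
# Hypothetical sketch only -- see attachment:interface.dpatch for the real draft.
from zope.interface import Interface

class IPeerSelector(Interface):
    """Combines the definition of file health with share placement."""

    def add_peer_with_share(peerid, shnum):
        """Tell the selector that the peer `peerid` holds (or has accepted)
        share number `shnum`."""

    def mark_full_peer(peerid):
        """Tell the selector that `peerid` cannot accept any more shares."""

    def mark_bad_peer(peerid):
        """Tell the selector that `peerid` failed and should not be used."""

    def get_share_placements():
        """Return a proposed placement, e.g. a dict mapping each share
        number to the set of peerids it should be uploaded to."""
```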
Things to look for if you're reviewing:
(of course, you're welcome to look for whatever you want -- that's just what I think I need feedback on :-)
Attachment interface.dpatch (13846 bytes) added
initial implementation of IPeerSelector interface
Attachment interfaces.dpatch (35792 bytes) added
update IPeerSelector definition
Attachment 1382.dpatch (155132 bytes) added
snapshot of #1382; includes new, simplified uploader, refactored server selection logic, tests
On the Tahoe-LAFS Weekly Call, we agreed that this isn't likely to be ready for 1.9.0.
Kevan says there isn't any specific thing that he knows is wrong with this patch, but he thinks of it as an experiment that would need more consideration and hunting for edge cases before it should go into trunk.
There are a few other tickets that may be affected by this one: #362, #1394, #873, #610, at least.
We discussed #1382 at the second Tahoe-LAFS summit. We concluded that this approach (splitting peer selection into two parts along a well-defined and generally useful interface) was a good approach, and that a more complete and correct share allocation algorithm (share director? share overlord? I don't think we agreed on a name for the object) was a good idea. I think we concluded that the right path to bring #1382 into trunk was something like:
Once that's done, we'll have an easy way for people to add new share placement algorithms to Tahoe-LAFS, and a share placement algorithm that fixes the corner cases users currently experience with servers of happiness.
I should have more free time in December than I've had lately, so I'll be happy to work on #1382. I don't know if we want to plan on #1382 for 1.10, or defer it until a future release; if we discussed that at the summit, I don't remember what we concluded.
Kevan: excellent! I'm looking forward to progress on this. Your summary of the Summit discussion matches my recollections. We did not talk about whether it would go into 1.10 or not. Let's decide later. I don't even know who is going to be Release Wrangler for 1.10. (Hopefully Brian. He had some good ideas about "more automation, less process".)
You can follow my progress on this project by watching the [ticket1382 branch on my GitHub page](https://github.com/isnotajoke/tahoe-lafs/tree/ticket1382). I'm trying to assemble a series of small deltas that take us from our current immutable peer selector to something that can be easily modified to use a pluggable peer selector, and would be particularly interested in knowing whether the deltas that are there now (and show up there as I continue to work on it) are small enough to be easily reviewed.
As Release Manager for 1.10, I say this goes in.
(I effectively renamed 1.10 to 1.11, so that milestone is correct.)
I'm moving this to Milestone: "eventually", which means that we agree it ought to be fixed, but we don't agree that it is going to get fixed in 1.11.
If someone (e.g. kevan or markberger) wants to prioritize this, then I suppose they might finish it in time for Milestone 1.11. In that case they can move it back into this milestone when they start working on it.
Huh. I thought this was the main ticket we wanted to get fixed for 1.11.
Replying to daira:
Oh! Well, it is the main ticket that I want to get fixed for markberger's Google Summer of Code project, but I hope we release Tahoe-LAFS v1.11 before that project is complete...
To view progress on this ticket refer to https://github.com/markberger/tahoe-lafs/tree/1382-immutable-peer-selection
I'm currently figuring out the last few tests that fail with the new uploader. One of them is test_good_share_hosts in src/allmydata/test/test_checker.py. Here is a summary of the test:
However, with the upload strategy of happiness, repair produces
Is this behavior acceptable? The new algorithm views any additional share placements over N as redundant and does not attempt to place any more shares. However, this appears to be different from the old algorithm, which looks like it tried to place shares on as many servers as possible, regardless of N (at least from this test).
Yes, I believe the new behavior is okay. The real question should be: does it match the specification? Well, what is the specification? Could you please make sure it is written in a precise and succinct form in a .rst file under source:docs/ ? (It can refer to Kevan's thesis, but the reader should be able to tell exactly what the specification is without reading Kevan's thesis.)
I'm pretty sure that the result you show in comment:83044 does meet the spec.
It seems non-optimal that the new algorithm does not place share D. The algorithm in /tahoe-lafs/trac-2024-07-25/issues/6192#comment:83034 does place share D, in step 5.
Zooko (comment:23), I have added a document under source:docs/specifications that you should be able to see on the github branch.
Daira (comment:83046), the map is shares to servers, so the algorithm is placing share 0 on server A, share 1 on servers A and B, etc. So all of the shares are being placed, but not all of the servers get a share because the set of servers is larger than the set of shares.
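To make the shape of that map concrete, here is a tiny illustrative snippet; the variable name and the placement of share 2 are invented, and only the share-0/share-1 placements come from the example being discussed.

```python
# Illustrative only: a share -> set-of-servers map of the shape described above.
share_to_servers = {
    0: {"A"},       # share 0 placed on server A
    1: {"A", "B"},  # share 1 placed on servers A and B
    2: {"B"},       # (invented) further shares follow the same pattern
}
# Every share appears as a key, so all shares are placed, even though some
# servers receive nothing when there are more servers than shares.
```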
Oh I see, I misread it.
I think this behaviour is completely acceptable.
Does the new code also decide which leases need to be renewed?
Yes. The uploader attempts to place the existing shares on their respective servers and this causes the servers to renew their shares instead of uploading them again.
In the 7/9/13 weekly dev chat, Brian, Daira, and Zooko agreed that the upload algorithm associated with this ticket should contact 2*N servers. There was also discussion of what to do in the case when existing shares are found. Refer to ticket #610.
I think I've found the reason why `allmydata.test.test_download.Corruption.test_each_byte` fails. Unlike the previous tests I needed to modify for this patch, the downloader is not deterministic. It will download the same shares each time, but not in the same order. Therefore different corruption offsets are not detected each time the test is run. In order for `allmydata.test.test_download.Corruption.test_each_byte` to pass, something must be modified to ensure the downloader is deterministic.

It turns out what I perceived to be download order was not the case. Please ignore comment:83051.
My github branch (https://github.com/markberger/tahoe-lafs/tree/1382-immutable-peer-selection) passes all tests.
The above branch imposes the 2n server limit and a test has been added to check whether 2n servers are contacted.
I rewrote my messy commit history and rebased the branch to trunk. You can find the new branch here: https://github.com/markberger/tahoe-lafs/tree/1382-rewrite
For anyone who is going to review the patch, I suggest reviewing each group in the same sitting.
This would fix #1814.
I believe this would fix #1130.
Mark: I started doing real code review while on a plane to Washington D.C. today. Here are my notes. I apologize if they are confusing or incomplete. I'm deciding I should send you these notes now instead of sleeping on it, in order to not delay.
Overall, these patches are really great! You've done a fantastic job of satisfying all the requirements, recording the patches into coherent chunks, etc. I'm making a lot of requests of you in the following review notes, and I hope it doesn't discourage you! I'm excited about getting #1382 fixed in trunk ASAP!
regarding https://github.com/markberger/tahoe-lafs/commit/739468e69676d30353814248f1fd7dd4b7b9d8f9:
regarding https://github.com/markberger/tahoe-lafs/commit/ff1a77924b03d286477496c0e0a3ddfe45353c61:
to:
No wait, actually remove `self.servermap_shareids` and `self.servermap_peerids` entirely, and use the appropriate expression instead of those member variables where they are currently used.

For `_flow_network()`:
Okay, I'm stopping reviewing ff1a77924b03d286477496c0e0a3ddfe45353c61 for now because I don't know how much of it is duplicated or near-duplicated code…
regarding https://github.com/markberger/tahoe-lafs/commit/e7af5f7995bc661d19d5c903359813fa91880966 and https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac:
to:
zooko: Thanks for reviewing my patch! I will start working on these changes. I'm hesitant to implement some of the changes you suggested for https://github.com/markberger/tahoe-lafs/commit/5875c31d32ef50467c572736599066fffe8b73ac because the next commit makes some changes to the same lines. If you also reviewed the next commit and those comments apply to both, please let me know and I'll go ahead with your changes.
Replying to markberger:
Will do.
I've pushed some commits which fix the issues with the documentation and the happiness_upload class.
I tried to refactor those two functions into one, but servers are indexed differently between the two and I couldn't figure out how to normalize the input. I will try again later when I have more time.
During dev chat today (September 11th, 2013), Zooko and I talked about some modifications to `happiness_upload.py` that would make it easier to understand:

- Explain when to use `_server_flow_graph` and when to use `_flow_network`.
- Combine `index_peers` and `index_shares` into one function (see the sketch after this comment).
- Rename `peerids` and `shareids` to something else which makes it clear that they are indices used by the flow network.
- Move the reindexing in `generate_mappings` to its own function for clarity. That way reindexing can occur in the function and not distract the reader from what is occurring.

Besides explaining why flow networks are used in the Edmonds-Karp algorithm, all of the above points have been addressed on 1382-rewrite.
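As a rough illustration of the reindexing being discussed (this is not code from the branch; the helper name and data shapes are invented), a single function that maps arbitrary peer or share identifiers to consecutive integer flow-network indices might look like this:

```python
# Hypothetical helper, not from the branch: normalize arbitrary identifiers
# (peerids or shareids) to consecutive integers for use as flow-network nodes.
def index_ids(ids):
    """Return (id_to_index, index_to_id) using a stable sorted ordering."""
    ordered = sorted(ids)
    id_to_index = dict((item, i) for i, item in enumerate(ordered))
    index_to_id = dict(enumerate(ordered))
    return id_to_index, index_to_id

# The same helper can serve both peers and shares, which is the motivation
# for combining index_peers and index_shares into one function.
peer_to_idx, idx_to_peer = index_ids(["server-A", "server-B"])
share_to_idx, idx_to_share = index_ids([0, 1, 2])
```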
I've opened a pull request for my branch here.
Mark: I looked at the pull request linked from comment:83066 again just now (with Peter Stuge's help), and I recognized most of the commits as things I've already read, and I see a lot of comments I've already left. Did you already take into account all previous comments? And if you did, did you put your latest code into new commits at the end of the 1382-rewrite branch, or a new branch, or what? Sorry if I'm being dumb, but please hold my hand and tell me where your latest code is now. Thanks! ☺ —Z
Hi Zooko! Sorry for the late response.
After the dev chat we had, I forgot to go and implement the changes we discussed. I quickly added the basic changes yesterday, and they can be found on the pull request. I investigated combining the loop over read-only servers and the loop over writeable servers, but I did not have enough time to commit anything that works.
Sadly I don't have as much time to hack on tahoe as I had expected, but I will try to revisit the loop issue this weekend.
I've been working on the suggestions we discussed during last week's dev chat.
It turns out a lot of the query counting mechanics in place on my branch are incorrect. I'm shocked that it actually passes all of the tests in its current state. Also the current version tries to place shares on servers even when we have enough information to determine that a happy upload cannot occur.
I've started to correct these issues, as well as remove the redundant loop, in a branch here.
For the above branch, when `tracker.ask_about_existing_shares()` is called on `FakeStorageServer`, I receive the following error for each tracker:

Sadly, this error does not appear when you run `python setup.py trial -s allmydata.test.test_upload`. It simply fails, stating that the happiness error messages are wrong. To reproduce this error message, `import pdb` and call `pdb.set_trace()` in `_handle_existing_response` within `immutable/upload.py`.

I'm not sure why I'm receiving this error, because the same function is called in 1.10. Additionally, there is some code that should give `get_buckets` a proper response here.

If anyone has some advice on how to fix this error, I would appreciate the help. I haven't been able to figure this bug out.
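For anyone trying to reproduce this, the debugging step described above boils down to the two lines below; the surrounding context is paraphrased, and only the method and file names come from the comment.

```python
# Paraphrased: inside _handle_existing_response in src/allmydata/immutable/upload.py,
# drop into the debugger to see the real traceback, which
# "python setup.py trial -s allmydata.test.test_upload" otherwise hides.
import pdb
pdb.set_trace()
```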
To correct the issue I was having in my previous comment, I added a `get_buckets` method to `FakeStorageServer`. I'm not sure why this is necessary since it seemed to work before, but it appears to fix my problem.

Today during the summit, we addressed the error messages which occur on the `1382-rewrite-2` branch. Daira raised some concerns that the uploader might terminate too early because the upload code is tied to repair. I haven't looked into that issue yet.

At the summit we also discussed the issue that repair should be a best-effort operation and should not exit early, even if it can tell that it isn't going to achieve the happiness set by `shares.happy`. However, I think that should already be addressed by the fixed ticket #1212, which set the happiness threshold to 0 when invoking the repairer. On the other hand, there should be an explicit test of this behaviour, and I'm not sure there is one already.

Adding the resulting happiness to repair and check reports is part of #1784.
Once we land this ticket, let's go revisit #614. I am inspired to say that by the discussion on #2106.
Mark:
Here are some more review comments. These are from looking at upload.py on the current version of your 1382-rewrite-2 branch.
- There is a hard-coded `2` at line 340. A reader might wonder why that is there, and why it isn't `3` or `2.71828` or something. We should add a comment next to it saying something like "If there are a large number of servers, we don't want to spend a lot of time and network traffic talking to all of them before deciding which ones to upload to. Here we limit our consideration to the first 2*N servers. That ought to be enough. See docs/specifications/servers-of-happiness.rst for details.". Then add to servers-of-happiness.rst a discussion about that constant 2.
- Rename `generate_mappings` to `calculate_upload_plan`. ☺
- `servermap` is used in this file on line 54 to denote a data structure which is shaped like `{serverid: set(shnum)}`, but here a "servermap" is a data structure which is shaped like `{shnum: serverid}`, and then further down in the file "servermap" is used to mean something which is shaped like `{shnum: set(serverids)}`. Confusing! (Even though "servermap" is a reasonable word for any of these three shapes.) Could we think of different local names for the latter two? (Better not touch the first one since it is transmitted over the wire in the helper protocol, apparently.) Oh, here's a partial solution: the one on line 482 doesn't need to be named at all — just inline it instead of using a variable to hold it between line 482 and 483. ☺ (See the snippet after this comment for the three shapes side by side.)
- Rename `_isHappinessPossible` to `is_happiness_possible` and `Happiness_Upload` to `HappinessUpload`, to match our conventions.

Okay, I didn't really finish studying the entire upload.py (this version of it), but I'm stopping here for tonight. ☺
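To make the three shapes easier to compare, here is an illustrative snippet; the data is invented, and only the three shapes come from the review comment above.

```python
# Invented data; only the shapes come from the review comment above.
# Shape 1 (line 54, used in the helper protocol): serverid -> set of share numbers
servermap_line54 = {"server-A": set([0, 1]), "server-B": set([2])}

# Shape 2: share number -> a single serverid
servermap_single = {0: "server-A", 1: "server-A", 2: "server-B"}

# Shape 3 (further down in the file): share number -> set of serverids
servermap_sets = {0: set(["server-A"]), 1: set(["server-A"]), 2: set(["server-B"])}
```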
Okay, so about the "try the first 2*N servers" part (line 340) that I mentioned in comment:83074.

There are two limitations of this. The first one is relatively simple to solve, and I'd like to do it in this branch. The second one is more invasive, and I'd like to open a separate ticket for it.
The first one is that all 2*N of the first servers might be in read-only mode (and not already have shares of this file). Then the upload would fail. This is actually fixable without waiting for further network results, because whether the servers were already announced to be in read-only mode is already known (through the introduction protocol) by the client, so we could for example do something like the following. Replace:
with:
I guess we ought to write a unit test of this! Demonstrating a situation where the first 2*N servers are all read-only so the upload fails with the old code but succeeds with the new code.
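A minimal sketch of the kind of replacement being suggested, under stated assumptions: the candidate list, the `is_readonly()` predicate, and the function name are all hypothetical, and the real change in upload.py will differ in detail.

```python
# Hypothetical sketch: when trimming the announced servers down to 2*N
# candidates, make sure writeable servers are represented, since read-only
# status is already known from the introducer announcements.
def pick_candidate_servers(all_servers, total_shares, is_readonly):
    limit = 2 * total_shares
    writeable = [s for s in all_servers if not is_readonly(s)]
    readonly = [s for s in all_servers if is_readonly(s)]
    # Prefer writeable servers (they can accept new shares), but still keep
    # some read-only servers, which may already hold shares of this file.
    return writeable[:limit] + readonly[:limit]
```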
The second limitation is the "deeper" one. Even if we pick 2*N writeable servers at this stage, there's no guarantee that any of those servers will actually accept our subsequent writes. It could be that all 2*N of them are full or in read-only mode now (even though they weren't back when the introduction protocol ran and informed our client about those servers), or they might all fail at this moment, or something. The current behavior is to give up on the upload if this happens. It would almost always be better to instead go ahead and ask the next few servers if they would hold shares.
I'm going to open a separate ticket for that…
Okay, I filed #2108 for the deeper issue raised in comment:83075.
Here is another review of upload.py from the current tip of 1382-rewrite-2. I'm looking at `_get_next_allocation()` on line 466, and it seems like it is doing some complicated computations in order to yield just a simple data structure — the ordered list of `[(server, set(shnums))]` that constitutes the upload plan. It seems like the "calculate upload plan" (a.k.a. `_calculate_tasks()`) method could just generate that list and return it, and `_get_next_allocation()` could disappear. Am I missing anything?

I've pushed patches that address what zooko mentioned in comment:83074. I have not done anything in response to ticket #2108. I also removed `_get_next_allocation`.

I have a lot of free time to work on the patch for the next couple of weeks. If we could clear up what is necessary to get this merged, I would be able to work on this ticket.
Replying to markberger:
Hi Mark! Thanks for posting this. I just saw it (I've been spending all day on family Xmas stuff until now), and I'll see if I can answer any questions you have. I have a patch myself somewhere around here…
transcribed from IRC:
So, now I'm still confused about `is_happy_enough()` at https://github.com/markberger/tahoe-lafs/blob/6f5a125283494d3bbf1d2fa143eddd3d5183c7d7/src/allmydata/test/test_upload.py#L734. It uses k. Why does it do that? The (modern) definition of happiness doesn't include mention of k, right? (It's just that nobody ever sets their H requirement to less than their k.)

So I suspect that `is_happy_enough()` isn't calculating exactly the right function, but we don't notice because it gets called only for well-distributed things, by test_upload, so "return True" also works for the current tests. (I tried this, and `return True` does indeed give the correct answer for all queries that test_upload.py sends to the `is_happy_enough` function.) But it seems like we ought to replace it with a function that calculates the correct thing, to avoid confusing future test-writers.

So Mark: please write a new `is_happy_enough`, or explain to me why I'm wrong, or replace the call to `is_happy_enough` with a call to `allmydata.util.happinessutil.servers_of_happiness`.

So, I was trying to update docs/specifications/servers-of-happiness.rst to explain the algorithm and provide an "intuition" of it for readers, and look what happened: I confused myself again! I apparently do not have an "intuition" that matches the behavior of servers-of-happiness:
Servers of happiness is limited by the cardinality of the smaller set.
So in example 1 we have:
and then for example two we add:
This doesn't increase happiness because share 1 is already matched with server B. Matching shares with extra servers doesn't increase happiness. Since we don't have any homeless shares to match with server C, we are limited by the cardinality of the smallest set, and happiness is set to 2.
In example 4 we have
We can match server A with share 0 and server B with share 1. Once again we are limited by the cardinality of the smaller set, so even though we matched server A with share 1 and server B with share 2 as well, those pairs do not increase happiness. Our set of matchings already has servers A and B, so any additional pairs with servers A and B will not increase happiness.
However, when we add:
we are no longer limited by the cardinality of 2, because each set has a cardinality of 3. So now we can match server A with share 0, server B with share 1, and server C with share 2. Each of these pairings has a unique server and share that is only used once in our set of matchings; therefore each pair contributes to a happiness of 3.
That explanation would be the intuitive way to think about the maximum matching of bipartite graphs. I think it is clearer if you show pictures of bipartite graphs, but introducing such graphs might be outside the scope of the document.
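Since pictures of bipartite graphs may be out of scope for the document, here is a small self-contained sketch (not project code) that computes happiness as the size of a maximum matching between servers and the shares they hold, using the standard augmenting-path method, and reproduces the counts from the examples above:

```python
# Not project code: happiness == size of a maximum matching between servers
# and the shares they hold, found with simple augmenting paths.
def happiness(placements):
    """placements maps each server to the set of shares it holds."""
    share_owner = {}  # share -> server currently matched to it

    def try_assign(server, visited):
        for share in placements[server]:
            if share in visited:
                continue
            visited.add(share)
            owner = share_owner.get(share)
            # Claim the share if it is free, or if its current owner can be
            # re-routed to a different share (an augmenting path).
            if owner is None or try_assign(owner, visited):
                share_owner[share] = server
                return True
        return False

    return sum(1 for server in placements if try_assign(server, set()))

# Example 4 above: A holds {0, 1}, B holds {1, 2} -> happiness 2.
print(happiness({"A": set([0, 1]), "B": set([1, 2])}))                 # 2
# Adding a third server that holds share 2 raises happiness to 3.
print(happiness({"A": set([0, 1]), "B": set([1, 2]), "C": set([2])}))  # 3
```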
Replying to zooko:
Calling `allmydata.util.happinessutil.servers_of_happiness` from `is_happy_enough` and trusting the result would not preserve the intent of the test, because there might be a bug in `happinessutil` that would then propagate to the test and result in a false pass.

Since the test only checks for happiness rather than unhappiness, it would be possible to call a function that returns a purported maximum matching, and check that it actually is a matching. It is not necessary to check that the matching is maximum, because not being maximum could only cause the test to fail; it could not cause a false pass.
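A hedged sketch of that suggestion (all names invented): verify that a purported matching really is a matching against the known placements, without re-deriving that it is maximum.

```python
# Invented helper: check that `matching` (share -> server) is a genuine
# matching for `placements` (server -> set of shares it holds).
def is_valid_matching(matching, placements):
    used_servers = set()
    for share, server in matching.items():
        if server not in placements or share not in placements[server]:
            return False  # pairs a share with a server that does not hold it
        if server in used_servers:
            return False  # reuses a server for two different shares
        used_servers.add(server)
    return True
```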
Ah, the maximum matching is generated in happiness_upload.py. I was not expecting it to be there; I was expecting it to be in happinessutil.py. Does this mean there is duplication between those two files that should be factored out?
I updated the docs! Please review! https://github.com/zooko/tahoe-lafs/commit/69ec1ad94b29cd06ffc08b19286f6d6ee989a857
comment x-posted from github:
Reviewed; clear and descriptive. Only one quibble: it's not explicitly clear at which point(s) the uploader strategy would fail, that is, determine that it couldn't achieve happiness, either in principle or in practice.
I worked on this for a few minutes today and posted notes about what I found as I went. It is entirely possible that the problems I found will turn out to be non-problems once I look more closely at them, and if they are problems they are shallow ones and I'll experiment with cleaning them up myself: [//pipermail/tahoe-dev/2014-February/008908.html].
Here are some of my notes from previous sessions of work on my branch:
This will also fix #1791.
I believe this will fix #1124. We need to confirm that and make sure there's a unit test that demonstrates that.
I'm commenting on "Calculating Share Placements" in here:
https://github.com/zooko/tahoe-lafs/blob/1382-rewrite-3/docs/specifications/servers-of-happiness.rst
I might be wrong but... I think step 10 should be written as follows:
If any placements from step 9 fail (generated in step 7), remove the server from the set of possible servers and regenerate the matchings. Go back to step 6.
I will re-read Mark's docs and consider comment:83092.
Please read http://markjberger.com/upload-strategy-of-happiness/ for context.
I've agreed (OMG what have I gotten myself into) to try to understand this patch, the math behind it, and try to review it and/or rearrange it into smaller pieces that other folks can help review. We still think of this as a blocker for 1.11 (it's kind of the showcase feature), but it's likely to push the release out way past our target. Oh well.
Okay I've updated the docs and included diagrams! https://github.com/zooko/tahoe-lafs/commits/1382-rewrite-4
I intend to squash it into a few coherent commits on a new branch, to be named `1382-rewrite-5`.

My most recent code is in Zooko's 1382-rewrite-3 (ae77df1297cdd). If you're working on the code, please start there! :-) See also my notes from when I was last working on this: comment:83089.
My most recent doc is in Zooko's 1382-rewrite-4 (3cd00fb32c5ca).
I intend to re-edit and rebase the docs into a new branch, which will be named `1382-rewrite-5`.

We hashed out some more docs in an etherpad; results recorded here: https://gist.github.com/warner/66b604617d307374fe17
Okay, I've written a new introductory doc to complement the specification doc. Brian: please read this and tell me if it makes the servers-of-happiness algorithm seem desirable to you. ☺
Here's the patch that adds the docs:
https://github.com/zooko/tahoe-lafs/commit/89261a922d6fa5165d0341b84c3f09b972f55c95
Here's the introductory doc, rendered:
https://github.com/zooko/tahoe-lafs/blob/89261a922d6fa5165d0341b84c3f09b972f55c95/docs/upload-happiness.rst
Current status of this ticket is: blocked on Brian reading https://github.com/zooko/tahoe-lafs/blob/89261a922d6fa5165d0341b84c3f09b972f55c95/docs/upload-happiness.rst
I've read that, and I like it. Nice explanation!
Milestone renamed
moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders
Mark Berger's code from comment:83066 merged into current master: https://github.com/tahoe-lafs/tahoe-lafs/pull/362
I have a branch based on rebasing instead, which gets to about the same point, but there are tests failing depending on the value of `PYTHONHASHSEED` -- it looks like "because it's choosing different shares" and/or the tests are wrong. But I wouldn't expect the hash seed to affect how we choose share placement.

Looks like review-needed is not the case at the moment, so I'll remove the tag if nobody minds.
At the Tahoe-LAFS summit recently, we clarified and edited the "servers of happiness" document (that's included in this branch, among a bunch of others). So, this should allow us to write proper tests and then move this forward.
meejah and I made some progress in our pairing session today. I continued work in my dev branch:
https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.3

I noticed that happinessutil.py had a duplicate servers-of-happiness subroutine, here:
https://github.com/tahoe-lafs/tahoe-lafs/blob/ce47f6aaee952bfc9872458355533af6afefa481/src/allmydata/util/happinessutil.py#L80

I wrote a wrapper so that `servers_of_happiness` in happinessutil.py calls our new servers-of-happiness routine:
https://github.com/david415/tahoe-lafs/blob/b49fc2eff5666d1215dbf4f532f0e14e65921405/src/allmydata/util/happinessutil.py#L81-L91

Is this the right approach?

Here are my current attempts to make all the tests pass:
https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.3

Currently some tests are still failing; in particular `allmydata.test.test_upload.EncodingParameters.test_exception_messages_during_server_selection`.
Recent progress here:
https://github.com/david415/tahoe-lafs/tree/1382.markberger-rewrite-rebase.4

I fixed a few broken unit tests... at least I thought they were broken, because they failed depending on the value set in PYTHONHASHSEED.

I'm currently stuck on trying to get this unit test to pass:
meejah's latest dev branch https://github.com/meejah/tahoe-lafs/tree/1382.markberger-rewrite-rebase.5
the following tests are failing:
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
allmydata.test.test_checker.BalancingAct.test_good_share_hosts
I have submitted a pull request (https://github.com/tahoe-lafs/tahoe-lafs/pull/402) that is a rebased and squashed version of the latest work, with fixes for the last couple of tests that failed. One was basically "not enough sorting" (so the answers weren't as stable as they could be), and the other was a test that, as far as I can tell, wasn't quite correct.
In 53130c6/trunk: