'tahoe start -m' no longer works #1262
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1262
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
The recent import-and-call-twistd change (changeset:ac3b26ecf29c08cb) unfortunately breaks
the -m/--multiple option, which is what lets me do
./bin/tahoe start -m ../MY-TESTNET/node-*
to spin up a 5-nodelocal test grid in a single command. It also breaks
tahoe restart -m
,which I use even more frequently as I iterate over a fix.
The symptom is that the first node-startup terminates the 'tahoe start'
command, so that none of the subsequent nodes are started. The call to
twistd.run()
is terminating the program, so it never makes it to thesecond pass through the loop.
I really prefer import-and-call over fork-and-exec, so I want to keep it
and find some other way to fix -m. The fix will probably be to make
the -m path use fork() just inside the loop over all target
directories, something like:
One downside is that any errors which occur in twistd.run() (like
ImportErrors
, or problems in tahoe.cfg) will be eaten by thechild, so the overall 'tahoe start' exit code won't reflect the error.
The stderr would still be emitted (although interleaved painfully).
Another potential downside is that I've heard of one OS (freebsd?
OS-X?) which allows fork()-then-exec() but forbids
fork()-then-anything-else, and this would fail badly on such a system.
Attachment start-m.diff (2026 bytes) added
preliminary patch: use fork() to start/restart all nodes but the last
That patch only uses fork() when there are multiple targets, and doesn't use
fork on the last target (i.e. it calls
twistd.run()
inline, as it didbefore). So it ought to behave exactly the same when start/restart is invoked
with just a single target. It seems to work for my testgrid.
Writing a unit test might be a slight nuisance: I think it'd require a fork
or a spawn within the test case. But we're probably already doing that
somewhere, so maybe we've got a good starting point.
test needed! regression!
This is a regression from 1.8.0, and a serious one, but there is no unit test. I will see if I can figure out how to write a unit test for this.
Ran out of time for tonight. If anybody wants to have a crack at unit-testing this patch, that would be great!
os.fork()
is unsupported on Windows. Either we need to:twisted.internet.utils.getProcessOutputAndValue
, which does work on Windows; or-m
is only supported on other platforms (which would still be a regression relative to 1.8.0) and remove that option from the help on Windows.Doc for
twisted.internet.utils
: http://twistedmatrix.com/documents/10.1.0/api/twisted.internet.utils.html . We use it already intest_runner
.I would be reluctant to adopt a patch which makes the functionality different on different platforms.
The 1.8.1 release is past due, and it is blocked by this issue and by #1264.
Perhaps we should go ahead and release 1.8.1 as-is, with the
-m
option broken (and, re #1264, with download performance whenK
is large worse than v1.7).I'm cool with leaving this regression until later.
FWIW, I've been happy with the os.fork() patch locally. Also,
-m
is really just for developers (since normal users should only be running one node per disk, and-m
is really just a shortcut to spin up a whole local testgrid at once). I'd be surprised (but pleased) if anyone other than me had ever used it. So documenting that-m
doesn't work on non-os.fork()
platforms would be fine with me (probably best done by conditionally inserting the argument into theUsage
class depending uponhasattr(os, "fork")
).Incidentally, it was OS-X (10.6 "Snow Leopard") which sometimes complains about fork-without-immediate-exec. See here (in particular the part that says "Warning: When launching separate processes using the fork function, you must always follow a call to fork with a call to exec or a similar function. Applications that depend on the Core Foundation, Cocoa, or Core Data frameworks (either explicitly or implicitly) must make a subsequent call to an exec function or those frameworks may behave improperly."). Also see the Twisted discussion here and maybe the Adium discussion here. Something to do with protecting apps against weird thread behavior.
I think the consensus is that programs launched from the Dock under 10.6 are
not allowed to use most OS-X-specific APIs (Cocoa, Quartz, etc) after a
fork
but before anexec
. Fortunately, Tahoe isn'tlaunched from the Dock (until someone builds a launcher app), and I don't
think it's using any of those OS-X APIs at all.
So I don't think we need to worry about this. I've been testing the patch on
my OS-X 10.6 box without problems.
Attachment start-m-with-test.diff (10761 bytes) added
better patch: includes test, removes -m from --help when unavailable
_node_has_started
->_nodes_have_started
. Otherwise LGTM.Cool, pushed in changeset:f3adb037ae0d22eb. One thing I caught at the last minute: the new test needed a distinct workdir ("test_multiple_clients" instead of the cut-and-paste leftover "test_client"), otherwise the two tests could interfere with each other. (I observed a test_multiple_clients failure at the very end, where it threw an exception while trying to delete a non-existent hotline file: the whole c2/ directory was missing).
Drat, there was a problem with the windows buildslave (
--multiple
was being removed for 'tahoe create-client' too, which can handle it just fine), and the fix was getting messy, so I reverted that patch in changeset:f36bda278014589a.If David-Sarah's work to come up with an os.fork replacement for windows works out, that will make things much easier (the actual '
tahoe start
' code is working fine, it's just the "don't advertise--multiple
on platforms that can't accept it" code that broke).If not, I'll look into refactoring
BasedirMixin
to make this easier to get right. For reference, we can accept--multiple
on all platforms fortahoe create-client
and other create-node commands. We do not accept--multiple
ever fortahoe run
. And we accept--multiple
only on platforms that haveos.fork
(i.e. on non-windows) fortahoe start
andtahoe restart
.For reference, here's one of the windows buildslave errors which happened with changeset:f3adb037ae0d22eb:
Sigh. Alright, Zooko has convinced me that keeping
-m
working (at least fortahoe start -m
) is not worth needing to manage different--help
texts for windows. I've agreed to remove-m
support and write a local shell script or something to do the equivalent function (zooko suggestsfunction tahoestartm { for NODE in ${*} ; do tahoe start $NODE ; done }
).I'll be bitter about it, though :-). I should keep a list of useful features that I've had to remove because they weren't easy to implement under windows, especially when nobody would ever have noticed that they were missing under windows. I'm pretty sure that I'm the only person who has ever used
tahoe start -m
, and it will take me some time to retrain my fingers to type something else.The code will be cleanest if we remove
-m
from everything (did you know it was accepted bytahoe create-client
,tahoe create-keygen
, andtahoe create-stats-gatherer
? yeah, didn't think so :-). I don't think anyone will miss it.I'll put together a patch to remove -m from everything and attach it here.
Attachment remove-m.diff (9680 bytes) added
remove --multiple/-m from all CLI commands, clean up basedir processing
I use the -m option frequently from the command line to start/stop/create-client etc. I can get around it, but I too will briefly mourn its passing.
Looks good! Hooray for less code!
In changeset:69b42c6cb7ec5d4a:
Attachment many (108 bytes) added
shell script to replace "-m": run like "many ./bin/tahoe start ../GRID/node*"