"client node probably started" #71

Closed
opened 2007-07-01 03:55:42 +00:00 by zooko · 25 comments

It would be nice if we could remove the "probably" from that message. How about doing a Foolscap "Hi there" with it? (That was Sam Stoller's suggestion.)

zooko added the code, minor, enhancement, 0.4.0 labels 2007-07-01 03:55:42 +00:00

The "probably" is there because the runner process has no clear way of knowing if the new process dies right away or continues running.

I've got some code in buildbot which watches the logfile and looks for the message that indicates startup has been successful.. perhaps we could snarf it for this purpose.

What do you mean by a 'Foolscap "Hi there"' message?

-Brian

warner added this to the undecided milestone 2007-07-02 19:21:15 +00:00
Author

If the runner process has some positive indication that the new process started up long enough to perform some action (such as writing a message to the log or connecting back to the runner process with Foolscap and saying "Hi there"), then the runner process should inform the user that the process has started, without the "probably".

So, yes, snarfing that code from buildbot would be fine with me.

warner added code-nodeadmin and removed code labels 2007-08-14 18:59:32 +00:00

I'm working on the foolscap approach. I believe it's possible to connect to the node and call get_version, so I'll use that if possible. (I've started by modifying the runner tests to start a node and fail if "probably started" appears in the output.)

nejucomo self-assigned this 2007-09-05 21:31:01 +00:00
Author

This would fit nicely into the theme of v0.6.1: documentation, packaging, user-friendliness, etc.

zooko added 0.6.0 and removed 0.4.0 labels 2007-09-25 04:35:12 +00:00
zooko modified the milestone from undecided to 0.6.1 2007-09-25 04:35:12 +00:00

I'd advise the logfile-scanning approach. Benefits:

  • any exceptions or warnings which occur during startup are displayed to the admin
    who is starting the node, at exactly the time and place they need to see it
  • it displays exceptions even if foolscap fails to work (i.e. if pyopenssl isn't
    installed). Logfile writing is the only requirement

Downsides:

  • it generally requires forking off a process, which is problematic under windows.
    I think I have a good-enough solution for this in buildbot, but I think it involves
    limited functionality
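
As a rough illustration of the logfile-scanning approach above, here is a minimal sketch (it is not the buildbot code). It polls a log file until a success marker or a failure marker appears, or until a timeout expires. The function name, the marker strings, and the timeout values are all assumptions made for this example; the messages a real node writes to its log may differ.

```python
import time


def wait_for_startup(log_path, success_marker, failure_marker,
                     timeout=10.0, poll_interval=0.1):
    """Poll log_path and return (status, log_text).

    status is "started", "failed", or "timeout". The markers are
    hypothetical strings chosen by the caller, not real node messages.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Startup logs are small, so re-reading the whole file is
            # simpler than tracking offsets and partial lines.
            with open(log_path, "r", errors="replace") as f:
                text = f.read()
        except FileNotFoundError:
            # The node may not have created its log yet.
            text = ""
        if failure_marker in text:
            return "failed", text
        if success_marker in text:
            return "started", text
        if time.monotonic() >= deadline:
            return "timeout", text
        time.sleep(poll_interval)
```

On "failed" or "timeout" the runner could print the returned log text, which would show startup exceptions to the admin at the moment they start the node.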
Author

Bumping to v0.7 milestone.

Nejucomo: if you aren't planning to fix this ticket, would you please take your name off the "assigned" field?

zooko modified the milestone from 0.6.1 to 0.7.0 2007-10-13 22:37:00 +00:00

I built a prototype of this, watching twistd.log until the introducer has been contacted. I suspect it will have interactions with windows though (forking), and it probably breaks the 'start -m' (multiple nodes) functionality.

I plan to make it work better once I've gotten more progress down on #197.

nejucomo was unassigned by warner 2007-10-31 07:45:38 +00:00
warner self-assigned this 2007-10-31 07:45:38 +00:00

Attachment prototype.diff (10798 bytes) added

prototype implementation

zooko added 0.7.0 and removed 0.6.0 labels 2007-11-13 18:17:41 +00:00
zooko added this to the undecided milestone 2008-01-23 02:46:49 +00:00
Author

I forget exactly how many people I have watched going through the Tahoe install and launch process. About half a dozen. Every single one has exclaimed at "Client node *probably* started." I just watched another person do it, and they too exclaimed in exactly the same way, so let's say it's seven out of seven.
guest commented 2009-01-13 05:03:39 +00:00
Owner

I will add another vote that "probably" is not a very reassuring word choice. While things seem to be working, I still am unclear as to why I've been told that tahoe has only "probably started"


Incidentally, I just learned that modern twistd can be run as a library. See http://divmod.org/trac/browser/trunk/Axiom/axiom/scripts/axiomatic.py for an example. This would make it easier to avoid the extra subprocess, and might make it easier to provide a more confident answer to this ticket.

In general, if we can instantiate the Client before the fork, then the parent process can be sure that:

  1. the child was able to load all the correct Tahoe code, and import all the dependencies
  2. the tahoe.cfg file was well-formed and none of its values caused immediate problems

To feel confident that the Client actually got started, we'll need to establish some form of communication between the "tahoe start" parent and the actual node process, whether that means tailing the logfile or connecting to the control.furl .
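
One way to picture the parent/child communication described above is a status pipe: the parent forks, the child reports the outcome of its setup over the pipe, and the parent can then print a definite answer. This is a Unix-only sketch (it relies on os.fork). It is not code from Tahoe or Twisted, and the names start_with_status_pipe, setup, and run are hypothetical. A real daemon would also detach from the terminal, which is omitted here.

```python
import os


def start_with_status_pipe(setup, run):
    """Fork a child and return (pid, message) once it reports its status.

    setup and run are hypothetical callables: setup() stands in for
    instantiating the node, run() for the long-running main loop.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the result of setup() before doing any real work.
        os.close(read_fd)
        try:
            setup()
        except Exception as e:
            os.write(write_fd, ("failed: %s\n" % (e,)).encode("utf-8"))
            os._exit(1)
        os.write(write_fd, b"started\n")
        os.close(write_fd)
        try:
            run()
            os._exit(0)
        except BaseException:
            # Never fall back into the parent's code path.
            os._exit(1)
    # Parent: block until the child writes one line or closes the pipe.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as status:
        message = status.readline().decode("utf-8").strip()
    return pid, message
```

An empty message means the child died before reporting anything. With the returned pid and message, the parent could print either a plain "started" line (including the pid) or the failure text relayed from the child.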

tahoe-lafs modified the milestone from eventually to 1.7.0 2010-02-02 03:15:29 +00:00
Author

Jeremy Visser has packaged Tahoe-LAFS v1.6.1 for Ubuntu Lucid. He tried to test his package by following these instructions: http://allmydata.org/source/tahoe-lafs/trunk/docs/running.html but he got stuck and gave up on testing it (until I reminded him to try again). So I asked why he had given up:

<jayvee> I'm reading that, and not getting very far
<zooko> Why not?
<zooko> Sounds like I need to file a bug report on that page. :_)
<jayvee> oh, just the feedback I'm getting is not very descriptive
<jayvee> "introducer node probably started"
<jayvee> I'm basically expecting to see "tahoe successfully started, browse to
	 this_url to view contents"
<jayvee> but maybe I'm just a simpleton
<jayvee> 'tahoe run' blocks with no feedback. I presume that's intentional (no
	 news is good news), but a little disconcerting to someone who has
	 never used it before.
Author
<jayvee> I ran 'tahoe start .' and 'tahoe run', and yet nothing is listening
	 on port 3456.
<jayvee> the documentation (using.html) says that should be the case
Author
<jayvee> zooko, as a point of comparison, this is what upstart gives me
<jayvee> $ sudo start mythtv-backend
<jayvee> mythtv-backend start/running, process 17060
<jayvee> much more satisfying. even printing the PID makes me much more
	 confident.
tahoe-lafs added major and removed minor labels 2010-03-09 19:39:17 +00:00
Author

It looks like http://twistedmatrix.com/trac/ticket/823 would solve this ticket with its --wait option.

See also #602 which is about "probably not started" not being sufficiently detailed and #529 which is about detecting problems on startup and failing loudly instead of quietly, and #371 which is about a common problem on startup.

zooko modified the milestone from 1.7.0 to soon 2010-06-04 07:50:11 +00:00
davidsarah commented 2010-07-22 05:24:53 +00:00
Owner

We could just adapt the approach suggested in the twisted ticket (see the patch http://twistedmatrix.com/trac/changeset/23471 for an implementation) rather than waiting for twisted to adopt it. That would also allow us to receive arbitrary messages from the child process and print them, addressing /tahoe-lafs/trac-2024-07-25/issues/5130#comment:53 for example.
Author

It would be nice to contribute to Twisted. We could either do so directly, by contributing patches and code review for Twisted #823 and then waiting for it to be deployed and then using it in Tahoe-LAFS, or at least we could work on a patch within Tahoe-LAFS but be sure to carefully cross-link it with the relevant Twisted tickets and to try to get a similar patch committed to Twisted.

I think this is resolved by changeset:ac3b26ecf29c08cb .. anyone want to confirm?

Author

I ran tahoe start and it didn't print out any uncertainty-inducing messages:

Zooko-Ofsimplegeos-MacBook-Pro:~ pubvolgrid$ tahoe start
STARTING '/Users/pubvolgrid/.tahoe'

Hm, news-needed.

zooko added the fixed label 2010-11-20 06:32:28 +00:00
zooko modified the milestone from soon to 1.8.1 2010-11-20 06:32:28 +00:00
zooko closed this issue 2010-11-20 06:32:28 +00:00
Author

Hey does this mean that we can start running all these tests on cygwin and/or windows now:

[test_runner.py]source:trunk/src/allmydata/test/test_runner.py?annotate=blame&rev=4800#L38

It looks like both of these conditions which force tests to be skipped are now irrelevant and all tests should be runnable, but I'm not sure.

zooko removed the fixed label 2010-11-20 06:53:35 +00:00
zooko reopened this issue 2010-11-20 06:53:35 +00:00
davidsarah commented 2010-11-20 21:16:58 +00:00
Owner

Replying to zooko:

> Hey does this mean that we can start running all these tests on cygwin and/or windows now:
>
> [test_runner.py]source:trunk/src/allmydata/test/test_runner.py?annotate=blame&rev=4800#L38

Possibly, I will investigate that.

davidsarah commented 2010-11-20 22:35:04 +00:00
Owner

Replying to zooko:

> Hey does this mean that we can start running all these tests on cygwin and/or windows now: [test_runner.py]source:trunk/src/allmydata/test/test_runner.py?annotate=blame&rev=4800#L38

Apparently not.

The cygwin part of this is #908, and is due to a bug in twisted.internet.utils on cygwin that apparently causes it to hang. (I haven't tested it with recent cygwin, but it wouldn't have been affected by changeset:ac3b26ecf29c08cb.)

For native Windows, we currently skip the test_runner.RunNode tests because of #27 (twistd doesn't daemonize on windows). That is, tahoe start behaves like tahoe run on Windows, which is too different for the tests to work. It looks non-trivial to make them work without fixing either #27 or #1121 (test 'tahoe run').

Author

Thanks for investigating!

Thanks for investigating!
zooko added the fixed label 2010-11-20 23:16:59 +00:00
zooko closed this issue 2010-11-20 23:16:59 +00:00
davidsarah commented 2010-11-21 02:07:11 +00:00
Owner

Replying to [davidsarah]comment:31:

> [...] That is, tahoe start behaves like tahoe run on Windows, [...]

More precisely: tahoe start now behaves like tahoe run. Prior to changeset:ac3b26ecf29c08cb, it [used os.system]source:src/allmydata/scripts/startstop_node.py@4641#L96 to run twistd, which put the node in a different process to the tahoe command, although that process did not then daemonize. Since changeset:ac3b26ecf29c08cb, it runs the node in the same process as the tahoe command. Hmm, is that a regression?

(See http://twistedmatrix.com/trac/browser/trunk/twisted/scripts/_twistw.py for the twistd code on Windows and http://twistedmatrix.com/trac/browser/trunk/twisted/scripts/_twistd_unix.py for Unix.)
Author

Replying to [davidsarah]comment:34:

> More precisely: tahoe start now behaves like tahoe run. Prior to changeset:ac3b26ecf29c08cb, it [used os.system]source:src/allmydata/scripts/startstop_node.py@4641#L96 to run twistd, which put the node in a different process to the tahoe command, although that process did not then daemonize. Since changeset:ac3b26ecf29c08cb, it runs the node in the same process as the tahoe command. Hmm, is that a regression?

I don't think anybody benefited from or cared about the fact that it used to run it in a separate process. It just made it harder to kill it on Windows.

Reference: tahoe-lafs/trac-2024-07-25#71