unit test failure on cygwin #208

Closed
opened 2007-11-14 14:52:31 +00:00 by zooko · 2 comments

The unit tests on the cygwin buildslave are failing. The waterfall suggests that they started failing on patch changeset:5e974ede20b1bae4, but that makes no sense.

zooko added the unknown, major, defect, 0.7.0 labels 2007-11-14 14:52:31 +00:00
zooko added this to the 0.7.0 milestone 2007-11-14 14:52:31 +00:00

If I recall correctly, this test failure involves starting a node, shutting it
down, waiting for a moment, then starting it back up again. The test is
intended to make sure that the state created by the first incarnation is
usable by the second.

The test will fail if the first node has not finished shutting down by the
time the second node starts, because the node will re-use the TCP port
number, and if the first node is still running, the second node will be
unable to grab the same port.

On Windows, it appears that we do not have a good handle on when the
first node has finished shutting down. I remember putting an arbitrary delay
(2 seconds?) in there to improve the isolation, but obviously that only works
if the machine is not heavily loaded and can complete its shutdown in time.

I recall that the fix wasn't trivial, because I've already put half a day of
effort into fixing this one and wasn't able to figure it out.
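
One alternative to a fixed delay would be to poll until the port can actually be bound again before starting the second node. The following is only a minimal sketch of that idea, not the code used in the test suite: the helper names `port_is_free` and `wait_for_port_free`, the timeout, and the polling interval are all hypothetical, and it assumes that a successful `bind()` is a good enough signal that the first node has released the port.

```python
import socket
import time


def port_is_free(host, port):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No SO_REUSEADDR here: we want the bind to fail while another
        # process (the first node) is still holding the port.
        probe.bind((host, port))
    except OSError:
        return False
    finally:
        probe.close()
    return True


def wait_for_port_free(host, port, timeout=30.0, interval=0.1):
    """Poll until the port can be bound, or give up after `timeout` seconds.

    Returns True if the port became free, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The test would call something like `wait_for_port_free("127.0.0.1", port)` between shutting down the first node and starting the second, and fail with a clear message if it returns False. This still has a small race between the probe and the second node's own bind, so it narrows the window rather than closing it.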

I put in an extra delay on this test. It looks like the delay I put in earlier was in the wrong place. It remains to be seen whether this is a stable fix or not.

zooko added the fixed label 2007-11-19 19:27:23 +00:00
zooko closed this issue 2007-11-19 19:27:23 +00:00
Reference: tahoe-lafs/trac-2024-07-25#208