directory isn't rendered at all sometimes #463
Justin wasn't connected to the introducer or to any servers, and when he looked at a directory, the boilerplate at the top rendered, but no directory contents ever appeared -- it just waited indefinitely. Brian said he thinks that if there are no storage servers at all, then instead of giving an error about failing to download the SSK, it hangs.
Just now I saw the same thing. It looked like I did have many servers connected (on the Test Grid), but I wasn't sure if that welcome page with the stats was stale -- it had been loaded earlier, when I was connected to a different wireless network. I reloaded the status page and it showed the same status (as far as I noticed), and then I reloaded the directory and it loaded normally.
This just happened to me again, and reloading the directory, even after the storage servers are connected, doesn't help -- it still fails to render the directory contents in the same way. Restarting the tahoe node and waiting until the servers are connected before loading the directory causes it to load normally.
This just happened again. Even though the node had been running for a long time and had many storage servers connected, the fact that I attempted to load the directory earlier, when too few servers were connected, appears to prevent it from ever loading until I restart my node. I guess this could have to do with our caching of the DirNode object.
Hm, we keep the dirnode object around, but we don't really cache the results of the read (each time you do dirnode.read(), it will contact all the servers again).
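A minimal sketch of that distinction, with illustrative names (NodeMaker, create_from_cap, retrieve_from_grid are stand-ins, not necessarily Tahoe's actual internals): the node object is cached per cap, but read() always goes back to the grid.

```python
# Illustrative sketch only -- names and structure are hypothetical.
class DirectoryNode:
    def __init__(self, cap):
        self.cap = cap

    def read(self):
        # No memoization: every read() performs a fresh retrieval,
        # contacting the storage servers again.
        return retrieve_from_grid(self.cap)

def retrieve_from_grid(cap):
    # Stand-in for the real network retrieval.
    print("contacting storage servers for", cap)
    return {}

class NodeMaker:
    """Caches node *objects* per cap, not the data those objects read."""
    def __init__(self):
        self._cache = {}

    def create_from_cap(self, cap):
        if cap not in self._cache:
            self._cache[cap] = DirectoryNode(cap)
        return self._cache[cap]

maker = NodeMaker()
node = maker.create_from_cap("URI:DIR2:example")
node.read()  # contacts the servers
node.read()  # contacts them again: results are not cached
```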
Is it fairly reproducible? I'll see if I can trigger it under closer observation, maybe by starting a node on my laptop with the network disconnected, trying (and failing) to read the directory, then connecting the network, allowing servers to connect, and trying to read the directory again.
Ok, so I am able to reproduce this locally. The second read fails because of our serialization strategy: the second read is not allowed to proceed until the first has finished, and the first one never finishes. Interrupting the GET doesn't cause the read to stop (it probably should, but the API doesn't lend itself to that).
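To illustrate the hazard, here is a toy version of that serialization pattern using Twisted's DeferredLock (a stand-in for whatever mechanism the mutable-file code actually uses): because operations are queued behind one another, a first read whose Deferred never fires blocks every later read forever.

```python
from twisted.internet import defer

lock = defer.DeferredLock()  # serializes operations on one mutable file

def first_read():
    return defer.Deferred()  # never fires: no servers ever answer

def second_read():
    return defer.succeed("directory contents")

lock.run(first_read)        # acquires the lock and never releases it
d = lock.run(second_read)   # queued behind the hung read
d.addCallback(lambda res: print("second read finished:", res))
# Nothing is ever printed: the second read can never start.
```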
I'll look more closely at what happens when there are no servers to be asked, that case is probably not handled correctly.
Yup, it was never entering the state machine: the operation would just hang forever. Fixed (by changeset:91c7e0f6897827fe), although the new behavior is to emit a "no recoverable versions" error message, whereas if we aren't connected to any servers it might be more useful to say something like "I'm not connected to any servers".
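The shape of the fix, as a hedged sketch (the actual code in changeset:91c7e0f6897827fe differs; all names here are illustrative): when the set of reachable servers is empty, errback immediately instead of silently never starting the state machine.

```python
from twisted.internet import defer

class NoRecoverableVersionsError(Exception):
    pass

def start_retrieve(servermap):
    # servermap: reachable servers -> their announced shares (toy model).
    if not servermap:
        # Before the fix this case never entered the state machine, so
        # the caller's Deferred never fired.  Failing immediately turns
        # the silent hang into a reportable error.
        return defer.fail(NoRecoverableVersionsError(
            "no recoverable versions -- are any storage servers connected?"))
    return defer.succeed("directory contents")
```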
leaving this open for a while longer, because it needs a unit test
changeset:2074c92dd13abb23 adds the unit tests. They aren't exactly on the same situation as Justin saw (a webapi GET of a dirnode while no servers are connected), but they should cover the same underlying problems.
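For flavor, a trial-style sketch of the kind of test this implies (the real tests in changeset:2074c92dd13abb23 live in Tahoe's test suite and are more thorough; everything here is illustrative):

```python
from twisted.trial import unittest
from twisted.internet import defer

class NoRecoverableVersionsError(Exception):
    pass

def start_retrieve(servermap):
    # Same toy retrieve as in the sketch above.
    if not servermap:
        return defer.fail(NoRecoverableVersionsError("no recoverable versions"))
    return defer.succeed("directory contents")

class RetrieveWithNoServers(unittest.TestCase):
    def test_no_servers_fails_promptly(self):
        d = start_retrieve({})  # empty servermap: no connected servers
        # The read must errback instead of hanging forever.
        return self.assertFailure(d, NoRecoverableVersionsError)
```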
I think it's safe to close this one now.