Exception: <class 'allmydata.mutable.common.UncoordinatedWriteError'> when trying to create directory on testgrid #748
Reference: tahoe-lafs/trac-2024-07-25#748
In a random attempt to see what the new CSS wui looks like, I connected to testgrid.allmydata.com:3567 just now.
I clicked the "Create a directory" button and was presented with the exception listed above. I'm afraid I have no further information to provide, but the error is reproducible. The actual HTML source returned is the following:
Please look in your ~/.tahoe/logs/incidents directory and attach any incident report files that occurred around this time.
I don't understand. This is not on a Tahoe instance running on my machine. It is on the testgrid at testgrid.allmydata.com. Could you visit http://testgrid.allmydata.com:3567 and see if you can reproduce this?
Yes, I can. Thanks for the bug report!
Hm, I guess we should look into this.
Soultcer just reported the same thing on IRC.
Okay, so this could be caused by one or more of the following known issues: #651 (handle MemoryError by failing quickly and loudly; I added a note about this ticket on that ticket just now), #548 (mutable publish sends queries to servers that have already been asked), #547 (mapupdate(MODE_WRITE) triggers on a false boundary), #546 (mutable-file surprise shares raise inappropriate UCWE), #540 (inappropriate "uncoordinated write error" after handling a server failure). I will investigate more in the evening after work and next weekend.
Attached is the foolscap logtool incident report. The critical excerpt of that report looks like this:
Hm, so far I don't understand how the client was surprised by this chain of events.
But, now it is time to go to work so I'll wonder about it later.
Attachment incident-2009-07-14-053247-3x6tsnq.flog.bz2 (76120 bytes) added
Since zooko found a MemoryError on one of the testgrid servers, this is most likely another instance of #540.
I upgraded and restarted the tahoebs5 storage servers, but I still get a UCWE from testgrid.allmydata.org:3456.
Attached are two incident reports:
Attachment incident-2009-07-15-063507-jljozha.flog.bz2 (74310 bytes) added
Attachment incident-2009-07-15-063511-xyeh2hy.flog.bz2 (73453 bytes) added
Terrell Russell pointed out that http://testgrid.allmydata.org:3567 currently says:
Perhaps this issue (ticket #748) is caused by #653.
Grr. My blog broke. My blog is hosted on the Test Grid. That was really the last straw! I don't mind if everyone else who uses the Test Grid is inconvenienced by it being impossible to create directories, but I'm not going to leave my blog broken! So I upgraded the web gateway from:
Now instead of
Connected to 350 of 61 known storage servers
it says
Connected to 14 of 24 known storage servers
and my blog works nicely. Also, people can create directories again.
This doesn't mean that this issue is resolved for the v1.5 release, though. We still need to investigate.
Okay, we investigated (details posted on #653), and I'm pretty sure that this was caused by #653, and I'm pretty sure that #653 has been fixed. Closing this as "fixed".