inappropriate "uncoordinated write error" after handling a server failure #540
Reference: tahoe-lafs/trac-2024-07-25#540
I noticed the automated "speedtest" failing with an unexpected Uncoordinated Write Error for the past few days. There were several issues involved, but the one for this ticket is as follows:
So really the sole publisher is colliding with themselves.
I think the fix would be to have the publisher keep track of which share requests it has sent, perhaps in the servermap (as "pending writes", or "proposed writes"). When the second writev request is generated, it should build a test vector based upon the pending write (so it includes share2).
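A minimal sketch of the "pending writes" idea, assuming a simplified model in which a test vector is just a mapping of share number to the version the publisher expects the server to hold. The class `PendingWriteTracker`, its method names, the server name `"serverA"`, and share number 7 are all hypothetical illustrations; this is not the real servermap or writev wire format.

```python
class PendingWriteTracker:
    """Hypothetical helper: remembers writes that were sent but not yet acknowledged."""

    def __init__(self, observed):
        # observed: {server: {shnum: version}} as last seen by the map update
        self._known = {srv: dict(shares) for srv, shares in observed.items()}
        self._pending = {}  # {server: {shnum: version}} for in-flight writes

    def expected_shares(self, server):
        # What the server should hold once every in-flight write has landed.
        expected = dict(self._known.get(server, {}))
        expected.update(self._pending.get(server, {}))
        return expected

    def build_request(self, server, shnum, new_version):
        # Build the test vector from known state plus pending writes,
        # then record this write as pending.
        test_vector = self.expected_shares(server)
        self._pending.setdefault(server, {})[shnum] = new_version
        return {"server": server, "test": test_vector, "write": {shnum: new_version}}

    def acknowledged(self, server, shnum):
        # Move an acknowledged write from "pending" to "known".
        version = self._pending[server].pop(shnum)
        self._known.setdefault(server, {})[shnum] = version


# New file: the server holds no shares yet.
tracker = PendingWriteTracker({"serverA": {}})
first = tracker.build_request("serverA", 2, "ver1")
# A second share is redirected to the same server before the first write is acknowledged.
second = tracker.build_request("serverA", 7, "ver1")
```

In this sketch the second request's test vector already includes share2, so the publisher would not be surprised by its own earlier write.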
I think the publisher can also hit this for already-existing files too, where the first message says "I think you have sh1=ver1, here is sh1=ver2", and then (because of some other server having an error) it wants to add a second share to that same server, so it sends "I think you have sh1=ver1, here is sh2=ver2", and is surprised when the server says "actually I have sh1=ver2 you numbskull".
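The existing-file case can be reproduced with a toy model. This is only a sketch: `ToyServer` and `test_and_write` are hypothetical names, and the "expected shares must match exactly" check is a simplification of the real test-vector semantics, chosen to mirror the wording of the messages above.

```python
class ToyServer:
    """Hypothetical stand-in for a storage server holding mutable shares."""

    def __init__(self, shares):
        self.shares = dict(shares)  # {shnum: version}

    def test_and_write(self, expected, writes):
        # Apply the write only if the publisher's expectation matches reality.
        if expected != self.shares:
            return False, dict(self.shares)  # "actually I have ..."
        self.shares.update(writes)
        return True, dict(self.shares)


server = ToyServer({1: "ver1"})

# First message: "I think you have sh1=ver1, here is sh1=ver2".
first_ok, _ = server.test_and_write({1: "ver1"}, {1: "ver2"})

# Second message built from the stale map: "I think you have sh1=ver1, here is sh2=ver2".
stale_ok, actual = server.test_and_write({1: "ver1"}, {2: "ver2"})

# Same write, but with a test vector that accounts for the publisher's own first write.
fixed_ok, final = server.test_and_write({1: "ver2"}, {2: "ver2"})
```

Here the stale request is rejected even though no other writer exists, while the request that takes the earlier write into account goes through.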
I think zooko's incident-2009-07-29-104230-vyc6byy.flog.bz2 in ticket #786 is related, but I haven't been able to figure it out exactly. It reports a surprise, but the log event says that their report matches our expectations, which makes me think that the code which logs the event is showing a different "expectation" than the one that was bundled in the testv portion of the share-write request. It feels like two messages being sent at the same time to the same server.
This might be related to #899, newly reported by Kyle Markley and Andrej Falout.
It's really bothering me that mutable file upload and download behavior is so finicky, buggy, inefficient, hard to understand, different from immutable file upload and download behavior, etc. So I'm putting a bunch of tickets into the "1.8" Milestone. I am not, however, at this time, volunteering to work on these tickets, so it might be a mistake to put them into the 1.8 Milestone, but I really hope that someone else will volunteer or that I will decide to do it myself. :-)
I'm almost certain that I'll end up squashing this with MDMF, so I'll assign it to myself.
If you like this ticket, you might like #546 (mutable-file surprise shares raise inappropriate UCWE).
If you like this ticket, you might like #547 (mapupdate(MODE_WRITE) triggers on a false boundary).
Kevan will look at whether his MDMF patches squash this.