connection lost during "tahoe backup" #782
Reference: tahoe-lafs/trac-2024-07-25#782
Andrej Falout reported this to tahoe-dev.
Andrej: could you please look for "incident report files" that were created around the time of the problem, in your $TAHOEBASEDIR/logs/incidents directory? If there is an incident report file created at about the same time as the (first) failure you encountered, please attach it to this ticket. Thanks!
andrej: the allmydata.com servers have occasionally been full and rejecting new uploads. This may have caused your problem. Did you look for incident report files? Does this problem still occur? Thanks.
andrej sent me this note in private email:
"The issue is caused in the large majority of cases by Tahoe's poor resistance to concurrent traffic; put simply, if I have a p2p client running with more than a few hundred open connections, Tahoe starts losing connections. When I stop p2p, Tahoe immediately starts working again.
Please note that this is not a bad-router kind of issue; I tested that extensively while debugging another issue. Nor is it a saturated connection: there is plenty of headroom left, and no other network app I use exhibits this kind of sensitivity. It simply looks like Tahoe wants a response NOW, and if it does not get it NOW, it just gives up.
I'd suspect that more tolerant timeouts plus connection retry handling would go a long way toward fixing this."
In response to Zooko's comments:
"I don't see how your theory can fit with my mental model of the Tahoe-LAFS network code. Maybe if you turn on some extra logging and then stimulate it to fail and then post the logs then I can figure it out."
I can confirm without any uncertainty that running a P2P app with a large number of connections kills Tahoe. I even scripted this into my backup scripts so that all P2P traffic is stopped while Tahoe is running. Light P2P (5 files/500 connections or so) is OK, but anything significantly over this is a killer.
Now, whether this means something can or even should be changed in Tahoe is another matter entirely.
I would argue that for an application that is supposed to transfer a large amount of data over a long period of time, the ability to recover from any sort of network interruption is paramount.
I would even go so far as not to allow Tahoe to quit for this reason at all, instead preferring it to retry the action indefinitely, until it either completes the requested operation or the user interrupts it.
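To make the "retry indefinitely" suggestion concrete, here is a minimal sketch of that behaviour in Python. It is not Tahoe-LAFS code and does not reflect how Tahoe handles connections internally: `retry_until_done`, `is_transient`, and the delay parameters are hypothetical names chosen for illustration. It assumes the caller can wrap the operation in a callable and can tell transient failures (such as a dropped connection) from permanent ones.

```python
import random
import time


def retry_until_done(operation, is_transient, base_delay=1.0, max_delay=300.0,
                     sleep=time.sleep):
    """Call operation() until it succeeds, retrying transient failures forever.

    Hypothetical helper, not part of Tahoe-LAFS.
    operation    -- zero-argument callable that performs the work
    is_transient -- callable taking the exception, returns True if a retry makes sense
    sleep        -- injectable so the wait can be replaced in tests
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception as exc:
            # KeyboardInterrupt is not an Exception subclass, so Ctrl-C
            # still stops the loop, matching "until the user interrupts it".
            if not is_transient(exc):
                raise
            # Exponential backoff, capped at max_delay; the exponent is
            # bounded so the arithmetic stays well-behaved on long runs.
            delay = min(max_delay, base_delay * (2 ** min(attempt, 20)))
            # Jitter keeps many clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1
```

A wrapper script could pass a callable that runs the backup and raises `ConnectionError` when the connection is lost, with `is_transient=lambda e: isinstance(e, ConnectionError)`. Whether retrying forever is the right default, rather than giving up after some bound, is the design question raised above.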