client feedback channel #484
It would be nice if clients had a way to report errors and performance results to a central gatherer. This would be configured by dropping a "client-feedback.furl" file into the client's basedir. The client would then use that FURL to send error and performance information to the gatherer.
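Since Tahoe already speaks Foolscap, the client side could be as small as the sketch below. The "client-feedback.furl" filename comes from this ticket; everything about the gatherer's remote interface (including the "report" method name in the usage comment) is an assumption, since no interface is specified here.

```python
# Minimal sketch, assuming a Foolscap gatherer. The remote interface is not
# specified by this ticket, so the usage shown at the bottom is hypothetical.
import os
from foolscap.api import Tub

def maybe_connect_feedback_gatherer(basedir, tub):
    """If client-feedback.furl exists in basedir, return a Deferred that
    fires with a RemoteReference to the gatherer; otherwise return None."""
    furl_path = os.path.join(basedir, "client-feedback.furl")
    if not os.path.exists(furl_path):
        return None  # feature stays off unless the operator drops the file in
    with open(furl_path) as f:
        furl = f.read().strip()
    return tub.getReference(furl)  # Deferred[RemoteReference]

# Usage inside the client's existing Tub/service setup (the "report"
# method name is made up for illustration):
#   d = maybe_connect_feedback_gatherer(self.basedir, self.tub)
#   if d is not None:
#       d.addCallback(lambda rref: rref.callRemote("report", incident))
```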
The issue is that, since Tahoe is such a resilient system, there are a lot of failure modes that wouldn't be visible to users. If a client reads a share and sees the wrong hash, it just uses a different one, and users don't see any problems unless there are many simultaneous failures. However, from the grid-admin/server point of view, a bad hash is a massively unlikely event and indicates serious problems: a disk is failing, a server has bad RAM, a file is corrupted, etc. The server/grid admin wants to know about these, even though the user does not.
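To make that concrete, a corruption report sent through this channel might carry something like the record below. Every field name is illustrative only, not part of any existing Tahoe-LAFS interface.

```python
# Hypothetical shape of one corruption incident; all field names are
# illustrative, not an existing Tahoe-LAFS interface.
incident = {
    "type": "corrupt-share",
    "storage_index": "...",  # which file/directory object was affected
    "server_id": "...",      # which server handed back the bad share
    "shnum": 3,              # which share number failed its hash check
    "when": 1212345678,      # client-side timestamp, seconds since epoch
}
```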
Similarly, there are a number of grid tuning issues that are best addressed by learning about the client experience and watching it change over time. When you add new servers to a grid, clients who ask all servers for their shares will take longer to do peer selection. How much longer? The best way to find out is to have clients report their peer-selection time to a gatherer on each operation, and then the gatherer can graph them over time. The grid admins might want to make their servers faster if it would improve user experience, but they need to find out what the user experience is before they can make that decision.
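Collecting the timing could be as simple as wrapping the Deferred that peer selection already returns, as in this sketch. The report_sample() helper is hypothetical: it would just queue one measurement for later delivery.

```python
# Sketch: wrap the Deferred returned by peer selection and queue a timing
# sample. report_sample() is a hypothetical queue-one-measurement helper.
import time

def timed_peer_selection(selection_deferred, report_sample):
    started = time.time()
    def _record(result):
        report_sample({"op": "peer-selection",
                       "elapsed": time.time() - started,  # seconds
                       "when": started})
        return result  # pass the selection result through unchanged
    selection_deferred.addCallback(_record)
    return selection_deferred
```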
We might want to make these two separate reporting channels (errors and performance), with separate FURLs. Also, we can batch the reporting of the performance numbers: we don't have to report every single operation. We could cut down on active network connections by only trying to connect once a day, dumping queued incidents if and when we establish a connection. We need to keep several issues in mind: the thundering herd, overloading the gatherer, and bounding the queue size.
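One way to address all three at once is a bounded in-memory queue that flushes once a day, with a random start-up offset so clients don't all reconnect at the same moment. A sketch follows; the constants and the "report_batch" method name are assumptions, not anything Tahoe currently defines.

```python
# Sketch of batched, bounded, jittered reporting. The constants and the
# remote method name "report_batch" are assumptions.
import random
from collections import deque
from twisted.internet import reactor, task

MAX_QUEUE = 1000            # bound memory: oldest samples fall off the end
FLUSH_INTERVAL = 24 * 3600  # connect to the gatherer roughly once a day
JITTER = 3600               # spread the first connection over an hour

class FeedbackQueue:
    def __init__(self, get_gatherer):
        # get_gatherer() should return a Deferred[RemoteReference] or None
        self._get_gatherer = get_gatherer
        self._queue = deque(maxlen=MAX_QUEUE)
        self._loop = None

    def add(self, sample):
        self._queue.append(sample)  # silently drops the oldest entry when full

    def start(self):
        # random offset so all clients don't hit the gatherer simultaneously
        reactor.callLater(random.uniform(0, JITTER), self._start_loop)

    def _start_loop(self):
        self._loop = task.LoopingCall(self._flush)
        self._loop.start(FLUSH_INTERVAL, now=True)

    def _flush(self):
        if not self._queue:
            return
        d = self._get_gatherer()
        if d is None:
            return  # no client-feedback.furl configured
        # this sketch drops a batch whose delivery fails, which also keeps
        # memory bounded; a fuller implementation might retry on the next flush
        batch = list(self._queue)
        self._queue.clear()
        d.addCallback(lambda rref: rref.callRemote("report_batch", batch))
        d.addErrback(lambda f: None)  # gatherer unreachable: give up quietly
```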
Does [the log-gatherer](source:trunk/docs/logging.rst?rev=861892983369c0e96dc1e73420c1d9609724d752#log-gatherer) satisfy this ticket?