show full, explorable details about check and repair operations #1821

Open
opened 2012-10-04 15:39:41 +00:00 by zooko · 3 comments
On Mon, Jul 9, 2012 at 10:39 AM, Brad Rupp <bradrupp@gmail.com> wrote:
>
> The output from repair #1:
>
> repair successful
> done: 11801 objects checked
>  pre-repair: 11725 healthy, 76 unhealthy
>  76 repairs attempted, 76 successful, 0 failed
>  post-repair: 11801 healthy, 0 unhealthy
>
> The output from repair #2:
>
> done: 11801 objects checked
>  pre-repair: 11789 healthy, 12 unhealthy
>  12 repairs attempted, 11 successful, 1 failed
>  post-repair: 11800 healthy, 1 unhealthy
>
> As you can see, the first repair found and fixed 76 unhealthy objects. The
> second repair, approximately 12 hours later, found 12 unhealthy objects and
> fixed 11 of them.
>
> Why would the second repair find 12 unhealthy objects?  I would have
> expected it to find 0 unhealthy objects given that the first repair was
> performed only 12 hours earlier.

Wouldn't it be great if the text that said "12 repairs attempted, 11 successful, 1 failed" had hyperlinks to web pages that listed all of the repair attempts, where you could see which file was not healthy, which servers the repair job attempted to use to repair the file, and what happened with each server that led to success or failure?

Providing such a web page would mostly just be a matter of "web programming" -- generating HTML that shows the contents of the Python objects in memory which contain that data.

See [//pipermail/tahoe-dev/2012-July/007544.html this thread on the tahoe-dev list].
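As a rough illustration of the "web programming" involved, here is a minimal sketch that turns in-memory repair records into an HTML page. The `RepairAttempt` and `ServerOutcome` classes and the `render_repair_report` function are hypothetical names invented for this example; they are not Tahoe-LAFS's actual checker or repairer objects, whose structure would need to be consulted before writing the real page.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List


@dataclass
class ServerOutcome:
    """Hypothetical record of what happened with one server during a repair."""
    server_id: str
    succeeded: bool
    detail: str = ""


@dataclass
class RepairAttempt:
    """Hypothetical record of one repair attempt on one file."""
    path: str
    successful: bool
    servers: List[ServerOutcome] = field(default_factory=list)


def render_repair_report(attempts):
    """Return an HTML fragment listing every repair attempt and its servers."""
    ok = sum(1 for a in attempts if a.successful)
    failed = len(attempts) - ok
    lines = [
        "<h1>Repair attempts</h1>",
        f"<p>{len(attempts)} repairs attempted, {ok} successful, {failed} failed</p>",
        "<ul>",
    ]
    for attempt in attempts:
        status = "successful" if attempt.successful else "failed"
        # Escape everything that comes from stored data before embedding it.
        lines.append(f"<li>{escape(attempt.path)}: {status}<ul>")
        for server in attempt.servers:
            outcome = "ok" if server.succeeded else "error"
            detail = f" ({escape(server.detail)})" if server.detail else ""
            lines.append(f"<li>{escape(server.server_id)}: {outcome}{detail}</li>")
        lines.append("</ul></li>")
    lines.append("</ul>")
    return "\n".join(lines)
```

The summary line would then be a natural place to link from the existing "N repairs attempted" text, with each list entry showing the per-server results for one file.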

zooko added the unknown, normal, defect, 1.9.2 labels 2012-10-04 15:39:41 +00:00
zooko added this to the undecided milestone 2012-10-04 15:39:41 +00:00
zooko added code-frontend-web, enhancement and removed unknown, defect labels 2012-10-04 15:40:59 +00:00
davidsarah commented 2012-10-11 04:31:19 +00:00
Owner

I think this is a good idea.

tahoe-lafs modified the milestone from undecided to eventually 2012-10-11 04:31:19 +00:00
Author

related tickets: #1596, #1116, #2101, #2130

daira commented 2014-12-11 23:26:39 +00:00
Owner

#2130 was a duplicate. Its description was:

In today's Weekly Dev Chat, nejucomo said that in addition to synthetic metrics like "recoverable, healthy, happy, and needs-rebalancing", he wants to see the complete list of which servers are holding which shares. That sounds like a great idea! To close this ticket, make it so that checker results contain that information.

related tickets: #1821, #1596, #1116

especially related ticket: #2101, which is the same as this ticket except that #2101 is about presenting this information in an error message, while this ticket is about presenting it in checker results.
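To make the request concrete, here is a small sketch of how a "which servers hold which shares" listing could be derived, assuming the checker results expose a mapping from share number to the servers holding that share. The `shares_by_server` and `format_share_listing` helpers and the shape of the input mapping are assumptions made for this example, not a description of the existing checker-results API.

```python
from collections import defaultdict


def shares_by_server(sharemap):
    """Invert {share number: [server ids]} into {server id: sorted share numbers}."""
    result = defaultdict(set)
    for shnum, servers in sharemap.items():
        for server in servers:
            result[server].add(shnum)
    # Sort servers and share numbers so the listing is stable between runs.
    return {server: sorted(shnums) for server, shnums in sorted(result.items())}


def format_share_listing(sharemap):
    """Return one line per server naming the shares it holds."""
    return "\n".join(
        f"{server}: shares {', '.join(str(n) for n in shnums)}"
        for server, shnums in shares_by_server(sharemap).items()
    )
```

A listing like this would sit alongside the synthetic metrics rather than replace them, so that a reader can see the raw placement behind a "healthy" or "needs-rebalancing" verdict.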

Reference: tahoe-lafs/trac-2024-07-25#1821