use leasedb (not crawler) to figure out how many shares you have and how many bytes #1836
Reference: tahoe-lafs/trac-2024-07-25#1836
In current trunk, there is a "BucketCountingCrawler" whose job it is to count up how many shares are stored.
I propose that this be replaced by using the leasedb to count files (a simple SQL COUNT query!), and at the same time that the storage server be extended to report the aggregate size of the stored shares as well as their number.
This is part of an "overarching ticket" to eliminate most uses of crawler — ticket #1834.
The part about reporting total space usage would be very useful for customers of LeastAuthority.com (who pay per byte), among others.
+1.
Title changed from "stop crawling share files in order to figure out how many shares you have" to "use leasedb (not crawler) to figure out how many shares you have and how many bytes".
Using leasedb this way would facilitate solving #671 (bring back sizelimit, i.e. max consumed, not min free).
Using leasedb this way would facilitate solving #940.
The most basic form of the 'total used space' query is
How much account-specific information should we add? At the moment, there are only two accounts -- anonymous and starter -- but that is already enough to introduce the complication that more than one account can hold a lease on the same share, so the query above is not equivalent to
since that can count space for a share more than once.
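To make the double-counting concern concrete, here is a minimal sketch using Python's sqlite3 module. The table and column names (shares, leases, storage_index, shnum, used_space, account_id) and the numeric account ids are assumptions made for illustration; the real leasedb schema may differ.

```python
import sqlite3

# Hypothetical, simplified schema; the real leasedb tables may differ.
SCHEMA = """
CREATE TABLE shares (
    storage_index TEXT NOT NULL,
    shnum INTEGER NOT NULL,
    used_space INTEGER NOT NULL,
    PRIMARY KEY (storage_index, shnum)
);
CREATE TABLE leases (
    storage_index TEXT NOT NULL,
    shnum INTEGER NOT NULL,
    account_id INTEGER NOT NULL,
    PRIMARY KEY (storage_index, shnum, account_id)
);
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO shares VALUES (?, ?, ?)",
                     [("si1", 0, 1000), ("si2", 0, 500)])
    # Share ("si1", 0) is leased by two accounts (ids 0 and 1 are made up).
    conn.executemany("INSERT INTO leases VALUES (?, ?, ?)",
                     [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)])
    return conn

def total_used_space(conn):
    # Sum over the shares table alone: each share is counted exactly once.
    return conn.execute("SELECT SUM(used_space) FROM shares").fetchone()[0]

def naive_leased_space(conn):
    # Joining against leases yields one row per (share, lease) pair,
    # so a share with two leases contributes its size twice.
    return conn.execute(
        "SELECT SUM(s.used_space) FROM shares s JOIN leases l "
        "ON s.storage_index = l.storage_index AND s.shnum = l.shnum"
    ).fetchone()[0]
```

With this sample data the plain sum is 1500 bytes, while the joined sum is 2500 bytes because the 1000-byte share has two leases.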
This query solves the above problem, giving the total number of leased shares and the total space used by leased shares:
(Any WHERE clause can be added to the inner SELECT to pick leases that satisfy certain criteria.)
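One way to write such a de-duplicating query is sketched below with Python's sqlite3 module. This is not necessarily the query from the original comment: the schema, the helper name leased_totals, and the account ids are assumptions, and the inner SELECT uses DISTINCT to collapse multiple leases on the same share.

```python
import sqlite3

def make_db():
    # Hypothetical, simplified schema; the real leasedb tables may differ.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shares (storage_index TEXT, shnum INTEGER,
                             used_space INTEGER,
                             PRIMARY KEY (storage_index, shnum));
        CREATE TABLE leases (storage_index TEXT, shnum INTEGER,
                             account_id INTEGER,
                             PRIMARY KEY (storage_index, shnum, account_id));
    """)
    conn.executemany("INSERT INTO shares VALUES (?, ?, ?)",
                     [("si1", 0, 1000), ("si2", 0, 500), ("si3", 0, 700)])
    # ("si1", 0) has two leases; ("si3", 0) has none.
    conn.executemany("INSERT INTO leases VALUES (?, ?, ?)",
                     [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)])
    return conn

def leased_totals(conn, where="", params=()):
    """Return (share_count, total_bytes) over shares with at least one lease.

    `where` is an optional, trusted SQL fragment applied to the inner
    SELECT, e.g. "WHERE l.account_id = ?"; values go in `params`.
    """
    sql = (
        "SELECT COUNT(*), COALESCE(SUM(used_space), 0) FROM ("
        " SELECT DISTINCT s.storage_index, s.shnum,"
        " s.used_space AS used_space"
        " FROM shares s JOIN leases l"
        " ON s.storage_index = l.storage_index AND s.shnum = l.shnum "
        + where + ")"
    )
    return conn.execute(sql, params).fetchone()
```

Calling leased_totals(conn) on the sample data counts the doubly leased share once and skips the unleased one; passing where="WHERE l.account_id = ?" with params=(1,) restricts the result to leases held by that hypothetical account.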
And this gives the number of shares and total used space leased by each account, sorted beginning with the one that is using most space:
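A per-account breakdown of this kind could look like the sketch below, again using Python's sqlite3 module. The schema, the function name usage_by_account, and the account ids are illustrative assumptions, not the actual leasedb definitions. It assumes an account holds at most one lease per share, so grouping by account does not double count.

```python
import sqlite3

def make_db():
    # Hypothetical, simplified schema; the real leasedb tables may differ.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shares (storage_index TEXT, shnum INTEGER,
                             used_space INTEGER,
                             PRIMARY KEY (storage_index, shnum));
        CREATE TABLE leases (storage_index TEXT, shnum INTEGER,
                             account_id INTEGER,
                             PRIMARY KEY (storage_index, shnum, account_id));
    """)
    conn.executemany("INSERT INTO shares VALUES (?, ?, ?)",
                     [("si1", 0, 1000), ("si2", 0, 500)])
    conn.executemany("INSERT INTO leases VALUES (?, ?, ?)",
                     [("si1", 0, 0), ("si1", 0, 1), ("si2", 0, 0)])
    return conn

def usage_by_account(conn):
    """Return [(account_id, share_count, total_bytes), ...],
    largest consumer first."""
    return conn.execute(
        "SELECT l.account_id, COUNT(*), SUM(s.used_space) "
        "FROM shares s JOIN leases l "
        "ON s.storage_index = l.storage_index AND s.shnum = l.shnum "
        "GROUP BY l.account_id "
        "ORDER BY SUM(s.used_space) DESC"
    ).fetchall()
```

Note that the per-account totals can add up to more than the server's total used space, since a share leased by two accounts is attributed to both.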
After talking with markberger today, I realized that #1818 is the ticket to merge leasedb into trunk, and #1819 is the superseding ticket to merge leasedb+cloud-backend into trunk.
Here is a patch for this ticket: https://github.com/markberger/tahoe-lafs/tree/1836-use-leasedb-for-share-count
Reviewed, but I think this doesn't remove the BucketCrawler yet.
Removed review-needed until BucketCountingCrawlectomy is complete.
All of the BucketCountingCrawler code has been removed and tests have been added to the branch.
Reviewing.
Diara, did you review this one past comment 16? Is this still in need of a review?
I'll do another pass at the code review for this one.
I appear to have dropped the ball on this one after comment:89933. Yes, it's still in need of review.
Review of https://github.com/markberger/tahoe-lafs/tree/1836-use-leasedb-for-share-count :
Good job with this change. There are a few small things that I found.
I could not run the full test suite. It might be because this branch was made on a somewhat old version of tahoe-lafs. There are a bunch of "exceptions.ImportError: cannot import name HTTPConnectionPool" errors in the tests. If you could merge your branch with the latest trunk version, it might solve this.
In src/allmydata/web/storage.py, it seems like there is still some BucketCountingCrawler stuff left. For instance, in StorageStatus.render_JSON, you are still returning bucket-counter even though it returns None for it. Is this because the UI expects it? If this is the case, the UI might need to be changed as well as the backend. Another one is StorageStatus.render_count_crawler_status. Is this still needed for something if the crawler was removed?
Reassigning to markberger to fix those issues.
remyroy: what's the output of bin/tahoe --version-and-path for you (on that branch)?
Replying to remyroy:
I see the problem; that branch has a requirement of Twisted >= 11.0.0, but HTTPConnectionPool was only made public in Twisted 12.1.0. The 1819-cloud-merge branch has a requirement of Twisted >= 12.1.0 for that reason.
I'm not sure if you still need the version-and-path but here it is:
I was using Twisted 11.1.
Thanks, that confirms that it was the Twisted version.
I've rebased markberger's branch on top of 1819-cloud-merge: https://github.com/tahoe-lafs/tahoe-lafs/commits/1836-use-leasedb-for-share-count
Just ran the test suite on https://github.com/tahoe-lafs/tahoe-lafs/commits/1836-use-leasedb-for-share-count and everything seems fine.
My relational algebra may be a little rusty, but can't that be simplified to:
?
Also see comments https://github.com/tahoe-lafs/tahoe-lafs/commit/50a617f7c629d316e0e5a9f63576f119ac9f8749#commitcomment-6216938 and https://github.com/tahoe-lafs/tahoe-lafs/commit/b9f1d00fadd8e859a06d5641a4acb491b50d4868#commitcomment-6216968.
Oh, I was responsible for the variation with the double SELECT ... FROM ... in comment:89927. I wonder whether there was any reason for writing it that way?
Related discussion: https://github.com/tahoe-lafs/tahoe-lafs/commit/006a04976eb42f56c118c34adf2ddb54c1605edb#commitcomment-6229240
Milestone renamed
renaming milestone
Moving open issues out of closed milestones.
The established line of development on the "cloud backend" branch has been abandoned. This ticket is being closed as part of a batch-ticket cleanup for "cloud backend"-related tickets.
If this is a bug, it is probably genuinely no longer relevant. The "cloud backend" branch is too large and unwieldy to ever be merged into the main line of development (particularly now that the Python 3 porting effort is significantly underway).
If this is a feature, it may be relevant to some future efforts - if they are sufficiently similar to the "cloud backend" effort - but I am still closing it because there are no immediate plans for a new development effort in such a direction.
Tickets related to the "leasedb" are included in this set because the "leasedb" code is in the "cloud backend" branch and fairly well intertwined with the "cloud backend". If there is interest in lease implementation change at some future time then that effort will essentially have to be restarted as well.