lease expiration / deletion / garbage-collection #119
Reference: tahoe-lafs/trac-2024-07-25#119
I think the last Big Thing we need to develop (as opposed to implement or
fix) is a structure to both maintain the long-term health of files and also
ensure their eventual deletion. I think these need to be developed together,
since they are closely related.
Leases need to expire after a while (we're thinking of one month as a good
timeout). Files that are supposed to stick around longer than this either
need to be kept alive by the original uploader or by someone to whom they've
delegated this task. If the original uploader expects to be around at least
once a month, they can do it themselves, but for a backup application we
can't impose this requirement. We refer to this task as "refreshing", and the
provider of this service is either doing it out of the kindness of their
heart (in the friend-net use case) or as part of a paid service (in the
commercial-offering use case).
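A minimal sketch of the lease/refresh relationship described above. This is not Tahoe's storage-server code: the `Lease` class, the `refresh` and `share_is_garbage` helpers, and the 31-day stand-in for "one month" are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List

# Assumption: "one month" approximated as 31 days, expressed in seconds.
LEASE_DURATION = 31 * 24 * 60 * 60


@dataclass
class Lease:
    holder: str        # hypothetical: the uploader, or whoever they delegated refreshing to
    expiration: float  # absolute time in seconds

    def is_expired(self, now: float) -> bool:
        return now >= self.expiration

    def renew(self, now: float) -> None:
        # Refreshing pushes the expiration one full lease period past "now".
        self.expiration = now + LEASE_DURATION


def refresh(leases: List[Lease], holder: str, now: float) -> int:
    """Renew every still-live lease held by `holder`; returns how many were renewed."""
    renewed = 0
    for lease in leases:
        # Assumption: an expired lease is not revived, since the share may be gone.
        if lease.holder == holder and not lease.is_expired(now):
            lease.renew(now)
            renewed += 1
    return renewed


def share_is_garbage(leases: List[Lease], now: float) -> bool:
    # A share only becomes reclaimable once *every* lease on it has expired.
    return all(lease.is_expired(now) for lease in leases)
```

In the backup case, the uploader would hand refreshing to a friend or a paid service, whose identity simply shows up as the `holder` of the lease it keeps renewing.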
The refreshing process will also perform "file checking", which is simply
counting the number of shares that are available for any given file. This
gives a rough measure of the "health" of the file. The process may also
perform "file verification" from time to time, which is downloading the
crypttext and checking its hash against the value in the URI extension block.
If either checking/verification process discovers a problem, the "file
repairer" may be triggered, which uses the remaining shares to reconstruct
the correct crypttext, then re-encodes and re-uploads any shares which have
been lost.
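A rough sketch of how the cheap check, the expensive verification, and the repair trigger might fit together. It assumes k-of-N erasure coding (any k of the N shares rebuild the file) and a hypothetical `repair_threshold` below which repair kicks in; hashing the crypttext with plain SHA-256 is a simplification of whatever hash the URI extension block actually stores.

```python
import hashlib
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    NEEDS_REPAIR = "needs-repair"
    UNRECOVERABLE = "unrecoverable"


def check(shares_found: int, k: int, n: int, repair_threshold: int) -> Health:
    """Cheap 'file checking': classify a file only by how many distinct shares answered."""
    if not 0 < k <= repair_threshold <= n:
        raise ValueError("expected 0 < k <= repair_threshold <= n")
    if shares_found < k:
        return Health.UNRECOVERABLE   # fewer than k shares: the file cannot be rebuilt
    if shares_found < repair_threshold:
        return Health.NEEDS_REPAIR    # still recoverable, but worth repairing now
    return Health.HEALTHY


def verify(crypttext: bytes, expected_hash: bytes) -> bool:
    """Expensive 'file verification': hash the downloaded crypttext and compare."""
    return hashlib.sha256(crypttext).digest() == expected_hash


def should_repair(health: Health, verified: bool = True) -> bool:
    # A low share count or a failed verification can trigger the repairer,
    # but an UNRECOVERABLE file is beyond what the repairer can do.
    if health is Health.UNRECOVERABLE:
        return False
    return health is Health.NEEDS_REPAIR or not verified
```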
All of these processes serve to improve the health of the file, at
various bandwidth/CPU costs: refreshing/checking is cheap, repair/re-upload
is expensive. The intent is to use the refreshing service to keep the file as
healthy as possible at low cost, and to use the checker results to trigger the
more costly repair operations as rarely as possible. Refreshing must take place
at least once a month to keep the leases alive. The required filecheck
frequency will depend upon how quickly storage servers drop out of the grid: we
expect file health to follow an exponential decay curve, so we must check
frequently enough to reduce the chance that the health will decay beyond
repair. The exact parameters will be tunable, of course, to pick a tradeoff
between bandwidth consumed and the chance that a file will decay too quickly
to be saved.
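One way to put numbers on that tradeoff, as a sketch under stated assumptions: each server holding a share stays in the grid for an exponentially distributed time with a given mean, servers leave independently, and each check (plus any repair it triggers) restores all N shares. The probability that at least k of N shares survive an interval is then a binomial tail, and we pick the longest interval that keeps the loss risk under a target. All function names and parameters here are hypothetical.

```python
import math


def share_survival(t_days: float, mean_server_lifetime_days: float) -> float:
    # Exponential model: chance that a given server is still around after t days.
    return math.exp(-t_days / mean_server_lifetime_days)


def prob_recoverable(k: int, n: int, p: float) -> float:
    # Binomial tail: probability that at least k of n independent shares survive.
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def max_check_interval(k: int, n: int, mean_lifetime_days: float,
                       max_loss_prob: float, limit_days: int = 365) -> int:
    """Longest whole-day interval between checks that keeps loss risk under max_loss_prob.

    Assumes every check (plus any repair it triggers) restores all n shares.
    """
    best = 0
    for t in range(1, limit_days + 1):
        p = share_survival(t, mean_lifetime_days)
        if 1.0 - prob_recoverable(k, n, p) > max_loss_prob:
            break  # loss probability only grows with t, so stop at the first failure
        best = t
    return best
```

With made-up inputs of 3-of-10 encoding, a one-year mean server lifetime, and a one-in-a-million loss target, `max_check_interval(3, 10, 365, 1e-6)` lands between 30 and 60 days. Loosening the target lengthens the interval, which is exactly the bandwidth-versus-risk knob described above.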
Files that are deleted from a vdrive need to have their shares dereferenced
in a timely fashion (I'm thinking by the end of the day for this). If the
reference count drops to zero, the share should either be deleted immediately
(for a storage server on the machine of a home user who wants the disk space
for other purposes), or be marked for deletion and removed as soon as the
storage is needed for something else (for a dedicated commercial server with
nothing better to do with that disk space; there's a chance that someone will
re-upload the file that was just deleted, and if the share is still around
then we can avoid repeating the upload). Deleted files should also be removed
from the filechecker and repair mechanisms.
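A minimal sketch of the two deletion policies. The `ShareStore` class, its `delete_immediately` flag, and keying shares by a storage-index string are assumptions for illustration, not an existing Tahoe API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ShareStore:
    # Hypothetical policy flag: True for a home user's machine that wants its disk
    # back, False for a dedicated server that keeps unreferenced shares around.
    delete_immediately: bool
    refcounts: Dict[str, int] = field(default_factory=dict)
    shares: Dict[str, bytes] = field(default_factory=dict)
    reclaimable: List[str] = field(default_factory=list)  # oldest first

    def add_reference(self, storage_index: str, data: bytes) -> bool:
        """Record one more reference; returns True if the share bytes had to be stored."""
        if storage_index in self.reclaimable:
            # Someone re-uploaded a file that was just deleted: revive the old share.
            self.reclaimable.remove(storage_index)
        needs_upload = storage_index not in self.shares
        if needs_upload:
            self.shares[storage_index] = data
        self.refcounts[storage_index] = self.refcounts.get(storage_index, 0) + 1
        return needs_upload

    def dereference(self, storage_index: str) -> None:
        self.refcounts[storage_index] -= 1
        if self.refcounts[storage_index] == 0:
            del self.refcounts[storage_index]
            if self.delete_immediately:
                del self.shares[storage_index]
            else:
                self.reclaimable.append(storage_index)

    def reclaim(self, bytes_needed: int) -> int:
        """Free unreferenced shares (oldest first) until bytes_needed is met; returns bytes freed."""
        freed = 0
        while self.reclaimable and freed < bytes_needed:
            freed += len(self.shares.pop(self.reclaimable.pop(0)))
        return freed
```

The dedicated-server policy only pays off when re-uploads of just-deleted files are common; otherwise the home-user policy is simpler and frees space sooner.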
Note that files should be deleted promptly, rather than allowing their leases
to expire on their own, to reduce the storage overhead (storage consumed
beyond that required to hold the desired files). The lease expiration
mechanism is a necessary fallback to keep storage usage from growing without
bound, but without prompt deletion, high churn rates could cause actual
storage consumed to grow larger than desired.
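A back-of-the-envelope way to see the churn effect, using Little's law (average amount in a queue equals arrival rate times average time spent in it). Here the "queue" is data that has been deleted but still occupies disk. The figures below are invented purely for illustration, and using the full lease period as the linger time is a pessimistic assumption.

```python
import math


def lingering_garbage(deleted_bytes_per_day: float, linger_days: float) -> float:
    # Little's law: deleted-but-not-yet-freed data = deletion rate x time it lingers.
    return deleted_bytes_per_day * linger_days


def overhead_fraction(live_bytes: float, deleted_bytes_per_day: float,
                      linger_days: float) -> float:
    """Storage consumed beyond the desired files, as a fraction of the desired files."""
    return lingering_garbage(deleted_bytes_per_day, linger_days) / live_bytes


# Hypothetical grid: 1000 GB of live data, 20 GB deleted per day.
via_expiration = overhead_fraction(1000, 20, 31)  # wait for a 31-day lease to run out
via_daily_deref = overhead_fraction(1000, 20, 1)  # dereference by the end of the day
```

Under these made-up numbers, relying on expiration alone leaves up to 62% extra storage tied up, versus about 2% with end-of-day dereferencing.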
Finally, many of our use cases will want to enforce a utilization quota on
each user, limiting the amount of storage space they are allowed to consume.
The file-repair service may be a good place to enforce this (with a rule
saying that you can upload as much as you want, but the repair service won't
help you exceed your quota). Eventually we may want each client to have
membership credentials which would allow storage servers to measure how much
space each client is consuming: with this, a daily (or slower) process could
calculate how much global space is consumed by each client, and flag or
revoke membership for clients which use more space than they've contracted
for.
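A sketch of the daily accounting pass and the "repair won't help you exceed your quota" rule. It assumes each storage server can attribute stored bytes to a client ID (which is what the membership credentials would provide) and reports a `{client_id: bytes}` mapping; the report format and all function names are hypothetical. Note that totals gathered this way count erasure-coded share bytes, not original file sizes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def tally_usage(server_reports: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum per-client byte counts reported by each storage server (one dict per server)."""
    totals: Dict[str, int] = defaultdict(int)
    for report in server_reports:
        for client_id, used in report.items():
            totals[client_id] += used
    return dict(totals)


def over_quota(totals: Dict[str, int], quotas: Dict[str, int]) -> List[str]:
    # Clients with no contracted quota are treated as having a quota of zero.
    return sorted(c for c, used in totals.items() if used > quotas.get(c, 0))


def repair_allowed(client_id: str, totals: Dict[str, int],
                   quotas: Dict[str, int], repair_bytes: int) -> bool:
    # Uploads are unrestricted, but the repairer refuses work that would push
    # the client past its quota.
    return totals.get(client_id, 0) + repair_bytes <= quotas.get(client_id, 0)
```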
We're focussing on an imminent v0.7.0 (see the roadmap) which hopefully has #197 (Small Distributed Mutable Files) and also a fix for #199 (bad SHA-256). So I'm bumping less urgent tickets to v0.7.1.
This is an important, required feature, but it is a big feature to implement, and I don't think we are going to get it done in the next six weeks, so I'm putting it in Milestone 1.0.
we've decided to push this out past 0.9.0
this isn't a 1.1.0 thing
Here are some random notes that used to be in roadmap.txt:
We've basically split lease/gc into a separate task from checker/repairer, so I'm removing the checker/repairer aspects of this ticket. This ticket will focus on lease/gc work.
Changed title from "lease expiration / deletion / filechecking / quotas" to "lease expiration / deletion / garbage-collection / quotas".
I'm not sure, but I think we've tentatively agreed to focus on garbage collection separately from the notion of accounting or quotas, so I'm changing the name of this ticket.
Changed title from "lease expiration / deletion / garbage-collection / quotas" to "lease expiration / deletion / garbage-collection".
I mentioned this ticket as one of the most important-to-me improvements that we could make in the Tahoe code: http://allmydata.org/pipermail/tahoe-dev/2008-September/000809.html
I recently pushed a number of changes that roughly implement this. What we have right now (and what will be in 1.3.1 or whatever comes after 1.3.0) is:
There are lots of details about how GC currently works in source:docs/garbage-collection.txt. There are ways it can be improved (in particular, by associating leases with account identifiers, to reduce the scope of each lease and make it easier for leaseholders to cancel leases safely; and by switching from the current expire-the-file mode to an expire-the-account mode, to reduce renewal traffic). But for moderate-sized grids, the mark-and-sweep lease/GC approach ought to be sufficient.
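A toy model of the mark-and-sweep idea: the client walks everything reachable from its root directory and renews those leases (mark), and the server later discards whatever has an expired lease (sweep). This is a simplification, not the behavior documented in docs/garbage-collection.txt: it tracks one expiration time per cap instead of per-lease, per-share records, and every name below is hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

# Assumption: a one-month lease, approximated as 31 days in seconds.
LEASE_DURATION = 31 * 24 * 60 * 60


def mark(root: str, children: Callable[[str], Iterable[str]]) -> Set[str]:
    """Client-side mark phase: every cap reachable from the root directory."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        cap = stack.pop()
        if cap in seen:
            continue  # directory graphs may contain cycles
        seen.add(cap)
        stack.extend(children(cap))
    return seen


def renew_leases(reachable: Iterable[str], expirations: Dict[str, float], now: float) -> None:
    # Renew everything still reachable; unreachable items are simply left to expire.
    for cap in reachable:
        expirations[cap] = now + LEASE_DURATION


def sweep(expirations: Dict[str, float], now: float) -> List[str]:
    """Server-side sweep: drop everything whose lease has run out; returns what was removed."""
    expired = sorted(cap for cap, exp in expirations.items() if exp <= now)
    for cap in expired:
        del expirations[cap]
    return expired
```

The cost of this scheme is that every client must re-walk its whole tree once per lease period, which is the renewal traffic an expire-the-account mode would cut down.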