Add [storage].max_shares configuration option #2124
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
1 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#2124
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
This feature is a proposed solution to #2107 and #2123.
I discussed some weeks ago a possible feature with Daira and Warner to help me achieve #2123. The solutions looks like easy enough to make for a newbie-coder-task like me. This is a constraint of this approach, but also looks to have its merit also because of its simplicity, IMO.
The solution would entail adding a "max_shares=n" setting in the "storage" config section of storage nodes (proposed names were checked with Daira and Warner).
A node having this value set, for any given file, would gently refuse to accept shares in excess of "n" if its already stored shares already are "n".
It is easily noted that this implies adding server-side storage configuration instead of adding anything in the share placement logic done at client side, AFAIK. Servers not having this value set, would continue to accept as many shares as they did prevously.
The relevant code I learnt it's doing this is at this loop:
https://github.com/tahoe-lafs/tahoe-lafs/blob/master/src/allmydata/storage/server.py#L284
Adding here a bit of logic that checks if we already have at least "n" shares, excess shares could be simply dropped. The uploading nodes will have to find another server to place that share wherever they see fit, no changes in upload logic.
Also, as further documentation, here are some excerpts (edited for brevity) of the chat with Daira and Warner with my learnings and misundertandings about implementation details that might be useful for others:
warner:
the protocol carries a request to hold a set of shares, then the server responds with two sets: the first is the set of shares it will accept, the second is the set of shares it already has at least for immutable shares
warner:
for mutable shares, the client just says "write to these shares", and I think the server gets to respond with "ok" or an error (I'm not sure there's a good way to signal that it accepted some shares but not others)
amontero:
it's this done here?
https://github.com/tahoe-lafs/tahoe-lafs/blob/master/src/allmydata/storage/server.py#L284
looks like to me even the server throws the data away (bw.throw_out_all_data = True)
warner:
I'm not sure the server is in a great position to know the best policy, but it seems like a feasible thing to add.
the "first set" I described is returned as a dictionary of "bucket writers" so you could probably just change line 284 to say like "for shnum in sharenums[:LIMIT]:"
amontero:
I will have to go to the Py books to look for that [:LIMIT]: thingie
warner:
maybe "for shnum in list(sharenums)[:LIMIT]:" if sharenums is passed as a set instead of a list. the last colon ends the loop condition. the [:foo] syntax is how you get the first "foo" elements of a list
amontero:
I was thinking in some (surely more inefficient) "if" to "pass" or set throw_out_all_data=True
warner:
yeah, you probably don't want .throw_out_all_data: that will pretend to accept the share, but then silently discard the bytes. (we use that for unit tests)
warner:
you want the server to explicitly reject the share, so the client puts it elsewhere.
anything that doesn't add an entry to "bucketwriters" will reject the share.
like "elif len(bucketwriters) > LIMIT: pass".
with a comment saying "don't accept too many shares".
amontero:
I have now to figure how and where in the code get the "ConfigEntry=num" from file
warner: yeah, that part will probably take 90% of the code. it should be passed as an argument into the StorageServer constructor and should be read from the config file in src/allmydata/client.py in the init_storage() method
amontero:
"client.py" filename makes me think it's the wrong place to read it, I was thinking in something inside storage/source files
warner:
well, client.py is slightly misnamed. client.py is where we set up all clients and storage servers. it doesn't include the weird nodes (key-generator, stats-gatherer, etc) basically because it's so common to run a node that has both client stuff and storage-server stuff. we lumped them together (at least the high-level setup code) in a single file we may fix that some day by moving the init_storage() method out of client.py and into something else but init_storage() is definitely the right place to read storage-related config settings, and init_storage() currently lives in client.py
BTW, my newbie feature branch attempt is at https://github.com/pataquets/tahoe-lafs/compare/ticket-2124 (no config, hardcoded to 1) but it is a proof/WIP, it's pending more manual testing. Be aware that it is implemented on top of 1.10.0.
I think this is not necessary, because once #1382 and #1816 are both fixed, only leases for shares that are still needed (i.e. in some maximum matching for a file) will be renewed by an upload or repair.
In other words, I think we should implement both those tickets and then see if the perceived problem goes away without needing any additional feature or configuration. (#1832 would also help by making it feasible to switch on GC in more cases.)
I'm not sure, since I'm unable to completely understand #1382. #1816 is also very interesting at GC time, which is next for my case. Notice that I'm interested in this setup for both upload and not only repair.
However, as #2107 is evolving, it might be what I need, so I'm fine with watching it until it is closed and come back here to see if makes this unnecessary.