gossip-introducer should forget about old nodes somehow #1765
Reference: tahoe-lafs/trac-2024-07-25#1765
Just a note-to-self: when #68 gets working, and decentralized
gossip-based introduction is implemented, we should make sure the
announcements are:
The idea is that a server who has left the grid permanently should
eventually be forgotten by everyone else. Gossip never forgets
(even if you forget it locally, you'll be reminded by your cohorts,
and if you don't remember what you forgot, you'll fail to forget it
again).
The simplest way to accomplish this is with a timestamp in the
announcement, and to prune entries more than maybe a month old.
(but wait a few minutes after startup to do that, so if you leave
your node offline for several months, it still has a chance to
connect to somebody and fetch fresh announcements).
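A minimal sketch of that pruning rule, assuming announcements are held as a plain mapping from server id to announcement timestamp. The function name, the constants, and the data layout are all hypothetical and are not taken from the Tahoe-LAFS code:

```python
MAX_AGE = 30 * 24 * 60 * 60      # "maybe a month", in seconds (assumed value)
STARTUP_GRACE = 5 * 60           # "a few minutes" after startup (assumed value)


def prune_announcements(announcements, now, started_at,
                        max_age=MAX_AGE, grace=STARTUP_GRACE):
    """Return the announcements that should be kept.

    `announcements` maps a server id to its announcement timestamp.
    Nothing is pruned until `grace` seconds after startup, so a node
    that was offline for months can first fetch fresh announcements.
    """
    if now - started_at < grace:
        # Too soon after startup: keep everything for now.
        return dict(announcements)
    # Compare the announcer's timestamp against the local clock.
    return {server: ts for server, ts in announcements.items()
            if now - ts <= max_age}
```

Note that the comparison in the last line is exactly the cross-node clock comparison discussed in the next paragraph.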
We aren't usually keen on timestamps, in particular comparing time
from different nodes (in this case, the announcement's timestamp
plus one month versus the client's clock). But I think this would
be a reasonable use of clocks. As of yesterday, the announcement
record includes a timestamp, named "seqnum" (so named because I
didn't want to make any claims about its use as a timestamp, but
merely as a mostly-monotonically increasing number, used to decide
when one announcement may replace another).
Maybe I should rename that to "when" or "announcement-time"?
The Introducer Client still needs code to refresh its announcements
periodically (once a week would be fine). Currently it only
refreshes them at node boot, and we don't want live-and-connected
nodes with good uptime to start being ignored merely because they
weren't rebooted frequently enough.
+1 for renaming "seqnum" to "announcement-time".
I would be kind of sad to make tahoe-lafs require synchronization between clocks of different computers. As far as I know, it doesn't currently do so. There isn't any way to be sure that your computer's clock is synchronized with the clock of another computer (the one you are gossiping with), except by relying on a trusted third party -- an NTP server.
Except, the above is no longer true, now that Bitcoin exists. So I retract my longstanding objection against relying on synchronized clocks, and replace it with a suggested policy that the only remote-clock-synchronization protocol that a tahoe-lafs node is allowed to rely on is the Bitcoin blockchain.
☺
P.S. Also in all seriousness I don't like the proposed design that much. Not only the part about requiring clock synchronization (and by the way in practice, clocks are often more than a month out of sync with each other, especially in some of the "different" deployment targets that people are increasingly interested in, such as embedded systems and Windows clients). I am concerned about relying on that, because our defenses against data deletion, rollback attack on mutables, and (hopefully in the future) unadd-attack on add-only-sets rely on the client connecting to a sufficient number of good servers. This seems to add another path by which accident or malice could prevent clients from connecting to good servers, which I think deserves careful risk analysis, both now and whenever we change the server-selection behavior.
But in addition to that, also the part about waiting for "a few minutes after starting up" sounds kind of fragile.
Let me try to think of a reasonable alternative to consider. What do you think of this:
We need to carefully revisit 3 when changing anything to do with server selection, but at least there is less of a path for remote attackers to manipulate this than with the remote-clock-synchronization approach.
What do you say? This sounds not much more complicated than the initial proposal, and maybe less complicated. It is certainly less complicated if you include the fact that you have to think about the clock-synchronization protocol in that one and you don't in this one. Does this proposal satisfy the same values as the initial post does -- i.e. not letting dead servers pile up indefinitely in the gossip network?
Great response!
Yeah, I'm not keen on requiring synchronized clocks either. I was
considering how we might have the recipient note the difference between
their local clock and the sender's clock (or however that'd map to the
flooded announcement scheme, where messages are being delivered by third
parties minutes or days after they were created) and using that to
correct for a static offset in future messages. But that feels fragile.
Hey, that sounds great! Let's see, the first rule prevents the
"persistent nonsense" problem, as long as any grid-control-only nodes
(i.e. what the Introducer becomes in the new gossip world) follow this
rule too. The only concern I can think of is that partial connectivity
might prevent a new client from learning about nodes that they could
normally connect to. In particular, could this interact with NAT in some
way that might produce a less-connected grid than our current central
Introducer? I don't think so, but I'd have to study it more.
The second rule is really about implementing connection throttling,
which might want to be a Foolscap feature (maybe expressed as
tub.setOption("pending-connection-limit", 10) or similar), and then
asking for connections in a specific order (most-recently-seen
first). Seems like a good idea, but not as critical as the other two.
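As a rough illustration of throttling plus most-recently-seen-first ordering, here is a sketch in plain Python. It does not use Foolscap at all; the function name, its arguments, and the default limit of 10 are assumptions made for the example:

```python
def next_connection_batch(last_seen, pending, connected, limit=10):
    """Pick which servers to dial next, most-recently-seen first.

    `last_seen` maps server id -> time we last saw that server.
    `pending` and `connected` are sets of server ids.
    At most `limit` connection attempts may be pending at once.
    """
    slots = max(0, limit - len(pending))
    candidates = [s for s in last_seen
                  if s not in pending and s not in connected]
    # Most recently seen servers are tried first.
    candidates.sort(key=lambda s: last_seen[s], reverse=True)
    return candidates[:slots]
```

A caller would invoke this again each time a pending attempt succeeds or fails, so the remaining candidates get their turn.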
The third rule prevents local nonsense from sticking around forever. It
also ties into a more general "connection history" mechanism that I
think we want: something to hold historic uptime, RTT, speeds, and
overall reliability for each server we know about. This could be used to
decide how long to wait for a response from the server before declaring
it "overdue" (and switching to an alternate), and could eventually be
published and aggregated to provide some sort of collaborative
reliability-prediction metric to influence share placement or even
storage prices (servers that everyone agrees have been highly available
might command higher fees).
I like it! I'll update this ticket to reflect the new scheme.
Would you still be in favor of changing the Announcement field from
"seqnum" to "announcement-time", even if we don't plan to use it for
that purpose? The specific purpose of that field (which is inside the
signed announcement body) is to prevent replay and rollback attacks
(feeding an old announcement into some client in the hopes of changing
their behavior in some useful way).
The publishing node could indeed just use a sequence number (incremented
by one for each new message), but:
rebuilding the node after a hard drive failure, otherwise peers would
not believe new announcements until the new node's counter naturally
incremented beyond the other values.
the other information needed to rebuild a node (node.privkey,
node.pem) would be static.
I can imagine arguments against using time.time() instead of an actual
counter:
uses all significant digits of time.time(), frequently microseconds)
that might reveal time consumed during boot, which might help a timing
attack on e.g. key generation or signature generation.
period refresh to make sure the disbelief is eventually overcome
(imagine setting your clock back a day and then rebooting: you need to
have at least one announcement more than one day after reboot to catch
up)
Oh, wait, here's an idea: use a counter, remember it somewhere like
NODEDIR/private/announcement.counter, initialize it to zero upon node
creation. '''But''': listen for your own announcements too. If you hear
a valid announcement with a higher seqnum than what you're currently
publishing, increase your counter to match. (If the announcement is
different from what you're currently publishing, increase it by one
more; that ought to converge.)
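A sketch of that counter, assuming it is stored as a decimal integer in a single file. The class and method names are hypothetical, and signature checking of the heard announcement is assumed to have happened before this code is called:

```python
import os


class AnnouncementCounter:
    """Persistent seqnum that catches up to our own gossiped announcements."""

    def __init__(self, path):
        # e.g. NODEDIR/private/announcement.counter
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.value = int(f.read().strip())
        else:
            self.value = 0          # initialized to zero at node creation
            self._save()

    def _save(self):
        with open(self.path, "w") as f:
            f.write(str(self.value))

    def heard_own_announcement(self, seqnum, same_as_current):
        """Handle a valid announcement signed by our own key."""
        if seqnum <= self.value:
            return                  # not newer than what we publish
        # Match the higher seqnum; go one past it if the content differs,
        # so that our current announcement replaces the stale one.
        self.value = seqnum if same_as_current else seqnum + 1
        self._save()
```

After a hard drive failure the rebuilt node starts at zero again, hears its old announcement from its peers, and jumps past it.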
What do you think about that? And, given your thoughts about that, what
are your new thoughts about seqnum vs announcement-time? Can you think
of any reason that we'd really like actual (possibly erroneous and/or
malicious) wallclock values in Announcements?
Summary changed from "gossip-introducer should include timeouts" to "gossip-introducer should forget about old nodes somehow"
Replying to warner:
+1
If you use (time of last restart, # of announcements since restart) ordered lexicographically, that would solve the first two problems. It wouldn't solve the timequake problem: if you restarted the server at a local time earlier than the local time of some previous restart, you wouldn't recover until you restarted again.
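The lexicographic ordering can be sketched with Python tuples, which already compare element by element. The function name is hypothetical and the times are made-up integers:

```python
def should_replace(old, new):
    """Each seqnum is (time of last restart, announcements since restart).

    Tuples compare lexicographically: restart time first, then count.
    """
    return new > old


# Same restart, later announcement: accepted.
same_boot = should_replace((100, 5), (100, 6))
# Later restart, counter back at zero: still accepted.
new_boot = should_replace((100, 5), (200, 0))
# Timequake: restarted at an earlier local time than a previous restart,
# so peers keep the old announcement no matter how high the count goes.
timequake = should_replace((200, 3), (150, 10))
```

The third case is the timequake problem: no number of announcements since the restart can overcome the earlier restart time.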
Replying to warner:
Hrm. This idea of gossip conflicts with my idea that each server should attempt to connect to all clients -- and only to clients -- and that each client should attempt to connect to all servers -- and only to servers (#344, #1086).
It would also interact somewhat poorly with #444.
In fact, why do we need to switch from introducers to gossip at all? Could we finish the rest of the #466 new-introduction-protocol and related accounting infrastructure while leaving the current centralized introducer (or the #68 multiple introducers) alone?
I think this discussion needs to move to the mailing list...
Moved the discussion about whether to use sequence numbers (and how to recover from quakes) to #1767. Leaving the discussion about gossip and how-to-forget here, since they aren't as time-critical as #1767 (which I want to get resolved for 1.10).
(https://tahoe-lafs.org/pipermail/tahoe-dev/2012-June/007458.html)
Replying to zooko (comment:5):
Hm. We could set it up so that grid-control announcements flow
along all sorts of connections (instead of having nodes subscribe
to a specific "grid-control" servicename). Then servers would learn
about other servers even though they don't connect to each other,
and new clients could learn about all servers from any one server.
That might make it hard to prune uninteresting/bogus data, though
(i.e. throw out records for things you don't care about, and rely
on that mechanism to keep the overall dataset smaller).
Remember that "learning about node X" doesn't mean "connecting to
node X". The Announcements are just data, they can be transported
by anything (including some designated node that just gathers and
serves up the current announcement list on demand). No long-term
connections necessary.
Yeah, sure, that's the plan. I'm just anticipating the future.
Well, I think it'd be more robust, and would make grid setup
easier. If we can embed a default relay (hosted on tahoe-lafs.org
somewhere), then joining an existing grid could be as easy as:
And that gets you all of the following:
and nobody else ever had to set up an Introducer either.
I'm not in favor of multiple-introducers (specifically the
introducer.furls GSoC design) because I think introducers are
a nuisance to set up, FURLs are a nuisance to transfer, and
multiple introducers would still be
multiple-points-of-centralization (instead of being properly
'''decentralized''' like the title of #68 suggests).
Multiple-introducers are an easier short-term target, but we've
done without them for years now, so I'd rather push forwards on a
better solution than add complication and maintenance burden for a
partial solution.
Yeah, good idea. I'll try to write up more about the
gossip/invitation scheme tonight.
Something like this is being worked on in #467.