
ctx Hub: High-availability cluster¶
Run multiple hub nodes with Raft-based leader election for redundancy. Any follower can take over if the leader dies.
This recipe assumes you've read the ctx Hub overview and the Multi-machine setup. HA only makes sense in the "small trusted team" scenario: a personal cross-project brain on one workstation does not need three Raft peers.
Raft-lite¶
ctx uses Raft only for leader election, not for data consensus. Entry replication happens via sequence-based gRPC sync on the append-only JSONL store. This is simpler than full Raft log replication and is possible because the store is append-only and clients are idempotent. The implication: a write accepted by the leader is durable on the leader immediately; followers catch up asynchronously. If the leader crashes between accepting a write and replicating it, that write can be lost. Do not use the hub as a bank ledger.
Topology¶
The minimum HA cluster is three nodes. Two is worse than one: it doubles the failure probability without providing quorum.
```
            +-------------+
            |  client(s)  |
            +------+------+
                   |
       +-----------+-----------+
       |           |           |
   +---v---+   +---v---+   +---v---+
   | hub A |   | hub B |   | hub C |
   | :9900 |   | :9900 |   | :9900 |
   +-------+   +-------+   +-------+
       ^           ^           ^
       +-----------+-----------+
        Raft (leader election)
        gRPC (data sync)
```
Step 1 — Bootstrap the first node¶
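A plausible invocation on hub-a.lan. The flag names (`--listen`, `--data-dir`, `--raft-peers`) are assumptions, not the confirmed CLI; check them against your ctx version's help output:

```shell
# On hub-a.lan (flag names are assumptions, not the confirmed CLI)
ctx hub start \
  --listen :9900 \
  --data-dir /var/lib/ctx-hub \
  --raft-peers hub-a.lan:9900,hub-b.lan:9900,hub-c.lan:9900
```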
The node starts a Raft election as soon as it sees its peers.
Step 2 — Start the other nodes¶
On hub-b.lan and hub-c.lan, start the hub with the same cluster flags; each node lists all three peers, including itself.
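A sketch of those invocations, assuming a `ctx hub start` subcommand whose flag names are guesses rather than the confirmed CLI. Note that the peer list is identical on every node:

```shell
# On hub-b.lan (hypothetical flags; peer list is the same on all nodes)
ctx hub start \
  --listen :9900 \
  --data-dir /var/lib/ctx-hub \
  --raft-peers hub-a.lan:9900,hub-b.lan:9900,hub-c.lan:9900

# On hub-c.lan, the same command with the same peer list
ctx hub start \
  --listen :9900 \
  --data-dir /var/lib/ctx-hub \
  --raft-peers hub-a.lan:9900,hub-b.lan:9900,hub-c.lan:9900
```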
After a few seconds, one node wins the election and becomes the leader. The other two are followers.
Step 3 — Verify cluster state¶
From any node:
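Assuming a status subcommand (the subcommand name is a guess):

```shell
# Query cluster state from any node (subcommand name assumed)
ctx hub status
```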
Expected output:
```
role:    leader
peers:   hub-a.lan:9900 (leader)
         hub-b.lan:9900 (follower, in-sync)
         hub-c.lan:9900 (follower, in-sync)
entries: 1248
uptime:  3h42m
```
Step 4 — Register clients with failover peers¶
When registering a client, give it the full peer list:
```shell
ctx connection register hub-a.lan:9900 \
  --token ctx_adm_... \
  --peers hub-b.lan:9900,hub-c.lan:9900
```
If the leader becomes unreachable, the client reconnects to the next peer. Followers redirect to the current leader, so writes always land on the right node.
Runtime membership changes¶
Add a new peer without downtime:
Remove a decommissioned peer:
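A sketch of both operations against a hypothetical fourth node, hub-d.lan. The `peers add`/`peers remove` subcommand names and the host are assumptions; only the claim that membership can change at runtime comes from this recipe:

```shell
# Tell the cluster about a new node (subcommand name assumed),
# then start the hub on hub-d.lan with the full four-node peer list
ctx hub peers add hub-d.lan:9900

# Drop a decommissioned peer from membership, from any live node
ctx hub peers remove hub-d.lan:9900
```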
Planned maintenance¶
Before taking a leader offline, hand off leadership:
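For example, run on the current leader. The `stepdown` name comes from this recipe; whether it takes a target-peer argument is not specified, so the bare form is shown:

```shell
# Hand off leadership before maintenance (run on the current leader)
ctx hub stepdown
```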
stepdown triggers a new election among the remaining followers
before the leader goes offline. In-flight clients briefly pause,
then reconnect to the new leader.
Failure modes at a glance¶
| Event | What happens |
|---|---|
| Leader crashes | New election; clients reconnect to new leader |
| Follower crashes | No write impact; catches up on restart |
| Network partition (one side has quorum) | Majority side keeps serving; minority goes read-only |
| Network partition (no side has quorum) | No quorum; all nodes go read-only |
| Disk full on leader | Writes rejected; read traffic continues |
For the full list, see Hub failure modes.
See also¶
- Multi-machine recipe — single-node deployment
- Hub operations — backup and maintenance
- Hub security model — TLS, tokens