Agent Operations
Upgrading the agent binary and resetting hosts.
Agent operations manage the host-resident unbounded-agent binary. They are
handled by the agent itself.
AgentUpgrade
Replaces the host agent binary using blue-green staging with automatic rollback.
The operation requires a downloadURL parameter pointing to an agent release
tarball.
apiVersion: unbounded-cloud.io/v1alpha3
kind: MachineOperation
metadata:
  name: upgrade-agent-worker-01
spec:
  machineRef: worker-01
  operationKind: AgentUpgrade
  parameters:
    downloadURL: https://example.com/releases/unbounded-agent-linux-amd64.tar.gz
kubectl apply -f upgrade-agent-worker-01.yaml
kubectl get mop upgrade-agent-worker-01 -w
NAME                      KIND           MACHINE     PHASE        AGE
upgrade-agent-worker-01   AgentUpgrade   worker-01   Pending      0s
upgrade-agent-worker-01   AgentUpgrade   worker-01   InProgress   2s
upgrade-agent-worker-01   AgentUpgrade   worker-01   Complete     15s
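For automation, the watch above can be replaced by a polling loop that exits once the operation reaches a terminal phase. The following is a minimal sketch, not part of the agent tooling: the function names are hypothetical, the phase is assumed to be exposed at `.status.phase`, and `Complete` and `Failed` are assumed to be the only terminal phases.

```shell
#!/bin/sh
# Returns 0 when a MachineOperation phase is terminal.
# Assumption: Complete and Failed are the only terminal phases.
is_terminal_phase() {
  case "$1" in
    Complete|Failed) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls a phase-producing command until it reports a terminal phase or the
# attempt budget runs out. Prints the last observed phase.
# Exit status: 0 = Complete, 1 = Failed, 2 = timed out.
# $1 = max attempts, $2 = seconds between attempts, remaining args = command.
wait_for_phase() {
  attempts=$1
  interval=$2
  shift 2
  phase=""
  while [ "$attempts" -gt 0 ]; do
    # A failing command (for example, object not created yet) counts as "no phase".
    phase=$("$@" 2>/dev/null) || phase=""
    if is_terminal_phase "$phase"; then
      echo "$phase"
      if [ "$phase" = "Complete" ]; then return 0; else return 1; fi
    fi
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "${phase:-Unknown}"
  return 2
}

# Hypothetical usage (the jsonpath field is an assumption):
#   wait_for_phase 60 2 kubectl get mop upgrade-agent-worker-01 \
#     -o jsonpath='{.status.phase}'
```

Passing the phase command as arguments keeps the loop independent of kubectl, so the same helper can wait on an AgentReset operation.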
How It Works
The agent uses a blue-green binary layout with two slots. Only one slot is active at a time.
Staging:
- Downloads the release tarball from downloadURL.
- Extracts the agent binary into the inactive slot.
- Runs unbounded-agent version against the staged binary as a preflight check. If this fails, the operation is marked Failed and the current binary is unchanged.
Switching:
- Updates the unbounded-agent-current symlink to point to the newly staged binary.
- Preserves the previous binary as unbounded-agent-last-good.
- Restarts unbounded-agent-daemon.service.
Rollback:
If the upgraded daemon fails to stay healthy after the restart,
unbounded-agent-daemon-recovery.service triggers automatically:
- Switches unbounded-agent-current back to unbounded-agent-last-good.
- Restarts the daemon with the previous binary.
- Records a rollback failure signal.
- The recovered daemon marks the MachineOperation as Failed with reason DaemonFailed.
This ensures the agent always recovers to a known-good binary, even if the new version crashes on startup.
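The stage, switch, and rollback steps can be illustrated with plain symlinks. The sketch below is not the agent's implementation: the function names are hypothetical, and it assumes unbounded-agent-last-good is itself a symlink, which this page does not confirm. It only shows why a failed preflight leaves the current binary untouched and how a rollback restores the previous target.

```shell
#!/bin/sh
# Repoints a symlink by creating a temporary link and renaming it over the
# old one, so readers never observe a missing link.
# $1 = link path, $2 = new target (a file, not a directory)
repoint() {
  ln -s "$2" "$1.tmp" && mv -f "$1.tmp" "$1"
}

# Preflights the staged binary, records the current target as last-good,
# then switches the current link to the staged binary.
# $1 = directory holding the links, $2 = path to the staged binary
switch_to() {
  dir=$1
  staged=$2
  # Preflight: a binary that cannot report its version is never activated.
  "$staged" version >/dev/null 2>&1 || { echo "preflight failed" >&2; return 1; }
  if [ -L "$dir/unbounded-agent-current" ]; then
    repoint "$dir/unbounded-agent-last-good" \
      "$(readlink "$dir/unbounded-agent-current")" || return 1
  fi
  repoint "$dir/unbounded-agent-current" "$staged"
}

# Points the current link back at the last-good target.
# $1 = directory holding the links
rollback() {
  dir=$1
  [ -L "$dir/unbounded-agent-last-good" ] || return 1
  repoint "$dir/unbounded-agent-current" \
    "$(readlink "$dir/unbounded-agent-last-good")"
}
```

In this sketch the service restart is left out; on a real host the restart and the health-based trigger are handled by the systemd units named above.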
Failure Modes
| Failure | Result |
|---|---|
| Invalid or missing downloadURL | Operation marked Failed immediately. |
| Download or extraction error | Operation marked Failed. Current binary unchanged. |
| Preflight (unbounded-agent version) fails | Operation marked Failed. Current binary unchanged. |
| Upgraded daemon crashes after restart | Systemd recovery rolls back to last-good binary. Operation marked Failed with reason DaemonFailed. |
Verifying the Upgrade
After a successful operation, confirm the agent version on the host:
# From the host
unbounded-agent version
# From the cluster, check the machine status
kubectl describe machine worker-01
AgentReset
Removes the agent and all managed resources from a host, restoring it to its
pre-bootstrap state. This can be triggered remotely through a MachineOperation
or locally with the unbounded-agent reset command.
Remote Reset via MachineOperation
Use this to reset agents remotely without SSH access to the host.
apiVersion: unbounded-cloud.io/v1alpha3
kind: MachineOperation
metadata:
  name: reset-worker-01
spec:
  machineRef: worker-01
  operationKind: AgentReset
kubectl apply -f reset-worker-01.yaml
kubectl get mop reset-worker-01 -w
NAME              KIND         MACHINE     PHASE        AGE
reset-worker-01   AgentReset   worker-01   Pending      0s
reset-worker-01   AgentReset   worker-01   InProgress   1s
reset-worker-01   AgentReset   worker-01   Complete     10s
The daemon marks the operation complete before stopping its own running unit.
Local Reset via CLI
If you have SSH or console access to the host, you can reset directly:
sudo unbounded-agent reset
This is the inverse of unbounded-agent start and performs the same cleanup as
the MachineOperation path.
What Reset Does
The reset process performs these steps in order:
- Stops the nspawn machines - gracefully stops kube1 and kube2, then force-terminates if needed.
- Removes network interfaces - WireGuard (wg*), tunnel (geneve0, vxlan0, ipip0), and overlay (unbounded0, cbr0) interfaces.
- Removes WireGuard keys - cleans up /etc/wireguard/server.priv and server.pub.
- Removes nspawn configuration - deletes .nspawn configs and systemd overrides for both machines.
- Removes the machine rootfs - deletes /var/lib/machines/kube1 and /var/lib/machines/kube2.
- Cleans up routing - removes policy routing rules and flushes routing tables.
- Removes agent binaries - deletes the agent binary and config artifacts.
- Reloads systemd - picks up all configuration changes.
The reset is idempotent and safe to run multiple times. It unconditionally
cleans up both possible nspawn machine names (kube1 and kube2) so it works
regardless of which upgrade cycle the node is in.
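To confirm that a reset left nothing behind, the interface names and paths listed above can be checked from the host. This is a minimal sketch with hypothetical function names; it covers only the artifacts named on this page and assumes server.pub lives next to server.priv in /etc/wireguard.

```shell
#!/bin/sh
# Reads interface names on stdin and prints those the reset is documented
# to remove (wg*, geneve0, vxlan0, ipip0, unbounded0, cbr0).
agent_ifaces() {
  while IFS= read -r name; do
    case "$name" in
      wg*|geneve0|vxlan0|ipip0|unbounded0|cbr0) echo "$name" ;;
    esac
  done
}

# Prints each documented path that still exists.
# $1 = optional root prefix, useful for checking a mounted image or a test tree
leftover_paths() {
  root=${1:-}
  for p in /var/lib/machines/kube1 /var/lib/machines/kube2 \
           /etc/wireguard/server.priv /etc/wireguard/server.pub; do
    if [ -e "$root$p" ]; then
      echo "$p"
    fi
  done
  return 0
}

# Hypothetical usage on a Linux host; empty output from both means clean:
#   ls /sys/class/net | agent_ifaces
#   leftover_paths
```

Any output from either function after a reset suggests the cleanup was incomplete; because the reset is idempotent, running it again is a reasonable first response.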
When to Reset
- Decommissioning - removing a node from the cluster permanently.
- Troubleshooting - starting fresh after a failed bootstrap.
- Testing or development - iterating on the bootstrap process.
After Reset
After resetting the host, delete the Kubernetes Node object from the cluster:
kubectl delete node worker-01
You may also want to reboot the host to ensure all kernel modules and network state are fully cleared:
sudo reboot