Operations
This guide covers deployment, monitoring, troubleshooting, and day-2 operations for unbounded-net. For configuration details, see Configuration.
Deployment
Prerequisites
- Kubernetes cluster (1.24+)
- WireGuard kernel module on all nodes (for encrypted tunnels), or eBPF/TC kernel support (for GENEVE/VXLAN/IPIP tunnels)
- Container runtime with CNI support
- Network connectivity between sites (UDP ports)
Verifying WireGuard Support
# Check if WireGuard module is loaded
lsmod | grep wireguard
# Load if needed
modprobe wireguard
# Verify tools
wg --version
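The checks above can be wrapped into a small preflight helper. This is a minimal sketch: `has_wireguard_module` and `check_wireguard` are hypothetical names, not part of unbounded-net. A kernel with WireGuard built in rather than loaded as a module may not appear in `lsmod` output, so treat a "not loaded" result as a hint, not proof.

```shell
#!/bin/sh
# Reads lsmod-style output on stdin and succeeds if a "wireguard" module is
# listed. Taking stdin keeps the check usable without touching kernel state.
has_wireguard_module() {
    # lsmod prints the module name in the first column.
    awk '$1 == "wireguard" { found = 1 } END { exit !found }'
}

# Hypothetical preflight helper: reports module and tool availability.
check_wireguard() {
    if lsmod | has_wireguard_module; then
        echo "wireguard module: loaded"
    else
        echo "wireguard module: not loaded (try: modprobe wireguard)"
        return 1
    fi
    if command -v wg >/dev/null 2>&1; then
        echo "wg tool: $(wg --version)"
    else
        echo "wg tool: not found"
        return 1
    fi
}
```

Run `check_wireguard` on each node before deploying the node agent in WireGuard mode.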
Installation Steps
1. Deploy CRDs:
   kubectl apply -f deploy/machina/crd/
2. Deploy Controller:
   kubectl apply -f deploy/controller/
3. Deploy Node Agent:
   kubectl apply -f deploy/node/
4. Create Sites: Define Site resources with nodeCidrs and podCidrAssignments.
5. Create GatewayPools: Define pools with nodeSelector.
6. Assign Sites to Pools: Create SiteGatewayPoolAssignment resources.
7. Label Gateway Nodes:
   kubectl label node <name> net.unbounded-cloud.io/gateway=true
8. Verify Connectivity: Test with pod-to-pod ping across sites.
Note: When using Unbounded, steps 1-7 are handled automatically by kubectl unbounded site init. See the Getting Started guide.
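After a manual install, it can help to wait for both workloads to finish rolling out before creating Sites. The sketch below assumes the controller Deployment and node agent DaemonSet use the names and namespace that appear elsewhere in this guide. `wait_for_unbounded_net` is a hypothetical helper, and the `KUBECTL` variable is only there so the command can be swapped out.

```shell
#!/bin/sh
# Allow the kubectl binary to be overridden (for example, with a wrapper).
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_unbounded_net [TIMEOUT]
# Blocks until the controller and node agent rollouts complete, or fails
# once the timeout (default 120s, an arbitrary choice) expires.
wait_for_unbounded_net() {
    timeout="${1:-120s}"
    $KUBECTL -n kube-system rollout status deploy/unbounded-net-controller --timeout="$timeout" &&
    $KUBECTL -n kube-system rollout status daemonset/unbounded-net-node --timeout="$timeout"
}
```

For example, `wait_for_unbounded_net 300s` returns non-zero if either rollout does not complete within five minutes.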
Monitoring
Web Dashboard
The controller provides a real-time web dashboard:
kubectl -n kube-system port-forward deploy/unbounded-net-controller 9999:9999
# Open http://localhost:9999/status
Features:
- Cluster health overview (node counts, site counts, gateway status)
- Per-site node counts and health indicators
- Node-to-node connectivity matrix (pingmesh results)
- Detailed node list with filtering, sorting, and pagination
- Tunnel peer status, gateway health, site membership
- WebSocket real-time updates with delta compression
- Dark/light theme toggle
Health Endpoints
| Component | Endpoint | Purpose |
|---|---|---|
| Controller | :9999/healthz | Liveness |
| Controller | :9999/readyz | Readiness |
| Controller | :9999/status/json | Cluster status JSON |
| Controller | :9999/status/node/<name> | Per-node status |
| Node Agent | :9998/healthz | Liveness |
| Node Agent | :9998/readyz | Readiness |
| Node Agent | :9998/status/json | Full node status JSON |
| Node Agent | :9998/metrics | Prometheus metrics |
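The liveness and readiness endpoints in the table can be probed with a short loop. This is a sketch under the assumption that the endpoints are served over plain HTTP and are reachable from where the script runs (directly or through a port-forward). `probe_health` is a hypothetical helper name.

```shell
#!/bin/sh
# probe_health HOST PORT
# Requests /healthz and /readyz and prints one result line per endpoint.
# Returns non-zero if any probe fails.
probe_health() {
    host="$1"
    port="$2"
    rc=0
    for path in /healthz /readyz; do
        # -f makes curl fail on HTTP errors; --max-time bounds each probe.
        if curl -fsS --max-time 5 -o /dev/null "http://${host}:${port}${path}"; then
            echo "ok   ${host}:${port}${path}"
        else
            echo "FAIL ${host}:${port}${path}"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `probe_health localhost 9999` checks a port-forwarded controller, and `probe_health <node-ip> 9998` checks a node agent.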
Prometheus Metrics
All components expose metrics at /metrics:
| Component | Port | Description |
|---|---|---|
| Controller (HTTP) | 9999 | Controller, client-go, Go runtime metrics |
| Controller (TLS) | 9443 | Same metrics via webhook TLS port |
| Node Agent | 9998 | Node agent, client-go, Go runtime metrics |
All pods carry prometheus.io/* annotations for automatic discovery.
Key Custom Metrics
Controller:
- reconciliation_duration_seconds / reconciliation_total
- site_nodes_total (per site)
- pod_cidr_allocations_total / pod_cidr_exhaustion_total
- gateway_pool_nodes_total (per pool)
- leader_is_leader (1/0)
- websocket_connections
Node Agent:
- wireguard_peers (per interface)
- routes_installed (per table)
- status_push_total / status_push_duration_seconds
- cni_config_writes_total
Health Check:
- peer_state (0=down, 1=up, 2=admin-down, per peer)
- probe_duration_seconds (per peer)
- probes_sent_total / probes_received_total
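As a quick check without a Prometheus server, the peer_state samples can be filtered straight from the text exposition. This is a sketch that assumes the metric is exported under exactly the name listed above (a real deployment may add a prefix), that the label set identifies the peer, and that samples carry no trailing timestamp. `down_peers` is a hypothetical helper name.

```shell
#!/bin/sh
# Reads Prometheus text-format metrics on stdin and prints every peer_state
# sample whose value is 0 (down). Comment lines (# HELP, # TYPE) are skipped
# because they do not start with the metric name.
down_peers() {
    awk '/^peer_state[{ ]/ && $NF == 0 { print }'
}
```

For example, `curl -s http://<node-ip>:9998/metrics | down_peers` prints nothing when every peer is up or admin-down.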
Viewing Tunnel Status
WireGuard mode:
wg show all
eBPF mode – verify BPF attachment:
tc filter show dev unbounded0 egress
# Expected: bpf filter with "unbounded_encap direct-action"
eBPF mode – dump BPF maps:
bpftool map list | grep unbounded
bpftool map dump name unbounded_endpo # kernel truncates name to 15 chars
Using the unroute diagnostic tool (included in node agent image):
kubectl -n kube-system exec <node-agent-pod> -- unroute # dump all
kubectl -n kube-system exec <node-agent-pod> -- unroute <ip> # lookup
kubectl -n kube-system exec <node-agent-pod> -- unroute --local # local CIDRs
Via kubectl plugin:
kubectl unbounded net node show <name> bpf # BPF entries
kubectl unbounded net node show <name> routes # routes
kubectl unbounded net node show <name> peers # peers
kubectl unbounded net node show <name> json # full status
Interface Verification
eBPF mode:
ip link show unbounded0 # Should have NOARP flag
ip link show geneve0 # Flow-based GENEVE (if active)
ip link show vxlan0 # Flow-based VXLAN (if active)
ip link show ipip0 # Shared IPIP (if active)
ip route show dev unbounded0 # Supernet routes (scope global)
WireGuard mode:
ip link show type wireguard
ip route show dev wg51820
ip route show dev wg51821
Troubleshooting
Nodes Not Getting Pod CIDRs
Symptoms: Node has no spec.podCIDRs; pods stuck in ContainerCreating.
Check:
kubectl get node <name> -L net.unbounded-cloud.io/site
kubectl get sites -o yaml
kubectl -n kube-system logs -l app=unbounded-net-controller | grep -i alloc
Common causes:
- Node internal IP doesn’t match any Site nodeCidrs.
- No matching nodeRegex in the site’s assignments.
- Assignment has assignmentEnabled: false.
- CIDR pools exhausted (controller exits fatally).
- Controller not running or not leader.
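To find affected nodes across the whole cluster, the standard spec.podCIDRs field can be listed for every node and filtered for empty values. This is a minimal sketch. `list_node_cidrs` and `nodes_without_cidrs` are hypothetical helper names.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# Prints one "name<TAB>podCIDRs" line per node.
list_node_cidrs() {
    $KUBECTL get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDRs}{"\n"}{end}'
}

# Reads that listing on stdin and prints the names of nodes whose CIDR
# column is missing or an empty list.
nodes_without_cidrs() {
    awk -F '\t' '$2 == "" || $2 == "[]" { print $1 }'
}
```

Running `list_node_cidrs | nodes_without_cidrs` gives the nodes to inspect with the commands above.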
WireGuard Tunnels Not Establishing
Symptoms: No WireGuard handshakes; pod-to-pod fails.
Check:
ip link show wg51820
wg show wg51820
kubectl get node <name> -o jsonpath='{.metadata.annotations.net\.unbounded-kube\.io/wg-pubkey}'
Common causes:
- Firewall blocking UDP 51820.
- WireGuard kernel module not loaded (modprobe wireguard).
- Node not labeled with site.
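To spot peers that never completed a handshake, the output of `wg show <interface> latest-handshakes` (public key, tab, Unix timestamp, with 0 meaning never) can be filtered. This is a sketch. `stale_peers` is a hypothetical helper, and the 180-second threshold in the usage line is an assumption, not a value taken from unbounded-net.

```shell
#!/bin/sh
# stale_peers NOW MAX_AGE
# Reads "pubkey<TAB>unix_timestamp" lines on stdin. Prints peers that have
# never completed a handshake, or whose last handshake is older than
# MAX_AGE seconds relative to NOW (a Unix timestamp).
stale_peers() {
    awk -v now="$1" -v max="$2" '
        $2 == 0        { print $1 "\tnever"; next }
        now - $2 > max { print $1 "\t" (now - $2) "s ago" }
    '
}
```

For example: `wg show wg51820 latest-handshakes | stale_peers "$(date +%s)" 180`. Peers with no recent traffic can legitimately show old handshakes, so read the result alongside the health check state.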
Cross-Site Traffic Failing
Symptoms: Intra-site works, cross-site times out.
Check:
kubectl get gp -o yaml
kubectl get gp main-gateways -o jsonpath='{.status.nodes[*].externalIPs}'
ip route | grep <remote-site-cidr>
Common causes:
- No gateways configured or labeled.
- Gateways not reachable (firewall on external IPs).
- Health checks failing.
Gateway Health Check Failures
Symptoms: Routes to remote sites disappear.
Check:
curl -v http://<gateway-health-ip>:9998/healthz
kubectl -n kube-system logs <gateway-node-agent-pod>
Common causes:
- Gateway node agent not running.
- Health server not started.
- Network partition.
Dashboard Shows Stale Data
Symptoms: Nodes show “Stale cache” status.
Check:
kubectl -n kube-system get endpointslices -l kubernetes.io/service-name=unbounded-net-controller
kubectl -n kube-system get endpoints unbounded-net-controller 2>&1
Common causes:
- Stale v1/Endpoints from a previous controller version. The controller cleans these up on leader election, but during upgrades you may need to delete them manually:
  kubectl -n kube-system delete endpoints unbounded-net-controller
Note: The controller Service has no selector. The leader manages its own EndpointSlice. Do not add a selector.
Diagnostic Commands
# Cluster overview
kubectl get st # Sites
kubectl get gp # Gateway pools
kubectl get nodes -L net.unbounded-cloud.io/site # Node assignments
# Per-node (eBPF)
tc filter show dev unbounded0 egress # BPF program
ip route show dev unbounded0 # Supernet routes
ip link | grep -E 'unbounded0|geneve0|vxlan0|ipip0'
# Per-node (WireGuard)
wg show all
# Per-node (common)
ip route show table main | grep -E 'wg|cbr|unbounded'
cat /etc/cni/net.d/10-unbounded.conflist
# Controller
kubectl -n kube-system get lease unbounded-net-controller -o yaml
kubectl -n kube-system logs -l app=unbounded-net-controller --tail=100
# Node agent
kubectl -n kube-system logs -l app=unbounded-net-node --tail=100
Debug Logging
args:
  - -v=4  # 0=errors only, 2=normal, 3=detailed, 4+=debug
Operational Procedures
Adding a New Site
- Create Site resource with nodeCidrs and podCidrAssignments.
- Create SiteGatewayPoolAssignment to bind site to a gateway pool.
- Deploy nodes whose IPs fall within the site’s nodeCidrs.
- Label a gateway node:
  kubectl label node <name> net.unbounded-cloud.io/gateway=true
- Verify:
  kubectl get gp <pool> -o yaml
Removing a Site
- Drain workloads from site nodes.
- Remove gateway labels.
- Delete the Site:
  kubectl delete site <name>
- SiteNodeSlices are automatically garbage collected.
Replacing a Gateway Node
- Label the new gateway node.
- Verify it appears in the pool:
  kubectl get gp <pool> -o yaml
- Wait for routes to update (~10s health check interval).
- Remove old gateway label.
- Drain old node if needed.
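The "verify it appears in the pool" step can be scripted as a bounded polling loop, so the old label is only removed once the new gateway is visible. This is a rough sketch: it searches the pool's YAML for the node name instead of reading a specific status field, because the exact status layout is not assumed here. `wait_for_gateway` is a hypothetical helper name.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_gateway POOL NODE [ATTEMPTS] [DELAY_SECONDS]
# Polls the GatewayPool until NODE shows up in its YAML, or gives up after
# ATTEMPTS tries (defaults: 12 tries, 5 seconds apart; both arbitrary).
wait_for_gateway() {
    pool="$1"
    node="$2"
    attempts="${3:-12}"
    delay="${4:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -F treats the node name as a literal string, not a regex.
        if $KUBECTL get gp "$pool" -o yaml | grep -qF -- "$node"; then
            echo "gateway $node present in pool $pool"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "gateway $node not found in pool $pool" >&2
    return 1
}
```

For example, `wait_for_gateway main-gateways <new-node> && kubectl label node <old-node> net.unbounded-cloud.io/gateway-` removes the old label only after the new node is listed. A substring match can produce false positives when node names share a prefix.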
Expanding CIDR Pools
Edit the Site to add CIDR blocks under podCidrAssignments[].cidrBlocks.
Rolling Restart
kubectl -n kube-system rollout restart daemonset/unbounded-net-node
kubectl -n kube-system rollout status daemonset/unbounded-net-node
Backup and Recovery
What to Backup
kubectl get sites -o yaml > sites-backup.yaml
kubectl get gatewaypools -o yaml > gatewaypools-backup.yaml
kubectl get sitepeerings -o yaml > sitepeerings-backup.yaml
kubectl get sitegatewaypoolassignments -o yaml > sgpa-backup.yaml
kubectl get gatewaypoolpeerings -o yaml > gpp-backup.yaml
SiteNodeSlices and GatewayPoolNodes are automatically regenerated and don’t need backup.
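The same backup can be written as a loop that saves everything into one directory. This is a sketch: `backup_unbounded_net` is a hypothetical helper, and the file names differ from the ones above. It also records which nodes currently carry the gateway label, since recovery requires re-applying those labels.

```shell
#!/bin/sh
KUBECTL="${KUBECTL:-kubectl}"

# backup_unbounded_net [DIR]
# Writes one YAML file per resource kind into DIR (default: a timestamped
# directory in the current working directory) and prints the directory.
backup_unbounded_net() {
    dir="${1:-unbounded-net-backup-$(date +%Y%m%d-%H%M%S)}"
    mkdir -p "$dir" || return 1
    for kind in sites gatewaypools sitepeerings sitegatewaypoolassignments gatewaypoolpeerings; do
        $KUBECTL get "$kind" -o yaml > "$dir/$kind.yaml" || return 1
    done
    # Record gateway-labeled nodes so the labels can be re-applied later.
    $KUBECTL get nodes -l net.unbounded-cloud.io/gateway=true -o name > "$dir/gateway-nodes.txt" || return 1
    echo "$dir"
}
```

Calling `backup_unbounded_net /var/backups/unbounded-net` from a scheduled job keeps a restorable copy of the user-managed resources.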
Recovery
- Restore CRDs:
  kubectl apply -f deploy/machina/crd/
- Restore resources from backup YAMLs.
- Deploy controller and node agent.
- Re-apply gateway labels.
Security Considerations
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-unbounded-net
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      app: unbounded-net-node
  policyTypes: [Ingress, Egress]
  ingress:
    - ports:
        - { protocol: UDP, port: 51820 }
        - { protocol: TCP, port: 9998 }
  egress:
    - {}
Key Rotation
# On the node:
rm /etc/wireguard/server.priv /etc/wireguard/server.pub
# Then restart the node agent pod -- briefly disrupts connectivity.
Audit Logging
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    resources:
      - group: "net.unbounded-cloud.io"
        resources: ["sites", "sitenodeslices", "gatewaypools",
                    "gatewaypoolnodes", "sitepeerings",
                    "sitegatewaypoolassignments", "gatewaypoolpeerings"]
Next Steps
- Architecture – System internals.
- Custom Resources – CRD specifications.
- Configuration – All flags and settings.