Overview

Use this guide to diagnose common problems in cluster onboarding, health monitoring, and settings management.

Troubleshooting Decision Tree

Cluster onboarding fails
Possible causes
  • Incomplete onboarding form data.
  • Invalid kube configuration or provider credentials.
  • Backend workflow failed during registration.
Recommended checks
  1. Re-run onboarding with validated configuration inputs.
  2. Confirm ingress and API connectivity.
  3. Check platform logs for onboarding workflow errors.
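
The API connectivity check above can be approximated with a short script. This is a minimal sketch, not the platform's own check: `api_endpoint_reachable` is a hypothetical helper, and it assumes you have the API server URL at hand (for example, the `server` field of a cluster entry in your kubeconfig). It only confirms that a TCP connection can be opened; it does not validate TLS certificates or credentials.

```python
import socket
from urllib.parse import urlsplit


def api_endpoint_reachable(server_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the server's host and port succeeds.

    Hypothetical helper for illustration; it tests network reachability only.
    """
    parts = urlsplit(server_url)
    if not parts.hostname:
        raise ValueError(f"no host in server URL: {server_url!r}")
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `False` result points toward network or ingress problems rather than bad credentials, which helps decide whether to re-check the configuration inputs or the connectivity path first.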

Cluster deletion is blocked
Possible causes
  • Active deployments still attached to the cluster.
  • Insufficient permissions to delete the cluster.
Recommended checks
  1. Review the Deployments tab and drain or migrate active endpoints.
  2. Confirm that your account has the cluster:manage permission.
  3. Retry deletion after dependencies are cleared.
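
The deletion checks above amount to a preflight that lists everything still blocking the delete. The sketch below illustrates that logic only. The `Deployment` record and `deletion_blockers` function are hypothetical, and the data would have to come from wherever your platform exposes deployments and permissions; only the `cluster:manage` permission name is taken from this guide.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical deployment record, for illustration only."""
    name: str
    cluster_id: str
    active: bool


def deletion_blockers(cluster_id, deployments, user_permissions):
    """Return human-readable reasons why the cluster cannot be deleted yet."""
    blockers = []
    if "cluster:manage" not in user_permissions:
        blockers.append("missing permission: cluster:manage")
    for deployment in deployments:
        # Only active deployments attached to this cluster block deletion.
        if deployment.cluster_id == cluster_id and deployment.active:
            blockers.append(f"active deployment: {deployment.name}")
    return blockers
```

An empty list means both dependencies are cleared and the deletion can be retried.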

Health metrics are missing or stale
Possible causes
  • Metrics pipeline latency or outage.
  • Node exporter or connectivity issues.
Recommended checks
  1. Compare the reported health with readiness and event data on the Nodes tab.
  2. Validate monitoring integration health.
  3. Check whether the issue is cluster-local or platform-wide.
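
One way to separate a cluster-local problem from a platform-wide one is to compare the age of the latest metric sample across clusters. The sketch below is a heuristic under stated assumptions: the 5-minute staleness threshold, the function names, and the input mapping of cluster ID to last-sample timestamp are all hypothetical, and a single stale cluster is reported as cluster-local because there is nothing to compare it against.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold; tune it to your metrics scrape interval.
DEFAULT_MAX_AGE = timedelta(minutes=5)


def stale_clusters(last_seen, now, max_age=DEFAULT_MAX_AGE):
    """Return the IDs of clusters whose latest metric sample is older than max_age."""
    return {cid for cid, ts in last_seen.items() if now - ts > max_age}


def classify_scope(last_seen, now, max_age=DEFAULT_MAX_AGE):
    """Classify staleness as 'healthy', 'cluster-local', or 'platform-wide'."""
    stale = stale_clusters(last_seen, now, max_age)
    if not stale:
        return "healthy"
    # If every cluster is stale at once, suspect the shared metrics pipeline.
    if len(last_seen) > 1 and len(stale) == len(last_seen):
        return "platform-wide"
    return "cluster-local"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    samples = {
        "cluster-a": now - timedelta(minutes=30),
        "cluster-b": now - timedelta(seconds=40),
    }
    print(classify_scope(samples, now))
```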

Workloads fail to schedule
Possible causes
  • Insufficient allocatable CPU/GPU/memory.
  • Taint or affinity mismatch.
  • Storage constraints.
Recommended checks
  1. Inspect request-vs-allocatable values on affected nodes.
  2. Validate scheduling constraints in deployment configs.
  3. Scale capacity or rebalance workloads.
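
To make the request-vs-allocatable comparison concrete, the sketch below computes the remaining headroom on one node from Kubernetes-style quantity strings such as `500m` or `8Gi`. It is a simplified illustration: `parse_quantity` and `headroom` are hypothetical helpers, the parser handles only the common suffixes listed in the code, and the input values would need to be copied from the node's allocatable resources and the requests of the pods scheduled on it.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes; this simplified table is not exhaustive.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity string such as '500m' or '2Gi' to a plain number."""
    text = text.strip()
    # Try two-character binary suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def headroom(allocatable: dict, requests: list) -> dict:
    """Return allocatable minus summed requests for each resource on a node.

    allocatable maps resource name to a quantity string; requests is a list
    of per-pod mappings in the same form. A negative value means the node
    is overcommitted for that resource.
    """
    result = {}
    for resource, capacity in allocatable.items():
        used = sum(
            (parse_quantity(pod[resource]) for pod in requests if resource in pod),
            Decimal(0),
        )
        result[resource] = parse_quantity(capacity) - used
    return result
```

If the headroom for any resource is smaller than what the pending workload requests, the node cannot accept it, and scaling capacity or rebalancing is the next step.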

Storage settings fail to load or save
Possible causes
  • Storage classes unavailable from cluster API.
  • Access mode incompatible with selected storage class.
  • API or permission errors.
Recommended checks
  1. Reload settings and confirm storage class discovery works.
  2. Select the recommended access mode.
  3. Verify that the user has manage permission, then retry.
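
The access mode check can be sketched as a small validation step. The access mode names below are the standard Kubernetes PersistentVolume access modes, but the `supported_modes` mapping is an assumption: which modes a storage class supports depends on its provisioner, so the mapping has to be filled in from your own environment. `check_access_mode` is a hypothetical helper.

```python
# Standard Kubernetes PersistentVolume access modes.
ACCESS_MODES = ("ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany", "ReadWriteOncePod")


def check_access_mode(storage_class, access_mode, supported_modes):
    """Return None if the selection looks valid, otherwise a short reason.

    supported_modes is a hypothetical mapping of storage class name to the
    set of access modes its provisioner is known to support.
    """
    if access_mode not in ACCESS_MODES:
        return f"unknown access mode: {access_mode}"
    if storage_class not in supported_modes:
        return f"storage class not discovered: {storage_class}"
    if access_mode not in supported_modes[storage_class]:
        return f"{storage_class} does not support {access_mode}"
    return None
```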

Escalation Data to Capture

  • Cluster ID and environment.
  • Timestamp and user action attempted.
  • Screenshot or export of relevant tab state.
  • Node event snippets and affected workloads.
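
If you escalate often, it can help to collect these items in a fixed structure so nothing is forgotten. The following is a minimal sketch: the `EscalationReport` class and its field names are hypothetical and simply mirror the list above, and no particular support tooling is assumed to accept this format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EscalationReport:
    """Hypothetical container for the escalation data listed above."""
    cluster_id: str
    environment: str
    timestamp: str  # when the action was attempted, e.g. ISO 8601 in UTC
    user_action: str
    attachments: list = field(default_factory=list)  # screenshot or export paths
    node_events: list = field(default_factory=list)
    affected_workloads: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialize the report for attaching to a support ticket."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    report = EscalationReport(
        cluster_id="example-cluster",
        environment="staging",
        timestamp="2024-01-01T12:00:00Z",
        user_action="delete cluster",
        node_events=["example event text"],
    )
    print(report.missing_fields())
    print(report.to_json())
```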

Next Steps

  • Cluster Concepts: Revisit lifecycle and tab responsibilities.
  • Cluster Operations Guide: Strengthen day-2 operational practices.