Troubleshooting - Bud Stack Documentation

Overview

Use this guide to diagnose common problems with cluster onboarding, cluster deletion, health monitoring, node scheduling, and storage settings.

Troubleshooting Decision Tree

Cluster does not appear after onboarding

Possible causes

Incomplete onboarding form data.
Invalid kubeconfig or provider credentials.
Backend workflow failed during registration.

Recommended checks

Re-run onboarding with validated configuration inputs.
Confirm that the cluster's ingress and API endpoints are reachable.
Check platform logs for onboarding workflow errors.
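
One way to validate configuration inputs before re-running onboarding is a structural check on the kubeconfig. The sketch below assumes the kubeconfig has already been parsed into a Python dict (for example with a YAML parser). The function name kubeconfig_problems is illustrative and is not part of Bud Stack. It only inspects the standard kubeconfig fields (kind, clusters, users, contexts, current-context) and does not test connectivity or credentials.

```python
from urllib.parse import urlparse


def kubeconfig_problems(cfg: dict) -> list[str]:
    """Return a list of structural problems found in a parsed kubeconfig."""
    problems = []
    if cfg.get("kind") != "Config":
        problems.append("kind should be 'Config'")

    # Index the named entries so that context references can be resolved.
    clusters = {c.get("name"): c.get("cluster") or {} for c in cfg.get("clusters") or []}
    users = {u.get("name") for u in cfg.get("users") or []}
    contexts = {c.get("name"): c.get("context") or {} for c in cfg.get("contexts") or []}

    if not clusters:
        problems.append("no clusters defined")
    for name, cluster in clusters.items():
        server = urlparse(cluster.get("server") or "")
        if server.scheme not in ("http", "https") or not server.netloc:
            problems.append(f"cluster {name!r} has no valid server URL")

    current = cfg.get("current-context")
    if not current:
        problems.append("current-context is not set")
    elif current not in contexts:
        problems.append(f"current-context {current!r} is not defined in contexts")
    else:
        ctx = contexts[current]
        if ctx.get("cluster") not in clusters:
            problems.append(f"context {current!r} references an unknown cluster")
        if ctx.get("user") not in users:
            problems.append(f"context {current!r} references an unknown user")
    return problems
```

An empty list means the file is structurally sound. Onboarding can still fail on expired credentials or an unreachable API server, so the connectivity and log checks above still apply.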

Cluster cannot be deleted

Possible causes

Active deployments still attached to the cluster.
Insufficient permissions to perform delete.

Recommended checks

Review the Deployments tab and drain or migrate any active endpoints.
Confirm that your account has the cluster:manage permission.
Retry deletion after dependencies are cleared.

General tab shows degraded or missing metrics

Possible causes

Metrics pipeline latency or outage.
Node exporter or connectivity issues.

Recommended checks

Compare the General tab metrics with the readiness and event data in the Nodes tab.
Validate the health of the monitoring integration.
Check whether the issue is cluster-local or platform-wide.
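
The cluster-local versus platform-wide check can be sketched as a comparison of metric freshness across clusters. This is an illustration only: the input is a hypothetical mapping of cluster ID to the timestamp of its newest metric sample, and the 5-minute staleness window and 80% ratio are assumed thresholds, not Bud Stack defaults.

```python
from datetime import datetime, timedelta


def classify_metrics_gap(
    last_seen: dict,
    now: datetime,
    stale_after: timedelta = timedelta(minutes=5),  # assumed threshold
    platform_ratio: float = 0.8,                    # assumed threshold
):
    """Classify a metrics gap from per-cluster 'newest sample' timestamps.

    last_seen maps cluster ID -> datetime of the newest sample, or None if
    no sample was ever received.
    """
    stale = sorted(
        cluster_id
        for cluster_id, ts in last_seen.items()
        if ts is None or now - ts > stale_after
    )
    if not stale:
        scope = "healthy"
    elif len(stale) / len(last_seen) >= platform_ratio:
        # Most clusters are stale at once: suspect the shared metrics pipeline.
        scope = "platform-wide"
    else:
        # Only a few clusters are stale: suspect their exporters or network path.
        scope = "cluster-local"
    return scope, stale
```

A "platform-wide" result points toward the metrics pipeline, while "cluster-local" points toward node exporters or connectivity on the listed clusters.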

Node events show repeated scheduling failures

Possible causes

Insufficient allocatable CPU/GPU/memory.
Taint or affinity mismatch.
Storage constraints.

Recommended checks

Inspect request-vs-allocatable values on affected nodes.
Validate scheduling constraints in deployment configs.
Scale capacity or rebalance workloads.
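
The request-vs-allocatable comparison can be sketched as follows. The example assumes you have already collected a node's allocatable values (the status.allocatable field of a Kubernetes Node) and the summed requests of the pods on it as plain dicts of Kubernetes quantity strings. The helper names are illustrative, and the parser covers only the common suffixes.

```python
from decimal import Decimal

# Common Kubernetes quantity suffixes (not exhaustive).
_SUFFIXES = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity such as '500m' or '4Gi' to a plain number."""
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


def headroom(allocatable: dict, requested: dict) -> dict:
    """Return allocatable minus requested for every allocatable resource."""
    return {
        name: parse_quantity(allocatable[name]) - parse_quantity(requested.get(name, "0"))
        for name in allocatable
    }


def blocking_resources(pod_requests: dict, free: dict) -> list[str]:
    """Return the resources for which the pod asks for more than is free."""
    return [
        name
        for name, qty in pod_requests.items()
        if parse_quantity(qty) > free.get(name, Decimal(0))
    ]


# Sample node with 8 CPUs, 32Gi of memory, and 2 GPUs, most of it already requested.
free = headroom(
    {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "2"},
    {"cpu": "7500m", "memory": "30Gi", "nvidia.com/gpu": "2"},
)
print(blocking_resources({"cpu": "1", "memory": "1Gi"}, free))  # ['cpu']
```

If every candidate node reports a blocking resource, scaling capacity or rebalancing is the likely fix. If none does, taints, affinity rules, or storage constraints are the more probable cause.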

Unable to save storage settings

Possible causes

Storage classes unavailable from cluster API.
Access mode incompatible with selected storage class.
API or permission errors.

Recommended checks

Reload settings and confirm storage class discovery works.
Select the recommended access mode.
Verify that the user has the manage permission, then retry.
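
The first two checks can be sketched as a pre-save validation. The four access mode names are the standard Kubernetes ones. Everything else is an assumption for illustration: the function name, and the discovered mapping of storage class name to supported access modes, which Kubernetes does not expose on the StorageClass object and which would have to come from the platform or the provisioner's documentation.

```python
# Standard Kubernetes PersistentVolume access modes.
VALID_ACCESS_MODES = {
    "ReadWriteOnce",
    "ReadOnlyMany",
    "ReadWriteMany",
    "ReadWriteOncePod",
}


def check_storage_selection(storage_class: str, access_mode: str, discovered: dict):
    """Return an error message for an invalid selection, or None if it looks saveable.

    discovered is a hypothetical mapping of storage class name -> set of
    access modes that class is known to support.
    """
    if not discovered:
        return "no storage classes discovered; reload settings or check cluster API access"
    if storage_class not in discovered:
        return f"storage class {storage_class!r} was not discovered on this cluster"
    if access_mode not in VALID_ACCESS_MODES:
        return f"{access_mode!r} is not a Kubernetes access mode"
    if access_mode not in discovered[storage_class]:
        supported = ", ".join(sorted(discovered[storage_class]))
        return f"{storage_class!r} does not support {access_mode}; supported: {supported}"
    return None
```

A None result rules out the first two causes, which leaves API or permission errors as the remaining thing to check.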

Escalation Data to Capture

Cluster ID and environment.
Timestamp and user action attempted.
Screenshot or export of relevant tab state.
Node event snippets and affected workloads.
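
To keep escalations consistent, the items above can be gathered into one structured record. This is a minimal sketch: the EscalationReport class and its field names are hypothetical and simply mirror the list above. They do not correspond to any Bud Stack API.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EscalationReport:
    cluster_id: str
    environment: str
    action_attempted: str
    occurred_at: str  # ISO 8601 timestamp of the failed action
    attachments: list[str] = field(default_factory=list)        # screenshots or tab exports
    node_events: list[str] = field(default_factory=list)        # raw event snippets
    affected_workloads: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the report so it can be attached to a support ticket."""
        return json.dumps(asdict(self), indent=2)


report = EscalationReport(
    cluster_id="cluster-123",          # placeholder value
    environment="staging",             # placeholder value
    action_attempted="delete cluster",
    occurred_at="2024-01-01T12:00:00+00:00",
)
print(report.missing_fields())
# ['attachments', 'node_events', 'affected_workloads']
```

Calling missing_fields() before filing the escalation shows which of the items have not been captured yet.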

Next Steps

Cluster Concepts

Revisit lifecycle and tab responsibilities

Cluster Operations Guide

Strengthen day-2 operational practices
