Skip to main content

Documentation Index

Fetch the complete documentation index at: https://budecosystem-b7b14df4.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Overview

This guide describes how to use Dashboard as the first stop for operations triage and proactive monitoring.

Monitoring Playbook

1. Daily health triage

  • Check request trend direction.
  • Check readiness cards (running endpoints, inactive clusters).
  • Scan latency and throughput deltas.

2. Project hotspot identification

  • Use API Calls chart to locate high-volume projects.
  • Compare their latency and throughput behavior.
  • Prioritize investigation where volume and degradation overlap.

3. Model quality and efficiency checks

  • Compare top model usage with accuracy leaderboard.
  • Validate whether high-usage models remain quality-competitive.
  • Track token changes to catch rising inference cost patterns.

Incident-Oriented Workflow

When a degradation is reported:
  1. Confirm signal in Dashboard.
  2. Identify whether issue is global, project-specific, or model-specific.
  3. Drill into the corresponding module (Projects, Models, Clusters, Observability).
  4. Implement corrective action.
  5. Verify stabilization using the same dashboard window.

Escalation Heuristics

  • Escalate to platform engineering when latency and throughput degrade across multiple projects.
  • Escalate to model governance when accuracy drops on top-used models.
  • Escalate to FinOps when token output growth persists over multiple windows.
Start standups with a 3-minute dashboard snapshot.
Use one owner per red signal to avoid ambiguous accountability.
Archive weekly screenshot summaries for trend continuity.