# Performance Benchmarks

> Run benchmark experiments on your infrastructure to compare model speed, throughput, and runtime efficiency

Use the Models benchmarking workflow to measure runtime performance before promoting a model to production.

Bud AI Foundry supports guided benchmark creation, benchmark history filtering, and side-by-side comparison of runs executed on your own clusters.

<img src="https://mintcdn.com/budecosystem-b7b14df4/x2_P5cIlf4jublqF/images/image-19.png?fit=max&auto=format&n=x2_P5cIlf4jublqF&q=85&s=41175d38743c307e1db257eb94245432" alt="Image" className="hidden dark:block" width="1920" height="878" data-path="images/image-19.png" />

<img src="https://mintcdn.com/budecosystem-b7b14df4/x2_P5cIlf4jublqF/images/performance-benchmark.png?fit=max&auto=format&n=x2_P5cIlf4jublqF&q=85&s=b47ecf2daedeb2f2d4fa8f6b78bf2f85" alt="Image" className="dark:hidden" width="1920" height="877" data-path="images/performance-benchmark.png" />

## Benchmark workflow

```mermaid theme={null}
graph LR
    A[Create Benchmark] --> B[Choose Eval Mode]
    B --> C[Select Model and Cluster]
    C --> D[Configure Hardware and Requests]
    D --> E[Run Benchmark]
    E --> F[Analyze History and Results]
```

## What to measure

* **Latency**: how long requests take to complete under the traffic pattern you expect in production.
* **Throughput**: how many requests the deployment sustains at the configured concurrency.
* **Duration**: total time the benchmark run takes to complete.
* **Consistency**: how stable the results stay across reruns and environments.
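If you export per-request timings from a run, the four measures above can be summarized along these lines. This is a minimal sketch, not Bud AI Foundry's own calculation: the function names, the nearest-rank percentile method, and the sample numbers are all assumptions chosen for illustration.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_run(latencies_s, duration_s):
    """Summarize one run from per-request latencies and wall-clock duration (seconds)."""
    return {
        "p50_latency_s": percentile(latencies_s, 50),
        "p95_latency_s": percentile(latencies_s, 95),
        "throughput_rps": len(latencies_s) / duration_s,
        "duration_s": duration_s,
    }


def consistency(throughputs):
    """Coefficient of variation of throughput across reruns (lower means steadier)."""
    return statistics.stdev(throughputs) / statistics.mean(throughputs)


# Hypothetical sample data, not real measurements.
run = summarize_run([0.20, 0.25, 0.22, 0.40, 0.21], duration_s=2.0)
print(run)
print(consistency([2.4, 2.5, 2.6]))
```

Reporting a tail percentile alongside the median helps because a single slow request (0.40 s in the sample) barely moves the median but dominates the tail.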

## Demo walkthrough

1. Open **Benchmark History** from the Models area.
2. Click **Run Another Benchmark**.
3. Enter benchmark metadata (name, tags, description, concurrent requests).
4. Choose an evaluation mode:
   * Dataset
   * Configuration
5. Select the model, target cluster, and runtime settings.
6. Run the benchmark and monitor its status.
7. Review throughput, latency and TPOT (time per output token), duration, and completion state.

<img src="https://mintcdn.com/budecosystem-b7b14df4/x2_P5cIlf4jublqF/images/image-19.png?fit=max&auto=format&n=x2_P5cIlf4jublqF&q=85&s=41175d38743c307e1db257eb94245432" alt="Image" width="1920" height="878" data-path="images/image-19.png" />
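For readers unfamiliar with TPOT, the sketch below shows one common way to derive it from a single request's timings: the time spent after the first token, divided by the remaining output tokens. This definition is an assumption for illustration, and the function name is hypothetical; the platform may compute the value it reports differently.

```python
def time_per_output_token(total_latency_s, time_to_first_token_s, output_tokens):
    """Average seconds per generated token after the first one (assumed definition)."""
    if output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (total_latency_s - time_to_first_token_s) / (output_tokens - 1)


# Hypothetical request: 2.2 s end to end, first token after 0.2 s, 101 tokens generated.
tpot = time_per_output_token(2.2, 0.2, 101)
print(tpot)
```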

## Recommended process

1. Start with 2-3 candidate models for the same workload.
2. Use the same setup (dataset or configuration, hardware profile, concurrency) for every candidate so the comparison is fair.
3. Track benchmark history by model and status to identify regressions.
4. Re-run critical scenarios after model, adapter, or infrastructure changes.
5. Promote only configurations that satisfy performance SLOs.
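The fair-comparison step can be enforced mechanically before ranking candidates. The following is a minimal sketch that assumes you keep results in your own records; `Setup`, `Result`, and their fields are hypothetical names, not part of the Bud AI Foundry API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    """The settings that must match for results to be comparable."""
    dataset: str
    hardware_profile: str
    concurrency: int


@dataclass(frozen=True)
class Result:
    model: str
    setup: Setup
    throughput_rps: float
    p95_latency_s: float


def comparable(results):
    """True when every result was produced under the same setup."""
    return len({r.setup for r in results}) <= 1


def rank_candidates(results):
    """Order candidates by lowest p95 latency, then highest throughput."""
    if not comparable(results):
        raise ValueError("results were produced under different setups")
    return sorted(results, key=lambda r: (r.p95_latency_s, -r.throughput_rps))


# Hypothetical results, not real measurements.
shared = Setup(dataset="dataset-a", hardware_profile="gpu-profile-1", concurrency=8)
results = [
    Result("model-x", shared, throughput_rps=10.0, p95_latency_s=0.5),
    Result("model-y", shared, throughput_rps=12.0, p95_latency_s=0.3),
]
print([r.model for r in rank_candidates(results)])
```

Refusing to rank mismatched runs is deliberate: a model that looks faster only because it ran at lower concurrency should not win the comparison.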

## Best practices

* Use descriptive benchmark names for easy audits.
* Tag runs by project, release, or use case for fast filtering.
* Separate exploratory runs from release-gating runs.
* Capture benchmark context (cluster type, runtime settings, request profile) with each run.
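If you also keep your own record of each run outside the platform, the practices above map onto a small structured record. This is a sketch under that assumption: `BenchmarkContext`, its fields, and the example values are hypothetical and only illustrate what is worth capturing.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BenchmarkContext:
    """Context stored alongside one benchmark run (hypothetical schema)."""
    name: str
    tags: list = field(default_factory=list)
    cluster_type: str = ""
    runtime_settings: dict = field(default_factory=dict)
    request_profile: dict = field(default_factory=dict)
    release_gating: bool = False  # separates gating runs from exploratory ones


def filter_by_tag(contexts, tag):
    """Return the contexts that carry the given tag."""
    return [c for c in contexts if tag in c.tags]


# Hypothetical records for illustration only.
contexts = [
    BenchmarkContext(
        name="chat-assistant-release-candidate",
        tags=["project-chat", "release-gating"],
        cluster_type="gpu",
        runtime_settings={"max_batch_size": 8},
        request_profile={"concurrent_requests": 16},
        release_gating=True,
    ),
    BenchmarkContext(name="chat-assistant-exploration", tags=["project-chat"]),
]

print(json.dumps(asdict(contexts[0]), indent=2))
print([c.name for c in filter_by_tag(contexts, "release-gating")])
```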

## Escalation checklist

<Check>
  Latency and throughput meet service objectives.
</Check>

<Check>
  Duration is acceptable for recurring validation cycles.
</Check>

<Check>
  No regression versus the previous approved benchmark baseline.
</Check>

<Check>
  The approval owner signs off on promotion based on benchmark evidence.
</Check>
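The checklist above can be expressed as a small gate over two run summaries. This is a minimal sketch with assumed inputs: the dictionary keys, the 5% regression tolerance, and the `approved_by` parameter are illustrative choices, not platform settings.

```python
def relative_change(current, baseline):
    """Fractional change of current relative to baseline."""
    return (current - baseline) / baseline


def promotion_checks(current, baseline, slo, max_duration_s,
                     approved_by=None, tolerance=0.05):
    """Evaluate each checklist item; all values must be True to promote."""
    return {
        "meets_slo": (
            current["throughput_rps"] >= slo["min_throughput_rps"]
            and current["p95_latency_s"] <= slo["max_p95_latency_s"]
        ),
        "duration_ok": current["duration_s"] <= max_duration_s,
        "no_regression": (
            relative_change(current["throughput_rps"],
                            baseline["throughput_rps"]) >= -tolerance
            and relative_change(current["p95_latency_s"],
                                baseline["p95_latency_s"]) <= tolerance
        ),
        "signed_off": bool(approved_by),
    }


# Hypothetical numbers, not real measurements.
baseline = {"throughput_rps": 10.0, "p95_latency_s": 0.50, "duration_s": 110.0}
current = {"throughput_rps": 9.8, "p95_latency_s": 0.51, "duration_s": 120.0}
slo = {"min_throughput_rps": 9.0, "max_p95_latency_s": 0.60}

checks = promotion_checks(current, baseline, slo, max_duration_s=300.0,
                          approved_by="release-owner")
print(checks)
print(all(checks.values()))
```

Keeping the sign-off as an explicit input means the automated checks can pass without the gate opening until a person has reviewed the evidence.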

## Related docs

* [Evaluations](/models/guides/evaluations)
* [Creating Your First Model](/models/creating-first-model)
* [Troubleshooting](/models/troubleshooting)
