# Comparing Models

> Use the leaderboard and experiment history to select models reliably

## Comparison Framework

```mermaid
flowchart LR
    A[Select Candidate Models] --> B[Run Same Dataset/Trait Scope]
    B --> C[Compare Leaderboard Scores]
    C --> D[Inspect Explorer Samples]
    D --> E[Choose Winner / Rerun]
```

## 1) Keep Comparisons Fair

* Use the same traits and datasets for every model.
* Group runs under one experiment for traceability.
* Avoid comparing scores across unrelated datasets.
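
The fairness rules above can be checked mechanically before any scores are compared. The following is a minimal sketch, assuming run metadata has been exported into plain Python objects; the `Run` class, its fields, and the sample values are hypothetical and are not part of the product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical record of one evaluation run (not a product API type).
    model: str
    experiment: str
    datasets: frozenset
    traits: frozenset


def scope_mismatches(runs):
    """Return models whose experiment, datasets, or traits differ from the first run."""
    if not runs:
        return []
    first = runs[0]
    reference = (first.experiment, first.datasets, first.traits)
    return [
        r.model
        for r in runs
        if (r.experiment, r.datasets, r.traits) != reference
    ]


runs = [
    Run("model-a", "exp-1", frozenset({"ds-1", "ds-2"}), frozenset({"reasoning"})),
    Run("model-b", "exp-1", frozenset({"ds-1", "ds-2"}), frozenset({"reasoning"})),
    Run("model-c", "exp-1", frozenset({"ds-1"}), frozenset({"reasoning"})),
]
mismatched = scope_mismatches(runs)
print(mismatched)
```

Any model reported here was evaluated on a different scope and should be rerun before its score is placed next to the others.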

## 2) Use Leaderboard for Ranking

The leaderboard helps you identify top-performing models quickly:

* Compare relative ordering.
* Look for score gaps, not only rank position.
* Rerun models whose scores differ only slightly before trusting their order.
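
One way to look at gaps rather than rank alone is to sort the scores and flag neighbouring models that sit closer than a chosen threshold. This is a sketch under stated assumptions: the `scores` mapping is made-up data copied by hand from a leaderboard, and `min_gap` is an arbitrary threshold you would tune for your own score scale.

```python
def close_pairs(scores, min_gap=1.0):
    """Rank models by score and flag adjacent pairs separated by less than min_gap."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    flagged = []
    for (upper, upper_score), (lower, lower_score) in zip(ranked, ranked[1:]):
        gap = upper_score - lower_score
        if gap < min_gap:
            flagged.append((upper, lower, round(gap, 3)))
    return ranked, flagged


# Hypothetical leaderboard scores.
scores = {"model-a": 82.4, "model-b": 81.9, "model-c": 74.0}
ranked, flagged = close_pairs(scores, min_gap=1.0)

for upper, lower, gap in flagged:
    print(f"{upper} vs {lower}: gap {gap} is small, consider a rerun")
```

Pairs that are flagged are candidates for a rerun; pairs with a wide gap can usually be ordered on score alone.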

## 3) Use Explorer for Qualitative Validation

After ranking, inspect sample-level outputs:

* Validate prompt understanding.
* Check response consistency.
* Confirm failures are acceptable for your use case.
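
Response consistency can be given a rough first pass in code before you read samples by hand. The sketch below assumes sample outputs from several reruns have been exported as `(sample_id, response_text)` pairs; that export format is an assumption, and exact matching after whitespace and case normalization is only a crude proxy for consistency, not a quality judgment.

```python
from collections import defaultdict


def inconsistent_samples(outputs):
    """Return sample IDs whose normalized responses differ across reruns."""
    variants_by_sample = defaultdict(set)
    for sample_id, text in outputs:
        # Normalize case and collapse whitespace so trivial differences are ignored.
        variants_by_sample[sample_id].add(" ".join(text.lower().split()))
    return sorted(
        sample_id
        for sample_id, variants in variants_by_sample.items()
        if len(variants) > 1
    )


# Hypothetical outputs for two samples, each answered in two reruns.
outputs = [("s1", "Paris"), ("s1", "paris "), ("s2", "4"), ("s2", "5")]
flagged_samples = inconsistent_samples(outputs)
print(flagged_samples)
```

The flagged IDs are a reading list for manual inspection: they show where the model changed its answer, not whether either answer is acceptable.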

## 4) Track Operational Signals

Include non-score context from run history:

* Run duration
* Completion/failure frequency
* Trait-level variance across reruns
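
These signals can be summarized from exported run history with the standard library. The following is a minimal sketch; the record layout (`status`, `duration_s`, `trait_scores`) and the values are hypothetical, and the population standard deviation is used here as one simple measure of spread across reruns.

```python
from statistics import mean, pstdev


def summarize(history):
    """Summarize failure rate, mean duration, and per-trait score spread."""
    if not history:
        raise ValueError("history must contain at least one run")
    completed = [r for r in history if r["status"] == "completed"]
    durations = [r["duration_s"] for r in completed]
    traits = sorted({t for r in completed for t in r["trait_scores"]})
    trait_stdev = {}
    for trait in traits:
        values = [r["trait_scores"][trait] for r in completed if trait in r["trait_scores"]]
        trait_stdev[trait] = pstdev(values)
    return {
        "failure_rate": 1 - len(completed) / len(history),
        "mean_duration_s": mean(durations) if durations else None,
        "trait_stdev": trait_stdev,
    }


# Hypothetical history: four reruns of one model, one of which failed.
history = [
    {"status": "completed", "duration_s": 100, "trait_scores": {"reasoning": 80}},
    {"status": "completed", "duration_s": 120, "trait_scores": {"reasoning": 82}},
    {"status": "failed", "duration_s": 15, "trait_scores": {}},
    {"status": "completed", "duration_s": 110, "trait_scores": {"reasoning": 81}},
]
summary = summarize(history)
print(summary)
```

Failed runs are counted in the failure rate but excluded from the duration and variance figures, so a model that fails quickly does not appear faster or more stable than it is.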

## Decision Matrix

| Signal            | Strong Candidate Indicator           |
| ----------------- | ------------------------------------ |
| Leaderboard score | High and stable across reruns        |
| Explorer quality  | Fewer critical errors on key samples |
| Run reliability   | Completed runs with low failure rate |
| Trait coverage    | Good results across required traits  |

<Tip>
  If two models are close on score, prioritize the one with more stable outputs and lower operational risk.
</Tip>
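
The tie-breaking advice in the tip can be expressed as a small selection rule. This is a sketch, not a prescribed method: the candidate fields, the sample numbers, and the `score_tolerance` threshold are all assumptions chosen for illustration.

```python
def pick_candidate(candidates, score_tolerance=1.0):
    """Among models within score_tolerance of the best score, prefer stability, then reliability."""
    best_score = max(c["score"] for c in candidates)
    near_top = [c for c in candidates if best_score - c["score"] <= score_tolerance]
    # Lower score spread first, then lower failure rate, then higher score.
    return min(
        near_top,
        key=lambda c: (c["score_stdev"], c["failure_rate"], -c["score"]),
    )


# Hypothetical per-model summaries.
candidates = [
    {"model": "model-a", "score": 82.4, "score_stdev": 2.1, "failure_rate": 0.10},
    {"model": "model-b", "score": 81.9, "score_stdev": 0.6, "failure_rate": 0.00},
    {"model": "model-c", "score": 74.0, "score_stdev": 0.3, "failure_rate": 0.00},
]
winner = pick_candidate(candidates)
print(winner["model"])
```

Here `model-c` is the most stable but falls outside the tolerance, so it is never considered; between the two close models, the steadier and more reliable one is chosen even though its score is slightly lower.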
