## Comparison Framework
### 1) Keep Comparisons Fair
- Use the same set of traits and datasets for every model.
- Group runs under one experiment for traceability.
- Avoid comparing scores across unrelated datasets.
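The fairness checks above can be sketched as a small guard. The run-record fields (`experiment`, `dataset`, `traits`) are illustrative assumptions, not the platform's actual schema:

```python
# Hypothetical run records; field names are illustrative, not an actual API.
runs = [
    {"experiment": "exp-1", "model": "model-a", "dataset": "qa-v1", "traits": {"accuracy", "fluency"}},
    {"experiment": "exp-1", "model": "model-b", "dataset": "qa-v1", "traits": {"accuracy", "fluency"}},
    {"experiment": "exp-1", "model": "model-c", "dataset": "qa-v2", "traits": {"accuracy"}},
]

def comparable(runs):
    """True only if every run shares one experiment, one dataset, and one trait set."""
    experiments = {r["experiment"] for r in runs}
    datasets = {r["dataset"] for r in runs}
    trait_sets = {frozenset(r["traits"]) for r in runs}
    return len(experiments) == 1 and len(datasets) == 1 and len(trait_sets) == 1

print(comparable(runs[:2]))  # model-a vs model-b: same dataset and traits -> True
print(comparable(runs))      # model-c uses a different dataset -> False
```

Running the guard before ranking catches unfair comparisons (a different dataset or trait set) before any scores are read.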
### 2) Use Leaderboard for Ranking
Leaderboard helps identify top-performing models quickly:
- Compare relative ordering.
- Look for score gaps, not only rank position.
- Re-check runs with small score differences.
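The gap-over-rank idea can be sketched as a helper that flags adjacent leaderboard entries whose scores are too close to trust; the score dictionary and the 0.01 threshold are illustrative assumptions:

```python
def close_pairs(scores, threshold=0.01):
    """Flag adjacent leaderboard entries whose score gap is within threshold.

    scores: mapping of model name -> leaderboard score (illustrative shape).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    flagged = []
    for (top, top_score), (nxt, nxt_score) in zip(ranked, ranked[1:]):
        if top_score - nxt_score <= threshold:
            flagged.append((top, nxt))
    return flagged

scores = {"model-a": 0.91, "model-b": 0.905, "model-c": 0.78}
print(close_pairs(scores))  # [('model-a', 'model-b')] -> re-run these two
```

Pairs returned by the helper are candidates for a rerun; a 0.005 gap says far less than the 0.125 gap down to the third model.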
### 3) Use Explorer for Qualitative Validation
After ranking, inspect sample-level outputs:
- Validate prompt understanding.
- Check response consistency.
- Confirm failures are acceptable for your use case.
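A minimal sketch of that triage step, splitting sample-level failures into critical and acceptable; the sample fields and the error labels are assumptions for illustration, not Explorer's actual output format:

```python
# Illustrative sample-level records; field names and error labels are assumed.
samples = [
    {"prompt": "Summarize the report.", "error": None},
    {"prompt": "Translate to French.",  "error": "empty_response"},
    {"prompt": "Classify the ticket.",  "error": "minor_formatting"},
]

# Which error labels count as critical is a per-use-case judgment call.
CRITICAL = {"empty_response", "hallucination"}

def triage(samples):
    """Split failing samples into critical vs. acceptable failures."""
    critical = [s for s in samples if s["error"] in CRITICAL]
    acceptable = [s for s in samples if s["error"] and s["error"] not in CRITICAL]
    return critical, acceptable

critical, acceptable = triage(samples)
print(len(critical), len(acceptable))  # 1 critical, 1 acceptable
```

A model with a slightly lower leaderboard score but zero critical failures on your key samples can still be the stronger candidate.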
### 4) Track Operational Signals
Include non-score context from run history:
- Run duration
- Completion/failure frequency
- Trait-level variance across reruns
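The three operational signals above can be computed from run history with stdlib tools; the history-record shape is an illustrative assumption:

```python
from statistics import mean, pstdev

# Illustrative run-history records; statuses and fields are assumed.
history = [
    {"status": "completed", "duration_s": 120, "traits": {"accuracy": 0.90}},
    {"status": "completed", "duration_s": 130, "traits": {"accuracy": 0.88}},
    {"status": "failed",    "duration_s": 15,  "traits": {}},
]

def operational_signals(history, trait="accuracy"):
    """Summarize failure rate, mean duration, and trait spread across reruns."""
    done = [r for r in history if r["status"] == "completed"]
    trait_scores = [r["traits"][trait] for r in done]
    return {
        "failure_rate": 1 - len(done) / len(history),
        "mean_duration_s": mean(r["duration_s"] for r in done),
        "trait_spread": pstdev(trait_scores),  # population std dev across reruns
    }

print(operational_signals(history))
```

A high trait spread across reruns is a warning even when the mean score looks strong: the ranking may not be reproducible.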
## Decision Matrix
| Signal | Strong Candidate Indicator |
|---|---|
| Leaderboard score | High and stable across reruns |
| Explorer quality | Fewer critical errors on key samples |
| Run reliability | Completed runs with low failure rate |
| Trait coverage | Good results across required traits |
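The four rows of the matrix can be folded into one gate function; every threshold below is an illustrative assumption to tune for your use case, not a recommended value:

```python
def strong_candidate(signal):
    """Apply the decision-matrix checks; all thresholds are illustrative."""
    return (
        signal["score"] >= 0.85              # leaderboard: high...
        and signal["score_spread"] <= 0.02   # ...and stable across reruns
        and signal["critical_errors"] == 0   # explorer: no critical failures
        and signal["failure_rate"] <= 0.05   # reliability: runs complete
        and signal["traits_covered"]         # coverage: all required traits
    )

candidate = {"score": 0.91, "score_spread": 0.01, "critical_errors": 0,
             "failure_rate": 0.0, "traits_covered": True}
print(strong_candidate(candidate))  # True
```

Because every check must pass, one weak signal (say, a 20% failure rate) disqualifies a candidate regardless of its leaderboard score, which matches the matrix's intent.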