Documentation Index
Fetch the complete documentation index at: https://budecosystem-b7b14df4.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Core Objects
- Dataset: A benchmark definition with prompts, expected behavior, metadata, and estimated token footprint.
- Trait: A capability lens (for example reasoning, safety, or domain skill) used to organize datasets and filter selection.
- Experiment: A container for related evaluation runs so teams can compare iterations over time.
- Run: One execution of an evaluation workflow against a selected model + dataset/trait configuration.
Concept Map
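The relationships among the four core objects can be sketched as plain data types. This is a hypothetical illustration only; the names and fields mirror the definitions above, not the platform's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A benchmark definition with prompts and expected behavior."""
    name: str
    prompts: list[str]
    expected_behavior: str
    traits: list[str] = field(default_factory=list)  # capability lenses, e.g. "reasoning"
    estimated_tokens: int = 0  # estimated token footprint

@dataclass
class Run:
    """One execution of an evaluation workflow for a model + dataset."""
    model: str
    dataset: Dataset
    scores: dict[str, float] = field(default_factory=dict)

@dataclass
class Experiment:
    """A container for related runs, so iterations can be compared over time."""
    name: str
    tags: list[str] = field(default_factory=list)
    runs: list[Run] = field(default_factory=list)
```

Traits are modeled here as simple string labels on a Dataset, which is enough to express the "organize and filter" role they play in the Evaluations Hub.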
Evaluation Surfaces
Evaluations Hub
- Search datasets by name and intent.
- Filter by traits.
- Inspect modality badges and metadata links.
Evaluation Detail
- Details: scope, context, and expected behavior.
- Leaderboard: ranked model comparison.
- Evaluations Explorer: prompt/response and metric-level evidence.
Experiment Workspace
- List experiments with status, tags, models, and creation date.
- Drill into run history and aggregate metrics.
Status Lifecycle
Score Interpretation
- Compare models within the same dataset/trait context.
- Pair aggregate scores with Explorer evidence before decisions.
- Use repeated runs to detect variance and regressions.
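The variance-and-regression check above can be sketched with plain statistics over repeated run scores. The scores, function name, and tolerance below are illustrative assumptions, not part of the platform:

```python
from statistics import mean, stdev

def flag_regression(baseline_scores, candidate_scores, tolerance=0.02):
    """Flag a regression when the candidate's mean score falls below
    the baseline mean by more than the tolerance."""
    return mean(candidate_scores) < mean(baseline_scores) - tolerance

baseline = [0.81, 0.83, 0.82]   # repeated runs of the stable baseline
candidate = [0.78, 0.77, 0.79]  # repeated runs of the candidate model

print(flag_regression(baseline, candidate))  # candidate mean is below tolerance
print(stdev(candidate))                      # spread across repeated runs signals variance
```

Repeating runs before comparing means is what makes the tolerance meaningful: a single run cannot distinguish a real regression from ordinary run-to-run variance.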
Good Practices
- Keep a stable baseline experiment for release comparisons.
- Tag experiments consistently for filtering and auditability.
- Inspect both trait-level and dataset-level views before promoting a model.
- Export results for offline review when approvals are required.
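Exporting results for offline review can be as simple as flattening run records into a CSV file. A minimal sketch with hypothetical fields; the actual export format is up to your approval workflow:

```python
import csv

# Hypothetical flattened run records: one row per model per run.
rows = [
    {"experiment": "baseline-v1", "model": "model-a", "dataset": "reasoning-suite", "score": 0.82},
    {"experiment": "baseline-v1", "model": "model-b", "dataset": "reasoning-suite", "score": 0.78},
]

with open("results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

A flat CSV keeps the experiment tag, model, and dataset context attached to every score, so reviewers can filter and audit the export without access to the live workspace.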