Documentation Index

Fetch the complete documentation index at: https://budecosystem-b7b14df4.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

Core Objects

  • Dataset: a benchmark definition with prompts, expected behavior, metadata, and an estimated token footprint.
  • Trait: a capability lens (for example reasoning, safety, or domain skill) used to organize datasets and filter selection.
  • Experiment: a container for related evaluation runs so teams can compare iterations over time.
  • Run: one execution of an evaluation workflow against a selected model + dataset/trait configuration.

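The four core objects above can be sketched as a minimal data model. This is an illustrative sketch only: the class names mirror the glossary, but every field name and type here is an assumption, not the platform's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    """A benchmark definition (fields are illustrative assumptions)."""
    name: str
    prompts: list[str]
    traits: list[str]          # capability lenses, e.g. "reasoning", "safety"
    estimated_tokens: int = 0  # estimated token footprint

@dataclass
class Run:
    """One execution of an evaluation workflow."""
    model: str
    dataset: str
    status: str = "pending"
    scores: dict[str, float] = field(default_factory=dict)

@dataclass
class Experiment:
    """A container for related runs, compared over time."""
    name: str
    tags: list[str]
    runs: list[Run] = field(default_factory=list)
```

The one-to-many shape (Experiment holds Runs; Runs reference a Dataset by name) follows directly from the definitions above.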
Concept Map

Evaluation Surfaces

Evaluations Hub

  • Search datasets by name and intent.
  • Filter by traits.
  • Inspect modality badges and metadata links.
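The search-and-filter behavior listed above can be approximated with a small helper. The dataset records and their field names (`name`, `intent`, `traits`) are hypothetical stand-ins, not the hub's real data shape.

```python
def filter_datasets(datasets, query="", traits=None):
    """Return datasets whose name or intent matches `query` and that
    carry all of the requested traits (field names are assumptions)."""
    required = set(traits or [])
    matches = []
    for d in datasets:
        text = (d["name"] + " " + d.get("intent", "")).lower()
        if query.lower() in text and required <= set(d.get("traits", [])):
            matches.append(d)
    return matches
```

Filtering by trait narrows the catalog to one capability lens before name-based search is applied; combining both mirrors the hub's search box plus trait filters.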

Evaluation Detail

  • Details: scope, context, and expected behavior.
  • Leaderboard: ranked model comparison.
  • Evaluations Explorer: prompt/response and metric-level evidence.

Experiment Workspace

  • List experiments with status, tags, models, and created date.
  • Drill into run history and aggregate metrics.
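Aggregating metrics over a run history, as in the drill-down above, might look like the sketch below. The run records are a hypothetical shape (dicts with `status` and `scores`), chosen for illustration.

```python
from statistics import mean

def aggregate_metrics(runs):
    """Average each metric across completed runs only
    (run dicts with `status` and `scores` are an assumed shape)."""
    by_metric = {}
    for run in runs:
        if run.get("status") != "completed":
            continue
        for metric, value in run.get("scores", {}).items():
            by_metric.setdefault(metric, []).append(value)
    return {m: mean(vals) for m, vals in by_metric.items()}
```

Skipping non-completed runs keeps failed or in-flight executions from dragging down the aggregate view.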

Status Lifecycle

Score Interpretation

  • Compare models within the same dataset/trait context.
  • Pair aggregate scores with Explorer evidence before decisions.
  • Use repeated runs to detect variance and regressions.
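The variance check in the last bullet can be sketched as follows; the `max_spread` tolerance is an arbitrary illustrative value, not a platform default.

```python
from statistics import mean, stdev

def check_repeated_runs(scores, max_spread=0.05):
    """Summarize repeated runs of one model + dataset/trait configuration.

    Returns the mean score and an instability flag when run-to-run
    standard deviation exceeds `max_spread` (tolerance is an assumption)."""
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": mean(scores), "unstable": spread > max_spread}
```

A stable configuration should show a tight spread across repeats; a widening spread, or a drop in the mean against earlier runs, is the regression signal to investigate in the Explorer.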

Good Practices

  • Keep a stable baseline experiment for release comparisons.
  • Tag experiments consistently for filtering and auditability.
  • Inspect both trait-level and dataset-level views before promoting a model.
  • Export results for offline review when approvals are required.
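An offline export like the one described above could be as simple as writing run results to CSV. The column set here is an assumption for illustration; use whatever fields your approval process actually reviews.

```python
import csv

def export_results(runs, path):
    """Write run result dicts to CSV for offline review
    (the column names are illustrative assumptions)."""
    columns = ["model", "dataset", "status", "score"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        for run in runs:
            writer.writerow(run)
```

A flat CSV keeps the export readable in spreadsheet tools, which is usually what an approval sign-off needs.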