
Overview

Bud AI Foundry Evaluations help teams measure model quality before rollout. You can benchmark Bud-hosted and cloud-connected models using curated datasets, trait filters, and repeatable experiment runs.

What You Can Do

- Discover Benchmarks: Use Evaluations Hub search and trait filters to find relevant datasets quickly.
- Run Structured Experiments: Group one or more runs under an experiment, then compare model results over time.
- Inspect Results Deeply: Use the Details, Leaderboard, and Evaluations Explorer tabs to move from summary scores to row-level evidence.
- Export for Governance: Export evaluation outcomes as CSV for audits, reviews, and reporting.
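Because exports are plain CSV, they drop straight into standard tooling for audit summaries. The snippet below is a minimal sketch using pandas; the file name and column names (`model`, `trait`, `score`) are hypothetical stand-ins, not the actual export schema, so substitute the headers present in your own export.

```python
# Illustrative only: summarize an exported evaluation CSV for an audit review.
# The file name and the "model", "trait", and "score" columns are assumptions,
# not the platform's actual export schema.
import pandas as pd

results = pd.read_csv("evaluation_export.csv")

# Average score per model and trait, e.g. for a governance report.
summary = results.groupby(["model", "trait"])["score"].mean().unstack()
print(summary.round(3))
```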

Core Screens

| Screen | Purpose |
| --- | --- |
| Evaluations Hub | Browse datasets with modalities, traits, and metadata links |
| Evaluation Detail | Inspect details, the leaderboard, and the sample-level explorer |
| Experiments | Track evaluation studies with status, tags, and timestamps |
| Experiment Detail | Review run history, benchmark details, and current metrics |

Evaluation Lifecycle

  1. Discover a dataset in Evaluations Hub.
  2. Create or open an Experiment.
  3. Launch a Run Evaluation workflow.
  4. Monitor run status and per-trait scores.
  5. Compare outcomes and rerun with updated configuration.
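The relationships implied by this lifecycle (an experiment grouping one or more runs, each run scoring a model against a dataset and producing per-trait scores) can be pictured with a small data-model sketch. The class and field names below are illustrative assumptions, not Bud AI Foundry's actual schema or API.

```python
# Illustrative data model only: names and fields are assumptions,
# not Bud AI Foundry's actual schema or API.
from dataclasses import dataclass, field


@dataclass
class Run:
    model: str                      # Bud-hosted or cloud-connected model
    dataset: str                    # benchmark discovered in Evaluations Hub
    status: str = "pending"         # e.g. pending -> running -> completed
    trait_scores: dict[str, float] = field(default_factory=dict)


@dataclass
class Experiment:
    name: str
    tags: list[str] = field(default_factory=list)
    runs: list[Run] = field(default_factory=list)

    def best_run(self, trait: str) -> Run | None:
        """Return the completed run with the highest score for a given trait."""
        scored = [r for r in self.runs
                  if r.status == "completed" and trait in r.trait_scores]
        return max(scored, key=lambda r: r.trait_scores[trait], default=None)
```

Comparing outcomes across runs (step 5) then amounts to ranking an experiment's completed runs by the trait you care about, which is what the leaderboard and explorer views surface in the UI.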

Next Steps

- Quick Start: Run your first evaluation in minutes.
- Evaluation Concepts: Learn entities, states, and score interpretation.
- First Evaluation: Follow a full end-to-end walkthrough.