Overview

Bud Runtime’s Observability feature provides comprehensive monitoring and analytics for AI model inferences, enabling teams to understand model performance, user interactions, and system behavior in real-time. This feature transforms raw inference data into actionable insights through an intuitive interface for viewing prompts, responses, performance metrics, and user feedback.

Key Benefits

Comprehensive Visibility

  • Complete Inference History: View every AI model interaction with full context
  • Real-time Monitoring: Track system performance as it happens
  • Cross-project Analytics: Analyze performance across multiple projects and models

Performance Optimization

  • Bottleneck Identification: Quickly identify slow or expensive inferences
  • Cost Management: Track and optimize AI model usage costs
  • Resource Planning: Understand usage patterns for capacity planning

Quality Assurance

  • Error Analysis: Debug failed inferences with complete request/response data
  • User Feedback Integration: Collect ratings and feedback to improve model outputs
  • A/B Testing Support: Compare performance across different models and configurations

Data-Driven Decisions

  • Usage Analytics: Understand how users interact with your AI models
  • Trend Analysis: Identify patterns in model performance over time
  • Export Capabilities: Integrate with external analytics and reporting tools

Inference Listing

Accessing Inference Data

Navigate to any project in the Bud Runtime dashboard and select the “Inferences” tab to view all model interactions for that project.
Dashboard → Projects → [Your Project] → Inferences Tab
The inferences view is automatically scoped to show only data from the selected project, ensuring data privacy and relevance.

Data Table Features

The inference list displays comprehensive information in an easy-to-scan table format:
| Column | Description | Interactive Features |
|---|---|---|
| Timestamp | When the inference occurred | Sortable, timezone-aware |
| Model | AI model name and provider | Click to filter by model |
| Prompt Preview | First 100 characters of the user input | Click to expand full view |
| Response Preview | First 100 characters of the AI response | Click to expand full view |
| Tokens | Input/Output/Total token counts | Sortable, hover for breakdown |
| Latency | Response time in milliseconds | Sortable, color-coded performance |
| Cost | Inference cost in USD | Sortable, cumulative totals |
| Status | Success/Failed indicator | Visual badges, click to filter |
| Actions | View, Copy, Export options | Quick action menu |

Advanced Filtering
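
Narrow the inference list along the same dimensions the API exposes (see the API Reference below):
  • Date Range: Restrict results to a specific time window
  • Status: Show only successful or failed inferences
  • Model: Focus on a specific model or endpoint
  • Token Counts: Set minimum and maximum total token thresholds
  • Latency: Cap the maximum response time
  • Text Search: Match prompt or response content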

Sorting & Pagination

  • Multi-column Sorting: Click column headers to sort, shift-click for secondary sort
  • Flexible Pagination: Choose page sizes (25, 50, 100 items per page)
  • Deep Linking: URLs update to reflect current filters and sort order
  • Performance Optimized: Server-side pagination handles large datasets efficiently
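
For programmatic access, the same server-side pagination is exposed through the list endpoint's offset and limit parameters (documented under API Reference). A minimal sketch, assuming a valid API token:
import requests

# Page through all inferences 100 at a time using offset/limit
url = 'https://api.bud.studio/api/v1/metrics/inferences/list'
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
offset, limit = 0, 100
items = []
while True:
    page = requests.post(url, headers=headers, json={
        'from_date': '2024-01-01T00:00:00Z',
        'offset': offset,
        'limit': limit,
    }).json()['items']
    items.extend(page)
    if len(page) < limit:  # A short page means we reached the end
        break
    offset += limit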

Detailed Inference View

Overview Tab

Click any inference row to open the detailed view, starting with comprehensive overview information:

Request Details

  • Unique inference ID
  • Timestamp with timezone
  • Request source IP
  • User agent information

Model Information

  • Model name and version
  • Provider information
  • Endpoint configuration
  • Deployment details

Performance Summary

  • Total response time
  • Time to first token (TTFT)
  • Processing time breakdown
  • Success/failure status

Messages Tab

View the complete conversation in an intuitive chat interface:
  • System Prompts: Display system instructions and context
  • User Messages: Original user inputs with metadata
  • Assistant Responses: Complete AI responses with formatting
  • Message Timing: Individual message timestamps and token counts

Performance Tab

Comprehensive performance analytics with visual representations:

Timing Metrics

Request Received ──→ Request Forwarded ──→ First Token ──→ Response Complete
    (0ms)              (45ms)               (234ms)         (1,456ms)

Latency Breakdown

  • Queue Time: Time spent waiting for processing
  • Processing Time: Actual model inference time
  • Network Time: Request/response transfer time
  • Total Latency: End-to-end response time
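
As a worked reading of the timing diagram above (mapping the stages onto these buckets is an illustrative assumption):
# Timestamps from the timing diagram, in milliseconds
received, forwarded, first_token, complete = 0, 45, 234, 1456

queue_time = forwarded - received         # 45 ms waiting before processing
ttft = first_token - received             # 234 ms to first token
generation_time = complete - first_token  # 1222 ms streaming the rest
total_latency = complete - received       # 1456 ms end to end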

Token Analysis

  • Input Tokens: User prompt and system context
  • Output Tokens: Generated response content
  • Token Rate: Tokens generated per second
  • Efficiency Metrics: Cost per token analysis
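
A quick worked example using the sample row from the Data Export section (token rate is approximated with total response time here, since per-stage timings vary):
# Values from the sample CSV row in the Data Export section
input_tokens, output_tokens = 45, 73
response_time_ms, cost_usd = 1234, 0.0045

total_tokens = input_tokens + output_tokens             # 118 tokens
token_rate = output_tokens / (response_time_ms / 1000)  # ~59 output tokens/sec
cost_per_token = cost_usd / total_tokens                # ~$0.000038 per token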

Performance Benchmarking

Compare the current inference against:
  • Project Average: How this inference compares to project baseline
  • Model Average: Performance relative to other instances of the same model
  • Historical Trends: Performance over time for context

Raw Data Tab

Access complete technical details for debugging and integration:
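
The exact schema depends on your deployment; as a hypothetical sketch, the raw view typically includes the full request and response payloads (field names here are illustrative, with values from the sample inference used elsewhere on this page):
{
  "inference_id": "550e8400-e29b-41d4-a716-446655440001",
  "request": {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  },
  "response": {
    "content": "The capital of France is Paris...",
    "usage": {"input_tokens": 45, "output_tokens": 73, "total_tokens": 118}
  }
}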

Feedback Tab

Analyze user feedback and quality metrics:

Feedback Summary

  • Average Rating: Aggregate user satisfaction score
  • Feedback Count: Total number of feedback entries
  • Response Rate: Percentage of inferences with feedback
  • Trend Analysis: Feedback trends over time
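
For example, if 150 of 1,000 inferences received at least one feedback entry, the response rate is 150 ÷ 1,000 = 15% (figures are illustrative).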

Feedback Types

  • Boolean Metrics: Thumbs up/down, helpful/not helpful
  • Rating Scales: 1-5 star ratings for quality
  • Text Comments: Detailed user feedback
  • Demonstrations: User-provided improvements

Feedback Timeline

Track how user perception evolves:
Initial Rating: ★★★★☆ (4/5) - "Good response"
Follow-up: ★★★☆☆ (3/5) - "Could be more detailed"
Final: ★★★★★ (5/5) - "Perfect after clarification"

Performance Metrics

Real-time Monitoring

Response Time

  • Current Average: Real-time response time tracking
  • 99th Percentile: Worst-case performance monitoring
  • SLA Compliance: Track against performance targets

Token Usage

  • Tokens per Hour: Current usage rate
  • Cost Tracking: Real-time cost accumulation
  • Efficiency Trends: Token utilization patterns

Success Rate

  • Success Percentage: Current success rate
  • Error Categories: Breakdown of failure types
  • Recovery Metrics: System resilience tracking

Performance Analytics

Trend Analysis

Track key metrics over time to identify patterns and issues:
  • Response Time Trends: Identify performance degradation
  • Usage Patterns: Understand peak usage times
  • Cost Trends: Monitor spending patterns
  • Quality Metrics: Track user satisfaction over time

Comparative Analysis

Compare performance across different dimensions. For example, compare AI models on:
  • Average response time
  • Token efficiency
  • Cost per inference
  • User satisfaction ratings

Performance Optimization

Bottleneck Identification

Automatically identify performance issues:
High Latency Alert: 15% of inferences exceeded 5s response time in the last hour
Cost Optimization: Switch to smaller model for simple queries could reduce costs by 40%

Optimization Recommendations

Based on performance data, receive actionable recommendations:
  1. Model Selection: Suggest optimal models for different use cases
  2. Parameter Tuning: Recommend temperature, max_tokens adjustments
  3. Caching Strategies: Identify opportunities for response caching
  4. Load Balancing: Optimize request distribution across endpoints

Feedback Management

Collecting User Feedback

The observability system integrates with user feedback collection:

Feedback Types

Quantitative Feedback

  • Boolean Metrics: Yes/No, Helpful/Not Helpful
  • Rating Scales: 1-5 stars, 1-10 satisfaction
  • Performance Ratings: Speed, accuracy, relevance

Qualitative Feedback

  • Text Comments: Open-ended user feedback
  • Improvement Suggestions: User recommendations
  • Use Case Context: How the response was used

Integration Points

  • API Endpoints: Direct feedback submission via API
  • UI Components: Built-in feedback widgets
  • Webhook Integration: Real-time feedback notifications
  • Third-party Tools: Integration with customer feedback platforms

Feedback Analysis

Sentiment Analysis

Automatically analyze text feedback for sentiment and themes:
  • Positive Sentiment: Identify what users appreciate most
  • Negative Sentiment: Understand pain points and issues
  • Neutral Feedback: Collect objective observations
  • Theme Extraction: Identify common topics in feedback

Quality Metrics

Track key quality indicators:

Accuracy

How often responses are factually correct and relevant

Helpfulness

How useful responses are for user goals

Clarity

How easy responses are to understand

Feedback-Driven Improvements

Model Fine-tuning

Use feedback data to improve models:
  • Training Data Generation: Convert feedback into training examples
  • Parameter Optimization: Adjust model parameters based on feedback
  • Prompt Engineering: Improve system prompts using feedback insights

System Optimization

Optimize the entire inference pipeline:
  • Response Filtering: Remove low-quality responses before delivery
  • Confidence Scoring: Add confidence indicators to responses
  • Fallback Strategies: Implement better fallback options

Data Export

Export Formats

CSV exports are ideal for spreadsheet analysis and reporting:
inference_id,timestamp,model_name,prompt_preview,response_preview,input_tokens,output_tokens,response_time_ms,cost,is_success
550e8400-e29b-41d4-a716-446655440001,2024-01-15T10:30:00Z,gpt-4,"What is the capital...","The capital of France...",45,73,1234,0.0045,true
Use Cases:
  • Excel/Google Sheets analysis
  • Business intelligence tools
  • Custom dashboard creation
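
A short sketch of that kind of analysis in Python, assuming the export was saved as inferences.csv with the header shown above:
import pandas as pd

# Load a CSV export downloaded from the dashboard (filename is an assumption)
df = pd.read_csv('inferences.csv')

print('Average latency (ms):', df['response_time_ms'].mean())
print('p99 latency (ms):', df['response_time_ms'].quantile(0.99))
print('Total cost (USD):', df['cost'].sum())
# is_success is serialized as lowercase true/false, so compare as strings
print('Success rate:', (df['is_success'].astype(str).str.lower() == 'true').mean())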

Export Options

Filtered Exports

Export respects all current filters:
  • Date Range: Export only data from selected time period
  • Performance Filters: Export high-latency or failed inferences
  • Model Filters: Export data for specific models or endpoints
  • Content Filters: Export inferences matching text search

Scheduled Exports

Automate data export for regular analysis:
  • Daily Reports: Automated daily performance summaries
  • Weekly Analytics: Comprehensive weekly analysis reports
  • Monthly Insights: Monthly trends and insights reports
  • Custom Schedules: Configure exports for specific needs

Integration Capabilities

Analytics Platforms

Integrate with popular analytics tools:

Business Intelligence

  • Tableau integration
  • Power BI connectors
  • Looker dashboards
  • Custom BI tools

Data Warehouses

  • Snowflake integration
  • BigQuery exports
  • Redshift compatibility
  • Custom database connectors

API Integration

Use the inference API for custom integrations:
import requests

# Fetch inference data programmatically
response = requests.post(
    'https://api.bud.studio/api/v1/metrics/inferences/list',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
    json={
        'project_id': 'your-project-id',
        'from_date': '2024-01-01T00:00:00Z',
        'limit': 1000
    }
)
response.raise_for_status()  # Surface auth or validation errors early

inferences = response.json()['items']

API Reference

Authentication

All API endpoints require authentication via a Bearer token. Note that the list endpoint is a POST request and from_date is required:
curl -X POST https://api.bud.studio/api/v1/metrics/inferences/list \
     -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"from_date": "2024-01-01T00:00:00Z"}'

List Inferences Endpoint

POST /api/v1/metrics/inferences/list

Retrieve paginated inference data with filtering options.

| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | string | No | Filter inferences by project ID |
| from_date | string | Yes | Start date in ISO 8601 format (e.g., "2024-01-01T00:00:00Z") |
| to_date | string | No | End date in ISO 8601 format |
| is_success | boolean | No | Filter by success status |
| min_tokens | number | No | Minimum total token count |
| max_tokens | number | No | Maximum total token count |
| max_latency_ms | number | No | Maximum response time in milliseconds |
| sort_by | string | No | Sort field: "timestamp", "tokens", "latency", or "cost" |
| sort_order | string | No | Sort order: "asc" or "desc" |
| offset | number | No | Pagination offset (default: 0) |
| limit | number | No | Page size, max 1000 (default: 50) |

Get Inference Details Endpoint

GET /api/v1/metrics/inferences/{inference_id}

Retrieve complete details for a single inference.

Get Inference Feedback Endpoint

GET /api/v1/metrics/inferences/{inference_id}/feedback

Retrieve all feedback associated with an inference.
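
A minimal sketch combining the two GET endpoints above, assuming a valid token and a known inference ID:
import requests

base = 'https://api.bud.studio/api/v1/metrics/inferences'
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
inference_id = '550e8400-e29b-41d4-a716-446655440001'

# Complete details for a single inference
details = requests.get(f'{base}/{inference_id}', headers=headers).json()

# All feedback associated with that inference
feedback = requests.get(f'{base}/{inference_id}/feedback', headers=headers).json()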

Security & Privacy

Data Protection

Access Control

  • Project-level Isolation: Users see only their project data
  • Role-based Permissions: Different access levels by user role
  • API Key Management: Secure token-based authentication

Data Privacy

  • Content Sanitization: Sensitive data automatically masked
  • Audit Logging: All data access logged for compliance
  • Data Retention: Configurable retention policies

Compliance Features

  • GDPR Compliance: Right to deletion and data portability
  • SOC 2 Type II: Certified security controls
  • HIPAA Ready: Healthcare data protection capabilities
  • Custom Compliance: Configurable for industry-specific requirements

Troubleshooting

Common Issues

Support Resources

  • Documentation: Complete API and UI documentation
  • Support Chat: In-app support for immediate assistance
  • Community Forum: Community-driven troubleshooting
  • Enterprise Support: Dedicated support for enterprise customers

Getting Started

Quick Start Guide

  1. Navigate to Project: Select your project from the dashboard
  2. Open Inferences Tab: Click the “Inferences” tab in project navigation
  3. Explore Data: Browse recent inferences to understand the interface
  4. Apply Filters: Use date range and other filters to focus on relevant data
  5. View Details: Click any inference to see detailed information
  6. Export Data: Use export features to analyze data externally

Best Practices

Monitoring Strategy

  • Set up regular monitoring schedules
  • Define performance baselines and alerts
  • Track key metrics consistently
  • Review trends weekly

Performance Optimization

  • Use filtering to identify bottlenecks
  • Analyze high-cost inferences regularly
  • Monitor user feedback for quality issues
  • Export data for deeper analysis

Advanced Usage

  • Custom Dashboards: Create project-specific monitoring dashboards
  • Automated Alerts: Set up notifications for performance issues
  • Integration Workflows: Connect with existing analytics pipelines
  • Team Collaboration: Share filtered views and insights with team members

The Observability feature in Bud Runtime provides comprehensive visibility into your AI model performance, enabling data-driven optimization and ensuring high-quality user experiences across all your AI applications.