Overview

Bud Runtime’s Observability feature provides comprehensive monitoring and analytics for AI model inferences, enabling teams to understand model performance, user interactions, and system behavior in real-time. This feature transforms raw inference data into actionable insights through an intuitive interface for viewing prompts, responses, performance metrics, and user feedback.

Key Benefits

Comprehensive Visibility

  • Complete Inference History: View every AI model interaction with full context
  • Real-time Monitoring: Track system performance as it happens
  • Cross-project Analytics: Analyze performance across multiple projects and models

Performance Optimization

  • Bottleneck Identification: Quickly identify slow or expensive inferences
  • Cost Management: Track and optimize AI model usage costs
  • Resource Planning: Understand usage patterns for capacity planning

Quality Assurance

  • Error Analysis: Debug failed inferences with complete request/response data
  • User Feedback Integration: Collect ratings and feedback to improve model outputs
  • A/B Testing Support: Compare performance across different models and configurations

Data-Driven Decisions

  • Usage Analytics: Understand how users interact with your AI models
  • Trend Analysis: Identify patterns in model performance over time
  • Export Capabilities: Integrate with external analytics and reporting tools

Inference Listing

Accessing Inference Data

Navigate to any project in the Bud Runtime dashboard and select the “Inferences” tab to view all model interactions for that project.
Dashboard → Projects → [Your Project] → Inferences Tab
The inferences view is automatically scoped to show only data from the selected project, ensuring data privacy and relevance.

Data Table Features

The inference list displays comprehensive information in an easy-to-scan table format:
| Column | Description | Interactive Features |
|---|---|---|
| Timestamp | When the inference occurred | Sortable, timezone-aware |
| Model | AI model name and provider | Click to filter by model |
| Prompt Preview | First 100 characters of the user input | Click to expand full view |
| Response Preview | First 100 characters of the AI response | Click to expand full view |
| Tokens | Input/Output/Total token counts | Sortable, hover for breakdown |
| Latency | Response time in milliseconds | Sortable, color-coded performance |
| Cost | Inference cost in USD | Sortable, cumulative totals |
| Status | Success/Failed indicator | Visual badges, click to filter |
| Actions | View, Copy, Export options | Quick action menu |

Advanced Filtering
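
Narrow the inference list along the same dimensions the API exposes (see the API Reference below):
  • Date Range: Restrict results to a specific time window
  • Status: Show only successful or failed inferences
  • Model: Focus on a specific model or endpoint
  • Token Counts: Set minimum and maximum total token thresholds
  • Latency: Cap the maximum response time
  • Text Search: Match prompt or response content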

Sorting & Pagination

  • Multi-column Sorting: Click column headers to sort, shift-click for secondary sort
  • Flexible Pagination: Choose page sizes (25, 50, 100 items per page)
  • Deep Linking: URLs update to reflect current filters and sort order
  • Performance Optimized: Server-side pagination handles large datasets efficiently
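
For programmatic access, the same server-side pagination is exposed through the list endpoint's offset and limit parameters (documented under API Reference). A minimal sketch, assuming a valid API token:
import requests

# Page through all inferences 100 at a time using offset/limit
url = 'https://api.bud.studio/api/v1/metrics/inferences/list'
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
offset, limit = 0, 100
items = []
while True:
    page = requests.post(url, headers=headers, json={
        'from_date': '2024-01-01T00:00:00Z',
        'offset': offset,
        'limit': limit,
    }).json()['items']
    items.extend(page)
    if len(page) < limit:  # A short page means we reached the end
        break
    offset += limit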

Detailed Inference View

Overview Tab

Click any inference row to open the detailed view, starting with comprehensive overview information:

Request Details

  • Unique inference ID
  • Timestamp with timezone
  • Request source IP
  • User agent information

Model Information

  • Model name and version
  • Provider information
  • Endpoint configuration
  • Deployment details

Performance Summary

  • Total response time
  • Time to first token (TTFT)
  • Processing time breakdown
  • Success/failure status

Messages Tab

View the complete conversation in an intuitive chat interface:
  • System Prompts: Display system instructions and context
  • User Messages: Original user inputs with metadata
  • Assistant Responses: Complete AI responses with formatting
  • Message Timing: Individual message timestamps and token counts

Performance Tab

Comprehensive performance analytics with visual representations:

Timing Metrics

Request Received ──→ Request Forwarded ──→ First Token ──→ Response Complete
    (0ms)              (45ms)               (234ms)         (1,456ms)

Latency Breakdown

  • Queue Time: Time spent waiting for processing
  • Processing Time: Actual model inference time
  • Network Time: Request/response transfer time
  • Total Latency: End-to-end response time
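
As a worked reading of the timing diagram above (mapping the stages onto these buckets is an illustrative assumption):
# Timestamps from the timing diagram, in milliseconds
received, forwarded, first_token, complete = 0, 45, 234, 1456

queue_time = forwarded - received         # 45 ms waiting before processing
ttft = first_token - received             # 234 ms to first token
generation_time = complete - first_token  # 1222 ms streaming the rest
total_latency = complete - received       # 1456 ms end to end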

Token Analysis

  • Input Tokens: User prompt and system context
  • Output Tokens: Generated response content
  • Token Rate: Tokens generated per second
  • Efficiency Metrics: Cost per token analysis
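
A quick worked example using the sample row from the Data Export section (token rate is approximated with total response time here, since per-stage timings vary):
# Values from the sample CSV row in the Data Export section
input_tokens, output_tokens = 45, 73
response_time_ms, cost_usd = 1234, 0.0045

total_tokens = input_tokens + output_tokens             # 118 tokens
token_rate = output_tokens / (response_time_ms / 1000)  # ~59 output tokens/sec
cost_per_token = cost_usd / total_tokens                # ~$0.000038 per token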

Performance Benchmarking

Compare the current inference against:
  • Project Average: How this inference compares to project baseline
  • Model Average: Performance relative to other instances of the same model
  • Historical Trends: Performance over time for context

Raw Data Tab

Access complete technical details for debugging and integration:
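
The exact schema depends on your deployment; as a hypothetical sketch, the raw view typically includes the full request and response payloads (field names here are illustrative, with values from the sample inference used elsewhere on this page):
{
  "inference_id": "550e8400-e29b-41d4-a716-446655440001",
  "request": {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
  },
  "response": {
    "content": "The capital of France is Paris...",
    "usage": {"input_tokens": 45, "output_tokens": 73, "total_tokens": 118}
  }
}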

Feedback Tab

Analyze user feedback and quality metrics:

Feedback Summary

  • Average Rating: Aggregate user satisfaction score
  • Feedback Count: Total number of feedback entries
  • Response Rate: Percentage of inferences with feedback
  • Trend Analysis: Feedback trends over time
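
For example, if 150 of 1,000 inferences received at least one feedback entry, the response rate is 150 ÷ 1,000 = 15% (figures are illustrative).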

Feedback Types

  • Boolean Metrics: Thumbs up/down, helpful/not helpful
  • Rating Scales: 1-5 star ratings for quality
  • Text Comments: Detailed user feedback
  • Demonstrations: User-provided improvements

Feedback Timeline

Track how user perception evolves:
Initial Rating: ★★★★☆ (4/5) - "Good response"
Follow-up: ★★★☆☆ (3/5) - "Could be more detailed"
Final: ★★★★★ (5/5) - "Perfect after clarification"

Performance Metrics

Real-time Monitoring

Response Time

  • Current Average: Real-time response time tracking
  • 99th Percentile: Worst-case performance monitoring
  • SLA Compliance: Track against performance targets

Token Usage

  • Tokens per Hour: Current usage rate
  • Cost Tracking: Real-time cost accumulation
  • Efficiency Trends: Token utilization patterns

Success Rate

  • Success Percentage: Current success rate
  • Error Categories: Breakdown of failure types
  • Recovery Metrics: System resilience tracking

Performance Analytics

Trend Analysis

Track key metrics over time to identify patterns and issues:
  • Response Time Trends: Identify performance degradation
  • Usage Patterns: Understand peak usage times
  • Cost Trends: Monitor spending patterns
  • Quality Metrics: Track user satisfaction over time

Comparative Analysis

Compare performance across different dimensions. For example, compare AI models on:
  • Average response time
  • Token efficiency
  • Cost per inference
  • User satisfaction ratings

Performance Optimization

Bottleneck Identification

Automatically identify performance issues:
High Latency Alert: 15% of inferences exceeded 5s response time in the last hour
Cost Optimization: Switch to smaller model for simple queries could reduce costs by 40%

Optimization Recommendations

Based on performance data, receive actionable recommendations:
  1. Model Selection: Suggest optimal models for different use cases
  2. Parameter Tuning: Recommend temperature, max_tokens adjustments
  3. Caching Strategies: Identify opportunities for response caching
  4. Load Balancing: Optimize request distribution across endpoints

Feedback Management

Collecting User Feedback

The observability system integrates with user feedback collection:

Feedback Types

Quantitative Feedback

  • Boolean Metrics: Yes/No, Helpful/Not Helpful
  • Rating Scales: 1-5 stars, 1-10 satisfaction
  • Performance Ratings: Speed, accuracy, relevance

Qualitative Feedback

  • Text Comments: Open-ended user feedback
  • Improvement Suggestions: User recommendations
  • Use Case Context: How the response was used

Integration Points

  • API Endpoints: Direct feedback submission via API
  • UI Components: Built-in feedback widgets
  • Webhook Integration: Real-time feedback notifications
  • Third-party Tools: Integration with customer feedback platforms

Feedback Analysis

Sentiment Analysis

Automatically analyze text feedback for sentiment and themes:
  • Positive Sentiment: Identify what users appreciate most
  • Negative Sentiment: Understand pain points and issues
  • Neutral Feedback: Collect objective observations
  • Theme Extraction: Identify common topics in feedback

Quality Metrics

Track key quality indicators:

Accuracy

How often responses are factually correct and relevant

Helpfulness

How useful responses are for user goals

Clarity

How easy responses are to understand

Feedback-Driven Improvements

Model Fine-tuning

Use feedback data to improve models:
  • Training Data Generation: Convert feedback into training examples
  • Parameter Optimization: Adjust model parameters based on feedback
  • Prompt Engineering: Improve system prompts using feedback insights

System Optimization

Optimize the entire inference pipeline:
  • Response Filtering: Remove low-quality responses before delivery
  • Confidence Scoring: Add confidence indicators to responses
  • Fallback Strategies: Implement better fallback options

Data Export

Export Formats

CSV exports are ideal for spreadsheet analysis and reporting:
inference_id,timestamp,model_name,prompt_preview,response_preview,input_tokens,output_tokens,response_time_ms,cost,is_success
550e8400-e29b-41d4-a716-446655440001,2024-01-15T10:30:00Z,gpt-4,"What is the capital...","The capital of France...",45,73,1234,0.0045,true
Use Cases:
  • Excel/Google Sheets analysis
  • Business intelligence tools
  • Custom dashboard creation
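
A short sketch of that kind of analysis in Python, assuming the export was saved as inferences.csv with the header shown above:
import pandas as pd

# Load a CSV export downloaded from the dashboard (filename is an assumption)
df = pd.read_csv('inferences.csv')

print('Average latency (ms):', df['response_time_ms'].mean())
print('p99 latency (ms):', df['response_time_ms'].quantile(0.99))
print('Total cost (USD):', df['cost'].sum())
# is_success is serialized as lowercase true/false, so compare as strings
print('Success rate:', (df['is_success'].astype(str).str.lower() == 'true').mean())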

Export Options

Filtered Exports

Export respects all current filters:
  • Date Range: Export only data from selected time period
  • Performance Filters: Export high-latency or failed inferences
  • Model Filters: Export data for specific models or endpoints
  • Content Filters: Export inferences matching text search

Scheduled Exports

Automate data export for regular analysis:
  • Daily Reports: Automated daily performance summaries
  • Weekly Analytics: Comprehensive weekly analysis reports
  • Monthly Insights: Monthly trends and insights reports
  • Custom Schedules: Configure exports for specific needs

Integration Capabilities

Analytics Platforms

Integrate with popular analytics tools:

Business Intelligence

  • Tableau integration
  • Power BI connectors
  • Looker dashboards
  • Custom BI tools

Data Warehouses

  • Snowflake integration
  • BigQuery exports
  • Redshift compatibility
  • Custom database connectors

API Integration

Use the inference API for custom integrations:
import requests

# Fetch inference data programmatically
response = requests.post(
    'https://api.bud.studio/api/v1/metrics/inferences/list',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
    json={
        'project_id': 'your-project-id',
        'from_date': '2024-01-01T00:00:00Z',
        'limit': 1000
    }
)
response.raise_for_status()  # Surface auth or validation errors early

inferences = response.json()['items']

API Reference

Authentication

All API endpoints require authentication via a Bearer token. Note that the list endpoint is a POST request and from_date is required:
curl -X POST https://api.bud.studio/api/v1/metrics/inferences/list \
     -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{"from_date": "2024-01-01T00:00:00Z"}'

List Inferences Endpoint

POST /api/v1/metrics/inferences/list

Retrieve paginated inference data with filtering options.

| Parameter | Type | Required | Description |
|---|---|---|---|
| project_id | string | No | Filter inferences by project ID |
| from_date | string | Yes | Start date in ISO 8601 format (e.g., "2024-01-01T00:00:00Z") |
| to_date | string | No | End date in ISO 8601 format |
| is_success | boolean | No | Filter by success status |
| min_tokens | number | No | Minimum total token count |
| max_tokens | number | No | Maximum total token count |
| max_latency_ms | number | No | Maximum response time in milliseconds |
| sort_by | string | No | Sort field: "timestamp", "tokens", "latency", or "cost" |
| sort_order | string | No | Sort order: "asc" or "desc" |
| offset | number | No | Pagination offset (default: 0) |
| limit | number | No | Page size, max 1000 (default: 50) |

Get Inference Details Endpoint

GET /api/v1/metrics/inferences/{inference_id}

Retrieve complete details for a single inference.

Get Inference Feedback Endpoint

GET /api/v1/metrics/inferences/{inference_id}/feedback

Retrieve all feedback associated with an inference.
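
A minimal sketch combining the two GET endpoints above, assuming a valid token and a known inference ID:
import requests

base = 'https://api.bud.studio/api/v1/metrics/inferences'
headers = {'Authorization': 'Bearer YOUR_TOKEN'}
inference_id = '550e8400-e29b-41d4-a716-446655440001'

# Complete details for a single inference
details = requests.get(f'{base}/{inference_id}', headers=headers).json()

# All feedback associated with that inference
feedback = requests.get(f'{base}/{inference_id}/feedback', headers=headers).json()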

Security & Privacy

Data Protection

Access Control

  • Project-level Isolation: Users see only their project data
  • Role-based Permissions: Different access levels by user role
  • API Key Management: Secure token-based authentication

Data Privacy

  • Content Sanitization: Sensitive data automatically masked
  • Audit Logging: All data access logged for compliance
  • Data Retention: Configurable retention policies

Compliance Features

  • GDPR Compliance: Right to deletion and data portability
  • SOC 2 Type II: Certified security controls
  • HIPAA Ready: Healthcare data protection capabilities
  • Custom Compliance: Configurable for industry-specific requirements

Troubleshooting

Common Issues

Support Resources

  • Documentation: Complete API and UI documentation
  • Support Chat: In-app support for immediate assistance
  • Community Forum: Community-driven troubleshooting
  • Enterprise Support: Dedicated support for enterprise customers

Getting Started

Quick Start Guide

  1. Navigate to Project: Select your project from the dashboard
  2. Open Inferences Tab: Click the “Inferences” tab in project navigation
  3. Explore Data: Browse recent inferences to understand the interface
  4. Apply Filters: Use date range and other filters to focus on relevant data
  5. View Details: Click any inference to see detailed information
  6. Export Data: Use export features to analyze data externally

Best Practices

Monitoring Strategy

  • Set up regular monitoring schedules
  • Define performance baselines and alerts
  • Track key metrics consistently
  • Review trends weekly

Performance Optimization

  • Use filtering to identify bottlenecks
  • Analyze high-cost inferences regularly
  • Monitor user feedback for quality issues
  • Export data for deeper analysis

Advanced Usage

  • Custom Dashboards: Create project-specific monitoring dashboards
  • Automated Alerts: Set up notifications for performance issues
  • Integration Workflows: Connect with existing analytics pipelines
  • Team Collaboration: Share filtered views and insights with team members

The Observability feature in Bud Runtime provides comprehensive visibility into your AI model performance, enabling data-driven optimization and ensuring high-quality user experiences across all your AI applications.