Overview
Bud Runtime’s Observability feature provides comprehensive monitoring and analytics for AI model inferences, enabling teams to understand model performance, user interactions, and system behavior in real time. This feature transforms raw inference data into actionable insights through an intuitive interface for viewing prompts, responses, performance metrics, and user feedback.
Inference Analytics
View and analyze individual AI model inference requests with detailed breakdowns
Performance Monitoring
Track response times, token usage, costs, and system performance
User Feedback Analysis
Collect and analyze user feedback to improve model performance
Export & Integration
Export data for external analysis and integrate with existing workflows
Key Benefits
Comprehensive Visibility
- Complete Inference History: View every AI model interaction with full context
- Real-time Monitoring: Track system performance as it happens
- Cross-project Analytics: Analyze performance across multiple projects and models
Performance Optimization
- Bottleneck Identification: Quickly identify slow or expensive inferences
- Cost Management: Track and optimize AI model usage costs
- Resource Planning: Understand usage patterns for capacity planning
Quality Assurance
- Error Analysis: Debug failed inferences with complete request/response data
- User Feedback Integration: Collect ratings and feedback to improve model outputs
- A/B Testing Support: Compare performance across different models and configurations
Data-Driven Decisions
- Usage Analytics: Understand how users interact with your AI models
- Trend Analysis: Identify patterns in model performance over time
- Export Capabilities: Integrate with external analytics and reporting tools
Inference Listing
Accessing Inference Data
Navigate to any project in the Bud Runtime dashboard and select the “Inferences” tab to view all model interactions for that project.
Data Table Features
The inference list displays comprehensive information in an easy-to-scan table format:
| Column | Description | Interactive Features |
|---|---|---|
| Timestamp | When the inference occurred | Sortable, timezone-aware |
| Model | AI model name and provider | Click to filter by model |
| Prompt Preview | First 100 characters of the user input | Click to expand full view |
| Response Preview | First 100 characters of the AI response | Click to expand full view |
| Tokens | Input/Output/Total token counts | Sortable, hover for breakdown |
| Latency | Response time in milliseconds | Sortable, color-coded performance |
| Cost | Inference cost in USD | Sortable, cumulative totals |
| Status | Success/Failed indicator | Visual badges, click to filter |
| Actions | View, Copy, Export options | Quick action menu |
Advanced Filtering
Filter Options
- Date Range: Limit results to a specific time window
- Model: Show inferences for a single model or endpoint
- Status: Show only successful or only failed inferences
- Token Count: Set minimum and maximum total token counts
- Latency: Cap the maximum response time
- Text Search: Match prompt or response content
Sorting & Pagination
- Multi-column Sorting: Click column headers to sort, shift-click for secondary sort
- Flexible Pagination: Choose page sizes (25, 50, 100 items per page)
- Deep Linking: URLs update to reflect current filters and sort order
- Performance Optimized: Server-side pagination handles large datasets efficiently
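To script against large result sets, the server-side pagination can be consumed directly through the List Inferences API documented later on this page. The sketch below pages through results; the base URL, token, request field names (`project_id`, `sort_by`, `sort_order`, `offset`, `limit`), and the `items` response key are assumptions for illustration, not confirmed schema.

```python
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
TOKEN = "YOUR_API_TOKEN"                           # placeholder

def fetch_all_inferences(project_id: str, page_size: int = 100):
    """Page through the list endpoint until no rows remain.

    Field names are assumed for illustration; verify against the
    API reference before use.
    """
    headers = {"Authorization": f"Bearer {TOKEN}"}
    offset = 0
    while True:
        body = {
            "project_id": project_id,
            "sort_by": "timestamp",  # timestamp, tokens, latency, or cost
            "sort_order": "desc",
            "offset": offset,
            "limit": page_size,      # max 1000 per the API reference
        }
        resp = requests.post(
            f"{BASE_URL}/api/v1/metrics/inferences/list",
            json=body, headers=headers, timeout=30,
        )
        resp.raise_for_status()
        rows = resp.json().get("items", [])  # response key assumed
        if not rows:
            break
        yield from rows
        offset += page_size
```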
Detailed Inference View
Overview Tab
Click any inference row to open the detailed view, starting with comprehensive overview information:
Request Details
- Unique inference ID
- Timestamp with timezone
- Request source IP
- User agent information
Model Information
- Model name and version
- Provider information
- Endpoint configuration
- Deployment details
Performance Summary
- Total response time
- Time to first token (TTFT)
- Processing time breakdown
- Success/failure status
Messages Tab
View the complete conversation in an intuitive chat interface:
- Chat View
- Raw Content
- System Prompts: Display system instructions and context
- User Messages: Original user inputs with metadata
- Assistant Responses: Complete AI responses with formatting
- Message Timing: Individual message timestamps and token counts
Performance Tab
Comprehensive performance analytics with visual representations:
Timing Metrics
Latency Breakdown
- Queue Time: Time spent waiting for processing
- Processing Time: Actual model inference time
- Network Time: Request/response transfer time
- Total Latency: End-to-end response time
Token Analysis
- Input Tokens: User prompt and system context
- Output Tokens: Generated response content
- Token Rate: Tokens generated per second
- Efficiency Metrics: Cost per token analysis
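To make the efficiency metrics concrete, here is a small worked sketch of the arithmetic behind token rate and cost per token; the numbers are made up for illustration.

```python
# Illustrative values only -- real figures come from the inference record.
input_tokens = 350        # prompt plus system context
output_tokens = 420       # generated response
generation_time_s = 3.5   # time spent generating output tokens
total_cost_usd = 0.0231   # cost reported for the inference

token_rate = output_tokens / generation_time_s            # 120.0 tokens/sec
cost_per_token = total_cost_usd / (input_tokens + output_tokens)

print(f"Token rate: {token_rate:.1f} tok/s")
print(f"Cost per token: ${cost_per_token:.6f}")
```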
Performance Benchmarking
Compare the current inference against:
- Project Average: How this inference compares to the project baseline
- Model Average: Performance relative to other instances of the same model
- Historical Trends: Performance over time for context
Raw Data Tab
Access complete technical details for debugging and integration:
Request Data
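The exact schema is not reproduced here, but a raw chat-completion request payload typically resembles the following sketch (shown as a Python dict; the field names follow a common chat-API shape and are illustrative rather than Bud Runtime’s confirmed format):

```python
# Illustrative shape of a raw request payload; actual fields depend
# on the model and endpoint configuration.
raw_request = {
    "model": "example-model-name",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the attached document."},
    ],
    "temperature": 0.7,
    "max_tokens": 512,
}
```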
Feedback Tab
Analyze user feedback and quality metrics:
Feedback Summary
- Average Rating: Aggregate user satisfaction score
- Feedback Count: Total number of feedback entries
- Response Rate: Percentage of inferences with feedback
- Trend Analysis: Feedback trends over time
Feedback Types
- Boolean Metrics: Thumbs up/down, helpful/not helpful
- Rating Scales: 1-5 star ratings for quality
- Text Comments: Detailed user feedback
- Demonstrations: User-provided improvements
Feedback Timeline
Track how user perception evolves over time.
Performance Metrics
Real-time Monitoring
Response Time
- Current Average: Real-time response time tracking
- 99th Percentile: Worst-case performance monitoring
- SLA Compliance: Track against performance targets
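As a concrete illustration of the percentile tracking above, the sketch below computes a nearest-rank p99 from a batch of latency samples (for example, pulled via the List Inferences API); the sample values are invented.

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; adequate for monitoring sketches."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented latency samples in milliseconds.
latencies_ms = [212, 198, 240, 1890, 205, 233, 221, 310, 260, 2050]
print(f"p99 latency: {percentile(latencies_ms, 99):.0f} ms")
print(f"avg latency: {sum(latencies_ms) / len(latencies_ms):.0f} ms")
```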
Token Usage
- Tokens per Hour: Current usage rate
- Cost Tracking: Real-time cost accumulation
- Efficiency Trends: Token utilization patterns
Success Rate
- Success Percentage: Current success rate
- Error Categories: Breakdown of failure types
- Recovery Metrics: System resilience tracking
Performance Analytics
Trend Analysis
Track key metrics over time to identify patterns and issues:
- Response Time Trends: Identify performance degradation
- Usage Patterns: Understand peak usage times
- Cost Trends: Monitor spending patterns
- Quality Metrics: Track user satisfaction over time
Comparative Analysis
Compare performance across different dimensions:
- Model Comparison
- Time-based Analysis
- Project Analysis
Compare different AI models on:
- Average response time
- Token efficiency
- Cost per inference
- User satisfaction ratings
Performance Optimization
Bottleneck Identification
Automatically identify performance issues. For example, a cost-optimization insight might read: “Switching to a smaller model for simple queries could reduce costs by 40%.”
Optimization Recommendations
Based on performance data, receive actionable recommendations:
- Model Selection: Suggest optimal models for different use cases
- Parameter Tuning: Recommend temperature, max_tokens adjustments
- Caching Strategies: Identify opportunities for response caching
- Load Balancing: Optimize request distribution across endpoints
Feedback Management
Collecting User Feedback
The observability system integrates with user feedback collection:
Feedback Types
Quantitative Feedback
- Boolean Metrics: Yes/No, Helpful/Not Helpful
- Rating Scales: 1-5 stars, 1-10 satisfaction
- Performance Ratings: Speed, accuracy, relevance
Qualitative Feedback
- Text Comments: Open-ended user feedback
- Improvement Suggestions: User recommendations
- Use Case Context: How the response was used
Integration Points
- API Endpoints: Direct feedback submission via API (see the sketch after this list)
- UI Components: Built-in feedback widgets
- Webhook Integration: Real-time feedback notifications
- Third-party Tools: Integration with customer feedback platforms
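As an illustration of direct submission, here is a hypothetical sketch; the endpoint path and payload fields below are invented for illustration (this page only documents retrieving feedback), so consult the API reference for the actual submission route.

```python
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
TOKEN = "YOUR_API_TOKEN"                           # placeholder

def submit_feedback(inference_id: str, rating: int, comment: str = ""):
    """Send a rating for one inference. Hypothetical route and schema."""
    resp = requests.post(
        f"{BASE_URL}/api/v1/feedback",   # invented path -- verify first
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "inference_id": inference_id,
            "rating": rating,            # e.g., 1-5 stars
            "comment": comment,          # optional free text
        },
        timeout=15,
    )
    resp.raise_for_status()
```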
Feedback Analysis
Sentiment Analysis
Automatically analyze text feedback for sentiment and themes:
- Positive Sentiment: Identify what users appreciate most
- Negative Sentiment: Understand pain points and issues
- Neutral Feedback: Collect objective observations
- Theme Extraction: Identify common topics in feedback
Quality Metrics
Track key quality indicators:
Accuracy
How often responses are factually correct and relevant
Helpfulness
How useful responses are for user goals
Clarity
How easy responses are to understand
Feedback-Driven Improvements
Model Fine-tuning
Use feedback data to improve models:
- Training Data Generation: Convert feedback into training examples
- Parameter Optimization: Adjust model parameters based on feedback
- Prompt Engineering: Improve system prompts using feedback insights
System Optimization
Optimize the entire inference pipeline:
- Response Filtering: Remove low-quality responses before delivery
- Confidence Scoring: Add confidence indicators to responses
- Fallback Strategies: Implement better fallback options
Data Export
Export Formats
- CSV Export
- JSON Export
Perfect for spreadsheet analysis and reporting. Use cases:
- Excel/Google Sheets analysis
- Business intelligence tools
- Custom dashboard creation
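For teams that prefer scripting to the UI export, the sketch below converts one page of List Inferences results into a CSV file; the request and response field names are assumptions for illustration.

```python
import csv
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

resp = requests.post(
    f"{BASE_URL}/api/v1/metrics/inferences/list",
    headers=headers,
    json={"offset": 0, "limit": 500, "sort_by": "timestamp", "sort_order": "desc"},
    timeout=30,
)
resp.raise_for_status()
rows = resp.json().get("items", [])  # response key assumed

with open("inferences.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["timestamp", "model", "total_tokens", "latency_ms", "cost"],
        extrasaction="ignore",  # drop any fields not listed above
    )
    writer.writeheader()
    writer.writerows(rows)  # assumes each row is a flat dict
```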
Export Options
Filtered Exports
Export respects all current filters:
- Date Range: Export only data from the selected time period
- Performance Filters: Export high-latency or failed inferences
- Model Filters: Export data for specific models or endpoints
- Content Filters: Export inferences matching text search
Scheduled Exports
Automate data export for regular analysis:
- Daily Reports: Automated daily performance summaries
- Weekly Analytics: Comprehensive weekly analysis reports
- Monthly Insights: Monthly trends and insights reports
- Custom Schedules: Configure exports for specific needs
Integration Capabilities
Analytics Platforms
Integrate with popular analytics tools:
Business Intelligence
- Tableau integration
- Power BI connectors
- Looker dashboards
- Custom BI tools
Data Warehouses
- Snowflake integration
- BigQuery exports
- Redshift compatibility
- Custom database connectors
API Integration
Use the inference API for custom integrations:
API Reference
Authentication
All API endpoints require authentication via a Bearer token:
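For example, in Python the header is built once and sent with every call (the token value is a placeholder):

```python
# All API calls carry the same Authorization header.
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}
```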
List Inferences Endpoint
POST /api/v1/metrics/inferences/list
Retrieve paginated inference data with filtering options. Request parameters:
- Project ID: Filter inferences by project
- Start Date: ISO 8601 format (e.g., “2024-01-01T00:00:00Z”)
- End Date: ISO 8601 format
- Status: Filter by success status
- Min Tokens: Minimum total token count
- Max Tokens: Maximum total token count
- Max Latency: Maximum response time in milliseconds
- Sort Field: “timestamp”, “tokens”, “latency”, or “cost”
- Sort Order: “asc” or “desc”
- Offset: Pagination offset (default: 0)
- Limit: Page size, max 1000 (default: 50)
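A sketch of a filtered request is shown below; the JSON field names (`project_id`, `from_date`, `to_date`, `is_success`, `max_latency_ms`, and so on) are assumed from the parameter descriptions above and may differ from the actual schema.

```python
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}

# Field names are assumed for illustration -- verify against the schema.
body = {
    "project_id": "my-project-id",
    "from_date": "2024-01-01T00:00:00Z",  # ISO 8601
    "to_date": "2024-01-31T23:59:59Z",
    "is_success": False,                  # failed inferences only
    "max_latency_ms": 5000,
    "sort_by": "latency",
    "sort_order": "desc",
    "offset": 0,
    "limit": 50,
}
resp = requests.post(
    f"{BASE_URL}/api/v1/metrics/inferences/list",
    json=body, headers=headers, timeout=30,
)
resp.raise_for_status()
print(resp.json())
```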
Get Inference Details Endpoint
GET /api/v1/metrics/inferences/{inference_id}
Retrieve complete details for a single inference.
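A minimal retrieval sketch (base URL, token, and inference ID are placeholders):

```python
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
inference_id = "EXAMPLE_INFERENCE_ID"

resp = requests.get(
    f"{BASE_URL}/api/v1/metrics/inferences/{inference_id}",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    timeout=15,
)
resp.raise_for_status()
detail = resp.json()  # full request/response record; shape not shown here
```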
Get Inference Feedback Endpoint
GET /api/v1/metrics/inferences/{inference_id}/feedback
Retrieve all feedback associated with an inference.
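And the matching feedback retrieval (the `items` response key is an assumption):

```python
import requests

BASE_URL = "https://your-bud-runtime.example.com"  # placeholder
inference_id = "EXAMPLE_INFERENCE_ID"

resp = requests.get(
    f"{BASE_URL}/api/v1/metrics/inferences/{inference_id}/feedback",
    headers={"Authorization": "Bearer YOUR_API_TOKEN"},
    timeout=15,
)
resp.raise_for_status()
for entry in resp.json().get("items", []):  # response key assumed
    print(entry)
```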
Security & Privacy
Data Protection
Access Control
- Project-level Isolation: Users see only their project data
- Role-based Permissions: Different access levels by user role
- API Key Management: Secure token-based authentication
Data Privacy
- Content Sanitization: Sensitive data automatically masked
- Audit Logging: All data access logged for compliance
- Data Retention: Configurable retention policies
Compliance Features
- GDPR Compliance: Right to deletion and data portability
- SOC 2 Type II: Certified security controls
- HIPAA Ready: Healthcare data protection capabilities
- Custom Compliance: Configurable for industry-specific requirements
Troubleshooting
Common Issues
Data Not Loading
Performance Issues
Export Problems
Support Resources
- Documentation: Complete API and UI documentation
- Support Chat: In-app support for immediate assistance
- Community Forum: Community-driven troubleshooting
- Enterprise Support: Dedicated support for enterprise customers
Getting Started
Quick Start Guide
- Navigate to Project: Select your project from the dashboard
- Open Inferences Tab: Click the “Inferences” tab in project navigation
- Explore Data: Browse recent inferences to understand the interface
- Apply Filters: Use date range and other filters to focus on relevant data
- View Details: Click any inference to see detailed information
- Export Data: Use export features to analyze data externally
Best Practices
Monitoring Strategy
- Set up regular monitoring schedules
- Define performance baselines and alerts
- Track key metrics consistently
- Review trends weekly
Performance Optimization
- Use filtering to identify bottlenecks
- Analyze high-cost inferences regularly
- Monitor user feedback for quality issues
- Export data for deeper analysis
Advanced Usage
- Custom Dashboards: Create project-specific monitoring dashboards
- Automated Alerts: Set up notifications for performance issues
- Integration Workflows: Connect with existing analytics pipelines
- Team Collaboration: Share filtered views and insights with team members