# Observability & Inference Analytics

> Comprehensive AI model inference monitoring, analytics, and performance optimization through detailed observability features.

## Overview

Bud Runtime's Observability feature provides comprehensive monitoring and analytics for AI model inferences, enabling teams to understand model performance, user interactions, and system behavior in real time. This feature transforms raw inference data into actionable insights through an intuitive interface for viewing prompts, responses, performance metrics, and user feedback.

* View and analyze individual AI model inference requests with detailed breakdowns
* Track response times, token usage, costs, and system performance
* Collect and analyze user feedback to improve model performance
* Export data for external analysis and integrate with existing workflows

## Key Benefits

### Comprehensive Visibility

* **Complete Inference History**: View every AI model interaction with full context
* **Real-time Monitoring**: Track system performance as it happens
* **Cross-project Analytics**: Analyze performance across multiple projects and models

### Performance Optimization

* **Bottleneck Identification**: Quickly identify slow or expensive inferences
* **Cost Management**: Track and optimize AI model usage costs
* **Resource Planning**: Understand usage patterns for capacity planning

### Quality Assurance

* **Error Analysis**: Debug failed inferences with complete request/response data
* **User Feedback Integration**: Collect ratings and feedback to improve model outputs
* **A/B Testing Support**: Compare performance across different models and configurations

### Data-Driven Decisions

* **Usage Analytics**: Understand how users interact with your AI models
* **Trend Analysis**: Identify patterns in model performance over
time
* **Export Capabilities**: Integrate with external analytics and reporting tools

## Inference Listing

### Accessing Inference Data

Navigate to any project in the Bud Runtime dashboard and select the **"Inferences"** tab to view all model interactions for that project.

```
Dashboard → Projects → [Your Project] → Inferences Tab
```

The inferences view is automatically scoped to show only data from the selected project, ensuring data privacy and relevance.

Use deep links to navigate directly to filtered views:

```
/projects/[project-id]/inferences?from=2024-01-01&status=failed
```

### Data Table Features

The inference list displays comprehensive information in an easy-to-scan table format:

| Column               | Description                             | Interactive Features              |
| -------------------- | --------------------------------------- | --------------------------------- |
| **Timestamp**        | When the inference occurred             | Sortable, timezone-aware          |
| **Model**            | AI model name and provider              | Click to filter by model          |
| **Prompt Preview**   | First 100 characters of the user input  | Click to expand full view         |
| **Response Preview** | First 100 characters of the AI response | Click to expand full view         |
| **Tokens**           | Input/Output/Total token counts         | Sortable, hover for breakdown     |
| **Latency**          | Response time in milliseconds           | Sortable, color-coded performance |
| **Cost**             | Inference cost in USD                   | Sortable, cumulative totals       |
| **Status**           | Success/Failed indicator                | Visual badges, click to filter    |
| **Actions**          | View, Copy, Export options              | Quick action menu                 |

### Advanced Filtering

* **Date Range Picker**: Select specific time periods
* **Quick Ranges**: Last hour, day, week, month
* **Timezone Support**: Automatic conversion to user timezone
* **Success Status**: All, Success Only, Failed Only
* **Token Range**: Minimum and maximum token counts
* **Latency Threshold**: Maximum response time in milliseconds
* **Cost Range**: Filter by inference cost
* **Model Selection**: Filter by specific AI
models
* **Provider Filter**: Filter by model provider (OpenAI, Anthropic, etc.)
* **Endpoint Filter**: Filter by deployment endpoint
* **Text Search**: Search within prompts and responses
* **Cached vs Non-cached**: Filter by cache usage
* **Feedback Presence**: Show only inferences with user feedback

### Sorting & Pagination

* **Multi-column Sorting**: Click column headers to sort, shift-click for secondary sort
* **Flexible Pagination**: Choose page sizes (25, 50, 100 items per page)
* **Deep Linking**: URLs update to reflect current filters and sort order
* **Performance Optimized**: Server-side pagination handles large datasets efficiently

## Detailed Inference View

### Overview Tab

Click any inference row to open the detailed view, starting with comprehensive overview information:

* Unique inference ID
* Timestamp with timezone
* Request source IP
* User agent information
* Model name and version
* Provider information
* Endpoint configuration
* Deployment details
* Total response time
* Time to first token (TTFT)
* Processing time breakdown
* Success/failure status

### Messages Tab

View the complete conversation in an intuitive chat interface:

* **System Prompts**: Display system instructions and context
* **User Messages**: Original user inputs with metadata
* **Assistant Responses**: Complete AI responses with formatting
* **Message Timing**: Individual message timestamps and token counts
* **Structured Data**: JSON representation of message arrays
* **Token Breakdown**: Per-message token usage analysis
* **Content Actions**: Copy individual messages or entire conversations

### Performance Tab

Comprehensive performance analytics with visual representations:

#### Timing Metrics

```
Request Received ──→ Request Forwarded ──→ First Token ──→ Response Complete
     (0ms)                (45ms)              (234ms)          (1,456ms)
```

* **Queue Time**: Time spent waiting for processing
* **Processing Time**: Actual model inference time
* **Network Time**: Request/response transfer time
* **Total
Latency**: End-to-end response time
* **Input Tokens**: User prompt and system context
* **Output Tokens**: Generated response content
* **Token Rate**: Tokens generated per second
* **Efficiency Metrics**: Cost per token analysis

#### Performance Benchmarking

Compare the current inference against:

* **Project Average**: How this inference compares to project baseline
* **Model Average**: Performance relative to other instances of the same model
* **Historical Trends**: Performance over time for context

### Raw Data Tab

Access complete technical details for debugging and integration:

```json
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful AI assistant..."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
```

```json
{
  "id": "chatcmpl-8ABC123",
  "object": "chat.completion",
  "created": 1699649152,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 28,
    "total_tokens": 73
  }
}
```

### Feedback Tab

Analyze user feedback and quality metrics:

* **Average Rating**: Aggregate user satisfaction score
* **Feedback Count**: Total number of feedback entries
* **Response Rate**: Percentage of inferences with feedback
* **Trend Analysis**: Feedback trends over time
* **Boolean Metrics**: Thumbs up/down, helpful/not helpful
* **Rating Scales**: 1-5 star ratings for quality
* **Text Comments**: Detailed user feedback
* **Demonstrations**: User-provided improvements

#### Feedback Timeline

Track how user perception evolves:

```
Initial Rating: ★★★★☆ (4/5) - "Good response"
Follow-up:      ★★★☆☆ (3/5) - "Could be more detailed"
Final:          ★★★★★ (5/5) - "Perfect after clarification"
```

## Performance Metrics

### Real-time Monitoring

* **Current Average**: Real-time response time tracking
* **99th Percentile**: Worst-case performance monitoring
* **SLA Compliance**: Track against performance targets
* **Tokens per Hour**: Current usage rate
* **Cost Tracking**: Real-time cost accumulation
* **Efficiency Trends**: Token utilization patterns
* **Success Percentage**: Current success rate
* **Error Categories**: Breakdown of failure types
* **Recovery Metrics**: System resilience tracking

### Performance Analytics

#### Trend Analysis

Track key metrics over time to identify patterns and issues:

* **Response Time Trends**: Identify performance degradation
* **Usage Patterns**: Understand peak usage times
* **Cost Trends**: Monitor spending patterns
* **Quality Metrics**: Track user satisfaction over time

#### Comparative Analysis

Compare performance across different dimensions:

Compare different AI models on:

* Average response time
* Token efficiency
* Cost per inference
* User satisfaction ratings

Analyze performance by:

* Hour of day patterns
* Day of week variations
* Monthly trends
* Seasonal patterns

Cross-project insights:

* Resource utilization
* Performance variations
* Cost distribution
* Usage patterns

### Performance Optimization

#### Bottleneck Identification

Automatically identify performance issues:

**High Latency Alert**: 15% of inferences exceeded 5s response time in the last hour

**Cost Optimization**: Switching to a smaller model for simple queries could reduce costs by 40%

#### Optimization Recommendations

Based on performance data, receive actionable recommendations:

1. **Model Selection**: Suggest optimal models for different use cases
2. **Parameter Tuning**: Recommend `temperature` and `max_tokens` adjustments
3. **Caching Strategies**: Identify opportunities for response caching
4. **Load Balancing**: Optimize request distribution across endpoints

## Feedback Management

### Collecting User Feedback

The observability system integrates with user feedback collection:

#### Feedback Types

* **Boolean Metrics**: Yes/No, Helpful/Not Helpful
* **Rating Scales**: 1-5 stars, 1-10 satisfaction
* **Performance Ratings**: Speed, accuracy, relevance
* **Text Comments**: Open-ended user feedback
* **Improvement Suggestions**: User recommendations
* **Use Case Context**: How the response was used

#### Integration Points

* **API Endpoints**: Direct feedback submission via API
* **UI Components**: Built-in feedback widgets
* **Webhook Integration**: Real-time feedback notifications
* **Third-party Tools**: Integration with customer feedback platforms

### Feedback Analysis

#### Sentiment Analysis

Automatically analyze text feedback for sentiment and themes:

* **Positive Sentiment**: Identify what users appreciate most
* **Negative Sentiment**: Understand pain points and issues
* **Neutral Feedback**: Collect objective observations
* **Theme Extraction**: Identify common topics in feedback

#### Quality Metrics

Track key quality indicators:

* How often responses are factually correct and relevant
* How useful responses are for user goals
* How easy responses are to understand

### Feedback-Driven Improvements

#### Model Fine-tuning

Use feedback data to improve models:

* **Training Data Generation**: Convert feedback into training examples
* **Parameter Optimization**: Adjust model parameters based on feedback
* **Prompt Engineering**: Improve system prompts using feedback insights

#### System Optimization

Optimize the entire inference pipeline:

* **Response Filtering**: Remove low-quality responses before delivery
* **Confidence Scoring**: Add confidence indicators to responses
* **Fallback Strategies**: Implement better fallback options

## Data Export

### Export Formats

CSV is well suited to spreadsheet analysis and reporting:

```csv
inference_id,timestamp,model_name,prompt_preview,response_preview,input_tokens,output_tokens,response_time_ms,cost,is_success
550e8400-e29b-41d4-a716-446655440001,2024-01-15T10:30:00Z,gpt-4,"What is the capital...","The capital of France...",45,73,1234,0.0045,true
```

**Use Cases:**

* Excel/Google Sheets analysis
* Business intelligence tools
* Custom dashboard creation

JSON provides structured data for programmatic processing:

```json
{
  "export_metadata": {
    "timestamp": "2024-01-15T12:00:00Z",
    "filters_applied": {...},
    "total_records": 1523
  },
  "inferences": [
    {
      "inference_id": "550e8400-e29b-41d4-a716-446655440001",
      "timestamp": "2024-01-15T10:30:00Z",
      "model": {
        "id": "model-789",
        "name": "gpt-4",
        "provider": "openai"
      },
      "performance": {
        "input_tokens": 45,
        "output_tokens": 73,
        "response_time_ms": 1234,
        "cost": 0.0045
      },
      "content": {
        "messages": [...],
        "output": "..."
      }
    }
  ]
}
```

**Use Cases:**

* API integration
* Data warehouse ingestion
* Machine learning pipelines

### Export Options

#### Filtered Exports

Exports respect all current filters:

* **Date Range**: Export only data from selected time period
* **Performance Filters**: Export high-latency or failed inferences
* **Model Filters**: Export data for specific models or endpoints
* **Content Filters**: Export inferences matching text search

#### Scheduled Exports

Automate data export for regular analysis:

* **Daily Reports**: Automated daily performance summaries
* **Weekly Analytics**: Comprehensive weekly analysis reports
* **Monthly Insights**: Monthly trends and insights reports
* **Custom Schedules**: Configure exports for specific needs

### Integration Capabilities

#### Analytics Platforms

Integrate with popular analytics tools:

* Tableau integration
* Power BI connectors
* Looker dashboards
* Custom BI tools
* Snowflake integration
* BigQuery exports
* Redshift compatibility
* Custom database connectors

#### API Integration

Use the inference API for custom integrations:

```python
import requests

# Fetch inference data programmatically
response = requests.post(
    'https://api.bud.studio/api/v1/metrics/inferences/list',
    headers={'Authorization': 'Bearer YOUR_TOKEN'},
    json={
        'project_id': 'your-project-id',
        'from_date': '2024-01-01T00:00:00Z',
        'limit': 1000
    },
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body

inferences = response.json()['items']
```

## API Reference

### Authentication

All API endpoints require authentication via Bearer token:

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"project_id": "your-project-id"}' \
  https://api.bud.studio/api/v1/metrics/inferences/list
```

### List Inferences Endpoint

**POST** `/api/v1/metrics/inferences/list`

Retrieve paginated inference data with filtering options.
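As a minimal sketch, the loop below pages through the list endpoint until a short page signals the end of the results. The `items` response key and the `project_id` and `limit` fields come from the Python example under API Integration; the `offset` field name, the `fetch_page` callable, and both function names are assumptions made for illustration, since the exact request schema is not shown here.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical type: sends one request body to the list endpoint and
# returns the decoded "items" list for that page.
FetchPage = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def iter_inferences(
    fetch_page: FetchPage,
    project_id: str,
    page_size: int = 50,
    **filters: Any,
) -> Iterator[Dict[str, Any]]:
    """Yield inferences page by page until the server returns a short page."""
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    offset = 0
    while True:
        body = {
            "project_id": project_id,
            "offset": offset,  # assumed field name for the pagination offset
            "limit": page_size,
            **filters,
        }
        items = fetch_page(body)
        yield from items
        if len(items) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += len(items)


def make_http_fetcher(token: str, base_url: str = "https://api.bud.studio") -> FetchPage:
    """Build a fetch_page callable that POSTs JSON to the list endpoint."""
    url = f"{base_url}/api/v1/metrics/inferences/list"

    def fetch_page(body: Dict[str, Any]) -> List[Dict[str, Any]]:
        request = urllib.request.Request(
            url,
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)["items"]

    return fetch_page
```

Passing the fetcher in as an argument keeps the paging logic testable without network access. A call such as `iter_inferences(make_http_fetcher("YOUR_TOKEN"), "your-project-id", page_size=100)` would stream results lazily, assuming the server honors the assumed `offset` field.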
Request parameters:

* Filter inferences by project ID
* Start date in ISO 8601 format (e.g., "2024-01-01T00:00:00Z")
* End date in ISO 8601 format
* Filter by success status
* Minimum total token count
* Maximum total token count
* Maximum response time in milliseconds
* Sort field: "timestamp", "tokens", "latency", or "cost"
* Sort order: "asc" or "desc"
* Pagination offset (default: 0)
* Page size, max 1000 (default: 50)

### Get Inference Details Endpoint

**GET** `/api/v1/metrics/inferences/{inference_id}`

Retrieve complete details for a single inference.

### Get Inference Feedback Endpoint

**GET** `/api/v1/metrics/inferences/{inference_id}/feedback`

Retrieve all feedback associated with an inference.

## Security & Privacy

### Data Protection

* **Project-level Isolation**: Users see only their project data
* **Role-based Permissions**: Different access levels by user role
* **API Key Management**: Secure token-based authentication
* **Content Sanitization**: Sensitive data automatically masked
* **Audit Logging**: All data access logged for compliance
* **Data Retention**: Configurable retention policies

### Compliance Features

* **GDPR Compliance**: Right to deletion and data portability
* **SOC 2 Type II**: Certified security controls
* **HIPAA Ready**: Healthcare data protection capabilities
* **Custom Compliance**: Configurable for industry-specific requirements

## Troubleshooting

### Common Issues

Symptoms:

* Empty inference list
* Loading spinner never stops
* Error messages in browser console

Steps to try:

1. **Check Project Selection**: Ensure the correct project is selected
2. **Verify Date Range**: Confirm date filters include the expected time period
3. **Authentication**: Refresh the browser or log in again if the session expired
4. **Network Issues**: Check network connectivity and try again

Symptoms:

* Slow page loading
* Timeouts on large data requests
* Browser becomes unresponsive

Steps to try:

1. **Reduce Date Range**: Query smaller time periods
2. **Add Filters**: Use filters to reduce dataset size
3. **Pagination**: Use smaller page sizes (25-50 items)
4. **Browser Resources**: Close other tabs, check browser memory

Symptoms:

* Export files not downloading
* Incomplete export data
* Corrupted file formats

Steps to try:

1. **Browser Settings**: Check popup blocker and download settings
2. **File Size Limits**: Reduce export size if too large
3. **Format Selection**: Try a different export format (CSV vs JSON)
4. **Connection Stability**: Ensure a stable internet connection during export

### Support Resources

* **Documentation**: Complete API and UI documentation
* **Support Chat**: In-app support for immediate assistance
* **Community Forum**: Community-driven troubleshooting
* **Enterprise Support**: Dedicated support for enterprise customers

## Getting Started

### Quick Start Guide

1. **Navigate to Project**: Select your project from the dashboard
2. **Open Inferences Tab**: Click the "Inferences" tab in project navigation
3. **Explore Data**: Browse recent inferences to understand the interface
4. **Apply Filters**: Use date range and other filters to focus on relevant data
5. **View Details**: Click any inference to see detailed information
6. **Export Data**: Use export features to analyze data externally

### Best Practices

* Set up regular monitoring schedules
* Define performance baselines and alerts
* Track key metrics consistently
* Review trends weekly
* Use filtering to identify bottlenecks
* Analyze high-cost inferences regularly
* Monitor user feedback for quality issues
* Export data for deeper analysis

### Advanced Usage

* **Custom Dashboards**: Create project-specific monitoring dashboards
* **Automated Alerts**: Set up notifications for performance issues
* **Integration Workflows**: Connect with existing analytics pipelines
* **Team Collaboration**: Share filtered views and insights with team members

The Observability feature in Bud Runtime provides comprehensive visibility into your AI model performance, enabling data-driven optimization and ensuring high-quality user experiences across all your AI applications.
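As one way to act on the practices above, the following sketch summarizes a JSON export per model: request count, mean latency, 99th-percentile latency, and total cost. The field names (`inferences`, `model.name`, `performance.response_time_ms`, `performance.cost`) follow the JSON export example in this document; the function names and the nearest-rank percentile method are illustrative choices, not part of Bud Runtime.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_export(export: Dict[str, Any]) -> Dict[str, Dict[str, float]]:
    """Group exported inferences by model name and compute basic statistics."""
    latencies: Dict[str, List[float]] = defaultdict(list)
    costs: Dict[str, float] = defaultdict(float)

    for record in export["inferences"]:
        name = record["model"]["name"]
        performance = record["performance"]
        latencies[name].append(performance["response_time_ms"])
        costs[name] += performance["cost"]

    return {
        name: {
            "count": len(values),
            "mean_latency_ms": sum(values) / len(values),
            "p99_latency_ms": percentile(values, 99),
            "total_cost_usd": round(costs[name], 6),
        }
        for name, values in latencies.items()
    }
```

Given an export saved to a file, `summarize_export(json.load(open_file))` returns one entry per model, which can then be compared against a latency or cost baseline of your choosing.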