
Content Filtering

Content filtering in Bud Stack uses probes to detect various types of harmful, sensitive, or policy-violating content. This guide covers the available detection categories across supported providers and how to create profiles that combine probes for comprehensive protection.

Detection Categories by Provider

Different providers offer different detection capabilities. Choose providers based on the categories you need.

Bud Sentinel Categories

Bud Sentinel provides detection for:
Category | Description
Prompt Injection | Attempts to manipulate model behavior through crafted inputs
Jailbreak Attempts | Efforts to bypass model safety guidelines
PII Detection | Names, addresses, phone numbers, SSNs, and other personal identifiers
Toxicity | Harmful, offensive, or inappropriate language
Malicious Instructions | Requests for harmful actions or dangerous information

Bud Sentinel probes sync automatically every 7 days. Available probes may expand as new detection capabilities are added.

OpenAI Moderation Categories

OpenAI’s moderation models (text-moderation-latest, omni-moderation-latest) detect:
Category | Description
Hate | Content targeting groups based on protected characteristics
Harassment | Threatening or bullying content
Self-Harm | Content promoting or instructing self-injury
Sexual | Explicit or suggestive sexual material
Violence | Graphic violence or threats
Harassment/Threatening | Harassment with violent or threatening intent

Azure Content Safety Categories

Azure Content Safety (azure-content-safety-text) provides:
Category | Description
Violence | Violent content and threats
Self-Harm | Self-injury related content
Sexual | Adult and explicit content
Hate | Hate speech and discrimination
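
To make the provider choice concrete, here is a minimal Python sketch that looks up which providers cover a set of required categories. The category names come from the lists in this guide; the provider keys (`bud-sentinel`, `openai-moderation`, `azure-content-safety`) and the `providers_for` helper are illustrative names, not identifiers from the Bud Stack API.

```python
# Categories per provider, copied from the lists in this guide.
# The dictionary keys are hypothetical labels chosen for this sketch.
PROVIDER_CATEGORIES = {
    "bud-sentinel": {
        "prompt injection", "jailbreak attempts", "pii detection",
        "toxicity", "malicious instructions",
    },
    "openai-moderation": {
        "hate", "harassment", "self-harm", "sexual",
        "violence", "harassment/threatening",
    },
    "azure-content-safety": {"violence", "self-harm", "sexual", "hate"},
}


def providers_for(needed):
    """Map each needed category to the sorted list of providers covering it."""
    wanted = sorted({category.lower() for category in needed})
    return {
        category: sorted(
            provider
            for provider, categories in PROVIDER_CATEGORIES.items()
            if category in categories
        )
        for category in wanted
    }
```

A category that maps to an empty list is not covered by any provider listed here.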

Understanding Probes

Each probe contains:
  • Name & Description: What the probe detects
  • Guard Types: Whether it applies to input, output, or both
  • Scanner Types: The detection methodology used
  • Modality Types: Content types it can analyze (text, image, etc.)
  • Rules: Specific detection patterns within the probe
  • Examples: Sample content the probe would flag

Probe Status

Status | Description
active | Probe is available for use
disabled | Probe exists but is not available
deleted | Probe has been removed
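
The probe fields and statuses above could be modeled roughly as follows. This is a sketch only: the attribute names and the `usable_probes` helper are assumptions made for illustration, and the actual API schema may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class ProbeStatus(Enum):
    ACTIVE = "active"      # available for use
    DISABLED = "disabled"  # exists but is not available
    DELETED = "deleted"    # has been removed


@dataclass
class Probe:
    # Field names are hypothetical; they mirror the list in this guide.
    name: str
    description: str
    guard_types: list      # e.g. ["input"], ["output"], or both
    scanner_types: list
    modality_types: list   # e.g. ["text"], ["image"]
    rules: list = field(default_factory=list)
    examples: list = field(default_factory=list)
    status: ProbeStatus = ProbeStatus.ACTIVE


def usable_probes(probes, guard_type):
    """Return the active probes that apply to the given guard type."""
    return [
        probe
        for probe in probes
        if probe.status is ProbeStatus.ACTIVE and guard_type in probe.guard_types
    ]
```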

Creating Profiles

Profiles combine multiple probes into a reusable guardrail configuration.

Profile Structure

A profile includes:
  • Name: Descriptive identifier for the profile
  • Description: Explanation of the profile’s purpose
  • Project (optional): Associate with a specific project
  • Severity Threshold (optional): Default threshold for all probes, from 0.0 to 1.0
  • Guard Types: Which stages to apply (input, output, or both)
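
As a rough illustration, a profile body with these fields could be assembled and validated before it is sent. The JSON key names (`name`, `description`, `project_id`, `severity_threshold`, `guard_types`) are assumptions for this sketch; check the Create Profile API reference for the real schema.

```python
VALID_GUARD_TYPES = {"input", "output"}


def build_profile(name, description, guard_types,
                  project_id=None, severity_threshold=None):
    """Build a profile payload; key names are hypothetical."""
    unknown = set(guard_types) - VALID_GUARD_TYPES
    if not guard_types or unknown:
        raise ValueError("guard_types must be a non-empty subset of input/output")
    if severity_threshold is not None and not 0.0 <= severity_threshold <= 1.0:
        raise ValueError("severity_threshold must be between 0.0 and 1.0")

    payload = {
        "name": name,
        "description": description,
        "guard_types": sorted(set(guard_types)),
    }
    # Optional fields are included only when the caller sets them.
    if project_id is not None:
        payload["project_id"] = project_id
    if severity_threshold is not None:
        payload["severity_threshold"] = severity_threshold
    return payload
```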

Adding Probes to Profiles

After creating a profile, add probes to define what content gets filtered:
  • Select Probes: Choose from available probes based on your filtering needs
  • Set Thresholds: Override the profile-level severity threshold for specific probes
  • Configure Guard Types: Override when specific probes run (input/output)

Configuring Rules

Each probe contains multiple rules. Fine-tune your profile by:
  • Enabling/Disabling Rules: Turn off specific rules within a probe
  • Rule Thresholds: Set severity thresholds at the individual rule level
  • Rule Guard Types: Override guard types for specific rules
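
Thresholds can be set at three levels: profile, probe, and rule. The sketch below shows one plausible way to resolve the threshold that applies to a rule, assuming the most specific setting wins (rule, then probe, then profile). That precedence is inferred from the word "override" in this guide and is not confirmed behavior; `effective_threshold` is a hypothetical helper.

```python
from typing import Optional


def effective_threshold(profile: Optional[float] = None,
                        probe: Optional[float] = None,
                        rule: Optional[float] = None) -> Optional[float]:
    """Return the most specific threshold that is set, or None if none is.

    Assumed precedence: rule override, then probe override, then profile default.
    """
    for value in (rule, probe, profile):
        if value is not None:
            if not 0.0 <= value <= 1.0:
                raise ValueError("thresholds must be between 0.0 and 1.0")
            return value
    return None
```

For example, with a profile default of 0.8 and a probe override of 0.5, a rule without its own threshold would resolve to 0.5 under this assumption.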

Profile APIs

  • Create Profile (POST /guardrails/profiles) - Create a new profile
  • List Profiles (GET /guardrails/profiles) - View your profiles
  • Get Profile (GET /guardrails/profiles/{id}) - Retrieve profile details
  • Update Profile (PUT /guardrails/profiles/{id}) - Modify profile settings
  • Delete Profile (DELETE /guardrails/profiles/{id}) - Remove a profile
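
The following sketch builds HTTP requests for these endpoints with Python's standard library, without sending them. The base URL, the bearer-token header, and the `profile_request` helper are placeholders assumed for illustration; only the methods and paths come from the list above.

```python
import json
import urllib.request

BASE_URL = "https://bud.example.com"  # hypothetical host; replace with your deployment


def profile_request(method, profile_id=None, body=None, token="YOUR_API_TOKEN"):
    """Build (but do not send) a request for a profile endpoint."""
    path = "/guardrails/profiles"
    if profile_id is not None:
        path += f"/{profile_id}"
    data = json.dumps(body).encode("utf-8") if body is not None else None

    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # The authentication scheme is an assumption; use whatever your gateway expects.
    request.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request
```

A built request can be sent with `urllib.request.urlopen(request)` once the host and credentials are real.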

Profile Probe APIs

  • Add Probes (POST /guardrails/profiles/{id}/probes) - Add probes to a profile
  • Update Probe (PUT /guardrails/profiles/{id}/probes/{probe_id}) - Modify probe settings in profile
  • Get Probe Rules (GET /guardrails/profiles/{id}/probes/{probe_id}/rules) - View rules with profile overrides
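
The three profile probe endpoints can be kept in a small routing table so that path parameters are filled in consistently. The operation keys and the `resolve_route` helper are hypothetical names; the methods and path templates are the ones listed above.

```python
# Operation name -> (HTTP method, path template), taken from the list above.
PROFILE_PROBE_ROUTES = {
    "add_probes": ("POST", "/guardrails/profiles/{id}/probes"),
    "update_probe": ("PUT", "/guardrails/profiles/{id}/probes/{probe_id}"),
    "get_probe_rules": ("GET", "/guardrails/profiles/{id}/probes/{probe_id}/rules"),
}


def resolve_route(operation, **path_params):
    """Return (method, path) with the template placeholders filled in.

    Raises KeyError if the operation is unknown or a path parameter is missing.
    """
    method, template = PROFILE_PROBE_ROUTES[operation]
    return method, template.format(**path_params)
```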

Combining Providers

You can create profiles that use probes from multiple providers for layered protection. For example:
  • Use Bud Sentinel for prompt injection and jailbreak detection
  • Add OpenAI Moderation for comprehensive harmful content filtering
  • Include Azure Content Safety if you need Azure compliance integration

The gateway executes probes according to your configured execution mode (parallel or sequential) and aggregates results from all providers.
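
To illustrate the difference between the two execution modes, here is a toy Python sketch. It is not the gateway's implementation: probes are modeled as plain callables that return a score between 0.0 and 1.0, a result is flagged when its score meets the threshold, and sequential mode is assumed to stop at the first flagged probe. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_probes(probes, text, threshold, mode="parallel"):
    """Run probes (name -> callable(text) -> score) and aggregate the results."""
    scores = {}
    if mode == "parallel":
        # Submit every probe at once, then collect all scores.
        with ThreadPoolExecutor() as pool:
            futures = {name: pool.submit(fn, text) for name, fn in probes.items()}
            scores = {name: future.result() for name, future in futures.items()}
    elif mode == "sequential":
        # Run probes in order; stopping at the first flag is an assumption.
        for name, fn in probes.items():
            scores[name] = fn(text)
            if scores[name] >= threshold:
                break
    else:
        raise ValueError(f"unknown execution mode: {mode}")

    flagged_by = [name for name, score in scores.items() if score >= threshold]
    return {"blocked": bool(flagged_by), "flagged_by": flagged_by, "scores": scores}
```

In this sketch, parallel mode always reports a score from every probe, while sequential mode may skip later probes once one has flagged the content.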

Next Steps