Advanced Routing with Rhai: Custom AI Scheduling Logic
Stop using static routing. Master rbee's Rhai scripting engine to implement intelligent routing for A/B testing, canary deployments, and GPU farm monetization. Premium feature included in Queen + Worker bundle (€279).
Advanced Rhai routing is included in Premium Queen (€129 lifetime) and Queen + Worker Bundle (€279 lifetime). Free tier includes basic round-robin routing only. Pre-launch pricing available now through Q2 2026.
Why Custom Routing? The Problem with Static Load Balancing
Basic routing (round-robin, least-loaded) works for simple setups, but production AI infrastructure needs intelligence. Rhai scripting lets you implement sophisticated routing that Ollama and vLLM can't match. Here's why custom routing matters:
rbee uses Rhai, a Rust-embedded scripting language, for custom routing logic. Unlike Ollama (no routing) or vLLM (basic load balancing), Rhai gives you complete control over how requests are distributed.
What is Rhai? A Powerful Scripting Engine for AI Orchestration
Rhai is a simple, fast scripting language designed for embedding in Rust applications. It has a JavaScript-like syntax and is sandboxed for security. rbee uses Rhai to give you complete control over how AI requests are routed across your multi-machine GPU cluster without the complexity of Kubernetes or vLLM.
// Rhai syntax is similar to JavaScript
let x = 10;
let y = 20;
let sum = x + y;
// Functions
fn greet(name) {
    return "Hello, " + name;
}
// Arrays and loops
let workers = [1, 2, 3, 4, 5];
for worker in workers {
    print(worker);
}

Basic Routing Script: Getting Started with Rhai
Create a routing script at ~/.config/rbee/routing.rhai. This is the foundation for all custom routing logic:
// Basic routing function
fn route_request(request, workers) {
    // Filter available workers
    let available = workers.filter(|w| w.status == "ready");

    // Return the first available worker
    return available[0];
}

Request Object: Understanding AI Request Metadata
The request object contains information about the incoming AI inference request. Use this data to make intelligent routing decisions:
// Request object structure
{
    model: "llama-3.1-8b",          // Model name
    model_size: 8_000_000_000,      // Model size in parameters
    user_id: "user@example.com",    // User identifier (if provided)
    priority: "normal",             // "low", "normal", "high"
    max_tokens: 2048,               // Max output tokens
    stream: true,                   // Streaming response?
    metadata: {                     // Custom metadata
        "tier": "premium",
        "region": "eu-west"
    }
}

Worker Object: GPU Worker Capabilities and Status
The workers array contains information about available GPU workers (LLM, Stable Diffusion, etc.). Each worker has metadata about its capabilities, load, and performance:
// Worker object structure
{
    id: "worker-gaming-pc",
    status: "ready",                 // "ready", "busy", "offline"
    backend: "CUDA",                 // "CUDA", "Metal", "ROCm", "CPU"
    vram: 24,                        // VRAM in GB
    vram_used: 8,                    // Currently used VRAM
    load: 0.3,                       // Current load (0.0 - 1.0)
    models: ["llama-3.1-8b", ...],   // Available models
    performance: 120,                // Tokens/sec benchmark
    cost_per_hour: 0.50,             // Estimated cost
    metadata: {                      // Custom metadata
        "region": "us-east",
        "tier": "premium"
    }
}

Example 1: A/B Testing AI Models
Route 10% of traffic to a new model version for safe experimentation:
fn route_request(request, workers) {
    // Generate random number 0-99
    let random = rand() * 100;

    // 10% of traffic goes to new model
    let target_model = if random < 10 {
        "llama-3.2-8b"  // New version
    } else {
        "llama-3.1-8b"  // Stable version
    };

    // Find workers with the target model
    let available = workers.filter(|w|
        w.status == "ready" && w.models.contains(target_model)
    );

    // Return least loaded worker
    return available.sort_by(|a, b| a.load < b.load)[0];
}

Example 2: Canary Deployment for AI Models
Gradually increase traffic to a new model version with zero downtime:
// Percentage of traffic sent to canary workers
const CANARY_PERCENT = 25; // Start with 25%
fn route_request(request, workers) {
    let random = rand() * 100;

    // Canary traffic
    if random < CANARY_PERCENT {
        let canary_workers = workers.filter(|w|
            w.metadata.get("canary") == true && w.status == "ready"
        );
        if !canary_workers.is_empty() {
            return canary_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Stable traffic
    let stable_workers = workers.filter(|w|
        w.metadata.get("canary") != true && w.status == "ready"
    );
    return stable_workers.sort_by(|a, b| a.load < b.load)[0];
}

Example 3: User-Based Routing for GPU Farm Monetization
Route premium customers to faster GPUs, free users to CPU. This is how GPU operators generate revenue:
fn route_request(request, workers) {
    let user_tier = request.metadata.get("tier");

    // Premium users get high-performance workers
    if user_tier == "premium" {
        let premium_workers = workers.filter(|w|
            w.backend == "CUDA" && w.performance > 100 && w.status == "ready"
        );
        if !premium_workers.is_empty() {
            return premium_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Free tier users get CPU or slower GPUs
    if user_tier == "free" {
        let free_workers = workers.filter(|w|
            (w.backend == "CPU" || w.performance < 50) && w.status == "ready"
        );
        if !free_workers.is_empty() {
            return free_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Default: any available worker
    return workers.filter(|w| w.status == "ready")[0];
}

Example 4: Cost Optimization with Intelligent Routing
Use cheaper CPU workers for simple queries, reserve expensive GPUs for complex tasks:
fn route_request(request, workers) {
    let max_tokens = request.max_tokens;

    // Simple queries (< 100 tokens): use CPU
    if max_tokens < 100 {
        let cpu_workers = workers.filter(|w|
            w.backend == "CPU" && w.status == "ready"
        );
        if !cpu_workers.is_empty() {
            return cpu_workers[0];
        }
    }

    // Medium queries (100-500 tokens): use any GPU
    if max_tokens < 500 {
        let gpu_workers = workers.filter(|w|
            w.backend != "CPU" && w.status == "ready"
        );
        if !gpu_workers.is_empty() {
            // Sort by cost per hour (cheapest first)
            return gpu_workers.sort_by(|a, b|
                a.cost_per_hour < b.cost_per_hour
            )[0];
        }
    }

    // Large queries: use high-performance GPU
    let fast_workers = workers.filter(|w|
        w.performance > 100 && w.status == "ready"
    );
    return fast_workers.sort_by(|a, b| a.load < b.load)[0];
}

Example 5: Time-Based Routing for Energy Efficiency
Use cheaper CPU workers during off-peak hours, activate GPUs during peak demand:
fn route_request(request, workers) {
    // Get current hour (0-23)
    let hour = timestamp().hour();

    // Off-peak hours (midnight to 6am): use CPU to save power
    if hour >= 0 && hour < 6 {
        let cpu_workers = workers.filter(|w|
            w.backend == "CPU" && w.status == "ready"
        );
        if !cpu_workers.is_empty() {
            return cpu_workers[0];
        }
    }

    // Peak hours (9am to 5pm): use all available GPUs
    if hour >= 9 && hour < 17 {
        let gpu_workers = workers.filter(|w|
            w.backend != "CPU" && w.status == "ready"
        );
        if !gpu_workers.is_empty() {
            return gpu_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Default: first available worker
    return workers.filter(|w| w.status == "ready")[0];
}

Example 6: Region-Based Routing for Multi-Location Deployments
Route to workers in the same region for lower latency and better user experience:
fn route_request(request, workers) {
    let user_region = request.metadata.get("region");

    // Try to find a worker in the same region
    if user_region != () {
        let local_workers = workers.filter(|w|
            w.metadata.get("region") == user_region && w.status == "ready"
        );
        if !local_workers.is_empty() {
            return local_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Fallback to any available worker
    return workers.filter(|w| w.status == "ready")[0];
}

Example 7: Marketplace Worker Routing for Multi-Modal AI
Route requests to specific marketplace workers based on task type (LLM inference, image generation, etc.):
fn route_request(request, workers) {
    let request_type = request.metadata.get("type");

    // Route image generation to SD workers
    if request_type == "image" {
        let sd_workers = workers.filter(|w|
            w.id.starts_with("sd-worker-") && w.status == "ready"
        );
        if !sd_workers.is_empty() {
            return sd_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Route LLM inference to LLM workers
    if request_type == "llm" {
        let llm_workers = workers.filter(|w|
            w.id.starts_with("llm-worker-") && w.status == "ready"
        );
        if !llm_workers.is_empty() {
            return llm_workers.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Default: any available worker
    return workers.filter(|w| w.status == "ready")[0];
}

Testing Your Routing Script: Simulation and Benchmarking
Test your routing logic before deploying to production. rbee provides built-in simulation tools:
# Test routing script
rbee routing test --script ~/.config/rbee/routing.rhai
# Simulate requests
rbee routing simulate \
    --script ~/.config/rbee/routing.rhai \
    --requests 1000 \
    --report
# Output:
# Worker Distribution:
#   worker-gaming-pc: 450 requests (45%)
#   worker-mac-studio: 350 requests (35%)
#   worker-cpu: 200 requests (20%)
# Average latency: 120ms

Built-in Helper Functions for Rhai Routing Scripts
rbee provides powerful helper functions to simplify complex routing logic:
// Random number (0.0 - 1.0)
let r = rand();
// Current timestamp
let now = timestamp();
let hour = now.hour();
let day = now.day();
// Hash a string (for consistent routing)
let hash = hash_str("user@example.com");
let worker_index = hash % workers.len();
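Hash-based routing is how you keep a given user on the same worker across requests (for example, to reuse a warm cache). Here is a minimal sketch built only from the helpers and fields shown above (hash_str(), the workers array, request.user_id, and the "ready" filter); the fallback behaviour is an illustrative choice, not a fixed rbee convention:

// Sticky routing: the same user consistently lands on the same worker
fn route_request(request, workers) {
    let available = workers.filter(|w| w.status == "ready");

    // Fallback: if nothing is ready, queue on the first worker
    if available.is_empty() {
        return workers[0];
    }

    // Map the user to a stable index in the available worker list
    let hash = hash_str(request.user_id);
    return available[hash % available.len()];
}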
// Filter and sort
let available = workers.filter(|w| w.status == "ready");
let sorted = available.sort_by(|a, b| a.load < b.load);
// Array operations
let first = workers[0];
let last = workers[-1];
let count = workers.len();
let contains = workers.contains(some_worker);

Error Handling: Graceful Fallbacks in Routing Scripts
Always handle edge cases to ensure requests never fail. Implement graceful fallbacks:
fn route_request(request, workers) {
    // Check if any workers are available
    if workers.is_empty() {
        throw "No workers available";
    }

    let available = workers.filter(|w| w.status == "ready");

    // Fallback if no workers are ready
    if available.is_empty() {
        // Return first worker (will queue the request)
        return workers[0];
    }

    // Your routing logic here
    return available.sort_by(|a, b| a.load < b.load)[0];
}

Performance Tips: Optimizing Rhai Routing for Low Latency
- Keep scripts simple: Complex logic adds latency to every request
- Use early returns: Exit as soon as you find a suitable worker (see the sketch after this list)
- Prefer filter/sort: Avoid loops when possible
- Test with rbee routing simulate: Benchmark before deploying to production
- Monitor routing metrics: Track worker distribution and latency impact
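As a concrete illustration of the early-return tip, here is a minimal sketch that skips the full least-loaded sort for short requests; the 100-token cutoff mirrors Example 4 and is purely illustrative:

fn route_request(request, workers) {
    let ready = workers.filter(|w| w.status == "ready");

    // Early return: short requests don't need full scoring
    if request.max_tokens < 100 && !ready.is_empty() {
        return ready[0];
    }

    // Everything else gets the least-loaded sort
    return ready.sort_by(|a, b| a.load < b.load)[0];
}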
Debugging Rhai Routing Scripts: Logging and Monitoring
Enable debug logging to see routing decisions and troubleshoot issues:
# In ~/.config/rbee/config.toml
[routing]
script = "~/.config/rbee/routing.rhai"
debug = true  # Log routing decisions
# View logs
tail -f ~/.config/rbee/logs/routing.log

Best Practices for Production Rhai Routing
- Start simple: Begin with basic routing, add complexity as needed
- Test thoroughly: Use rbee routing simulate before deploying
- Monitor metrics: Track worker utilization and request latency
- Have fallbacks: Always return a worker, even if conditions aren't met
- Document your logic: Add comments explaining routing decisions (a combined sketch follows this list)
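Put together, a brief sketch of what these practices can look like in a single documented script. It only recombines the tier routing from Example 3 with the fallback pattern from the error-handling section, so treat it as a starting template rather than a prescribed implementation:

// routing.rhai - documented production routing
// Goal: premium users on fast CUDA workers, everyone else least-loaded,
// with a guaranteed fallback so no request is ever dropped.
fn route_request(request, workers) {
    let ready = workers.filter(|w| w.status == "ready");

    // Fallback first: if nothing is ready, queue on the first worker
    if ready.is_empty() {
        return workers[0];
    }

    // Premium tier gets the fast CUDA pool (see Example 3)
    if request.metadata.get("tier") == "premium" {
        let fast = ready.filter(|w| w.backend == "CUDA" && w.performance > 100);
        if !fast.is_empty() {
            return fast.sort_by(|a, b| a.load < b.load)[0];
        }
    }

    // Everyone else: least-loaded ready worker
    return ready.sort_by(|a, b| a.load < b.load)[0];
}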
Use Rhai routing to monetize your GPU farm. Route premium customers to your fastest GPUs, free users to CPU, and sell excess capacity on the marketplace. This is how GPU operators generate recurring revenue with rbee.
Why Rhai Beats the Competition
- Ollama: No routing. Single machine only. Can't implement custom logic.
- vLLM: Basic load balancing only. Requires Kubernetes. No custom routing.
- rbee (Premium): Full Rhai scripting. Multi-machine. Unlimited custom logic. No Kubernetes.
- Cloud APIs: Fixed routing. Vendor lock-in. $1,500-3,000/month for the same workload.
Ready to master production AI routing?
Advanced Rhai routing is included in Premium Queen (€129 lifetime) and Queen + Worker Bundle (€279 lifetime). Pre-launch pricing available now. After Q2 2026, pricing moves to monthly subscription.