
Luis Majano

April 03, 2026


BoxLang AI 3.0 Series · Part 4 of 7


Here's the question every team eventually asks about their AI agents: how do we test these things?

Agents make live LLM calls. They invoke real tools. They have non-deterministic outputs. Standard unit testing approaches fall apart. You can't mock every provider. You can't replay a conversation from three weeks ago. You can't confidently tell stakeholders that the agent you deployed today behaves the same way it did when you signed off on it.

And before testing, there's production: how do you add logging without touching provider code? How do you retry transient failures without wrapping every call? How do you block dangerous tool invocations without forking the agent logic?

BoxLang AI 3.0 solves all of this with middleware. Six battle-tested middleware classes ship out of the box, covering the most common cross-cutting concerns. And if none of them fit your use case exactly, writing your own is a matter of extending one class or defining a struct of closures.


πŸ—οΈ The Middleware Architecture

Middleware sits between the agent's run() call and the actual LLM invocations and tool calls. Both AiModel and AiAgent support it. When an agent runs, its middleware is prepended to the model's middleware — agent hooks fire first, then model hooks.

There are two hook styles:

Sequential hooks — fire in registration order (or reverse for after* hooks). Return AiMiddlewareResult to control flow.

| Hook | Fires | Direction |
|------|-------|-----------|
| beforeAgentRun( context ) | Before agent starts | Forward |
| afterAgentRun( context ) | After agent completes | Reverse |
| beforeLLMCall( context ) | Before each LLM call | Forward |
| afterLLMCall( context ) | After each LLM call | Reverse |
| beforeToolCall( context ) | Before each tool invocation | Forward |
| afterToolCall( context ) | After each tool returns | Reverse |
| onError( context ) | When any hook throws | — |

Wrap hooks — nested closures. Call handler() to proceed, then intercept the result.

| Hook | Purpose |
|------|---------|
| wrapLLMCall( context, handler ) | Surround each LLM provider call |
| wrapToolCall( context, handler ) | Surround each tool invocation |

🎯 AiMiddlewareResult — Typed Flow Control

Every sequential hook must return an AiMiddlewareResult. The static factory methods make this expressive:

import bxModules.bxai.models.middleware.AiMiddlewareResult;

// Continue normally β€” chain proceeds
return AiMiddlewareResult.continue()

// Stop everything immediately
return AiMiddlewareResult.cancel( "Rate limit exceeded for this tenant." )

// Human approved β€” used by HITL middleware
return AiMiddlewareResult.approve()

// Human rejected β€” terminal, stops the chain
return AiMiddlewareResult.reject( "Operator rejected: amounts over $1000 require VP approval." )

// Human edited the tool args β€” patched args flow to the tool
return AiMiddlewareResult.edit( { correctedArgs: { amount: 100 } } )

// Suspend for async human review β€” terminal
return AiMiddlewareResult.suspend( { toolName: "transferFunds", args: toolArgs } )

Terminal results (cancel, reject, suspend) stop the chain immediately. Non-terminal results continue to the next middleware.

// Predicates for checking results
result.isContinue()   // chain continues
result.isCancelled()  // was stopped
result.isApproved()   // human approved
result.isRejected()   // human rejected (terminal)
result.isEdit()       // args were modified
result.isSuspended()  // waiting for async input (terminal)
result.isTerminal()   // cancelled OR rejected OR suspended
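
These building blocks combine naturally in custom hooks. As an illustration, the sketch below uses edit() to clamp a hypothetical transferFunds amount. The hook shape and factory methods come from the API above; the context field holding the parsed arguments (args here) is an assumption, not a documented key:

```boxlang
// Hedged sketch: "args" as the parsed-arguments key is an assumption
agent.withMiddleware( {
    beforeToolCall: ( ctx ) => {
        var toolName = ctx.tool?.getName() ?: ""
        if ( toolName == "transferFunds" && ( ctx.args?.amount ?: 0 ) > 1000 ) {
            // Patched args flow to the tool instead of the originals
            return AiMiddlewareResult.edit( { correctedArgs: { amount: 1000 } } )
        }
        return AiMiddlewareResult.continue()
    }
} )
```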

πŸ“ LoggingMiddleware β€” Instant Observability

Drop this in and every LLM call, tool invocation, agent run start/end, and error gets logged to BoxLang's ai log file and optionally to the console — with zero code changes to your agents:

agent = aiAgent(
    name       : "support-bot",
    middleware : new LoggingMiddleware(
        logToConsole : true,
        logLevel     : "info",
        prefix       : "[SupportBot]"
    )
)

The implementation is a clean example of how sequential hooks compose:

// From LoggingMiddleware.bx
AiMiddlewareResult function beforeAgentRun( required struct context ) {
    emit( "Agent run starting | input: #left( toString( context.input ), 120 )#" )
    return AiMiddlewareResult.continue()
}

AiMiddlewareResult function afterToolCall( required struct context ) {
    var toolName = context.tool?.getName() ?: "unknown"
    emit( "Tool call complete | tool: #toolName# | result: #left( toString( context.result ), 120 )#" )
    return AiMiddlewareResult.continue()
}

AiMiddlewareResult function onError( required struct context ) {
    emit( "Error in phase '#context.phase#': #context.error?.message#", "error" )
    return AiMiddlewareResult.continue()  // don't stop the chain on logging errors
}

Options:

| Option | Default | Description |
|--------|---------|-------------|
| logToFile | true | Write to BoxLang ai log |
| logToConsole | false | Also print to stdout |
| logLevel | "info" | info, debug, warning, error |
| prefix | "[AI Middleware]" | Prepended to every message |

πŸ” RetryMiddleware β€” Resilience Without Boilerplate

LLM providers have rate limits. Networks have transient failures. RetryMiddleware wraps both LLM calls and tool calls with exponential backoff — transparently, without any code in your tools or agents:

agent = aiAgent(
    name       : "analyst",
    middleware : new RetryMiddleware(
        maxRetries        : 5,
        initialDelay      : 2000,
        backoffMultiplier : 1.5,
        maxDelay          : 30000
    )
)

It uses wrapLLMCall and wrapToolCall hooks — the outer wrap catches exceptions, sleeps, and retries up to maxRetries times. Non-retryable exceptions (like InvalidInput or MaxInteractionsExceeded) surface immediately:

| Option | Default | Description |
|--------|---------|-------------|
| maxRetries | 3 | Attempts after first failure |
| initialDelay | 1000 | First retry delay in ms |
| backoffMultiplier | 2 | Multiplier applied per failure |
| maxDelay | 30000 | Hard cap on delay |
| nonRetryableTypes | "InvalidInput,MaxInteractionsExceeded" | Exception types to skip |
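
Conceptually, the wrap hook reduces to a retry loop around handler(). This is a simplified sketch of the pattern, not the shipped implementation (the option values are inlined and the non-retryable type check is omitted):

```boxlang
// Simplified sketch of exponential backoff in a wrap hook
agent.withMiddleware( {
    wrapLLMCall: ( ctx, handler ) => {
        var delay      = 1000    // initialDelay
        var maxRetries = 3
        for ( var attempt = 0; attempt <= maxRetries; attempt++ ) {
            try {
                return handler()    // proceed with the real LLM call
            } catch ( any e ) {
                if ( attempt == maxRetries ) {
                    rethrow;        // out of attempts, surface the error
                }
                sleep( delay )                   // back off before retrying
                delay = min( delay * 2, 30000 )  // backoffMultiplier 2, capped at maxDelay
            }
        }
    }
} )
```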

πŸ›‘οΈ GuardrailMiddleware β€” Defense in Depth

Block dangerous tools entirely, or reject tool calls whose arguments match regex patterns — before they ever reach the tool:

guardrail = new GuardrailMiddleware(
    blockedTools : [ "deleteRecord", "dropTable", "truncateAll" ],
    argPatterns  : {
        runSql  : [ { query: "(?i)drop|truncate|delete" } ],
        sendMail: [ { to: "@competitor\\.com$" } ]
    }
)

agent = aiAgent( name: "db-assistant", middleware: guardrail )

The hook fires in beforeToolCall — it checks the tool name against blockedTools first, then validates each argument against the configured regex patterns:

// From GuardrailMiddleware.bx
AiMiddlewareResult function beforeToolCall( required struct context ) {
    var toolName = context.toolName ?: (context.tool?.getName() ?: "")

    // 1. Blocked tool list check
    if ( variables.blockedTools.findNoCase( toolName ) > 0 ) {
        return AiMiddlewareResult.reject(
            "GuardrailMiddleware: tool '#toolName#' is in the blocked tools list."
        )
    }

    // 2. Argument pattern checks
    if ( variables.argPatterns.keyExists( toolName ) ) {
        // ... check each rule against the resolved tool arguments
    }

    return AiMiddlewareResult.continue()
}

| Option | Default | Description |
|--------|---------|-------------|
| blockedTools | [] | Tool names always rejected (case-insensitive) |
| argPatterns | {} | { toolName: [{ paramName: "regex" }] } |
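
The elided argument check loops over the configured rules and rejects on the first regex hit. A hedged sketch of that idea, assuming the resolved arguments are available as a struct (toolArgs below is an illustrative stand-in, not the real variable name):

```boxlang
// Illustrative sketch of the pattern check; toolArgs is a stand-in
// for however the middleware resolves the tool's arguments.
for ( var rule in variables.argPatterns[ toolName ] ) {
    for ( var paramName in rule ) {
        var value = toString( toolArgs[ paramName ] ?: "" )
        if ( reFind( rule[ paramName ], value ) > 0 ) {
            return AiMiddlewareResult.reject(
                "GuardrailMiddleware: argument '#paramName#' of '#toolName#' matched a blocked pattern."
            )
        }
    }
}
```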

🙋 HumanInTheLoopMiddleware — Keeping Humans in Control

This middleware intercepts specific tool calls and requires a human to approve, reject, or edit before execution proceeds. Two modes, two very different use cases.

CLI mode — blocks on stdin. Perfect for local scripts, automation tools, and development workflows:

agent = aiAgent(
    name       : "finance-bot",
    middleware : new HumanInTheLoopMiddleware(
        toolsRequiringApproval : [ "transferFunds", "placeOrder" ],
        showArguments          : true
    )
)

When the LLM calls transferFunds, the terminal shows:

╔══════════════════════════════════════════════════╗
║         HUMAN APPROVAL REQUIRED                  ║
╚══════════════════════════════════════════════════╝
 Tool: transferFunds
 Args: {"amount": 5000, "account": "12345"}

 [A]pprove  [R]eject  [Q]uit
 Decision:

Web mode — suspends the run and returns an AiMiddlewareResult.suspend(). The calling code checkpoints state and presents the approval request asynchronously — via email, Slack, a web UI, whatever fits your workflow:

agent = aiAgent(
    name        : "finance-bot",
    middleware  : new HumanInTheLoopMiddleware(
        mode                  : "web",
        toolsRequiringApproval: [ "placeOrder" ]
    ),
    checkpointer: aiMemory( "cache" )
)

// First request β€” the LLM wants to place an order
result = agent.run( "Order 50 units of product SKU-789", {}, { userId: "alice" } )

if ( result.isSuspended() ) {
    // Send approval request to alice's manager via Slack, email, etc.
    notifyManager( result.getData() )
    // Store threadId for resume
    session.pendingApproval = result.getData().threadId
}

// After the manager approves (in a separate request/thread)
agent.resume( "approve", session.pendingApproval )

// Or if they edit the quantity
agent.resume( "edit", session.pendingApproval, { correctedArgs: { quantity: 10 } } )

The resume path in HumanInTheLoopMiddleware reads the _resumeContext injected by AiAgent.resume(), honours the decision, and either continues, rejects, or patches the tool arguments — then clears the context so subsequent tool calls in the same run go through the normal HITL flow again.

| Option | Default | Description |
|--------|---------|-------------|
| toolsRequiringApproval | [] | Tools needing sign-off |
| mode | "cli" | "cli" or "web" |
| showArguments | true | Show args in CLI prompt |
| approvalCallback | — | Custom approval function |

πŸŽ™οΈ FlightRecorderMiddleware β€” AI Testing Solved

This is the one that changes how you think about testing AI agents.

The problem: agent behaviour is non-deterministic. The LLM might phrase something differently each run. The tool call order might vary. Writing assertions against agent output directly is fragile. And running tests against live providers is slow, expensive, and requires network access in CI.

FlightRecorderMiddleware solves this with a record/replay approach. Record a real run once — capturing every LLM round-trip and tool invocation to a JSON fixture file. Then replay that fixture in CI without any live calls.

Three modes:

// RECORD — calls real providers and tools, saves every interaction
agent = aiAgent(
    name       : "weather-bot",
    middleware : new FlightRecorderMiddleware( mode: "record" )
)
agent.run( "What's the weather in London and should I bring an umbrella?" )
// → Writes: .ai/flight-recorder/weather-bot-20260402-143022.json

// REPLAY — zero live calls, fully deterministic
agent = aiAgent(
    name       : "weather-bot",
    middleware : new FlightRecorderMiddleware(
        mode        : "replay",
        fixturePath : "tests/fixtures/weather-bot.json"
    )
)
agent.run( "What's the weather in London and should I bring an umbrella?" )
// → Returns the exact same response as the recorded run

// PASSTHROUGH (default) — no recording, calls pass through normally
agent = aiAgent(
    name       : "weather-bot",
    middleware : new FlightRecorderMiddleware()  // mode: "passthrough"
)

The fixture format — human-readable JSON that you can inspect, edit, and commit to version control:

{
    "version": "1",
    "recordedAt": "2026-04-02T14:30:22",
    "agentName": "weather-bot",
    "interactions": [
        {
            "seq": 1,
            "type": "llm",
            "request": { "model": "gpt-4o", "messages": [...], "tools": [...] },
            "response": { "choices": [{ "message": { "tool_calls": [...] } }] }
        },
        {
            "seq": 2,
            "type": "tool",
            "toolName": "getWeather",
            "arguments": { "city": "London" },
            "result": "15°C, overcast, 80% chance of rain"
        },
        {
            "seq": 3,
            "type": "llm",
            "request": { ... },
            "response": { "choices": [{ "message": { "content": "Yes, bring an umbrella..." } }] }
        }
    ]
}

One implementation detail worth noting: the recorder flushes to disk after every interaction, not just at the end. This means if your agent crashes mid-run, the partial recording is preserved and can be inspected:

// From FlightRecorderMiddleware.bx
private void function _appendInteraction( required struct interaction ) {
    var seq = variables._tape.interactions.len() + 1
    arguments.interaction.seq = seq
    variables._tape.interactions.append( arguments.interaction )
    _saveSnapshot()  // flush after every interaction — crash-safe
}

Strict vs lenient replay:

// Strict (default): throw on type mismatch — "expecting llm but tape has tool"
new FlightRecorderMiddleware( mode: "replay", strict: true )

// Lenient: skip forward to find next matching interaction type
new FlightRecorderMiddleware( mode: "replay", strict: false )

| Option | Default | Description |
|--------|---------|-------------|
| mode | "passthrough" | "passthrough", "record", or "replay" |
| fixturePath | "" | Path to fixture file |
| fixtureDir | ".ai/flight-recorder" | Auto-generated fixture directory |
| recordTools | true | Whether to capture tool interactions |
| strict | true | Throw on type mismatch in replay |

🔒 MaxToolCallsMiddleware — Runaway Agent Prevention

Simple but essential in production — caps the total number of tool invocations per agent run:

agent = aiAgent(
    name       : "research-bot",
    middleware : new MaxToolCallsMiddleware( maxCalls: 10 )
)

The counter resets at the start of each new run() call. If the cap is hit mid-run, the chain is cancelled with a clear error message. Essential for preventing infinite tool call loops in complex multi-step reasoning tasks.
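
The same cap is easy to reproduce as a struct of closures, which shows how beforeAgentRun and beforeToolCall cooperate. This is a sketch built only from the hooks and factory methods shown in this post, not the shipped class:

```boxlang
// Sketch: a per-run tool call budget with two sequential hooks
var toolCallBudget = 10
var toolCallCount  = 0

agent.withMiddleware( {
    beforeAgentRun: ( ctx ) => {
        toolCallCount = 0    // the counter resets on every run()
        return AiMiddlewareResult.continue()
    },
    beforeToolCall: ( ctx ) => {
        toolCallCount++
        if ( toolCallCount > toolCallBudget ) {
            return AiMiddlewareResult.cancel( "Tool call budget of #toolCallBudget# exceeded for this run." )
        }
        return AiMiddlewareResult.continue()
    }
} )
```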


✍️ Writing Your Own Middleware

Two approaches, depending on how much structure you want.

Struct of closures — lightweight, no class needed:

agent.withMiddleware( {
    beforeToolCall: ( ctx ) => {
        if ( ctx.tool?.getName() == "dangerousTool" ) {
            return AiMiddlewareResult.cancel( "This tool is not allowed." )
        }
        return AiMiddlewareResult.continue()
    },

    wrapLLMCall: ( ctx, handler ) => {
        var start  = getTickCount()
        var result = handler()
        metricsService.record( "llm.latency", getTickCount() - start )
        return result
    },

    onError: ( ctx ) => {
        alertService.notify( "Agent error in #ctx.phase#: #ctx.error.message#" )
        return AiMiddlewareResult.continue()
    }
} )

Structs are automatically wrapped in StructMiddlewareAdapter — you only define the hooks you need.

Class-based — reusable, configurable, independently testable:

import bxModules.bxai.models.middleware.BaseAiMiddleware;
import bxModules.bxai.models.middleware.AiMiddlewareResult;

class extends="BaseAiMiddleware" {

    property name="tenantId" type="string";

    function init( required string tenantId ) {
        variables.tenantId = arguments.tenantId
        variables.name = "Tenant Audit Middleware"
        variables.description = "Logs all AI tool calls to the tenant audit trail"
        return this
    }

    AiMiddlewareResult function beforeToolCall( required struct context ) {
        auditLog.record(
            tenantId : variables.tenantId,
            tool     : context.tool?.getName() ?: "unknown",
            args     : context.toolCall?.function?.arguments ?: "{}"
        )
        return AiMiddlewareResult.continue()
    }

}

🚀 Composing Middleware

Middleware stacks compose cleanly — just pass an array:

agent = aiAgent(
    name       : "production-agent",
    middleware : [
        new LoggingMiddleware( logToConsole: false ),
        new RetryMiddleware( maxRetries: 3 ),
        new GuardrailMiddleware( blockedTools: [ "deleteRecord" ] ),
        new MaxToolCallsMiddleware( maxCalls: 15 ),
        new HumanInTheLoopMiddleware( toolsRequiringApproval: [ "placeOrder" ] )
    ]
)

Or fluently, one at a time:

agent
    .withMiddleware( new LoggingMiddleware() )
    .withMiddleware( new RetryMiddleware( maxRetries: 3 ) )
    .withMiddleware( new GuardrailMiddleware( blockedTools: [ "deleteRecord" ] ) )

In production, logging + retry + guardrails is the baseline stack. Add MaxToolCallsMiddleware for complex reasoning agents. Add HumanInTheLoopMiddleware for any agent touching money, data, or external systems. Use FlightRecorderMiddleware in record mode during QA and replay mode in CI.


What's Next

In Part 5, we continue the series with a deep dive into BoxLang AI's provider architecture — how the capability system works, how BaseService and OpenAIService are structured, how to add custom providers, and a tour of the full 17-provider ecosystem.

📖 Full Documentation · 📦 Install Today: install-bx-module bx-ai · 🫢 Professional Support
