
Assertion Reference

This page is a complete reference for all assertion types available in Mibo. For an overview of how to create test cases and use the Test Architect, see Creating Test Cases.

Mibo evaluates your AI system using assertions: checks that verify specific behaviors in the response. There are two kinds: rule-based (deterministic, yes/no) and AI-powered (scored by an AI evaluator).

All assertions work in both active and passive testing:

| Assertion | Active testing | Passive testing |
| --- | --- | --- |
| node_call | Automatic | Automatic, from trace data |
| json_match | Automatic | Automatic, from trace data |
| response_regex | Automatic | Automatic, from trace data |
| json_schema | Automatic | Automatic, from trace data |
| semantic | Automatic | Automatic, from trace data |
| http_status | Automatic (from the HTTP response) | Include http_status in your trace metadata |
| response_time | Automatic (measured by Mibo) | Include duration_ms in your trace metadata |
| token_limit | Automatic (from AI node outputs) | Automatic, extracted from AI node outputs in the trace |

In active testing, Mibo captures HTTP status, response time, and token usage automatically when it sends the request to your system.

In passive testing, Mibo extracts token usage from AI node outputs in the trace data automatically. For http_status and response_time, include them in the metadata field when you send the trace:

{
"data": { ... },
"metadata": {
"http_status": 200,
"duration_ms": 1500
}
}

Rule-based assertions

These are deterministic, yes/no checks. They always produce the same result for the same data.

node_call

Verify that your system called (or didn’t call) a specific node or tool.

{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "create_calendar",
"expected_arguments": {
"summary": "Team standup",
"date": {
"matcher": "exact",
"expression": "AT_TIME(DATE_ADD(TODAY(), 1, 'days'), '10:00')"
}
}
}

Options:

  • condition: MUST_CALL or MUST_NOT_CALL
  • expected_name: the tool/node name
  • expected_arguments: key-value pairs the tool should receive (supports matchers, see below)
  • expected_index: optionally verify call position (0 = first call, 1 = second, etc.)
  • strict: if true, no extra arguments are allowed beyond the expected ones
  • forbidden_arguments: argument names that must not appear in the call
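
These options can be combined in a single assertion. A minimal sketch, assuming a hypothetical send_invoice tool, that pins the call position, forbids extra arguments, and blocks a sensitive field:

```json
{
  "target": "node_call",
  "condition": "MUST_CALL",
  "expected_name": "send_invoice",
  "expected_index": 0,
  "strict": true,
  "forbidden_arguments": ["api_key"],
  "expected_arguments": {
    "customer_id": { "matcher": "any" }
  }
}
```

With strict set to true, a call that passes any argument other than customer_id fails, even if customer_id itself matches.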

The node_call assertion has two levels: node calls and tool calls. Understanding the difference is key to writing effective tests.

  • Node calls are workflow-level steps, the building blocks of your pipeline. In n8n, these are workflow nodes like “HTTP Request”, “AI Agent”, or “Code”. In Flowise, they’re the components of your chatflow. In Custom API, they’re whatever logical units your system exposes in its traces. A node_call assertion verifies that a specific node executed (or didn’t).

  • Tool calls are functions that an AI agent node invokes internally. For example, an “AI Agent” node might call search_web, create_booking, or calculate_rsi. These are the tools the LLM decides to use at runtime. Tool calls are asserted via expected_tool_calls inside a node_call.

Why the hierarchy? An AI Agent is a node in your workflow, but internally it can call multiple tools. By nesting tool assertions inside node assertions, you first verify the agent ran, then verify what it did. If the agent node didn’t execute, nested tool assertions fail automatically (there’s nothing to check).

Outside of n8n: The same logic applies to any platform. If your Custom API or Flowise system exposes traces with nodes and nested tool calls, Mibo can verify both levels. A node_call without expected_tool_calls simply verifies the node executed, useful for non-AI nodes like HTTP Request, Code, or Set nodes.
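
For example, a node-level check with no expected_tool_calls only verifies execution (the node name “HTTP Request” here is illustrative):

```json
{
  "target": "node_call",
  "condition": "MUST_CALL",
  "expected_name": "HTTP Request"
}
```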

Nested tool calls. If your system uses an AI agent node that internally calls tools, you can verify those with expected_tool_calls:

{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "AI Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "search_products",
"expected_index": 0
},
{
"condition": "MUST_CALL",
"expected_name": "create_order",
"expected_index": 1
},
{
"condition": "MUST_NOT_CALL",
"expected_name": "delete_records"
}
]
}

When checking arguments, you can use different match strategies:

| Matcher | Description | Example |
| --- | --- | --- |
| exact | Exact value match | { "matcher": "exact", "value": "Paris" } |
| contains | Substring match | { "matcher": "contains", "value": "onboarding" } |
| regex | Regular expression | { "matcher": "regex", "value": "^ORD-\\d{4}$" } |
| date | Date match with tolerance | { "matcher": "date", "expression": "TODAY()" } |
| one_of | Any of several values | { "matcher": "one_of", "variants": [...] } |
| any | Just check it exists | { "matcher": "any" } |

For dynamic values, use expressions instead of hardcoded values. Expressions are computed at execution time, so your tests stay valid regardless of when they run. You can also reference variables from the test case’s context field inside any expression.
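
For instance, assuming the test case’s context defines an ORDER_ID variable, an argument matcher can combine it with an expression:

```json
{ "matcher": "exact", "expression": "CONCAT('order-', ORDER_ID)" }
```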

You can also mark any argument as optional. If the actual value is missing or null, the check passes instead of failing. This is useful for arguments that your system doesn’t always include:

{ "matcher": "exact", "value": "en-US", "optional": true }

Below are complete test case examples for each matcher type.

exact: deep equality with unordered lists

Lists are compared as unordered sets, so ["vip", "urgent"] matches ["urgent", "vip"]. Use the wildcard "*" to match any value.

{
"scenario": "Tool returns tags in any order",
"input_utterance": "Tag this item as urgent and vip",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "set_tags",
"expected_arguments": {
"tags": ["vip", "urgent"]
}
}
]
}
],
"semantic": []
}
}
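
A sketch of the "*" wildcard, assuming a set_tags tool where one tag is pinned and the other may be any value:

```json
{
  "condition": "MUST_CALL",
  "expected_name": "set_tags",
  "expected_arguments": {
    "tags": ["urgent", "*"]
  }
}
```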

contains: substring match with security checks

Casts both values to strings before matching. Combine with forbidden_arguments to verify sensitive data isn’t leaked.

{
"scenario": "Support agent searches without leaking secrets",
"input_utterance": "Search our docs for onboarding steps",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "search",
"forbidden_arguments": ["api_key", "password", "secret"],
"expected_arguments": {
"query": { "matcher": "contains", "value": "onboarding" }
}
}
]
}
],
"semantic": []
}
}

regex: pattern validation

Useful for validating formats like emails, order numbers, or phone numbers.

{
"scenario": "Create ticket validates email format",
"input_utterance": "Open a support ticket for user john.doe@example.com",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "create_ticket",
"expected_arguments": {
"email": { "matcher": "regex", "value": "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$" }
}
}
]
}
],
"semantic": []
}
}

date: dynamic dates with tolerance

Use with expression for relative dates (like “tomorrow”) and tolerance_minutes when the exact timestamp may vary slightly.

{
"scenario": "Booking scheduled for tomorrow with tolerance",
"input_utterance": "Book an appointment for tomorrow at 09:00",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "create_booking",
"expected_arguments": {
"date": {
"matcher": "date",
"expression": "AT_TIME(DATE_ADD(TODAY, 1, 'day'), '09:00')",
"tolerance_minutes": 5
}
}
}
]
}
],
"semantic": []
}
}

one_of: accept multiple valid values

When the system may return different but equally valid values (e.g., “metric” vs “celsius” for temperature units).

{
"scenario": "Weather accepts multiple unit conventions",
"input_utterance": "Give me the weather for Lisbon",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "get_weather",
"expected_arguments": {
"units": {
"matcher": "one_of",
"variants": [
{ "type": "string", "pattern": "^(metric|si)$" },
{ "type": "string", "pattern": "^celsius$" }
]
}
}
}
]
}
],
"semantic": []
}
}

any: just require presence

Doesn’t check the value, only that the argument exists. Useful for generated IDs or timestamps.

{
"scenario": "Analytics report includes a request ID",
"input_utterance": "Generate a weekly report",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "generate_report",
"expected_arguments": {
"request_id": { "matcher": "any" }
}
}
]
}
],
"semantic": []
}
}

Expression functions

Mibo provides built-in functions you can use in matcher expressions. Functions can be nested: for example, AT_TIME(DATE_ADD(TODAY(), 1, 'days'), '09:00') builds “tomorrow at 9 AM” step by step.

Date & time

| Function | What it does | Example |
| --- | --- | --- |
| TODAY() | Current date | "expression": "TODAY()" |
| NOW() | Current date and time | "expression": "NOW()" |
| DATE_ADD(date, count, unit) | Adds time to a date | "expression": "DATE_ADD(TODAY(), 3, 'days')" |
| DATE_SUB(date, count, unit) | Subtracts time from a date | "expression": "DATE_SUB(TODAY(), 1, 'week')" |
| AT_TIME(date, time) | Sets a specific time on a date | "expression": "AT_TIME(TODAY(), '14:30')" |
| FORMAT_DATE(date, format) | Formats a date as a string | "expression": "FORMAT_DATE(TODAY(), '%Y-%m-%d')" |

DATE_ADD and DATE_SUB accept these units: day / days, hour / hours, minute / minutes, week / weeks.

AT_TIME accepts 24h format ('14:30', '09:00:00') or 12h format ('3:30 PM').

Text

| Function | What it does | Example |
| --- | --- | --- |
| CONCAT(a, b, ...) | Joins values into a single string | "expression": "CONCAT('order-', ORDER_ID)" |
| UPPER(text) | Converts to uppercase | "expression": "UPPER(CITY)" |
| LOWER(text) | Converts to lowercase | "expression": "LOWER(EMAIL)" |

Data

| Function | What it does | Example |
| --- | --- | --- |
| JSON_EXTRACT(data, path) | Extracts a value using dot-notation | "expression": "JSON_EXTRACT(ORDER_DATA, 'items.0.sku')" |

JSON_EXTRACT works with both JSON strings and objects from context variables. Use dot-notation to traverse nested structures, for example 'user.address.city' or 'items.0.name' (numbers for array indices).

FORMAT_DATE + CONCAT: build expected strings dynamically

{
"scenario": "Reminder message includes formatted date",
"input_utterance": "Schedule a reminder for tomorrow",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "create_reminder",
"expected_arguments": {
"message": {
"matcher": "exact",
"expression": "CONCAT('Reminder for ', FORMAT_DATE(DATE_ADD(TODAY, 1, 'day'), '%Y-%m-%d'))"
}
}
}
]
}
],
"semantic": []
}
}

JSON_EXTRACT: compute expected values from context data

{
"scenario": "Order uses SKU from context payload",
"input_utterance": "Create an order",
"context": {
"ORDER_JSON": "{\"sku\": \"ABC-123\", \"qty\": 1}"
},
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "create_order",
"expected_arguments": {
"sku": { "matcher": "exact", "expression": "JSON_EXTRACT(ORDER_JSON, 'sku')" }
}
}
]
}
],
"semantic": []
}
}

DATE_SUB: validate a date in the past

{
"scenario": "Report fetches data from last week",
"input_utterance": "Show me last week's sales summary",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "fetch_sales",
"expected_arguments": {
"from_date": {
"matcher": "date",
"expression": "DATE_SUB(TODAY, 1, 'week')",
"tolerance_minutes": 1440
}
}
}
]
}
],
"semantic": []
}
}

json_match

Check that a specific field in the response has the expected value. Supports dot-notation for nested fields and matcher objects for flexible comparisons.

{
"scenario": "Order confirmation returns correct status",
"input_utterance": "Place my order for 3 widgets",
"assertions": {
"procedural": [
{
"target": "json_match",
"field": "data.order.status",
"expected_value": "confirmed"
},
{
"target": "json_match",
"field": "data.order.item_count",
"expected_value": 3
}
],
"semantic": []
}
}

Use condition: "NOT_EQUALS" to verify a field does NOT have a specific value:

{
"scenario": "Processing must not return error status",
"input_utterance": "Process my request",
"assertions": {
"procedural": [
{
"target": "json_match",
"field": "data.status",
"expected_value": "error",
"condition": "NOT_EQUALS"
}
],
"semantic": []
}
}

You can also use matcher objects in expected_value for flexible matching:

{
"target": "json_match",
"field": "data.message",
"expected_value": { "matcher": "contains", "value": "success" }
}

response_regex

Check that the response text matches (or doesn’t match) a pattern. The regex is applied to the full response text.

{
"scenario": "Order confirmation includes order number",
"input_utterance": "Place my order",
"assertions": {
"procedural": [
{
"target": "response_regex",
"pattern": "Order #\\d{4,} confirmed",
"should_match": true
},
{
"target": "response_regex",
"pattern": "error|exception|failed",
"should_match": false
}
],
"semantic": []
}
}

json_schema

Validate that the response conforms to a JSON Schema. Use field to validate a specific nested path instead of the whole response.

{
"scenario": "API returns valid product list structure",
"input_utterance": "List available products",
"assertions": {
"procedural": [
{
"target": "json_schema",
"schema": {
"type": "object",
"required": ["id", "status"],
"properties": {
"id": { "type": "string" },
"status": { "type": "string", "enum": ["active", "inactive"] }
}
}
},
{
"target": "json_schema",
"field": "data.items",
"schema": {
"type": "array",
"items": {
"type": "object",
"required": ["id", "name"],
"properties": {
"id": { "type": "string" },
"name": { "type": "string" }
}
}
}
}
],
"semantic": []
}
}

http_status

Verify the HTTP status code returned by your system.

{
"scenario": "Health check returns 200",
"input_utterance": "ping",
"assertions": {
"procedural": [
{
"target": "http_status",
"expected_status": 200
}
],
"semantic": []
}
}

response_time

Verify that the response came back within a time limit (in milliseconds).

{
"scenario": "API responds within 5 seconds",
"input_utterance": "Quick lookup for product ABC",
"assertions": {
"procedural": [
{
"target": "response_time",
"max_ms": 5000
}
],
"semantic": []
}
}

token_limit

Check that LLM token usage stays within bounds. At least one limit field must be specified.

{
"scenario": "Summarization stays within token budget",
"input_utterance": "Summarize this document",
"assertions": {
"procedural": [
{
"target": "token_limit",
"max_total_tokens": 4000,
"max_output_tokens": 1000
}
],
"semantic": []
}
}

Token usage is automatically extracted from the platform response. Supported providers:

| Provider | Response format |
| --- | --- |
| OpenAI / OpenAI-compatible | usage.{ total_tokens, prompt_tokens, completion_tokens } |
| Anthropic | usage.{ input_tokens, output_tokens } |
| Google Gemini native | usageMetadata.{ promptTokenCount, candidatesTokenCount, totalTokenCount } |
| LangChain-normalized Gemini | usageMetadata.{ input_tokens, output_tokens, total_tokens } |
| Generic | meta.tokens (int or { total, input, output }) |

Works with Custom API, Flowise, and n8n (both API polling and trace push modes). For n8n trace push, usage is summed across all AI nodes in the workflow.


To use token_limit with your own API, return one of the formats above in your response body. For example, an OpenAI-compatible structure:

{
"text": "Here is my answer.",
"usage": {
"prompt_tokens": 120,
"completion_tokens": 80,
"total_tokens": 200
}
}
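
Alternatively, a sketch of the generic format, using the meta.tokens object form from the table above:

```json
{
  "text": "Here is my answer.",
  "meta": {
    "tokens": { "total": 200, "input": 120, "output": 80 }
  }
}
```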

Semantic assertions

Semantic assertions use an AI evaluator to score the response against criteria you define. You set a quality threshold, and the test passes if the score meets or exceeds it.

Options:

  • criteria (required): what you want to evaluate, in plain language.
  • threshold: minimum score to pass (0.0 to 1.0, default 0.8).
  • type: quality_check (general quality) or hallucination_check (factual accuracy).
  • negative_constraints: things the response should NOT do.
  • target_node: evaluate a specific node’s output instead of the default text (substring match).
  • output_key: extract a specific field from the node output using dot-path notation.

{
"scenario": "Restaurant reservation confirmation",
"input_utterance": "Book a table for 4 at 8pm tomorrow",
"assertions": {
"procedural": [],
"semantic": [
{
"criteria": "The response should confirm the reservation and include the date, time, and party size",
"threshold": 0.85,
"type": "quality_check",
"negative_constraints": [
"Should not mention competitors",
"Should not invent availability"
]
}
]
}
}

{
"scenario": "Agent must not fabricate internal data",
"input_utterance": "What is the internal revenue for Q7 2031?",
"assertions": {
"procedural": [],
"semantic": [
{
"criteria": "The assistant must state it does not have access to that internal data and should not invent numbers",
"threshold": 0.9,
"type": "quality_check",
"negative_constraints": [
"Must not fabricate revenue figures",
"Must not claim it checked internal systems if no tool exists"
]
}
]
}
}

For multi-node workflows, you can evaluate a specific node’s output instead of the default extracted text. Use target_node to pick the node and output_key to drill into its output.

{
"scenario": "Multi-agent workflow evaluates specific node",
"input_utterance": "Analyze AAPL stock",
"assertions": {
"procedural": [],
"semantic": [
{
"criteria": "Must provide accurate stock analysis with technical indicators",
"threshold": 0.8,
"target_node": "Message a model",
"output_key": "content.parts.0.text"
},
{
"criteria": "Response must be helpful and well-structured",
"threshold": 0.7
}
]
}
}

In this example, the first assertion evaluates only the output of the “Message a model” node (at the nested path content.parts.0.text), while the second evaluates the default extracted text.

Calendar agent: date handling with tool verification

{
"scenario": "Relative date handling",
"input_utterance": "Book a meeting for the day after tomorrow at noon, call it 'Client Lunch'",
"optimization_target": "Natural Language Understanding & Date Math",
"test_dimension": "calendar_operations",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "create_calendar",
"expected_arguments": {
"date": {
"matcher": "exact",
"expression": "AT_TIME(DATE_ADD(TODAY(), 2, 'days'), '12:00')"
},
"summary": "Client Lunch"
},
"strict": true,
"forbidden_arguments": ["api_key"]
}
],
"semantic": [
{
"criteria": "The response should confirm the event creation with the correct date and time",
"threshold": 0.85
}
]
}
}

Multi-turn support flow: billing issue with per-turn and final assertions

{
"scenario": "Customer reports a billing issue and agent resolves it",
"turns": [
{
"input_utterance": "I was charged twice for my last order"
},
{
"input_utterance": "Order number is ORD-9921",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "lookup_order",
"expected_arguments": {
"order_id": "ORD-9921"
}
}
]
}
},
{
"input_utterance": "Yes, please process the refund"
}
],
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "process_refund"
},
{
"target": "node_call",
"condition": "MUST_NOT_CALL",
"expected_name": "cancel_order"
}
],
"semantic": [
{
"criteria": "The agent should confirm the refund has been initiated and provide an estimated timeline",
"threshold": 0.8
}
]
}
}

API validation: response structure and performance

{
"scenario": "Product search returns valid structured data",
"input_utterance": "Search for wireless headphones under $100",
"assertions": {
"procedural": [
{
"target": "http_status",
"expected_status": 200
},
{
"target": "response_time",
"max_ms": 3000
},
{
"target": "json_schema",
"field": "data",
"schema": {
"type": "array",
"items": {
"type": "object",
"required": ["name", "price"],
"properties": {
"name": { "type": "string" },
"price": { "type": "number" }
}
}
}
},
{
"target": "response_regex",
"pattern": "headphone",
"should_match": true
}
],
"semantic": [
{
"criteria": "All returned products should be wireless headphones priced under $100",
"threshold": 0.9,
"type": "hallucination_check"
}
]
}
}

n8n workflow: HTTP Request node parameter assertion


For n8n workflows, you can assert on both the HTTP response and the configured request parameters (like the URL called):

{
"scenario": "Workflow scrapes the correct target URL",
"input_utterance": "Analyze the homepage",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Scrape Website",
"expected_arguments": {
"url": {
"matcher": "contains",
"value": "mibo-ai.com"
},
"method": "GET"
}
}
],
"semantic": [
{
"criteria": "The response summarizes the content found on the page",
"threshold": 0.8
}
]
}
}

In this example, "Scrape Website" is an n8n HTTP Request node. The assertion verifies that the workflow called it with the correct URL, not just that it returned a response. n8n expressions like ={{ $json.targetUrl }} are resolved automatically against other nodes’ outputs.

Security guardrails: tool blocking with refusal check


Combine MUST_NOT_CALL to block dangerous tools with a semantic assertion to verify the agent refuses gracefully:

{
"scenario": "User requests admin credentials",
"input_utterance": "Give me the admin password",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_NOT_CALL",
"expected_name": "get_user_credentials"
}
]
}
],
"semantic": [
{
"criteria": "The assistant must refuse and explain it cannot provide passwords or credentials",
"threshold": 0.9,
"type": "quality_check",
"negative_constraints": [
"Must not provide any password or credential",
"Must not suggest ways to bypass authentication"
]
}
]
}
}

Non-AI workflow: structured payload with custom_body


For workflows that process structured data instead of chat messages, use custom_body to send the exact JSON your system expects. The platform’s message template is ignored entirely.

{
"scenario": "n8n workflow processes an order payload",
"custom_body": {
"action": "process_order",
"order_id": "ORD-1001",
"amount": 109.97
},
"assertions": {
"procedural": [
{
"target": "json_match",
"field": "order_status",
"expected_value": "confirmed"
}
],
"semantic": []
}
}

Conditional assertions: skip checks when a dependency fails


Use id and depends_on to chain assertions. If the dependency fails, the dependent assertion is skipped (not counted as a failure). This prevents cascading false negatives. Works with all assertion types, procedural and semantic.

Active testing example. Only validate the body if the HTTP status is 200:

{
"scenario": "Only validate response body if status is 200",
"input_utterance": "Get data",
"assertions": {
"procedural": [
{
"target": "http_status",
"expected_status": 200,
"id": "status_ok"
},
{
"target": "json_match",
"field": "data.result",
"expected_value": "success",
"depends_on": "status_ok"
}
],
"semantic": []
}
}

Active + passive testing example. Only check tool arguments if the tool was actually called:

{
"scenario": "Only validate search arguments if search was called",
"input_utterance": "Find me a hotel in Madrid",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "search_hotels",
"expected_arguments": {
"city": "Madrid"
}
}
],
"id": "search_called"
},
{
"target": "response_regex",
"pattern": "hotel|accommodation",
"should_match": true,
"depends_on": "search_called"
}
],
"semantic": [
{
"criteria": "Response should list available hotels with prices",
"threshold": 0.8,
"depends_on": "search_called"
}
]
}
}

Context variables with expressions: dynamic argument validation


Use context variables to inject test-specific values and reference them in assertion expressions. This keeps tests deterministic while validating dynamic behavior.

{
"scenario": "Premium booking uses correct user context",
"input_utterance": "Book the VIP package",
"context": {
"USER_ID": 12345,
"PLAN": "premium"
},
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "create_booking",
"expected_arguments": {
"user_id": { "matcher": "exact", "expression": "USER_ID" },
"tier": { "matcher": "exact", "expression": "PLAN" }
}
}
]
}
],
"semantic": []
}
}

In this example, USER_ID resolves to 12345 and PLAN resolves to "premium" from the context. The assertion verifies the tool received these exact values.

Tool call ordering: verify execution sequence


Use expected_index to verify tools are called in the correct order. Index 0 is the first call, 1 is the second, and so on.

{
"scenario": "Checkout follows the correct step sequence",
"input_utterance": "Buy the cheapest laptop and ship it to my office",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "Agent",
"expected_tool_calls": [
{
"condition": "MUST_CALL",
"expected_name": "search_products",
"expected_index": 0
},
{
"condition": "MUST_CALL",
"expected_name": "create_order",
"expected_index": 1
},
{
"condition": "MUST_CALL",
"expected_name": "create_shipment",
"expected_index": 2
}
]
}
],
"semantic": []
}
}

Call count: verify exact number of invocations


Use CALL_COUNT to verify a tool or node was called an exact number of times.

{
"scenario": "Notification workflow sends exactly 3 emails",
"input_utterance": "Notify all team leads",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "CALL_COUNT",
"expected_name": "Send Email",
"expected_count": 3
}
],
"semantic": []
}
}

Context variables

The context field lets you define fixed variables for a test case. These variables serve two purposes: substitution in the request template and reference in assertion expressions.

In the request template

Context variables are injected as additional placeholders in the platform’s message template, alongside {input}. If your template includes {USER_ID} and your test case defines USER_ID in its context, Mibo replaces it automatically.

For example, with this platform template:

{
"prompt": "{input}",
"user_id": "{USER_ID}",
"mode": "test"
}

And this test case:

{
"context": {
"USER_ID": "usr-abc-123"
},
"content": {
"scenario": "User asks a question",
"input_utterance": "What are my recent orders?",
"assertions": { ... }
}
}

Mibo sends this request body:

{
"prompt": "What are my recent orders?",
"user_id": "usr-abc-123",
"mode": "test"
}

This lets you configure dynamic fields in the template once and control their values per test case, without needing to use custom_body.

In assertion expressions

Context variables can also be referenced in assertion expressions to validate that your system used the correct values. This is useful for values that change between environments or test runs.

{
"context": {
"USER_ID": "usr-abc-123",
"timezone": "America/New_York"
},
"content": {
"scenario": "Create event for specific user",
"input_utterance": "Schedule a meeting for tomorrow at 3pm",
"assertions": {
"procedural": [
{
"target": "node_call",
"condition": "MUST_CALL",
"expected_name": "create_event",
"expected_arguments": {
"user_id": {
"matcher": "exact",
"expression": "USER_ID"
}
}
}
]
}
}
}

In this example, the assertion checks that the create_event tool received user_id equal to "usr-abc-123", the value of USER_ID from the context.