Passive Testing
Active testing verifies your setup works. Passive testing is where Mibo delivers its real value — your production system sends every real user interaction as a trace, and Mibo evaluates it automatically against your test cases. You get continuous visibility into AI quality, including edge cases and real-world scenarios that synthetic tests can never cover.
How it works
- Your AI system handles a real user interaction.
- Your system sends a canonical trace (input, output, and any execution details) to Mibo’s trace ingestion endpoint.
- Mibo automatically triggers an execution that evaluates the trace against all active test cases for that agent.
- You get the same quality scores, AI Judge evaluations, and Failure Matrix, without Mibo ever touching your system.
Setting it up
- Configure your agent
  Set up an agent connection in your project. Passive testing works regardless of agent type (n8n, Flowise, or HTTP API). The agent type controls active-testing semantics; passive ingestion is decoupled from it.
- Create an API key
  Go to your project settings and create an API key. You’ll use this key to authenticate trace requests.
  Scope the key to a single agent if you can: it removes the need to specify `platformId` on every request. See API Keys & Trace Routing.
- Create test cases
  Add the test cases you want to evaluate against incoming traces. Only active test cases are used; paused or draft test cases are skipped.
- Send traces from your system
  Pick the path that fits your stack; see Sending Traces for the decision matrix.
The two paths
Both paths, Your API (`{ spans: [...] }`) and OTLP (`{ resourceSpans: [...] }`), arrive at the same endpoint, produce the same canonical trace internally, and feed identical assertion evaluation.
The ingestion endpoint
POST https://api.mibo-ai.com/public/traces
Request: Custom API example
```bash
curl -X POST "https://api.mibo-ai.com/public/traces" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "x-request-id: chat-001" \
  -d '{
    "spans": [
      {
        "span_id": "root-001",
        "name": "Customer Support Agent",
        "attributes": {
          "gen_ai.response.text": "We are open Monday to Friday, 9am to 5pm."
        },
        "status": { "code": 1 }
      }
    ]
  }'
```
For the OTLP envelope shape, see OTLP ingestion.
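The same request can be issued from application code. The sketch below rebuilds it with Python's standard library. `build_trace_request` and `send_trace` are hypothetical helper names, and the payload only mirrors the curl example; it is not a description of the full trace schema.

```python
import json
import urllib.request

MIBO_TRACES_URL = "https://api.mibo-ai.com/public/traces"


def build_trace_request(api_key: str, request_id: str, response_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST that mirrors the curl example."""
    body = {
        "spans": [
            {
                "span_id": "root-001",
                "name": "Customer Support Agent",
                "attributes": {"gen_ai.response.text": response_text},
                "status": {"code": 1},
            }
        ]
    }
    return urllib.request.Request(
        MIBO_TRACES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,
            "x-request-id": request_id,
        },
        method="POST",
    )


def send_trace(req: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.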
Required
- `x-api-key` header: your project API key.
- `Content-Type: application/json`.
- A body matching either Your API (`{ spans: [...] }`) or OTLP (`{ resourceSpans: [...] }`). Any other shape returns `400`.
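A client-side pre-flight check can catch these problems before a request leaves your system. The sketch below is only an approximation of the requirements listed above: `preflight_problems` is a hypothetical helper, and the server's actual validation may be stricter.

```python
def preflight_problems(headers: dict, body: object) -> list:
    """Return a list of human-readable problems; an empty list means the request looks sendable."""
    problems = []
    # HTTP header names are case-insensitive, so normalise before checking.
    lowered = {str(k).lower(): str(v) for k, v in headers.items()}

    if not lowered.get("x-api-key"):
        problems.append("missing x-api-key header")

    content_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if content_type != "application/json":
        problems.append("Content-Type must be application/json")

    # Assumption: a body qualifies if it carries a top-level array under one of the two keys.
    is_your_api = isinstance(body, dict) and isinstance(body.get("spans"), list)
    is_otlp = isinstance(body, dict) and isinstance(body.get("resourceSpans"), list)
    if not (is_your_api or is_otlp):
        problems.append("body must be { spans: [...] } or { resourceSpans: [...] }")

    return problems
```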
Optional
- `x-request-id` header: the trace’s external identifier. Present → Mibo upserts by it (re-posting overwrites the previous trace). Absent → server-generated UUID, always creates a new trace.
- `platformId` (top-level UUID): explicit agent target.
- `metadata` (object): arbitrary context stored alongside the trace. Set `metadata.mibo.platform_id` to route when the API key is multi-agent. Other keys are free-form.
- `externalMetadata` (object): context only Mibo’s API sees; never round-tripped to the runner.
- `Content-Encoding: gzip`: compressed body. Useful for large traces.
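As an illustration of how the optional pieces combine, the sketch below builds headers and a gzip-compressed body with a stable `x-request-id` and an optional routing hint. `build_gzip_trace` is a hypothetical helper; the field names come from the list above, everything else is an assumption.

```python
import gzip
import json
from typing import Optional, Tuple


def build_gzip_trace(
    api_key: str,
    spans: list,
    request_id: str,
    platform_id: Optional[str] = None,
) -> Tuple[dict, bytes]:
    """Return (headers, gzip-compressed JSON body) for a Your API trace."""
    body = {"spans": spans}
    if platform_id is not None:
        # Routing hint for API keys that cover more than one agent.
        body["metadata"] = {"mibo": {"platform_id": platform_id}}

    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "x-api-key": api_key,
        # Re-posting with the same id overwrites the earlier trace (upsert).
        "x-request-id": request_id,
    }
    payload = gzip.compress(json.dumps(body).encode("utf-8"))
    return headers, payload
```

Deriving `request_id` from an identifier your system already has (a conversation or message id, for example) makes retries idempotent, since a repeated post updates the trace instead of creating a duplicate.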
Agent resolution
Mibo decides which agent a trace belongs to in this order:
- If the API key is scoped to a single agent, that agent is used.
- Otherwise, the request must carry an explicit target: `platformId` at the top level, `metadata.mibo.platform_id` (Your API), or the `mibo.platform_id` resource attribute (OTLP).
- If neither resolves, the request returns `400` with a hint pointing here.
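The resolution order can be modelled as a small function. This is an illustrative model of the rules above, not Mibo's implementation: `resolve_agent` is a hypothetical name, and the precedence among the three explicit targets is an assumption (the sketch simply checks them in the order they are listed).

```python
from typing import Optional


def resolve_agent(
    key_scoped_agent: Optional[str],
    body: dict,
    otlp_resource_attrs: Optional[dict] = None,
) -> str:
    """Return the agent id a trace would route to, or raise ValueError (the 400 case)."""
    # 1. A single-agent API key wins outright.
    if key_scoped_agent:
        return key_scoped_agent

    # 2. Otherwise look for an explicit target (order among these is assumed).
    metadata = body.get("metadata") or {}
    mibo_meta = metadata.get("mibo") or {}
    explicit = (
        body.get("platformId")
        or mibo_meta.get("platform_id")
        or (otlp_resource_attrs or {}).get("mibo.platform_id")
    )
    if explicit:
        return explicit

    # 3. Nothing resolved: the real endpoint answers 400.
    raise ValueError("400: no agent could be resolved for this trace")
```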
What happens after a trace is received
When Mibo receives a new trace:
- The trace data is stored.
- Mibo checks if the agent has active test cases.
- If it does, a passive execution is created and queued automatically.
- The worker picks up the execution and evaluates the trace against each test case.
- Results appear in your dashboard, just like active test results.
The evaluation happens asynchronously. The trace endpoint responds immediately with a 201, and the evaluation runs in the background.
Which test cases run on a real conversation
Picture your n8n or Flowise assistant handling real people all day. Some ask about pricing, some want a refund, some just say hello. You’ve written a test for each of those situations.
Should your “refund” test run on a conversation where someone only asked for your opening hours? No. It would turn red, even though your assistant did nothing wrong. That’s a false alarm, and a dashboard full of false alarms is one you stop trusting.
To prevent this, every test has one simple setting: Passive behavior. It tells Mibo when to run that test on real conversations. You pick it from a dropdown when you create or edit a test. There are three choices:
| Choice | What it means | Good for |
|---|---|---|
| Always run | Mibo checks this on every conversation, no matter what the person said. | Rules that should hold for every single reply. “The answer is never empty.” “The assistant always replies in English.” |
| Run when relevant (default) | Mibo only runs this when the conversation actually matches the situation you described. | Most tests. “When someone asks to cancel, the assistant explains how.” This test won’t run on a “what are your hours?” chat. |
| Manual runs only | Mibo never runs this on real conversations. It only runs when you press Run yourself. | Tricky inputs you’d only send on purpose. “Send a broken request and check the error message.” |
How “Run when relevant” decides
For these tests, Mibo reads each real conversation and compares it to the Scenario you wrote. If it matches, the test runs. If it clearly doesn’t, Mibo skips it for that conversation (you’ll see it counted as skipped, never as a failure).
So your Scenario does two jobs: it tells Mibo how to judge the reply, and it tells Mibo when this test applies. Write it the way you’d explain the situation to a coworker:
- Clear: “The user asks to cancel their subscription.”
- Too vague to match: “Handle the request.”
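For readers who think in code, the three Passive behavior choices can be modelled as a small decision function. This is a simplified sketch, not how Mibo is implemented: all names are hypothetical, the `"not_run"` label for manual-only tests is an assumption, and `naive_keyword_match` is a crude stand-in for however Mibo actually compares a conversation to a Scenario.

```python
from enum import Enum
from typing import Callable


class PassiveBehavior(Enum):
    ALWAYS_RUN = "always_run"
    RUN_WHEN_RELEVANT = "run_when_relevant"  # the default
    MANUAL_ONLY = "manual_only"


def naive_keyword_match(scenario: str, conversation: str) -> bool:
    """Stand-in matcher: true if the two share a word longer than five characters."""
    keywords = {w.strip(".,?!").lower() for w in scenario.split() if len(w) > 5}
    words = {w.strip(".,?!").lower() for w in conversation.split()}
    return bool(keywords & words)


def passive_decision(
    behavior: PassiveBehavior,
    scenario: str,
    conversation: str,
    matches: Callable[[str, str], bool],
) -> str:
    """Return "run", "skipped", or "not_run" for one test on one real conversation."""
    if behavior is PassiveBehavior.MANUAL_ONLY:
        return "not_run"
    if behavior is PassiveBehavior.ALWAYS_RUN:
        return "run"
    # RUN_WHEN_RELEVANT: a non-match is a skip, never a failure.
    return "run" if matches(scenario, conversation) else "skipped"
```

The stand-in matcher also shows why a vague Scenario is a problem: the less specific the description, the less there is to match against.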
Choosing it (no code needed)
- Open a test in the editor, or create a new one.
- In the form, find Passive behavior under General Configuration.
- Pick Always run, Run when relevant, or Manual runs only.
If you don’t touch it, the test stays on Run when relevant, which is the safe default for most tests. That’s all there is to it.
This setting only affects real conversations. When you press Run yourself, Mibo runs all of your tests, because you chose the input on purpose.
Missing instrumentation, not silent pass/fail
If a `token_limit`, `http_status`, or `node_call` assertion can’t find the attribute it needs in your trace, Mibo flags it as missing instrumentation rather than passing or failing silently. The dashboard renders it distinctly so you know to add the right attribute to your spans.
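The idea is a three-state result rather than a boolean. The sketch below illustrates it for a token-limit check; `check_token_limit` is a hypothetical function, and the attribute name `gen_ai.usage.output_tokens` is an assumption borrowed from the OpenTelemetry GenAI conventions, not a confirmed list of what Mibo reads.

```python
MISSING = "missing_instrumentation"


def check_token_limit(
    span_attributes: dict,
    limit: int,
    attribute: str = "gen_ai.usage.output_tokens",  # assumed attribute name
) -> str:
    """Return "pass", "fail", or "missing_instrumentation"."""
    value = span_attributes.get(attribute)
    if value is None:
        # No data to judge: report the gap instead of guessing a verdict.
        return MISSING
    return "pass" if int(value) <= limit else "fail"
```

Treating an absent attribute as its own outcome keeps a missing span field from looking like either a healthy result or a regression.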
The attributes Mibo reads are listed under each path’s deep-dive.
Use cases
Production monitoring
Connect your system’s logging pipeline to Mibo. Every real user interaction gets evaluated against your test cases, giving you continuous quality visibility without extra API calls.
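One way to wire this into a live system is to forward traces from a background thread, so user-facing requests never wait on the monitoring call. The sketch below is a minimal pattern under that assumption. `TraceForwarder` is a hypothetical class, and `send` is whatever function actually posts a trace to the ingestion endpoint.

```python
import queue
import threading
from typing import Callable


class TraceForwarder:
    """Forward traces on a background thread so request handling never blocks."""

    def __init__(self, send: Callable[[dict], None], max_pending: int = 1000):
        self._send = send
        self._queue = queue.Queue(maxsize=max_pending)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, trace: dict) -> bool:
        """Queue a trace; return False (dropping it) if the buffer is full."""
        try:
            self._queue.put_nowait(trace)
            return True
        except queue.Full:
            return False

    def _run(self) -> None:
        while True:
            trace = self._queue.get()
            try:
                self._send(trace)
            except Exception:
                # A monitoring failure must not take down the app; log it in real code.
                pass
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued trace has been handed to send()."""
        self._queue.join()
```

Dropping traces when the buffer is full is a deliberate trade-off in this sketch: it bounds memory use at the cost of losing some coverage under load.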
Post-mortem analysis
Something went wrong in production? Send the interaction trace to Mibo and get a full quality breakdown, including the Failure Matrix, AI Judge scores, and stage-level analysis.
Shadow testing
Evaluate production traffic against new or updated test cases before deploying changes. Catch regressions early by comparing quality scores over time.