Passive Testing

Active testing verifies your setup works. Passive testing is where Mibo delivers its real value — your production system sends every real user interaction as a trace, and Mibo evaluates it automatically against your test cases. You get continuous visibility into AI quality, including edge cases and real-world scenarios that synthetic tests can never cover.

  1. Your AI system handles a real user interaction.
  2. Your system sends a canonical trace (input, output, and any execution details) to Mibo’s trace ingestion endpoint.
  3. Mibo automatically triggers an execution that evaluates the trace against all active test cases for that agent.
  4. You get the same quality scores, AI Judge evaluations, and Failure Matrix, without Mibo ever touching your system.
  1. Configure your agent

    Set up an agent connection in your project. Passive testing works regardless of agent type — n8n, Flowise, or HTTP API. The agent type controls active-testing semantics; passive ingestion is decoupled from it.

  2. Create an API key

    Go to your project settings and create an API key. You’ll use this key to authenticate trace requests.

    Scope the key to a single agent if you can — it removes the need to specify platformId on every request. See API Keys & Trace Routing.

  3. Create test cases

    Add the test cases you want to evaluate against incoming traces. Only active test cases are used; paused or draft test cases are skipped.

  4. Send traces from your system

    Pick the path that fits your stack — see Sending Traces for the decision matrix.

Both paths, Your API and OTLP, arrive at the same endpoint, produce the same canonical trace internally, and feed identical assertion evaluation.

POST https://api.mibo-ai.com/public/traces

curl -X POST "https://api.mibo-ai.com/public/traces" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -H "x-request-id: chat-001" \
  -d '{
    "spans": [
      {
        "span_id": "root-001",
        "name": "Customer Support Agent",
        "attributes": {
          "gen_ai.response.text": "We are open Monday to Friday, 9am to 5pm."
        },
        "status": { "code": 1 }
      }
    ]
  }'

For the OTLP envelope shape, see OTLP ingestion.
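
If your system is written in Python, the same request can be built with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_trace_request` and `send_trace` are illustrative, and the span fields are copied from the curl example above.

```python
import json
import urllib.request

MIBO_TRACES_URL = "https://api.mibo-ai.com/public/traces"


def build_trace_request(api_key, spans, request_id=None):
    """Build (but do not send) a POST request with a { spans: [...] } body."""
    headers = {
        "Content-Type": "application/json",
        "x-api-key": api_key,
    }
    if request_id is not None:
        # Optional header: the trace's external identifier.
        headers["x-request-id"] = request_id
    body = json.dumps({"spans": spans}).encode("utf-8")
    return urllib.request.Request(
        MIBO_TRACES_URL, data=body, headers=headers, method="POST"
    )


def send_trace(request, timeout=10):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses,
    so a 400 surfaces as an exception rather than a return value.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


example_request = build_trace_request(
    api_key="YOUR_API_KEY",
    spans=[
        {
            "span_id": "root-001",
            "name": "Customer Support Agent",
            "attributes": {
                "gen_ai.response.text": "We are open Monday to Friday, 9am to 5pm."
            },
            "status": {"code": 1},
        }
    ],
    request_id="chat-001",
)
```

Calling `send_trace(example_request)` performs the actual POST; building and sending are kept separate so the payload can be inspected or logged first.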

  • x-api-key header: your project API key.
  • Content-Type: application/json.
  • A body matching either Your API ({ spans: [...] }) or OTLP ({ resourceSpans: [...] }). Any other shape returns 400.
  • x-request-id header: the trace’s external identifier. Present → Mibo upserts by it (re-posting overwrites the previous trace). Absent → server-generated UUID, always creates a new trace.
  • platformId (top-level UUID): explicit agent target.
  • metadata (object): arbitrary context stored alongside the trace. Set metadata.mibo.platform_id to route when the API key is multi-agent. Other keys are free-form.
  • externalMetadata (object): context only Mibo’s API sees; never round-tripped to the runner.
  • Content-Encoding: gzip: compressed body. Useful for large traces.
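
For large traces, the body can be compressed before sending. The sketch below shows one way to assemble a gzip-compressed payload with the optional `platformId` and `metadata` fields; the helper name `build_compressed_body`, the placeholder UUID, and the `channel` metadata key are assumptions made for illustration.

```python
import gzip
import json


def build_compressed_body(spans, platform_id=None, metadata=None):
    """Return (gzip_bytes, headers) for a { spans: [...] } trace payload."""
    payload = {"spans": spans}
    if platform_id is not None:
        # Top-level UUID naming the agent explicitly.
        payload["platformId"] = platform_id
    if metadata is not None:
        # Arbitrary context stored alongside the trace.
        payload["metadata"] = metadata
    raw = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
    }
    return gzip.compress(raw), headers


body, headers = build_compressed_body(
    spans=[{"span_id": "root-001", "name": "Customer Support Agent"}],
    platform_id="00000000-0000-0000-0000-000000000000",  # placeholder UUID
    metadata={"channel": "web"},  # free-form example key
)
```

The `x-api-key` header still has to be added before sending; it is left out here so the sketch stays focused on the body.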

Mibo decides which agent a trace belongs to in this order:

  1. If the API key is scoped to a single agent, that agent is used.
  2. Otherwise, the request must carry an explicit target — platformId at the top level, metadata.mibo.platform_id (Your API), or the mibo.platform_id resource attribute (OTLP).
  3. If neither resolves, the request returns 400 with a hint pointing here.
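
The resolution order above can be modelled as a small function. This is an illustrative model for reasoning about your own requests, not Mibo's implementation. It assumes `metadata.mibo.platform_id` is a nested object, that the OTLP resource attribute uses the standard OTLP JSON `key` / `value.stringValue` shape, and that the explicit targets are checked in the order they are listed; the source does not state a precedence among them.

```python
def resolve_agent(key_scoped_agent, body):
    """Return the target agent id, or raise ValueError (the 400 case).

    key_scoped_agent: agent id the API key is scoped to, or None
                      when the key covers multiple agents.
    body:             the parsed JSON request body.
    """
    # 1. A single-agent key always wins.
    if key_scoped_agent is not None:
        return key_scoped_agent

    # 2a. Explicit top-level target.
    if body.get("platformId"):
        return body["platformId"]

    # 2b. Your API: metadata.mibo.platform_id (assumed nested).
    from_metadata = body.get("metadata", {}).get("mibo", {}).get("platform_id")
    if from_metadata:
        return from_metadata

    # 2c. OTLP: mibo.platform_id resource attribute.
    for resource_span in body.get("resourceSpans", []):
        attributes = resource_span.get("resource", {}).get("attributes", [])
        for attribute in attributes:
            if attribute.get("key") == "mibo.platform_id":
                value = attribute.get("value", {}).get("stringValue")
                if value:
                    return value

    # 3. Nothing resolved: the real endpoint answers 400.
    raise ValueError("no agent target: scope the key or send a platform id")
```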

When Mibo receives a new trace:

  1. The trace data is stored.
  2. Mibo checks if the agent has active test cases.
  3. If it does, a passive execution is created and queued automatically.
  4. The worker picks up the execution and evaluates the trace against each test case.
  5. Results appear in your dashboard, just like active test results.

The evaluation happens asynchronously. The trace endpoint responds immediately with a 201, and the evaluation runs in the background.
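
Because evaluation is asynchronous, a client only needs to check the ingestion status; there is nothing to poll in the request path. A minimal sketch of that check, with a hypothetical helper name and message strings, could look like this. The handling of any status other than 201 and 400 is an assumption, since only those two are described here.

```python
def describe_ingestion_status(status_code):
    """Map the trace endpoint's HTTP status to a short, loggable outcome."""
    if status_code == 201:
        # Trace stored; any passive execution runs in the background.
        return "accepted"
    if status_code == 400:
        # Body shape not recognised, or no agent target could be resolved.
        return "rejected"
    # Not covered by this page: treat as unexpected and surface it.
    return "unexpected"
```

A 201 confirms only that the trace was accepted, so evaluation results should be read from the dashboard rather than from the response.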

Which test cases run on a real conversation

Picture your n8n or Flowise assistant handling real people all day. Some ask about pricing, some want a refund, some just say hello. You’ve written a test for each of those situations.

Should your “refund” test run on a conversation where someone only asked for your opening hours? No. It would turn red, even though your assistant did nothing wrong. That’s a false alarm, and a dashboard full of false alarms is one you stop trusting.

To prevent this, every test has one simple setting: Passive behavior. It tells Mibo when to run that test on real conversations. You pick it from a dropdown when you create or edit a test. There are three choices:

| Choice | What it means | Good for |
| --- | --- | --- |
| Always run | Mibo checks this on every conversation, no matter what the person said. | Rules that should hold for every single reply. “The answer is never empty.” “The assistant always replies in English.” |
| Run when relevant (default) | Mibo only runs this when the conversation actually matches the situation you described. | Most tests. “When someone asks to cancel, the assistant explains how.” This test won’t run on a “what are your hours?” chat. |
| Manual runs only | Mibo never runs this on real conversations. It only runs when you press Run yourself. | Tricky inputs you’d only send on purpose. “Send a broken request and check the error message.” |

For Run when relevant tests, Mibo reads each real conversation and compares it to the Scenario you wrote. If it matches, the test runs. If it clearly doesn’t, Mibo skips it for that conversation (you’ll see it counted as skipped, never as a failure).

So your Scenario does two jobs: it tells Mibo how to judge the reply, and it tells Mibo when this test applies. Write it the way you’d explain the situation to a coworker:

  • Clear: “The user asks to cancel their subscription.”
  • Too vague to match: “Handle the request.”
  1. Open a test in the editor, or create a new one.
  2. In the form, find Passive behavior under General Configuration.
  3. Pick Always run, Run when relevant, or Manual runs only.

If you don’t touch it, the test stays on Run when relevant, which is the safe default for most tests. That’s all there is to it.

This setting only affects real conversations. When you press Run yourself, Mibo runs all of your tests, because you chose the input on purpose.
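
The three choices and the manual-run rule can be summarised as a small decision function. This is a sketch for intuition only: the enum, the function name, and the outcome strings are hypothetical, and the scenario match is reduced to a boolean even though Mibo's actual matching is a judgment over the conversation.

```python
from enum import Enum


class PassiveBehavior(Enum):
    ALWAYS_RUN = "Always run"
    RUN_WHEN_RELEVANT = "Run when relevant"  # the default
    MANUAL_ONLY = "Manual runs only"


def decide(behavior, scenario_matches, manual_run=False):
    """Return "run", "skipped", or "not_run" for one test and one input."""
    if manual_run:
        # You chose the input on purpose, so every test runs.
        return "run"
    if behavior is PassiveBehavior.ALWAYS_RUN:
        return "run"
    if behavior is PassiveBehavior.MANUAL_ONLY:
        # Never evaluated on real conversations.
        return "not_run"
    # Run when relevant: counted as skipped, never as a failure.
    return "run" if scenario_matches else "skipped"
```

For example, a "refund" test left on the default setting yields `"skipped"` for a conversation about opening hours, so it cannot turn the dashboard red.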

Missing instrumentation, not silent pass/fail

If a token_limit, http_status, or node_call assertion can’t find the attribute it needs in your trace, Mibo flags it as missing instrumentation rather than passing or failing silently. The dashboard renders it distinctly so you know to add the right attribute to your spans.
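
The idea is a three-way outcome instead of a two-way one. The sketch below illustrates it for a token_limit style check; the function name, the outcome strings, and the attribute name `gen_ai.usage.output_tokens` are assumptions for illustration, so check the deep-dive pages for the attributes Mibo actually reads.

```python
def check_token_limit(span_attributes, limit, attribute="gen_ai.usage.output_tokens"):
    """Return "pass", "fail", or "missing_instrumentation" for one span."""
    value = span_attributes.get(attribute)
    if value is None:
        # The trace never reported the number, so there is nothing to judge.
        return "missing_instrumentation"
    return "pass" if value <= limit else "fail"
```

Keeping the missing case separate means an uninstrumented span can never be mistaken for a passing or a failing one.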

The attributes Mibo reads are listed under each path’s deep-dive.

Connect your system’s logging pipeline to Mibo. Every real user interaction gets evaluated against your test cases, giving you continuous quality visibility without extra API calls.

Something went wrong in production? Send the interaction trace to Mibo and get a full quality breakdown, including the Failure Matrix, AI Judge scores, and stage-level analysis.

Evaluate production traffic against new or updated test cases before deploying changes. Catch regressions early by comparing quality scores over time.