
Passive Testing

Active testing verifies that your setup works. Passive testing is where Mibo delivers its real value: your production system sends every real user interaction to Mibo as a trace, and Mibo evaluates it automatically against your test cases. You get continuous visibility into AI quality, including edge cases and real-world scenarios that synthetic tests alone can't cover. Here's how it works:

  1. Your AI system handles a real user interaction.
  2. Your system sends the trace data (input, output, and any execution details) to Mibo’s trace ingestion endpoint.
  3. Mibo automatically triggers an execution that evaluates the trace against all active test cases for that platform.
  4. You get the same quality scores, AI Judge evaluations, and Failure Matrix, without Mibo ever touching your system.
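From your system's side, steps 1–2 of this loop can be sketched with the Python standard library alone. The endpoint, headers, and `data` envelope are taken from the API reference below; `build_trace` and `send_trace` are illustrative names, not part of any Mibo SDK:

```python
import json
import urllib.request

MIBO_TRACE_URL = "https://api.mibo-ai.com/public/traces"

def build_trace(output_text: str) -> bytes:
    # "data" must be a non-empty object; the exact shape inside it
    # depends on your platform type (see the Trace Format Reference).
    return json.dumps({"data": {"text": output_text}}).encode("utf-8")

def send_trace(api_key: str, output_text: str) -> None:
    req = urllib.request.Request(
        MIBO_TRACE_URL,
        data=build_trace(output_text),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    # Mibo responds with 201 immediately; evaluation runs in the background.
    with urllib.request.urlopen(req) as resp:
        assert resp.status == 201
```

You would call `send_trace` after each real user interaction, once steps 1–3 of the setup below are done.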
To set up passive testing:

  1. Configure your platform

    Set up a platform connection in your project. Passive testing works with any platform type: Custom API, Flowise, or n8n.

    For Custom API platforms, choose the Push trace mode. This tells Mibo to expect trace data sent separately from your system.

  2. Create an API key

    Go to your project settings and create an API key. You’ll use this key to authenticate trace requests.

    You can optionally restrict the key to specific platforms. If you do, traces sent with that key will automatically be routed to the right platform.

  3. Create test cases

    Add the test cases you want to evaluate against incoming traces. Only active test cases are used; paused or draft test cases are skipped.

  4. Send traces from your system

    Have your AI system send trace data to Mibo after each interaction. See the API reference below.

Send a POST request to /public/traces with your trace data.

curl -X POST "https://api.mibo-ai.com/public/traces" \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "data": {
      "text": "We are open Monday to Friday, 9am to 5pm."
    }
  }'
  • x-api-key header: your project API key.
  • data (object): the trace payload. Must be a non-empty object. The structure inside data depends on your platform type; see the Trace Format Reference for exact schemas per platform.
  • platformId (UUID): the platform this trace belongs to. Required if your API key has access to multiple platforms and you’re not using metadata for resolution.
  • externalId (string, max 255 chars): your own identifier for the trace.
  • metadata (object): additional context stored alongside the trace. Can include platform identifiers for auto-routing (e.g., { "chatflowId": "abc" } for Flowise, { "workflowId": "xyz" } for n8n), environment info, or any other key-value pairs.

For large traces, you can send a gzip-compressed payload by setting the Content-Encoding: gzip header.
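A sketch of the compressed variant in Python, assuming the same endpoint and headers as the curl example above (`compress_trace` and `gzipped_request` are illustrative names):

```python
import gzip
import json
import urllib.request

def compress_trace(payload: dict) -> bytes:
    # gzip-compress the JSON body; the Content-Encoding header below
    # tells the server to decompress it.
    return gzip.compress(json.dumps(payload).encode("utf-8"))

def gzipped_request(api_key: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        "https://api.mibo-ai.com/public/traces",
        data=compress_trace(payload),
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "x-api-key": api_key,
        },
        method="POST",
    )
```

Compression only pays off for large bodies; for small traces like the example above, plain JSON is simpler.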

Mibo determines which platform a trace belongs to using (in order):

  1. The platformId field in the request body.
  2. If the API key is restricted to a single platform, that platform is used automatically.
  3. Matching metadata fields against platform configurations (e.g., chatflowId for Flowise, workflowId for n8n).

When Mibo receives a new trace:

  1. The trace data is encrypted and stored.
  2. Mibo checks if the platform has active test cases.
  3. If it does, a passive execution is created and queued automatically.
  4. The worker picks up the execution and evaluates the trace against each test case.
  5. Results appear in your dashboard, just like active test results.

The evaluation happens asynchronously. The trace endpoint responds immediately with a 201, and the evaluation runs in the background.

Connect your system’s logging pipeline to Mibo. Every real user interaction gets evaluated against your test cases, giving you continuous quality visibility without extra API calls.
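One way to wire a logging pipeline to Mibo is a custom Python `logging.Handler` that wraps each logged interaction in the trace envelope. The class below is entirely hypothetical and shown only to illustrate the pattern; `send` would be your own function that POSTs to /public/traces:

```python
import logging

class MiboTraceHandler(logging.Handler):
    """Hypothetical handler that forwards log records as Mibo traces."""

    def __init__(self, send):
        super().__init__()
        self.send = send  # e.g. a function that POSTs to /public/traces

    def emit(self, record: logging.LogRecord) -> None:
        # Wrap the logged interaction in the {"data": ...} trace envelope.
        self.send({"data": {"text": record.getMessage()}})
```

Attach it to the logger that records your AI system's responses, and every interaction flows into Mibo without extra calls in your handler code.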

Something went wrong in production? Send the interaction trace to Mibo and get a full quality breakdown, including the Failure Matrix, AI Judge scores, and stage-level analysis.

Evaluate production traffic against new or updated test cases before deploying changes. Catch regressions early by comparing quality scores over time.