Creating Test Cases
The Test Architect is Mibo’s AI assistant for creating tests. Instead of writing test cases manually, you describe what your AI system should do in plain language, and the AI generates the test scenarios for you.
Opening the Test Architect
Navigate to your platform and click New Tests. You’ll see a split-screen interface: a chat panel on the left and a test case editor on the right.
Creating tests through conversation
Start by telling the AI about your system. For example:
“My agent helps users book restaurant reservations. It should ask for the date, time, and party size before confirming the booking.”
The AI will generate a set of test cases based on your description. You can keep the conversation going to refine or add more:
“Also test what happens when the user asks for a date that’s already fully booked.”
“Add a test for when the user doesn’t specify a time.”
Each time you describe a scenario, the Test Architect creates or updates the test cases in the editor panel.
You can also attach JSON files to the conversation, for example an n8n workflow export, a Flowise chatflow config, or a JSON document describing your API. The AI analyzes the structure to detect testable nodes and suggest test targets. Alternatively, paste JSON directly into the chat instead of uploading a file.
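For instance, a trimmed n8n workflow export has a shape like the sketch below. The node names and parameters are illustrative, but the nodes/connections layout is the kind of structure the AI inspects:

```json
{
  "nodes": [
    {
      "name": "Webhook",
      "type": "n8n-nodes-base.webhook",
      "parameters": { "path": "orders" }
    },
    {
      "name": "AI Agent",
      "type": "@n8n/n8n-nodes-langchain.agent",
      "parameters": {}
    }
  ],
  "connections": {
    "Webhook": {
      "main": [[{ "node": "AI Agent", "type": "main", "index": 0 }]]
    }
  }
}
```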
What a test case contains
Each test case has these parts:
- Name: a short description of what’s being tested.
- Scenario: a description of the situation and what the expected behavior is.
- Input: the message (or messages) that Mibo will send to your system.
- Assertions: the checks Mibo runs against the response.
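Put together, a minimal test case might look like the sketch below. The field names follow the examples later on this page; the name field and its value are shown for illustration:

```json
{
  "name": "Store hours inquiry",
  "scenario": "User asks about store hours and expects the Sunday closing time",
  "input_utterance": "What time do you close on Sundays?",
  "assertions": {
    "semantic": [
      { "criteria": "The response states the Sunday closing time", "threshold": 0.8 }
    ]
  }
}
```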
Input types and how they’re sent
When you set up a platform, you configure a message template: a JSON structure that defines the request body Mibo sends to your system, with {input} as a placeholder. The template lives in the platform settings (see n8n, Flowise, or Custom API for platform-specific setup).
For example, a Custom API platform might have this template:

```json
{ "input": "{input}" }
```

And a Flowise platform uses:

```json
{ "question": "{input}" }
```

When a test runs, Mibo takes this template, replaces {input} with the test’s input, and sends the result to your system. The rest of the template is sent as-is on every request. You configure the payload structure once at the platform level, and each test case only provides the variable part.
You can provide the test input in three ways:
Input utterance
A plain text message, like a real user would type. This is the most common option. When a test runs, Mibo replaces {input} in the platform’s message template with this text.
For example, with the template { "input": "{input}" } and this test case:

```json
{
  "scenario": "User asks about store hours",
  "input_utterance": "What time do you close on Sundays?",
  "assertions": { ... }
}
```

Mibo sends this request body to your system:

```json
{ "input": "What time do you close on Sundays?" }
```

If your template has additional fields (like model or temperature), those are included as-is in every request.
Custom body
A raw JSON object that replaces the entire message template. When a test uses custom_body, the platform’s template is ignored completely and the custom body is sent as-is.
Use this when your system expects a specific structure beyond a simple message, for example a webhook that receives structured data.
{ "scenario": "Process an incoming order", "custom_body": { "order_id": "ORD-5678", "customer": { "name": "Jane Doe", "email": "jane@example.com" }, "items": [ { "product": "Widget A", "quantity": 2 } ] }, "assertions": { ... }}Multi-turn conversations (Turns)
For testing conversations that span multiple messages, use turns. Each turn represents one message in the conversation, sent sequentially with shared session context.
This is essential for testing flows where your system needs to gather information across several exchanges, like booking a flight, filling out a form, or handling a multi-step support request.
{ "scenario": "Complete booking flow", "turns": [ { "input_utterance": "I want to book a flight to Paris" }, { "input_utterance": "Next Monday, economy class" }, { "input_utterance": "John Doe, passport ABC123" } ], "assertions": { "semantic": [ { "criteria": "The agent confirms the booking with all details: destination, date, class, and passenger info", "threshold": 0.85 } ] }}How multi-turn execution works:
- Mibo generates a unique session ID for the test.
- Each turn is sent sequentially to your system, using the same session ID to maintain conversation context.
- Your system receives each message as if it were a real user continuing the conversation.
- The top-level assertions are evaluated against the last turn’s response.
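For example, running the booking flow above against a Flowise platform produces three sequential requests. The sketch below lists the three request bodies in order; the sessionId field is a hypothetical stand-in for whatever session mechanism your platform actually uses:

```json
[
  { "question": "I want to book a flight to Paris", "sessionId": "test-8f2a" },
  { "question": "Next Monday, economy class", "sessionId": "test-8f2a" },
  { "question": "John Doe, passport ABC123", "sessionId": "test-8f2a" }
]
```

The top-level assertions are then evaluated against the response to the third request.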
Per-turn assertions
You can optionally add assertions to individual turns to verify intermediate steps, not just the final response:
{ "scenario": "Restaurant reservation with validation", "turns": [ { "input_utterance": "Book a table for tomorrow at 8pm" }, { "input_utterance": "4 people", "assertions": { "semantic": [ { "criteria": "The agent should confirm the party size and ask for a name", "threshold": 0.8 } ] } }, { "input_utterance": "Under the name García" } ], "assertions": { "procedural": [ { "target": "node_call", "condition": "MUST_CALL", "expected_name": "create_reservation", "expected_arguments": { "party_size": 4, "name": "García" } } ], "semantic": [ { "criteria": "The agent confirms the full reservation: date, time, party size, and name", "threshold": 0.85 } ] }}In this example, the second turn checks that the agent asks for a name after receiving the party size. The final assertions verify that the reservation tool was called correctly and the confirmation message is complete.
Assertions
Tests use two kinds of checks: rule-based checks, which are deterministic (did the system call the right tool?), and AI-powered checks, which are scored by an evaluator (is the response good?). The Test Architect generates these automatically based on your descriptions.
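Both kinds live under the same assertions object. A minimal sketch, reusing the procedural (rule-based) and semantic (AI-powered) shapes from the per-turn example above; the expected_name value is illustrative:

```json
{
  "assertions": {
    "procedural": [
      {
        "target": "node_call",
        "condition": "MUST_CALL",
        "expected_name": "get_store_hours"
      }
    ],
    "semantic": [
      {
        "criteria": "The response directly answers the question in a polite tone",
        "threshold": 0.8
      }
    ]
  }
}
```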
Managing test cases
Section titled “Managing test cases”Status
Each test case has a status:
- Active: included in test runs. This is the default.
- Disabled: skipped during test runs. Use this to temporarily exclude a test without deleting it.
You can toggle a test case’s status from the test list.
Editing and organizing
After the AI generates your test cases, you can:
- Edit any test case to adjust the input, scenario, or assertions.
- Remove test cases you don’t need.
- Add more by continuing the conversation with the AI.
- Disable tests you want to keep but skip during runs.
Saving your tests
When you’re happy with the tests, you have two options:
- Save: saves the tests and takes you to the test list.
- Save & Run: saves the tests and immediately runs them against your system.
Tips for effective tests
- Be specific. “Should respond with the store hours for the Manhattan location” is much better than “Should respond correctly.”
- Test one thing per case. Each test should verify a single behavior, not five things at once.
- Include edge cases. What happens with empty inputs? Very long messages? Unexpected formats?
- Use realistic inputs. Write test inputs that sound like actual user interactions, not robotic commands.
- Combine check types. Use a tool call check to verify the right function was called, and a semantic check to verify the response was well-worded.
- Set appropriate thresholds. Start with 0.8 for semantic checks and adjust based on how strict you need to be.
- Use multi-turn for conversations. If your system is designed for back-and-forth interaction, test the full flow, not just isolated messages.
- Add per-turn assertions sparingly. Only check intermediate steps when the order of operations matters. Keep most assertions on the final response.