
The Test Architect

The Test Architect is Mibo’s AI assistant for creating tests. Instead of writing test cases manually, you describe what your AI system should do in plain language, and the AI generates the test scenarios for you.

Navigate to your platform and click New Tests. You’ll see a split-screen interface: a chat panel on the left and a test case editor on the right.

Start by telling the AI about your system. For example:

“My agent helps users book restaurant reservations. It should ask for the date, time, and party size before confirming the booking.”

The AI will generate a set of test cases based on your description. You can keep the conversation going to refine or add more:

“Also test what happens when the user asks for a date that’s already fully booked.”

“Add a test for when the user doesn’t specify a time.”

Each time you describe a scenario, the Test Architect creates or updates the test cases in the editor panel.

Each test case has three main parts:

  • Name — a short description of what’s being tested (for example, “User books a table for two”).
  • Input — the message that Mibo will send to your system (for example, “I’d like to book a table for 2 people on Friday at 7pm”).
  • Expected behavior — what the system should do or respond with. This is where Mibo’s two types of checks come in.
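To make the three parts concrete, here is a minimal sketch of how a test case like the example above could be represented. The field names and structure are illustrative only, not Mibo’s actual schema:

```python
# Hypothetical sketch of a test case's three parts.
# Field names are illustrative, not Mibo's real format.
test_case = {
    "name": "User books a table for two",
    "input": "I'd like to book a table for 2 people on Friday at 7pm",
    "expected_behavior": {
        # Rule-based expectations (assumed tool name for illustration):
        "tool_called": "book_reservation",
        "response_includes": ["Friday", "7pm", "2"],
        # AI-powered expectation: minimum quality score to pass.
        "quality_threshold": 0.80,
    },
}
```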

Mibo evaluates your AI system in two complementary ways:

Rule-based checks

These are straightforward yes/no questions:

  • Did the system use the correct tool? (For example, did it call the booking system, not the cancellation system?)
  • Were the inputs correct? (Did it pass the right date, time, and party size?)
  • Did the response include the expected information?

Rule-based checks are deterministic — they always give the same answer. They’re great for verifying that your system follows the right process.
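The deterministic nature of these checks can be sketched in a few lines. This is an assumed, simplified model (the tool-call shape and names are hypothetical), not Mibo’s implementation:

```python
# Minimal sketch of a deterministic, rule-based check: given a recorded
# tool call, verify the tool name and its inputs match expectations.
# The tool-call structure and names here are hypothetical.
def rule_based_check(tool_call, expected_tool, expected_args):
    """Return True only if the right tool was called with the right inputs."""
    return (
        tool_call["name"] == expected_tool
        and all(tool_call["args"].get(k) == v for k, v in expected_args.items())
    )

call = {
    "name": "book_reservation",
    "args": {"date": "Friday", "time": "7pm", "party_size": 2},
}

# Same inputs always give the same result -- the check is deterministic.
print(rule_based_check(call, "book_reservation",
                       {"date": "Friday", "time": "7pm", "party_size": 2}))
```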

AI-powered checks

These evaluate the overall quality of the response:

  • Is the response helpful and clear?
  • Does it stick to the facts, or does it make things up?
  • Is the tone appropriate for your brand?

AI-powered checks use an AI evaluator to score each response. You set a quality threshold (for example, 80%), and the test passes if the score meets or exceeds it.
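The pass/fail logic for an AI-powered check reduces to a threshold comparison. In this sketch the evaluator’s score is taken as given (producing it is the AI evaluator’s job); the function name is illustrative:

```python
# Sketch of the pass/fail rule for an AI-powered check: the AI evaluator
# produces a score, and the test passes if the score meets or exceeds
# the configured threshold. Names are illustrative.
def passes_quality_check(score: float, threshold: float = 0.80) -> bool:
    return score >= threshold

print(passes_quality_check(0.85))  # True  -- meets the 80% threshold
print(passes_quality_check(0.72))  # False -- falls short
```

Note that a score exactly at the threshold passes, matching “meets or exceeds.”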

After the AI generates your test cases, you can:

  • Edit any test case to adjust the input or expected behavior.
  • Remove test cases you don’t need.
  • Add more by continuing the conversation with the AI.

When you’re happy with the tests, you have two options:

  • Save — saves the tests and takes you to the test list.
  • Save & Run — saves the tests and immediately runs them against your system.

Tips for writing good tests

  • Be specific. “Should respond with the store hours for the Manhattan location” is much better than “Should respond correctly.”
  • Test one thing per case. Each test should verify a single behavior, not five things at once.
  • Include edge cases. What happens with empty inputs? Very long messages? Unexpected formats?
  • Use realistic inputs. Write test inputs that sound like actual user interactions, not robotic commands.