Rahul Verma at TribeQonf 2026

Session

Test, Then Trust Proportionately: Evaluating AI Systems from a Tester’s Lens

Outline

What we’ll cover in this session.


AI systems have made one thing impossible to ignore: uncertainty. The same prompt can produce different answers. Users rarely ask the same question the same way twice. Even deciding what is “correct” is often more nuanced than pass or fail. It feels like testing has fundamentally changed.

But has it?

In this talk, we’ll explore AI evaluation through a tester’s lens. As AI systems become less predictable, are we facing a new testing problem or rediscovering one that has always existed? Along the way, we’ll question a few assumptions we’ve quietly carried for years.

This is not a talk about the latest evaluation framework or AI buzzwords. It is about a way of thinking. Using familiar testing principles, we’ll explore how to understand uncertainty through evidence and decide what deserves our trust.

Whether you’re just starting with AI or already building AI-powered systems, you’ll leave with a practical mental model for evaluating AI. You may also leave with a different perspective on what testing has always been about.

I’m about 73.69% confident of that. Give or take.

We’ll walk through

Where agentic workflows actually earn their keep in a real QA pipeline — and the two places they quietly fail.

The four control surfaces to set up before an agent touches production: scope, evaluation, failure cataloguing, human-in-the-loop.

Patterns for flaky-test triage, regression pruning, and visual-diff arbitration with receipts from three production systems.

A reference architecture you can take back to Monday’s sprint planning, plus the metrics that prove it’s working.

Speaker

Rahul Verma

Sr. Consultant & AI Coach

Rahul Verma is an award-winning thought leader in the testing community, working as a Senior Coach and Consultant at trendig technology services gmbh. He created Swayam, an LLM framework for layered prompting, and Arjuna, a free, open-source Python test automation framework. He has contributed as an author and reviewer for certification bodies such as Artificial Intelligence United, Selenium United, ISTQB, and CMAP.

His testing experience covers LLMs in testing, Python automation frameworks, web security, white box testing, and web performance testing. His research explores meta-programming and object-oriented design patterns for automating these areas. He has given presentations, published his work, and trained thousands of testers. His work is deeply influenced by his interest in poetry and spirituality.

Catch this session live.

One pass, every talk, no parallel tracks. Super Early Bird ends when the next 200 seats are gone.
