Rahul Verma - The Test Tribe
Session

Verify, Then Trust Proportionately: Evaluating AI Systems from a Tester’s Lens

Outline

What we’ll cover in this session.

Further details coming soon.

AI systems do not behave like traditional software, which makes life interesting for testers. When the same input can lead to different outputs, and “correct” is sometimes open to debate, how do we decide what deserves our trust?

This talk looks at AI evaluation through a tester’s lens and makes a simple case: trust should not be assumed, it should be earned, and earned in proportion to the evidence.

This is not a deep dive into tools, metrics, or complex frameworks. In 25 minutes, we will focus on the mindset shift that AI asks of testers. What changes when software stops behaving predictably? What stays the same? And why are testers actually well suited for this moment?

If you have ever looked at an AI response and thought, “Well… that seems confident,” this talk is for you.

You will leave with a clearer way to think about evaluating AI systems, asking better questions, and applying your testing instincts in a world where software occasionally sounds very sure of itself.

We’ll walk through:

01 Where agentic workflows actually earn their keep in a real QA pipeline, and the two places they quietly fail.

02 The four control surfaces to set up before an agent touches production: scope, evaluation, failure cataloguing, human-in-the-loop.

03 Patterns for flaky-test triage, regression pruning, and visual-diff arbitration with receipts from three production systems.

04 A reference architecture you can take back to Monday’s sprint planning, plus the metrics that prove it’s working.

Rahul Verma
Speaker

Sr. Consultant & AI Coach

Rahul Verma is an award-winning thought leader in the testing community, working as a Senior Coach and Consultant at trendig technology services GmbH. He created Swayam, an LLM framework for layered prompting, and Arjuna, a free, open-source Python test automation framework. He has contributed as an author and reviewer for certification bodies like Artificial Intelligence United, Selenium United, ISTQB, and CMAP.

His testing experience covers LLMs in testing, Python automation frameworks, web security, white box testing, and web performance testing. His research explores meta-programming and object-oriented design patterns for automating these areas. He is a presenter and published author who has trained thousands of testers, and his work is deeply influenced by his interest in poetry and spirituality.

Catch this session live.

One pass, every talk, no parallel tracks. Super Early Bird ends when the next 200 seats are gone.
