Sandesh Gawande - The Test Tribe

Author: Sandesh Gawande

AI and Emerging Careers in Data Testing for QA Professionals

The emergence of AI has created uncertainty in the software and technology world. As it encroaches on the conventional application test-automation space, QA professionals might feel threatened or even cornered. While it is true that AI is changing traditional testing roles, it has also opened new opportunities in the data testing space.

But what does AI rely on? Obviously, data!

The more organizations rely on AI, the more data they need, and not just in quantity but in quality as well. And how do you get better data? By collecting and processing more of it, which takes more data pipelines, aka ETL processes. Every ETL process that gets developed needs to be tested. In addition, these are multi-year projects, and some never end. Bingo! What a great opportunity for QA pros!

Application Testing vs. Data Testing

While traditional application testing focuses on verifying functionality, usability, and performance of software interfaces, data testing delves into validating the integrity, accuracy, and transformation of data within complex ETL pipelines.

Application testing is primarily focused on user interfaces and screens, whereas data processes operate without a visible UI and run as background tasks handling large volumes of information. Screen testing typically involves recording UI actions and replaying them using tools such as Selenium. Conversely, data testing requires large-scale comparison of input and output data, utilizing products like iceDQ to ensure accuracy and integrity.
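To make the idea of comparing input and output data concrete, here is a minimal sketch in plain Python. It is only an illustration, not how iceDQ or any other product works: the `compare_rows` function, the `id` key, and the sample rows are all hypothetical, and a real pipeline would run this kind of comparison against millions of rows inside a database or a dedicated tool.

```python
from typing import Any

Row = dict[str, Any]


def compare_rows(source: list[Row], target: list[Row], key: str) -> dict[str, list]:
    """Compare two row sets by a key column and report the differences."""
    src = {row[key]: row for row in source}
    tgt = {row[key]: row for row in target}
    return {
        # Keys that were extracted but never loaded.
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        # Keys that appear in the target with no source record.
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        # Keys present on both sides whose values differ.
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical sample data standing in for a source extract and a loaded target.
source_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 250},
    {"id": 3, "amount": 75},
]
target_rows = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 205},  # value changed somewhere in the pipeline
    {"id": 4, "amount": 10},   # row with no matching source record
]

report = compare_rows(source_rows, target_rows, key="id")
print(report)
```

Notice that there is no screen to record and replay here. The test is the comparison itself.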

Therefore, data testing necessitates a paradigm shift in perspective, competencies, and tools.

The Data Testing Opportunity

Now that we are aware of the opportunity, as QA professionals, what can we do?

It’s not that complicated. Most QA professionals working for large enterprises such as banks, insurers, healthcare providers, and manufacturers will find that multiple big data projects are already running in their organization. Managers and architects are already looking for talent to support these massive projects.

Here’s what you need to learn for data testing

Data Models and ERD Diagrams: Working with databases starts with understanding the data stored in the tables and columns, how they relate to each other, and what that data means. Entity Relationship Diagrams (ERDs) visually represent these structures and are essential for understanding data flow and dependencies.

SQL: Once you understand the schema and structure, you should be able to query it, and this is where you need SQL. Mastering SELECT statements, JOINs, aggregations, and subqueries will allow you to validate data at every stage of the pipeline.

Data Mapping Documents: A critical artifact in ETL projects, data mapping documents trace how data moves from source to target systems. They define transformation rules, business logic, and data quality requirements. Learning to read and validate these documents is essential for effective data testing.

Learn ETL Concepts: Understanding the basics of data pipelines and ETL (Extract, Transform, Load) concepts is fundamental. Learn about data ingestion, transformation logic, error handling, data cleansing, and how to validate each stage of the process.

Business Concepts: While technical skills are important, understanding the business domain and how data supports business decisions is equally crucial. Learn the key performance indicators (KPIs), business rules, and regulatory requirements that drive data requirements in your industry.

Connect with Business Users: Build relationships with business analysts, data analysts, and end users who consume the data. They can provide invaluable context about what the data means, how it’s used, and what “quality” looks like from a business perspective.

ETL Testing Concepts: Master data testing techniques including data completeness checks, data accuracy validation, transformation logic verification, duplicate detection, referential integrity checks, and reconciliation between source and target systems.

Finally, a Data Testing Tool: Familiarize yourself with at least one data testing or data quality tool such as iceDQ, Informatica Data Quality, Great Expectations, dbt (data build tool), or even Python libraries like Pandas for custom validation scripts.
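As a rough illustration of how SQL and the ETL testing concepts above fit together, the sketch below runs four checks against an in-memory SQLite database using Python's built-in `sqlite3` module. Everything in it is an assumption made for the example: the tables `src_orders`, `tgt_orders`, and `dim_customer`, the sample rows, and the transformation rule that dollars equal cents divided by 100. A real mapping document would supply the actual tables and rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, customer_id INTEGER, amount_cents INTEGER);
    CREATE TABLE dim_customer (customer_id INTEGER);
    CREATE TABLE tgt_orders (order_id INTEGER, customer_id INTEGER, amount_dollars REAL);

    INSERT INTO src_orders VALUES (1, 10, 1500), (2, 11, 2000), (3, 12, 999);
    INSERT INTO dim_customer VALUES (10), (11);
    -- Deliberately flawed target: order 1 has the wrong amount, order 2 is
    -- loaded twice, order 3 is missing, and order 4 points at an unknown customer.
    INSERT INTO tgt_orders VALUES
        (1, 10, 1.5), (2, 11, 20.0), (2, 11, 20.0), (4, 99, 5.0);
""")

# Each query returns the order_ids that violate one rule.
CHECKS = {
    # Completeness: every source order should reach the target.
    "missing_in_target": """
        SELECT order_id FROM src_orders
        EXCEPT
        SELECT order_id FROM tgt_orders
        ORDER BY order_id
    """,
    # Duplicate detection: an order should be loaded only once.
    "duplicates": """
        SELECT order_id FROM tgt_orders
        GROUP BY order_id HAVING COUNT(*) > 1
        ORDER BY order_id
    """,
    # Referential integrity: every order needs a known customer.
    "orphans": """
        SELECT t.order_id FROM tgt_orders t
        LEFT JOIN dim_customer c ON c.customer_id = t.customer_id
        WHERE c.customer_id IS NULL
        ORDER BY t.order_id
    """,
    # Transformation logic: dollars should equal cents / 100 (assumed rule).
    "bad_transformation": """
        SELECT DISTINCT s.order_id FROM src_orders s
        JOIN tgt_orders t ON t.order_id = s.order_id
        WHERE ABS(t.amount_dollars - s.amount_cents / 100.0) > 0.005
        ORDER BY s.order_id
    """,
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in CHECKS.items()}
for name, order_ids in results.items():
    print(f"{name}: {order_ids}")
```

Each check is just a query that returns the offending rows, so an empty result means the rule passed. Dedicated data testing tools package the same idea with scheduling, reporting, and connectors for many databases.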

So, once you learn all these, what kind of projects can you get involved in?

  • Data Warehouse Projects: Testing dimensional models, fact and dimension tables, slowly changing dimensions, and ensuring accurate aggregations for analytics.
  • Data and Cloud Migration Projects: Validating that data migrates accurately from legacy systems to cloud platforms like AWS, Azure, or Google Cloud, with no data loss or corruption.
  • ERP and Platform Modernization: Testing data migration and integration when organizations upgrade to modern ERP systems like SAP S/4HANA, Oracle Cloud, or Microsoft Dynamics.
  • Big Data Projects: Working with technologies like Hadoop, Spark, and Kafka to validate massive data volumes, real-time streaming data, and distributed processing pipelines.
  • BI Reports and Analytics: Ensuring business intelligence dashboards and reports reflect accurate data by validating the underlying data pipelines and transformation logic.

Beyond Testing: The Role of QA Professionals in Production Environments

QA professionals often develop expertise in data testing, primarily within non-production settings. However, this knowledge can also lead to roles in production environments, such as Site Reliability Engineering (SRE) with a focus on data. Why is that?

The skills needed for data testing are essentially the same as those required for data quality assurance; only the environment changes. The rules applied during data testing can be integrated into production pipelines, allowing teams to monitor both data flows and overall data quality.
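One way to picture this reuse is to keep each rule as a small piece of data and run the same list against whichever environment you point it at. The sketch below is only an assumption about how that could look: the `Rule` class, the `run_rules` function, the `payments` table, and the two rules are invented for illustration, and SQLite stands in for the real test and production databases.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    violation_sql: str  # must return a single count of violating rows


# Hypothetical rules, written once during testing.
RULES = [
    Rule("amount_not_null", "SELECT COUNT(*) FROM payments WHERE amount IS NULL"),
    Rule("amount_non_negative", "SELECT COUNT(*) FROM payments WHERE amount < 0"),
]


def run_rules(conn: sqlite3.Connection, rules: list[Rule]) -> dict[str, int]:
    """Run every rule and return only those with violations."""
    failures = {}
    for rule in rules:
        (count,) = conn.execute(rule.violation_sql).fetchone()
        if count:
            failures[rule.name] = count
    return failures


def make_db(rows: list[tuple]) -> sqlite3.Connection:
    """Build an in-memory stand-in for an environment's payments table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    return conn


test_db = make_db([(1, 10.0), (2, 25.5)])
prod_db = make_db([(1, 10.0), (2, None), (3, -4.0)])

test_failures = run_rules(test_db, RULES)  # same rules, test environment
prod_failures = run_rules(prod_db, RULES)  # same rules, production monitoring
print("test:", test_failures)
print("prod:", prod_failures)
```

In the test environment a non-empty result fails the build. In production the same result would more likely raise an alert or stop a downstream load.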

In such cases, who is better equipped with the knowledge and expertise? You, the QA professionals! Through your experience in data testing and quality assurance, you possess a unique understanding of both the technical requirements and the practical challenges involved in ensuring data accuracy. Your familiarity with data testing tools, validation processes, and best practices makes you the ideal candidate to manage data quality and integrity across AI and analytics initiatives.

Conclusion

Remember, you are not missing the bus; you are the best positioned to benefit from the AI opportunities. Not everybody has to use AI to make decisions; you can position yourself as the person who feeds the AI beast. By ensuring it is fed quality data, you help the organization achieve better outcomes and secure a valuable role for yourself as well. The future belongs to those who can ensure AI systems have the quality data they need to make reliable decisions. As a QA professional, that future can be yours.
