
ISTQB CT-AI: Introduction to AI and Quality Characteristics
Understanding what artificial intelligence actually is - versus what popular culture suggests - forms the foundation for everything else in AI testing. This article covers the first two chapters of the CT-AI syllabus: Introduction to AI and Quality Characteristics for AI-Based Systems.
These chapters establish essential concepts you'll build upon throughout the certification. You'll learn how to define AI precisely, distinguish between current AI capabilities and theoretical future developments, understand the regulatory landscape shaping AI development, and recognize the unique quality concerns AI systems present.
Table of Contents
- Chapter 1: Introduction to AI
  - Defining Artificial Intelligence
  - Narrow AI vs General AI
  - The AI Landscape Today
  - AI Regulations and Standards
  - How AI Changes the Tester's Role
- Chapter 2: Quality Characteristics for AI Systems
  - Traditional Quality Characteristics Applied to AI
  - AI-Specific Quality Characteristics
  - Balancing Quality Characteristics
- Frequently Asked Questions
Chapter 1: Introduction to AI
The first chapter of the CT-AI syllabus establishes foundational AI knowledge that testers need to communicate effectively with AI development teams and understand what they're testing.
Why Testers Need AI Fundamentals
You might wonder why testers need to understand AI theory at all. Can't we just test the system's behavior without understanding how it works internally?
For traditional software, black-box testing works well precisely because deterministic systems behave predictably. Given the same input, you expect the same output. Test oracles are straightforward: compare actual results to expected results.
AI systems break these assumptions. Understanding why helps you design effective tests:
- Probabilistic outputs require different test strategies than deterministic ones
- Training data influences system behavior in ways that requirements don't capture
- Model architecture choices create specific failure modes you should test for
- AI limitations help you identify what can't be tested traditionally
The goal isn't to make you an ML engineer. It's to give you sufficient knowledge to ask the right questions and design appropriate test strategies.
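The contrast between deterministic and probabilistic oracles can be sketched in a few lines. The `tax` and `classify` functions below are illustrative stand-ins, not real systems; the accuracy threshold is likewise an example value a team would agree on, not a standard:

```python
import random

# Deterministic code: one input, one expected output, exact assertion.
def tax(amount):
    return round(amount * 0.2, 2)

assert tax(100.0) == 20.0  # classic test oracle: exact match

# Probabilistic model (hypothetical stand-in): individual outputs may
# vary, so we assert an aggregate property instead of exact equality.
def classify(x, rng):
    # simulated classifier that is right ~90% of the time
    return x if rng.random() < 0.9 else 1 - x

rng = random.Random(42)
labels = [random.Random(i).randint(0, 1) for i in range(200)]
predictions = [classify(y, rng) for y in labels]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Statistical oracle: accept any accuracy above an agreed threshold.
assert accuracy >= 0.8
```

The second half is the important shift: the test passes or fails on an aggregate property of many predictions, not on any single output.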
Defining Artificial Intelligence
The term "artificial intelligence" gets thrown around loosely in marketing and media. For testing purposes, you need precise definitions.
The ISTQB Definition
According to ISTQB, artificial intelligence refers to systems that display intelligent behavior by analyzing their environment and taking actions - with some degree of autonomy - to achieve specific goals.
Key elements of this definition:
Intelligent behavior: The system does something that appears intelligent - recognizing patterns, making predictions, understanding language, or making decisions.
Analyzing environment: AI systems perceive and process information from their surroundings, whether that's sensor data, text, images, or structured databases.
Autonomy: AI systems make decisions without explicit human instructions for each action. The degree of autonomy varies widely across systems.
Goal-oriented: AI systems work toward defined objectives, whether that's classifying images correctly, maximizing game scores, or minimizing delivery times.
AI vs Machine Learning vs Deep Learning
These terms are related but distinct:
Artificial Intelligence is the broadest category - any system exhibiting intelligent behavior. This includes rule-based expert systems, statistical models, and learning systems.
Machine Learning is a subset of AI where systems learn patterns from data rather than following explicitly programmed rules. Instead of telling the system exactly what to do, you provide examples and let it discover patterns.
Deep Learning is a subset of machine learning using neural networks with multiple layers. It's particularly effective for complex pattern recognition tasks like image and speech recognition.
AI (broadest)
└── Machine Learning
    └── Deep Learning

For CT-AI, most content focuses on machine learning systems, since they are the most common AI applications today.
Exam Tip: Be precise about terminology. Questions may test whether you understand that all ML is AI, but not all AI is ML. Similarly, deep learning is one type of machine learning, not a separate category.
Narrow AI vs General AI
One of the most important distinctions in AI is between narrow and general artificial intelligence.
Narrow AI (Weak AI)
Narrow AI systems are designed to perform specific tasks within limited domains. Every AI system deployed today is narrow AI:
- Image classifiers that identify objects in photos
- Speech recognition systems that transcribe audio
- Recommendation engines that suggest products or content
- Language models that generate or translate text
- Game-playing systems that master specific games
- Fraud detection systems that identify suspicious transactions
Characteristics of narrow AI:
- Excels at specific tasks, often surpassing human performance
- Cannot transfer learning to different domains
- Doesn't understand context beyond training
- Has no general reasoning ability
- Cannot set its own goals
A chess AI that beats world champions cannot play checkers without complete retraining. A translation system doesn't understand meaning - it recognizes patterns linking text in different languages.
General AI (Strong AI / AGI)
Artificial General Intelligence would possess human-level intelligence across all cognitive domains. It would:
- Learn any task a human can learn
- Transfer knowledge between domains
- Reason about abstract concepts
- Understand context and meaning
- Potentially set its own goals
AGI does not exist today. Despite media hype and science fiction, no current system approaches general intelligence. This distinction matters for testers because:
- We're always testing narrow AI with specific capabilities and limitations
- Test strategies must account for the narrow scope of what systems can actually do
- Claims about AI capabilities should be evaluated against narrow AI realities
Why This Distinction Matters for Testing
When testing narrow AI, you must understand what the system is designed to do and what lies outside its scope. A narrow AI will fail - often unpredictably - when encountering situations outside its training domain.
Effective AI testing includes:
- Testing within the intended domain (does it work as designed?)
- Testing at domain boundaries (how does it handle edge cases?)
- Testing outside the domain (how does it fail when misused?)
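These three layers can be organized directly into test cases. The sketch below uses a hypothetical loan-risk model trained on amounts between 1,000 and 50,000; all names and bounds are illustrative, and the stand-in model simply rejects out-of-domain inputs rather than guessing:

```python
# Hypothetical supported input range for an imagined risk model.
TRAINED_MIN, TRAINED_MAX = 1_000, 50_000

def risk_score(amount):
    # Stand-in model: fail safely on inputs outside the supported domain.
    if not (TRAINED_MIN <= amount <= TRAINED_MAX):
        raise ValueError("input outside supported domain")
    return min(1.0, amount / TRAINED_MAX)

# 1. Within the intended domain: normal behaviour.
assert 0.0 <= risk_score(10_000) <= 1.0

# 2. At domain boundaries: edge values still handled.
assert 0.0 <= risk_score(TRAINED_MIN) <= 1.0
assert risk_score(TRAINED_MAX) == 1.0

# 3. Outside the domain: expect an explicit rejection, not a silent guess.
try:
    risk_score(500_000)
    assert False, "expected an explicit rejection"
except ValueError:
    pass
```

Real narrow-AI systems rarely fail this cleanly, which is exactly why the third group of tests matters: you are probing whether the system knows its own limits.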
The AI Landscape Today
Understanding current AI capabilities and applications helps contextualize what you're likely to test.
Common AI Applications
Computer Vision
- Object detection and recognition
- Facial recognition
- Medical image analysis
- Quality inspection in manufacturing
- Autonomous vehicle perception
Natural Language Processing
- Text classification and sentiment analysis
- Machine translation
- Chatbots and virtual assistants
- Document summarization
- Named entity recognition
Predictive Analytics
- Demand forecasting
- Churn prediction
- Fraud detection
- Risk assessment
- Maintenance prediction
Recommendation Systems
- Product recommendations
- Content suggestions
- Ad targeting
- Personalization engines
Robotics and Automation
- Autonomous vehicles
- Warehouse robots
- Industrial automation
- Surgical robots
AI Limitations
Understanding AI limitations is as important as understanding capabilities for effective testing:
Data dependency: AI systems are only as good as their training data. Biased, incomplete, or low-quality data produces biased, incomplete, or low-quality models.
Brittleness: AI systems can fail unexpectedly with inputs that differ from training data, even when those inputs seem similar to humans.
Lack of common sense: AI systems don't have general world knowledge. They can make errors that seem obvious to humans.
Opacity: Many AI models, especially deep learning, are difficult to interpret. Understanding why a model made a specific decision is challenging.
Adversarial vulnerability: Carefully crafted inputs can fool AI systems while appearing normal to humans.
AI Regulations and Standards
The regulatory landscape for AI is evolving rapidly. Testers need awareness of regulations affecting the systems they test.
EU AI Act
The European Union's AI Act is the most comprehensive AI regulation to date. It takes a risk-based approach:
Unacceptable Risk (Banned)
- Social scoring by governments
- Real-time biometric identification in public spaces (with limited exceptions)
- Manipulation systems that cause harm
High Risk (Strict Requirements)
- Safety components in regulated products
- Critical infrastructure management
- Educational and vocational training
- Employment and worker management
- Essential services access
- Law enforcement applications
- Migration and border control
- Justice administration
Limited Risk (Transparency Requirements)
- Chatbots (must disclose AI interaction)
- Emotion recognition systems
- Biometric categorization
- Deepfakes and synthetic content
Minimal Risk (No Special Requirements)
- AI-enabled video games
- Spam filters
- Most consumer applications
For high-risk systems, the AI Act requires:
- Risk management systems
- Data governance
- Technical documentation
- Record-keeping
- Transparency to users
- Human oversight
- Accuracy, robustness, and cybersecurity
Exam Tip: You don't need to memorize every detail of the AI Act, but understand the risk-based approach and know that high-risk systems face significant testing and documentation requirements.
Other Regulatory Frameworks
IEEE Standards
IEEE has developed several AI-related standards, including:
- IEEE 7000: Addressing ethical concerns
- IEEE 7001: Transparency of autonomous systems
- IEEE 7002: Data privacy
- IEEE 7010: Well-being metrics
ISO/IEC Standards
- ISO/IEC 22989: AI concepts and terminology
- ISO/IEC 23053: ML framework
- ISO/IEC 42001: AI management systems
Industry-Specific Regulations
- Medical AI: FDA regulations, MDR in EU
- Financial AI: Model risk management requirements
- Automotive AI: Safety standards like ISO 26262
Implications for Testing
Regulations create testing requirements:
- Documentation: Tests must be documented to demonstrate compliance
- Bias testing: High-risk systems require fairness evaluation
- Transparency testing: Systems must provide explanations where required
- Robustness testing: Systems must handle errors gracefully
- Security testing: Protection against adversarial attacks
- Human oversight: Systems must support human intervention
How AI Changes the Tester's Role
AI fundamentally changes what testers do and how they do it.
New Testing Challenges
The Oracle Problem
Traditional testing compares actual results to expected results. For AI systems, determining the "correct" answer is often impossible:
- What's the correct sentiment of a nuanced review?
- What's the correct translation of an ambiguous phrase?
- What's the correct next frame an autonomous vehicle should predict?
Testers must develop alternative approaches when exact expected results aren't available.
Non-Determinism
The same input might produce different outputs across runs due to:
- Random initialization in training
- Floating-point precision differences
- Model updates in production
- Environmental factors
Testing strategies must accommodate acceptable variation.
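One common way to accommodate variation is to run the same input repeatedly and assert on the spread rather than on any single value. The `predict` function below simulates a non-deterministic model; the tolerance bands are illustrative values a team would derive from requirements:

```python
import random
import statistics

# Hypothetical non-deterministic model: the same input yields a slightly
# different score each run (simulating random init / floating-point effects).
def predict(x, rng):
    return 0.7 + rng.gauss(0, 0.01)  # score for a fixed input x

rng = random.Random(0)
runs = [predict("same input", rng) for _ in range(30)]

# Instead of exact equality, assert the outputs stay within an agreed band.
assert max(runs) - min(runs) < 0.08             # bounded spread
assert abs(statistics.mean(runs) - 0.7) < 0.02  # stable central tendency
```

The test now encodes "acceptable variation" explicitly, which also gives you a concrete regression signal if a model update widens the spread.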
Emergent Behavior
AI systems can exhibit behaviors not explicitly programmed. Testing must explore what the system might do, not just what it was designed to do.
Data as Specification
Training data effectively specifies AI behavior. Testers need to evaluate data quality, not just code quality.
New Testing Skills
CT-AI prepares testers to:
- Understand AI/ML concepts to communicate with development teams
- Evaluate training data quality and representativeness
- Design tests for non-deterministic systems
- Test AI-specific quality characteristics
- Use specialized AI testing techniques
- Evaluate AI-powered testing tools
Collaboration with AI Teams
Testers working with AI systems interact with new roles:
Data Scientists: Develop models and analyze data. Testers provide feedback on model performance in realistic scenarios.
ML Engineers: Build and deploy ML systems. Testers verify end-to-end system behavior, not just model accuracy.
Data Engineers: Manage data pipelines. Testers evaluate data quality and pipeline reliability.
Domain Experts: Provide business knowledge. Testers help translate domain requirements into testable criteria.
Chapter 2: Quality Characteristics for AI Systems
The second chapter extends software quality concepts to AI-specific concerns.
Traditional Quality Characteristics Applied to AI
ISO 25010 defines standard software quality characteristics. These still apply to AI systems but require adaptation:
Functional Suitability
Does the system do what it should?
For AI systems, this means:
- Does the model achieve its intended purpose?
- Does accuracy meet requirements across relevant scenarios?
- Are predictions useful for the business problem?
Performance Efficiency
How well does the system use resources?
For AI systems:
- Inference time (how fast are predictions?)
- Training time (how long to develop/update models?)
- Resource consumption (memory, compute, energy)
- Scalability under load
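Inference time is usually reported as a percentile rather than an average, since a few slow predictions can matter more than the mean. A minimal sketch, where `model_predict` is a stand-in for a real inference call and the 50 ms budget is purely illustrative:

```python
import time
import statistics

# Stand-in for a real inference call; sleeps to simulate ~1 ms of work.
def model_predict(x):
    time.sleep(0.001)
    return x * 2

latencies = []
for i in range(50):
    start = time.perf_counter()
    model_predict(i)
    latencies.append((time.perf_counter() - start) * 1000)  # milliseconds

# 95th percentile latency: the 19th of 19 cut points at n=20.
p95 = statistics.quantiles(latencies, n=20)[18]
assert p95 < 50.0  # illustrative budget: p95 under 50 ms
```

Using `time.perf_counter` rather than `time.time` matters here: it is a monotonic, high-resolution clock intended exactly for interval measurement.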
Compatibility
Does the system work with other systems?
For AI systems:
- Integration with data pipelines
- API compatibility
- Model format standards
- Interoperability with existing systems
Usability
Can users effectively use the system?
For AI systems:
- Are outputs understandable?
- Can users provide feedback for improvement?
- Are confidence levels communicated appropriately?
- Does the interface support appropriate trust calibration?
Reliability
Does the system perform consistently?
For AI systems:
- How consistent are predictions?
- How does the system handle unexpected inputs?
- What happens when the model encounters data drift?
- How does the system degrade under stress?
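Data drift, mentioned above, can be monitored with even very simple statistics. The sketch below compares a production feature's mean against training-time statistics; the "one standard deviation" trigger and the synthetic data are illustrative, not a recommended production threshold:

```python
import random
import statistics

# Synthetic feature values: baseline at training time, shifted in production.
rng = random.Random(1)
training = [rng.gauss(50, 5) for _ in range(1000)]
production = [rng.gauss(58, 5) for _ in range(1000)]

train_mean = statistics.mean(training)
train_sd = statistics.stdev(training)
prod_mean = statistics.mean(production)

# Flag drift when the production mean moves more than 1 training SD.
drifted = abs(prod_mean - train_mean) > train_sd
assert drifted  # the simulated shift (~8 units vs SD ~5) is detected
```

Real monitoring would track many features and use distribution-level measures, but the principle is the same: reliability testing for AI includes checking that the system notices when its inputs no longer look like its training data.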
Security
Is the system protected from threats?
For AI systems:
- Model theft protection
- Adversarial attack resistance
- Training data privacy
- Secure model deployment
Maintainability
Can the system be modified effectively?
For AI systems:
- Model retraining processes
- Version control for models and data
- Monitoring and alerting
- Debugging and diagnosis capabilities
Portability
Can the system move between environments?
For AI systems:
- Model format portability
- Environment independence
- Hardware compatibility
- Deployment flexibility
AI-Specific Quality Characteristics
Beyond traditional characteristics, AI systems have unique quality concerns.
Explainability
Definition: The degree to which a system can explain how and why it reached a particular decision or prediction.
Why it matters:
- Users need to understand recommendations to trust and act on them
- Regulators require explanations for certain high-risk decisions
- Developers need interpretability to debug and improve models
- Affected individuals have rights to explanations in many jurisdictions
Levels of explainability:
- Global: Overall model behavior patterns
- Local: Explanation for specific predictions
- Intrinsic: Models that are inherently interpretable (decision trees, linear models)
- Post-hoc: Explanations generated after predictions (LIME, SHAP)
Testing explainability:
- Are explanations provided when required?
- Are explanations understandable to intended audiences?
- Are explanations accurate (do they reflect actual model reasoning)?
- Are explanations consistent for similar predictions?
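The consistency check in the last bullet can be automated for intrinsically interpretable models. The sketch below uses a hand-written linear scorer where each feature's contribution is simply weight times value; the feature names, weights, and inputs are all illustrative:

```python
# Illustrative linear model: contribution of each feature = weight * value.
WEIGHTS = {"income": 0.6, "debt": -0.3, "age": 0.1}

def score(features):
    return sum(WEIGHTS[k] * v for k, v in features.items())

def explain(features):
    # Local explanation: the feature with the largest absolute contribution.
    contribs = {k: WEIGHTS[k] * v for k, v in features.items()}
    return max(contribs, key=lambda k: abs(contribs[k]))

a = {"income": 1.0, "debt": 0.2, "age": 0.5}
b = {"income": 0.9, "debt": 0.25, "age": 0.5}  # a very similar input

# Consistency check: near-identical inputs should share the same top driver.
assert explain(a) == explain(b) == "income"
```

For black-box models the same idea applies with post-hoc explainers such as LIME or SHAP: generate explanations for clusters of similar inputs and flag cases where the dominant features disagree.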
Fairness
Definition: The degree to which a system treats different groups equitably and avoids unjust discrimination.
Why it matters:
- Legal requirements prohibit discrimination in many domains
- Ethical obligations to treat people fairly
- Business risk from reputational damage
- Actual harm to individuals from biased decisions
Types of fairness:
- Individual fairness: Similar individuals receive similar treatment
- Group fairness: Different demographic groups receive comparable outcomes
- Statistical parity: Equal positive prediction rates across groups
- Equalized odds: Equal true positive and false positive rates across groups
Testing fairness:
- Define protected attributes and relevant fairness metrics
- Measure model performance across different groups
- Test for proxy discrimination (bias through correlated features)
- Evaluate historical bias in training data
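Two of the group-fairness metrics defined above, statistical parity and the true-positive-rate half of equalized odds, reduce to simple counting. The synthetic rows and the 0.2 threshold below are purely illustrative; acceptable gaps are a context-specific, often regulatory, decision:

```python
# Synthetic evaluation data: (group, actual label, predicted label).
rows = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 1), ("A", 1, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 1, 1),
]

def positive_rate(group):
    preds = [p for g, _, p in rows if g == group]
    return sum(preds) / len(preds)

def true_positive_rate(group):
    positives = [p for g, y, p in rows if g == group and y == 1]
    return sum(positives) / len(positives)

# Statistical parity: compare positive prediction rates across groups.
parity_gap = abs(positive_rate("A") - positive_rate("B"))

# Equalized odds (TPR component): compare true positive rates across groups.
tpr_gap = abs(true_positive_rate("A") - true_positive_rate("B"))

# Thresholds are context-dependent; 0.2 here is purely illustrative.
assert parity_gap <= 0.2 and tpr_gap <= 0.2
```

Note how the two metrics can diverge on the same data: here the groups have identical true positive rates but different overall positive prediction rates, which is exactly the kind of conflict the exam tip below describes.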
Exam Tip: Understand that different fairness definitions can conflict with each other. A model can't always satisfy all fairness criteria simultaneously. Knowing which fairness metric applies requires understanding the specific context and requirements.
Freedom from Bias
Definition: The degree to which a system is free from systematic errors that favor certain outcomes.
Bias types:
- Selection bias: Training data doesn't represent the target population
- Measurement bias: Systematic errors in data collection
- Aggregation bias: Model assumptions don't fit all subgroups
- Evaluation bias: Benchmark data doesn't represent real-world usage
- Deployment bias: System used differently than designed
Testing for bias:
- Evaluate training data representativeness
- Test model performance on subgroups
- Compare predictions against unbiased baselines
- Monitor for bias in production data
Transparency
Definition: The degree to which system operations, including inputs, outputs, and processes, are visible and understandable.
Components of transparency:
- What data was used to train the model?
- What features influence predictions?
- How was the model validated?
- What are the known limitations?
- Who is responsible for the system?
Testing transparency:
- Is documentation complete and accurate?
- Are data sources identified?
- Are model limitations communicated?
- Can users access relevant information about the system?
Robustness
Definition: The degree to which a system maintains performance under varying conditions, including unexpected inputs and adversarial attacks.
Robustness concerns:
- Input variation: How does the system handle noisy or unusual inputs?
- Distribution shift: How does performance change when data differs from training?
- Adversarial inputs: Can crafted inputs fool the system?
- Edge cases: How does the system handle boundary conditions?
Testing robustness:
- Test with noisy and corrupted inputs
- Test with out-of-distribution data
- Perform adversarial testing
- Evaluate graceful degradation
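A minimal robustness test perturbs an input many times and checks that the prediction does not flip. The threshold model and noise range below are illustrative stand-ins for a real model and a realistic perturbation budget:

```python
import random

# Stand-in model: a simple threshold classifier.
def predict(x):
    return 1 if x >= 0.5 else 0

rng = random.Random(7)
clean_input = 0.8                # well inside the positive decision region
baseline = predict(clean_input)

# Add small noise repeatedly; count how often the prediction flips.
flips = 0
for _ in range(100):
    noisy = clean_input + rng.uniform(-0.05, 0.05)
    if predict(noisy) != baseline:
        flips += 1

# For an input far from the decision boundary, small noise should
# not change the outcome.
assert flips == 0
```

Running the same loop on inputs near the decision boundary (here, around 0.5) shows the opposite behavior and maps directly to the "edge cases" concern above: robustness is weakest exactly where decisions change.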
Autonomy
Definition: The degree to which a system can operate and make decisions without human intervention.
Testing autonomy:
- What decisions does the system make independently?
- When does it escalate to humans?
- Can humans override system decisions?
- Are autonomy boundaries appropriate for the risk level?
Balancing Quality Characteristics
Quality characteristics often trade off against each other. Understanding these trade-offs is essential for testers.
Common Trade-offs
Accuracy vs Explainability
More complex models (deep learning) often achieve higher accuracy but are harder to explain. Simpler models (decision trees) are interpretable but may sacrifice accuracy.
Fairness vs Accuracy
Enforcing fairness constraints may reduce overall accuracy. The model performs worse overall to ensure equitable treatment across groups.
Performance vs Robustness
Highly optimized models may be more vulnerable to adversarial attacks. Robustness techniques often add computational overhead.
Autonomy vs Human Oversight
More autonomous systems are efficient but reduce human control. Less autonomous systems are safer but may be slower or more expensive.
Making Trade-off Decisions
Testers should understand that quality decisions aren't purely technical. They involve:
- Business priorities and risk tolerance
- Regulatory requirements
- Ethical considerations
- User needs and expectations
- Cost and resource constraints
Your role is to:
- Test relevant quality characteristics
- Communicate trade-offs to stakeholders
- Provide evidence for decision-making
- Verify that chosen trade-offs are implemented correctly
Connecting to Other CT-AI Topics
Chapters 1 and 2 provide foundations that later chapters build upon:
- Machine learning overview (Chapter 3) expands on AI concepts introduced here
- Testing AI-specific characteristics (Chapter 8) operationalizes the quality characteristics
- Methods and techniques (Chapter 9) provides concrete approaches for testing these characteristics
- Regulations influence what quality characteristics are mandatory versus optional
Master these foundational chapters before moving to more specialized testing content.
Frequently Asked Questions
- What's the difference between narrow AI and general AI?
- Why is explainability important for AI systems?
- How does the EU AI Act categorize AI systems?
- What types of bias can affect AI systems?
- How do quality characteristic trade-offs work in AI systems?
- What makes testing AI systems different from testing traditional software?
- What is robustness in AI systems and why does it matter?
- How do I remember all the AI-specific quality characteristics for the exam?