
ISTQB CT-GenAI Prompt Engineering: Mastering AI Prompts for Software Testing
Prompt engineering is the highest-weighted topic on the CT-GenAI exam, representing approximately 30% of questions. This reflects a practical reality: the quality of AI outputs depends almost entirely on how you communicate with these systems. Vague prompts produce vague results. Precise, well-structured prompts generate useful testing artifacts.
This isn't about memorizing magic phrases. Prompt engineering is a systematic approach to communicating with AI systems to get reliable, useful outputs for testing activities. You'll learn frameworks for structuring prompts, patterns specific to testing tasks, techniques for refining outputs, and strategies for managing context effectively.
Whether you're generating test cases with ChatGPT, writing automation scripts with Claude, or creating test data with any AI assistant, these techniques apply. The principles are tool-agnostic, which is exactly how CT-GenAI approaches the topic.
Table of Contents
- Why Prompt Engineering Matters
- Anatomy of an Effective Prompt
- The CRISP Framework
- Prompt Patterns for Test Case Generation
- Prompt Patterns for Test Automation
- Prompt Patterns for Test Data Creation
- Prompt Patterns for Defect Management
- Iterative Refinement Techniques
- Context Management Strategies
- Common Prompt Engineering Mistakes
- Practical Exercises
- Frequently Asked Questions
Why Prompt Engineering Matters
The difference between a frustrating AI interaction and a productive one usually comes down to the prompt. Consider these two approaches to generating test cases:
Poor prompt: "Write test cases for login"
Better prompt: "Generate 10 functional test cases for a web application login feature. The login accepts email and password, has a 'Remember me' checkbox, and includes a 'Forgot password' link. Include positive tests for successful login, negative tests for invalid credentials, and boundary tests for input validation. Format each test case with: ID, Title, Preconditions, Steps, Expected Result."
The first prompt might generate generic test cases that don't match your system. The second provides context, constraints, and format requirements that guide the AI toward useful outputs.
The Skill Gap
Many testers treat AI tools like search engines, typing brief queries and expecting perfect results. This approach fails because:
AI doesn't know your system: Without context, it generates generic outputs based on common patterns.
AI doesn't read your mind: Ambiguous instructions lead to outputs that may not match your intent.
AI needs structure: Without format guidance, outputs vary unpredictably.
AI benefits from examples: Showing what you want is often clearer than describing it.
Prompt engineering closes this gap by systematically providing the context, instructions, and structure AI needs to generate valuable outputs.
Why CT-GenAI Emphasizes This Topic
The exam weights prompt engineering heavily because:
Practical impact: Better prompts directly translate to more useful AI assistance in daily testing work.
Foundation for other topics: Effective prompting underlies all AI applications in testing, whether generating test cases, automation code, or documentation.
Measurable skill: Unlike abstract concepts, prompt engineering quality is observable in outputs.
Risk mitigation: Good prompts reduce hallucinations, irrelevant outputs, and wasted effort.
Exam Tip: Questions about prompt engineering often present scenarios with problematic AI outputs and ask you to identify what's wrong with the prompt or how to improve it. Think systematically about what context, instructions, or structure might be missing.
Anatomy of an Effective Prompt
Before diving into frameworks, understand the components that make prompts effective.
Context
Context tells the AI about your situation, system, or constraints. Without context, AI falls back on generic patterns that may not fit your needs.
Context examples:
- "I'm testing an e-commerce checkout flow for a mobile app..."
- "Our system uses a PostgreSQL database with the following schema..."
- "The API endpoint accepts JSON and returns XML..."
Context grounds AI responses in your specific situation rather than generic possibilities.
Clear Instructions
Instructions tell the AI exactly what you want. Vague instructions produce vague outputs.
Weak instruction: "Help with testing"
Strong instruction: "Generate negative test cases that verify error handling when the API receives malformed JSON requests"
Strong instructions are specific about:
- What type of output you want
- What scope to cover
- What to include or exclude
- What quality characteristics matter
Output Format
Format specifications ensure AI outputs are immediately usable rather than requiring restructuring.
Format specifications:
- "Present as a numbered list"
- "Use a table with columns: Test ID, Description, Input, Expected Output"
- "Format as Gherkin scenarios with Given/When/Then"
- "Write as executable Python code with pytest"
When you don't specify format, AI makes assumptions that may not match your needs.
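For instance, the last specification above ("Write as executable Python code with pytest") should produce output shaped roughly like the minimal sketch below. The validate_email function is defined inline only so the example runs; it is not from any real system.
# Minimal sketch of the output format that instruction requests.
import re
import pytest

def validate_email(value: str) -> bool:
    """Stand-in for the system under test, defined here so the example is self-contained."""
    return bool(re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", value))

@pytest.mark.parametrize("email,expected", [
    ("test@example.com", True),   # typical valid address
    ("no-at-sign.com", False),    # missing @ symbol
    ("", False),                  # empty input (boundary)
])
def test_validate_email(email, expected):
    assert validate_email(email) is expected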
Examples (Few-Shot Learning)
Providing examples of what you want is often more effective than describing it. This technique is called "few-shot learning."
Example-based prompt: "Generate test cases following this format:
Example:
TC001: Verify successful login with valid credentials
Precondition: User account exists with email test@example.com
Steps:
- Navigate to login page
- Enter email: test@example.com
- Enter valid password
- Click Login button
Expected: User is redirected to dashboard
Now generate 5 similar test cases for the password reset feature."
The example demonstrates the exact structure and detail level you want.
Constraints and Boundaries
Constraints prevent AI from going off track or producing unsuitable content.
Constraint examples:
- "Generate exactly 10 test cases, no more"
- "Focus only on API testing, not UI"
- "Don't include any tests requiring external services"
- "Use only standard library functions, no external dependencies"
Constraints narrow the solution space, reducing irrelevant outputs.
The CRISP Framework
CT-GenAI introduces the CRISP framework for structuring prompts. This mnemonic helps ensure prompts include essential components.
C - Context
Provide background about your situation, system, technology stack, or constraints.
Questions to answer:
- What system or feature are you testing?
- What technologies are involved?
- What's the testing context (unit, integration, system, acceptance)?
- What constraints or limitations exist?
Example context: "I'm testing a REST API for a banking application built with Spring Boot. The API handles account transactions and requires OAuth2 authentication. We're conducting integration testing before production deployment."
R - Role
Tell the AI what perspective or expertise to apply. Role assignment influences the style, depth, and focus of responses.
Role examples:
- "Act as a senior QA engineer with expertise in API testing"
- "You are a security testing specialist"
- "Respond as a test automation architect"
Roles help AI calibrate its responses to appropriate expertise levels and perspectives.
I - Instructions
Provide clear, specific directions about what you want the AI to do.
Instruction components:
- Action verb (generate, analyze, create, review, explain)
- Object (test cases, automation script, test data, defect report)
- Qualifiers (comprehensive, focused, detailed, high-level)
Example instructions: "Generate comprehensive functional test cases covering all CRUD operations for the account transactions endpoint. Include both positive and negative scenarios with emphasis on error handling and boundary conditions."
S - Scope
Define boundaries, limitations, and what to exclude. Scope prevents AI from going beyond what's needed or relevant.
Scope elements:
- What to include
- What to exclude
- Priority or focus areas
- Quantity limits
Example scope: "Focus on the create and update operations only. Don't include delete testing as that's handled separately. Generate exactly 15 test cases. Prioritize validation and business rule testing over UI aspects."
P - Personalization
Specify format, tone, style, and presentation preferences.
Personalization options:
- Output format (list, table, code, prose)
- Detail level (summary, detailed, comprehensive)
- Language style (technical, business-friendly)
- Specific templates or conventions
Example personalization: "Format each test case as a table row with columns: ID, Title, Preconditions, Test Steps, Expected Result, Priority. Use technical language appropriate for developer review. Number test cases sequentially starting from TC-AUTH-001."
Complete CRISP Example
Here's a complete prompt using all CRISP components:
Context: "I'm testing a user authentication module for a web application. The module handles login, logout, password reset, and session management. It uses JWT tokens for session handling and bcrypt for password hashing."
Role: "Act as a senior test analyst with expertise in security testing."
Instructions: "Generate comprehensive test cases for the password reset functionality. Include positive tests for successful password reset flow, negative tests for invalid inputs and error handling, security tests for common vulnerabilities, and edge cases for boundary conditions."
Scope: "Focus only on the password reset feature, not login or session management. Generate 12-15 test cases. Don't include performance testing scenarios. Prioritize security-related test cases."
Personalization: "Format as a numbered list with each test case containing: Title, Type (Positive/Negative/Security/Edge), Preconditions, Steps, Expected Result. Use Gherkin-style Given/When/Then for the steps."
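If you build prompts programmatically (for example, inside a test generation utility), the five CRISP components map naturally onto a small template function. The sketch below is one possible way to assemble them; the function name and structure are illustrative, not part of the syllabus.
# Illustrative helper that assembles a prompt from CRISP components.
def build_crisp_prompt(context: str, role: str, instructions: str,
                       scope: str, personalization: str) -> str:
    """Combine the five CRISP components into a single prompt string."""
    sections = [
        ("Context", context),
        ("Role", role),
        ("Instructions", instructions),
        ("Scope", scope),
        ("Personalization", personalization),
    ]
    return "\n\n".join(f"{label}: {text}" for label, text in sections)

prompt = build_crisp_prompt(
    context="I'm testing a user authentication module that uses JWT tokens.",
    role="Act as a senior test analyst with expertise in security testing.",
    instructions="Generate comprehensive test cases for the password reset functionality.",
    scope="Focus only on password reset. Generate 12-15 test cases.",
    personalization="Format as a numbered list with Title, Type, Preconditions, Steps, Expected Result.",
)
print(prompt)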
Exam Tip: CT-GenAI exam questions may ask you to identify which CRISP component is missing from a prompt, or which component would most improve a given prompt. Practice analyzing prompts for completeness.
Prompt Patterns for Test Case Generation
Test case generation is one of the most common AI applications in testing. These patterns produce consistently useful results.
Pattern 1: Requirements-Based Generation
Start with requirements and generate test cases that verify them.
Context: [Paste or describe the requirement/user story]
Generate test cases to verify this requirement is correctly implemented.
For each test case, include:
- Test ID
- Test objective (what aspect of the requirement this verifies)
- Preconditions
- Test steps
- Expected results
- Traceability to requirement
Generate tests covering:
- Positive scenarios (happy path)
- Negative scenarios (invalid inputs, error conditions)
- Boundary values
- Edge cases
Pattern 2: Feature-Based Exploration
When you need comprehensive coverage of a feature without specific requirements.
Feature: [Describe the feature and its functionality]
Generate a comprehensive test suite for this feature covering:
Functional Testing:
- Core functionality verification
- Input validation
- Output verification
- State transitions
Error Handling:
- Invalid input responses
- Error message accuracy
- Recovery scenarios
Boundary Testing:
- Minimum and maximum values
- Empty and null inputs
- Length limits
Integration Points:
- Dependencies on other features
- External system interactions
Format each test case with: ID, Category, Description, Steps, Expected Result
Pattern 3: Scenario-Based Testing
Generate tests from user scenarios or workflows.
User Scenario: [Describe the user journey or workflow]
Generate test cases covering this user scenario including:
1. Main flow test cases (happy path through the scenario)
2. Alternative flow test cases (valid variations)
3. Exception flow test cases (error paths and recovery)
4. Interruption tests (what happens if the flow is interrupted)
For each test case, specify:
- Scenario path being tested
- User actions
- System responses
- Final state verification
Pattern 4: Risk-Based Test Case Generation
Focus test generation on high-risk areas.
Context: [Describe the system and its risk profile]
Known Risk Areas:
- [List high-risk areas or concerns]
Generate test cases prioritized by risk, focusing on:
1. Critical business functions
2. Security-sensitive operations
3. Data integrity scenarios
4. Performance-critical paths
5. Previously defect-prone areas
For each test case, include a risk justification explaining why this test case is important from a risk perspective.
Prompt Patterns for Test Automation
AI can assist with automation script generation, but requires careful prompting to produce maintainable code.
Pattern 1: Page Object Generation
Generate page object classes for web automation.
Context: I'm building a Selenium WebDriver test automation framework using Python with the Page Object Model pattern.
Page Details:
- Page name: [Name]
- URL: [URL]
- Key elements: [List elements with their purposes]
Generate a Page Object class that includes:
1. Locators as class attributes (use CSS selectors where possible)
2. Constructor with WebDriver initialization
3. Methods for each user action on this page
4. Wait mechanisms for dynamic elements
5. Return appropriate page objects for navigation methods
Follow these conventions:
- Use explicit waits, not implicit waits or sleep
- Include docstrings for all methods
- Use meaningful method names that describe the action
- Handle common exceptions
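The output you would expect from a prompt like this is a class along the lines of the sketch below. It is a simplified, hypothetical page object for a login page; the locators and the LoginPage/DashboardPage names are assumptions for illustration, not generated output from any specific tool.
# Simplified sketch of a page object the prompt above might produce.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class DashboardPage:
    """Placeholder for the page reached after a successful login."""
    def __init__(self, driver):
        self.driver = driver

class LoginPage:
    """Page object for a hypothetical login page."""

    EMAIL_INPUT = (By.CSS_SELECTOR, "#email")
    PASSWORD_INPUT = (By.CSS_SELECTOR, "#password")
    LOGIN_BUTTON = (By.CSS_SELECTOR, "button[type='submit']")

    def __init__(self, driver, timeout: int = 10):
        self.driver = driver
        self.wait = WebDriverWait(driver, timeout)  # explicit wait, no sleep

    def login(self, email: str, password: str) -> DashboardPage:
        """Fill in credentials, submit, and return the next page object."""
        self.wait.until(EC.visibility_of_element_located(self.EMAIL_INPUT)).send_keys(email)
        self.driver.find_element(*self.PASSWORD_INPUT).send_keys(password)
        self.driver.find_element(*self.LOGIN_BUTTON).click()
        return DashboardPage(self.driver)
Pattern 2: Test Script Generation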
Generate executable test scripts.
Context: [Framework and language details]
Test Scenario: [Describe what to test]
Test Data:
- Valid inputs: [List]
- Invalid inputs: [List]
Generate a test script that:
1. Sets up test preconditions
2. Executes the test steps
3. Includes appropriate assertions
4. Handles cleanup in teardown
5. Uses parameterization for multiple data sets
Include:
- Descriptive test method names
- Comments explaining complex logic
- Proper assertion messages
- Logging for debugging
Avoid:
- Hardcoded waits (use explicit waits)
- Hardcoded test data in the script
- Tightly coupled locators
Pattern 3: API Test Generation
Generate API test cases with code.
API Endpoint: [Method] [URL]
Request Format:
[Paste example request body or describe structure]
Response Format:
[Paste example response or describe structure]
Generate API tests covering:
1. Successful requests with valid data
2. Validation errors for each required field
3. Authentication/authorization scenarios
4. Response structure verification
5. Status code verification
Use [framework, e.g., requests + pytest] and include:
- Parameterized tests for multiple scenarios
- Helper functions for common operations
- Assertions for status codes, response structure, and business logic
- Meaningful test names and descriptions
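As a reference point, filling in this template for a hypothetical POST /accounts endpoint might produce tests resembling the sketch below. The base URL, payloads, and expected status codes are assumptions, not a real API.
# Sketch of API tests the prompt above might generate, using requests + pytest.
# BASE_URL and the /accounts endpoint are hypothetical placeholders.
import pytest
import requests

BASE_URL = "https://api.example.com"

def create_account(payload: dict) -> requests.Response:
    """Helper for the common POST operation used by every test."""
    return requests.post(f"{BASE_URL}/accounts", json=payload, timeout=10)

@pytest.mark.parametrize("payload,expected_status", [
    ({"name": "Alice", "email": "alice@example.com"}, 201),  # valid request
    ({"name": "Alice"}, 400),                                # missing required email
    ({}, 400),                                               # empty body
])
def test_create_account_status_codes(payload, expected_status):
    response = create_account(payload)
    assert response.status_code == expected_status, response.text

def test_create_account_response_structure():
    response = create_account({"name": "Alice", "email": "alice@example.com"})
    body = response.json()
    # Verify the response contains the fields the contract promises.
    assert "id" in body and body["email"] == "alice@example.com"
Pattern 4: Test Refactoring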
Improve existing automation code.
Here's my current test code:
[Paste existing code]
Refactor this code to:
1. Improve maintainability
2. Add proper error handling
3. Implement appropriate wait strategies
4. Extract reusable components
5. Add meaningful logging
6. Follow [framework] best practices
Explain each significant change and why it improves the code.
Prompt Patterns for Test Data Creation
Generating test data with AI saves time while ensuring variety and coverage.
Pattern 1: Structured Data Generation
Generate data conforming to a schema.
Data Schema:
[Describe or paste the data structure]
Business Rules:
- [List validation rules and constraints]
Generate [number] test data records including:
1. Valid data covering typical scenarios
2. Boundary value data (min/max for each field)
3. Edge case data (special characters, Unicode, etc.)
4. Invalid data for negative testing
Output as [format: JSON, CSV, SQL INSERT statements, etc.]
Label each record with its purpose (e.g., "valid_typical", "boundary_max_length", "invalid_missing_required")
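For instance, asking for labeled records should produce data along these lines. The field names and the 30-character limit are invented for illustration; the same idea applies whether the output is JSON, CSV, or SQL.
# Illustrative labeled test data records (shown as Python dicts).
test_records = [
    {"purpose": "valid_typical",            "username": "jdoe",      "age": 34},
    {"purpose": "boundary_max_length",      "username": "a" * 30,    "age": 120},  # assumes a 30-char limit
    {"purpose": "edge_unicode",             "username": "名前テスト", "age": 25},
    {"purpose": "invalid_missing_required", "username": "",          "age": None}, # for negative testing
]
Pattern 2: Realistic Data Sets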
Generate data that looks realistic for demos or testing.
Generate realistic test data for a [domain, e.g., e-commerce, healthcare, banking] application.
Entity: [Describe the entity]
Requirements:
- Generate [number] records
- Data should appear realistic, not obviously fake
- Include variety in [specific fields]
- Respect these relationships: [describe relationships]
Constraints:
- [List business rules]
- [Date ranges]
- [Value ranges]
Output format: [Specify format]
Pattern 3: Edge Case Data
Generate data specifically for edge case testing.
For the following data fields, generate edge case test values:
Fields:
- [Field name]: [Type, constraints]
- [Field name]: [Type, constraints]
For each field, generate values for:
- Minimum valid value
- Maximum valid value
- Just below minimum (invalid)
- Just above maximum (invalid)
- Empty/null (if applicable)
- Special characters
- Unicode characters
- Extremely long values
- Injection attempt values (for security testing)
Format as a table with columns: Field, Test Case Type, Value, Expected Behavior
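Generated edge-case values translate directly into parameterized tests. A minimal sketch follows, assuming a username field limited to 30 word characters; the validate_username function is a stand-in defined inline so the example runs.
# Minimal sketch: feeding AI-generated edge-case values into a parameterized test.
# The 30-character limit and validate_username function are assumptions.
import re
import pytest

MAX_LEN = 30  # assumed field limit for illustration

def validate_username(value):
    """Stand-in validator (length plus word characters only)."""
    return (value is not None
            and 1 <= len(value) <= MAX_LEN
            and re.fullmatch(r"\w+", value) is not None)

@pytest.mark.parametrize("value,expected", [
    ("a", True),                             # minimum valid length
    ("a" * MAX_LEN, True),                   # maximum valid length
    ("", False),                             # empty input
    ("a" * (MAX_LEN + 1), False),            # just above maximum
    (None, False),                           # null input
    ("名前テスト", True),                     # Unicode characters within limits
    ("admin'; DROP TABLE users;--", False),  # injection-style value
])
def test_username_edge_cases(value, expected):
    assert validate_username(value) is expected
Prompt Patterns for Defect Management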
AI can improve defect reporting quality and assist with defect analysis.
Pattern 1: Defect Report Enhancement
Improve a brief defect description into a comprehensive report.
Initial Defect Information:
[Paste brief description or notes]
Expand this into a comprehensive defect report including:
1. Summary: Clear, concise one-line description
2. Description: Detailed explanation of the issue
3. Steps to Reproduce: Numbered, precise steps
4. Expected Result: What should happen
5. Actual Result: What actually happens
6. Environment: System/browser/version details
7. Severity: [Suggest based on impact]
8. Priority: [Suggest based on urgency]
9. Additional Information: Screenshots, logs, related issues
Make the report clear enough that any developer could reproduce the issue without additional clarification.
Pattern 2: Defect Pattern Analysis
Analyze a set of defects for patterns.
Here are recent defects from our project:
[Paste defect summaries or descriptions]
Analyze these defects and identify:
1. Common root cause patterns
2. Modules or features with clustering
3. Defect type distribution
4. Potential systemic issues
5. Testing gaps that might explain why these escaped detection
Provide recommendations for:
- Improving test coverage
- Process improvements
- Areas needing focused attention
Pattern 3: Root Cause Hypothesis
Help investigate a defect's root cause.
Defect Description: [Describe the defect]
System Context: [Relevant technical context]
Observed Behavior: [What happens]
Expected Behavior: [What should happen]
What I've Checked:
- [List investigation steps taken]
Based on this information, provide:
1. Potential root cause hypotheses (ranked by likelihood)
2. Additional diagnostic steps for each hypothesis
3. Questions to ask the development team
4. Related areas that might also be affected
Iterative Refinement Techniques
Initial AI outputs rarely match your needs perfectly. Iterative refinement improves results systematically.
The Refinement Cycle
1. Generate: Create initial output with your prompt
2. Evaluate: Assess output against your needs
3. Identify gaps: Note what's missing, wrong, or suboptimal
4. Refine: Modify prompt to address gaps
5. Regenerate: Get improved output
6. Repeat: Continue until output meets needs
Refinement Prompt Patterns
Adding missing coverage: "The test cases you generated don't cover [specific scenario]. Add test cases for [missing scenario] while keeping the existing ones."
Correcting errors: "Test case TC-005 references [incorrect element]. This feature actually uses [correct element]. Please correct this and any similar errors."
Changing format: "Convert these test cases to Gherkin format using Given/When/Then structure while preserving the test coverage."
Adjusting detail level: "These test cases are too high-level. Expand each test case to include specific input values, exact navigation steps, and precise expected values."
Narrowing scope: "Focus only on error handling scenarios. Remove the positive test cases and expand the negative and edge case coverage."
Building on Previous Outputs
When refining, reference what AI already produced:
"You generated 10 test cases. Now add 5 more test cases specifically covering security scenarios like session timeout, invalid token handling, and privilege escalation attempts. Use the same format as the previous test cases."
When to Start Over
Sometimes refinement isn't working. Start fresh when:
- Fundamental misunderstanding of the requirement
- Wrong technical approach in generated code
- Format too different from what you need
- Accumulated errors making the output confusing
A new prompt with lessons learned often works better than extensive refinement.
Context Management Strategies
AI context windows are limited. Managing context effectively maximizes the value you get from each interaction.
Prioritizing Information
Include information in order of importance:
1. Essential context: Information required for correct outputs
2. Guiding examples: Examples that demonstrate what you want
3. Constraints: Important limitations or requirements
4. Nice-to-have details: Helpful but not critical information
If you hit context limits, cut from the bottom of this priority list.
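If you assemble context programmatically, that priority order can be applied mechanically. A rough sketch follows, using a simple character budget as a stand-in for the model's real token limit.
# Rough sketch: assemble prompt context in priority order and trim from the bottom.
# The character budget is a simplification; real tools count tokens, not characters.
def assemble_context(sections, budget_chars=4000):
    """sections: list of (priority, text) pairs; lower numbers are more important."""
    included, used = [], 0
    for _, text in sorted(sections, key=lambda item: item[0]):
        if used + len(text) > budget_chars:
            break  # everything at or below this priority is dropped
        included.append(text)
        used += len(text)
    return "\n\n".join(included)

context = assemble_context([
    (1, "Essential context: REST API for account transactions, OAuth2 secured."),
    (2, "Guiding example: TC001 Verify successful login with valid credentials..."),
    (3, "Constraint: generate exactly 15 test cases, API level only."),
    (4, "Nice-to-have: team naming conventions for test IDs."),
])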
Summarization Techniques
For long conversations, periodically summarize:
"So far we've established:
- The system uses [technology stack]
- We need to test [features]
- The test cases should [format/approach]
Now let's continue with [next task]."
This preserves essential context while freeing space for new content.
Chunking Large Tasks
Break large requests into smaller pieces:
Instead of: "Generate a complete test suite for the entire application"
Use:
- "Generate test cases for the user registration module"
- "Now generate test cases for the login module"
- "Next, generate test cases for the profile management module"
Each chunk fits within context limits while building toward comprehensive coverage.
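The same chunking idea is easy to script if you drive an AI tool through its API. The loop below is illustrative only; ask_model is a placeholder stub standing in for whichever client call your tooling actually provides.
# Illustrative chunking loop; ask_model is a hypothetical stand-in.
def ask_model(prompt: str) -> str:
    return f"[model response for: {prompt[:60]}...]"

modules = ["user registration", "login", "profile management"]
test_cases = {}
for module in modules:
    prompt = (
        f"Generate functional test cases for the {module} module. "
        "Format each test case with: ID, Title, Preconditions, Steps, Expected Result."
    )
    test_cases[module] = ask_model(prompt)  # one focused request per module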
Providing Focused Context
Rather than pasting entire documents, extract relevant sections:
Instead of: [Entire 50-page requirements document]
Use: "Here's the specific requirement for the feature we're testing: [relevant excerpt]"
Focused context produces more targeted outputs.
Exam Tip: Questions about context management often test whether you understand that information outside the context window is inaccessible, and that strategic information prioritization improves results.
Common Prompt Engineering Mistakes
Learn from common errors to avoid them in your practice and on the exam.
Mistake 1: Vague Instructions
Problem: "Help me with testing"
Issue: No specific direction about what help is needed
Fix: "Generate 10 functional test cases for the user registration feature covering input validation and successful registration flow"
Mistake 2: Missing Context
Problem: "Write test cases for the checkout"
Issue: AI doesn't know what your checkout does
Fix: Provide context about your specific checkout: payment methods, shipping options, user types, etc.
Mistake 3: No Output Format
Problem: "Give me some test cases"
Issue: Output format varies unpredictably
Fix: "Format each test case with: ID, Title, Preconditions, Steps, Expected Result as a table"
Mistake 4: Overloading Single Prompts
Problem: "Generate test cases, automation scripts, test data, and a test plan for the entire application"
Issue: Too much for one response; quality suffers
Fix: Break into separate, focused requests
Mistake 5: Accepting First Output
Problem: Using AI output without review or refinement
Issue: Missing errors, hallucinations, or misalignments
Fix: Always review critically and refine as needed
Mistake 6: Ignoring Hallucination Risk
Problem: Trusting AI-generated technical details without verification
Issue: AI may invent APIs, methods, or features that don't exist
Fix: Verify all specific technical claims against actual documentation
Mistake 7: Excessive Prompt Engineering
Problem: Spending more time crafting prompts than the task is worth
Issue: Diminishing returns; sometimes manual work is faster
Fix: Balance prompt investment against task complexity and reuse value
Practical Exercises
Practice these exercises to develop prompt engineering skills:
Exercise 1: Test Case Generation
Take a feature from software you use daily (email, calendar, shopping app). Write a prompt that generates comprehensive test cases. Compare results from different prompt structures.
Exercise 2: Prompt Improvement
Find a vague prompt and systematically improve it using CRISP. Document how each addition improves output quality.
Exercise 3: Iterative Refinement
Generate test cases, identify gaps, and practice the refinement cycle. Track how many iterations are needed to get useful output.
Exercise 4: Format Experimentation
Request the same content in different formats (table, Gherkin, numbered list, code). Learn how format specifications affect output usability.
Exercise 5: Context Limits
Experiment with long contexts. Find where truncation occurs and practice summarization techniques to preserve essential information.
Frequently Asked Questions
What is the CRISP framework in prompt engineering?
How many iterations should I expect when refining prompts?
Should I always use the full CRISP framework for every prompt?
How do I handle prompts that exceed context window limits?
Why do the same prompts sometimes produce different results?
What's the difference between Context and Scope in CRISP?
How do I prompt AI to generate test automation code that follows our team's conventions?
When should I use few-shot learning in prompts?