ai test
YAML Framework
ai test
is a YAML-based test framework/runner that can be used to run tests on any command-line tool or script. It is designed to be simple to use and understand, and to be able to run tests in parallel.
- name: Build search index
command: ai search index update --files "data/*.md" --index-name myindex
expect-regex: |
Updating search index 'myindex' ...
Updating search index 'myindex' ... Done!
The test case YAML file contains a list of test cases. Each test case is a dictionary with the following keys:
Key | Required | Description |
---|---|---|
tests , steps |
Required | A list of test cases to run. |
command , script , bash |
Required | The command or script to run. |
name |
Required | The name of the test case. |
env |
Optional | A dictionary of environment variables to set before running the command or script. |
input |
Optional | The input to pass to the command or script. |
expect |
Optional | A string that instructs the LLM (e.g. GPT-4) to decide pass/fail based on stdout/stderr. |
expect-regex |
Optional | A list of regular expressions that must be matched in the stdout/stderr output. |
not-expect-regex |
Optional | A list of regular expressions that must not be matched in the stdout/stderr output. |
parallelize |
Optional | Whether the test case should run in parallel with other test cases. |
skipOnFailure |
Optional | Whether the test case should be skipped when it fails. |
tag /tags |
Optional | A list of tags to associate with the test case. |
timeout |
Optional | The maximum time allowed to execute the test case, in milliseconds. |
workingDirectory |
Optional | The working directory where the test will be run. |
matrix , matrix-file |
Optional | A matrix used to parameterize and/or create multiple variations of a test case. |
Test cases can be organized into areas, sub-areas, and so on.
- area: Area 1
tests:
- name: Test 1
command: echo "Hello, world!"
- name: Test 2
command: echo "Goodbye, world!"
Test cases can also be grouped into classes.
- class: Class 1
tests:
- name: Test 1
command: echo "Hello, world!"
- name: Test 2
command: echo "Goodbye, world!"
If no class is specified, the default class is "TestCases".
YAML Reference
tests
, steps
Required.
When present, specifies a list of test cases to run.
By default, for steps
, all tests will be run sequentially. The full set of tests will be run in parallel with other steps
for other test areas, unless parallelize: false
is set.
Examples:
command
, script
, bash
Required.
Represents how the test case will be run.
If the specified command or script returns an error level of non-zero, the test will fail. If it returns zero, it will pass (given that all 'expect' conditions are also met).
Example command:
Example for a bash script:
env
Optional. Inherits from parent.
When present, a dictionary of environment variables to set before running the command or script.
Example:
input
Optional.
When present, will be passed to the command or script as stdin.
Example:
expect
Optional.
Represents instructions given to LLM (e.g. GPT-4) along with stdout/stderr to decide whether the test passes or fails.
Example:
expect-regex
Optional.
Each string (or line in multiline string) is a regular expression that must be matched in the stdout/stderr output.
If any regular expression is not matched, the test will fail. If all expressions are matched, in order, the test will pass.
Example:
not-expect-regex
Optional.
When present, each string (or line in multiline string) is a regular expression that must not be matched in the stdout/stderr output.
If any regular expression is matched, the test fails. If none match, the test passes.
Example:
parallelize
Optional.
When present, specifies if the test cases should run in parallel or not.
By default, it is set to false
for all tests, except for the first step in a steps
test sequence.
Example:
skipOnFailure
Optional.
When present, specifies if the test case should be skipped when it fails.
By default, it is set to false
.
Example:
tag
/tags
Optional. Inherits from parent.
When present, specifies a list of tags to associate with the test case.
Tags accumulate from parent to child, so if a tag is specified in a parent, it will be inherited by all children.
Examples:
area: Area 1
tags: [echo]
tests:
- name: Test 1
command: echo "Hello, world!"
tags: [hello]
- name: Test 2
command: echo "Goodbye, world!"
tags: [bye]
timeout
Optional.
When present, specifies the maximum time allowed to execute the test case, in milliseconds. Defaults to infinite.
Example:
workingDirectory
Optional. Inherits from parent.
When present, specifies an absolute path or relative path where the test will be run.
When specified as a relative path, it will be relative to the working directory of the parent, or if no parent exists, where the test case file is located.
matrix
, matrix-file
Optional. Inerits from parent.
When present, specifies a matrix (set of values) used to parameterize and/or create multiple variations of a test case with different parameter values. This is achieved using the matrix
key and ${{ matrix.VALUE }}
syntax within the test cases.
Examples:
In this example, the test case will run three times with VALUE
set to 1, 2, and 3 respectively.
matrix:
animals: [ cats, bears, goats ]
temperature: [ 0.8, 1.0 ]
command: 'ai chat --question "Tell me a joke about ${{ matrix.animals }}"'
expect: 'The joke should be about ${{ matrix.animals }}'
In this example the test case will run six times (2x3) with temperature
set to 0.8 and 1.0, and animals
set to cats, bears, and goats.
matrix:
assistant-id: asst_TqfFCksyWK83VKe76kiBYWGt
foreach:
- question: How do you create an MP3 file with speech synthesis from the text "Hello, World!"?
- question: How do you recognize speech from an MP3 file?
- question: How do you recognize speech from a microphone?
steps:
- name: Inference call to `ai chat`
command: ai chat
arguments:
question: ${{ matrix.question }}
assistant-id: ${{ matrix.assistant-id }}
output-chat-history: chat-history-${{ matrix.__matrix_id__ }}.jsonl
In this example the test case will run three times with assistant-id
set to asst_TqfFCksyWK83VKe76kiBYWGt
, and question
set to the three questions specified in the foreach
list, and the output-chat-history
specifies where to save the chat history, using a filename that is unique to the "matrix" combination.
questions.yaml
:
foreach:
- question: How do you create an MP3 file with speech synthesis from the text "Hello, World!"?
- question: How do you recognize speech from an MP3 file?
- question: How do you recognize speech from a microphone?
In this example, the matrix is loaded from a file.