Skip to content

ai test YAML Framework

ai test is a YAML-based test framework/runner that can be used to run tests on any command-line tool or script. It is designed to be simple to use and understand, and to be able to run tests in parallel.

Example test case
- name: Build search index
  command: ai search index update --files "data/*.md" --index-name myindex
  expect-regex: |
    Updating search index 'myindex' ...
    Updating search index 'myindex' ... Done!

The test case YAML file contains a list of test cases. Each test case is a dictionary with the following keys:

Key Required Description
tests, steps Required A list of test cases to run.
command, script, bash Required The command or script to run.
name Required The name of the test case.
env Optional A dictionary of environment variables to set before running the command or script.
input Optional The input to pass to the command or script.
expect Optional A string that instructs the LLM (e.g. GPT-4) to decide pass/fail based on stdout/stderr.
expect-regex Optional A list of regular expressions that must be matched in the stdout/stderr output.
not-expect-regex Optional A list of regular expressions that must not be matched in the stdout/stderr output.
parallelize Optional Whether the test case should run in parallel with other test cases.
skipOnFailure Optional Whether the test case should be skipped when it fails.
tag/tags Optional A list of tags to associate with the test case.
timeout Optional The maximum time allowed to execute the test case, in milliseconds.
workingDirectory Optional The working directory where the test will be run.
matrix, matrix-file Optional A matrix used to parameterize and/or create multiple variations of a test case.

Test cases can be organized into areas, sub-areas, and so on.

- area: Area 1
  tests:

  - name: Test 1
    command: echo "Hello, world!"

  - name: Test 2
    command: echo "Goodbye, world!"

Test cases can also be grouped into classes.

- class: Class 1
  tests:

  - name: Test 1
    command: echo "Hello, world!"

  - name: Test 2
    command: echo "Goodbye, world!"

If no class is specified, the default class is "TestCases".

YAML Reference

tests, steps

Required.

When present, specifies a list of test cases to run.

By default, for steps, all tests will be run sequentially. The full set of tests will be run in parallel with other steps for other test areas, unless parallelize: false is set.

Examples:

tests:
- name: Test 1
  command: echo "Hello, world!"
- name: Test 2
  command: echo "Goodbye, world!"
steps:
- name: Step 1
  bash: echo "Hello, world!"
- name: Step 2
  bash: echo "Goodbye, world!"

command, script, bash

Required.

Represents how the test case will be run.

If the specified command or script returns an error level of non-zero, the test will fail. If it returns zero, it will pass (given that all 'expect' conditions are also met).

Example command:

command: ai chat --interactive

Example for a bash script:

bash: |
  if [ -f /etc/os-release ]; then 
    python3 script.py 
  else 
    py script.py
  fi

env

Optional. Inherits from parent.

When present, a dictionary of environment variables to set before running the command or script.

Example:

env:
  JAVA_HOME: /path/to/java

input

Optional.

When present, will be passed to the command or script as stdin.

Example:

input: |
  Tell me a joke
  Tell me another
  exit

expect

Optional.

Represents instructions given to LLM (e.g. GPT-4) along with stdout/stderr to decide whether the test passes or fails.

Example:

expect: the output must have exactly two jokes

expect-regex

Optional.

Each string (or line in multiline string) is a regular expression that must be matched in the stdout/stderr output.

If any regular expression is not matched, the test will fail. If all expressions are matched, in order, the test will pass.

Example:

expect-regex: |
  Regex 1
  Regex 2

not-expect-regex

Optional.

When present, each string (or line in multiline string) is a regular expression that must not be matched in the stdout/stderr output.

If any regular expression is matched, the test fails. If none match, the test passes.

Example:

not-expect-regex: |
  ERROR
  curseword1
  curseword2

parallelize

Optional.

When present, specifies if the test cases should run in parallel or not.

By default, it is set to false for all tests, except for the first step in a steps test sequence.

Example:

parallelize: true

skipOnFailure

Optional.

When present, specifies if the test case should be skipped when it fails.

By default, it is set to false.

Example:

skipOnFailure: true

tag/tags

Optional. Inherits from parent.

When present, specifies a list of tags to associate with the test case.

Tags accumulate from parent to child, so if a tag is specified in a parent, it will be inherited by all children.

Examples:

tag: skip
tags:
- slow
- performance
- long-running
area: Area 1
tags: [echo]
tests:

- name: Test 1
  command: echo "Hello, world!"
  tags: [hello]

- name: Test 2
  command: echo "Goodbye, world!"
  tags: [bye]

timeout

Optional.

When present, specifies the maximum time allowed to execute the test case, in milliseconds. Defaults to infinite.

Example:

timeout: 3000  # 3 seconds

workingDirectory

Optional. Inherits from parent.

When present, specifies an absolute path or relative path where the test will be run.

When specified as a relative path, it will be relative to the working directory of the parent, or if no parent exists, where the test case file is located.

matrix, matrix-file

Optional. Inerits from parent.

When present, specifies a matrix (set of values) used to parameterize and/or create multiple variations of a test case with different parameter values. This is achieved using the matrix key and ${{ matrix.VALUE }} syntax within the test cases.

Examples:

- name: Example Matrix Test
  matrix:
    VALUE: [1, 2, 3]
  bash: echo "Value is ${{ matrix.VALUE }}"

In this example, the test case will run three times with VALUE set to 1, 2, and 3 respectively.

matrix:
  animals: [ cats, bears, goats ]
  temperature: [ 0.8, 1.0 ]
command: 'ai chat --question "Tell me a joke about ${{ matrix.animals }}"'
expect: 'The joke should be about ${{ matrix.animals }}'

In this example the test case will run six times (2x3) with temperature set to 0.8 and 1.0, and animals set to cats, bears, and goats.

matrix:
  assistant-id: asst_TqfFCksyWK83VKe76kiBYWGt
  foreach:
  - question: How do you create an MP3 file with speech synthesis from the text "Hello, World!"?
  - question: How do you recognize speech from an MP3 file?
  - question: How do you recognize speech from a microphone?
steps:
- name: Inference call to `ai chat`
  command: ai chat
  arguments:
    question: ${{ matrix.question }}
    assistant-id: ${{ matrix.assistant-id }}
    output-chat-history: chat-history-${{ matrix.__matrix_id__ }}.jsonl

In this example the test case will run three times with assistant-id set to asst_TqfFCksyWK83VKe76kiBYWGt, and question set to the three questions specified in the foreach list, and the output-chat-history specifies where to save the chat history, using a filename that is unique to the "matrix" combination.

matrix-file: questions.yaml

questions.yaml:

foreach:
- question: How do you create an MP3 file with speech synthesis from the text "Hello, World!"?
- question: How do you recognize speech from an MP3 file?
- question: How do you recognize speech from a microphone?

In this example, the matrix is loaded from a file.