Skip to content

Speech Recognition

The ai speech recognize command allows you to recognize speech from audio. You can recognize speech from a microphone or a file, and output subtitles in SRT or VTT format.

Prerequisites

Before you begin, you'll need to install the ai CLI and set up Azure Speech.

Install the ai CLI
Setup Azure Speech

Recognize speech from audio

From microphone
ai speech recognize --microphone
From WAV file
ai speech recognize --file hello-world.wav
From MP3 file
ai speech recognize --file hello-world.mp3 --format mp3

Recognize speech with a specific language

From microphone in Spanish
ai speech recognize --microphone --language es-ES
From WAV file in multiple languages
ai speech recognize --file hello-world.wav --languages es-ES;fr-FR

Output subtitles

Output SRT subtitles
ai speech recognize --file hello-world.wav --output-srt-file captions.srt
Output VTT subtitles
ai speech recognize --file hello-world.wav --output-vtt-file captions.vtt

Generate code for the scenarios above

The ai dev new command allows you to generate sample code that demonstrates how to use speech recognition.

Prerequisites

Before you begin, you'll need to install the ai CLI and set up Azure Speech.

Install the ai CLI
Setup Azure Speech

List samples

List available templates
ai dev new list
AI - Azure AI CLI, Version 1.0.0
Copyright (c) 2024 Microsoft Corporation. All Rights Reserved.

Name                                                      Short Name                                       Language                        
------------------------------------------------------    ---------------------------------------------    --------------------------------
Environment Variables                                     .env                                                                             
Azure AI Inference Chat Completions (Streaming)           az-inference-chat-streaming                      C#, Python                      
Helper Function Class Library                             helper-functions                                 C#                              
OpenAI Assistants                                         openai-asst                                      C#, JavaScript, Python          
OpenAI Assistants (Streaming)                             openai-asst-streaming                            C#, JavaScript, Python          
OpenAI Assistants (w/ Code Interpreter)                   openai-asst-streaming-with-code                  C#, JavaScript, Python          
OpenAI Assistants (w/ File Search)                        openai-asst-streaming-with-file-search           C#, JavaScript, Python          
OpenAI Assistants (w/ Functions)                          openai-asst-streaming-with-functions             C#, JavaScript, Python          
OpenAI Assistants Webpage                                 openai-asst-webpage                              JavaScript, TypeScript          
OpenAI Assistants Webpage (w/ Functions)                  openai-asst-webpage-with-functions               JavaScript                      
OpenAI Chat Completions                                   openai-chat                                      C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (Streaming)                       openai-chat-streaming                            C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Data + AI Search)             openai-chat-streaming-with-data                  C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Functions)                    openai-chat-streaming-with-functions             C#, Go, JavaScript, Python      
OpenAI Chat Webpage                                       openai-chat-webpage                              JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Functions)                        openai-chat-webpage-with-functions               JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Speech input/output)              openai-chat-webpage-with-speech                  TypeScript                      
OpenAI Chat Webpage (w/ Functions + Speech)               openai-chat-webpage-with-speech-and-functions    TypeScript                      
Phi-3 Chat Completions (w/ ONNX)                          phi3-onnx-chat-streaming                         C#                              
Phi-3 Chat Completions (w/ ONNX + Functions)              phi3-onnx-chat-streaming-with-functions          C#                              
Semantic Kernel Chat Completions (Streaming)              sk-chat-streaming                                C#                              
Semantic Kernel Chat Completions (w/ Data + AI Search)    sk-chat-streaming-with-data                      C#                              
Semantic Kernel Chat Completions (w/ Functions)           sk-chat-streaming-with-functions                 C#                              
Semantic Kernel Chat Completions (w/ Agents)              sk-chat-with-agents                              C#                              
Speech-to-text                                            speech-to-text                                   C#, Python                      
Speech-to-text (w/ Continuous recognition)                speech-to-text-continuous-reco                   C#, Python                      
Speech-to-text (w/ File input)                            speech-to-text-with-file                         C#, Python                      
Speech-to-text (w/ Keyword detection)                     speech-to-text-with-keyword                      C#, Python                      
Speech-to-text (w/ Translation)                           speech-to-text-with-translation                  C#, Python                      
Text-to-speech                                            text-to-speech                                   C#, Python                      
Text-to-speech (w/ File output)                           text-to-speech-with-file                         C#, Python                      
List only C# samples
ai dev new list --csharp
Filter the list by name
ai dev new list speech --csharp

Generate, build, and run a sample

Generate a C# sample that recognizes speech from the microphone.

ai dev new speech-to-text --csharp
cd speech-to-text-cs
See the code; learn how it works...

Program.cs

Deep dive on how it works

Install dependencies
dotnet restore
Run the sample
ai dev shell
dotnet run

Generate a C# sample that continuously recognizes speech from the microphone.

ai dev new speech-to-text-continuous-reco --csharp
cd speech-to-text-continuous-reco-cs
See the code; learn how it works...

Program.cs

Deep dive on how it works

Install dependencies
dotnet restore
Run the sample
ai dev shell
dotnet run

Generate a C# sample that continuously recognizes speech from a file.

ai dev new speech-to-text-with-file --csharp
cd speech-to-text-with-file-cs
See the code; learn how it works...

Program.cs

Deep dive on how it works

Install dependencies
dotnet restore
Run the sample
ai dev shell
dotnet run

List samples

List available templates
ai dev new list
AI - Azure AI CLI, Version 1.0.0
Copyright (c) 2024 Microsoft Corporation. All Rights Reserved.

Name                                                      Short Name                                       Language                        
------------------------------------------------------    ---------------------------------------------    --------------------------------
Environment Variables                                     .env                                                                             
Azure AI Inference Chat Completions (Streaming)           az-inference-chat-streaming                      C#, Python                      
Helper Function Class Library                             helper-functions                                 C#                              
OpenAI Assistants                                         openai-asst                                      C#, JavaScript, Python          
OpenAI Assistants (Streaming)                             openai-asst-streaming                            C#, JavaScript, Python          
OpenAI Assistants (w/ Code Interpreter)                   openai-asst-streaming-with-code                  C#, JavaScript, Python          
OpenAI Assistants (w/ File Search)                        openai-asst-streaming-with-file-search           C#, JavaScript, Python          
OpenAI Assistants (w/ Functions)                          openai-asst-streaming-with-functions             C#, JavaScript, Python          
OpenAI Assistants Webpage                                 openai-asst-webpage                              JavaScript, TypeScript          
OpenAI Assistants Webpage (w/ Functions)                  openai-asst-webpage-with-functions               JavaScript                      
OpenAI Chat Completions                                   openai-chat                                      C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (Streaming)                       openai-chat-streaming                            C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Data + AI Search)             openai-chat-streaming-with-data                  C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Functions)                    openai-chat-streaming-with-functions             C#, Go, JavaScript, Python      
OpenAI Chat Webpage                                       openai-chat-webpage                              JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Functions)                        openai-chat-webpage-with-functions               JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Speech input/output)              openai-chat-webpage-with-speech                  TypeScript                      
OpenAI Chat Webpage (w/ Functions + Speech)               openai-chat-webpage-with-speech-and-functions    TypeScript                      
Phi-3 Chat Completions (w/ ONNX)                          phi3-onnx-chat-streaming                         C#                              
Phi-3 Chat Completions (w/ ONNX + Functions)              phi3-onnx-chat-streaming-with-functions          C#                              
Semantic Kernel Chat Completions (Streaming)              sk-chat-streaming                                C#                              
Semantic Kernel Chat Completions (w/ Data + AI Search)    sk-chat-streaming-with-data                      C#                              
Semantic Kernel Chat Completions (w/ Functions)           sk-chat-streaming-with-functions                 C#                              
Semantic Kernel Chat Completions (w/ Agents)              sk-chat-with-agents                              C#                              
Speech-to-text                                            speech-to-text                                   C#, Python                      
Speech-to-text (w/ Continuous recognition)                speech-to-text-continuous-reco                   C#, Python                      
Speech-to-text (w/ File input)                            speech-to-text-with-file                         C#, Python                      
Speech-to-text (w/ Keyword detection)                     speech-to-text-with-keyword                      C#, Python                      
Speech-to-text (w/ Translation)                           speech-to-text-with-translation                  C#, Python                      
Text-to-speech                                            text-to-speech                                   C#, Python                      
Text-to-speech (w/ File output)                           text-to-speech-with-file                         C#, Python                      
List only Go samples
ai dev new list --go
Filter the list by name
ai dev new list speech --go

Generate, build, and run a sample

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

List samples

List available templates
ai dev new list
AI - Azure AI CLI, Version 1.0.0
Copyright (c) 2024 Microsoft Corporation. All Rights Reserved.

Name                                                      Short Name                                       Language                        
------------------------------------------------------    ---------------------------------------------    --------------------------------
Environment Variables                                     .env                                                                             
Azure AI Inference Chat Completions (Streaming)           az-inference-chat-streaming                      C#, Python                      
Helper Function Class Library                             helper-functions                                 C#                              
OpenAI Assistants                                         openai-asst                                      C#, JavaScript, Python          
OpenAI Assistants (Streaming)                             openai-asst-streaming                            C#, JavaScript, Python          
OpenAI Assistants (w/ Code Interpreter)                   openai-asst-streaming-with-code                  C#, JavaScript, Python          
OpenAI Assistants (w/ File Search)                        openai-asst-streaming-with-file-search           C#, JavaScript, Python          
OpenAI Assistants (w/ Functions)                          openai-asst-streaming-with-functions             C#, JavaScript, Python          
OpenAI Assistants Webpage                                 openai-asst-webpage                              JavaScript, TypeScript          
OpenAI Assistants Webpage (w/ Functions)                  openai-asst-webpage-with-functions               JavaScript                      
OpenAI Chat Completions                                   openai-chat                                      C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (Streaming)                       openai-chat-streaming                            C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Data + AI Search)             openai-chat-streaming-with-data                  C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Functions)                    openai-chat-streaming-with-functions             C#, Go, JavaScript, Python      
OpenAI Chat Webpage                                       openai-chat-webpage                              JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Functions)                        openai-chat-webpage-with-functions               JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Speech input/output)              openai-chat-webpage-with-speech                  TypeScript                      
OpenAI Chat Webpage (w/ Functions + Speech)               openai-chat-webpage-with-speech-and-functions    TypeScript                      
Phi-3 Chat Completions (w/ ONNX)                          phi3-onnx-chat-streaming                         C#                              
Phi-3 Chat Completions (w/ ONNX + Functions)              phi3-onnx-chat-streaming-with-functions          C#                              
Semantic Kernel Chat Completions (Streaming)              sk-chat-streaming                                C#                              
Semantic Kernel Chat Completions (w/ Data + AI Search)    sk-chat-streaming-with-data                      C#                              
Semantic Kernel Chat Completions (w/ Functions)           sk-chat-streaming-with-functions                 C#                              
Semantic Kernel Chat Completions (w/ Agents)              sk-chat-with-agents                              C#                              
Speech-to-text                                            speech-to-text                                   C#, Python                      
Speech-to-text (w/ Continuous recognition)                speech-to-text-continuous-reco                   C#, Python                      
Speech-to-text (w/ File input)                            speech-to-text-with-file                         C#, Python                      
Speech-to-text (w/ Keyword detection)                     speech-to-text-with-keyword                      C#, Python                      
Speech-to-text (w/ Translation)                           speech-to-text-with-translation                  C#, Python                      
Text-to-speech                                            text-to-speech                                   C#, Python                      
Text-to-speech (w/ File output)                           text-to-speech-with-file                         C#, Python                      
List only Java samples
ai dev new list --java
Filter the list by name
ai dev new list speech --java

Generate, build, and run a sample

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

List samples

List available templates
ai dev new list
AI - Azure AI CLI, Version 1.0.0
Copyright (c) 2024 Microsoft Corporation. All Rights Reserved.

Name                                                      Short Name                                       Language                        
------------------------------------------------------    ---------------------------------------------    --------------------------------
Environment Variables                                     .env                                                                             
Azure AI Inference Chat Completions (Streaming)           az-inference-chat-streaming                      C#, Python                      
Helper Function Class Library                             helper-functions                                 C#                              
OpenAI Assistants                                         openai-asst                                      C#, JavaScript, Python          
OpenAI Assistants (Streaming)                             openai-asst-streaming                            C#, JavaScript, Python          
OpenAI Assistants (w/ Code Interpreter)                   openai-asst-streaming-with-code                  C#, JavaScript, Python          
OpenAI Assistants (w/ File Search)                        openai-asst-streaming-with-file-search           C#, JavaScript, Python          
OpenAI Assistants (w/ Functions)                          openai-asst-streaming-with-functions             C#, JavaScript, Python          
OpenAI Assistants Webpage                                 openai-asst-webpage                              JavaScript, TypeScript          
OpenAI Assistants Webpage (w/ Functions)                  openai-asst-webpage-with-functions               JavaScript                      
OpenAI Chat Completions                                   openai-chat                                      C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (Streaming)                       openai-chat-streaming                            C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Data + AI Search)             openai-chat-streaming-with-data                  C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Functions)                    openai-chat-streaming-with-functions             C#, Go, JavaScript, Python      
OpenAI Chat Webpage                                       openai-chat-webpage                              JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Functions)                        openai-chat-webpage-with-functions               JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Speech input/output)              openai-chat-webpage-with-speech                  TypeScript                      
OpenAI Chat Webpage (w/ Functions + Speech)               openai-chat-webpage-with-speech-and-functions    TypeScript                      
Phi-3 Chat Completions (w/ ONNX)                          phi3-onnx-chat-streaming                         C#                              
Phi-3 Chat Completions (w/ ONNX + Functions)              phi3-onnx-chat-streaming-with-functions          C#                              
Semantic Kernel Chat Completions (Streaming)              sk-chat-streaming                                C#                              
Semantic Kernel Chat Completions (w/ Data + AI Search)    sk-chat-streaming-with-data                      C#                              
Semantic Kernel Chat Completions (w/ Functions)           sk-chat-streaming-with-functions                 C#                              
Semantic Kernel Chat Completions (w/ Agents)              sk-chat-with-agents                              C#                              
Speech-to-text                                            speech-to-text                                   C#, Python                      
Speech-to-text (w/ Continuous recognition)                speech-to-text-continuous-reco                   C#, Python                      
Speech-to-text (w/ File input)                            speech-to-text-with-file                         C#, Python                      
Speech-to-text (w/ Keyword detection)                     speech-to-text-with-keyword                      C#, Python                      
Speech-to-text (w/ Translation)                           speech-to-text-with-translation                  C#, Python                      
Text-to-speech                                            text-to-speech                                   C#, Python                      
Text-to-speech (w/ File output)                           text-to-speech-with-file                         C#, Python                      
List only JavaScript samples
ai dev new list --javascript
Filter the list by name
ai dev new list speech --javascript

Generate, build, and run a sample

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

... 🚧 UNDER CONSTRUCTION ...

List samples

List available templates
ai dev new list
AI - Azure AI CLI, Version 1.0.0
Copyright (c) 2024 Microsoft Corporation. All Rights Reserved.

Name                                                      Short Name                                       Language                        
------------------------------------------------------    ---------------------------------------------    --------------------------------
Environment Variables                                     .env                                                                             
Azure AI Inference Chat Completions (Streaming)           az-inference-chat-streaming                      C#, Python                      
Helper Function Class Library                             helper-functions                                 C#                              
OpenAI Assistants                                         openai-asst                                      C#, JavaScript, Python          
OpenAI Assistants (Streaming)                             openai-asst-streaming                            C#, JavaScript, Python          
OpenAI Assistants (w/ Code Interpreter)                   openai-asst-streaming-with-code                  C#, JavaScript, Python          
OpenAI Assistants (w/ File Search)                        openai-asst-streaming-with-file-search           C#, JavaScript, Python          
OpenAI Assistants (w/ Functions)                          openai-asst-streaming-with-functions             C#, JavaScript, Python          
OpenAI Assistants Webpage                                 openai-asst-webpage                              JavaScript, TypeScript          
OpenAI Assistants Webpage (w/ Functions)                  openai-asst-webpage-with-functions               JavaScript                      
OpenAI Chat Completions                                   openai-chat                                      C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (Streaming)                       openai-chat-streaming                            C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Data + AI Search)             openai-chat-streaming-with-data                  C#, Go, Java, JavaScript, Python
OpenAI Chat Completions (w/ Functions)                    openai-chat-streaming-with-functions             C#, Go, JavaScript, Python      
OpenAI Chat Webpage                                       openai-chat-webpage                              JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Functions)                        openai-chat-webpage-with-functions               JavaScript, TypeScript          
OpenAI Chat Webpage (w/ Speech input/output)              openai-chat-webpage-with-speech                  TypeScript                      
OpenAI Chat Webpage (w/ Functions + Speech)               openai-chat-webpage-with-speech-and-functions    TypeScript                      
Phi-3 Chat Completions (w/ ONNX)                          phi3-onnx-chat-streaming                         C#                              
Phi-3 Chat Completions (w/ ONNX + Functions)              phi3-onnx-chat-streaming-with-functions          C#                              
Semantic Kernel Chat Completions (Streaming)              sk-chat-streaming                                C#                              
Semantic Kernel Chat Completions (w/ Data + AI Search)    sk-chat-streaming-with-data                      C#                              
Semantic Kernel Chat Completions (w/ Functions)           sk-chat-streaming-with-functions                 C#                              
Semantic Kernel Chat Completions (w/ Agents)              sk-chat-with-agents                              C#                              
Speech-to-text                                            speech-to-text                                   C#, Python                      
Speech-to-text (w/ Continuous recognition)                speech-to-text-continuous-reco                   C#, Python                      
Speech-to-text (w/ File input)                            speech-to-text-with-file                         C#, Python                      
Speech-to-text (w/ Keyword detection)                     speech-to-text-with-keyword                      C#, Python                      
Speech-to-text (w/ Translation)                           speech-to-text-with-translation                  C#, Python                      
Text-to-speech                                            text-to-speech                                   C#, Python                      
Text-to-speech (w/ File output)                           text-to-speech-with-file                         C#, Python                      
List only Python samples
ai dev new list --python
Filter the list by name
ai dev new list speech --python

Generate, build, and run a sample

Generate a Python sample that recognizes speech from the microphone.

ai dev new speech-to-text --python
cd speech-to-text-py
See the code; learn how it works...

main.py

Deep dive on how it works

Create virtual environment
python -m venv env
env/Scripts/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py

Generate a Python sample that continuously recognizes speech from the microphone.

ai dev new speech-to-text-continuous-reco --python
cd speech-to-text-continuous-reco-py
See the code; learn how it works...

main.py

Deep dive on how it works

Create virtual environment
python -m venv env
env/Scripts/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py

Generate a Python sample that continuously recognizes speech from a file.

ai dev new speech-to-text-with-file --python
cd speech-to-text-with-file-py
See the code; learn how it works...

main.py

Deep dive on how it works

Create virtual environment
python -m venv env
env/Scripts/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py
Create virtual environment
python3 -m venv env
source env/bin/activate
Install requirements
pip install -r requirements.txt
Run the sample
ai dev shell
python3 main.py