Provides a convenient abstraction layer over the Microsoft Cognitive Services Speech SDK, simplifying the integration of speech-to-text (STT) and text-to-speech (TTS) functionality into client/browser applications. With this package, developers can quickly add basic STT and TTS capabilities to their applications without writing intricate SDK code.
- Perform a single (one-shot) speech recognition operation.
- Enable continuous speech recognition for real-time applications.
- Recognize speech in multiple languages (multilingual recognition).
- Synthesize speech from text.
- Accept plain text or SSML input for TTS.
Using npm:
npm install azure-speech-utilities
Creates a new speech recognizer instance.
Parameter | Type | Default Value | Description |
---|---|---|---|
cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required) |
cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required) |
recognitionLang | string[] | ["en-US"] | An array of language codes to recognize. (Optional) |
Used for single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.
Parameter | Type | Default Value | Description |
---|---|---|---|
recognizer | sdk.SpeechRecognizer \| undefined | undefined | The speech recognizer instance to use. |
The previous function performs single-shot recognition, which recognizes a single utterance. In contrast, continuous recognition gives you a real-time stream of recognized text. Call `StopContinuousRecognitionAsync()` at some point to stop recognition.
Parameter | Type | Default Value | Description |
---|---|---|---|
recognizer | sdk.SpeechRecognizer \| undefined | undefined | The speech recognizer instance to use. |
callbackRecognized | (value: string) => void | (value) => console.log(value) | A callback function called with each finalized recognized text. |
callbackRecognizing | (value: string) => void | (value) => console.log(value) | A callback function called with interim results while speech is being recognized. |
Stops ongoing continuous speech recognition.
Parameter | Type | Default Value | Description |
---|---|---|---|
recognizer | sdk.SpeechRecognizer \| undefined | undefined | The speech recognizer instance to use. |
Note: Pass the same recognizer instance that you use for `ContinuousRecognitionAsync()` as the argument to this function.
Creates a new speech synthesizer instance.
Parameter | Type | Default Value | Description |
---|---|---|---|
cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required) |
cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required) |
synthesisLang | string | "" | The language code for the speech synthesizer. (Required) |
synthesisVoiceName | string | "" | The name of the voice to use for speech synthesis. (Optional) |
createAudioConfig | boolean | false | Whether to create an audio config for speech output. (Optional) |
Note: The voice that speaks is determined in order of priority as follows (see the sketch after this list):
- Passing `false` for `createAudioConfig` means the audio does not play by default on the currently active output device.
- If you only set `synthesisLang`, the default voice for the specified locale speaks.
- If both `synthesisVoiceName` and `synthesisLang` are set, the `synthesisLang` setting is ignored. The voice that you specify by using `synthesisVoiceName` speaks.
- If the voice element is set by using Speech Synthesis Markup Language (SSML), the `synthesisVoiceName` and `synthesisLang` settings are ignored.
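For example, a minimal sketch of the second and third rules, assuming the optional `synthesisVoiceName` parameter can simply be omitted as the table above suggests (the key, region, and locale values are placeholders):

```js
import { CreateSynthesizer } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"

// Only synthesisLang is set: the default voice for the fr-FR locale speaks.
const defaultVoice = CreateSynthesizer(CGV_KEY, CGV_REGION, "fr-FR")

// Both are set: "en-US-JennyNeural" speaks and synthesisLang ("fr-FR") is ignored.
const namedVoice = CreateSynthesizer(CGV_KEY, CGV_REGION, "fr-FR", "en-US-JennyNeural")
```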
Performs speech synthesis and returns the result (the synthesized audio) in the form of an ArrayBuffer.
Parameter | Type | Default Value | Description |
---|---|---|---|
synthesizer | sdk.SpeechSynthesizer \| undefined | undefined | The speech synthesizer instance to use. |
inputString | string | "I'm excited to try text to speech" | The text to be synthesized. |
inputType | string | "text" | The format of the input string: "text" or "ssml". (Optional) |
callback | (result: sdk.SynthesisResult, error?: Error) => void | (result, error) => {} | A callback function called with the synthesis result or an error. |
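A minimal sketch of an SSML call, assuming the parameters behave as documented above (key, region, and voice values are placeholders). Note how the voice element in the SSML takes priority over `synthesisVoiceName` and `synthesisLang`:

```js
import { CreateSynthesizer, SpeakAsync } from "azure-speech-utilities"

const synthesizer = CreateSynthesizer("AZURE_SPEECH_SERVICE_KEY", "AZURE_SPEECH_SERVICE_REGION", "en-US", "en-US-JennyNeural", false)

// The voice element below overrides synthesisVoiceName and synthesisLang.
const ssml = `
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-GuyNeural">Hello from SSML.</voice>
</speak>`

SpeakAsync(synthesizer, ssml, "ssml", (result, error) => {
  if (error) console.error(error)
  else console.log(result) // result.audioData holds the synthesized audio (ArrayBuffer)
})
```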
Recognize Once
import { CreateRecognizer, RecognizeOnceAsync } from "azure-speech-utilities"
const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"
async function recognizeSpeech() {
const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN"])
try {
const recognizedText = await RecognizeOnceAsync(recognizer)
if (recognizedText.type === "text") {
console.log(recognizedText.message)
} else {
// Not a text result; the message describes why recognition did not succeed.
console.error(recognizedText.message)
}
} catch (error) {
console.error(error)
}
}
Continuous Recognition
import { CreateRecognizer, ContinuousRecognitionAsync, StopContinuousRecognitionAsync } from "azure-speech-utilities"
const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"
// With two or more recognition languages ("hi-IN" and "en-US" here), recognition is multilingual.
const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN", "en-US"])
function callbackRecognized(text) {
console.log("RECOGNIZED: ", text)
}
function callbackRecognizing(text) {
console.log("RECOGNIZING: ", text)
}
async function recognizeSpeech() {
try {
const response = await ContinuousRecognitionAsync(recognizer, callbackRecognized, callbackRecognizing)
if (response.type === "success") {
console.log(response.message)
} else {
console.error(response.message)
}
} catch (error) {
console.error(error)
}
}
function stopContinuousRecognition() {
StopContinuousRecognitionAsync(recognizer)
}
Speak Async
import { CreateSynthesizer, SpeakAsync } from "azure-speech-utilities"
const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"
const SYNTHESIS_LANGUAGE = "en-US"
const SYNTHESIS_VOICE_NAME = "en-US-JennyNeural"
function handleSpeak() {
// By default, the input type is 'text'. If you change the input type to 'ssml', the input string should be in the following SSML format.
// const ssml = `
// <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${SYNTHESIS_LANGUAGE}">
// <voice name="${SYNTHESIS_VOICE_NAME}">
// When you're on the freeway, it's a good idea to use a GPS.
// </voice>
// </speak>
// `
const text = "When you're on the freeway, it's a good idea to use a GPS."
// Note that 'createAudioConfig' is set to false, meaning audio will not play by default on the currently active output device.
const synthesizer = CreateSynthesizer(CGV_KEY, CGV_REGION, SYNTHESIS_LANGUAGE, SYNTHESIS_VOICE_NAME, false)
SpeakAsync(synthesizer, text, "text", (result, error) => {
if (error) {
console.error(error)
} else {
console.log(result)
const audioBlob = new Blob([result.audioData], { type: "audio/wav" })
// You can use this URL as an audio source, which allows easy user control such as starting, stopping, resetting, etc.
console.log(URL.createObjectURL(audioBlob))
}
})
}
const stopSpeaking = () => {
// audioRef is assumed to be a reference (e.g., a React ref) to an <audio> element whose src is the object URL created in handleSpeak; see the sketch below.
audioRef.current.pause()
}
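For completeness, one way to wire the object URL from the `SpeakAsync` callback to an audio element is sketched below; the variable and function names here are illustrative, not part of the package:

```js
// Illustrative wiring; not part of azure-speech-utilities.
let audio // keep a reference so playback can be controlled later

function playSynthesizedAudio(audioData) {
  // audioData is result.audioData from the SpeakAsync callback.
  const audioBlob = new Blob([audioData], { type: "audio/wav" })
  audio = new Audio(URL.createObjectURL(audioBlob))
  audio.play()
}

function pauseSynthesizedAudio() {
  if (audio) audio.pause()
}
```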
Note: If you do not wish to play audio through an audio source, you can set `createAudioConfig` to `true`. The audio will then play on the currently active output device by default. However, this method does not give the user the ability to reset, play, or pause the audio.
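A minimal sketch of that direct-playback path (key, region, and voice values are placeholders):

```js
import { CreateSynthesizer, SpeakAsync } from "azure-speech-utilities"

// createAudioConfig = true: audio plays directly on the default output device.
const synthesizer = CreateSynthesizer("AZURE_SPEECH_SERVICE_KEY", "AZURE_SPEECH_SERVICE_REGION", "en-US", "en-US-JennyNeural", true)

SpeakAsync(synthesizer, "Playing directly on the default output device.", "text", (result, error) => {
  if (error) console.error(error)
  else console.log(result) // no audio element is needed; playback already happened
})
```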
This project welcomes contributions and suggestions.