Speech Markdown grammar, parser, and formatters for use with JavaScript.

Supported platforms:

  • microsoft-azure

Partial / no support:

  • amazon-alexa
  • amazon-polly
  • amazon-polly-neural
  • google-assistant
  • samsung-bixby

How to use

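import { SpeechMarkdown } from '@davi-ai/speechmarkdown-davi-js'

// Options for Azure neural voices; the <speak> tag is added manually afterwards
const options = {
  platform: 'microsoft-azure',
  includeSpeakTag: false,
  globalVoiceAndLang: {
    // Multilingual voice used as the main voice
    voice: 'en-US-JennyMultilingualNeural',
    // Language the multilingual voice should speak
    lang: 'fr-FR'
  }
}

const speechMarkdownParser = new SpeechMarkdown(options)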

You can use several options. The most useful ones are :

  • platform : 'microsoft-azure' to generate SSML for Azure neural voices
  • includeSpeakTag : whether to add a <speak> tag at the beginning and a </speak> tag at the end
  • globalVoiceAndLang: { voice?: string, lang?: string } : added for Microsoft voices and the retorik-framework architecture. If you use a specific voice as the main voice, put it in the 'voice' field
    (format language-CULTURE-VoiceName, ex: en-US-GuyNeural, en-US-JennyNeural). When using a multilingual voice (ex: JennyMultilingualNeural), if the text has to be spoken in a language other than the voice's own, add
    the 'lang' field with the desired language, formatted language-CULTURE (ex: fr-FR, en-US, de-DE, ...)
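
As a rough illustration of the two cases above (a single-language voice versus a multilingual voice with a 'lang' override), the sketch below builds the options object. The helper name buildAzureOptions is hypothetical and not part of the library, and the language-CULTURE check is a deliberately simplified assumption.

```javascript
// Hypothetical helper (not part of the library): builds the options object
// described above and rejects a badly formatted 'lang' value.
function buildAzureOptions(voice, lang) {
  const options = { platform: 'microsoft-azure', includeSpeakTag: false }
  const globalVoiceAndLang = {}
  if (voice) globalVoiceAndLang.voice = voice
  if (lang) {
    // Simplified language-CULTURE check (ex: fr-FR); an assumption,
    // the real list of supported locales is broader than this pattern.
    if (!/^[a-z]{2,3}-[A-Z]{2}$/.test(lang)) {
      throw new Error(`lang must be formatted language-CULTURE, got "${lang}"`)
    }
    globalVoiceAndLang.lang = lang
  }
  options.globalVoiceAndLang = globalVoiceAndLang
  return options
}

// Single-language voice: 'voice' only.
const singleVoiceOptions = buildAzureOptions('en-US-GuyNeural')
// Multilingual voice speaking French: 'voice' plus 'lang'.
const multilingualOptions = buildAzureOptions('en-US-JennyMultilingualNeural', 'fr-FR')
```

Either object could then be passed to the SpeechMarkdown constructor in place of a hand-written one.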

With these parameters, you will receive a complete SSML string, except for the <speak> tag, which has to be added manually around it. We don't use includeSpeakTag = true
because it only adds a bare <speak> tag, whereas Microsoft voices need a complete tag as follows :

  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="https://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xml:lang="fr-FR">
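
One possible way to add that tag by hand is sketched below. wrapInSpeakTag is a hypothetical helper, not something the library provides, and the SSML body passed to it is a placeholder standing in for the parser output.

```javascript
// Hypothetical helper (not part of the library): wraps an SSML body in the
// complete <speak> tag shown above.
function wrapInSpeakTag(ssmlBody, lang = 'en-US') {
  const attributes = [
    'version="1.0"',
    'xmlns="http://www.w3.org/2001/10/synthesis"',
    'xmlns:mstts="https://www.w3.org/2001/mstts"',
    'xmlns:emo="http://www.w3.org/2009/10/emotionml"',
    `xml:lang="${lang}"`
  ].join(' ')
  return `<speak ${attributes}>${ssmlBody}</speak>`
}

// Placeholder body; in practice this would be the string produced by the parser.
const ssml = wrapInSpeakTag('<voice name="fr-FR-DeniseNeural">Bonjour</voice>', 'fr-FR')
```

The xml:lang value is kept as a parameter because it usually has to match the language of the main voice.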

Available Speech Markdown tags

There are many different tags, and most of them have restrictions. For the current documentation, go to docs.microsoft.com

As of 2023/07/28, the available tags are :

  • voice :
    • (text to be read with that voice)[voice:"voice name"]
    • the text can contain other tags except 'voice'
    • the voice name can be as follows :
      • language-CULTURE-VoiceName (ex: en-US-GuyNeural, en-US-JennyNeural)
      • full Microsoft name (ex: Microsoft Server Speech Text to Speech Voice (en-US, JennyMultilingualNeural))
    • example : (Bonjour, comment ça va ?)[voice:"fr-FR-DeniseNeural"]
  • lang :
    • (text to be read in this language)[lang:"language name"]
    • the text can contain other tags except 'voice' and 'lang'
    • the lang name must be formatted as language-CULTURE (ex: fr-FR, en-US)
    • example : (Bonjour, comment ça va ?)[lang:"en-US"]
  • break :
    • [break time in seconds / milliseconds] or [break:"strength value"]
    • strength values :
      • none
      • x-weak
      • weak
      • medium
      • strong
      • x-strong
    • example : [break:"strong"] / [1s] / [250ms]
  • silence :
    • [silence:"type value"]
    • type and value are required
    • type can be :
      • Leading : beginning of text
      • Tailing : end of text
      • SentenceBoundary : between adjacent sentences
    • value is an integer giving time in seconds or milliseconds, lower than 5000ms
    • example : [silence:"Leading 1s"]
  • prosody :
    • (text for which the prosody will be adjusted)[pitch:"value";contour:"value";range:"value";rate:"value";volume:"value"]
    • you can use any of the modifiers below, from one to all of them
    • modifiers :
      • pitch
      • contour
      • range
      • rate
      • volume
    • example : (this will be spoken slow and high)[rate:"slow";pitch:"high"]
  • emphasis :
    • [emphasis:"value"] or ++text will be strong++
    • value can be / corresponding symbols around text :
      • reduced / -text reduced-
      • none / ~text without change~
      • moderate / +text stronger+
      • strong / ++text much stronger++
    • example : [emphasis:"moderate"] / +bonjour+
  • say-as :
    • (text to be said as)[modifier]
    • modifier can be :
      • address
      • number
      • characters
      • fraction
      • ordinal
      • telephone
      • time
      • date
    • example : I need this answer (ASAP)[characters] / My phone number is (0386300000)[telephone]
  • ipa :
    • the International Phonetic Alphabet (ipa) allows you to force the pronunciation of a word / sentence
    • example : I love (paintball)[ipa:"peɪntbɔːl"]
  • emotions :
    • [emotion:"style role/styledegree"]
    • the style is mandatory, and depends on the voice speaking at that time (ex: fr-FR-DeniseNeural can only use 'sad' and 'cheerful' while ja-JP-NanamiNeural can use
      'chat', 'cheerful' and 'customerservice')
    • role and styledegree are optional. Role is a string, while styledegree is a number. Note that 'role' is restricted to very few voices
    • example : (It's so cool ! We are going to a great park today !)[voice:"en-US-JennyNeural";emotion:"excited 2"]
  • audio :
  • backgroundaudio :
    • [backgroundaudio:"src volume fadein fadeout"]
    • src is mandatory; the other fields are optional, but all fields on the left must be provided before using one on the right (ex: to use fadein,
      you must have provided a value for src and volume)
    • only one backgroundaudio tag possible
    • example : [backgroundaudio:"https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3 0.5 2000 1500"]
  • lexicon :
    • [lexicon:"url to the lexicon xml file"]
    • the lexicon file is restricted to one language (en-US, fr-FR, ...) so it won't be used if the voice uses another language
    • it does nothing when using a multilingual voice (ex: JennyMultilingualNeural), even if the lang tag of this voice is the same as the one in the lexicon file
    • the lexicon inputs are case-sensitive, for example 'hello' and 'Hello' must be treated separately
    • example : [lexicon:"https://cdn.retorik.ai/retorik-framework/lexicon-en-US.xml"] Hi everybody ! BTW how are you today ?
  • bookmark :
    • [bookmark:"bookmark text"]
    • example : Bookmark after city name : first Paris [bookmark:"city1"], then Berlin [bookmark:"city2"]
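
The formats listed above can be assembled programmatically. The sketch below is only an illustration: every helper name in it (withVoice, withLang, sayAs, bookmark, withProsody) is hypothetical and not part of the library, and the helpers do nothing more than build Speech Markdown strings that follow the examples above.

```javascript
// Hypothetical helpers (not part of the library): they only assemble
// Speech Markdown strings in the formats listed above.
const withVoice = (text, voice) => `(${text})[voice:"${voice}"]`
const withLang = (text, lang) => `(${text})[lang:"${lang}"]`
const sayAs = (text, modifier) => `(${text})[${modifier}]`
const bookmark = (name) => `[bookmark:"${name}"]`

// Prosody: any subset of pitch, contour, range, rate and volume, at least one.
function withProsody(text, modifiers) {
  const allowed = ['pitch', 'contour', 'range', 'rate', 'volume']
  const parts = Object.entries(modifiers).map(([name, value]) => {
    if (!allowed.includes(name)) throw new Error(`unknown prosody modifier: ${name}`)
    return `${name}:"${value}"`
  })
  if (parts.length === 0) throw new Error('at least one prosody modifier is required')
  return `(${text})[${parts.join(';')}]`
}

// A sample text combining several tags.
const markdown = [
  withVoice('Bonjour, comment ça va ?', 'fr-FR-DeniseNeural'),
  '[break:"strong"]',
  withProsody('this will be spoken slow and high', { rate: 'slow', pitch: 'high' }),
  'My phone number is ' + sayAs('0386300000', 'telephone'),
  'first Paris ' + bookmark('city1')
].join(' ')
```

The resulting string is plain Speech Markdown, so it would still need to go through the parser to become SSML.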

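The restrictions on the silence and backgroundaudio tags listed above can be checked before the tag is written. This is a minimal sketch under stated assumptions: silenceTag and backgroundAudioTag are hypothetical helpers, not part of the library, and the silence value is always written in milliseconds.

```javascript
// Hypothetical helpers (not part of the library) that enforce the rules
// listed above before building the tag string.
function silenceTag(type, milliseconds) {
  const types = ['Leading', 'Tailing', 'SentenceBoundary']
  if (!types.includes(type)) throw new Error(`unknown silence type: ${type}`)
  // The value must be an integer lower than 5000ms.
  if (!Number.isInteger(milliseconds) || milliseconds < 0 || milliseconds >= 5000) {
    throw new Error('silence value must be an integer lower than 5000ms')
  }
  return `[silence:"${type} ${milliseconds}ms"]`
}

function backgroundAudioTag(src, volume, fadein, fadeout) {
  if (!src) throw new Error('src is mandatory')
  const optional = [volume, fadein, fadeout]
  // Keep the fields up to the last one provided, then refuse any gap among them.
  const lastProvided = optional.map((value) => value !== undefined).lastIndexOf(true)
  const used = optional.slice(0, lastProvided + 1)
  if (used.includes(undefined)) {
    throw new Error('fields on the left must be provided before one on the right')
  }
  return `[backgroundaudio:"${[src, ...used].join(' ')}"]`
}

const leadingSilence = silenceTag('Leading', 1000)
const background = backgroundAudioTag(
  'https://cdn.retorik.ai/retorik-framework/audiofiles/audiotest.mp3', 0.5, 2000, 1500
)
```

Calling backgroundAudioTag with a fadein but no volume throws, which mirrors the left-to-right rule for the optional fields.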

Licensed under the MIT License. See the LICENSE file for details.




