Spokestack offers a rich, controllable text-to-speech (TTS) system. The voices available to use in your app are determined by your API client identifier, but we offer a demo voice to all users so that you can try custom TTS in your app without making a commitment. If you need a unique voice for your project, our Maker tier lets you create your own! If you’d like a custom voice created from studio-quality recordings, contact us, and we can work with you to create a unique experience for your users.
Apart from voice selection, Spokestack’s TTS enables fine control over pronunciation by supporting a subset of both the SSML standard and Speech Markdown syntax. Synthesis defaults to using raw text, but you can opt into using one of these markups instead; see the platform-specific documentation (iOS or Android) for details.
SSML is an XML-based markup language; the root element must be `<speak>`. Aside from `speak`, Spokestack supports the following elements:
- `break`
- `phoneme`, with the `alphabet` attribute set to "ipa"
- `s`
- `say-as`, with the `interpret-as` attribute only, with one of the following values:
Note that long inputs should be split into separate `s` ("sentence") elements for best performance.
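Because long inputs work best when split into `s` elements, a small helper can do the wrapping before text is sent for synthesis. This is a minimal sketch, assuming Python and a naive regex sentence splitter; `to_ssml_sentences` is a hypothetical helper, not part of any Spokestack SDK. It also escapes `&`, `<`, and `>` so arbitrary input text can't break the XML structure.

```python
import re
from xml.sax.saxutils import escape


def to_ssml_sentences(text: str) -> str:
    """Wrap each sentence of plain text in an <s> element inside <speak>.

    Hypothetical helper: sentence splitting is a naive regex on '.', '!',
    or '?' followed by whitespace, not Spokestack's own segmentation.
    """
    # Split after terminal punctuation followed by whitespace; drop empty pieces.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Escape &, <, and > so user text cannot inject or break XML markup.
    body = "".join(f"<s>{escape(s)}</s>" for s in sentences)
    return f"<speak>{body}</speak>"


print(to_ssml_sentences("Hello there. Prices fell 3% & rose again!"))
# → <speak><s>Hello there.</s><s>Prices fell 3% &amp; rose again!</s></speak>
```

A real sentence splitter would need to handle abbreviations such as "Dr." or "e.g.", which this regex would split incorrectly.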
Currently, Spokestack focuses on pronouncing English words and the loan words or foreign words common in spoken English, so it accepts only a subset of the full IPA character set. Characters valid in an IPA `ph` attribute are:
[' ', ',', 'a', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w', 'z', 'æ', 'ð', 'ŋ', 'ɑ', 'ɔ', 'ə', 'ɛ', 'ɝ', 'ɪ', 'ʃ', 'ʊ', 'ʌ', 'ʒ', 'ˈ', 'ˌ', 'ː', 'θ', 'ɡ', 'x', 'y', 'ɹ', 'ʰ', 'ɜ', 'ɒ', 'ɚ', 'ɱ', 'ʔ', 'ɨ', 'ɾ', 'ɐ', 'ʁ', 'ɵ', 'χ']
Using invalid characters will not cause an error, but it might result in unexpected pronunciation.
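Since invalid characters fail silently, it can help to check a `ph` value locally before synthesis. This sketch copies the character list above into a Python set; `invalid_ipa_chars` is a hypothetical helper, not a Spokestack API. One pitfall it catches: the affricate ligature ʧ is not in the list, so the two-character sequence tʃ should be used instead.

```python
# Characters allowed in an IPA ph attribute, copied from the list above.
VALID_IPA = set([
    ' ', ',', 'a', 'b', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w', 'z', 'æ', 'ð', 'ŋ', 'ɑ',
    'ɔ', 'ə', 'ɛ', 'ɝ', 'ɪ', 'ʃ', 'ʊ', 'ʌ', 'ʒ', 'ˈ', 'ˌ', 'ː', 'θ', 'ɡ',
    'x', 'y', 'ɹ', 'ʰ', 'ɜ', 'ɒ', 'ɚ', 'ɱ', 'ʔ', 'ɨ', 'ɾ', 'ɐ', 'ʁ', 'ɵ', 'χ',
])


def invalid_ipa_chars(ph: str) -> list[str]:
    """Return characters in ph outside the supported set, in first-seen order."""
    found = []
    for ch in ph:
        if ch not in VALID_IPA and ch not in found:
            found.append(ch)
    return found


print(invalid_ipa_chars("dʒɪf"))  # → []
print(invalid_ipa_chars("ʧɪp"))   # → ['ʧ']
```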
When you just can’t give up that web prefix:
<speak> See all our products at <say-as interpret-as="characters">www</say-as> dot my company dot com. </speak>
Insert a pregnant pause:
<speak> Today's stock price <break time="500ms"/> fell three percent. </speak>
Customize pronunciation to make a point:
<speak> I don't care what you say; it's pronounced <phoneme alphabet="ipa" ph="gɪf">gif</phoneme>, not <phoneme alphabet="ipa" ph="dʒɪf">gif</phoneme>! </speak>
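The structural rules on this page (a `speak` root, a small set of elements, `alphabet="ipa"` on `phoneme`, and `interpret-as` on `say-as`) are easy to check locally before sending a request. This is a sketch using Python's standard `xml.etree.ElementTree`; `check_ssml` is hypothetical, and its `SUPPORTED` set contains only the elements this page names (`speak`, `break`, `phoneme`, `s`, `say-as`), so treat it as an assumption and extend it if your platform documentation lists more.

```python
import xml.etree.ElementTree as ET

# Elements named on this page (assumption: extend if the platform docs list more).
SUPPORTED = {"speak", "break", "phoneme", "s", "say-as"}


def check_ssml(ssml: str) -> list[str]:
    """Return a list of problems found in an SSML string (empty if none)."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]

    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")

    for el in root.iter():
        if el is root:
            continue  # the root element was checked above
        if el.tag not in SUPPORTED:
            problems.append(f"unsupported element <{el.tag}>")
        elif el.tag == "phoneme" and el.get("alphabet") != "ipa":
            problems.append('<phoneme> must set alphabet="ipa"')
        elif el.tag == "say-as" and el.get("interpret-as") is None:
            problems.append("<say-as> needs an interpret-as attribute")
    return problems


example = (
    "<speak> I don't care what you say; it's pronounced "
    '<phoneme alphabet="ipa" ph="gɪf">gif</phoneme>, not '
    '<phoneme alphabet="ipa" ph="dʒɪf">gif</phoneme>! </speak>'
)
print(check_ssml(example))  # → []
print(check_ssml('<speak><phoneme ph="gɪf">gif</phoneme></speak>'))
# → ['<phoneme> must set alphabet="ipa"']
```

This sketch does not handle an `xmlns` declaration on `speak`: ElementTree reports namespaced tags as `{namespace}speak`, which the simple name comparison would flag.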
Speech Markdown is a convenience wrapper around SSML syntax, so Spokestack’s support for it mirrors our SSML support. The structure of Speech Markdown is flatter than SSML’s; support for the SSML elements above translates into the following Speech Markdown syntax:
Here are the above SSML examples translated into Speech Markdown:
See all our products at (www)[characters] dot my company dot com.
Today's stock price [500ms] fell three percent.
I don't care what you say; it's pronounced (gif)[/gɪf/], not (gif)[/dʒɪf/]!
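Because Speech Markdown is a thin wrapper over SSML, the correspondence for the three forms in these examples can be sketched as a few regular-expression substitutions. `speechmarkdown_to_ssml` is a hypothetical illustration, not a Spokestack API and not a full Speech Markdown parser: it handles only `(text)[characters]`, short breaks like `[500ms]` or `[2s]`, and `(text)[/ipa/]`. Since Spokestack accepts Speech Markdown directly, this is mainly useful for seeing how the two syntaxes line up.

```python
import re
from xml.sax.saxutils import escape


def speechmarkdown_to_ssml(text: str) -> str:
    """Convert three Speech Markdown forms into SSML (illustrative subset only)."""
    # Escape &, <, > first; this leaves (), [], and / untouched for matching below.
    text = escape(text)
    # (word)[/ipa/]  ->  <phoneme alphabet="ipa" ph="ipa">word</phoneme>
    text = re.sub(r"\(([^)]*)\)\[/([^/\]]*)/\]",
                  r'<phoneme alphabet="ipa" ph="\2">\1</phoneme>', text)
    # (word)[characters]  ->  <say-as interpret-as="characters">word</say-as>
    text = re.sub(r"\(([^)]*)\)\[characters\]",
                  r'<say-as interpret-as="characters">\1</say-as>', text)
    # [500ms] or [2s]  ->  <break time="500ms"/>
    text = re.sub(r"\[(\d+(?:ms|s))\]", r'<break time="\1"/>', text)
    return f"<speak>{text}</speak>"


print(speechmarkdown_to_ssml("Today's stock price [500ms] fell three percent."))
# → <speak>Today's stock price <break time="500ms"/> fell three percent.</speak>
```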