Text-to-speech for iOS

Text-to-speech is a broad topic, but as far as Spokestack is concerned, there are two things your app has to handle: sending the input to be synthesized, and playing the resulting audio for your users. This guide will cover both.

Starting up

To synthesize speech in Spokestack, use the TextToSpeech component:

let tts = TextToSpeech(self, configuration: configuration)

In this example, self implements the TextToSpeechDelegate protocol, which utilizes the delegate pattern to forward TTS events to your app.

We also use the default credentials for SpeechConfiguration.apiId and SpeechConfiguration.apiSecret, which are set to public values that let you try Spokestack TTS without creating an account. Create an account or sign in to get your own free API credentials and access to additional features!

Generating the audio

Generating a URL to an audio stream of a TTS synthesized voice is just a single method in Spokestack!

tts.synthesize(TextToSpeechInput("Here I am, a brain the size of a planet."))

This is the simplest arity of synthesize, which takes an instance of TextToSpeechInput constructed with a plain text string as input. The success(TextToSpeechResult:) delegate function will be called with the result, which has a url property with the audio stream.

TextToSpeechInput has additional properties that you can use for more sophisticated speech synthesis. As always, there’s more detail in the API documentation. Let’s detail those advanced properties briefly:

The input is simply the string you want to hear synthesized.

The inputFormat argument is here because Spokestack supports serveral input formats: plain text, SpeechMarkdown, and a subset of the SSML spec for specifying pronunciation and specific pause times. See the TTS concept guide for more information on providing SSML input. If you don’t need this level of control, TTSInputFormat.text is the default.

The voice argument allows you to specify which of Spokestack’s library of synthetic voices you wish to use. Want something besides demo-male? Create a Spokestack Maker account to train your own!

The optional id property allows you to track individual TTS synthesis requests; it will be echoed back in the corresponding TextToSpeechResult.id.

Using the generated audio

What you do with the synthesis result (or failure) is up to you! The streaming URL is valid for 60 seconds, so you can save it for later or play it back immediately.

To save it for later, you can simply download the audio before the 60-second TTL expires:

let destinationUrl = documentsUrl.appendingPathComponent(url.lastPathComponent)
let urlData = NSData(contentsOf: url)
urlData!.write(to: destinationUrl, atomically: false)

To use your own AVPlayer instance to control playback:

let playerItem = AVPlayerItem(url: streamingFile)
let player = AVPlayer(playerItem: playerItem)

Simply speak!

Want to skip all that and let Spokestack handle the both the synthesis and playback of your synthesis? We’ve combined the two sections above into a single function call! Try speak(TextToSpeechInput:)!

let input = TextToSpeechInput(text)

You’ll receive event notifications when synthesis has completed (success), playback has begun (didBeginSpeaking) and playback has finished (didFinishSpeaking). A whole TTS feature in just one function call!

Related Resources

Want to dive deeper into the world of Android voice integration? We've got a lot to say on the subject: