Easily add voice to your UX

Spokestack makes it easy to add voice user interfaces to your mobile apps and websites. Our open-source native libraries for Android and iOS provide a unified API for using voice on mobile. Our hosted web services extend these natural language processing capabilities to other platforms.

Wake word

A wake word is a specific term or phrase that can wake up an app for active listening. “Hey Siri” and “Alexa” are two of the most widely known wake words. The Spokestack native libraries have built-in support for wake words on mobile. Our services include building a customized, high-performance wake word model for your brand.


Keyword

A keyword is a brief spoken command. Keyword recognition uses a fast, lightweight model that supports variations in phrasing, and it works without user audio leaving the device.

Text-to-speech & custom voice

Text-to-speech (TTS) is how voice user interfaces talk back. Spokestack provides a hosted TTS service that you can access directly or through our native libraries. What separates Spokestack TTS from other providers is our synthetic voice capability. Spokestack will build a custom voice model from your audio data so you can present a branded voice experience to your customers.

Automatic Speech Recognition

The technology for converting spoken words to text is known as Automatic Speech Recognition (ASR). The Spokestack open-source native libraries provide a convenient API across multiple ASR providers such as Apple, Google, and Microsoft.

Natural Language Understanding

Natural Language Understanding (NLU) is what makes user speech actionable. Spokestack provides deep learning-based NLU models that can be deployed on device or to a web service. On-device NLU keeps your customer data away from third-party services and can operate even without a network connection.

Voice Activity Detection

Voice Activity Detection (VAD) is responsible for making an initial determination of whether or not a snippet of audio contains human speech. Ignoring audio that's not detected as speech saves energy and processing power. The savings grow with each downstream processor you have in your speech pipeline.
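
To make the savings concrete, here is a minimal sketch of VAD used as a gate in front of other processors. It is not Spokestack's implementation: the simple energy threshold stands in for a real speech detector, and the names `rms`, `is_speech`, `gate_frames`, and the `threshold` value are hypothetical, chosen only for illustration.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame, threshold=0.05):
    """Toy detector (assumption): treat any frame above the threshold as speech."""
    return rms(frame) >= threshold


def gate_frames(frames, downstream, threshold=0.05):
    """Forward only speech frames to every downstream processor.

    Returns the number of processor calls that were skipped.
    """
    skipped = 0
    for frame in frames:
        if is_speech(frame, threshold):
            for process in downstream:
                process(frame)
        else:
            # Each rejected frame saves one call per downstream processor.
            skipped += len(downstream)
    return skipped
```

Each frame rejected by the gate saves one call per downstream processor, which is why the savings scale with the length of the pipeline.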

Speech Pipeline

The speech pipeline is the main way you interact with Spokestack’s voice features. It is an extensible audio processing pipeline that includes a variety of built-in speech processors for voice activity detection (VAD), wake word activation, and automatic speech recognition (ASR).
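
The idea of chaining processors can be sketched in a few lines. This is a toy model, not the Spokestack API: the `Context` and `Pipeline` classes, the stage functions, the "hey app" wake phrase, and the dictionary frames with `energy` and `text` fields are all assumptions made for illustration. Real frames would carry audio samples, and real stages would run trained models.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """State shared by all stages (hypothetical, for illustration)."""
    is_active: bool = False
    transcript: list = field(default_factory=list)


def vad_stage(ctx, frame):
    # Stop non-speech frames here so later stages never see them.
    return frame["energy"] >= 0.05


def wake_stage(ctx, frame):
    # Once activated, let every speech frame through to recognition.
    if ctx.is_active:
        return True
    if frame["text"] == "hey app":
        ctx.is_active = True
    # The wake phrase itself, and anything before it, is not forwarded.
    return False


def asr_stage(ctx, frame):
    # Stand-in for speech recognition: the toy frame already carries its text.
    ctx.transcript.append(frame["text"])
    return True


class Pipeline:
    """Runs each frame through the stages in order until one returns False."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.context = Context()

    def process(self, frame):
        for stage in self.stages:
            if not stage(self.context, frame):
                break
        return self.context
```

Under these assumptions, a pipeline built as `Pipeline([vad_stage, wake_stage, asr_stage])` ignores silence and any speech before the wake phrase, then collects the text of the frames that follow it. Because stages are plain callables held in a list, adding or swapping a processor does not require changes to the pipeline itself.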

Become a Spokestack Maker and #OwnYourVoice

Access our hosted services for model import, natural language processing, text-to-speech, and wake word.