A wake word is a specific term or phrase that can wake up an app for active listening. “Hey Siri” and “Alexa” are two of the most widely known wake words. The Spokestack native libraries have built-in support for wake words on mobile. Our services include building a customized, high-performance wake word model for your brand.
A keyword is a brief command recognized by a fast, lightweight model that supports variations in phrasing and works without user audio leaving the device.
Text-to-speech (TTS) is how voice user interfaces talk back. Spokestack provides a hosted TTS service that you can access directly or through our native libraries. What separates Spokestack TTS from other providers is our synthetic voice capability. Spokestack will build a custom voice model from your audio data so you can present a branded voice experience to your customers.
The technology for converting spoken words to text is known as Automatic Speech Recognition (ASR). The Spokestack open-source native libraries provide a convenient API across multiple ASR providers such as Apple, Google, and Microsoft.
Natural Language Understanding (NLU) is what makes user speech actionable. Spokestack provides deep learning-based NLU models that can be deployed on device or to a web service. On-device NLU keeps your customer data away from third-party services and can operate even without a network connection.
Voice Activity Detection (VAD) is responsible for making an initial determination of whether or not a snippet of audio contains human speech. Ignoring audio that's not detected as speech saves energy and processing power. The savings grow with each downstream processor you have in your speech pipeline.
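To make the idea concrete, here is a minimal sketch of a speech gate in Python. It uses a naive energy threshold, which is an assumption made for illustration and not how Spokestack's VAD is implemented. The function names and the `0.02` threshold are hypothetical.

```python
import math


def frame_rms(frame):
    """Return the root-mean-square energy of one frame of audio samples.

    Samples are assumed to be floats in the range [-1.0, 1.0].
    """
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame, threshold=0.02):
    """Naive check: treat a frame as speech if its energy meets the threshold.

    The threshold is an illustrative value, not a tuned one.
    """
    return frame_rms(frame) >= threshold


def filter_speech(frames, threshold=0.02):
    """Keep only frames flagged as speech so later stages can skip the rest."""
    return [f for f in frames if is_speech(f, threshold)]
```

Every frame that `filter_speech` drops is a frame the wake word and ASR stages never have to process, which is where the savings come from. An energy threshold is easily fooled by loud non-speech noise, so treat this only as a picture of where the gate sits.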
The speech pipeline is the main way you interact with Spokestack’s VAD, wake word, and speech recognition. It is an extensible audio processing pipeline that includes a variety of built-in speech processors for voice activity detection, wake word activation, and automatic speech recognition.
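The sketch below shows one way a staged pipeline like this could be structured in Python. It is not the Spokestack API: every class name here (`Context`, `EnergyGate`, `FakeWakeWord`, `FakeRecognizer`, `Pipeline`) is hypothetical, and the wake word and recognizer stages are stand-ins that do no real modeling.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Shared state that each stage reads and updates (hypothetical)."""
    is_speech: bool = False
    is_active: bool = False
    transcript: list = field(default_factory=list)


class EnergyGate:
    """Stand-in VAD stage: flags a frame as speech by mean energy."""

    def __init__(self, threshold=0.02):
        self.threshold = threshold

    def process(self, ctx, frame):
        energy = sum(s * s for s in frame) / max(len(frame), 1)
        ctx.is_speech = energy >= self.threshold ** 2


class FakeWakeWord:
    """Stand-in wake word stage: activates on the first speech frame.

    A real stage would run a wake word model here instead.
    """

    def process(self, ctx, frame):
        if ctx.is_speech and not ctx.is_active:
            ctx.is_active = True


class FakeRecognizer:
    """Stand-in ASR stage: records frame lengths instead of real text."""

    def process(self, ctx, frame):
        if ctx.is_active and ctx.is_speech:
            ctx.transcript.append(len(frame))


class Pipeline:
    """Passes each audio frame through every stage in order."""

    def __init__(self, stages):
        self.stages = list(stages)
        self.context = Context()

    def feed(self, frame):
        for stage in self.stages:
            stage.process(self.context, frame)
        return self.context
```

You would build it as `Pipeline([EnergyGate(), FakeWakeWord(), FakeRecognizer()])` and call `feed` once per audio frame. Because the stages share one context and run in a fixed order, adding a processor means adding one more object to the list, which is the sense in which a pipeline of this shape is extensible.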