This guide will get you up and running with Spokestack for Python, and you’ll have a voice interface in your application in no time.
Some system dependencies must be installed before you can install spokestack via pip.
On macOS (Homebrew):
brew install lame portaudio
On Debian or Ubuntu:
sudo apt-get install portaudio19-dev libmp3lame-dev
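If you want to confirm that the libraries are visible before running pip, a rough check from Python might look like the sketch below. It relies on `ctypes.util.find_library`, which is only a heuristic, and the library names `portaudio` and `mp3lame` are assumptions based on the packages above rather than something Spokestack documents.

```python
from ctypes.util import find_library

# Library names are assumptions based on the packages above:
# PortAudio usually ships as "portaudio" and LAME as "mp3lame".
REQUIRED_LIBRARIES = {"PortAudio": "portaudio", "LAME": "mp3lame"}


def find_system_dependencies(libraries=REQUIRED_LIBRARIES):
    """Map each dependency label to the library found, or None."""
    return {label: find_library(name) for label, name in libraries.items()}


def missing_dependencies(found):
    """Return the labels of dependencies that could not be located."""
    return sorted(label for label, location in found.items() if location is None)


if __name__ == "__main__":
    found = find_system_dependencies()
    for label, location in found.items():
        print(f"{label}: {location or 'not found'}")
```

A "not found" result does not always mean the install failed, since `find_library` can miss libraries in non-standard locations, but it is a quick first thing to look at when pip reports build errors.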
We currently do not support Windows 10 natively, and recommend you install Windows Subsystem for Linux (WSL) with the Debian dependencies. However, if you would like to work on native Windows support, we gladly accept pull requests.
Another potential avenue for using Spokestack on Windows 10 is Anaconda. PortAudio can be installed via conda, but LAME cannot, so microphone input will be supported but text-to-speech will not.
conda install portaudio
Once system dependencies have been satisfied, you can install the library with the following.
pip install spokestack
pyenv can be used to manage virtual environments.
pyenv install 3.8.6
pyenv virtualenv 3.8.6 spokestack
pyenv local spokestack
pip install -r requirements.txt
pip install tensorflow
In use cases where you require a small footprint, such as on a Raspberry Pi or similar Internet of Things (IoT) devices, you will want to install the TFLite Interpreter. You can install it for your platform by following the instructions.
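To see which of the two options is present in the current environment, something like the following sketch may help. It assumes the standalone interpreter is installed under the package name `tflite_runtime` and the full framework under `tensorflow`; it only checks whether they can be found, without importing them.

```python
import importlib.util


def available_tflite_backend():
    """Return which TFLite-capable package is importable, if any."""
    # "tflite_runtime" is the standalone interpreter package; "tensorflow"
    # is the full framework, which bundles an interpreter as tf.lite.
    for package in ("tflite_runtime", "tensorflow"):
        if importlib.util.find_spec(package) is not None:
            return package
    return None


if __name__ == "__main__":
    backend = available_tflite_backend()
    print(backend or "no TFLite interpreter found")
```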
In order for your application to use Spokestack’s features, there are a few things you will need:
- A free Spokestack Account
- Audio Input Device
- Audio Output Device
Go to spokestack.io to set up your own account (it’s free!). Once you’ve got that, go grab one of our free NLU models. We’ll use the Highlow one in this example, but you can choose another or create your own.
Once you’ve downloaded your NLU, extract nlu.tar.gz, which contains three files (including vocab.txt). The location of the directory isn’t important, because we will pass its path when initializing the NLU.
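If you would rather extract the archive from Python, the sketch below is one way to do it. The helper names `extract_model` and `find_model_dir` are hypothetical, and because we only know that `vocab.txt` is among the files, the second helper uses it as the marker for locating the model directory, whether the archive unpacks flat or into a subdirectory.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, target_dir):
    """Extract a .tar.gz model archive and return the extracted file paths."""
    target = Path(target_dir).resolve()
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Refuse entries that would land outside the target directory.
            destination = (target / member.name).resolve()
            if target != destination and target not in destination.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(target)
    return sorted(p for p in target.rglob("*") if p.is_file())


def find_model_dir(files, marker="vocab.txt"):
    """Return the directory holding the marker file, or None if absent."""
    for path in files:
        if path.name == marker:
            return path.parent
    return None
```

For example, `find_model_dir(extract_model("nlu.tar.gz", "models"))` would give the directory to pass as the model path later in this guide, assuming the download really is a gzip-compressed tarball.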
The PyAudioInput class uses the system default audio input device. Most personal computers have some form of microphone, but in the case of an embedded device, you may need to purchase a small USB microphone.
Spokestack’s speech pipeline handles collecting audio from the input device and transcribing speech directed at your app. The SpeechPipeline guide has a detailed explanation of how to set up the pipeline, so we will show the quickest way here using a profile, which configures the pipeline’s components for a specific use case. The profile we use here includes wakeword activation and speech transcription using Spokestack’s cloud ASR.
from spokestack.profile.wakeword_asr import WakewordSpokestackASR

pipeline = WakewordSpokestackASR.create(
    "spokestack_id", "spokestack_secret", model_dir="path_to_tflite_model_dir"
)
pipeline.start()
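In a real application you will probably not want the key ID and secret hard-coded in source. One possible approach, sketched below, reads them from environment variables. The variable names `SPOKESTACK_ID` and `SPOKESTACK_SECRET` and both helper functions are our own choices for this example; the library does not read these variables by itself.

```python
import os


def load_credentials(env=os.environ):
    """Read the Spokestack key ID and secret from environment variables."""
    # SPOKESTACK_ID / SPOKESTACK_SECRET are names chosen for this sketch,
    # not variables the library reads on its own.
    try:
        return env["SPOKESTACK_ID"], env["SPOKESTACK_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"set {missing.args[0]} before starting") from None


def build_pipeline(model_dir, env=os.environ):
    """Create the wakeword + ASR pipeline using the profile from this guide."""
    # Imported lazily so this module loads even without spokestack installed.
    from spokestack.profile.wakeword_asr import WakewordSpokestackASR

    key_id, key_secret = load_credentials(env)
    return WakewordSpokestackASR.create(key_id, key_secret, model_dir=model_dir)
```

Failing early with a clear message when a variable is missing tends to be easier to debug than an authentication error from the cloud ASR later on.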
Translating the text into an action is the job of the Natural Language Understanding (NLU) component. A great thing about Spokestack NLU models is that they run entirely on device. The NLU can be initialized like this:
from spokestack.nlu.tflite import TFLiteNLU

nlu = TFLiteNLU("path_to_tflite_model_dir")
Input to the NLU model is the ASR transcript. The transcript can be accessed as a property of SpeechContext. Below is a sample event handler for running inference on the speech transcript.
@pipeline.event
def on_recognize(context):
    results = nlu(context.transcript)
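Once you have a result, the usual next step is to map the recognized intent to an action. The sketch below shows one way to do that with a plain dictionary of handlers. It uses a stand-in `FakeResult` class so it runs without a model; the field names `intent`, `confidence`, and `slots` are assumptions about the result object, the slot values are simplified to plain values, and the intent names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Stand-in for an NLU result; the field names are an assumption."""
    intent: str
    confidence: float
    slots: dict = field(default_factory=dict)


def dispatch(result, handlers, threshold=0.5):
    """Call the handler registered for result.intent, if confident enough."""
    handler = handlers.get(result.intent)
    if handler is None or result.confidence < threshold:
        return "Sorry, I didn't catch that."
    return handler(result.slots)


# Hypothetical intent names, chosen only for illustration.
handlers = {
    "greet": lambda slots: "Hello!",
    "guess": lambda slots: f"You guessed {slots.get('number', 'nothing')}.",
}

reply = dispatch(FakeResult("guess", 0.9, {"number": 7}), handlers)
print(reply)  # You guessed 7.
```

Falling back to a generic reply for unknown or low-confidence intents keeps the app from acting on a misheard command; the 0.5 threshold is arbitrary and worth tuning for your model.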
If you want the full smart speaker experience, you will need to give your application a voice. This can be achieved with text-to-speech (TTS). For more information on TTS, see the TTS concept guide. TTS playback uses the PyAudioOutput class, which plays audio with the default speaker for the device. Like NLU, TTS can be used in an event handler. Take a look at the example below, which simply speaks what the ASR heard.
# `tts` is assumed to be an already-initialized TTS manager
@pipeline.event
def on_recognize(context):
    tts.synthesize(context.transcript)
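The handler above relies on a `tts` object that has to be created somewhere. The sketch below shows how that might look, together with a handler that skips empty transcripts so the app does not try to speak silence. The module paths and class names inside `build_tts` (`TextToSpeechManager`, `TextToSpeechClient`, and the location of `PyAudioOutput`) are assumptions that should be checked against your installed version, and `make_echo_handler` is a hypothetical helper of our own.

```python
def build_tts(key_id, key_secret):
    """Wire a TTS client to the default speaker (module paths assumed)."""
    # Lazy imports; the module paths and class names below are assumptions
    # to verify against your installed spokestack version.
    from spokestack.io.pyaudio import PyAudioOutput
    from spokestack.tts.clients.spokestack import TextToSpeechClient
    from spokestack.tts.manager import TextToSpeechManager

    client = TextToSpeechClient(key_id, key_secret)
    return TextToSpeechManager(client, PyAudioOutput())


def make_echo_handler(tts):
    """Return an on_recognize handler that speaks back non-empty transcripts."""
    def on_recognize(context):
        text = (context.transcript or "").strip()
        if not text:
            return False  # nothing recognized, so stay silent
        tts.synthesize(text)
        return True
    return on_recognize
```

The returned function is named `on_recognize`, so it could be registered with `pipeline.event(make_echo_handler(tts))`, assuming the decorator can also be called directly on a function. Passing `tts` in as an argument also makes the handler easy to exercise with a fake object in tests.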
That’s all there is to setting up an application with Spokestack. Your Python application can now accept and respond to voice commands.
Thank you for taking the time to read this!