Natural Language Understanding

Turn speech into software commands by classifying the intent and slot variables in what your users say.

What is Natural Language Understanding?

Natural language understanding, or NLU, uses cutting-edge machine learning techniques to classify speech as commands for your software. It works in concert with automatic speech recognition (ASR) to turn a transcript of what someone has said into actionable commands. Check out Spokestack's pre-built models to see some example use cases, import a model that you've configured in another system, or use our training data format to create your own.

How Does NLU Work?

NLU is a task within the broader field of natural language processing, or NLP, that focuses on processing an individual phrase or sentence to extract its intent and any slots containing information necessary to fulfill that intent. In other words, it fits natural language (sometimes referred to as unstructured text) into a structure that an application can act on.

In many systems, this task is performed after ASR as a separate step. Occasionally it's combined with ASR in a model that receives audio as input and outputs structured text or, in some cases, application code like an SQL query or API call. This combined task is typically called spoken language understanding, or SLU.

Example of NLU in Action

To illustrate the basics of NLU, let's look at an example utterance in which a user asks to book a flight departing at “8:00 AM on April 13”.

In this example, the user clearly intends to buy a plane ticket, so the intent could be named something like book_flight. The flight’s departure time is necessary for the booking, so the book_flight intent would have a slot named departure_time, in this case filled by “8:00 AM on April 13”. A good NLU might further parse that slot value into an ISO time string or similar formal representation.
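
As a rough illustration of that last step, here is a minimal Python sketch that turns the slot value into an ISO 8601 string. The to_iso helper is hypothetical and not part of Spokestack's API. Because the utterance doesn't mention a year, the sketch assumes the caller supplies one, and it assumes English month names.

```python
from datetime import datetime


def to_iso(slot_value: str, year: int) -> str:
    """Convert a value like '8:00 AM on April 13' to an ISO 8601 string.

    Assumption: the utterance carries no year, so the caller provides it.
    Assumption: month names are in English (the default C locale).
    """
    parsed = datetime.strptime(f"{slot_value} {year}", "%I:%M %p on %B %d %Y")
    return parsed.isoformat()


# The year 2024 is an arbitrary choice for illustration.
print(to_iso("8:00 AM on April 13", 2024))  # 2024-04-13T08:00:00
```

A real parser would also need to handle other phrasings ("8 in the morning", "April 13th"), which is part of what makes this job a good fit for a dedicated component.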

Note, however, that more information is necessary to book a flight, such as departure airport and arrival airport. The book_flight intent, then, would have unfilled slots for which the application would need to gather further information. An NLU component's job is to recognize the intent and as many related slot values as are present in the input text; getting the user to fill in information for missing slots is the job of a dialogue management component.

A convenient analogy for the software world is that an intent roughly equates to a function (or method, depending on your programming language of choice), and slots are the arguments to that function. One can easily imagine our travel application containing a function named book_flight with arguments named departureAirport, arrivalAirport, and departureTime.
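
The analogy can be sketched in a few lines of Python. Everything here is hypothetical: the HANDLERS registry, the dispatch helper, and the shape of the NLU result are assumptions made for illustration, and the argument names are written in snake_case to follow Python convention. The function simply reports which arguments are still missing, which is the information a dialogue manager would need in order to ask follow-up questions.

```python
from typing import Callable, Dict, List, Optional


def book_flight(departure_airport: Optional[str] = None,
                arrival_airport: Optional[str] = None,
                departure_time: Optional[str] = None) -> List[str]:
    """Return the names of the arguments (slots) that are still unfilled."""
    args = {
        "departure_airport": departure_airport,
        "arrival_airport": arrival_airport,
        "departure_time": departure_time,
    }
    return [name for name, value in args.items() if value is None]


# Hypothetical registry that maps intent names to handler functions.
HANDLERS: Dict[str, Callable[..., List[str]]] = {"book_flight": book_flight}


def dispatch(intent: str, slots: Dict[str, str]) -> List[str]:
    """Call the handler for `intent`, passing slot values as keyword arguments."""
    return HANDLERS[intent](**slots)


# Assumed NLU result for the example utterance: one slot filled, two missing.
missing = dispatch("book_flight", {"departure_time": "8:00 AM on April 13"})
print(missing)  # ['departure_airport', 'arrival_airport']
```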

Why Should I Use NLU?

Turn Speech Into Software Commands

Extract intent and variables from a sentence.

Don't Just Listen to Your Users

Respond the Same Way You Would to a Tap/Click

Integrate a voice interface into your software by responding to an NLU intent the same way you respond to a screen tap or mouse click.

Import Models from 3rd-Party Providers

Easily import Alexa, Dialogflow, or Jovo NLU models into your software on all Spokestack Open Source platforms.

Use Cases for NLU

Simple Commands

Complex Utterances

More Sophisticated

Move from using RegEx-based approaches to a more sophisticated, robust solution.

What Are NLU Techniques?

Spokestack's approach to NLU attempts to minimize the distance between slot value and function argument through the use of slot parsers, designed to deliver data from the NLU in the shape you'll actually need in your code. For example, the value of an integer slot will be a numeral instead of a string (100 instead of one hundred). Slot parsers are designed to be pluggable, so you can add your own as needed.
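
To make the idea of a pluggable slot parser concrete, here is a minimal Python sketch. It is not Spokestack's implementation: the PARSERS registry, register_parser, and parse_slot are hypothetical names, and the integer parser handles only digits and simple English number words.

```python
from typing import Any, Callable, Dict

_UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
_TENS = {w: (i + 2) * 10 for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split())}


def parse_integer(text: str) -> int:
    """Parse digits ('100') or simple number words ('one hundred') to an int."""
    cleaned = text.lower().replace("-", " ").replace(",", "").strip()
    if cleaned.isdigit():
        return int(cleaned)
    total, current = 0, 0
    for word in cleaned.split():
        if word == "and":
            continue
        if word in _UNITS:
            current += _UNITS[word]
        elif word in _TENS:
            current += _TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


# Hypothetical registry: slot type name -> parser function.
PARSERS: Dict[str, Callable[[str], Any]] = {"integer": parse_integer}


def register_parser(slot_type: str, parser: Callable[[str], Any]) -> None:
    """Plug in a custom parser for a slot type."""
    PARSERS[slot_type] = parser


def parse_slot(slot_type: str, raw: str) -> Any:
    """Parse a raw slot value, falling back to the unchanged string."""
    return PARSERS.get(slot_type, str)(raw)


print(parse_slot("integer", "one hundred"))  # 100
```

Adding support for a new slot type is then a single register_parser call, and your application code receives typed values rather than raw strings.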

The basic task of NLU can be accomplished with many techniques. These range from running regular expressions on incoming text to see if it contains any commands relevant to your application, to classical machine learning methods like classifiers driven by logistic regression or support-vector machines, to neural networks.
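
For a sense of what the simplest of these techniques looks like, here is a small Python sketch of a regular-expression approach. The patterns and the utterances are made up for illustration and are not taken from any Spokestack model. Note how a perfectly reasonable rephrasing goes unrecognized, which is the brittleness that learned models are meant to address.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical patterns: one per intent, plus one for a time slot.
INTENT_PATTERNS = {
    "book_flight": re.compile(r"\b(book|buy)\b.*\b(flight|ticket)\b", re.IGNORECASE),
}
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\s?(?:AM|PM)\b", re.IGNORECASE)


def classify(text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, slots) for the first matching pattern, else (None, {})."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            slots: Dict[str, str] = {}
            match = TIME_PATTERN.search(text)
            if match:
                slots["departure_time"] = match.group(0)
            return intent, slots
    return None, {}


print(classify("Book me a flight at 8:00 AM"))
# ('book_flight', {'departure_time': '8:00 AM'})

print(classify("I need to fly out tomorrow morning"))
# (None, {})
```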

Spokestack's NLU uses the third approach, starting with a large pre-trained language model and fine-tuning it with data relevant to your application's domain. The fine-tuned model is then optimized, resulting in a version that's small enough to run in under a second on a modern mobile device.

You may have noticed that NLU produces two types of output: intents and slots. The intent is a form of pragmatic distillation of the entire utterance and is produced by a portion of the model trained as a classifier. Slots, on the other hand, are decisions made about individual words (or tokens) within the utterance. These decisions are made by a tagger, a model similar to those used for part-of-speech tagging.
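
The following Python sketch shows how per-token tagging decisions can be combined into slot values. It assumes the tagger emits BIO-style tags (B- begins a slot, I- continues it, O is outside any slot), a common convention for sequence tagging. The text above doesn't say which tagging scheme Spokestack's models use, so treat the tags, tokens, and result shape here as hypothetical.

```python
from typing import Dict, List, Optional


def decode_slots(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Group tokens into slot values based on BIO tags."""
    slots: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new slot begins at this token.
            current = tag[2:]
            slots[current] = [token]
        elif tag.startswith("I-") and current == tag[2:]:
            # This token continues the slot that is currently open.
            slots[current].append(token)
        else:
            # "O" or an inconsistent tag closes any open slot.
            current = None
    return {name: " ".join(words) for name, words in slots.items()}


# Assumed outputs: one intent for the whole utterance, one tag per token.
tokens = ["leaving", "at", "8:00", "AM", "on", "April", "13"]
tags = ["O", "O", "B-departure_time", "I-departure_time",
        "I-departure_time", "I-departure_time", "I-departure_time"]

result = {"intent": "book_flight", "slots": decode_slots(tokens, tags)}
print(result)
# {'intent': 'book_flight', 'slots': {'departure_time': '8:00 AM on April 13'}}
```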

Training NLU Models

Spokestack makes it simple to train an NLU model for your application. All you'll need is a collection of intents and slots and a set of example utterances for each intent, and we'll train and package a model that you can download and include in your application.

If you've already created a smart speaker skill, you likely have this collection already. Spokestack can import an NLU model created for Alexa, Dialogflow, or Jovo directly, so there's no additional work required on your part.

If you're starting from scratch, we recommend Spokestack's NLU training data format. This will give you the maximum amount of flexibility, as our format supports several features you won't find elsewhere, like implicit slots and generators.

Once you've assembled your data, import it using the NLU tool in your Spokestack account, and we'll notify you when training is complete.
