In a conversation, we constantly try to interpret what our conversational partner is saying and decipher what meaning they want to convey. For social robots (or other conversational systems, such as voice assistants), we call this capability Natural Language Understanding (or NLU for short). Technically, it involves mapping the spoken words to the corresponding intent. The words “What is your name?” are typically understood as a request for the other person’s name, and we can assign the intent a convenient (computer-friendly) label, such as RequestName. But this intent could also be expressed with other words, such as “Could you tell me your name?” or “Excuse me, I didn’t catch your name”. Thus, the task of NLU is to map all the different ways of expressing a certain intent to one and the same label.
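To make the idea of “many phrasings, one label” concrete, here is a minimal sketch. It assumes a toy nearest-example classifier based on word overlap (Jaccard similarity); the EXAMPLES table, the Greeting examples, the classify function and its threshold are hypothetical illustrations, not the classifier that Furhat Software actually uses.

```python
from __future__ import annotations

import re

# Hypothetical training data: each intent label maps to example phrasings.
# The RequestName examples come from the text; Greeting is added for contrast.
EXAMPLES: dict[str, list[str]] = {
    "RequestName": [
        "What is your name?",
        "Could you tell me your name?",
        "Excuse me, I didn't catch your name",
    ],
    "Greeting": ["Hello", "Hi there", "Good morning"],
}


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard overlap between two token sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(utterance: str, threshold: float = 0.2) -> str | None:
    """Return the intent whose closest example best matches the utterance,
    or None if no example is similar enough."""
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in EXAMPLES.items():
        for example in examples:
            score = similarity(words, tokens(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


print(classify("Can you tell me your name?"))  # RequestName
print(classify("Hi there!"))                   # Greeting
```

A phrasing that never appears in the examples (“Can you tell me your name?”) still lands on RequestName because it shares most of its words with one of them. Real NLU components generalize far better than word overlap, but the input and output are the same: free-form words in, one intent label out.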

However, the meaning of what we say depends not only on the words we use, but also on the context in which we use them. The words “Nice to meet you” could be either a Greeting or a Goodbye, depending on whether they are uttered at the beginning or at the end of a conversation. Some utterances, such as “And you?”, are even unintelligible without knowing the preceding context. If you asked a person her name, and she replied “Emma. And you?”, the question “And you?” should be taken to mean “What is your name?”. In another context (I leave it up to you to create one), the same words could mean “How old are you?”.
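The following toy sketch shows how the same words can resolve to different intents once context is taken into account. It assumes a hypothetical Context record holding the conversation phase and the intent of the last question the user was asked; the rules are hard-coded for the two examples above and are not how Furhat Software represents context.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical conversation context."""
    phase: str = "opening"             # "opening" or "closing"
    last_question: str | None = None   # intent of the question the user was last asked


def interpret(utterance: str, ctx: Context) -> str | None:
    """Map an utterance to an intent label, using the context when the words alone are ambiguous."""
    text = utterance.strip().lower().rstrip("?!.")
    if text == "nice to meet you":
        # Same words, different intent depending on where in the conversation we are.
        return "Goodbye" if ctx.phase == "closing" else "Greeting"
    if text == "and you":
        # Meaningless on its own: it turns the question the user was just asked back to the asker.
        return ctx.last_question
    return None  # this toy only handles the two examples above


print(interpret("Nice to meet you!", Context(phase="opening")))     # Greeting
print(interpret("Nice to meet you.", Context(phase="closing")))     # Goodbye
print(interpret("And you?", Context(last_question="RequestName")))  # RequestName
```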

Furhat Software provides an integrated approach to understanding the meaning of words in context

When creating a new application (or what we call a skill) for Furhat, it is typically useful to write down a set of example interactions the way you imagine them. Then, you can try to identify what Intents the user’s utterances correspond to, and what States the dialog goes through. The dialog state is a way of representing context in Furhat Software. A collection of dialog states constitutes a Flow. Let’s say we want to build a robot that sells burgers. An example interaction could look like this:

[Figure: example interaction with the burger-selling robot, annotated with intents and dialog states]

As can be seen, many user utterances, such as “Medium” and “What do you have?”, cannot be fully understood on their own; their meaning depends on the context. In the dialog state RequestFlavor (turn 8), RequestOptions is a request for the available flavors. Had the intent RequestOptions been detected in turn 14, we should instead understand it as a request for the available sizes. Still, we have to assign some intent to these utterances, even if it is something vague, like Size.

The dialog state we are in determines how the intent is processed, and thereby what it ultimately means.

There are two important observations to make here. First, certain intents are relevant in most dialog states, such as RequestRepeat (“what did you say?”), whereas others are only relevant in very specific dialog states, such as Flavor (“strawberry”). Second, the meaning (i.e., the ultimate interpretation) of some intents, such as RequestOptions, depends on the current state, whereas others, such as RequestRepeat, have the same meaning in all states.
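Both observations can be captured, naively, in a flat lookup table keyed on the pair (dialog state, intent). In this sketch, the table contents, the RequestSize state name and the "*" wildcard for state-independent intents are all hypothetical; the point is only that the state, not the intent alone, selects the meaning.

```python
from __future__ import annotations

# Hypothetical flat lookup table: (dialog state, intent) -> meaning.
# "*" marks an intent that means the same thing in every state.
MEANINGS: dict[tuple[str, str], str] = {
    ("*", "RequestRepeat"): "repeat the last robot utterance",
    ("RequestFlavor", "RequestOptions"): "list the available flavors",
    ("RequestSize", "RequestOptions"): "list the available sizes",
    ("RequestFlavor", "Flavor"): "record the chosen flavor",
}


def meaning(state: str, intent: str) -> str | None:
    """State-specific entries win; otherwise fall back to the state-independent one.
    Returns None if the intent is not relevant in this state."""
    return MEANINGS.get((state, intent), MEANINGS.get(("*", intent)))


print(meaning("RequestFlavor", "RequestOptions"))  # list the available flavors
print(meaning("RequestSize", "RequestOptions"))    # list the available sizes
print(meaning("RequestSize", "RequestRepeat"))     # repeat the last robot utterance
print(meaning("RequestSize", "Flavor"))            # None
```

A flat table like this quickly becomes repetitive as the number of states grows, because every state-specific entry has to be listed separately. Organizing states hierarchically lets shared entries be written once and overridden only where needed.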

Furhat Software handles this by organizing the dialog states in a hierarchy. For the example above, the hierarchy could look like this:

[Figure: dialog state hierarchy for the burger example, with the intents valid in each state in italics]

At the top of this hierarchy, we find the abstract Dialog state, which is inherited by the Welcome and RequestOrder states, and so on. Intents that are valid for each state are written in italics. As you can see, the intent RequestRepeat is only defined in the Dialog state, but it is still available in all states (you can always ask the robot to repeat). The intent Drink is only available in RequestDrink and RequestFlavor. The intent RequestOptions is available in the RequestOrder state and all its descendants. However, what happens when the intent is triggered is overridden in each of these states, since the intent means something different depending on the specific state. If you are used to object-oriented programming, this might remind you of how classes inherit from each other and how methods are overridden.
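Taking the object-oriented analogy literally, here is a minimal Python sketch, assuming each dialog state is a class and each intent handler a method named on_&lt;intent&gt;. The state names follow the text, but the prompts, menu contents, the handle helper and the direct RequestOrder to RequestFlavor parent link are hypothetical simplifications; this illustrates the inheritance idea and does not reproduce the Furhat SDK's own API for states and handlers.

```python
from __future__ import annotations


class Dialog:
    """Abstract root state. Handlers defined here are inherited by every state."""

    prompt = ""

    def on_request_repeat(self) -> str:
        # Same meaning everywhere: repeat whatever this state asked.
        return "I said: " + self.prompt


class Welcome(Dialog):
    prompt = "Hi, welcome! What can I get you?"


class RequestOrder(Dialog):
    prompt = "What would you like to order?"

    def on_request_options(self) -> str:
        return "We have burgers, fries and milkshakes."


class RequestFlavor(RequestOrder):
    prompt = "What flavor would you like?"

    def on_request_options(self) -> str:
        # Overrides the parent: here "What do you have?" asks about flavors.
        return "We have chocolate and strawberry."


def handle(state: Dialog, intent: str) -> str | None:
    """Find the handler for an intent. Attribute lookup walks up the class
    hierarchy, so the most specific definition wins; None means the intent
    is not available in this state."""
    handler = getattr(state, "on_" + intent, None)
    return handler() if handler else None


print(handle(RequestFlavor(), "request_repeat"))   # I said: What flavor would you like?
print(handle(RequestOrder(), "request_options"))   # We have burgers, fries and milkshakes.
print(handle(RequestFlavor(), "request_options"))  # We have chocolate and strawberry.
print(handle(Welcome(), "request_options"))        # None
```

RequestRepeat is written once, in Dialog, yet works in every state, while RequestOptions is inherited by RequestFlavor and then overridden there, mirroring the two observations made earlier.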

Now, when we are in a specific state, such as RequestFlavor, Furhat Software collects the relevant intents from that state and all its ancestor states. Thus, even if the robot asked “What flavor would you like?”, the user is free to say “chocolate” (Flavor), “You know, forget it, I would just like a burger” (PlaceOrder), or “sorry?” (RequestRepeat). However, it would not be relevant to say “large” (Size) in this context. From the collected intents, Furhat Software compiles an intent classifier that maps the words in the user’s utterance to the correct intent. How the matched intent is then processed is up to the state that defined it; if several states along the path define the same intent, the most specific one takes precedence, as with RequestOptions above.
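The collect-then-classify step can be sketched as follows. This is a toy model under stated assumptions: the state tree, the example phrasings and the keyword-counting classifier are hypothetical stand-ins (the classifier Furhat Software actually compiles is not described here). Each collected intent remembers which state defined it, and walking upward from the current state lets the closest definition win.

```python
from __future__ import annotations

import re
from typing import Callable

# Hypothetical state tree: each state maps to its parent (None marks the root).
PARENT: dict[str, str | None] = {
    "Dialog": None,
    "RequestOrder": "Dialog",
    "RequestFlavor": "RequestOrder",
    "RequestSize": "RequestOrder",
}

# Intents defined directly in each state, with made-up example phrasings.
INTENTS: dict[str, dict[str, list[str]]] = {
    "Dialog": {"RequestRepeat": ["sorry", "what did you say"]},
    "RequestOrder": {"PlaceOrder": ["i would like a burger", "a cheeseburger please"]},
    "RequestFlavor": {"Flavor": ["chocolate", "strawberry"]},
    "RequestSize": {"Size": ["small", "medium", "large"]},
}


def collect_intents(state: str) -> dict[str, tuple[str, list[str]]]:
    """Walk from the state up to the root, recording which state defines each
    intent. setdefault keeps the first (closest) definition, so overrides win."""
    collected: dict[str, tuple[str, list[str]]] = {}
    current: str | None = state
    while current is not None:
        for intent, examples in INTENTS.get(current, {}).items():
            collected.setdefault(intent, (current, examples))
        current = PARENT[current]
    return collected


def compile_classifier(
    intents: dict[str, tuple[str, list[str]]],
) -> Callable[[str], tuple[str, str] | None]:
    """Build a toy keyword classifier that only knows the collected intents.
    It returns (intent, owning state), or None if nothing matches."""
    vocab = {intent: {w for ex in exs for w in ex.split()}
             for intent, (_, exs) in intents.items()}

    def classify(utterance: str) -> tuple[str, str] | None:
        words = set(re.findall(r"[a-z']+", utterance.lower()))
        best, best_hits = None, 0
        for intent, keywords in vocab.items():
            hits = len(words & keywords)
            if hits > best_hits:
                best, best_hits = intent, hits
        return (best, intents[best][0]) if best else None

    return classify


classify = compile_classifier(collect_intents("RequestFlavor"))
print(classify("chocolate"))  # ('Flavor', 'RequestFlavor')
print(classify("sorry?"))     # ('RequestRepeat', 'Dialog')
print(classify("large"))      # None: Size is not collected in RequestFlavor
```

Restricting the classifier to the collected intents is what makes “large” unrecognizable in RequestFlavor while remaining perfectly valid in the hypothetical RequestSize state.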

I hope you found this introduction to NLU and Dialog in Furhat Software interesting. It only scratches the surface; there are many other aspects that we haven’t discussed here. One important aspect is that intents can also contain entities, which work as parameters for the intent. For example, PlaceOrder (“I would like a large cheeseburger menu with fries”) can be further specified with entities such as MainOrder (“cheeseburger”), SideOrder (“fries”) and Size (“large”). Furhat Software also supports multiple intents in the same utterance. For example, if the user says “Hi there, I would like a large veggie burger menu”, the utterance contains both a Greeting and a PlaceOrder intent. You can read more about these and other interesting features in our SDK documentation.
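Entities and multiple intents can be illustrated with a simple keyword spotter. In this sketch, the entity names (Size, MainOrder, SideOrder) and intent names follow the text, but the vocabularies, cue phrases and the parse and extract_entities functions are hypothetical; this is not how the Furhat SDK declares or extracts entities.

```python
from __future__ import annotations

import re

# Hypothetical entity vocabularies; more specific values are listed first.
ENTITIES: dict[str, list[str]] = {
    "Size": ["small", "medium", "large"],
    "MainOrder": ["cheeseburger", "veggie burger", "burger"],
    "SideOrder": ["fries", "onion rings"],
}

# Hypothetical cue phrases used to spot each intent anywhere in the utterance.
INTENT_CUES: dict[str, list[str]] = {
    "Greeting": ["hi", "hello", "good morning"],
    "PlaceOrder": ["i would like", "can i have"],
}


def find_phrase(text: str, phrase: str) -> bool:
    """True if the phrase occurs in the text as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def extract_entities(text: str) -> dict[str, str]:
    """Fill each entity slot with the first listed value found in the text."""
    found: dict[str, str] = {}
    for entity, values in ENTITIES.items():
        for value in values:
            if find_phrase(text, value):
                found[entity] = value
                break
    return found


def parse(utterance: str) -> list[tuple[str, dict[str, str]]]:
    """Return every intent cued in the utterance; entities attach to PlaceOrder."""
    text = utterance.lower()
    results = []
    for intent, cues in INTENT_CUES.items():
        if any(find_phrase(text, cue) for cue in cues):
            slots = extract_entities(text) if intent == "PlaceOrder" else {}
            results.append((intent, slots))
    return results


print(parse("Hi there, I would like a large veggie burger menu"))
# [('Greeting', {}), ('PlaceOrder', {'Size': 'large', 'MainOrder': 'veggie burger'})]
```

Listing “veggie burger” before “burger” is a crude way of preferring the more specific match; a real NLU component would resolve such overlaps more robustly.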


Gabriel Skantze, Co-founder & Chief Scientist

Gabriel Skantze is Chief Scientist and co-founder of Furhat Robotics. Gabriel is also a Professor in Speech Technology with a specialization in Conversational Systems at KTH. He leads several research projects and has published more than 100 papers on conversational systems and human-robot interaction.