AI is growing at an impressive rate, but this growth is not entirely unexpected: ever since the invention of the calculator, we have been building progressively more intelligent machines. Recent AI advancements may feel sudden, but we have been slowly and painstakingly building up to this point.
Large Language Models (LLMs), which power tools such as OpenAI’s ChatGPT (perhaps the most pivotal example), are simply the latest step on AI’s evolutionary journey. Much like the internet when it first became accessible to the general public in 1993, they have reached the point where they can transform every aspect of human society, from how we work to how we interact with each other and with machines.
To fully comprehend the implications of LLMs like the one behind ChatGPT, it is worth exploring how they work. LLMs come out of Natural Language Processing (NLP), the field of AI concerned with understanding and generating human language, and it is this that gives them the human-like responses they have become so famous for. Essentially, an LLM is a very large neural network, a type of machine learning model loosely inspired by the structure of the human brain, trained on an immense pool of text data (e.g. text collected from the internet). During training, the model learns statistical patterns in how words and sentences fit together, and these learned patterns are what allow it to interpret and produce language in a way that closely resembles how a human would.
There are many other layers and details needed to fully understand the inner workings of LLMs, but the ability to communicate like a human is particularly important when it comes to making the analytical and generative powers of AI accessible to the general public. This ability is a game-changer because it allows people and machines to communicate effectively with each other using plain language.
It’s no surprise that generative AI tools built on top of LLMs are currently popping up all over the place. LLMs have made it possible for humans to communicate with machines with the same ease with which we communicate with each other, meaning AI now has something it didn’t have before: accessibility. We can all finally speak the same language.
This brings us back to social robotics. Does the improved accessibility of AI and the advancements in LLMs make social robots themselves more accessible to people outside of research and computer science? Could this be the beginning of the widespread adoption of robots in society?
Connecting an LLM-based chatbot such as ChatGPT does make the Furhat robot a lot more autonomous. Developers no longer need to carefully script and design every interaction for the robot to hold a flowing conversation with someone. With ChatGPT, Bard, or any other AI chatbot, the Furhat robot can be placed anywhere and is ready to have a very human conversation about anything with anyone. Even people with zero programming experience can have a functional social robot that holds conversations. This makes social robots attractive to a much broader group of people, and it is this expanded usability that has the potential to accelerate the widespread adoption of robots in society.
However, it’s important to remember that a robot is very different from a chatbot or a virtual avatar. As an embodied agent, a robot, especially a social robot, needs to be able to do and understand a lot more than what AI chatbots are currently capable of.
Aside from being able to generate answers to any question or prompt, a social robot needs to be able to perform gestures and understand the gestures of others. It needs to be able to express emotion, and it has to sound human not just in its voice but also in how it speaks: where it injects silences or utterances like "aha", "hmm" or "interesting", when it nods, how it looks at you while you speak, how much it blinks, and so on. These behaviors, including backchanneling (the nods and short utterances a listener uses to signal attention), are extremely important for an interaction with a social robot to feel natural and engaging.
It doesn’t matter how good a chatbot’s response is if the robot delivering it is looking at the ground, blinking uncontrollably or nodding non-stop. Integrating social robots with AI chatbots does give them a lot more conversational autonomy, but it has to be combined with very good backchanneling for the interaction to feel human. Until we can train LLMs on data from spoken interactions so that backchanneling behaviors can be generated automatically, LLMs will remain limited in what they can contribute to social robot interactions.
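Because backchanneling is not yet generated by the LLM itself, its timing has to be designed separately. The following is a minimal, hypothetical rule-based sketch of that kind of hand-tuned behavior. It is not Furhat’s SDK or our production logic. It assumes a speech detector already reports each pause in the user’s speech as a (start time, duration) pair, and the function name `plan_backchannels`, the action labels and the thresholds (0.4 s, 1.2 s, 3 s) are illustrative assumptions only.

```python
from typing import List, Tuple

# Hypothetical listener responses, cycled so the robot does not repeat itself.
BACKCHANNELS = ["nod", "mm-hmm", "nod", "aha"]


def plan_backchannels(
    pauses: List[Tuple[float, float]],
    min_pause: float = 0.4,  # assumed: shorter silences are breaths or hesitations
    max_pause: float = 1.2,  # assumed: longer silences likely end the user's turn
    min_gap: float = 3.0,    # assumed: minimum seconds between backchannels
) -> List[Tuple[float, str]]:
    """Choose when to nod or say "mm-hmm" during a single user turn.

    pauses: (start_time_s, duration_s) for each silence in the user's speech.
    Returns (time_s, action) pairs for the robot to perform.
    """
    plan: List[Tuple[float, str]] = []
    last = float("-inf")
    for start, duration in sorted(pauses):
        if duration >= max_pause:
            # Probably the end of the user's turn: the robot should reply
            # rather than keep backchanneling, so stop planning here.
            break
        if duration < min_pause:
            continue  # too short to react to
        if start - last < min_gap:
            continue  # avoid nodding non-stop
        plan.append((start, BACKCHANNELS[len(plan) % len(BACKCHANNELS)]))
        last = start
    return plan


if __name__ == "__main__":
    pauses = [(1.0, 0.2), (2.5, 0.5), (4.0, 0.6), (6.0, 0.5), (9.5, 1.5)]
    print(plan_backchannels(pauses))  # [(2.5, 'nod'), (6.0, 'mm-hmm')]
```

Every rule here is a guess a designer would have to tune by hand, and it ignores gaze, blinking and prosody entirely. That manual effort is exactly what training on spoken-interaction data could eventually replace.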
Limitations aside, the current versions of AI chatbots make robot skill development faster and more efficient than it has ever been. There remains a lot to be done, but there is no doubt that it is a very exciting time to be a social robotics company like us. We are entering a time in which the many new AI companies will be looking for new ways for AI to interact with people and the world. Robots are the obvious choice, as they can give AI the more embodied and direct experience of the world that it needs to reach new levels of intelligence and understanding.
The Furhat robot now has the potential to become the human face of AI that can take LLMs and NLP to a whole new level that includes an understanding of body language, facial expressions and social intricacies. Being able to give AI such a multi-dimensional experience is an amazing prospect and a goal that we are ready to take on.