
Teaching A New Dog Old Tricks


Engadget recently published a very interesting article that discusses how Alexa is taking a step forward in its ability to actually do what a user is asking it to do.

 

The most interesting part of the article is the first sentence:

 

Amazon is training its voice assistant to be able to tell which skill will most suit your needs if you have no idea which one to summon.

 

This and many other articles have described this as Alexa learning a “new trick.” In reality it’s an old trick that this new dog is struggling to learn: namely, the “trick” of simply understanding what someone is asking it to do and doing it.

 

This is an example of an assistant taking steps toward doing what many people have always imagined assistants already did.  The truth is that this one sentence sums up one of the huge challenges facing developers of Voice Assistants and Intelligent Personal Assistants.

 

The sad fact is that, although Alexa is a great system, she really isn’t that intelligent.  The sadder fact is that Intelligent Assistants in general aren’t that intelligent either: they don’t really understand much about what the user is requesting.

 

IPAs are conceptually simple.  They listen to an input, extract the intent and the objects of the intent, and then hand off to one of many “action execution modules,” called skills, that perform the requested function.
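To make that concrete, here is a minimal sketch of that pipeline in Python. The intent patterns, slot names, and skills below are all hypothetical, and a real IPA uses a trained NLU model rather than regular expressions; the point is only the shape of the listen, extract, dispatch flow.

```python
import re

# Hypothetical intent patterns. A real IPA uses a trained NLU model,
# not regexes; this only illustrates the shape of the pipeline.
INTENT_PATTERNS = [
    ("PlayMusicIntent", re.compile(r"play (?P<song>.+)")),
    ("GetWeatherIntent", re.compile(r"what's the weather in (?P<city>.+)")),
]

# Hypothetical skill registry: intent -> "action execution module".
SKILLS = {
    "PlayMusicIntent": lambda slots: f"Playing {slots['song']}...",
    "GetWeatherIntent": lambda slots: f"Weather for {slots['city']}...",
}

def handle(utterance: str) -> str:
    """Extract the intent and its slots, then dispatch to a skill."""
    text = utterance.lower()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.match(text)
        if match:
            return SKILLS[intent](match.groupdict())
    return "Sorry, I don't know that one."  # the all-too-common fallback

print(handle("Play No Tears Left to Cry"))  # dispatches to PlayMusicIntent
print(handle("Put something relaxing on"))  # falls through: no intent matched
```

Note how brittle the second case is: the user's request is perfectly clear to a human, but nothing in the pattern table matches it.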

 

That sounds simple, but it’s actually extremely hard, mainly because really understanding free-flowing natural language is still over the horizon.  Real people use language that is vague, conversational, and heavily context-dependent. As a result, one of the hardest problems in the world of IPA development is determining the intent.

 

The next hard problem is deciding which skill to call to perform that intent. Alexa currently has more than 40,000 skills, many of which are not directly created or controlled by Amazon, and many of which have vague, overlapping capabilities.
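A toy example shows why that choice is hard. The skill names and keyword sets below are invented, but they mimic the real situation: naive matching leaves several plausible candidates and no principled way to pick one.

```python
# Invented catalog entries: three third-party skills with vague,
# overlapping coverage of the same kind of request.
SKILL_KEYWORDS = {
    "Daily Ambient Sounds":  {"sleep", "sounds", "rain", "relax"},
    "Rain Sounds for Sleep": {"rain", "sleep", "sounds"},
    "Sleep Buddy":           {"sleep", "bedtime", "relax"},
}

def candidate_skills(utterance):
    """Naive keyword overlap: any skill sharing a word is a candidate."""
    words = set(utterance.lower().split())
    scores = [(name, len(words & kws)) for name, kws in SKILL_KEYWORDS.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: -s[1])

# Two skills tie for first place: which one did the user actually want?
print(candidate_skills("help me sleep with rain sounds"))
# [('Daily Ambient Sounds', 3), ('Rain Sounds for Sleep', 3), ('Sleep Buddy', 1)]
```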

 

The historical resolution of this problem has been to tacitly acknowledge that the new dog can’t actually learn old, basic human tricks and to pass the problem off to the user. Alexa (and other IPAs) can’t understand human language, so she turns the situation around and insists that humans learn hers.

 

In the early days of Alexa, the system was essentially a voice interface that activated apps.  To make that work, the user needed to know the name of the app to be kicked off, in much the same sense that a user who wants to hear No Tears Left to Cry needs to provide the name of the song.

 

Alexa advanced from kicking off specific apps to kicking off more general skills, but note that the first line of the Engadget article presents Alexa not as an assistant who understands people and performs tasks, but rather as a mechanism for the user to “summon” one of 40,000 mini assistants that each perform a specific task.  The user still needs to know the name of the skill, or its associated keywords, to get a reasonable result, and it becomes harder and harder for the user to hold up their end as the number of skills explodes.

 

Amazon’s new feature, CanFulfillIntentRequest, will make real progress by going out to those 40,000 skills in an attempt to find one that can fulfill the request without the user having to be aware of the skill’s existence, capabilities, or name.
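Under this interface, Alexa probes candidate skills to ask whether they could handle a name-free request, and each skill reports back for the intent and for each slot. Below is a sketch of a skill-side handler; the response field names follow the Alexa Skills Kit documentation for CanFulfillIntentRequest, but the intent, slots, and supported values are invented for illustration.

```python
SUPPORTED_SOUNDS = {"rain", "ocean", "white noise"}  # invented capability list

def handle_can_fulfill(request: dict) -> dict:
    """Answer Alexa's 'could you handle this?' probe for a name-free request."""
    intent = request["request"]["intent"]
    slot_reports, all_ok = {}, True
    for name, slot in intent.get("slots", {}).items():
        ok = slot.get("value", "").lower() in SUPPORTED_SOUNDS
        all_ok = all_ok and ok
        slot_reports[name] = {
            "canUnderstand": "YES" if ok else "NO",
            "canFulfill": "YES" if ok else "NO",
        }
    return {
        "version": "1.0",
        "response": {
            "canFulfillIntent": {
                # "MAYBE" is also allowed when the skill is unsure.
                "canFulfill": "YES" if all_ok else "NO",
                "slots": slot_reports,
            }
        },
    }
```

The burden of matching request to skill thus shifts from the user's memory of skill names to a negotiation between Alexa and the skills themselves.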

 

That’s a good thing, but what we really need is for all the new dogs to learn the old tricks we expect them to know.  They need to understand real conversation.  They need to operate in context.  They need to deal with input that is vague, inaccurate, and changing.

 

That’s all a lot easier said than done.  It will be exciting to see where IPAs go in the future as various projects begin to work on these fundamental problems.
