Vito Ostuni - The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has grown explosively in the last few years, and music consumption is among the most popular use cases. Music personalization and recommendation play a major role at Pandora in providing a delightful daily listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings interesting new challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios, defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot-filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges in evaluating voice-powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred from implicit actions such as play time, query reformulations, or other session-level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single-item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge, and many exciting opportunities for breakthroughs remain. Many new aspects of recommendation system interfaces must be addressed to bring a delightful and effortless experience to voice users. We will share a few open challenges to solve in the future.
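As a rough illustration of inferring relevance labels from play time and query reformulations, here is a minimal sketch. The `PlayEvent` record, the thresholds, and the string-similarity test for spotting a reformulation are all assumptions made for this example; the talk does not specify how labels are actually derived.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class PlayEvent:
    """Hypothetical log record for one voice play request."""
    query: str            # ASR transcript of the request
    item_id: str          # item that was played in response
    play_seconds: float   # how long the user listened
    track_seconds: float  # full length of the item


def infer_relevance(session: List[PlayEvent],
                    min_fraction: float = 0.5,
                    reformulation_ratio: float = 0.6) -> List[int]:
    """Return one 0/1 label per event in a time-ordered session.

    An event is labeled 1 only if the user listened to at least
    `min_fraction` of the item AND the next request in the session
    does not look like a rewording of the same query.
    Both thresholds are illustrative assumptions.
    """
    labels = []
    for i, event in enumerate(session):
        fraction = (event.play_seconds / event.track_seconds
                    if event.track_seconds > 0 else 0.0)
        nxt = session[i + 1] if i + 1 < len(session) else None
        reformulated = (
            nxt is not None
            and nxt.query != event.query
            and SequenceMatcher(None, event.query, nxt.query).ratio()
            >= reformulation_ratio
        )
        labels.append(1 if fraction >= min_fraction and not reformulated else 0)
    return labels


session = [
    PlayEvent("play john mouse", "item_a", 5.0, 200.0),
    PlayEvent("play john maus", "item_b", 180.0, 200.0),
]
print(infer_relevance(session))
```

In this toy session the first request is abandoned after a few seconds and reworded, so it is treated as a miss, while the second is played almost to the end.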
9. 9
Known-Item Queries
● Clear user intent
    ○ “Play Drake”
● Challenges
    ○ ASR errors
    ○ Partial/Imperfect user queries
    ○ Ambiguities
“Play In My Feelings”
“Play the song that goes I got horses on my back”
“Play latest Ed Sheeran single”
18. 18
UX Considerations
● Design UX to elicit user feedback
● Collect feedback for low confidence results
[Diagram: user says “Play John Maus” → ASR → “play john mouse” → Voice Search → confirm(“john maus”) → system asks “Do you want me to play your John Maus radio?” → user says “Yes” → ASR → “yes”]
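The confirmation flow on this slide could be sketched as follows. This is only an illustration: the tiny `CATALOG`, the `respond` function, the two thresholds, and the use of `difflib.SequenceMatcher` as a similarity score are assumptions for the example, not the search or confidence model used at Pandora.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical catalog of entity names known to the voice search backend.
CATALOG = ["john maus", "john mayer", "drake"]


def best_match(text: str, catalog: List[str]) -> Tuple[float, str]:
    """Return (similarity, name) for the catalog entry closest to `text`."""
    scored = [(SequenceMatcher(None, text, name).ratio(), name)
              for name in catalog]
    return max(scored)


def respond(asr_text: str,
            catalog: List[str] = CATALOG,
            play_threshold: float = 0.95,
            confirm_threshold: float = 0.6) -> Tuple[str, Optional[str]]:
    """Decide whether to play directly, ask for confirmation, or give up.

    Thresholds are illustrative assumptions: high-confidence matches are
    played, mid-confidence matches trigger a confirmation prompt (which
    also yields explicit feedback), and anything else falls back.
    """
    query = asr_text.lower().strip()
    if query.startswith("play "):
        query = query[len("play "):].strip()
    score, name = best_match(query, catalog)
    if score >= play_threshold:
        return ("play", name)
    if score >= confirm_threshold:
        return ("confirm", name)
    return ("fallback", None)


print(respond("play john mouse"))  # misrecognized name, mid confidence
print(respond("play drake"))       # exact match, high confidence
```

A "Yes" to the confirmation prompt can then be logged as a positive label for the pair (“john mouse”, “john maus”), which is the kind of feedback the slide suggests collecting for low-confidence results.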
19. 19
Thematic Queries
● Semi-ambiguous intent
    ○ “Play me road trip music”
Possible interpretations: “relaxing drive” playlist, “road trip” movie soundtrack
“Play some music to party”
“I am looking for something happy”
“I want some music to help me sleep”
24. 24
Natural Language Understanding
[Diagram: Voice2text → NLU (Intent detection, Slot filling)]
Query: “play some romantic dinner music”
Intent: play music
Slots: character: romantic; context: dinner
25. 25
NLU Architecture
[Diagram: Input query → Word features → Bidirectional LSTM → CRF / Classifier]
Input query: play classics by aretha franklin
Word features: word embeddings, CNN char features
Output tags: O B-obj O B-art I-art
[Hakkani-Tur, INTERSPEECH 2016]
[Reimers & Gurevych, EMNLP 2017]
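To make the tag sequence on this slide concrete, the sketch below decodes BIO tags into slot values for the example query. It covers only the post-processing step after a tagger has produced its output; the function name `decode_bio` is hypothetical, and the tagger itself (BiLSTM plus CRF) is not reproduced here.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (slot_label, slot_text) pairs.

    "B-x" starts a new slot of type x, "I-x" continues the current slot
    if it has the same type, and "O" (or a mismatched "I-") closes any
    open slot and is itself ignored.
    """
    slots: List[Tuple[str, str]] = []
    label, words = None, []

    def flush() -> None:
        # Emit the slot collected so far, if any.
        if label is not None:
            slots.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            label, words = tag[2:], [token]
        elif tag.startswith("I-") and label == tag[2:]:
            words.append(token)
        else:
            flush()
            label, words = None, []
    flush()
    return slots


tokens = "play classics by aretha franklin".split()
tags = "O B-obj O B-art I-art".split()
print(decode_bio(tokens, tags))
# [('obj', 'classics'), ('art', 'aretha franklin')]
```

Here `obj` and `art` are the slot labels shown on the slide; reading them as "object" and "artist" is an interpretation, not something the slide states.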
26. 26
Broad Queries
● Open-ended intent
    ○ “Play something awesome”
● Pure recommendation problem
    ○ No query
“Play something new”
“Play music I like”
“Play music”
28. 28
Other Considerations
● Different evaluation tasks
    ○ Entity search
    ○ Thematic recommendations
    ○ General recommendations
● Implicit feedback
    ○ Play time (no click data)
    ○ Query retrials and session analysis
● Selection bias
    ○ Only one recommended item
    ○ Exploration policies
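One common way to counter selection bias when only a single item is played per request is an epsilon-greedy exploration policy that logs the probability of each choice. The sketch below is a generic illustration of that idea; the slide does not say which exploration policy is used, and `choose_item`, the scores, and `epsilon` are assumptions for the example.

```python
import random
from typing import Dict, Optional, Tuple


def choose_item(scores: Dict[str, float],
                epsilon: float = 0.1,
                rng: Optional[random.Random] = None) -> Tuple[str, float]:
    """Pick one item to play and return (item, propensity).

    With probability `epsilon` an item is drawn uniformly at random
    (exploration); otherwise the top-scored item is played. The returned
    propensity is the total probability of the chosen item under this
    policy, which can be logged for later off-policy evaluation.
    """
    rng = rng or random.Random()
    items = list(scores)
    best = max(items, key=scores.get)
    if rng.random() < epsilon:
        chosen = rng.choice(items)
    else:
        chosen = best
    propensity = epsilon / len(items)
    if chosen == best:
        propensity += 1.0 - epsilon
    return chosen, propensity


# Hypothetical candidate stations with model scores.
candidates = {"station_a": 0.9, "station_b": 0.6, "station_c": 0.2}
item, propensity = choose_item(candidates, epsilon=0.1, rng=random.Random(0))
print(item, propensity)
```

Logging the propensity alongside the implicit feedback makes it possible to reweight observations later, so that items the model rarely plays are not judged only on the few times they were shown.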