Vision mobile beyond_siri



Published in: Technology, Business
  1. 1. 12 Beyond Siri: the next frontier in User Interfaces © VisionMobile 2012. Some rights reserved. 1
  2. About VisionMobile

     VisionMobile is a leading market analysis and strategy firm, for all things connected. We offer competitive analysis, market due diligence, industry maps, executive training and strategy, on topics ranging from the industry’s hottest trends to under-the-radar market sectors. Our mantra: distilling market noise into market sense.

     VisionMobile Ltd., 90 Long Acre, Covent Garden, London WC2E 9RZ, +44 845 003 8742. Follow us: @visionmobile

     Contents
     1. Virtual assistants: four generations in 20 years
     2. The evolving VA technology landscape
     3. The VA Competitive landscape
     4. VA business models: Revenue share rather than paid app downloads
     5. Leaders and challengers in the VA value chain
     6. Beyond Siri: What’s in store in the VA market

     Behind this report
     Lead researcher: Marlène Sellebråten
     Project lead: Michael Vakulenko
     Marketing lead: Matos Kapetanakis
     Editorial: Andreas Constantinou

     About i-Free
     i-Free Innovations is specialised in development, testing and implementation of venture projects, advanced technological designs and innovative products. It has a unique team of experts and IT-specialists, and great experience in implementation of innovative projects in high-tech industry. For more information see :

     Companies interviewed
     Artificial Solutions AB; AT&T Labs, Inc.; Dexetra Software Solutions Private Limited; i-Free Innovations; Nuance Communications, Inc.; Speaktoit LLC; SRI International; xbrainsoft

     License
     Licensed under Creative Commons Attribution 3.0 license. Any reuse or remixing of the work should be attributed to the VisionMobile “Beyond Siri: The next frontier in User Interfaces” report. Copyright © VisionMobile 2012

     Also by VisionMobile
     Mobile Industry Atlas | 5th Edition. The complete map of the mobile industry landscape, mapping 1,700+ companies across 90+ market sectors. Available in wallchart and online version.

     Disclaimer
     VisionMobile believes the statements contained in this publication to be based upon information that we consider reliable, but we do not represent that it is accurate or complete, and it should not be relied upon as such. Opinions expressed are current opinions as of the date appearing on this publication only, and the information, including the opinions contained herein, are subject to change without notice. Use of this publication by any third party for whatever purpose should not and does not absolve such third party from using due diligence in verifying the publication’s contents. VisionMobile disclaims all implied warranties, including, without limitation, warranties of merchantability or fitness for a particular purpose. VisionMobile, its affiliates and representatives shall have no liability for any direct, incidental, special, or consequential damages or lost profits, if any, suffered by any third party as a result of decisions made, or not made, or actions taken, or not taken, based on this publication.

     v.1.01
  3. Key Messages

     Helped by Apple’s successful launch of its Siri technology in 2011, voice-activated mobile virtual assistants (VAs) -- applications enabling users to complete tasks like search, dialling or texting via voice commands -- have crossed the chasm into mass-market deployments. Apple’s product triggered a wave of both imitation and innovation in the last year, including dozens of smartphone applications. The most-downloaded examples on Android and iOS today include Vlingo Virtual Assistant, Iris, Voice Actions, Skyvi, Everfriends and Dragon Go. In this report, we profile four applications besides Siri: Dragon Go by SR specialist Nuance Communications, Everfriends by visualisation-driven i-Free Innovations, iris by AI startup Dexetra and Speak4it by AT&T Labs.

     A shift from commands to dialogue. Following advances in Artificial Intelligence (AI) -- in particular Natural Language Processing (NLP), user profiling and search -- VA technology is moving from understanding language to anticipating user intent. As such, the focus for virtual assistant apps is shifting from today’s command-and-control (“I ask, you answer”) towards continual dialogues of recommendations and user actions. Established vendors such as SRI International, Apple, Google and Nuance, as well as challengers like Artificial Solutions, Dexetra and i-Free Innovations, are all working on this shift from commands to dialogue. SRI International is to showcase a back-and-forth dialogue technology by fall 2012.

     VAs are disrupting search. Delivering answers rather than search results is a core value proposition of virtual assistants. For traditional search engines, this translates into decreasing page hits and consequently into a fall in search advertising revenue. Google has seen declining search traffic from iPhones following the launch of Siri, according to our sources. We expect Google to launch a free Siri alternative across multiple smartphone platforms, hardwired to Google’s search results and advertising revenue streams.

     Virtual assistants as a control point for user targeting. As a point of convergence for user profiling data, virtual assistants establish a new control point. By amassing deep knowledge of user search terms, they can become pivotal to third parties wanting to target users by interest.

     Business models changing to service distribution deals. Today’s VA business models focus on user acquisition, and apps are therefore distributed primarily as free downloads. The top 43 VA apps generated under two million dollars (USD), despite having produced over 133 million cumulative downloads. Over 94 percent of downloads were on Android, while nearly 86 percent of paid download revenue was on iOS. Moving forward, we see revenue coming in from search and advertising and, increasingly, from third-party deals and avatar customisation, rather than from paid downloads.

     Virtual assistants becoming a competitive differentiation for handset makers. Integrating VAs into the core user interface rather than as just-another-app gives OEMs better control over user experience and service discovery. Apple got a head start by integrating Siri into the iOS 5 user interface, with other handset manufacturers following closely. Samsung’s latest smartphone also has an integrated voice UI, Samsung Voice. And Nokia is readying a Siri-like UI for late 2012, according to our sources. This new UI will take advantage of Nokia’s Navteq capabilities.

     Voice UIs a primary access point to connected screens. Voice-activated user interfaces are set to become a key component of multimodal UIs that support touch, gesture, or text input. More importantly, voice UIs can become a universal, cross-screen and screen-agnostic UI, starting with tablets, TVs and desktop computers. Besides Apple and Samsung, Nuance is well-positioned for speech recognition deployments on multiple screens.

     Telcos get involved. NTT DoCoMo, which pioneered the VA concept with iConcier in 1998, and AT&T, with a number of VA app launches in the past year, are the leading telcos in the VA space. We expect to see more tier-1 telcos working on the deployment of VAs based on the Rich Communication Suite (RCS) standard, hitting the market in 2014. Besides running a VA as a service discovery gateway, optimising network access for VAs may provide operators with additional service differentiation.
  4. VA personalities to remain in the cloud, benefiting Google and Amazon. Virtual assistant personalities will move from devices to the cloud due to the immense amounts of data that next-generation VAs need to process. With personalities stored in the cloud, virtual assistants will become readily and seamlessly available not only on smartphones, but also on TVs, in cars, and in smart homes. Established cloud storage and processing companies like Google and Amazon stand to benefit the most.

     Google hits it big on free speech recognition API. Google’s free speech recognition API -- and thereby the Android platform -- is today the platform of choice for a majority of VA apps. More speech recognition vendors are expected to head towards free API use, as VAs using licensed speech recognition (SR) engines look into moving to free alternatives.

     Patent wars can spill into the VA market. SRI International patents extensively, Nuance holds 2,000 speech recognition patents, and AT&T holds 600 patents in the AI space. As virtual assistants become a competitive asset for handset makers, we can expect patent wars to spill over from mobile handsets into the VA domain as well.

     Strong B2B vendor Nuance rising in the consumer VA market. Nuance’s speech recognition technology is used by Apple and Google and powers a large number of VA apps available for direct consumer download -- including two of the VA apps most often downloaded by consumers. Nuance’s direct-to-consumer apps will help its technology improve, but also put the company in competition with its own B2B customers.

     New opportunities for targeted marketing. Context-based user profiling opens new opportunities for contextual marketing and advertising, by letting brands push more user-relevant messages, offers and recommendations. Mobile advertising is today the fastest growing segment within digital advertising, and mobile users’ interest in the format is proven to increase when ad relevance increases.
  5. 5. 12 Beyond Siri: the next frontier in User Interfaces CHAPTER ONE Virtual assistants: four generations in 20 years A virtual assistant is a context-aware conversational application and interface for the delegation of tasks such as search, dialling or texting using natural language. Large companies have in the past decade deployed web-based VAs to complement traditional customer service agents. The launch of Siri by Apple in 2011 helped virtual assistants cross the chasm into a mass-market technology. But the journey for VAs began long before Siri; VAs have evolved through three generations in the last 20 years, and are now entering a fourth one, as shown in the table on the next page. Virtual assistants were first introduced in the mid-90s by General Magic, a spin-off of Apple’s Paradigm project, led by Marc Porat. General Magic’s Portico was a network-based virtual office assistant available to US business users on desktop computers and PDAs. Using keyword-based voice-command and text-to- speech, Portico could complete voicemail and email administration tasks. Despite retail deals with Sony, AT&T and Motorola, Portico proved to be a commercial failure. In 1998, NTT DoCoMO launched iConcier to the Japanese consumer market. This second-generation VA, available on i-mode-enabled handsets, used artificial intelligence functions such as phrase understanding for simple command-response dialogues with an avatar. NTT entered into content deals with over 250 third parties, in turn giving paying subscribers access to services ranging from navigation to bus timetables and coupon deals. Initially available only on NTT’s own media platform, i-mode, iConcier was made available to third-party Android developers in March of this year. About a year before Siri hit the market in April 2011, Nokia deployed a voice search function using Microsoft Tellme. Google had also made search-by-voice possible on Android a full eight months before Siri hit the market. 
A few elements set Siri and those third-generation VAs apart from Portico and the initial iConcier. Firstly, interactions between the VA and the user are more human-like, thanks to Natural Language Processing (NLP). Humour elements help give the VA more of the feel of a human personality. Secondly, today’s third-generation VAs perform tasks beyond traditional communications, such as dialing and texting, as they access third-party content, web search results in particular. They also pull and push user-specific content, like Facebook or Twitter status updates. The wide uptake of smartphones and improved mobile connectivity contributed to this evolution. VA players are now working on fourth-generation virtual assistants that communicate in an even more human-like way, understand not just language but intent and, ultimately, anticipate user needs. Fourth- generation VA personalities will live in the cloud, as large amounts of data need to be processed, giving cloud processing companies like Google and Amazon an upper hand. Fourth-generation VAs will draw upon advances in NLP, speech recognition, personalisation and search, from companies that include SRI, AT&T Labs, and Nuance. “Google and some research labs are capable of building a next generation VA. SRI would of course love to work with Google”, says Norman Winarsky, VP Ventures at SRI International and Visiting Scholar at Stanford University, and one of the brains behind Siri. SRI international will be showing an implementation of back-and-forth dialogue around fall 2012. This June, AT&T Labs plans to let third party developers access the APIs of Watson, its artificial intelligence platform. We understand that Apple is working on a deeper integration of Siri with its core iOS applications, to be rolled out on iPhones and other screens. I-Free is investing in 3D character visualisation, Dexetra on making personal history searchable. 
Nokia is readying a Siri-like UI for late 2012, according to our sources, taking advantage of Nokia’s Navteq capabilities.
  6. 6. 12 Beyond Siri: the next frontier in User Interfaces Four Generations Of Virtual Assistants 1995-1999 2000-2010 2011 2012-2015 The virtual telephone The virtual concierge The virtual search The new UI: your assistant assistant lifestyle buddy Type of VA Reactive, task- Reactive, person- Proactive, lifestyle- Reactive, program- centric, device- centric, device- centric, device- embedded embedded embedded embedded Cellular network- Device- and cellular Device- and cloud- Architecture Mostly cloud-based based network-based based Natural language Text-to-speech Speech recognition Speech recognition understanding Technology Keywords and Back-and forth Keywords Phrases phrases dialogue Simple voice Text-to-speech Multimodal: speech, Interface Text-to-speech Commands Speech-to-text text, gesture, touch US-English US/UK-English Languages US English Local language for All Some local languages locally developed VAs Takes messages, Web search, Delivers third-party Delivers context- and forwards calls, reads navigation, set alarms information user-relevant third Tasks performed e-mail, keeps track of using user data, open (weather, coupons, party information, tasks, schedules other apps and etc.), set alarms recommendations appointments location data Desktop computers, Smartphones and Smartphones, tablets, Screen Feature phones PDAs tablets computers, TVs, cars Simple command- None Limited dialogue Conversation response Artificial Intelligence Keyword Humour, some intent Phrase understanding Intent anticipation understanding understanding All types of third- Developer APIs None None-some Some party APIs User- and context- User-specific content, Personalisation None Avatar specific content and avatar, voice services, avatar, voice US market, Asia, US market US market, Asia Global Europe Audience Business users, B2B, Business users B2B, consumers B2B, B2C, B2B2C consumers Free and paid apps, Third-party content Paid, subscription- ad/search revenue and services revenue 
Business model Paid, usage-based based share, licensing, share, licensing, vertical applications vertical applications Handset & device manufacturers, SR Handset and AI vendors, Launched by Telecoms operators Telecoms operators manufacturers, B2B2C, cloud developers, end-users companies, developers Siri Dragon Search NTTs iConcier Voice Actions SRIs next generation Leading Portico’s Mary (1996) (2008) Vlingo VPA examples Wildfire (1995) SK Telecoms Nate Everfriends Google Glass Iris Speaktoit Source: VisionMobile research © VisionMobile 2012. Some rights reserved. 6
  7. 7. 12 Beyond Siri: the next frontier in User Interfaces CHAPTER TWO The evolving VA technology landscape Technologies today and tomorrow Virtual assistants rely on five technological building blocks: Speech Recognition (SR), Natural Language Processing (NLP), user profiling, search & recommendations, and avatar visualisation. These technology blocks are very much in a state of continual evolution, leaving the field open for innovation by large vendors and start-ups alike. SPEECH NATURAL LANGUAGE USER PROFILING SEARCH & AVATAR RECOGNITION PROCESSING RECOMMENDATIONS VISUALISATION Technology building blocks for virtual assistants Source: VisionMobile research Speech recognition Speech recognition (SR), also referred to as automatic speech recognition (ASR) or speech-to-text (STT), deals with machine translation of spoken words into text. Text-to-speech (TTS) is also required to translate text into spoken words. Without it, no dialogue is possible between the user and the virtual assistant. Voice-activated VAs use speech recognition to carry out tasks such as web search, voice dialling and dictating text-based messages like email and SMS, or even entire documents. Key players in the speech recognition space are Nuance, Google, iSpeech and Microsoft. Outlook. In addition to high demand for US English-speaking VAs, speech recognition technology vendors are experiencing growing demand for local language support and are working towards speeding up their language production. One major challenge is the cost associated with language development; speech recognition support for every new language must be built almost from scratch. Language interdependency -- the fact that most languages are not self-contained -- adds to the difficulty. 
US English is today the language of choice for VAs, as it is the perfect SR engine training ground: the US is a linguistically homogeneous market, and there is a substantial amount of content and third-party APIs accessible either in English or in the US.

     Natural Language Processing -- Understanding the context

     While speech recognition translates spoken words into text, Natural Language Processing (NLP) turns text into meaning and context understanding. By understanding the user’s context -- their history, habits, tastes and location -- the VA can return the most relevant information or recommendations, and do it in a socially appropriate fashion.

     “The next technology leap for the virtual personal assistant will be to maintain a conversation.” Norman D. Winarsky, Ph.D., Vice President, SRI Ventures, SRI International

     Key players in the Natural Language Processing technology are SRI International, Nuance, AT&T Labs, Google and Artificial Solutions.
  8. 8. 12 Beyond Siri: the next frontier in User Interfaces Outlook. In order to make virtual assistants fully conversational, vendors today are working on technology that enables back-and-forth dialogues and on understanding the rules of social interaction. We should not forget how breaking these social interaction rules made Microsoft Office assistant Clippit (aka “Clippy”) unpopular. An intermediate solution is to let the user set rules of interaction on a case-by-case basis, whereby the user tells the VA their level of availability: open for chat, dialogue, recommendation or neither of those. VAs also need to learn and act upon the user’s historical data, which requires contextual training, processing vast amounts of data and substantial server capacity. The cloud is a natural place for this sort of “Big Data”, but for the foreseeable future, vendors favour a hybrid solution, with some data stored in the device to allow the virtual assistant to function where connectivity is unavailable. User profiling User profiling involves collecting information about a user and using it to model their interests, preferences, context and goals. User profiling is essential for having a VA able to deliver personalised information, dialogue and recommendations. Key players in user profiling technology are SRI International, Google, Apple, AT&T Labs, Artificial Solutions, and Tobii (Apple). Outlook. New techniques of user profiling promise to go beyond mere digital content tracking, by gathering information about mood and emotion through eye-tracking, keyboard tracking, and/or temperature tracking. Samsung’s latest smartphone, the Galaxy S III, features eye-tracking technology and Apple, who bought parts of eye-tracking specialist Tobii in 2009, is said to be integrating its technology in the future. Search and product recommendation Combined, NLP and user profiling enable personalised search results and recommendations, such as advice on content and services. 
Asking the VA for restaurants can, for example, lead to different recommendations based on the user profile and its context: a juice-bar for a jogger in a park, and a gourmet restaurant for a fine food enthusiast walking in the same park.

     Key players. Delivering recommendations is a matter of collaboration within the ecosystem: NLP vendors, search engines, knowledge and Q&A platforms, content providers, social networks and ad networks.

     Outlook. Recommendation technologies are already in use, most notably by Amazon and Netflix, where different books or movies are recommended to the user based on prior shopping or viewing history. Content recommendations via social sharing and social networks such as Facebook, or via websites using Facebook’s Graph API, are mainstream, and pull more user profiling data than Amazon. The missing piece is context, which is still a work in progress.

     Avatar visualisation and personalisation

     Avatars -- graphic, animated representations of a person -- are also used by many VAs. Avatars are a way to humanize the assistant, with the intent of increasing emotional attachment. Avatar visualisation is a form of gamification that helps make interaction more fun and engaging.

     Outlook. For human-looking avatars, new technologies such as 3D body-scanning and facial recognition have the potential, when integrated with 3D graphics in devices, to take avatar visualisation to the next level. Avatars are used by a large number of VAs, but opinions differ as to the potential of monetising customisation. Selling customisation as an in-app purchase is one opportunity. Another is brand placement: buying, for example, a branded sweater for the avatar.
  9. CHAPTER THREE
     The VA Competitive landscape

     Siri is only the tip of the iceberg in what is becoming a very competitive market. Apple’s product triggered a wave of both imitation and innovation in the last year, including dozens of smartphone applications. In this report, we profile four applications besides Siri: Dragon Go by SR specialist Nuance Communications, Everfriends by visualisation-driven i-Free Innovations, iris by AI startup Dexetra, and Speak4it by AT&T Labs.

     We identified 43 representative virtual assistant applications available on Android or iOS. They were collectively downloaded 133.3 million times. Some 94 percent of these VA downloads were Android apps. Google’s Voice Search -- a true VA only once Voice Actions has been activated -- single-handedly accounted for over 81 percent of the 133.3 million downloads we looked at. Even without Voice Search, Android apps still represented 68 percent of the remaining 24.2 million downloads.

     By contrast, iPhone and iPad apps accounted for nearly 90 percent of all paid downloads, and nearly 86 percent of paid VA app revenues. Such revenues amounted to 1.8 million dollars (USD), overall a meagre amount considering the potential. The top ten apps accounted for 42 percent of total revenues. Pannous’s Voice Actions alone represented nearly 36 percent of total revenues, closely followed by QuanticApps’s Voice Assistant and True Knowledge’s Evi. Besides Pannous’s Voice Actions, AIVC and Speaktoit were the only apps in the top ten to generate revenue with paid downloads.

     What sets apart the top ten downloadable VA apps is that they are to a large extent produced by either speech recognition vendors or start-ups working on AI research. Google and Nuance produce the most popular SR engines in the top ten, while one app uses iSpeech. Many VA makers admit to trying out and comparing multiple vendors ahead of a potential switch, citing dissatisfaction with speech recognition quality or with service pricing.

     Looking at VA tasks and features, we identified a number of common denominators among top-ten apps. Must-have tasks for a VA are local and general search (including weather forecasts), voice dialling and texting (including contact lookup), and navigation. Key VA features, implemented more widely among top-ten apps, were personality, local language, and access to third-party and local content. Such features require extended partnerships with technology vendors or investments in R&D, and local third-party content partnerships. As speech recognition is still under heavy R&D, app production is often driven within one company.

     Giving a VA a personality makes it more human-like. Siri’s sarcastic personality has certainly contributed to an illusion of human-like features. A number of apps are also using customisable avatars to add personality to the VA.

     Another critical differentiator is access to third-party content, typically from Facebook, YouTube or Spotify, or to local content, which is in high demand. Local content and local language support are two interdependent features. For example, Siri works fine in the US, where there are third-party content deals in place, but is pretty much reduced to an entertaining gizmo in other regions, due to lack of local content.
  10. Table: Android the virtual assistants’ platform of choice
      Virtual Assistant applications (May 2012). Downloads in 1,000s; revenue in USD 1,000s.

      | VA application name | Publisher/Developer | Downloads iOS | Downloads Android | Downloads Total | Revenue iOS | Revenue Android | Revenue Total |
      |---|---|---|---|---|---|---|---|
      | Voice Search | Google Inc | 0 | 109,000 | 109,000.0 | 0 | 0 | 0.0 |
      | Vlingo Virtual Assistant | Vlingo Corporation (Nuance Corporation) | 5,350 | 2,860 | 8,210.0 | 0 | 0 | 0.0 |
      | iris. (alpha) | Dexetra | 0 | 4,400 | 4,400.0 | 0 | 0 | 0.0 |
      | Skyvi | BlueTornado | 0 | 1,900 | 1,900.0 | 0 | 0 | 0.0 |
      | Speaktoit Assistant | SpeaktoIt | 58 | 1,650 | 1,708.0 | 57 | 0 | 57.4 |
      | AIVC | YourApp24 | 0 | 1,416 | 1,416.1 | 0 | 58 | 57.7 |
      | Car Home | Google Inc | 0 | 1,190 | 1,190.0 | 0 | 0 | 0.0 |
      | Dragon Search | Nuance Communications | 1,080 | 0 | 1,080.0 | 0 | 0 | 0.0 |
      | Voice Actions/Jeannie | Pannous | 163 | 902 | 1,064.7 | 600 | 56 | 655.4 |
      | Everfriends | i-Free Innovations | 0 | 731 | 731.0 | 0 | 0 | 0.0 |
      | Evi | True Knowledge Ltd | 253 | 235 | 488.0 | 250 | 0 | 250.5 |
      | Andy - Siri for Android | 74 Technologies | 0 | 296 | 295.8 | 0 | 20 | 19.6 |
      | Edwin, Speech-to-Speech | neureau | 0 | 265 | 265.0 | 0 | 0 | 0.0 |
      | Dragon Go | Nuance Communications, Inc | 148 | 117 | 265.0 | 0 | 0 | 0.0 |
      | Speak4it | AT&T Interactive R&D | 231 | 0 | 231.0 | 0 | 0 | 0.0 |
      | Voice Assistant - Just use your voice | QuanticApps | 193 | 0 | 193.0 | 589 | 0 | 588.7 |
      | Pocket Blonde* | i-Free Innovations | 0 | 184 | 184.0 | 0 | 0 | 0.0 |
      | EVA - Virtual Assistant | BulletProof | 0 | 177 | 177.0 | 0 | 80 | 79.9 |
      | Ziplocal | Phone Directories Company/ZipLocal | 123 | 20 | 143.4 | 0 | 0 | 0.0 |
      | Cluzee Your Personal Assistant | Tronton LLC | 0 | 71 | 70.8 | 0 | 0 | 0.0 |
      | Voice Control | Luka Kama | 0 | 57 | 56.7 | 0 | 12 | 12.2 |
      | Risi Beta | kkTeam | 0 | 41 | 41.1 | 0 | 1 | 0.9 |
      | EVAN - Virtual Assistant | BulletProof | 0 | 28 | 28.2 | 0 | 10 | 10.0 |
      | Artificial Intelligence Dialog | AnSoft | 0 | 24 | 23.8 | 0 | 0 | 0.0 |
      | Monica | gSoft Technology Solutions | 22 | 0 | 22.0 | 0 | 0 | 0.0 |
      | Voice Control without internet | K&J Software | 0 | 20 | 20.0 | 0 | 0 | 0.0 |
      | netpeople:a | iNAGO | 0 | 16 | 16.0 | 0 | 0 | 0.0 |
      | My Virtual Assistant | Narada Robotics | 12 | 0 | 12.0 | 0 | 0 | 0.0 |
      | vokul | KulTek, LLC | 12 | 0 | 12.0 | 36 | 0 | 35.9 |
      | Vocal Search | AppSimo | 8 | 0 | 8.0 | 24 | 0 | 23.9 |
      | Serge | Logovision Inc | 8 | 0 | 8.0 | 0 | 0 | 0.0 |
      | Android Voice Xtreme | BulletProof | 0 | 8 | 7.7 | 0 | 10 | 10.0 |
      | Juke! Speech Driven Music Box | David Cheney Design | 0 | 5 | 5.1 | 0 | 0 | 0.0 |

  11. Virtual Assistant applications (May 2012), continued. Downloads in 1,000s; revenue in USD 1,000s.

      | VA application name | Publisher/Developer | Downloads iOS | Downloads Android | Downloads Total | Revenue iOS | Revenue Android | Revenue Total |
      |---|---|---|---|---|---|---|---|
      | Tecla Access | Inclusive Design Research Centre | 0 | 5 | 4.9 | 0 | 0 | 0.0 |
      | VoicePOD (Android 2.0 up) | MOBk | 0 | 4 | 4.4 | 0 | 5 | 4.9 |
      | Sprachsteuerung | Bytetex | 0 | 4 | 4.2 | 0 | 8 | 7.7 |
      | ScottyKnows | nSphere Mobile | 0 | 4 | 4.1 | 0 | 0 | 0.0 |
      | Talk to Eve | sparklingapps | 1 | 1 | 2.1 | 1 | 0 | 0.9 |
      | TasksEveryday Virtual Assistant | iEverydayApps | 2 | 0 | 2.0 | 0 | 0 | 0.0 |
      | Super Voice Assistant | McFly Entertainment | 2 | 0 | 2.0 | 2 | 0 | 2.0 |
      | mia powered by netpeople | iNAGO | 0 | 2 | 1.5 | 0 | 0 | 0.0 |
      | Voice Answer | Sparkling Apps | 0 | 1 | 1.0 | 0 | 4 | 3.7 |
      | Voice Ask | sparklingapps | 0 | 0 | 0.1 | 0 | 0 | 0.3 |

      Source: VisionMobile research. Data source: Xyologic. Revenue analysis: Vision Mobile Research, based on top 43 VPA apps by download, cumulative from app launch to May 2012. Revenue is calculated using total estimated cumulative downloads since app launch and paid download price for the respective apps in May 2012, on the respective platforms.
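
The revenue note under the application table describes a simple estimate: cumulative downloads multiplied by the paid download price. As a rough illustration only (a sketch, not VisionMobile's actual calculation), the Python below inverts that formula for a few rows of the table to get an implied average price per iOS download, and recomputes each app's share of the 133.3 million total downloads quoted in Chapter Three. The function names are hypothetical, the figures are copied from the table in thousands, and the implied price is only a blended average, since the report does not list the actual app prices.

```python
# Figures copied from the VA applications table (downloads and revenue in thousands).
# Tuple layout: (iOS downloads, Android downloads, iOS revenue USD, Android revenue USD)
APPS = {
    "Voice Search": (0, 109_000, 0, 0),
    "Voice Actions/Jeannie": (163, 902, 600, 56),
    "Voice Assistant - Just use your voice": (193, 0, 589, 0),
    "Evi": (253, 235, 250, 0),
}

TOTAL_DOWNLOADS = 133_300  # 133.3 million, as quoted in the report text


def estimated_revenue(downloads, price):
    """The report's stated method: cumulative downloads times paid download price."""
    return downloads * price


def implied_price(revenue, downloads):
    """Invert the method: average revenue per download, or None if there are no downloads."""
    return revenue / downloads if downloads else None


def download_share(downloads, total=TOTAL_DOWNLOADS):
    """Share of all downloads, in percent."""
    return 100.0 * downloads / total


if __name__ == "__main__":
    for name, (ios_dl, android_dl, ios_rev, android_rev) in APPS.items():
        price = implied_price(ios_rev, ios_dl)
        share = download_share(ios_dl + android_dl)
        label = "n/a" if price is None else f"{price:.2f} USD"
        print(f"{name}: {share:.1f}% of downloads, implied iOS price {label}")
```

Working backwards like this is only a sanity check on the published totals; it says nothing about how many of the counted downloads were actually paid.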
  12. Dragon Go -- Nuance moving into the consumer space

      "Nuance is investing in the direct-to-consumer category because we believe learning about what people like is the fastest way to bring innovations to market." Matt Revis, Vice President, Mobile and TV, Nuance

      Nuance -- whose speech recognition technology powers Siri -- has traditionally derived most of its revenue from working with verticals. In the past two years, it has nevertheless acquired a number of more directly consumer-facing competitors, such as Vlingo and SVOX Mobile Voices. It has also launched a number of consumer-facing applications, such as Dragon Search, Dragon Dictation and Dragon Go, the first fully-fledged Nuance-branded virtual assistant. Cars, TVs, PCs and laptops are other screens Nuance is working on today. Moving forward, the vendor sees its technology being deployed on any consumer electronics device, from cameras to microwave ovens.

      Dragon Go’s content partnerships for music and recommendations position it as a content distribution platform. Nuance also derives valuable consumer insights and speech recognition engine training by extending its B2B business to consumer-facing apps. At the same time, growing its B2C business puts it in direct competition with its customers.

      The publisher: Nuance Communications, Burlington, MA, USA
      Tagline: Solutions and technologies that help people work more intelligently
      Main activities: Provides speech and imaging solutions for businesses and consumers
      CEO: Paul Ricci
      Revenue: 1.4 billion USD in 2011
      Founders: SRI International, of which Nuance is a spin-off, in 1992
      Funding: Public, traded on Nasdaq
      Total apps published: 9
      Website:

      The app: Dragon Go
      Tagline: Control your personal universe with no boundaries
      Main tasks performed: Web search, navigation, voice-calling, third-party reservation services, reviews, play music from Spotify and movies from Netflix, social network updates, social sharing
      Total estimated downloads: 148,000 on iPhone, 117,000 on Android
      Revenue streams: Third-party content and services
      Platform availability: iPhone, Android
      Regional availability: US only
      Main download location: US
      Languages supported: US English
      Avatar: No

      Source for app download data: Xyologic. Data: Vision Mobile Research, Nuance Communications
  13. Everfriends -- Third-party services and customisation

      "Building conversation logic around interactive characters helps engage users in a deeper activity than simple tasks." Kirill Petrov, co-founder of i-Free and head of i-Free Innovations.

      I-Free Innovations, publisher of VA app Everfriends, is a subsidiary of St Petersburg-based i-Free Ltd., which publishes apps and games for smartphones and conducts IT research and ventures. To i-Free, today’s voice-activated assistants are only a stepping-stone to cloud-based personalised services accessible from any device, using natural language and interactive avatars that create a deeper relationship between user and VA.

      Having a mix of revenue streams, Everfriends is a good case study for VA business models: 87 percent of its revenues come from third-party partnerships, five percent from advertising, and eight percent from avatar customisation, according to the company. Avatar customisation is a core focus for the vendor, which expects customisation revenues to grow strongly.

      I-Free foresees the VA ecosystem evolving towards open speech and natural language APIs, so as to enable third-party developers to create derivative services. The deployment of billing capability, enabling third-party transactions, is also key to new and growing revenue from third-party services.

      The publisher: i-Free Innovations, St. Petersburg, Russia
      Tagline: None
      Main activities: Development and implementation of innovation projects in the mobile and digital space, B2C and B2B
      CEO: Vyacheslav Ovchinnikov
      Revenue: 200 million USD in 2011
      Founders: Kirill Petrov, Kirill Gorynya, Sergey Shulga, in 2010 as a division of i-Free
      Funding: Kirill Petrov, Kirill Gorynya, Sergey Shulga
      Total apps published: 28
      Website:

      The app: Everfriends
      Tagline: A new generation of pocket assistants -- with personality and a sense of humor!
      Main tasks performed: Search, weather forecasts, voice calls, SMS, e-mails, notes, maps, alarm and news reminder setting, social network update delivery, games and jokes, music playback, shopping, hotel booking, encyclopedia lookups
      Total estimated downloads: 750,000
      Revenue streams: In-app purchases, third-party services (87%), customisation (8%), advertising (5%)
      Platform availability: Android
      Regional availability: 24 countries
      Main download location: USA
      Languages supported: English and Russian, plans for more
      Avatar: Yes, with free and paid-for customisation add-ons

      Source for app download data: Xyologic. Data: Vision Mobile Research, i-Free Innovations
  14. 14. Iris (alpha) -- Largest independent vendor VA app

“Phones should not only respond to you, they should talk to you.” Narayan Babu, CEO, Dexetra.

Dexetra reportedly created the first version of its virtual assistant, iris, in just a few hours, in order to bring a Siri-like capability to Android phones. Today, Dexetra derives most of its revenue from advertising, and sees OEMs and telcos as key partners. It has already signed a licensing deal with Micromax, allowing the phone manufacturer to integrate iris’s Indian version, Aisha, into devices. It also plans to co-market the app with telcos. Dexetra plans to reach sales of 500,000 devices in the first half of 2012.

The company is also working on a private beta focusing on context recognition, and is further developing voice-based ads. Additionally, it is refining its NLP engine with Friday, a project in beta that maps a user’s search history and makes it searchable. Friday is due to be released in June 2012.

The publisher
Name: Dexetra Software Solutions Private Limited, Bangalore, India
Tagline: Delivering surprises
Main activities: Building products at the intersection of mobile, cloud and machine intelligence
CEO: Narayan Babu
Revenue: Undisclosed
Founders: Narayan Babu, Nithin John, Eby Chembola, Binil Antony, Yaser Hameed, Aibin Varghese, in 2010
Funding: 200,000 USD from One97 Mobility Fund in 2011
Total apps published: 11
Website:

The app
Name: iris (beta)
Tagline: ask.listen.
Main tasks performed: Local search, news search, voice calling and texting, music and video playback, movie reviews on request, alarm and reminder setting
Total estimated downloads: 4,400,000
Revenue streams: In-app purchases, advertising-based revenue shares, licensing with OEMs and telcos
Platform availability: Android
Regional availability: 24 countries
Main download location: USA
Languages supported: English
Avatar: No

Source for app data: Xyologic. Data: Vision Mobile Research, Dexetra
  15. 15. Siri -- The first mass market embedded virtual assistant

“Siri’s release made a huge impact on the virtual assistants market, first of all as the first mass market case of natural language processing.” Kirill Petrov, co-founder of i-Free and head of i-Free Innovations.

The launch of Siri increased awareness of voice-activated assistants, but more importantly triggered a wave of virtual assistant innovation, both in downloadable apps and in industrial research. A number of VAs are trying to piggyback on Apple’s success, with taglines such as “a friend of Siri”, “Siri for Android” or “Siri-like”.

Research published by Parks Associates in April found that about a third of the US iPhone 4S owners it interviewed use Siri daily to make phone calls, text or search. Some 87 percent use at least one Siri feature monthly, predominantly voice dialling, texting and information search.

Siri’s speech recognition is powered by Nuance, and its search results by Wolfram Alpha. What sets Siri apart from other VAs is that it is fully embedded in the iPhone UI and core applications, including the contacts and calendar.

With a head start on voice-activated interfaces for phones, Apple is said to be taking the next step by integrating a voice UI into other devices, starting with tablets and the rumoured Apple TV. Apple is also readying support for additional languages, a key success factor provided that languages are associated with local search and, more importantly, local content deals.

We also understand that Apple’s Siri has led to a measurable decline of search traffic to Google, which we believe will prompt Google to launch a free, cross-platform competitor.

The publisher
Name: Apple, Cupertino, CA, USA
Tagline: None
Main activities: Offers mobile communication and media devices, personal computing products, portable digital music players, and associated software and peripherals
CEO: Tim Cook
Revenue: 108 billion USD in fiscal year 2011
Founders: Steve Jobs, Steve Wozniak, Ronald G. Wayne, in 1976
Funding: Public, traded on Nasdaq
Total apps published: 23
Website:

The app
Name: Siri
Tagline: What can I do for you?
Main tasks performed: Search, local search and maps (only US-English), weather, voice dialling, SMS and email, contact lookup, setting calendar, reminders and timers, playing music (iTunes only), stock market tracking
Total estimated downloads: Undisclosed. There are as many potential Siri users as there are activated iPhone 4S users
Revenue streams: Siri drives sales of iPhone 4S, third-party content and services
Platform availability: Exclusive to iPhone 4S, not downloadable
Regional availability: Global (although without specific local content)
Main download location: Undisclosed
Languages supported: English (United States, United Kingdom, Australia), French (France), German, Japanese. Planned languages in 2012: Chinese, Korean, Italian, and Spanish
Avatar: No

Source for app download data: Xyologic. Data: Vision Mobile Research, Apple
  16. 16. Speak4it -- Creating the multimodal VA

“There is a huge opportunity combining gesture and voice. Speak4it is today the only app that allows you to do multimodal understanding.” Mazin Gilbert, Assistant Vice President of Technical Research at AT&T Labs.

Speak4it is a local search app using both voice and gesture recognition. It was developed by AT&T Labs, and uses AT&T’s “Watson” artificial intelligence platform. The company sees virtual assistants becoming the natural interface for all types of devices within three to five years.

To realise that vision, AT&T hopes to attract third-party developers to its AI platform engine. Available only to AT&T’s partners today, AT&T Watson’s API library is scheduled to open to third-party developers in June 2012. AT&T will first release libraries for speech-to-text, general and local search, voicemail and SMS. Multimodal capabilities (combining voice and gesture recognition) will remain exclusive to AT&T’s strategic partners for now.

AT&T sees its intelligent network as a key competitive advantage in the growing VA space. AT&T Labs’ 600 patents in the AI space certainly give it a competitive edge, along with monetisation potential from licensing deals.

The publisher
Name: AT&T Interactive R&D, Dallas, TX, USA
Tagline: Leading Invention, Driving Innovation
Main activities: Research in communications, computing and networks
CEO: Krish Prabhu
Revenue: Undisclosed
Founders: AT&T/Bell Labs, in 1996 in its current form. Initially Bell Labs, founded in 1926
Funding: AT&T spent 1.1 billion USD on R&D in full year 2011
Total apps published: 7
Website:

The app
Name: Speak4it
Tagline: Mobile local search
Main tasks performed: National and local navigation search
Total estimated downloads: 145,000 on iPhone, 86,000 on iPad
Revenue streams: Third-party services
Platform availability: iPhone, iPad
Regional availability: Global for download, US-only for tasks
Main download location: USA
Languages supported: US English
Avatar: No

Source for app download data: Xyologic. Data: Vision Mobile Research, AT&T Labs
  17. 17. CHAPTER FOUR
VA business models: Revenue share rather than paid app downloads

VA business models are only starting to take shape, which is natural for a market just past the early adopter chasm. Nearly 42 percent of the leading 43 VA apps on the market choose a paid-download model, compared to 30 percent of the top-ten VA apps. Top-ten VA apps were more likely to offer both a free and a paid version of the same app. They were also more likely to offer in-app purchases: 50 percent offer them in free apps and 10 percent in paid apps.

These figures are consistent with the fact that many virtual assistant apps are looking at alternative ways of generating revenue, beyond paid-per-download and subscription models. App publishers are exploring business models such as search and display advertising, third-party service distribution, avatar customisation, and white-label VA licensing. Business models vary depending on whether the VA publisher owns the underlying technology building blocks or licenses them.
  18. 18. Paid-per-download and Subscription models

Our analysis of the virtual assistant market shows that only about 42 percent of VA apps choose the paid-per-download model. In the top ten, three VAs use this revenue model, accompanied by a free download version. Pannous, the company behind the Jeannie virtual assistant app, tops the revenue chart with over USD 655,000 in revenue from paid downloads on iOS.

Android generates most VA downloads, and most Android VA apps are free downloads, apart from AIVC, Eva/Evan, Andy and Android Voice Extreme, which offer free and paid versions. By contrast, most paid app revenues are generated by virtual assistants on iOS. There are also fewer VA apps on iOS: 40 percent of VA apps are available on iOS, against over 77 percent on Android. Siri’s presence is a major competitive obstacle on iOS. Additionally, Google’s speech recognition engine API is open on Android.

Another paid VA model is used by Japanese telecom operator NTT DoCoMo, which develops iConcier, one of the first mobile virtual assistants ever. The service, which uses NTT’s proprietary i-mode platform, reached four million paid customers in its launch year and had by then about 250 third-party deals in place. In March this year, NTT DoCoMo extended iConcier availability to Android. NTT DoCoMo has also attempted, without success, to license its virtual assistant platform to telcos outside Japan.
  19. 19. Outlook. As expected, paid VA apps attract fewer downloads than their free counterparts. We expect the number of paid VA apps to decrease after Google launches its answer to Siri and OEMs deploy their own versions of voice UI. The expected growth in revenue from third-party revenue share agreements is also likely to contribute to a decline in the number of paid VAs.

Search & Display Advertising

Advertising is by far the most widespread way for VAs to generate revenue today, be it via ad placement within apps or via mobile search. Advertising is likely to remain a primary VA revenue model, as VA implementations of user profiling and NLP improve targeting precision for ads and therefore per-ad revenues. Leading mobile ad networks are Adfonic, AdMob (Google), Jumptap, InMobi, iAds (Apple), Mojiva, Millennial Media, and Vserv.

Outlook. Mobile advertising is already an established revenue model for mobile apps, and it is the fastest growing segment within digital advertising. Today, performance-based advertising on the Internet (including via mobile) already generates twice as much revenue as impression-based ads, with an estimated USD 20.4 billion in 2011 for the US alone, according to research by PwC on behalf of IAB. At the same time, advertising is a game of scale, and smaller virtual assistant apps might face a catch-22 situation if they fail to attract users with relevant content: no content means no users, and no users means no ad revenue.

Third party service distribution

Many VAs today, including Siri, i-Free, Nuance, and Speak4it, link to restaurant booking sites, travel booking sites or, in Nuance’s case, to music services such as Spotify. With virtual assistants becoming a mainstream form of search, third-party content and service distribution is poised to grow rapidly, especially for local services. This depends both on more third-party APIs opening up and on the ability of VAs to integrate and bill for them. Today, i-Free Innovations’ Everfriends already generates 87 percent of its revenue by linking to third-party services, with a goal to add more content and services in the future. It is also a matter of partnering not only with global content providers but, more crucially, with providers of local content, which is closest to users’ interests and location.

Outlook. As virtual assistants move from being apps to becoming access points for personalised service discovery, they will become strategic distribution channels for both mainstream and niche service providers. The major opportunity lies in using VAs to distribute local content, a digital good where demand currently outstrips supply. At the same time, profit margins on purchase intermediation are relatively low, making it necessary for VAs to target either a large user base or a very specialised one. In addition, local content restrictions and regulations in some countries -- India for example -- restrict or forbid outright the billing of third-party services via the telephone bill, a payment model that has proven successful among mobile users.

Avatar personalisation

Charging users for personalisation of the avatar’s gender, body, clothes, voice and personality is an additional revenue stream for virtual assistant apps. This can be monetised through in-app items, items marking achievements (badges) or product placements in the form of sponsored items, like a branded jacket. Product placement is an important up-and-coming revenue model in mobile games as well. One of the profiled apps, Everfriends, lets the user choose between three avatars and a range of clothes.
  20. 20. Outlook. Virtual assistant vendors do not all agree on the value of providing avatar personalisation. Everfriends’ publisher i-Free Innovations already generates eight percent of its revenue from avatar personalisation, and sees great potential in this revenue model. It is investing in 3D-animated character generation. Avatar customisation is a costly investment, but may pay off if it can be amortised over a large enough user base, i.e., millions of users.

White label licensing

A number of vendors, including xBrainSoft and Artificial Solutions, offer white-label virtual assistant solutions for B2B2C. Not only is the model economically viable, it also allows virtual assistants to be trained within specific verticals.

Outlook. As the VA market becomes more consumer-facing, virtual assistant vendors are looking into making their development platforms available not only to B2B customers but also to third-party developers. Smaller or newer players developing their own AI technology may also want to consider this option as a complement to their consumer-facing business, as it brings in revenue as well as opportunities for vertical training in specific environments and across diverse languages.
  21. 21. CHAPTER FIVE
Leaders and challengers in the VA value chain

Building a virtual assistant is a complex undertaking, in part due to the need to assemble building blocks across the supply chain. It requires licensing and partnership deals with technology vendors, search engines, ad networks, third-party service providers, app marketplaces and handset makers. In this chapter we look at the leaders and challengers across the virtual assistant market. Leading VA apps are driven by R&D efforts. US-based companies lead the pack, with Russian and Indian companies closing in.

Virtual assistant solutions: Leaders

Apple (Cupertino, CA, USA): Apple took a leap forward when it acquired Siri from SRI International and integrated it into the iPhone 4S user interface. The Cupertino company is working on the integration of Siri as a UI beyond the iPhone. In addition, virtual assistant apps sold via the Apple App Store generate the majority of paid VA app revenue today. See our analysis of Siri in chapter 3.

SRI International (Menlo Park, CA, USA): The founders of Siri continue to drive AI research, focusing their VA efforts on vertical markets. No fewer than three new VA implementations are underway. SRI owns a large number of fundamental AI patents.

Google (Mountain View, CA, USA): Google has a number of apps using speech recognition, of which Voice Search, via the activation of Voice Actions, tops our ranking. With ongoing research into search technology, speech recognition, augmented reality and translation, not to forget ownership of the fastest growing mobile platform, Android, the company has a lot to bring to the table. Google is a favoured SR vendor, as it offers its technology for free. Google is said to be working on a VA of its own, code named Majel, which could even be combined with the much-talked-about augmented reality glasses, Google Glass.

Nuance (Burlington, MA, USA): With three apps in the top 20 -- Vlingo Virtual Assistant, Dragon Search and Dragon Go -- Nuance continues to buy and extend its way into speech recognition technology, in particular Natural Language Processing and speech-to-text. Nuance holds about 2,000 patents, and its speech recognition engine is used by many VA apps, including six in the top ten.

Microsoft (Redmond, WA, USA): Microsoft’s speech engine Tellme powers the company’s speech-enabled products and services. It is integrated in WP7 and Xbox Kinect, as well as an array of B2B applications. Microsoft’s platform is open to developers under a licensing model.

AT&T Labs (Dallas, TX, USA): AT&T holds about 600 patents in the AI space, and uses its own AI engine (“Watson”) for its top-20 virtual assistant app, Speak4it. The company plans to open its API library in June 2012 to developers who want to build voice-enabled apps. In the virtual assistant space, AT&T Labs focuses on multimodal UIs, using speech and gesture recognition.

Virtual assistant solutions: Challengers

Dexetra (Bangalore, India): In the top ten with Iris (Siri written backwards), the company has great ambitions in the knowledge-enabling VA space, and is also working on voice ads. Its core project is Friday, an app that maps a user’s personal web trails, as well as content stored in handsets, and makes them searchable.

Pannous (Hasloh, Germany): Besides having the top grossing VA app (Voice Actions -- not to be confused with Google’s assistant function of the same name), Pannous offers R&D services in the fields of enterprise search and artificial intelligence.

Speaktoit (Newark, DE, USA): Besides having a consumer-facing VA app in the top ten, Speaktoit develops custom VAs for third parties. Speaktoit’s founders are from Russia.