
KasaDaka: a sustainable voice-service platform - Master thesis presentation André Baart



The presentation of André Baart's master thesis

Published in: Science

  1. KasaDaka: a sustainable voice-service platform
     Developing a Voice Service Development Kit
     André Baart, Dec. 2017
  2. Connecting the unconnected
     How to provide the ‘unconnected’ with the benefits of ICTs? ICT for Development (ICT4D)
     Sub-Saharan Africa, Sahel (Mali, Burkina Faso, Ghana):
     ● Low income (2 EUR/day)
     ● Low literacy
     ● No internet, intermittent electricity
     ● Under-resourced languages (Bambara, Mooré, etc.)
     Internet adoption is very low; however, mobile phones have a high adoption rate, and GSM coverage is good!
  3. KasaDaka – ‘Talking Box’
     Raspberry Pi
     GSM connection
  4. KasaDaka platform
     ● Low-resource, low-cost (60 EUR)
     ● Almost completely open-source (and thus free)
     ● Uses the existing infrastructure: GSM network, simple mobile phones
     ● Telco-independent
     Components and technologies:
     ● Asterisk
     ● VoiceXML
     ● VXI (proprietary VoiceXML browser)
     ● Apache
     ● MySQL
     ● Django (Python)
     Some use-cases of voice-services:
     ● Citizen journalism
     ● Market information
     ● Weather information
     ● Animal health
     ● Dairy value chain
  5. Key to sustainable voice-services: Local development
     ● Low cost is essential (2 EUR/day avg. income)
     ● Foreign developers are expensive!
     ● A lack of local knowledge causes dependence on foreign labor
     ● Development is the biggest expense
     In order to allow local communities to afford voice-services, local development is necessary.
     ● Greatly reduces development costs
     ● Reduces distance between developer and user
     ● Local businesses, economic growth
     However, local developers are hard to find!
     Only 12K African GitHub accounts in 2015
     Number of voice-service developers in Mali and Burkina Faso: extremely low (thus not cheap!)
     Solution: make it easier to develop voice-services
     See:
  6. Simplifying voice-service development
     Hypothesis:
     ● Voice-services consist of a combination of interactions.
     ● These interactions can be generalized into a small set, e.g.
       ○ Menu with choices
       ○ Message/information playback
       ○ User voice input
       ○ User digit input
       ○ Language selection
     ● By providing building-blocks for these interactions, inexperienced users can build simple voice-applications by deploying and customizing these building-blocks.
     Goal: voice-service development in a graphical (web-)interface, no programming skills required.
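The building-block hypothesis on this slide can be made concrete with a small sketch. The class names `Message`, `Choice` and `Menu`, the `walk` helper, and the example service are hypothetical illustrations, not the VSDK's actual data model (which, per the slides, is stored in a database and edited through the Django admin interface). The sketch only shows how two block types might be composed into a call flow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Message:
    """Plays one pre-recorded audio file, then continues to next_element."""
    name: str
    audio: str
    next_element: Optional[str] = None  # None ends the call


@dataclass
class Choice:
    """One menu option: the key the caller presses and where it leads."""
    digit: str
    audio: str
    target: str


@dataclass
class Menu:
    """Plays a prompt and waits for one key press."""
    name: str
    prompt_audio: str
    choices: List[Choice] = field(default_factory=list)


Service = Dict[str, Union[Menu, Message]]


def walk(service: Service, start: str, keys: List[str], max_steps: int = 50) -> List[str]:
    """Simulate a call: follow the pressed keys and return the visited element names."""
    pressed = iter(keys)
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None and len(visited) < max_steps:
        element = service[current]
        visited.append(element.name)
        if isinstance(element, Menu):
            digit = next(pressed, None)
            targets = {c.digit: c.target for c in element.choices}
            current = targets.get(digit)  # unknown or missing key ends the call
        else:
            current = element.next_element
    return visited


# Hypothetical two-option service: a main menu leading to two messages.
service: Service = {
    "main": Menu("main", "main_menu.wav", [
        Choice("1", "weather_option.wav", "weather"),
        Choice("2", "market_option.wav", "market"),
    ]),
    "weather": Message("weather", "weather_report.wav"),
    "market": Message("market", "market_prices.wav"),
}

print(walk(service, "main", ["1"]))  # ['main', 'weather']
```

Because the service is plain data, a graphical editor only has to create and link such records; no per-service program code is needed.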
  7. Voice Service Development Kit
     Development of voice-services from a locally hosted web-interface
     ● Based on the Django framework (MVC)
     ● Development through admin interface (screenshot)
     ● Voice-service structure stored in database
     ● VoiceXML generation including dynamic data from database
     ● Slot-and-filler TTS, support for all languages
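Slot-and-filler TTS, as named on this slide, builds a spoken sentence from pre-recorded fragments instead of synthesizing speech, which is why it works for any language someone can record. A minimal sketch follows, assuming fragments are stored as `/uploads/<name>_<language>.wav` (the pattern visible in the VoiceXML sample in this deck). The function names, the `{slot}` template syntax, and the rainfall sentence are hypothetical, not the VSDK's own API.

```python
from typing import Dict, List


def fragment_path(name: str, language: str, base: str = "/uploads") -> str:
    """Path of one recorded fragment (assumed naming: <name>_<language>.wav)."""
    return f"{base}/{name}_{language}.wav"


def fill_template(template: List[str], slots: Dict[str, str], language: str) -> List[str]:
    """Turn a sentence template into an ordered list of audio files.

    Template entries written as "{key}" are slots, filled from `slots`;
    all other entries are fixed fragments.
    """
    paths = []
    for part in template:
        if part.startswith("{") and part.endswith("}"):
            key = part[1:-1]
            if key not in slots:
                raise KeyError(f"no value for slot {key!r}")
            name = slots[key]
        else:
            name = part
        paths.append(fragment_path(name, language))
    return paths


# Hypothetical rainfall sentence: "<expected rain> <amount> <millimetres>"
template = ["rain_expected", "{amount}", "millimetres"]
print(fill_template(template, {"amount": "5"}, "fr"))
```

Each returned path would become one `<audio>` element in the generated prompt; supporting a new language then only requires recording the same fragments under a new language code.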
  8. Evaluation: ICT4D course @ VU 2017
     ICT4D course:
     ● 31 students from varying backgrounds, most with little or no programming experience
     ● 10 applications developed
     ● 9/10 used the VSDK to develop their application
     ● 80% of applications functioned correctly
     ● 78% extended the VSDK with custom data models
     ● 67% extended the VSDK with custom types of interactions
     Note: the set of provided interactions was minimal: only menu/choice structures and playback of messages
     Survey key findings:
     ● Interaction building-blocks work well for voice-service development, but the included set is limited for complex use-cases
     ● Simple voice-services can be developed quickly and easily, compared to writing VXML
     ● Expanding the functionalities of the VSDK has a steep learning curve (requires VXML, Django, Python)
     ● Debugging voice-services is difficult, and testing takes a long time
     ● Setting up a local development environment is difficult (students did not have access to their own RPi)
  9. Related work
     Same principle, but:
     ● Not open-source, thus foreign dependency
     ● Expensive
     ● Require internet connectivity
     ● Require enterprise telephone connectivity (not available, expensive)
     ● Rely on the use of TTS/ASR, no support for under-resourced languages
     ● Do not support voice, only SMS
  10. Conclusions
     Building-block approach to voice-service development:
     ● Works for development of simple voice-services
     ● Does not require programming skills
     ● Less work than writing static VoiceXML
     ● Even less compared to writing VoiceXML generators for specific applications
     So, the VSDK enables fast development of voice-service prototypes for users without programming skills.
     <?xml version="1.0" encoding="UTF-8"?>
     <!--- <vxml version = "2.1" > -->
     <!DOCTYPE vxml SYSTEM "">
     <vxml xmlns="" version="2.1" xmlns:xsi="" xsi:schemaLocation="">
       <property name="inputmodes" value="dtmf" />
       <!-- Kasadaka VoiceXML File -->
       <form id="language_form">
         <field name="language_field">
           <prompt>
             <audio src="/uploads/pre_choice_option_nl.wav"/>
             <audio src="/uploads/dutch_nl.wav"/>
             <audio src="/uploads/post_choice_option_nl.wav"/>
             <audio src="/uploads/1_nl.wav"/>
             <audio src="/uploads/pre_choice_option_en.wav"/>
             <audio src="/uploads/english_en.wav"/>
             <audio src="/uploads/post_choice_option_en.wav"/>
             <audio src="/uploads/2_en.wav"/>
             <audio src="/uploads/pre_choice_option_fr.wav"/>
             <audio src="/uploads/french_fr.wav"/>
             <audio src="/uploads/post_choice_option_fr.wav"/>
             <audio src="/uploads/3_fr.wav"/>
           </prompt>
           <grammar xml:lang="en-US" root = "MYRULE" mode="dtmf">
             <rule id="MYRULE" scope = "public">
               <one-of>
                 <item>1</item>
                 <item>2</item>
                 <item>3</item>
               </one-of>
             </rule>
           </grammar>
           <filled>
             <if cond="language_field == '1'">
               <assign name="language_id" expr="'1'"/>
             <elseif cond="language_field == '2'" />
               <assign name="language_id" expr="'2'"/>
             <elseif cond="language_field == '3'" />
               <assign name="language_id" expr="'3'"/>
             <else/>
             </if>
             <goto next="#submit_form"/>
           </filled>
         </field>
       </form>
       <form id="submit_form">
         <block>
           <assign name="session_id" expr="'7'"/>
           <assign name="caller_id" expr="'123'"/>
           <submit next="/vxml/user/register/" method="post" namelist="language_id session_id caller_id "/>
         </block>
       </form>
     </vxml>
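The language-selection VoiceXML in the Conclusions slide follows a regular pattern: for each language, four audio fragments (`pre_choice_option`, the language name, `post_choice_option`, the digit), plus one DTMF grammar item per digit. The sketch below generates that part of the `<field>` from a list of languages using Python's standard `xml.etree.ElementTree`. The function name `language_field` and its parameters are assumptions for illustration, and the `<filled>` logic and the `xml:lang` attribute are left out for brevity; this is not the VSDK's own generator.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def language_field(languages: List[Tuple[str, str]], upload_dir: str = "/uploads") -> ET.Element:
    """Build the <field> prompt and DTMF grammar for a language menu.

    `languages` is a list of (language_code, recording_name) pairs;
    digits are assigned in order, starting at 1.
    """
    field = ET.Element("field", {"name": "language_field"})
    prompt = ET.SubElement(field, "prompt")
    grammar = ET.SubElement(field, "grammar", {"root": "MYRULE", "mode": "dtmf"})
    rule = ET.SubElement(grammar, "rule", {"id": "MYRULE", "scope": "public"})
    one_of = ET.SubElement(rule, "one-of")

    for digit, (code, name) in enumerate(languages, start=1):
        # Each option is announced in its own language from four fragments.
        for fragment in ("pre_choice_option", name, "post_choice_option", str(digit)):
            src = f"{upload_dir}/{fragment}_{code}.wav"
            ET.SubElement(prompt, "audio", {"src": src})
        ET.SubElement(one_of, "item").text = str(digit)
    return field


field = language_field([("nl", "dutch"), ("en", "english"), ("fr", "french")])
print(ET.tostring(field, encoding="unicode"))
```

Adding a fourth language is then one more tuple in the list rather than four more hand-written `<audio>` lines and a new grammar item.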
  11. Limitations & future work
     VSDK works well for simple prototypes, but not (yet) for more complex applications
     ● Provide more types of interactions
     ● Solve problem of dynamic data model generation
       ○ Linked data?
       ○ Data2Documents for VXML? (Ockeloen et al., 2016)
     ● Implement more sophisticated TTS (Justyna Kleczar MSc thesis, 2017)
     ● Develop a better testing/debugging workflow
     Are the conclusions also valid in the true ICT4D context?
     ● Pilot Burkina Faso (Rainfall use-case)
     ● Train the first local voice-service developer
     Other improvements:
     ● KasaDaka stack in Docker
     ● Alternative for proprietary VXML browser
     ● Fix limitations of Raspberry Pi in ICT4D context
       ○ Power issues
       ○ Availability of GSM dongles
     Later, maybe:
     ● Do micropayments with mobile money (large in Africa)
     ● Create a ‘bip’ voting system
     ● Data exchange between offline KasaDakas
     ● Connect sensors to RPi
  12. ???