langitselatan, one of the established astronomy communities in Indonesia, has been actively using social media to interact and hold discussions with its members and the general public. Since 2011, langitselatan has received questions from the public and answered them in the form of blog articles, and it now plans to use social media networks extensively for astronomy outreach.
This paper reports the development and implementation of Waluku, an online astronomy knowledge base management system extended with a dialogue-based natural language chatbot on the Twitter social network. The chatbot creates responses based on information extracted from langitselatan blog articles, Wikipedia articles, and community-supplied answers.
2. langitselatan
Bandung, Indonesia, near the Bosscha Observatory.
Premier online astronomy media house since 2007.
Astronomy news, responses to hoaxes, basic astronomy, educational resources, and fun science.
3. Social media in Indonesia
• Facebook and Twitter: the numbers matter.
• Mobile telephony reaches 100% of the population, and 40% of mobile phones are internet capable.
• Internet in Indonesia is nearly there, with penetration at 50% and growing.
7. Waluku
Summary: a Twitter chatbot that makes heavy use of natural language and is equipped with closed-domain (astronomy) knowledge.
Why? Because asking questions is one of the most basic human norms. Because social media is cool. Because astronomy is cool. Because artificial intelligence is cool. Because answering “Google it” is considered rude.
9. System overview
[Diagram components: User Input, Input Analysis Module, Q&A Module, Response Generator, Response, Crowdsourcing Framework, Knowledge Base, Unstructured Documents]
10. User input
Challenge: People do not check whether their question has been asked before, so they ask the same question using different wording.
12. Input analysis
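Input | Extracted keywords | Rank
Bagaimana bulan sabit dapat terjadi? (How can a crescent moon occur?) | bulan sabit, bulan | 0.89212
Apa penyebab terjadinya bulan sabit? (What causes a crescent moon?) | bulan sabit, bulan, penyebab | 0.75419
Apakah wajah bulan akan selalu sama? (Will the face of the Moon always be the same?) | wajah bulan, bulan | 0.64021
Jelaskan proses terjadinya bulan sabit? (Explain how a crescent moon occurs.) | bulan sabit, proses | 0.65993
Keyword glossary: bulan = moon, bulan sabit = crescent moon, wajah bulan = face of the Moon, penyebab = cause, proses = process.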
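The slides do not say how keywords are extracted or how the rank values are computed, so the following is only a minimal sketch of one possible approach: match known multi-word terms, drop stopwords, and compare two questions by keyword overlap (Jaccard similarity). The stopword list, the term lexicon, and the function names are all hypothetical, and the scores it produces are not the ranks shown in the table.

```python
import re

# Hypothetical, deliberately tiny Indonesian stopword list (illustration only).
STOPWORDS = {
    "bagaimana", "apa", "apakah", "mengapa", "jelaskan",
    "dapat", "bisa", "akan", "selalu", "sama", "terjadi", "terjadinya",
}

# Hypothetical lexicon of multi-word astronomy terms.
LEXICON = {"bulan sabit", "wajah bulan"}


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extract_keywords(question):
    """Return known two-word terms first, then the remaining non-stopwords."""
    tokens = tokenize(question)
    found = []
    for first, second in zip(tokens, tokens[1:]):
        phrase = f"{first} {second}"
        if phrase in LEXICON and phrase not in found:
            found.append(phrase)
    for token in tokens:
        if token not in STOPWORDS and token not in found:
            found.append(token)
    return found


def jaccard(keywords_a, keywords_b):
    """Overlap between two keyword lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


q1 = extract_keywords("Bagaimana bulan sabit dapat terjadi?")
q2 = extract_keywords("Apa penyebab terjadinya bulan sabit?")
print(q1, q2, jaccard(q1, q2))
```

Two differently worded questions about the crescent moon share most of their keywords here, which is the property a matcher needs in order to treat them as the same question.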
14. Response generation
Sources: the Tanya Jawab (Q&A) section of the langitselatan website, astronomy-related Wikipedia articles, and volunteer-provided questions and answers.
Raw unstructured content is automatically summarized (factoid extraction) by an NLP (Natural Language Processing) engine and saved to the database as utterance pairs (question and answer).
Response formats: summarized factoids (fewer than 500 characters, containing URLs) and Twitter-friendly messages (140 characters or fewer).
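As an illustration of the two length limits, here is a minimal sketch of a helper that shortens an answer at a word boundary and keeps the trailing URL. The function name `fit_message` and its parameters are assumptions, not part of Waluku, and the sketch counts raw characters only; it does not model how Twitter itself counts links.

```python
def fit_message(text, url="", limit=140):
    """Fit text plus an optional URL into `limit` characters.

    Hypothetical helper: truncates at a word boundary and appends "..."
    when the text has to be shortened. The URL is never truncated.
    """
    ellipsis = "..."
    suffix = f" {url}" if url else ""
    room = limit - len(suffix)
    if room <= len(ellipsis):
        raise ValueError("limit is too small for the URL")
    if len(text) <= room:
        return text + suffix
    cut = text[: room - len(ellipsis)]
    if " " in cut:
        # Back up to the last space so that no word is split in half.
        cut = cut[: cut.rfind(" ")]
    return cut.rstrip(" ,;:") + ellipsis + suffix


short_url = "http://goo.gl/i5MrxA"
tweet = fit_message("Wajah bulan selalu nampak sama.", short_url)
long_tweet = fit_message("kata " * 60, short_url)           # forced truncation
summary = fit_message("kata " * 60, short_url, limit=500)   # fits the longer format
print(tweet)
print(len(long_tweet), len(summary))
```

The same helper covers both formats by changing `limit`.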
17. Factoid extraction
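Raw text:
Periode rotasi Bulan tidak sama dengan periode rotasi Bumi. Periode rotasi Bumi adalah 24 jam (1 hari), sementara periode rotasi Bulan adalah 27.3 hari. Wajah bulan yang dilihat oleh seluruh manusia di Bumi, baik di Indonesia maupun di belahan Bumi lainnya selalu nampak sama.
(The rotation period of the Moon is not the same as the rotation period of the Earth. The Earth's rotation period is 24 hours (1 day), while the Moon's rotation period is 27.3 days. The face of the Moon seen by everyone on Earth, whether in Indonesia or elsewhere on Earth, always looks the same.)
Extracted factoids:
Periode rotasi Bumi adalah 24 jam (1 hari), sementara periode rotasi Bulan adalah 27.3 hari.
(The Earth's rotation period is 24 hours (1 day), while the Moon's rotation period is 27.3 days.)
Wajah bulan selalu nampak sama.
(The face of the Moon always looks the same.)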
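The slides do not describe the extraction algorithm, so the sketch below is only a much simpler stand-in: it splits the raw text into sentences and keeps those that state a numeric value. The function names are hypothetical. It recovers the first factoid in the example but not the second, which requires sentence compression (dropping the subordinate clause), a harder task that this sketch does not attempt.

```python
import re

RAW_TEXT = (
    "Periode rotasi Bulan tidak sama dengan periode rotasi Bumi. "
    "Periode rotasi Bumi adalah 24 jam (1 hari), sementara periode rotasi "
    "Bulan adalah 27.3 hari. "
    "Wajah bulan yang dilihat oleh seluruh manusia di Bumi, baik di "
    "Indonesia maupun di belahan Bumi lainnya selalu nampak sama."
)


def split_sentences(text):
    """Split after '.', '!' or '?' only when whitespace follows,
    so decimal numbers such as 27.3 stay intact."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def numeric_factoids(text):
    """Keep only the sentences that contain at least one digit."""
    return [s for s in split_sentences(text) if re.search(r"\d", s)]


for factoid in numeric_factoids(RAW_TEXT):
    print(factoid)
```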
18. Utterance pair
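A. Concept: fase bulan (moon phases)
B. Questions:
• Mengapa bulan berbentuk sabit? (Why is the Moon crescent-shaped?)
• Bagaimana bulat sabit bisa terjadi? (How can a crescent moon occur?)
• Bagaimana bulan sabit dapat terjadi? (How can a crescent moon occur?)
• Apa penyebab terjadinya bulan sabit? (What causes a crescent moon?)
• Apakah wajah bulan akan selalu sama? (Will the face of the Moon always be the same?)
• Jelaskan proses terjadinya bulan sabit? (Explain how a crescent moon occurs.)
C. Answer (long):
Fase Bulan (sabit maupun yang lain) terjadi karena kita di Bumi mengamati sinar Matahari jatuh ke Bulan pada sudut pandang yang berbeda-beda. Lebih lanjut, lihat http://langitselatan.com/2012/05/27/apakah-wajah-bulan-selalusama/
(Moon phases, crescent or otherwise, occur because we on Earth observe sunlight falling on the Moon from different viewing angles. For more, see the linked article.)
D. Answer (short):
Itu terjadi karena kita di Bumi mengamati pada sudut pandang yang berbeda-beda. http://goo.gl/i5MrxA
(It happens because we on Earth observe from different viewing angles.)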
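One way to picture an utterance pair is as a small record that groups a concept, its question variants, and the two answer formats. The sketch below is an assumption about how such a record could look, not Waluku's actual schema; the class name, field names, and the exact-match lookup are all hypothetical.

```python
from dataclasses import dataclass, field


def normalize(text):
    """Lowercase, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" ?!.").split())


@dataclass
class UtterancePair:
    # Field names are hypothetical; they mirror parts A to D of the slide.
    concept: str
    questions: list = field(default_factory=list)
    long_answer: str = ""
    short_answer: str = ""

    def matches(self, question):
        """True if the question equals a stored variant after normalization."""
        return normalize(question) in {normalize(q) for q in self.questions}

    def answer(self, for_twitter=True):
        """Pick the short answer for Twitter, the long one otherwise."""
        return self.short_answer if for_twitter else self.long_answer


PAIR = UtterancePair(
    concept="fase bulan",
    questions=[
        "Mengapa bulan berbentuk sabit?",
        "Apa penyebab terjadinya bulan sabit?",
    ],
    long_answer=(
        "Fase Bulan (sabit maupun yang lain) terjadi karena kita di Bumi "
        "mengamati sinar Matahari jatuh ke Bulan pada sudut pandang yang "
        "berbeda-beda."
    ),
    short_answer=(
        "Itu terjadi karena kita di Bumi mengamati pada sudut pandang yang "
        "berbeda-beda. http://goo.gl/i5MrxA"
    ),
)

if PAIR.matches("apa penyebab terjadinya bulan sabit"):
    print(PAIR.answer())
```

Exact matching after normalization only absorbs differences in case and punctuation; handling genuinely different wordings would need a keyword or similarity based matcher.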
19. Crowdsourcing
Technology limitation.
NLP is still considered a hard task (shortening long sentences, extracting factoids, etc.). Constant training and tweaking are required.
Quality improvement.
Help find an adequate response to user input.
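The slides do not detail the crowdsourcing framework. As a rough sketch of the idea, assuming that questions without a stored answer are queued for volunteers, the following hypothetical class keeps a pending queue and accepts contributed answers. All names here are illustrative.

```python
from collections import deque


class CrowdsourcedAnswers:
    """Hypothetical store: unanswered questions wait for a volunteer."""

    def __init__(self):
        self.answers = {}       # normalized question -> answer text
        self.pending = deque()  # questions waiting for a volunteer

    @staticmethod
    def _key(question):
        return question.strip().lower()

    def lookup(self, question):
        """Return a stored answer, or queue the question and return None."""
        key = self._key(question)
        if key in self.answers:
            return self.answers[key]
        if key not in self.pending:
            self.pending.append(key)
        return None

    def contribute(self, question, answer):
        """Record a volunteer's answer and clear the question from the queue."""
        key = self._key(question)
        self.answers[key] = answer
        if key in self.pending:
            self.pending.remove(key)


store = CrowdsourcedAnswers()
first_try = store.lookup("Mengapa bulan berbentuk sabit?")
queued = list(store.pending)
store.contribute("Mengapa bulan berbentuk sabit?", "jawaban relawan")
second_try = store.lookup("mengapa bulan berbentuk sabit?")
print(first_try, queued, second_try)
```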
20. Design limitation
Waluku is a factoid chatbot.
It will not be able to handle casual chat input.
NLP is language dependent.
Indonesian NLP initiatives and support are very limited. We
have to roll our own approach.