The Indonesian public's enthusiasm for machine learning research is on the rise. Together, Kofera and Data Science Indonesia are launching the "Indonesia Open Data Initiative" to lower the barrier to entry in the machine learning research field.
2. Opportunity
• The Indonesian public's enthusiasm for research in the field of technology is on the rise.
• There are many researchers in Indonesia, especially in the field of technology.
• Indonesia has a wide variety of cultures and languages that can be used as research material.
Background: Mars Rover by Wiki Images
3. Problems
● Research results are used only within a limited circle, such as companies and institutions.
● Research is published only as papers, which are of little practical use in everyday life.
● Data and algorithms are not open to the public, so other researchers cannot build on previous studies.
● There is no data standardization for machine learning and artificial intelligence research.
● There is no good national publication center for data science.
● There is little financial support for applied research that has already been done.
4. Goals & Objectives
Goals
1. Build a research environment and culture in Indonesia, especially in the field of technology
2. Standardize research data
3. Build Indonesia’s center for data science & publication
Objectives
1. Build Indonesia’s open data center, which individuals, communities, institutions, and companies can use to obtain and share research data
2. Standardize training data
3. Provide open-source access to data and modules/algorithms
4. Form a community and study groups for researchers
8. Activities Plan
❏ Data acquisition (crowdsourcing)
❏ Data standardization (crowdsourcing)
❏ Continuous model research & development
❏ Continuous model standardization
❏ Weekly or monthly meetups
❏ Discussion forum
❏ Collaborative research & papers
❏ Events: Industry gathering, new research presentation, etc.
9. Data Acquisition
TEXT
1. WordNet for Bahasa Indonesia
2. Bahasa Indonesia corpus
3. Bahasa Indonesia stopword list
4. Translation data (Bahasa Indonesia - other languages)
5. Annotated data for Part of Speech (POS) tagging
6. Sentiment analysis data
7. Question-Answering (QA) data for specific domains (medical, etc.)
8. Etc.
SOUND
1. Text to Speech data (Bahasa Indonesia)
2. Speech translation data (Bahasa Indonesia - other languages)
3. Music data (traditional & modern)
4. Traditional musical instrument sound data
5. Tone emotion recognition data
6. Audio classification data
7. Etc.
IMAGES
1. Labeled image-to-text database (concept-based image processing)
2. Indonesian cultural image data (such as batik, traditional dress, etc.)
3. Indonesian herbal plant image data
4. Handwriting image data for traditional languages
5. Indonesian spatial data
6. Biomedical data
7. Etc.