
Drone Emprit: Concepts and Technology

IT CAMP – BIG DATA & DATA MINING
Onno Center, Situ Gintung, Jakarta, 1 October 2017

Drone Emprit: Concepts and Technology

  1. Drone Emprit: Concepts and Technology. Ismail Fahmi, PhD., Drone Emprit, Media Kernels Indonesia, Ismail.fahmi@gmail.com. IT CAMP – BIG DATA & DATA MINING, Onno Center, Situ Gintung, Jakarta, 1 October 2017
  2. Ismail Fahmi, PhD. (Ismail.fahmi@gmail.com)
     • 1992 – 1997: Bachelor's degree (S1), Electrical Engineering, ITB
     • 2003 – 2004: Master's degree (S2), Computational Linguistics, University of Groningen, the Netherlands
     • 2004 – 2009: Doctorate (S3), Computational Linguistics, University of Groningen, the Netherlands
     • 2000 – 2003: Initiator of IndonesiaDLN (the first Digital Library Network in Indonesia); developed the Ganesha Digital Library (GDL); founded the Knowledge Management Research Group (KMRG) at ITB; built the ITB Digital Library
     • 2009 – present: Engineer at Weborama, a big data company (Paris/Amsterdam)
     • 2012 – present: Co-founder of Awesometrics, a media monitoring and analytics company
     • 2014 – present: Founder of PT. Media Kernels Indonesia, a natural language processing company
     • 2015 – present: Consultant to the National Library (Perpustakaan Nasional); initiator of Indonesia OneSearch
     • 2017 – present: Permanent lecturer, Master's program in Informatics Engineering, Universitas Islam Indonesia
  3. Agenda
     SESSION 1
     • Concepts: About Drone Emprit; Data, the new gold mine; Architecture & Features
     • Technology
       • Crawler: Twitter, Facebook, Online News
       • Indexing: Sharding, Replication
       • Analytics: Sentiment Analysis, Opinion Analysis, Term Extraction, Clustering, Social Network Analysis
       • Visualization
     SESSION 2
     • Case studies: Analysis of the West Java regional election (Pilkada); Analysis of the pro and contra PKI debate; Reading the media's agenda setting
     • Demo: Creating a new monitoring topic; Reading the analysis results; Editing sentiment; Social Network Analysis
  4. About Drone Emprit
  5. Media Kernels a.k.a. Drone Emprit
     • A system for monitoring and analyzing online and social media, built on big data technology.
     • Developed since 2009 in Amsterdam, the Netherlands, by Indonesian engineers, through Media Kernels Netherlands B.V.
     • In use in Indonesia since 2012.
     • Based on Artificial Intelligence (Machine Learning) and Natural Language Processing (NLP).
     • Known as 'Drone Emprit' in coverage by national TV and other national media.
  6. TEMPO cover story (Laporan Utama), 2-8 January 2017
     Topic: Hoax farms on social media
     Media Kernels:
     • Reported under the name 'Drone Emprit'.
     • Presented Social Network Analysis (SNA) maps showing where a hoax originates, how it spreads, who the main influencers are, and which groups they belong to.
     • Issues analyzed include the "10 million Chinese workers" claim and Aleppo (ISIS).
  7. Presidential Staff Office (Kantor Staf Presiden, KSP), 12 January 2017: public relations focus group discussion (FGD) for all ministries and agencies
     Case: A hoax attacking the government about 10 million illegal Chinese workers.
     Media Kernels:
     • Presented two case studies: the 10 million illegal Chinese workers hoax, and negative sentiment toward the anti-hoax movement.
     • Showed the timeline of the issue's resonance and a conversation map built with the SNA feature.
     • Showed that government communication was not very effective, and what could be done to improve it.
  8. MATA NAJWA live, 'Virus Dusta' ('The Lie Virus'), 22 March 2017
     Case: Virus Dusta (a.k.a. hoaxes)
     Speakers:
     • Stanley (Press Council, Dewan Pers)
     • Johan Budi (Presidential Special Staff)
     • Boy Rafli (National Police Public Relations, Humas Polri)
     • Ismail Fahmi (MK)
     • Septiaji & Khairul Anshar (Masyarakat Anti Hoax)
     Media Kernels:
     • Presented an analysis of the 10 million illegal Chinese workers hoax.
     • The "TNI Commander vs PKI" hoax.
  9. Data Is the New Gold
  10. 6 May 2017
  11. Data Collection: Gold = Expensive
  12. Free Data
  13. Twitter Analysis: World Economic Forum 2016. https://medium.com/@swainjo/wef16-davos-twitter-sna-analysis-4c38cf4bc46d
  14. (slide without text)
  15. Architecture
  16. MK Big Data Architecture (diagram labels)
      • Data ingest: News Crawler, Twitter Crawler, Twitter Streaming, FB Page Crawler, Google Custom Search, other sources
      • Management & queue: Data Pipeline
      • Indexing: SOLR Indexer 1, SOLR Indexer 2, SOLR Indexer 3, SOLR Indexer 4
      • Processing: realtime job processing, scheduled job processing, MapReduce, Sentiment Analysis, other processing
      • Platform: Database Framework, Hadoop Framework, physical hardware
      • Data & workflow management
      • Insight and access: Analytics, Visualization, UI
  17. System Architecture (diagram labels)
      • Sources: social media (Twitter, Facebook) via Search + JSON; Twitter Stream via JSON; online news via RSS + HTML (Detik (ID), Reuters (EN), etc.) and HTML (Gatra (ID), Bloomberg (EN), etc.); forums via HTML (Kaskus, Detik Forum, etc.); print as text (Kompas, Warta Ekonomi, etc.); subscriber push via JSON
      • Crawlers: Search + Account Crawler, RSS + HTML Crawler, HTML Crawlers, Converter
      • Index servers: SOLR nodes, Shard 1 to Shard N
      • Queue and storage: Redis Queue, Cache Manager, Projects Storage, Mentions Storage
      • Processing: Keywords + Accounts Filters, deletes, Sentiment Analysis, Sentiment Models, Backtrack Filters, Analyses
      • Clients: control room screens, smartphones, tablets, desktops
  18. Media Kernels Features
      • DASHBOARD: Trends, Comparison
      • NEWS PORTAL: Topic Map, Latest News
      • ANALYTICS: Media, News Sites, Page Ranks, Sentiment Analysis, PF-Chart, Engagement, Exposure
      • TOPICS: Retweets, Replies, Most Shared URLs, Most Shared Videos, Topic Map, Word Cloud
      • INFLUENCERS: Impact, Engagement, Reach, Most Engaged, Followers
      • SNA: Influencer Network, Topic Network, PR-Values, Reach, Hashtags, Posts
      • DEMOGRAPHY: Bubble Map, Twitter User Map, User Locations
      • MENTIONS: Edit Sentiments, Training & Learning, Backtracking
      • COMPARE: Compare SNA, Compare Projects, Popularity vs Favorability
      • REPORTING: Background Jobs, Upload Report, Download Report
      • ADMIN: User Management, Project Management, Client Management, Source Management
      • OPINION ANALYSIS: Label and Training, Opinion Chart, Insight Explorer
  19. News Crawler
  20. Online News: and hundreds of non-mainstream media outlets
  21. Crawling Online News: Crawler, Index Server
  22. Web Crawler Tools. http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/
  23. Web Crawler Tools (2). http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/
  24. Example: Scrapy.org
  25. Drone Emprit Web Crawler: built in-house, powered by:
  26. Anatomy: Metadata and Full Text. Keep: date, title, article body, author, image URL. Discard: ads, headline lists, comments.
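The keep/discard rule on slide 26 can be sketched with Python's standard `html.parser`. This is only an illustration, not the Drone Emprit crawler (which the slides describe as built in-house): the class names `title`, `author`, `article-body`, `ads`, `headlines`, and `comments` are hypothetical, and a real crawler would need per-site rules and tolerance for malformed HTML.

```python
from html.parser import HTMLParser

# Hypothetical class names; every news site needs its own selectors.
SKIP_CLASSES = {"ads", "headlines", "comments"}
FIELD_CLASSES = {"title": "title", "author": "author", "article-body": "body"}
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class ArticleParser(HTMLParser):
    """Collects metadata and full text; ignores ads, headline lists, comments."""

    def __init__(self):
        super().__init__()
        self.record = {"date": None, "title": "", "author": "",
                       "body": "", "image_url": None}
        self.stack = []  # one entry per open tag: "skip", a field name, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        skipping = "skip" in self.stack
        if tag == "time" and not skipping:
            self.record["date"] = attrs.get("datetime")
        if tag == "img" and not skipping and self.record["image_url"] is None:
            self.record["image_url"] = attrs.get("src")
        if tag in VOID_TAGS:
            return  # void elements have no closing tag to track
        if classes & SKIP_CLASSES:
            self.stack.append("skip")
        else:
            field = next((FIELD_CLASSES[c] for c in classes
                          if c in FIELD_CLASSES), None)
            self.stack.append(field)

    def handle_endtag(self, tag):
        # Assumes well-formed HTML: each non-void end tag closes the last open tag.
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if "skip" in self.stack:
            return
        # The innermost enclosing field receives the text.
        field = next((f for f in reversed(self.stack) if f), None)
        text = data.strip()
        if field and text:
            sep = " " if self.record[field] else ""
            self.record[field] += sep + text


def extract_article(html):
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.record


SAMPLE = """
<html><body>
<ul class="headlines"><li>Other headline</li></ul>
<h1 class="title">Example headline</h1>
<span class="author">A. Writer</span>
<time datetime="2017-10-01">1 October 2017</time>
<div class="article-body">
  <img src="https://example.com/photo.jpg">
  <p>First paragraph.</p>
  <div class="ads">Buy now!</div>
  <p>Second paragraph.</p>
</div>
<div class="comments"><p>Nice article</p></div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_article(SAMPLE))
```

Tracking a stack of open tags lets the parser drop everything nested inside a discarded block, which is why the ad placed inside the article body does not leak into the extracted text.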
  27. Twitter API
  28. API: search/tweets
  29. Example: Free Twitter Search. History: 7 days. Start search. 100% results.
  30. API: Realtime (Sample). A random sample of all statuses: less than 10%.
  31. API: Realtime (Filter)
  32. API: Realtime (Filter). POST statuses/filter returns ~100% of the statuses matching the filter, out of all statuses. Filter: max 400 keywords.
  33. API: > 400 keywords? Spread the keywords over several servers with different IP addresses (Server IP Addr 1, Server IP Addr 2, ..., Server IP Addr n), each filtering at most 400 keywords from all statuses.
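The workaround on slide 33 amounts to partitioning the keyword list into batches of at most 400 and giving each batch to its own server. A minimal sketch of that bookkeeping follows; the function names and IP addresses are hypothetical, and the code only plans the assignment, it does not open any streaming connection.

```python
MAX_KEYWORDS_PER_STREAM = 400  # limit quoted on the slide


def partition_keywords(keywords, limit=MAX_KEYWORDS_PER_STREAM):
    """Normalise, de-duplicate (keeping order), and split into batches."""
    cleaned = (k.strip().lower() for k in keywords if k.strip())
    unique = list(dict.fromkeys(cleaned))
    return [unique[i:i + limit] for i in range(0, len(unique), limit)]


def assign_to_servers(keywords, servers, limit=MAX_KEYWORDS_PER_STREAM):
    """Map each server (one IP address, one filter stream) to one batch."""
    batches = partition_keywords(keywords, limit)
    if len(batches) > len(servers):
        raise ValueError(
            f"{len(batches)} batches need {len(batches)} servers, "
            f"only {len(servers)} available"
        )
    return dict(zip(servers, batches))


if __name__ == "__main__":
    # Hypothetical keyword list and server addresses.
    keywords = [f"keyword{i}" for i in range(1000)]
    plan = assign_to_servers(keywords, ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    for server, batch in plan.items():
        print(server, len(batch))
```

De-duplicating before splitting matters in practice, because repeated keywords would otherwise consume part of the 400-keyword budget without widening the filter.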
  34. Twitter API Tools: Net::Twitter
  35. Twitter API in Drone Emprit: Net::Twitter, AnyEvent::Twitter::Stream
  36. Facebook API
  37. FB API (v1): Public Search. April 2014: discontinued by Facebook
  38. FB API (v2): Searching
  39. FB API (v2): Object
      https://graph.facebook.com/$object_id/$type?fields=id,parent_id,from,to,type,status_type,story,message,link,likes.summary(true),shares,comments.order(reverse_chronological).summary(true),created_time,updated_time&order=reverse_chronological&access_token=$access_token&limit=$limit&until=$last_timestamp
      $object_id = FB Page ID, etc.
      $type = [feed, comment, ...]
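The request template on slide 39 can be assembled programmatically. The sketch below only builds the URL string from the slide's parameters and does not call Facebook; the function name `build_graph_url` and the example values are hypothetical, and whether current Graph API versions still accept these fields is outside what the slide covers.

```python
from urllib.parse import urlencode

# Field list copied from the slide.
FIELDS = [
    "id", "parent_id", "from", "to", "type", "status_type", "story",
    "message", "link", "likes.summary(true)", "shares",
    "comments.order(reverse_chronological).summary(true)",
    "created_time", "updated_time",
]


def build_graph_url(object_id, edge, access_token, limit=100, until=None):
    """Build the request URL for one page of an object's edge (e.g. feed)."""
    params = {
        "fields": ",".join(FIELDS),
        "order": "reverse_chronological",
        "access_token": access_token,
        "limit": limit,
    }
    if until is not None:
        # Timestamp of the oldest item already fetched, for paging backwards.
        params["until"] = until
    # Keep commas, dots, and parentheses readable instead of percent-encoding.
    query = urlencode(params, safe=",().")
    return f"https://graph.facebook.com/{object_id}/{edge}?{query}"


if __name__ == "__main__":
    # Hypothetical page id, token, and timestamp.
    print(build_graph_url("12345", "feed", "TOKEN", limit=25, until=1506816000))
```

Passing the last seen timestamp as `until` on each call is what lets a crawler walk backwards through a page's history one batch at a time.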
  40. FB API Tools: Facebook::Graph, fb 0.4.0
  41. FB API in Drone Emprit: built in-house, powered by WWW::Curl
  42. Question: Perl or Python? Of course!
  43. Why Perl? Perl: helped humankind after the fall to earth, and is of course more 'nyunah' (closer to the sunnah). Python: tempted Adam and Eve, who were then sent down from heaven.
  44. Search Engine/Indexing
  45. Full Text Indexing: Data Sources, Search Engine
  46. Full Text Search Engines
  47. Search Engine: Drone Emprit. Simple - Powerful - Robust - Scalable
  48. Solr Server Configuration
  49. Sharding
  50. Replication
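The slides name sharding and replication without further detail. As a rough illustration of the two ideas only (not of how Solr implements them, and not of the Drone Emprit configuration), the toy index below routes each document to one shard by hashing its id, copies every write to all replicas of that shard, and answers a query by asking one replica of every shard. The names `shard_for` and `ToyIndex` are hypothetical.

```python
import hashlib
import itertools


def shard_for(doc_id, num_shards):
    """Stable hash routing: the same id always maps to the same shard."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ToyIndex:
    def __init__(self, num_shards, replicas_per_shard):
        # shards[s][r] is replica r of shard s: a dict of doc_id -> text
        self.shards = [
            [{} for _ in range(replicas_per_shard)] for _ in range(num_shards)
        ]
        self._round_robin = itertools.count()

    def add(self, doc_id, text):
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        for replica in shard:  # replication: every copy receives the write
            replica[doc_id] = text

    def search(self, term):
        pick = next(self._round_robin)  # rotate replicas to spread read load
        needle = term.lower()
        hits = []
        for shard in self.shards:  # sharding: a query consults every shard
            replica = shard[pick % len(shard)]  # but only one copy of each
            hits.extend(d for d, text in replica.items()
                        if needle in text.lower())
        return sorted(hits)


if __name__ == "__main__":
    index = ToyIndex(num_shards=3, replicas_per_shard=2)
    index.add("doc-1", "hoax about workers")
    index.add("doc-2", "election news")
    index.add("doc-3", "another hoax")
    print(index.search("hoax"))
```

Sharding spreads the documents so that no single server has to hold the whole index, while replication adds copies so that reads can be spread out and a failed copy can be replaced.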
  51. Analytics
  52. Analytics: Server Configuration. Slave, Analysis Results, Analysis Processes
  53. Analytics Engine: Search by Keywords (news, tweets, statuses, etc.); Search Results; Segmentation; Sentiment Analysis; Opinion Analysis; Term Extraction; Quote Extraction; Named Entity Recognition
  54. Paragraph Segmentation: News Articles, Mentions
  55. Sentiment Analysis
  56. Sentiment Analysis: positive, negative, or neutral? (Mentions)
  57. Sentiment Analysis: positive? For Setya Novanto (Mentions)
  58. Sentiment Analysis: negative? For KPK (Mentions)
  59. Sentiment Analysis: neutral? For Judge Cepi Iskandar (Mentions)
  60. Sentiment Analysis Techniques. http://www.sciencedirect.com/science/article/pii/S2090447914000550
  61. Evaluation. http://www.sciencedirect.com/science/article/pii/S2090447914000550 A "one model for all" approach cannot assign the right label for every subject. A lexicon-based approach depends on whether the words appear in the sentiment dictionary, and cannot assign the right label for different subjects.
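The limitation stated on slide 61 is easy to reproduce with a toy lexicon-based scorer. In this sketch the lexicon, the example mention, and the function names are all made up for illustration; the point is only that the score depends on dictionary words, so the label is identical whichever subject is asked about.

```python
import re

# Tiny hypothetical sentiment dictionary: word -> polarity.
LEXICON = {"wins": 1, "victory": 1, "good": 1,
           "loses": -1, "defeat": -1, "bad": -1}


def lexicon_label(text):
    """Sum the polarity of every dictionary word found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def label_for_subject(text, subject):
    # The subject is ignored on purpose: a plain lexicon has no notion of
    # who the sentiment is about, which is the weakness the slide points out.
    return lexicon_label(text)


if __name__ == "__main__":
    # Invented mention, loosely modelled on the example in slides 57-59.
    mention = "Setya Novanto wins the pretrial case against KPK"
    for subject in ("Setya Novanto", "KPK", "Judge Cepi Iskandar"):
        print(subject, "->", label_for_subject(mention, subject))
```

A reader would label this mention differently depending on the subject, as slides 57 to 59 suggest, while the lexicon returns the same label for all three. Text without any dictionary word falls back to neutral, which is the other dependency the slide mentions.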
  62. Sentiment Analysis Tools. https://breakthroughanalysis.com/2012/01/08/what-are-the-most-powerful-open-source-sentiment-analysis-tools/ Text Mining Module
  63. Sentiment Analysis: Drone Emprit. Adaptive Multiple Models
  64. Training Data: 81,000. DOI: 10.1109/ICMLA.2015.22
  65. Opinion Analysis
  66. National Police Chief (Kapolri): Opinion Analysis
  67. With the National Police Public Relations Division (DivHumas Polri) on Kompas Petang
  68. MK Opinion Analysis Feature
  69. Analysis of the Statistics
  70. Reading the Voice, Not the Noise
  71. Analysis Skewed by Noise. Unfortunately, it was this noise-based analysis that went viral.
  72. Opinion Analysis Techniques. Drone Emprit: Regular Expression Opinion Analysis
  73. Quote Extraction
  74. Quote Extraction: Quote, Quote Holder
  75. Quote Extraction: Drone Emprit. Pattern matching with regular expressions
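Pattern matching for quotes and quote holders, as named on slides 74 and 75, might look like the sketch below. The two patterns, the verb list (English "said"/"says" plus Indonesian "kata"/"ujar"), and the sample sentences with invented speakers are assumptions made for illustration; the actual Drone Emprit patterns are not shown in the slides.

```python
import re

VERBS = r"(?:said|says|kata|ujar)"
# A holder is a run of capitalised words, e.g. "Budi Santoso".
NAME = r"[A-Z]\w+(?:\s+[A-Z]\w+)*"

# "Quote," said Holder
QUOTE_FIRST = re.compile(
    rf'"(?P<quote>[^"]+?),?"\s*{VERBS}\s+(?P<holder>{NAME})'
)
# Holder said, "Quote"
HOLDER_FIRST = re.compile(
    rf'(?P<holder>{NAME})\s+{VERBS},?\s*:?\s*"(?P<quote>[^"]+)"'
)


def extract_quotes(text):
    """Return (holder, quote) pairs in the order they appear in the text."""
    matches = list(QUOTE_FIRST.finditer(text))
    matches += list(HOLDER_FIRST.finditer(text))
    matches.sort(key=lambda m: m.start())
    return [
        (m.group("holder"), m.group("quote").strip().rstrip(",."))
        for m in matches
    ]


if __name__ == "__main__":
    # Invented sentences and speakers.
    sample = ('"The data will be published next week," said Budi Santoso. '
              'Siti Rahma said, "We need more evidence."')
    for holder, quote in extract_quotes(sample):
        print(f"{holder}: {quote}")
```

Regular expressions like these are cheap and transparent, at the cost of missing constructions the patterns do not anticipate, such as indirect speech or holders referred to by title only.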
  76. Named Entity Recognition
  77. Named Entity Recognition: Location, Person, Organization
  78. NER Tools
  79. NER: Drone Emprit
  80. NER Example
  81. Clustering
  82. Clustering
  83. Clustering Types
  84. Clustering Tools. http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
  85. Topic Map: Document Clustering
  86. Social Network Analysis
  87. SNA: Social Network Analysis
      • SNA maps the relationships among people, organizations, topics, locations, and other information entities.
      • A node (point) in the network represents a person, organization, location, or information entity.
      • The lines connecting the nodes represent the relationships between them.
  88. Betweenness Centrality: a measure of centrality. Highest betweenness centrality (8 connections); lowest betweenness centrality (4 connections)
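Betweenness centrality counts how many shortest paths between other pairs of nodes pass through a node, which is not the same thing as the number of connections a node has. The sketch below computes it with Brandes' algorithm for a small unweighted, undirected graph; the example graph and node names are made up, and the slides do not say which SNA library Drone Emprit uses.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {node: set_of_neighbours}. Returns unnormalised scores."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search from source, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}


# Two triangles joined through a single "bridge" account (invented example).
GRAPH = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "bridge"},
    "bridge": {"c", "d"},
    "d": {"bridge", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}

if __name__ == "__main__":
    for node, score in sorted(betweenness_centrality(GRAPH).items()):
        print(node, score)
```

In this graph the bridge node has only two connections, fewer than `c` or `d`, yet every shortest path between the two triangles runs through it, so it receives the highest betweenness score.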
  89. Anatomy of a Tweet
  90. Anatomy of a Tweet
  91. Retweet Relations
  92. Link Functions: Retweet / Mention
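The retweet and mention links on slide 92 can be turned into network edges roughly as follows. The sketch assumes tweets shaped like the Twitter REST API v1.1 JSON (`user.screen_name`, `retweeted_status`, `entities.user_mentions`); the function name, the sample accounts, and the choice to skip mentions inside retweets are illustrative assumptions rather than the Drone Emprit implementation.

```python
from collections import Counter


def build_networks(tweets):
    """Return two weighted, directed edge lists as Counters:
    retweets[(retweeter, original_author)] and mentions[(author, mentioned)]."""
    retweets, mentions = Counter(), Counter()
    for tweet in tweets:
        author = tweet["user"]["screen_name"]
        original = tweet.get("retweeted_status")
        if original:
            retweets[(author, original["user"]["screen_name"])] += 1
            # A retweet repeats the original's mentions; skip them here so
            # they are not counted again for the retweeter (a design choice).
            continue
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            if mention["screen_name"] != author:
                mentions[(author, mention["screen_name"])] += 1
    return retweets, mentions


# Invented sample data with the assumed field layout.
SAMPLE_TWEETS = [
    {"user": {"screen_name": "alice"},
     "entities": {"user_mentions": [{"screen_name": "bob"}]}},
    {"user": {"screen_name": "carol"},
     "retweeted_status": {"user": {"screen_name": "alice"}},
     "entities": {"user_mentions": [{"screen_name": "alice"},
                                    {"screen_name": "bob"}]}},
    {"user": {"screen_name": "dave"},
     "retweeted_status": {"user": {"screen_name": "alice"}},
     "entities": {"user_mentions": []}},
]

if __name__ == "__main__":
    retweet_edges, mention_edges = build_networks(SAMPLE_TWEETS)
    print("retweet edges:", dict(retweet_edges))
    print("mention edges:", dict(mention_edges))
```

The resulting weighted edges are the input a graph layout or a centrality measure needs: accounts that are retweeted by many others accumulate incoming edges and stand out as influencers.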
  93. Retweet Network
  94. Mention Network
  95. Information Arbitrage
  96. Information arbitrage: translating information across groups
  97. Visualization
  98. User Dashboard: Analysis Results, Slave
  99. Visualization Tools
  100. D3js.org
  101. Drone Emprit Is Hiring: System Administrator & Programmer
  102. Thank You. Ismail Fahmi, PhD, Drone Emprit, PT Media Kernels Indonesia. Email: ismail.fahmi@gmail.com. Mobile: 0812 8908 3894
