The document provides an overview of extracting insights from big data. It discusses sources of big data, the process of extracting information from big data including data acquisition, pre-processing, exploration and modeling, visualization and reporting. It covers characteristics of big and small data, challenges of big data, and methods used in machine learning including clustering, classification, forecasting, and their applications to Twitter data including analysis of people, tweets, and geolocation information. Examples of research using big data from social media are also mentioned.
Getting Insight from Big Data
1. Getting Insight from Big Data
Canggih Puspo Wibowo
Ujang Fahmi
Monthly Research Forum (Forum Penelitian Bulanan), organized by MKP, FISIPOL, UGM
2. Purposes
1. Know and become familiar with the sources of big data
2. Understand the process of extracting information from big data
3. Know the methods commonly used in managing Big Data
4. Run hands-on experiments to understand the data, the process, and the results of data processing through a dashboard
3. What, Why...?
Big data: large volumes of data that organizations produce routinely and that are too complex for standard software packages to process (Mayer-Schönberger & Cukier, 2013).
In the United Kingdom, local governments have been identified as the segment of the public sector that could benefit most from the systematic exploitation of Big Data, with some commentators suggesting it could help them save up to £25.4 billion over five years (Policy Exchange, 2015).
4. Small vs BIG data
Characteristic | Small data | Big data
Data sources | Traditional enterprise data, user studies, personal data | Social media, sensor data, log data, device data, video, images
Volume | Up to gigabytes | Terabytes or more
Velocity | Slow: batch to real time; a fast response is not always needed | Fast: often real time; an immediate response is needed
Variety | Structured, semi-structured, unstructured | Unstructured, structured, multi-structured
Veracity | Easier | Harder
Value | Business intelligence, analysis and reporting | Complex, advanced, predictive business analysis and insights
Scope | Partial: small populations, samples | Exhaustive: continuous streams, entire populations
Privacy | Usually private | Private but also anonymous
Density | Coarse to dense | Dense
Identity | Weak to precise | Precise
Relations | Weak to strong | Strong
Flexibility | Low to middle | High
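30. Tweet Clustering
Sample Tweets
Mas @ujang, ayo main PES ("Mas @ujang, let's play PES")
Dulu pas masih murah gak mau, sekarang? ("Back when it was still cheap you didn't want it, and now?")
Duet CR-Dybala kerenn bangett #forzajuve ("The CR-Dybala duo is sooo cool #forzajuve")
Gaji tenaga pendidik memang harus naik.. ("Educators' salaries really do need to go up..")
Cuma aku kah yang gak nonton avenger? ("Am I the only one who hasn't watched Avengers?")
CBOW lebih mudah konvergen dibanding Skipgram? ("Does CBOW converge more easily than Skipgram?")
Skipgram dengan 100 vektor data sudah cukup kok, kebanyakan nanti bikin lambat ("Skipgram with 100 vectors is already enough; more will just make it slow")
Clusters: Cluster A, Cluster B, Cluster C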
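The slide does not say which algorithm produced Clusters A to C. As a minimal sketch under stated assumptions, the Python below groups the sample tweets with average-linkage agglomerative clustering over bag-of-words vectors and cosine similarity. The tokenizer, the choice of k = 3, and the function names are illustrative assumptions, not the authors' method.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase, keep words, @mentions and #hashtags.
    return re.findall(r"[#@]?\w+", text.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def agglomerative(texts, k):
    """Average-linkage agglomerative clustering; returns one label per text."""
    vecs = [Counter(tokenize(t)) for t in texts]
    clusters = [[i] for i in range(len(texts))]  # start: one cluster per tweet

    def linkage(c1, c2):
        # Average linkage: mean pairwise similarity between members.
        return sum(cosine(vecs[i], vecs[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair; on ties the first pair found wins.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = linkage(clusters[a], clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(texts)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

tweets = [
    "Mas @ujang, ayo main PES",
    "Dulu pas masih murah gak mau, sekarang?",
    "Duet CR-Dybala kerenn bangett #forzajuve",
    "Gaji tenaga pendidik memang harus naik..",
    "Cuma aku kah yang gak nonton avenger?",
    "CBOW lebih mudah konvergen dibanding Skipgram?",
    "Skipgram dengan 100 vektor data sudah cukup kok, kebanyakan nanti bikin lambat",
]
labels = agglomerative(tweets, k=3)
print(labels)
```

With only seven short tweets, just two pairs share a word: the second and fifth tweets ("gak") and the sixth and seventh ("skipgram"). Those merge first; every later merge is a tie at zero similarity, so this run prints [0, 0, 0, 1, 0, 2, 2]. Meaningful clusters need many more tweets or denser representations such as word embeddings.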
31. Tweet Classification
Sample Tweets: the same seven tweets as in slide 30
Classes: Bola (football), Lain-lain (others), Komputer (computers)
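Unlike clustering, classification uses classes fixed in advance (here Bola, Lain-lain, Komputer) and learns them from labeled examples. A minimal sketch, assuming multinomial Naive Bayes with add-one smoothing as the classifier (the slide does not name one). It also assumes hypothetical labels for the seven sample tweets, because the slide does not show which tweet belongs to which class.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Assumed tokenizer: lowercase, keep words, @mentions and #hashtags.
    return re.findall(r"[#@]?\w+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)            # documents per class
        self.word_counts = defaultdict(Counter)        # class -> token counts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {t for counts in self.word_counts.values() for t in counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unseen words
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / self.n_docs)
            for t in tokens:
                score += math.log((counts[t] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

tweets = [
    "Mas @ujang, ayo main PES",
    "Dulu pas masih murah gak mau, sekarang?",
    "Duet CR-Dybala kerenn bangett #forzajuve",
    "Gaji tenaga pendidik memang harus naik..",
    "Cuma aku kah yang gak nonton avenger?",
    "CBOW lebih mudah konvergen dibanding Skipgram?",
    "Skipgram dengan 100 vektor data sudah cukup kok, kebanyakan nanti bikin lambat",
]
# Hypothetical training labels, assigned for illustration only.
labels = ["Bola", "Lain-lain", "Bola", "Lain-lain", "Lain-lain", "Komputer", "Komputer"]

model = NaiveBayes().fit(tweets, labels)
print(model.predict("cbow atau skipgram?"))  # Komputer
print(model.predict("ayo main bola"))        # Bola
```

Words never seen in training (such as "atau" or "bola") are simply skipped, so a prediction rests only on the known words.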
32. Tweet Classification (Sentiment)
Sample Tweets: the same seven tweets as in slide 30
Sentiment classes: Positive, Negative, Neutral
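The slide does not say how sentiment labels were assigned. One simple baseline, swapped in here for illustration, is a lexicon: count positive and negative words and map the net score to Positive, Negative, or Neutral. The two word lists below are hypothetical toy lexicons, not a published Indonesian resource.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration only).
POSITIVE = {"keren", "bagus", "mantap", "suka"}
NEGATIVE = {"lambat", "jelek", "kecewa", "benci"}

def normalize(token):
    # Collapse repeated characters ("kerenn" -> "keren"). Crude: it also alters
    # words that legitimately contain double letters.
    return re.sub(r"(\w)\1+", r"\1", token)

def sentiment(text):
    tokens = [normalize(t) for t in re.findall(r"\w+", text.lower())]
    # Net score = number of positive words minus number of negative words.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

for tweet in [
    "Duet CR-Dybala kerenn bangett #forzajuve",
    "Gaji tenaga pendidik memang harus naik..",
    "Skipgram dengan 100 vektor data sudah cukup kok, kebanyakan nanti bikin lambat",
]:
    print(sentiment(tweet), "|", tweet)
```

The three tweets come out Positive, Neutral, and Negative. The last one shows the limits of a lexicon: it is advice rather than a complaint, yet "lambat" (slow) makes it Negative, and negations such as "gak" (not) are ignored entirely.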
33. Tweet Topic Model: ALL Tweets Approach
Sample Tweets: the same seven tweets as in slide 30
Topic A: bola (football), game, main (play)
Topic B: film, main (play), uang (money)
Topic C: vektor (vector), skipgram, cbow
Topic D: …
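The ALL Tweets approach summarizes the whole collection as a few topics, each described by its most frequent words. A common model for this is Latent Dirichlet Allocation (LDA). The slide does not name the model, so the sketch below assumes LDA fitted with collapsed Gibbs sampling; the stopword list, the number of topics (4), the hyperparameters alpha = 0.1 and beta = 0.01, and the iteration count are illustrative assumptions.

```python
import random
import re
from collections import Counter

# Hypothetical minimal stopword list for the sample tweets (an assumption).
STOPWORDS = {"mas", "ayo", "pas", "gak", "mau", "yang", "aku", "kah", "cuma",
             "dengan", "sudah", "kok", "nanti", "lebih", "memang", "harus"}

def tokenize(text):
    # Lowercase; keep words, @mentions and #hashtags; drop stopwords.
    return [w for w in re.findall(r"[#@]?\w+", text.lower()) if w not in STOPWORDS]

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns topic-word and doc-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    n_dk = [[0] * n_topics for _ in docs]        # tokens of doc d assigned to topic k
    n_kw = [Counter() for _ in range(n_topics)]  # occurrences of word w in topic k
    n_k = [0] * n_topics                         # total tokens assigned to topic k
    z = []                                       # current topic of every token
    topics = range(n_topics)

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                           / (n_k[t] + vocab_size * beta) for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_kw, n_dk

def top_words(n_kw, n=3):
    # Most frequent words per topic, skipping entries whose count fell to zero.
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in n_kw]

tweets = [
    "Mas @ujang, ayo main PES",
    "Dulu pas masih murah gak mau, sekarang?",
    "Duet CR-Dybala kerenn bangett #forzajuve",
    "Gaji tenaga pendidik memang harus naik..",
    "Cuma aku kah yang gak nonton avenger?",
    "CBOW lebih mudah konvergen dibanding Skipgram?",
    "Skipgram dengan 100 vektor data sudah cukup kok, kebanyakan nanti bikin lambat",
]
docs = [tokenize(t) for t in tweets]
n_kw, n_dk = lda_gibbs(docs, n_topics=4)
for k, words in enumerate(top_words(n_kw)):
    print(f"Topic {'ABCD'[k]}: {', '.join(words)}")
```

On seven tweets the sampled topics are noisy and change with the seed. The slide's topic words (for example bola, game, uang) do not occur in the sample tweets, so they presumably come from a much larger collection.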
34. Tweet Topic Model: EACH Tweet Approach
Sample Tweets: the same seven tweets as in slide 30
Topics for each tweet:
Topic A, Topic E, Topic B, ..
Topic C, Topic A, Topic F, ..
Topic A, Topic B, Topic C, …
…
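In the EACH Tweet approach, every tweet gets its own ranked list of topics. If the model is LDA (an assumption; the slide does not say), a tweet's topic mixture can be estimated from the topics assigned to its tokens as θ_k = (n_k + α) / (N + Kα), where n_k is the number of the tweet's tokens assigned to topic k, N is the tweet's token count, and K is the number of topics. Sorting topics by θ_k yields a list like "Topic A, Topic E, Topic B, ..". The assignment list below is hypothetical.

```python
import string
from collections import Counter

def topic_proportions(assignments, n_topics, alpha=0.1):
    """Estimate one tweet's topic mixture theta from the topic index assigned
    to each of its tokens: theta_k = (n_k + alpha) / (N + K * alpha)."""
    counts = Counter(assignments)
    total = len(assignments) + n_topics * alpha
    return [(counts[k] + alpha) / total for k in range(n_topics)]

def rank_topics(assignments, n_topics, alpha=0.1):
    """Return topic labels ("Topic A", ...) sorted by descending proportion."""
    theta = topic_proportions(assignments, n_topics, alpha)
    # sorted() is stable, so topics with equal proportions keep index order.
    order = sorted(range(n_topics), key=lambda k: -theta[k])
    return ["Topic " + string.ascii_uppercase[k] for k in order]

# Hypothetical topic assignments for a six-token tweet in a six-topic model.
tweet_assignments = [0, 4, 0, 1, 4, 0]
print(rank_topics(tweet_assignments, n_topics=6))
# ['Topic A', 'Topic E', 'Topic B', 'Topic C', 'Topic D', 'Topic F']
```

The α term keeps every topic's share above zero, so topics absent from a short tweet still appear at the end of its ranking rather than being dropped.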
39. Research: Health
Modeling Spread of Disease from Social Interactions
Sadilek, A., Kautz, H. A., & Silenzio, V. (2012, June). Modeling spread of disease from social interactions. In ICWSM (pp. 322-329).
40. Research: Linguistics
Diffusion of Lexical Change in Social Media
Eisenstein, J., O'Connor, B., Smith, N. A., & Xing, E. P. (2014). Diffusion of lexical change in social media. PLoS ONE, 9(11), e113114.
41. Research: Psychology
Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena
Bollen, J., Mao, H., & Pepe, A. (2011). Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena. ICWSM, 11, 450-453.
42. Research: Economics
Twitter Mood Predicts the Stock Market
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.