SlideShare a Scribd company logo
Twitter: A source of Big Data in a
NSO.
Experimental Indicators in Sentiment and Mobility
The ESS Big Data Workshop 2016
Objective
To share the experience of INEGI in the use of Twitter as a
big data source.
Types of Big Data Sources
• Meters and smart sensors. Traffic cameras, GPS devices, power
consume meters, IoT, smartwatches, smartphones, etc.
• Social interactions. Conversations and publications on social
networks like Twitter, Facebook, FourSquare, etc.
• Business transactions. Credit cards movements, scanned data,
cell phone records, etc.
• Electronic files. Documents which are available in electronic
formats such as PDF files, websites, videos, audio, images, photos,
etc.
• Broadcast media. Digital video and audio streamed on real-time
Types of Big Data Sources
• Meters and smart sensors. Traffic cameras, GPS devices,
power consume meters, IoT, smartwatches, smartphones, etc.
• Social interactions. Conversations and publications on social
networks like Twitter, Facebook, FourSquare, etc.
• Business transactions. Credit cards movements, electronic
cash registers, cell phone records, etc.
• Electronic files. Documents which are available in electronic
formats such as PDF files, websites, videos, audio, digital media
broadcasting
• Broadcast media. Digital video and audio streamed on real-time
Process Followed by INEGI (until now)
Production process
Study Case
• Initial objective of INEGI’s Big Data Project: To generate
experimental indicators using Big Data techniques with
social media data, to complement statistical information
obtained from traditional methods and sources.
• Initial Goal: To obtain indicators of subjective wellbeing
from social media data sources.
Why did we choose Twitter as a data
source?
• It’s a widely adopted social network where you can find content
written by common people
• Tweets are public, so we can use them without concerns about
privacy
• There is a free API which allows to get up to 1% of the tweets
that are being produced on real time
• ( https://dev.twitter.com/streaming )
In the beginning…
February 2014 – October 2015
24/7
Collection / Analysis infrastructure
Since October 2015 - …
(EMC / VMWare)
LOGSTASH
Location Query
Free Access
https://twittercommunity.com/t/stream-filter-sample-size/30865
Apache Spark
Clean & Sentiment
Analysis
Tweets
Daily
Processing
(4 a.m.)
300 K
Geo-Tweets
Minimal
Representation
~260 Millions
of Geo-Tweets
~130 Millions inside Mexico
~ 2 Years and 8 Months ~ 24/7
Multivariate Stratification
of blocks in Apache Spark
%Acceso a Internet, %Pc, %Telefono Celular, %Automovil
Software Stack (2016)
Tweet Structure
Tweets Map
Tweets Map
We found that
• The JSON structure is easy to process (APACHE SPARK)
• The content is text that we can examine to make the
sentiment analysis
• Geographical coordinates can be used to filter the tweets
and obtain only those of interest (warning: not all the
tweets are geo referenced)
• We can make mobility analysis, based in Tweets’ location
and time (a serendipity)
Exploring Big Data Base
Supervised Training
Manual Tagging
5000 people (TecMilenio students),
100 tweets tagged by each one,
each tweet was tagged nine times,
about 40,000 different tagged tweets,
interpreted accordingly to regional
idioms
http://cienciadedatos.inegi.org.mx/animotuitero/
They’re not enchiladas!
Training and modeling Collaborative
Research
Python
Implementation
Sentiment Visualization (2015)
(Monthly Indicator)
http://www.inegi.org.mx/inegi/contenidos/investigacion/experimentales/animotuitero/default.aspx
{JSON:File}
Sentiment Visualization (Late 2016)
(Daily Indicator)
C#
{RESTful:API}
{NoSQL}
Mobility (Late 2016)
Integration of other sources
Big Spatial Join
(5 M Economic Units with +60 M Tweets)
https://github.com/syoummer/SpatialSpark
Some applications
• Tourism
• Migration
• Use of roads
• Regional influence of big cities
• Mobility patterns
• Business activity patterns
• Subjective wellbeing
• Inequities impact
• Impact analysis of relevant
news
• Mental health
• Misogynist/discriminatory
language use
• SDG indicators?
Collaboration
• International
– UNECE
• ICHEC
– UNSD
– LAMBDoop
– University of Pensylvania
• National
– KioNetworks
– Dattlas
– TecMilenio
– INFOTEC
– Centro Geo
– CIDE
– CIMAT
– Sectur
• Internal
– INEGI General Directorates
Questions?
Abel.Coronado@inegi.org.mx
M.Sc. Abel Coronado
@abxda
Conociendo México
01 800 111 46 34
www.inegi.org.mx
atencion.usuarios@inegi.org.mx
@inegi_informa INEGI Informa

More Related Content

What's hot

The Critical Role of IoT Data Integration to develop Big Data Applications (f...
The Critical Role of IoT Data Integration to develop Big Data Applications (f...The Critical Role of IoT Data Integration to develop Big Data Applications (f...
The Critical Role of IoT Data Integration to develop Big Data Applications (f...
Rainer Sternfeld
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
Splunk
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
Revolution Analytics
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
Christopher Williams
 
Exploring the Great Olympian Graph
Exploring the Great Olympian GraphExploring the Great Olympian Graph
Exploring the Great Olympian Graph
Neo4j
 
From Developer to Data Scientist
From Developer to Data ScientistFrom Developer to Data Scientist
From Developer to Data Scientist
Gaines Kergosien
 
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
Schneller Nutzen mit Neo4j: das Beispiel Panama PapersSchneller Nutzen mit Neo4j: das Beispiel Panama Papers
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
Neo4j
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
South London Geek Nights
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databases
Marissa Kobylenski
 
A Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's DatasetsA Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's Datasets
Oscar Corcho
 
Hardcore Data Science - in Practice
Hardcore Data Science - in PracticeHardcore Data Science - in Practice
Hardcore Data Science - in Practice
Mikio L. Braun
 
It takes a village (to raise a ML model)
It takes a village (to raise a ML model)It takes a village (to raise a ML model)
It takes a village (to raise a ML model)
Anselmo Rodrigues da Silva
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT Operations
Neo4j
 
Big data - An Introduction
Big data - An IntroductionBig data - An Introduction
Big data - An Introduction
Spotle.ai
 
What you need to know to start an AI company?
What you need to know to start an AI company?What you need to know to start an AI company?
What you need to know to start an AI company?
Mo Patel
 
Data analytics using the cloud challenges and opportunities for india
Data analytics using the cloud   challenges and opportunities for india Data analytics using the cloud   challenges and opportunities for india
Data analytics using the cloud challenges and opportunities for india
Ajay Ohri
 
Skillshare - Let's talk about R in Data Journalism
Skillshare - Let's talk about R in Data JournalismSkillshare - Let's talk about R in Data Journalism
Skillshare - Let's talk about R in Data Journalism
School of Data
 
Resume (kaushik shakkari)
Resume (kaushik shakkari)Resume (kaushik shakkari)
Resume (kaushik shakkari)
Kaushik Shakkari
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
Nguyen Cao
 
Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open data
Connected Data World
 

What's hot (20)

The Critical Role of IoT Data Integration to develop Big Data Applications (f...
The Critical Role of IoT Data Integration to develop Big Data Applications (f...The Critical Role of IoT Data Integration to develop Big Data Applications (f...
The Critical Role of IoT Data Integration to develop Big Data Applications (f...
 
Getting Started with Splunk Breakout Session
Getting Started with Splunk Breakout SessionGetting Started with Splunk Breakout Session
Getting Started with Splunk Breakout Session
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge2017-01-08-scaling tribalknowledge
2017-01-08-scaling tribalknowledge
 
Exploring the Great Olympian Graph
Exploring the Great Olympian GraphExploring the Great Olympian Graph
Exploring the Great Olympian Graph
 
From Developer to Data Scientist
From Developer to Data ScientistFrom Developer to Data Scientist
From Developer to Data Scientist
 
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
Schneller Nutzen mit Neo4j: das Beispiel Panama PapersSchneller Nutzen mit Neo4j: das Beispiel Panama Papers
Schneller Nutzen mit Neo4j: das Beispiel Panama Papers
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databases
 
A Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's DatasetsA Linked Data Dataset for Madrid Transport Authority's Datasets
A Linked Data Dataset for Madrid Transport Authority's Datasets
 
Hardcore Data Science - in Practice
Hardcore Data Science - in PracticeHardcore Data Science - in Practice
Hardcore Data Science - in Practice
 
It takes a village (to raise a ML model)
It takes a village (to raise a ML model)It takes a village (to raise a ML model)
It takes a village (to raise a ML model)
 
Network and IT Operations
Network and IT OperationsNetwork and IT Operations
Network and IT Operations
 
Big data - An Introduction
Big data - An IntroductionBig data - An Introduction
Big data - An Introduction
 
What you need to know to start an AI company?
What you need to know to start an AI company?What you need to know to start an AI company?
What you need to know to start an AI company?
 
Data analytics using the cloud challenges and opportunities for india
Data analytics using the cloud   challenges and opportunities for india Data analytics using the cloud   challenges and opportunities for india
Data analytics using the cloud challenges and opportunities for india
 
Skillshare - Let's talk about R in Data Journalism
Skillshare - Let's talk about R in Data JournalismSkillshare - Let's talk about R in Data Journalism
Skillshare - Let's talk about R in Data Journalism
 
Resume (kaushik shakkari)
Resume (kaushik shakkari)Resume (kaushik shakkari)
Resume (kaushik shakkari)
 
Introduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & ApplicationsIntroduction to Big Data Technologies & Applications
Introduction to Big Data Technologies & Applications
 
Autodiscovery or The long tail of open data
Autodiscovery or The long tail of open dataAutodiscovery or The long tail of open data
Autodiscovery or The long tail of open data
 

Viewers also liked

Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIEjemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Abel Alejandro Coronado Iruegas
 
Big data taller inegi sedesol
Big data taller inegi sedesolBig data taller inegi sedesol
Big data taller inegi sedesol
Abel Alejandro Coronado Iruegas
 
PresentacionParaINFOTEC
PresentacionParaINFOTECPresentacionParaINFOTEC
PresentacionParaINFOTEC
Abel Alejandro Coronado Iruegas
 
Scala 1
Scala 1Scala 1
Realidades y Sueños de Big Data en México
Realidades y Sueños de Big Data en MéxicoRealidades y Sueños de Big Data en México
Realidades y Sueños de Big Data en México
Abel Alejandro Coronado Iruegas
 
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
Abel Alejandro Coronado Iruegas
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Abel Alejandro Coronado Iruegas
 
Big data lead colmex
Big data lead colmexBig data lead colmex
Big data lead colmex
Abel Alejandro Coronado Iruegas
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1 Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Abel Alejandro Coronado Iruegas
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
Abel Alejandro Coronado Iruegas
 
Que es big data huejutla uaeh
Que es big data huejutla uaehQue es big data huejutla uaeh
Que es big data huejutla uaeh
Abel Alejandro Coronado Iruegas
 
Explorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUsExplorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUs
Abel Alejandro Coronado Iruegas
 
¿Qué es big data?
¿Qué es big data?¿Qué es big data?
¿Qué es big data?
Abel Alejandro Coronado Iruegas
 
Revelando los secretos de twitter en México sg virtual
Revelando los secretos de twitter en México sg virtualRevelando los secretos de twitter en México sg virtual
Revelando los secretos de twitter en México sg virtual
Abel Alejandro Coronado Iruegas
 
Anatomía de un proyecto de Big Data
Anatomía de un proyecto de Big DataAnatomía de un proyecto de Big Data
Anatomía de un proyecto de Big Data
Abel Alejandro Coronado Iruegas
 
Geo Big Data 2015
Geo Big Data 2015 Geo Big Data 2015
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
Abel Alejandro Coronado Iruegas
 
Revelando los secretos de las redes sociales
Revelando los secretos de las redes socialesRevelando los secretos de las redes sociales
Revelando los secretos de las redes sociales
Abel Alejandro Coronado Iruegas
 
Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014
Abel Alejandro Coronado Iruegas
 

Viewers also liked (19)

Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGIEjemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
Ejemplos de Proyectos de Ciencia de Datos y Big Data en el INEGI
 
Big data taller inegi sedesol
Big data taller inegi sedesolBig data taller inegi sedesol
Big data taller inegi sedesol
 
PresentacionParaINFOTEC
PresentacionParaINFOTECPresentacionParaINFOTEC
PresentacionParaINFOTEC
 
Scala 1
Scala 1Scala 1
Scala 1
 
Realidades y Sueños de Big Data en México
Realidades y Sueños de Big Data en MéxicoRealidades y Sueños de Big Data en México
Realidades y Sueños de Big Data en México
 
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
Revelando los secretos de las redes sociales, Universidad Autónoma de Aguasca...
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2Taller de Big Data y Ciencia de Datos en COLMEX dia 2
Taller de Big Data y Ciencia de Datos en COLMEX dia 2
 
Big data lead colmex
Big data lead colmexBig data lead colmex
Big data lead colmex
 
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1 Taller de Big Data y Ciencia de Datos en COLMEX dia 1
Taller de Big Data y Ciencia de Datos en COLMEX dia 1
 
Big data big opportunities
Big data big opportunitiesBig data big opportunities
Big data big opportunities
 
Que es big data huejutla uaeh
Que es big data huejutla uaehQue es big data huejutla uaeh
Que es big data huejutla uaeh
 
Explorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUsExplorando Big Data y Ciencia de Datos con GPUs
Explorando Big Data y Ciencia de Datos con GPUs
 
¿Qué es big data?
¿Qué es big data?¿Qué es big data?
¿Qué es big data?
 
Revelando los secretos de twitter en México sg virtual
Revelando los secretos de twitter en México sg virtualRevelando los secretos de twitter en México sg virtual
Revelando los secretos de twitter en México sg virtual
 
Anatomía de un proyecto de Big Data
Anatomía de un proyecto de Big DataAnatomía de un proyecto de Big Data
Anatomía de un proyecto de Big Data
 
Geo Big Data 2015
Geo Big Data 2015 Geo Big Data 2015
Geo Big Data 2015
 
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
Big Data, Revelando los secretos de twitter, CIMAT Zacatecas 2014
 
Revelando los secretos de las redes sociales
Revelando los secretos de las redes socialesRevelando los secretos de las redes sociales
Revelando los secretos de las redes sociales
 
Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014Revelando los secretos de twitter, Festival de Software Libre 2014
Revelando los secretos de twitter, Festival de Software Libre 2014
 

Similar to INEGI ESS big data workshop

Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Enrico Motta
 
Information Management Trends 2009
Information Management Trends 2009Information Management Trends 2009
Information Management Trends 2009
Christopher Eagle
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
 
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Paolo Nesi
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Artificial Intelligence Institute at UofSC
 
SFX, Metalib, mobile services and social networks
SFX, Metalib, mobile services and social networksSFX, Metalib, mobile services and social networks
SFX, Metalib, mobile services and social networks
Ian Clark
 
MAX: Realtime messaging and activity stream engine
MAX: Realtime messaging and activity stream engineMAX: Realtime messaging and activity stream engine
MAX: Realtime messaging and activity stream engine
sneridagh
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analytics
MJ Xavier
 
Itsme_MBD@DomusAcademy04/2009
Itsme_MBD@DomusAcademy04/2009Itsme_MBD@DomusAcademy04/2009
Itsme_MBD@DomusAcademy04/2009
Itsmecommunity
 
Social Media Analytics for Official Statistics
Social Media Analytics for Official StatisticsSocial Media Analytics for Official Statistics
Social Media Analytics for Official Statistics
Ismail Fahmi
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
Andrea Volpini
 
FinalPPT-StJoseph (3).pptx
FinalPPT-StJoseph (3).pptxFinalPPT-StJoseph (3).pptx
FinalPPT-StJoseph (3).pptx
ssuser046cf5
 
SuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalSuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-final
stelligence
 
(Some of) Wikipedia's Open Data
(Some of) Wikipedia's Open Data(Some of) Wikipedia's Open Data
(Some of) Wikipedia's Open Data
nuria_ruiz
 
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city ServicesOntology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Paolo Nesi
 
Wherever Your Patrons Are: Mobile Services for Libraries
Wherever Your Patrons Are: Mobile Services for LibrariesWherever Your Patrons Are: Mobile Services for Libraries
Wherever Your Patrons Are: Mobile Services for Libraries
Meredith Farkas
 
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform WorldMichael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
Radiocamp 2011
 
Transmission Media for Networking
Transmission Media for NetworkingTransmission Media for Networking
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
Net7
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysis
AntaraBhattacharya12
 

Similar to INEGI ESS big data workshop (20)

Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
Research in Intelligent Systems and Data Science at the Knowledge Media Insti...
 
Information Management Trends 2009
Information Management Trends 2009Information Management Trends 2009
Information Management Trends 2009
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
Snap4City: Smart City IOT/IOE Platform scalable Smart aNalytic APplication bu...
 
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
Citizen Sensing: Opportunities and Challenges in Mining Social Signals and Pe...
 
SFX, Metalib, mobile services and social networks
SFX, Metalib, mobile services and social networksSFX, Metalib, mobile services and social networks
SFX, Metalib, mobile services and social networks
 
MAX: Realtime messaging and activity stream engine
MAX: Realtime messaging and activity stream engineMAX: Realtime messaging and activity stream engine
MAX: Realtime messaging and activity stream engine
 
Data sciences and marketing analytics
Data sciences and marketing analyticsData sciences and marketing analytics
Data sciences and marketing analytics
 
Itsme_MBD@DomusAcademy04/2009
Itsme_MBD@DomusAcademy04/2009Itsme_MBD@DomusAcademy04/2009
Itsme_MBD@DomusAcademy04/2009
 
Social Media Analytics for Official Statistics
Social Media Analytics for Official StatisticsSocial Media Analytics for Official Statistics
Social Media Analytics for Official Statistics
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
FinalPPT-StJoseph (3).pptx
FinalPPT-StJoseph (3).pptxFinalPPT-StJoseph (3).pptx
FinalPPT-StJoseph (3).pptx
 
SuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-finalSuanIct-Bigdata desktop-final
SuanIct-Bigdata desktop-final
 
(Some of) Wikipedia's Open Data
(Some of) Wikipedia's Open Data(Some of) Wikipedia's Open Data
(Some of) Wikipedia's Open Data
 
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city ServicesOntology Building vs Data Harvesting and Cleaning for Smart-city Services
Ontology Building vs Data Harvesting and Cleaning for Smart-city Services
 
Wherever Your Patrons Are: Mobile Services for Libraries
Wherever Your Patrons Are: Mobile Services for LibrariesWherever Your Patrons Are: Mobile Services for Libraries
Wherever Your Patrons Are: Mobile Services for Libraries
 
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform WorldMichael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
Michael Yoch (NPR) - The NPR API: Powering "Radio" in a Multiplatform World
 
Transmission Media for Networking
Transmission Media for NetworkingTransmission Media for Networking
Transmission Media for Networking
 
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
Presentation of context: Web Annotations (& Pundit) during the StoM Project (...
 
Python report on twitter sentiment analysis
Python report on twitter sentiment analysisPython report on twitter sentiment analysis
Python report on twitter sentiment analysis
 

More from Abel Alejandro Coronado Iruegas

Mobility Master Class.pdf
Mobility Master Class.pdfMobility Master Class.pdf
Mobility Master Class.pdf
Abel Alejandro Coronado Iruegas
 
Live UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de MexicoLive UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de Mexico
Abel Alejandro Coronado Iruegas
 
Cubo de datos uaemex
Cubo de datos uaemexCubo de datos uaemex
Cubo de datos uaemex
Abel Alejandro Coronado Iruegas
 
Geo Big Data 4 Datalab
Geo Big Data 4 DatalabGeo Big Data 4 Datalab
Geo Big Data 4 Datalab
Abel Alejandro Coronado Iruegas
 
Catedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBEROCatedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBERO
Abel Alejandro Coronado Iruegas
 
Integrating eo with official statistics using machine learning in mexico geo ...
Integrating eo with official statistics using machine learning in mexico geo ...Integrating eo with official statistics using machine learning in mexico geo ...
Integrating eo with official statistics using machine learning in mexico geo ...
Abel Alejandro Coronado Iruegas
 
Machine learning and Satellite Images
Machine learning and Satellite ImagesMachine learning and Satellite Images
Machine learning and Satellite Images
Abel Alejandro Coronado Iruegas
 
El Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de MexicoEl Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de Mexico
Abel Alejandro Coronado Iruegas
 
No Sql
No SqlNo Sql
Cubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de MexicoCubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de Mexico
Abel Alejandro Coronado Iruegas
 
Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0
Abel Alejandro Coronado Iruegas
 
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAnalisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Abel Alejandro Coronado Iruegas
 

More from Abel Alejandro Coronado Iruegas (12)

Mobility Master Class.pdf
Mobility Master Class.pdfMobility Master Class.pdf
Mobility Master Class.pdf
 
Live UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de MexicoLive UAEMex Cubo de Datos Geoespaciales de Mexico
Live UAEMex Cubo de Datos Geoespaciales de Mexico
 
Cubo de datos uaemex
Cubo de datos uaemexCubo de datos uaemex
Cubo de datos uaemex
 
Geo Big Data 4 Datalab
Geo Big Data 4 DatalabGeo Big Data 4 Datalab
Geo Big Data 4 Datalab
 
Catedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBEROCatedra INEGI Big Data en IBERO
Catedra INEGI Big Data en IBERO
 
Integrating eo with official statistics using machine learning in mexico geo ...
Integrating eo with official statistics using machine learning in mexico geo ...Integrating eo with official statistics using machine learning in mexico geo ...
Integrating eo with official statistics using machine learning in mexico geo ...
 
Machine learning and Satellite Images
Machine learning and Satellite ImagesMachine learning and Satellite Images
Machine learning and Satellite Images
 
El Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de MexicoEl Cubo de Datos Geoespaciales de Mexico
El Cubo de Datos Geoespaciales de Mexico
 
No Sql
No SqlNo Sql
No Sql
 
Cubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de MexicoCubo de Datos Geoespaciales de Mexico
Cubo de Datos Geoespaciales de Mexico
 
Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0Congreso UAA 2018 Animo Tuitero 2 0
Congreso UAA 2018 Animo Tuitero 2 0
 
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en MexicoAnalisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
Analisis del Sentimiento en el Estado de Animo de los Tuiteros en Mexico
 

Recently uploaded

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
aqzctr7x
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 

Recently uploaded (20)

Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 

INEGI ESS big data workshop

  • 1. Twitter: A source of Big Data in a NSO. Experimental Indicators in Sentiment and Mobility The ESS Big Data Workshop 2016
  • 2. Objective To share the experience of INEGI in the use of Twitter as a big data source.
  • 3. Types of Big Data Sources • Meters and smart sensors. Traffic cameras, GPS devices, power consume meters, IoT, smartwatches, smartphones, etc. • Social interactions. Conversations and publications on social networks like Twitter, Facebook, FourSquare, etc. • Business transactions. Credit cards movements, scanned data, cell phone records, etc. • Electronic files. Documents which are available in electronic formats such as PDF files, websites, videos, audio, images, photos, etc. • Broadcast media. Digital video and audio streamed on real-time
  • 4. Types of Big Data Sources • Meters and smart sensors. Traffic cameras, GPS devices, power consume meters, IoT, smartwatches, smartphones, etc. • Social interactions. Conversations and publications on social networks like Twitter, Facebook, FourSquare, etc. • Business transactions. Credit cards movements, electronic cash registers, cell phone records, etc. • Electronic files. Documents which are available in electronic formats such as PDF files, websites, videos, audio, digital media broadcasting • Broadcast media. Digital video and audio streamed on real-time
  • 5. Process Followed by INEGI (until now)
  • 7. Study Case • Initial objective of INEGI’s Big Data Project: To generate experimental indicators using Big Data techniques with social media data, to complement statistical information obtained from traditional methods and sources. • Initial Goal: To obtain indicators of subjective wellbeing from social media data sources.
  • 8. Why did we choose Twitter as a data source? • It’s a widely adopted social network where you can find content written by common people • Tweets are public, so we can use them without concerns about privacy • There is a free API which allows to get up to 1% of the tweets that are being produced on real time • ( https://dev.twitter.com/streaming )
  • 9. In the beginning… February 2014 – October 2015 24/7
  • 10. Collection / Analysis infrastructure Since October 2015 - … (EMC / VMWare) LOGSTASH Location Query Free Access https://twittercommunity.com/t/stream-filter-sample-size/30865 Apache Spark Clean & Sentiment Analysis Tweets Daily Processing (4 a.m.) 300 K Geo-Tweets Minimal Representation ~260 Millions of Geo-Tweets ~130 Millions inside Mexico ~ 2 Years and 8 Months ~ 24/7
  • 11. Multivariate Stratification of blocks in Apache Spark %Acceso a Internet, %Pc, %Telefono Celular, %Automovil
  • 16. We found that • The JSON structure is easy to process (APACHE SPARK) • The content is text that we can examine to make the sentiment analysis • Geographical coordinates can be used to filter the tweets and obtain only those of interest (warning: not all the tweets are geo referenced) • We can make mobility analysis, based in Tweets’ location and time (a serendipity)
  • 18. Supervised Training Manual Tagging 5000 people (TecMilenio students), 100 tweets tagged by each one, each tweet was tagged nine times, about 40,000 different tagged tweets, interpreted accordingly to regional idioms http://cienciadedatos.inegi.org.mx/animotuitero/ They’re not enchiladas!
  • 19. Training and modeling Collaborative Research Python Implementation
  • 20. Sentiment Visualization (2015) (Monthly Indicator) http://www.inegi.org.mx/inegi/contenidos/investigacion/experimentales/animotuitero/default.aspx {JSON:File}
  • 21. Sentiment Visualization (Late 2016) (Daily Indicator) C# {RESTful:API} {NoSQL}
  • 23. Integration of other sources Big Spatial Join (5 M Economic Units with +60 M Tweets) https://github.com/syoummer/SpatialSpark
  • 24. Some applications • Tourism • Migration • Use of roads • Regional influence of big cities • Mobility patterns • Business activity patterns • Subjective wellbeing • Inequities impact • Impact analysis of relevant news • Mental health • Misogynist/discriminatory language use • SDG indicators?
  • 25. Collaboration • International – UNECE • ICHEC – UNSD – LAMBDoop – University of Pensylvania • National – KioNetworks – Dattlas – TecMilenio – INFOTEC – Centro Geo – CIDE – CIMAT – Sectur • Internal – INEGI General Directorates
  • 27. Conociendo México 01 800 111 46 34 www.inegi.org.mx atencion.usuarios@inegi.org.mx @inegi_informa INEGI Informa