Social media with big data analytics

Social Media with Big Data Analytics
Mohammed Zuhair Al-Taie
Big Data Centre - Universiti Teknologi Malaysia - 2016

AGENDA
Web 2.0
Social Media
Big Data
Social Media with Big Data Analytics
Social Network Analysis
Sentiment Analysis

WHAT IS WEB 2.0?
Web 2.0 is a complex, organic online conversation.
Web 2.0 is powered by:
• Social Networks
• News and Bookmarking
• Blogs
• Microblogging
• Video/Photo-sharing
• Message Boards
• Wikis
• Virtual reality
• Social gaming
• Podcasts
• Really Simple Syndication (RSS)
• Social Media Press Releases

TECHNOLOGY OVERVIEW
Web 2.0 websites typically include some of the following features/techniques, known by the acronym SLATES:
Search: The ease of finding information through keyword search
Links: Ad-hoc guides to other relevant information
Authoring: The ability to create and constantly update content on a platform, so that content shifts from being the creation of a few to being constantly updated, interlinked work
Tags: Categorization of content by simple, one-word, user-determined descriptions (tags) that facilitate searching and avoid rigid, pre-made categories
Extensions: Powerful algorithms that leverage the Web as an application platform as well as a document server
Signals: The use of RSS technology to rapidly notify users of content changes
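The Tags and Search features can be shown with a minimal Python sketch of a folksonomy-style tag index. Users attach free-form tags to items, and a tag search returns every item carrying that tag, with no predefined category tree. The item ids and tags below are hypothetical sample data, and the in-memory dictionary stands in for whatever index a real site would use.

```python
from collections import defaultdict

# Hypothetical sample: each item carries free-form, user-chosen tags
items = {
    "post-1": ["bigdata", "hadoop"],
    "post-2": ["sentiment", "twitter"],
    "post-3": ["hadoop", "mapreduce"],
}

def build_tag_index(tagged_items):
    """Map each tag to the set of item ids carrying it (no fixed category tree)."""
    index = defaultdict(set)
    for item_id, tags in tagged_items.items():
        for tag in tags:
            index[tag.lower()].add(item_id)
    return index

def search_by_tag(index, tag):
    """Return item ids for a tag, sorted for stable output."""
    # .get() does not trigger the defaultdict factory, so unknown tags add no keys
    return sorted(index.get(tag.lower(), set()))

index = build_tag_index(items)
print(search_by_tag(index, "Hadoop"))  # ['post-1', 'post-3']
```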

WEB 2.0 TECHNOLOGIES: SOCIAL MEDIA
Social media is an umbrella term that defines the various activities that integrate technology, social interaction, and the construction of words, pictures, videos and audio.

In simple language…
“Creation of web content, by the people, for the people”

UNSTRUCTURED DATA
• The variety of sources from which data is generated has also undergone a shift.
• The types of data being created have changed from structured to semi-structured to unstructured data.
• There is a need to manage a broad range of data types.
• Analytic queries must be processed across numerous data types.
• The need to extract meaningful analysis from this data has led several technologies to gain traction.
• Examples include NoSQL databases to store unstructured data, as well as innovative processing methods such as Hadoop and massively parallel processing (MPP).
Today, 80% of the data existing in any enterprise is unstructured.
Unstructured data from social media has to be approached in a non-traditional manner.
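This is a minimal Python sketch of how the three data types differ when you try to read them. Structured data arrives with a fixed schema (CSV). Semi-structured data is self-describing but may be nested or missing fields (JSON). Unstructured text has no schema, so features have to be extracted from it. All three samples are hypothetical. The hashtag regex is just one simple feature, not a complete text-analytics approach.

```python
import csv
import io
import json
import re

# Hypothetical samples of the three data types
structured_csv = "user_id,likes\n42,17\n"                          # fits a table
semi_structured_json = '{"user": {"id": 42}, "tags": ["hadoop"]}'  # self-describing, nested
unstructured_text = "Loving the new #Hadoop release!! :)"           # free text

def from_structured(text):
    """Parse CSV rows into dicts; the schema comes from the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def from_semi_structured(text):
    """Parse JSON; fields may be nested or optional, so read them defensively."""
    doc = json.loads(text)
    return {"user_id": doc.get("user", {}).get("id"), "tags": doc.get("tags", [])}

def from_unstructured(text):
    """No schema at all: extract whatever features we can, e.g. hashtags."""
    return {"hashtags": [h.lower() for h in re.findall(r"#(\w+)", text)]}

print(from_structured(structured_csv))
print(from_semi_structured(semi_structured_json))
print(from_unstructured(unstructured_text))
```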

IDENTIFYING UNSTRUCTURED DATA SOURCES
Facebook
- User likes and favorites
- Article/video/link shares
- Views
- Comments
- Location / geospatial
Twitter (tweet characteristics)
- Length
- Language model
- Semantics
- Emoticons
- Location / geospatial
Google / YouTube
- Blogs
- Comments
- Search statistics
- Likes vs. dislikes
- Shares / views / comments
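Several of the tweet characteristics listed above can be measured directly from the raw text. This minimal Python sketch computes length, hashtags, mentions and emoticons. The emoticon pattern is a small hypothetical subset. Language models and semantics need far more machinery and are not attempted here.

```python
import re

# Hypothetical emoticon pattern covering a few common forms like :) ;-) :D =P
EMOTICON_RE = re.compile(r"(?::|;|=)-?[()DPp]")

def tweet_features(text):
    """Extract simple surface characteristics of a tweet."""
    return {
        "length": len(text),                    # character count
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
        "emoticons": EMOTICON_RE.findall(text),
    }

features = tweet_features("Great talk on #BigData by @utm :) :-D")
print(features)
```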

BIG DATA IS…
“Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…

THE GLOBAL DATA GROWTH
Implication for an organization
Year                2009   2011   2015   2020
Zettabytes          0.8    1.9    7.9    35.0
CAGR (2009-2020): 41.0%
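The 41.0% CAGR follows from the endpoints of the series. The compound annual growth rate is (end / start)^(1 / years) − 1. The short Python check below uses only the slide's own figures: 0.8 ZB in 2009 and 35.0 ZB in 2020, which is 11 years.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1

# Figures from the slide: 0.8 ZB in 2009, 35.0 ZB in 2020
rate = cagr(0.8, 35.0, 2020 - 2009)
print(f"{rate:.1%}")  # 41.0%
```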

NORTH AMERICA & EUROPE DRIVE THE BIG DATA OPPORTUNITY WITH OVER 85% OF THE WORLD’S DATA
Figure: world map of data volumes by region (North America, South America, Europe, Middle East, India, China, Japan), with the following regional annotations:
• Key verticals: Healthcare, Manufacturing, Retail, Digital Marketing. Demand trend: high demand for Big Data analytics.
• Key verticals: Telecom, Retail, Banking. Demand trend: still embryonic; most organizations have a wait-and-watch approach.
• Demand trend: current demand appears to be limited; however, a lack of skills may drive outsourcing of Big Data analytics. Low awareness levels.
• Key verticals: Technology, Financial services, Oil & Gas, Utilities, Manufacturing. Demand trend: European MNCs are still in the early stages of the adoption cycle.
• Key verticals: Manufacturing, Telecom, Health & Life Sciences. Demand trend: demand for BI to derive operational efficiency.
• Key verticals: Telecom, Bioinformatics, Retail. Demand trend: the industry is in a nascent stage, with demand catching up, particularly in retail.

BIG DATA TOOLS
The Hadoop Distributed File System (HDFS): HDFS divides the data into smaller parts and distributes them across the various servers/nodes.
SQL Server Integration Services and Apache Flume: These tools allow posts to be downloaded and loaded into Hadoop.
MapReduce: MapReduce is a process that transforms data loaded into Hadoop into a format that can be used for analysis.
Hive: A runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform.
Jaql: Jaql converts high-level queries into low-level queries and …
Zookeeper: Zookeeper coordinates parallel processing across big clusters.
HBase: HBase is a column-oriented database management system that sits on top of HDFS and uses a non-SQL approach.
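The MapReduce entry describes transforming loaded data into an analysable form. The classic illustration is a word count, shown here as a minimal single-process Python sketch of the three phases (map, shuffle, reduce). This is plain Python, not Hadoop's Java API. In Hadoop the same phases run distributed across the cluster's nodes, and the posts below are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(post):
    """Map: emit (word, 1) for every word in one post."""
    return [(word.lower(), 1) for word in post.split()]

def shuffle(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)

# Hypothetical input; in Hadoop each split would go to a separate mapper
posts = ["big data big value", "social data"]
mapped = chain.from_iterable(map_phase(p) for p in posts)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'big': 2, 'data': 2, 'value': 1, 'social': 1}
```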

BIG DATA IS OFTEN DESCRIBED USING FIVE Vs
Volume, Velocity, Variety, Veracity, Value

BIG DATA: VOLUME
Volume refers to the vast amounts of data generated every second. We are not talking about terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology.

BIG DATA: VELOCITY
Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyse data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
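Analysing data while it is being generated usually means keeping a small amount of state in memory instead of writing every message to a database first. This minimal Python sketch counts events (for example, tweets) over a sliding time window. It assumes event timestamps arrive in non-decreasing order. The window length and arrival times are hypothetical.

```python
from collections import deque

class SlidingWindowCounter:
    """Count events seen in the last `window` seconds, keeping only what the window needs."""

    def __init__(self, window=60.0):
        self.window = window
        self.timestamps = deque()

    def add(self, ts):
        """Record one event at time `ts` (seconds); assumes in-order arrival."""
        self.timestamps.append(ts)
        self._evict(ts)

    def count(self, now):
        """Number of events in the interval (now - window, now]."""
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Drop events that have fallen out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()

counter = SlidingWindowCounter(window=60.0)
for t in [0.0, 10.0, 30.0, 65.0]:   # hypothetical arrival times in seconds
    counter.add(t)
print(counter.count(now=65.0))  # 3
```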

BIG DATA: VARIETY
Variety refers to the different types of data we can now use. In the past we focused only on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world’s data is unstructured (text, images, video, voice, etc.).

BIG DATA: VERACITY
Veracity refers to the messiness or trustworthiness of the data. With many forms of big data, quality and accuracy are less controllable (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech, as well as the reliability and accuracy of content), but technology now allows us to work with this type of data.
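Working with messy posts usually starts with normalization. This minimal Python sketch lowercases text, removes URLs and @mentions, keeps hashtag words, shortens exaggerated letter repetition and expands abbreviations. The abbreviation dictionary is a hypothetical stand-in for a curated resource, and the cleaning steps are one reasonable choice, not a standard pipeline.

```python
import re

# Hypothetical abbreviation dictionary; real systems use larger, curated lists
ABBREVIATIONS = {"gr8": "great", "u": "you", "thx": "thanks", "lol": "laughing out loud"}

def normalize_tweet(text):
    """Lowercase, drop URLs and @mentions, keep hashtag words, expand abbreviations."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs carry little meaning here
    text = re.sub(r"@\w+", " ", text)            # remove user mentions
    text = re.sub(r"#(\w+)", r"\1", text)        # '#bigdata' -> 'bigdata'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # 'sooooo' -> 'soo'
    tokens = re.findall(r"[a-z0-9']+", text)
    return [ABBREVIATIONS.get(tok, tok) for tok in tokens]

print(normalize_tweet("@utm thx, #BigData talk was gr8!!! http://t.co/x"))
```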

BIG DATA: VALUE
Then there is another V to take into account when looking at Big Data: Value! Having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.

THE INTERSECTION OF SOCIAL MEDIA AND BIG DATA

BIG DATA & REAL TIME USE
Big Data velocity enabling real-time use of data
• Big Data is also characterized by velocity or speed, i.e. the frequency of data generation or the frequency of data delivery.
• New-age communication channels such as mobile phones, emails and social networking have increased the rate of information flows.
Examples:
• Telcos adopting location-based marketing based on user location sensed by mobile towers
• Satellite images can help monitor and analyze troop movements, a flood plain, cloud patterns, or forest fires
• Video analysis systems could monitor a sensitive or valuable facility, watching for possible intruders and alerting authorities in real time
Data velocity per minute:
• 600+ videos on YouTube
• 200 million+ emails sent
• 2 million+ Google search queries
• 400,000+ minutes of Skype calling
• 400,000+ tweets on Twitter
• US$300,000+ spent on online shopping
• 700,000+ Facebook updates
• 7,000+ photos on Flickr
• 1,500+ blog posts
• 3,500+ ticks per minute in securities trading

BIG DATA FOR SOCIAL MEDIA ANALYTICS PROCESS MODEL

CONCEPTUAL VIEW OF FRAMEWORK FOR BIG DATA EXTRACTION, MESSAGING AND STORE
This phase follows a composite pattern based on store-and-explore and focuses on obtaining and storing the relevant data from sources outside our organization.
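The store-and-explore pattern can be sketched in miniature. Raw posts from an outside source are stored as-is alongside a few extracted columns, so they can be explored later with ad-hoc queries. This Python sketch uses the standard-library `sqlite3` module with an in-memory database as a stand-in for a big data store such as HDFS, HBase or Hive. The posts, table and column names are hypothetical.

```python
import json
import sqlite3

# Hypothetical raw posts as they might arrive from an external social media source
raw_posts = [
    '{"id": 1, "author": "a", "text": "loving #hadoop", "lang": "en"}',
    '{"id": 2, "author": "b", "text": "big data talk today", "lang": "en"}',
    '{"id": 3, "author": "a", "text": "datos masivos", "lang": "es"}',
]

# Store: keep the raw JSON alongside a few extracted columns for later re-exploration
conn = sqlite3.connect(":memory:")  # stand-in for a real big data store
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, lang TEXT, raw TEXT)")
for line in raw_posts:
    doc = json.loads(line)
    conn.execute(
        "INSERT INTO posts (id, author, lang, raw) VALUES (?, ?, ?, ?)",
        (doc["id"], doc["author"], doc["lang"], line),
    )
conn.commit()

# Explore: ad-hoc queries over what was stored
rows = conn.execute(
    "SELECT lang, COUNT(*) FROM posts GROUP BY lang ORDER BY lang"
).fetchall()
print(rows)  # [('en', 2), ('es', 1)]
```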

CONCEPTUAL VIEW OF DISCUSSION TOPIC AND OPINION ANALYSIS COMPONENT
This phase follows a composite pattern based on purposeful-and-predictive analytics to gain advanced insight.
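The deck does not say how discussion topics are detected. A very simple stand-in is to rank non-stop-word terms by frequency across posts, which this Python sketch does. Real systems often use topic models such as LDA instead. The stop-word list and posts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real pipelines use a fuller list per language
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in", "on", "for", "this", "it", "so"}

def top_terms(posts, n=3):
    """Rank candidate discussion topics by how often each non-stop-word term appears."""
    counts = Counter()
    for post in posts:
        tokens = re.findall(r"[a-z']+", post.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

posts = [
    "The new phone battery is terrible",
    "Battery life on this phone is great",
    "Camera and battery both improved",
]
print(top_terms(posts, n=2))  # [('battery', 3), ('phone', 2)]
```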

WHAT IS HADOOP?
* Hadoop is an open-source framework used for storing and processing large-scale data sets on large clusters of commodity hardware.
* A key feature of Hadoop is HDFS, which stores data across large numbers of commodity machines and provides very high aggregate bandwidth across the cluster.
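The way HDFS spreads data over commodity machines can be pictured with a minimal Python sketch. A file is split into fixed-size blocks, and each block is copied to several distinct nodes. Round-robin placement is a simplification made here: real HDFS placement is rack-aware and decided by the NameNode. The 4-byte block size is a toy value, since HDFS uses far larger blocks (128 MB by default in recent versions) and a default replication factor of 3. The node names are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified: real HDFS placement is rack-aware and managed by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

data = b"x" * 10                                   # hypothetical 10-byte file
blocks = split_into_blocks(data, block_size=4)     # toy block size
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
print([len(b) for b in blocks])  # [4, 4, 2]
print(placement[0])              # ['node1', 'node2', 'node3']
```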

CONCEPTUAL VIEW OF APACHE HADOOP ARCHITECTURE

CONCEPTUAL VIEW OF DATA VISUALIZATION AND DECISION-MAKING COMPONENT
This phase follows a composite pattern based on actionable analysis, with the aim of identifying the next best actions so that related customers can be led to take appropriate actions.

WHY SOCIAL NETWORK ANALYSIS MATTERS

SOCIAL NETWORK ANALYSIS: THE NEW SCIENCE OF NETWORKS
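One of the most basic social network analysis measures is degree centrality. It is the share of other users each user is directly connected to, which helps spot highly connected accounts. This Python sketch computes it from an undirected edge list, using the normalization degree / (n − 1) that networkx also uses. The interaction edges are hypothetical (for example, "A mentioned or retweeted B").

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as an edge list."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:                     # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical interactions between users
edges = [("ali", "siti"), ("ali", "chen"), ("ali", "raj"), ("siti", "chen")]
scores = degree_centrality(edges)
print(max(scores, key=scores.get), scores["ali"])  # ali 1.0
```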

SENTIMENT ANALYSIS
Sentiment analysis…
• Analyzes people’s sentiments, opinions, appraisals, attitudes, evaluations, and emotions
• Towards entities such as organizations, products, services, individuals, topics, issues, events, and their attributes
• As presented online via text, video and other means of communication.
• These communications can fall into three broad categories: positive, neutral or negative.
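The three-way positive/neutral/negative split can be illustrated with a lexicon-based classifier. Word scores are summed, and a simple negator flips the next word's polarity. This is a minimal Python sketch. The tiny lexicon and negator list are hypothetical, and practical systems use curated lexicons or trained models.

```python
import re

# Hypothetical mini-lexicon; real systems use curated lexicons or trained classifiers
LEXICON = {"good": 1, "great": 2, "love": 2, "happy": 1,
           "bad": -1, "terrible": -2, "hate": -2, "slow": -1}
NEGATORS = {"not", "never", "no"}

def classify(text):
    """Return 'positive', 'neutral' or 'negative' from summed word scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value              # flip polarity after a simple negator
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for post in ["I love this phone", "Service was not good", "Arrived on Tuesday"]:
    print(post, "->", classify(post))
```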

LEVEL OF ANALYSIS
We can inquire about sentiment at various linguistic levels:
• Words – objective, positive, negative, neutral
• Clauses – “going out of my mind”
• Sentences – possibly multiple sentiments
• Documents
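A single document can contain several sentiments, which only become visible when you score it sentence by sentence. This minimal Python sketch splits a review into sentences, scores each one with a hypothetical word lexicon, and then sums the scores into a document-level score. The naive punctuation-based sentence splitter is an assumption that will miss abbreviations and similar cases.

```python
import re

# Hypothetical word scores; illustrative only
LEXICON = {"great": 2, "love": 2, "good": 1, "poor": -1, "awful": -2, "disappointing": -2}

def sentence_scores(document):
    """Split a document into sentences and score each one separately."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    results = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        results.append((sentence, sum(LEXICON.get(t, 0) for t in tokens)))
    return results

review = "The screen is great. The battery is awful! Overall it is good."
per_sentence = sentence_scores(review)
document_score = sum(score for _, score in per_sentence)
for sentence, score in per_sentence:
    print(score, sentence)
print("document:", document_score)
```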

Figure: Elections 2012 Dashboard (filter by Facebook, Twitter, Google; panels: Mitt Romney, Republican Primary, Democratic Vote, Republican Vote, Democratic Sentiment, Republican Sentiment)

TRUTHY: A SOCIAL MEDIA RESEARCH PROJECT
Truthy is a research project to study how memes spread on social
media. A meme is a transmissible unit of information, such as a hashtag,
phrase, or link. This website highlights some of the research coming from
this effort and showcases some visualizations, tools, and data resources
demonstrating broader impacts of the project.
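To study how a meme spreads, a basic first step is to track, for each meme, when it first appeared and how many distinct users have used it. This Python sketch does that for hashtags only, one of the meme types named above. It illustrates the idea and is not Truthy's actual methodology. The (timestamp, user, text) posts are hypothetical.

```python
import re
from collections import defaultdict

def meme_spread(posts):
    """For each hashtag, record when it first appeared and how many distinct users used it."""
    first_seen = {}
    users = defaultdict(set)
    for timestamp, user, text in sorted(posts):   # process in time order
        for tag in re.findall(r"#(\w+)", text.lower()):
            first_seen.setdefault(tag, timestamp)
            users[tag].add(user)
    return {tag: {"first_seen": first_seen[tag], "distinct_users": len(users[tag])}
            for tag in first_seen}

# Hypothetical (timestamp, user, text) tuples
posts = [
    (1, "u1", "Vote today #Election"),
    (2, "u2", "#election turnout looks high"),
    (3, "u1", "Still queuing #election"),
    (5, "u3", "New phone #gadget"),
]
print(meme_spread(posts))
```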
