SMA-Unit-I: The Foundation for Analytics

Data Analytics
Accumulation of raw data captured from various sources (e.g., discussion boards, emails, exam logs, chat logs in e-learning systems) can be used to identify fruitful patterns and relationships.
Exploratory visualization – uses exploratory data analytics to capture relationships that are unknown, or at least not yet formally formulated.
Confirmatory visualization – theory-driven; used to examine relationships that have been formulated in advance.
© 2012 Dr. Yair Levy and Dr. Michelle M. Ramim – Chais 2012 Conference, February 16, 2012.
What Can Be Learned From Data Sets?
Business Intelligence (BI) or Business Analytics (BA)
Data analytics is an emerging technique that dives into a data set without a prior set of hypotheses.
The data yield meaningful trends or intriguing findings that were not previously seen or empirically validated.
Data analytics enables quick decisions or helps change policies based on the trends observed.
The Power of Data Analytics
Source: http://mobile.informationweek.com/80256/show/488f5c42fd3f92317e5ac29faeee033e/
Data Analytics vs. Statistical Analysis

Statistical Analysis
• Utilizes statistical and/or mathematical techniques
• Used based on a theoretical foundation
• Seeks to identify a significance level to address hypotheses or research questions (RQs)

Data Analytics
• Utilizes data mining techniques
• Identifies inexplicable or novel relationships/trends
• Seeks to visualize the data to allow the observation of relationships/trends
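To make the contrast concrete, the sketch below (not from the slides) runs both styles of analysis on a small synthetic e-learning dataset: a theory-driven t-test evaluated at a significance level, followed by an exploratory clustering with no prior hypothesis. The data, the activity threshold, and the libraries (NumPy, SciPy, scikit-learn) are all assumptions for illustration.

# A minimal sketch contrasting confirmatory statistical analysis with
# exploratory data analytics on hypothetical e-learning records.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Hypothetical data: forum posts per student and final exam scores.
posts = rng.poisson(lam=12, size=200)
scores = 55 + 1.5 * posts + rng.normal(0, 10, size=200)

# Confirmatory / statistical analysis: a theory-driven hypothesis
# ("active forum users score higher") tested at a significance level.
active = scores[posts >= np.median(posts)]
passive = scores[posts < np.median(posts)]
t_stat, p_value = stats.ttest_ind(active, passive, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}  (reject H0 if p < 0.05)")

# Exploratory data analytics: no prior hypothesis, just look for structure
# (here, clusters of students) that can then be visualized and interpreted.
X = np.column_stack([posts, scores])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for k in range(3):
    print(f"cluster {k}: {np.sum(labels == k)} students, "
          f"mean score {scores[labels == k].mean():.1f}")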
The Foundation for Analytics
The difference between social media analysis and traditional business intelligence is the competitive information that is readily available on social media.
Historically, companies have mostly dealt with their own data, and eventually added studies based on external data published by research organizations and so forth. Even in the digital age, web site analysis and digital advertising still bind most of the useful data to its owners, and being competitive has never been an easy task.
Competitive intelligence and other aspects of social media data and analytics create a new context for social media and a data-driven strategy.
Social Media Data Sources: Offline and Online
• Offline originated data – Data that has been generated with no connection to the Internet and then registered into a system, which may be accessed via the Internet later. Examples include physical retail, printed press, live events, telephone marketing and customer support, traditional television audience measurements, and so forth.
• Online originated data – Created from systems connected to the Internet. Examples include web sites, e-commerce, media streaming, e-mail, mobile applications, social media, online devices, and so forth.
Within these systems we might also have different points of data generation. The social media post itself is one source, for example, while the comments under that same post come from a different source. Considering everything that happens on social media, with all the different types of interactions and rapid sharing of content, there are many data sources within each social media network.
Defining Social Media Data
Within online generated data, only a portion is considered truly social media data. There is always a gray area of debate about what social media is and what it is not. To keep it simple, we can define social media data as data generated within a self-named social media platform.
Data Sources in Social Media Channels
Users of a social media channel can:
• Create a profile.
• Publish content.
• Praise or react to content (e.g., likes, favorites, etc.).
• Comment on content.
• Share content.
• Create groups and content only available to the group.
• Send direct messages and chat with other users.
• Connect to another profile (as a friend, follower, etc.).
• Purchase products and perform transactions.
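Each of these interaction types is a distinct data source. As a rough illustration (not from the text), a collection pipeline might normalize them into a single event record; the field names and enum values below are hypothetical.

# Hypothetical sketch of normalizing different social media interactions
# (posts, reactions, comments, shares, ...) into one event record.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class InteractionType(Enum):
    PROFILE_CREATED = "profile_created"
    POST = "post"
    REACTION = "reaction"
    COMMENT = "comment"
    SHARE = "share"
    DIRECT_MESSAGE = "direct_message"
    CONNECTION = "connection"
    PURCHASE = "purchase"

@dataclass
class SocialEvent:
    platform: str                 # e.g., "facebook", "instagram"
    interaction: InteractionType  # which data source the event came from
    actor_id: str                 # user who performed the action
    target_id: Optional[str]      # post, profile, or product acted upon
    timestamp: datetime
    payload: dict                 # raw platform-specific fields

# Example: a "like" on a post, recorded as a reaction event.
event = SocialEvent(
    platform="facebook",
    interaction=InteractionType.REACTION,
    actor_id="user_123",
    target_id="post_456",
    timestamp=datetime(2024, 5, 1, 10, 30),
    payload={"reaction": "like"},
)
print(event.interaction.value, event.target_id)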
Estimated Data Sources and Factual Data Sources
• It is important to understand when data sources represent a fact and when they represent a possible fact or an estimate. Especially when going into paid media and content promotion, we come across many estimated, or sometimes questionable, sources.
• These estimated metrics include views, impressions, and reach, for example, which give the number of times a user has potentially seen a certain piece of content or advertising.
• With a large amount of high-quality data, we can reach very high-quality estimates, to the point where our estimates are correct and statistically validated every single time. That is the point where machines take over the decision-making process and automation kicks in to take care of such tasks.
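As a small illustration (the metric names and values are hypothetical), a reporting layer can carry an explicit flag so that factual counts and platform-estimated metrics are never mixed silently:

# Hypothetical sketch: tag each metric as a fact or an estimate so reports
# make the distinction explicit.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    value: float
    is_estimate: bool  # True for reach/impressions-style metrics

metrics = [
    Metric("likes", 1_240, is_estimate=False),        # factual: counted events
    Metric("comments", 87, is_estimate=False),        # factual: counted events
    Metric("impressions", 54_000, is_estimate=True),  # estimated by the platform
    Metric("reach", 31_500, is_estimate=True),        # estimated by the platform
]

for m in metrics:
    label = "estimate" if m.is_estimate else "fact"
    print(f"{m.name:12s} {m.value:>10,.0f}  ({label})")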
Public and Private Data
• Public data is what anyone can see when navigating a social platform.
• Private data is what only the owner of the social media profile can see.
• If we are looking for competitive analytics and social media benchmarking, which is really a must-do for optimizing a social strategy, we need public data. Public-level data allows us to see the performance of our competitors' social media channels, or any of the available information that does not belong to us.
• Two points are important to remember:
  1. What is public can be easily compared.
  2. What is private, if it does not belong to us, can only be estimated.
• Some services, for example, offer to detect paid posts on Facebook. Paid information is private, so it is not available if we don't own the data. Paid-post detection therefore offers an estimate based on a machine learning process.
Data Gathering in Social Media Analytics
Data can be gathered in two ways when it comes to pulling human-action data from social media:
• Via an API (application programming interface)
• Via web crawling or scraping
API: Application Programming Interface
An API is a structured channel of access into an application. It allows a programmer to see a clear structure of the information that is stored in the application, and this structure points the programmer straight to the data that he or she is looking for. Facebook offers API access to its data: a programmer can look into Facebook and request any specific piece of information, for example, “total likes for a certain page post”.
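As a rough sketch of what such a request looks like (the endpoint version, field names, access token, and post ID below are placeholders and are not guaranteed to match the current Graph API), a programmer might ask Facebook's Graph API for the like count of a single post:

# Hypothetical sketch of an API request for a post's like count.
# The endpoint, fields, token, and post ID are placeholders; the real
# Graph API requires a registered app, permissions, and a valid token.
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"   # assumption: obtained via a Facebook app
POST_ID = "1234567890_9876543210"    # assumption: a page post you can access

url = f"https://graph.facebook.com/v19.0/{POST_ID}"
params = {
    "fields": "likes.summary(true)",  # ask the API to summarize total likes
    "access_token": ACCESS_TOKEN,
}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()

# The structured response points straight to the figure we asked for.
total_likes = data["likes"]["summary"]["total_count"]
print(f"Total likes for post {POST_ID}: {total_likes}")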
Web Crawling or Scraping
Everything we see on the Internet has source code driving it: a set of instructions for all the systems connecting and interacting with the data. A good-looking web site full of images has a set of hidden instructions telling the browser how to display all that information.
A programmer can tap into the source code and then crawl it for any specific information needed. Other terms are also used to describe this process, such as scraping. Web crawling/scraping is a very fragile and unstable way of gathering data, because when anything changes on the web site, the source code changes, and the programmer has to reprogram a new way to crawl for the information.
It is also likely that crawling will bump into the privacy rules of web sites, and the owners of those web sites will not like it very much. So whenever possible, API access is the way to go; APIs are what most analytics platforms rely upon.
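For contrast, here is a minimal scraping sketch. The URL and the HTML structure are hypothetical, and the selector will break as soon as the page layout changes, which is exactly the fragility described above.

# Hypothetical scraping sketch: fetch a page and pull a value out of its HTML.
# The URL and the CSS selector are assumptions about one page's layout; if the
# site changes its markup, this selector silently stops matching.
import requests
from bs4 import BeautifulSoup

URL = "https://example.com/some-public-page"  # placeholder URL

response = requests.get(URL, timeout=10, headers={"User-Agent": "demo-crawler"})
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Assumption: the page exposes a like count in an element such as
# <span class="like-count">42</span>. Real pages rarely make it this easy.
element = soup.select_one("span.like-count")
if element is None:
    print("Selector no longer matches; the page layout may have changed.")
else:
    print("Scraped like count:", element.get_text(strip=True))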
