Analysis of the article
Big Data hype (or reality)
by Gregory Piatetsky-Shapiro
Gregory Piatetsky-Shapiro
Gregory I. Piatetsky-Shapiro is a data scientist and the co-founder of the KDD conferences and of ACM SIGKDD, the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining.
Big data is a term that describes the large volume of
data – both structured and unstructured – that
inundates a business on a day-to-day basis. But it’s not
the amount of data that’s important. It’s what
organizations do with the data that matters. Big data
can be analyzed for insights that lead to better
decisions and strategic business moves.
Big data offers unprecedented awareness of phenomena, particularly of consumers' actions and attitudes.
Three areas where better prediction of consumer behavior would clearly be valuable:
1) Film ratings
2) Churn prediction
3) Web advertising response
Case #1: Film Ratings
“Film ratings are critical for a company that thrives when people consume more content.”
This is a prediction challenge.
Netflix launched a competition to improve on the
Cinematch algorithm it had developed over many
years. It released a record-large (for 2007) dataset,
with about 480,000 anonymized users, 17,770
movies, and user/movie ratings ranging from 1 to 5
(stars).
The error of Netflix's own algorithm was about 0.95 (measured as
root-mean-square error, or RMSE), meaning that its predictions tended to
be off by almost a full “star.” The Netflix Prize of $1 million
would go to the first algorithm to reduce that error by just
10%, to about 0.86.
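The prize arithmetic can be sketched in a few lines of Python. This is a minimal illustration: `rmse` and `prize_target` are hypothetical helper names, and the ratings are made-up toy values, not Netflix data. Ten percent below 0.95 works out to 0.855, which the article rounds to about 0.86.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def prize_target(baseline_rmse, improvement=0.10):
    """RMSE a submission must reach to beat the baseline by `improvement`."""
    return baseline_rmse * (1 - improvement)

# Hypothetical toy ratings on a 1-5 star scale (not Netflix data).
actual = [4, 3, 5, 2, 1]
predicted = [3, 3, 4, 3, 2]

print(round(rmse(predicted, actual), 4))  # 0.8944
print(round(prize_target(0.95), 3))       # 0.855
```

An RMSE near 0.9 on this toy set means a typical prediction misses by almost one star. That is the sense in which Cinematch's 0.95 was "off by almost a full star."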
It took about three years before the BellKor’s Pragmatic Chaos
team managed to win the prize with a score of 0.8567 RMSE.
The winning algorithm was a very complex ensemble of many
different approaches — so complex that it was never
implemented by Netflix.
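The idea of an ensemble can be shown with the simplest possible case: blend two predictors linearly and pick the weight that minimizes RMSE. This sketch rests on toy assumptions, namely two hypothetical predictors and a grid search over a single weight. The winning Netflix Prize solution combined far more models with much more sophisticated blending.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def best_blend_weight(pred_a, pred_b, actual, steps=100):
    """Grid-search w in [0, 1] for the blend w*a + (1 - w)*b with the lowest RMSE."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        err = rmse(blended, actual)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err

# Toy data: predictor A always rates half a star high, predictor B half a star low.
actual = [4, 3, 5, 2, 1]
pred_a = [4.5, 3.5, 5.5, 2.5, 1.5]
pred_b = [3.5, 2.5, 4.5, 1.5, 0.5]

print(rmse(pred_a, actual), rmse(pred_b, actual))  # 0.5 0.5
print(best_blend_weight(pred_a, pred_b, actual))   # (0.5, 0.0)
```

Each toy predictor alone is off by half a star. Their errors point in opposite directions, so the 50/50 blend cancels them exactly. Real predictors' errors overlap far more, so blending usually yields much smaller gains.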
Case #2: Churn Prediction
If predictive analytics drawing on big data could accurately point to who in particular was about to jump ship, direct marketing dollars could be efficiently deployed to intervene, perhaps by offering those wavering customers new benefits or discounts.
The lift of a target group identified by churn analysis is the proportion of customers in that group who actually drop the service, divided by the proportion who drop it across the customer population as a whole. If, typically, 2 percent of customers drop the service per month, and 8 percent of the group identified as “churners” drop the service, the lift is 4.
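The lift calculation can be sketched in Python. The first helper reproduces the article's 8% versus 2% example. The second, `lift_at_top_k`, is a hypothetical illustration of one common way to use it: rank customers by a model's churn score, flag the top k, and compare their actual churn rate with the overall rate. The scores and outcomes are invented toy data.

```python
def lift(group_rate, base_rate):
    """Churn rate in a targeted group divided by the churn rate of all customers."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return group_rate / base_rate

def lift_at_top_k(scores, churned, k):
    """Lift of the k customers with the highest predicted churn scores.

    scores:  model scores, higher meaning more likely to churn (hypothetical)
    churned: 1 if the customer actually dropped the service, else 0
    """
    if not 0 < k <= len(scores) or len(scores) != len(churned):
        raise ValueError("need equal-length inputs and 1 <= k <= number of customers")
    base_rate = sum(churned) / len(churned)
    # Customer indices ordered from highest to lowest churn score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_rate = sum(churned[i] for i in ranked[:k]) / k
    return lift(top_rate, base_rate)

# The article's example: 8% churn in the flagged group vs. 2% overall.
print(round(lift(0.08, 0.02), 6))  # 4.0

# Toy scores for ten customers; customers 0 and 3 actually churned.
scores = [0.9, 0.1, 0.3, 0.8, 0.2, 0.05, 0.4, 0.15, 0.25, 0.35]
churned = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(lift_at_top_k(scores, churned, k=2), 6))  # 5.0
```

A lift of 1 means the model does no better than picking customers at random. The higher the lift, the more of the retention budget reaches customers who would actually have left.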
Case #3: Web advertising response
The challenge here is predicting the click-through rate (CTR, in %) of an online ad. That is clearly a valuable thing to get right, given the sums changing hands in that business. We should exclude search advertising, where the ad is always related to user intent, and focus on the rates for display ads.
The average CTR for display ads has been reported as low as 0.1 to 0.2%, and researchers have reported up to seven-fold improvements. A seven-fold improvement from 0.2% amounts to 1.4%.
“Today's best targeted advertising is ignored 98.6% of the time.”
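The CTR arithmetic behind that quote can be checked in a few lines of Python. The 2 clicks per 1,000 impressions are a hypothetical count chosen to match the 0.2% figure, and `ctr_percent` is an illustrative helper name.

```python
def ctr_percent(clicks, impressions):
    """Click-through rate as a percentage of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return 100.0 * clicks / impressions

baseline = ctr_percent(2, 1000)  # 0.2%, the upper end of the reported average
improved = baseline * 7          # a seven-fold improvement
ignored = 100.0 - improved       # share of impressions that get no click

print(f"{baseline:.1f}% -> {improved:.1f}%, ignored {ignored:.1f}% of the time")
# 0.2% -> 1.4%, ignored 98.6% of the time
```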
Relevant insights for a manager
INSIGHT #1
Randomness inherent in human behavior is the limiting
factor to consumer modeling success.
When an activity is driven by consumers’ whims, no
amount of ingenuity can produce the ability to know what
will happen.
Predictive analytics can figure out
how to land on Mars, but not who
will buy a Mars bar.
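The limit that randomness places on prediction can be made concrete under a simplifying assumption. Suppose a single consumer action, such as clicking an ad, behaves like a coin flip with a fixed probability p. Then even a perfect model that knows p exactly cannot predict individual outcomes. Its expected squared error is p(1 - p), and no amount of extra data removes that error. This sketch assumes that model; `oracle_mse` and `simulated_mse` are hypothetical names.

```python
import random

def oracle_mse(p):
    """Expected squared error of predicting p for a Bernoulli(p) outcome.

    Even a model that knows the true probability p exactly is left with this
    error, because the outcome itself is random.
    """
    return p * (1 - p)

def simulated_mse(p, n=100_000, seed=0):
    """Monte Carlo check: predict p for n random 0/1 outcomes and return the MSE."""
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < p else 0 for _ in range(n)]
    return sum((p - y) ** 2 for y in outcomes) / n

# 0.014 matches the 1.4% best-case CTR above; 0.5 is a pure coin flip.
for p in (0.014, 0.5):
    print(p, round(oracle_mse(p), 4), round(simulated_mse(p), 4))
```

The simulated error settles at the theoretical floor rather than at zero. Better data can move a model's estimate of p closer to the truth, but it cannot push the error below p(1 - p).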
Big data analytics can improve predictions, but
the biggest effects of big data will be in
creating wholly new areas.
INSIGHT #2
The success of Facebook, Twitter, and LinkedIn
social networks depends on their scale, and big
data tools and analytics will be required for them
to keep growing.
“If you’re counting on Big Data to make people much
more predictable, you’re expecting too much.”
Thank You