Big Data
me@zynick.com
26th Dec 2013
Google Flu Trends Prediction (2008)
● Epidemiologists use early detection of disease outbreaks to reduce the
number of people affected
● CDC (Centers for Disease Control and Prevention) collects Influenza-like
Illness (ILI) data from its surveillance network and publishes it weekly
Hurricane in 2004

Result: 7 times their normal sales rate!
Grammar Checking
(Machine Learning) Algorithms
● Improve the algorithm? Or pump in more data?
● Testing
○ 1 million, 10 million, 100 million, and 1 billion data points

● Result
○ The worst algorithm performs better when given a billion
data points
■ Accuracy rises from 75% to 95%
○ The best algorithm performs worst when given a billion data points
■ Accuracy rises from 85% to 94%
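
The two accuracy ranges above can be compared with a little arithmetic. This is a minimal sketch using only the four percentages on the slide; the function name `error_reduction` is an illustrative choice, not something from the original study.

```python
def error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction of the original errors eliminated (accuracies in percent)."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


# Accuracy figures (in percent) as quoted on the slide.
results = {
    "worst algorithm": (75, 95),
    "best algorithm": (85, 94),
}

for name, (before, after) in results.items():
    gain = after - before  # gain in percentage points
    cut = error_reduction(before, after)
    print(f"{name}: +{gain} points, {cut:.0%} fewer errors")
# worst algorithm: +20 points, 80% fewer errors
# best algorithm: +9 points, 60% fewer errors
```

Read this way, more data removed 80% of the weaker algorithm's errors but only 60% of the stronger one's.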
Farecast.com (2006)
● Flight Price Prediction
○ Model had no understanding of why, only what.

● Accuracy of 74.5%
● Average saving of $50 per ticket
● $10 million in potential customer savings
● Acquired by Microsoft
○ Bing.com/travel

http://www.prnewswire.com/news-releases/farecast-launches-new-tools-to-help-savvy-travelers-catchelusive-airfare-price-drops-this-summer-58165652.html
Decide.com (2011)
● Analyzing 4 million products using 25 billion
price observations
○ Identifies patterns that people had never been able to
‘see’ before, e.g. prices might temporarily increase
for older models once new ones are introduced

● Price prediction 77% accurate
● Average savings of $87 per product
● Total savings of $72 million+
● Acquired by eBay

[1]http://techcrunch.com/2012/05/03/decide-com-brings-its-price-comparisons-to-ipad-reveals-plansto-expand-to-household-goods-cars/
[2]http://newbooksinbrief.com/2013/03/21/31-a-summary-of-big-data-a-revolution-that-will-transformhow-we-live-work-and-think-by-viktor-mayer-schonberger-and-kenneth-cukier/
UPS
● Uses geolocation data in multiple ways
○ Sensors, wireless modules, GPS
○ Predict engine trouble
○ Know each truck's whereabouts (in case of delays)

● Monitor employees
● Scrutinize itineraries to optimize routes
● Result (2011):
○ 30 million miles and 3 million gallons of fuel saved

● Safety and efficiency: routes with fewer turns, since turns tend to
lead to accidents, waste time, and consume
more fuel when stuck in a jam
Pregnancy Prediction
● Shopping behavior is about to change: customers explore new brands and form new loyalties
● Baby gift registry, lotions (around the 3rd month),
supplements (magnesium, calcium, zinc, etc.)
● Pregnancy Prediction Score
● Sends coupons

* http://icebreakerconsulting.com/target-predicts-pregnancy-with-big-data
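
A prediction score of this kind could be sketched as a weighted sum over purchase signals. The signal names follow the slide, but the weights, the threshold, and the function names are hypothetical values chosen purely for illustration; the slide does not describe the retailer's actual model.

```python
from typing import Iterable

# Hypothetical integer weights; the slide names the signals but not their weights.
SIGNAL_WEIGHTS = {
    "baby_gift_registry": 5,
    "lotion": 2,
    "supplements": 2,
}
COUPON_THRESHOLD = 6  # hypothetical cut-off for sending coupons


def pregnancy_score(purchases: Iterable[str]) -> int:
    """Sum the weights of every known signal present in the purchase history."""
    seen = set(purchases)
    return sum(w for signal, w in SIGNAL_WEIGHTS.items() if signal in seen)


def should_send_coupon(purchases: Iterable[str]) -> bool:
    """Send coupons once the score reaches the (assumed) threshold."""
    return pregnancy_score(purchases) >= COUPON_THRESHOLD


print(should_send_coupon(["lotion", "supplements"]))         # False (score 4)
print(should_send_coupon(["baby_gift_registry", "lotion"]))  # True (score 7)
```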
Geolocation Data
● Targeted advertising based on where a person is located,
or where they are heading
● Aggregated to reveal trends
● Detects traffic jams without seeing the cars, from the number and speed of smartphones traveling on a
highway
● Estimate how many protesters turn out at a
demonstration
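
The traffic jam idea can be sketched as a check on aggregated phone speeds for one road segment. This is a simplified illustration, not how any real service works: the function name, the minimum phone count, and the 20 km/h threshold are all assumptions.

```python
from statistics import median
from typing import Sequence


def is_jammed(speeds_kmh: Sequence[float],
              min_phones: int = 5,
              jam_speed_kmh: float = 20.0) -> bool:
    """Flag a jam when enough phones report and their median speed is low.

    min_phones and jam_speed_kmh are assumed thresholds for this sketch.
    """
    if len(speeds_kmh) < min_phones:
        return False  # too few phones to say anything about the segment
    return median(speeds_kmh) < jam_speed_kmh


print(is_jammed([8, 12, 10, 15, 9, 11]))   # True: many phones, all slow
print(is_jammed([90, 100, 95, 85, 110]))   # False: traffic is flowing
print(is_jammed([5, 6]))                   # False: not enough phones
```

Only speeds are needed, with no identity attached to any single phone, which is the sense in which the data is aggregated to reveal a trend.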
Data Reuse (Secondary Usage)
● Google Street View
○ Primary Usage: Street View
○ Secondary Usage: Collecting geolocation data and open
Wi-Fi connection data to improve GPS location

● Amazon
○ Primary Usage: Sales
○ Secondary Usage: Book Recommendation
Values of Big Data
● Data can be collected easily and cheaply
● What > Why (correlation vs causation)
● Traditional Sampling (n), Big Data (n=ALL)
● Quantification > Qualification
Values of Big Data
● Data Driven
○ Less Bias
○ More Accurate
○ Faster Result

● Pattern Prediction
○ Saves lives
○ Predict problems and correct them before the user
realizes something is wrong
Big Data: 3 Major Shifts
● Ability to analyze vast amounts of data
about a topic rather than settling for a smaller
set
● Willingness to embrace the messiness of data
rather than privileging exactitude
● Growing respect for correlation rather than a continuing
quest for causality
Correlation vs Causation
● Cause → Effect
● Correlation → Effect
○ Correlation → Cause? Optional

● Chris Anderson
○ Big data makes the scientific method obsolete
○ “With enough data, the numbers speak for
themselves”

* http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
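
Measuring a correlation needs no model of the cause. The sketch below computes a Pearson correlation coefficient by hand. The two series are made-up numbers (weekly flu-related search counts and reported ILI cases) used only to show the calculation, not real data.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up illustrative data, not real measurements.
weekly_searches = [120, 150, 310, 480, 390]
weekly_ili_cases = [14, 18, 35, 52, 41]

r = pearson(weekly_searches, weekly_ili_cases)
print(f"correlation: {r:.3f}")
```

A value close to 1 says the two series move together. It says nothing about why, which is the "what, not why" trade-off on this slide.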
Is Correlation Good Enough?
It Depends.
“For many everyday needs, knowing what not why is good
enough.” The book is full of such examples from making
better diagnostic decisions when caring for premature
babies to which flavor Pop-Tarts to stock at the front of the
Walmart store before a hurricane. Big data can help answer
these questions, but they never required “knowing why.”
Big data analysis can be about correlations OR causation—
it all depends, as it has always been, on what question we
are asking, what problem we are solving, and what goal
we are trying to achieve.
Is Correlation Good Enough?
“If millions of electronic medical records reveal that cancer
sufferers who take a certain combination of aspirin and
orange juice see their disease go into remission, then the
exact cause for the improvement in health may be less
important than the fact that they lived. Likewise, if we
can save money by knowing the best time to buy a plane
ticket without understanding the method behind airfare
madness, that’s good enough.”
Risk (The Dark Side of Big Data)
● Privacy Invasion
○ Viewing data at a lower level
○ NSA, GCHQ
○ Dangerous when it falls into the wrong hands

● Minority Report (2002)
○ “If we hold people responsible for predicted future
acts, ones they may never commit, we also deny
that humans have a capacity for moral choice.”
Embracing Big Data
● Data
● Skills
● Ideas (Big Data Mindset)
Things to Be Aware Of
● Data Validity
○ Books you read 10 years ago may no longer be relevant
for Amazon recommendations
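
One common way to account for stale data is to down-weight old observations. The sketch below uses exponential decay. The 2-year half-life and the function name are assumptions for illustration, and nothing here describes how Amazon actually weights purchases.

```python
def recency_weight(age_years: float, half_life_years: float = 2.0) -> float:
    """Weight of an observation that halves every half_life_years (assumed value)."""
    return 0.5 ** (age_years / half_life_years)


print(recency_weight(0))    # 1.0      -> a purchase made today counts fully
print(recency_weight(2))    # 0.5      -> two years old counts half
print(recency_weight(10))   # 0.03125  -> ten years old barely counts
```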
Questions?
Read the book.
End.
me@zynick.com
