3. Google Flu Trends Prediction (2008)
● Epidemiologists use early detection of disease outbreaks to reduce the number
of people affected
● CDC (Centers for Disease Control and Prevention) collects influenza-like
illness (ILI) data from its surveillance network and publishes it weekly
7. Grammar Checking (Machine Learning) Algorithms
● Improve the algorithm, or pump in more data?
● Testing
○ 1 million, 10 million, 100 million, and 1 billion data points
● Result
○ The worst algorithm performs far better when it has a billion
data points
■ Accuracy rate rises from 75% to 95%
○ The algorithm that was best on little data performs worst when it has a billion data points
■ Accuracy rate rises from 85% to 94%
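The comparison can be checked with simple arithmetic. The sketch below uses only the accuracy figures quoted on this slide; the dictionary keys and function names are made up for illustration.

```python
# Accuracy figures as quoted on the slide, in percent:
# (with the small data set, with a billion data points).
# The key names are hypothetical labels, not the algorithms' real names.
results = {
    "worst_on_small_data": (75, 95),
    "best_on_small_data": (85, 94),
}


def gain(name):
    """Percentage-point improvement from the small set to the large set."""
    before, after = results[name]
    return after - before


def ranking(index):
    """Names ordered from most to least accurate (index 0 = small set, 1 = large set)."""
    return sorted(results, key=lambda name: results[name][index], reverse=True)


for name in results:
    print(name, "gains", gain(name), "points")
print("Best with little data:", ranking(0)[0])
print("Best with a billion data points:", ranking(1)[0])
```

The ranking flips once the data grows, which is the point of the slide: adding data helped more than picking the better algorithm.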
8. Farecast.com (2006)
● Flight Price Prediction
○ Model had no understanding of why, only what.
● Accuracy of 74.5%
● Average $50 saving per ticket
● $10 million in potential customer savings
● Acquired by Microsoft
○ Bing.com/travel
http://www.prnewswire.com/news-releases/farecast-launches-new-tools-to-help-savvy-travelers-catchelusive-airfare-price-drops-this-summer-58165652.html
9. Decide.com (2011)
● Analyzes 4 million products using 25 billion
price observations
○ Identifies patterns that people had never been able to
‘see’ before, e.g. prices might temporarily increase
for older models once new ones are introduced
● Price prediction 77% accurate
● Average savings $87 per product
● Total savings $72 million+
● Acquired by eBay
[1]http://techcrunch.com/2012/05/03/decide-com-brings-its-price-comparisons-to-ipad-reveals-plansto-expand-to-household-goods-cars/
[2]http://newbooksinbrief.com/2013/03/21/31-a-summary-of-big-data-a-revolution-that-will-transformhow-we-live-work-and-think-by-viktor-mayer-schonberger-and-kenneth-cukier/
10. UPS
● Uses geolocation data in multiple ways
○ Sensors, wireless modules, GPS
○ Predict engine trouble
○ Know the trucks' whereabouts (in case of delays)
● Monitor employees
● Scrutinize itineraries to optimize routes
● Result (2011):
○ 30 million miles and 3 million gallons of fuel saved
● Safety and efficiency: fewer turns, since turns tend to
lead to accidents, waste time, and consume
more fuel when stuck in traffic
11. Pregnancy Prediction
● Shopping behavior is about to change: customers explore new brands and form new loyalties
● Signals: baby gift registry, lotions (around the 3rd month),
supplements (magnesium, calcium, zinc, etc.)
● Pregnancy Prediction Score
● Sends coupons
* http://icebreakerconsulting.com/target-predicts-pregnancy-with-big-data
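A prediction score of this kind can be pictured as a weighted sum of purchase signals. The sketch below is only an illustration: the signal names, point values, and coupon threshold are all invented here and do not describe the retailer's actual model.

```python
# Hypothetical point values per purchase signal (illustrative only).
SIGNAL_POINTS = {
    "baby_gift_registry": 50,
    "lotion": 20,
    "supplement": 20,
}
COUPON_THRESHOLD = 60  # assumed cut-off for sending coupons


def pregnancy_score(purchases):
    """Add up the points of each distinct signal found in the purchase list."""
    seen = set(purchases)
    return sum(points for item, points in SIGNAL_POINTS.items() if item in seen)


def should_send_coupon(purchases):
    """True when the score reaches the assumed threshold."""
    return pregnancy_score(purchases) >= COUPON_THRESHOLD


basket = ["lotion", "supplement", "bread", "lotion"]
print(pregnancy_score(basket), should_send_coupon(basket))
```

Repeated purchases of the same item count once here; a real model would more likely be fitted to historical data than hand-weighted.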
12. Geolocation Data
● Targeted advertising based on where a person is located,
or where they are about to go
● Aggregated to reveal trends
● Detects traffic jams without seeing the cars, from the number and speed of smartphones traveling on a
highway
● Estimates how many protesters turn out at a
demonstration
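As a rough sketch of the traffic example, aggregated smartphone speeds can be grouped by road segment and a segment flagged when its typical speed drops well below free flow. The function name, the thresholds, and the (segment, speed) input format are assumptions made for this illustration.

```python
from statistics import median


def detect_jams(readings, free_flow_kmh=100.0, jam_ratio=0.3, min_phones=5):
    """Return the road segments that look jammed.

    readings: iterable of (segment_id, speed_kmh) pairs, carrying no device identity.
    A segment is flagged when at least `min_phones` readings exist and their
    median speed is below `jam_ratio * free_flow_kmh` (all assumed values).
    """
    by_segment = {}
    for segment_id, speed in readings:
        by_segment.setdefault(segment_id, []).append(speed)

    jammed = []
    for segment_id, speeds in by_segment.items():
        if len(speeds) >= min_phones and median(speeds) < jam_ratio * free_flow_kmh:
            jammed.append(segment_id)
    return sorted(jammed)


sample = [("A", 10)] * 6 + [("B", 95)] * 6 + [("C", 5)] * 2
print(detect_jams(sample))
```

Segment C is ignored in this sample because two phones are too few to say anything about the traffic there.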
13. Data Reuse (Secondary Usage)
● Google Street View
○ Primary Usage: Street View
○ Secondary Usage: Collecting geolocation data and open
Wi-Fi connection data to improve GPS location
● Amazon
○ Primary Usage: Sales
○ Secondary Usage: Book Recommendation
14. Values of Big Data
● Data can be grabbed easily and cheaply
● What > Why (correlation vs causation)
● Traditional Sampling (n), Big Data (n=ALL)
● Quantification > Qualification
15. Values of Big Data
● Data Driven
○ Less Bias
○ More Accurate
○ Faster Results
● Pattern Prediction
○ Saves lives
○ Predicts problems and corrects them before the user
realizes something was wrong
16. Big Data: 3 Major Shifts
● Ability to analyze vast amounts of data
about a topic rather than settle for a smaller
set
● Willingness to embrace messy data
rather than privilege exactitude
● Growing respect for correlation rather than a continuing
quest for causality
17. Correlation vs Causation
● Cause → Effect
● Correlation → Effect
○ Correlation → Cause? Optional
● Chris Anderson
○ Big data makes the scientific method obsolete
○ “With enough data, the numbers speak for
themselves”
* http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
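To make "correlation" concrete: the Pearson coefficient measures how closely two series move together, and it is symmetric, so it cannot say which one causes the other. The sketch below is a plain-Python version with invented sample numbers; the function name is our own.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented numbers, for illustration only.
searches = [1, 2, 3, 4]
cases = [2, 4, 6, 8]
print(pearson(searches, cases))  # strong positive correlation
print(pearson(cases, searches))  # same value: no direction of cause
```

Swapping the arguments gives the same number, which is why a high correlation supports "what" but leaves "why" open.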
18. Is Correlation Good Enough?
It Depends.
“For many everyday needs, knowing what not why is good
enough.” The book is full of such examples from making
better diagnostic decisions when caring for premature
babies to which flavor Pop-Tarts to stock at the front of the
Walmart store before a hurricane. Big data can help answer
these questions, but they never required “knowing why.”
Big data analysis can be about correlations OR causation—
it all depends, as it has always been, on what question we
are asking, what problem we are solving, and what goal
we are trying to achieve.
19. Is Correlation Good Enough?
“If millions of electronic medical records reveal that cancer
sufferers who take a certain combination of aspirin and
orange juice see their disease go into remission, then the
exact cause for the improvement in health may be less
important than the fact that they lived. Likewise, if we
can save money by knowing the best time to buy a plane
ticket without understanding the method behind airfare
madness, that’s good enough.”
20. Risk (The Dark Side of Big Data)
● Privacy Invasion
○ Viewing data at a lower level
○ NSA, GCHQ
○ Dangerous when it falls into the wrong hands
● Minority Report (2002)
○ “If we hold people responsible for predicted future
acts, ones they may never commit, we also deny
that humans have a capacity for moral choice.”