BIG DATA, LITTLE DEVICES
WHAT IT WILL DO TO US AND FOR US
WHAT IS BIG DATA?
Up to 2003: 5 exabytes in total
2011: 2.5 exabytes per day
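Set side by side, the two figures on this slide mean that by 2011 the world was producing, every two days, as much data as it had accumulated in total up to 2003:

```latex
\frac{5\ \text{EB}}{2.5\ \text{EB/day}} = 2\ \text{days}
```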
PERSPECTIVES
1 MB, 1 GB, 1 TB, 2 PB, 5 EB
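To make the scale on this slide concrete, here is a small Python sketch that converts each step to bytes and prints how much bigger each step is than the one before. It assumes decimal (SI) prefixes, where each unit is 1000 times the previous one; binary, 1024-based units would be somewhat larger. `UNITS` and `to_bytes` are illustrative names, not part of any library.

```python
# Decimal (SI) byte units. Assumption: the slide uses powers of 1000,
# not the 1024-based binary units some operating systems report.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a whole number of bytes."""
    return amount * UNITS[unit]


# The ladder from the slide: 1 MB, 1 GB, 1 TB, 2 PB, 5 EB.
scale = [(1, "MB"), (1, "GB"), (1, "TB"), (2, "PB"), (5, "EB")]

# Compare each step with the previous one using exact integer arithmetic.
for (a, u), (b, v) in zip(scale, scale[1:]):
    ratio = to_bytes(b, v) // to_bytes(a, u)
    print(f"{b} {v} is {ratio:,} times {a} {u}")
```

Integers are used on purpose: Python's arbitrary-precision ints keep even exabyte-sized byte counts exact.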
WHERE’S IT COMING FROM?
Source: domo.com 2012
WHAT DOES IT LOOK LIKE?
DEFINITIONS
• Big Data: unstructured data; you don’t yet know what the questions are
• Business Intelligence: structured data; you know the questions you want answered
• Statistics: structured data, not real time; no action is taken as a result
• Machine Learning: creating algorithms and applying them to data sets in an attempt to learn from the data
• Predictive Analytics: extracting patterns from existing data to predict trends
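Predictive analytics in its simplest form means fitting a trend to existing data and extrapolating it. Below is a minimal sketch in plain Python that fits an ordinary least-squares line and uses it to forecast the next point. The sales figures are made up for illustration, and real predictive models are usually far richer than a straight line.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical monthly sales figures (illustrative only, not real data).
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 123, 130, 142, 150]

slope, intercept = fit_trend(months, sales)
print(f"forecast for month 7: {predict(slope, intercept, 7):.1f}")
```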
WHY NOW?
• 2003: Doug Cutting and Mike Cafarella work on Nutch
• 2004: Google publishes its MapReduce paper
• 2006: Doug Cutting moves to Yahoo and creates Hadoop
• 2008: Hadoop, developed largely at Yahoo, becomes a top-level Apache Software Foundation project
• 2009: Matei Zaharia starts Spark at UC Berkeley
• 2013: Spark is donated to the Apache Software Foundation
MAPREDUCE
Traditional / sequential processing
Map
Reduce
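The slide contrasts one process working through all the data in order with MapReduce, which splits the work into independent map tasks and then combines their output in reduce tasks. Below is a single-process word-count sketch of both approaches in Python. It only mimics the map, shuffle, and reduce stages and is not Hadoop's API. In a real cluster, each `map_phase` and `reduce_phase` call could run on a different machine.

```python
from collections import defaultdict
from itertools import chain


def sequential_count(documents):
    """Traditional approach: one process walks through every document in order."""
    counts = {}
    for doc in documents:
        for word in doc.split():
            counts[word.lower()] = counts.get(word.lower(), 0) + 1
    return counts


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    # Map calls do not depend on each other, so a cluster can spread them across machines.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    # Reduce calls are independent per key, so they can be spread out too.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data little devices", "big data big questions"]
print(map_reduce(docs))
print(map_reduce(docs) == sequential_count(docs))
```

Both versions give the same counts. The MapReduce version adds an explicit grouping step, and that step is what allows the work to be split across many machines.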
SPARK
Map and Reduce in memory: up to 100x faster than Hadoop MapReduce
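Much of Spark's speedup comes from keeping data in memory between steps. A chain of Hadoop MapReduce jobs, by contrast, re-reads its input from disk at every stage. The difference matters most for iterative algorithms that make many passes over the same data. Below is a toy Python sketch that counts how often the data set is read under each approach. `CountingStore` and `refine` are hypothetical helpers, not Spark or Hadoop APIs, and any real speedup (the slide's 100x is a best case) depends heavily on the workload.

```python
import json
import os
import tempfile


class CountingStore:
    """Toy stand-in for disk/HDFS storage that counts how often the data set is read."""

    def __init__(self, records):
        fd, self.path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(records, f)
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path) as f:
            return json.load(f)

    def close(self):
        os.remove(self.path)


def refine(estimate, data, rate=0.5):
    """One pass of a toy iterative algorithm: move the estimate toward the data's mean."""
    gradient = sum(estimate - x for x in data) / len(data)
    return estimate - rate * gradient


def run_from_disk(store, iterations):
    """MapReduce style: each iteration is a separate job that re-reads its input."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, store.load())
    return estimate


def run_in_memory(store, iterations):
    """Spark style: load once, cache in memory, reuse the cached data on every pass."""
    cached = store.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, cached)
    return estimate


records = [2, 4, 6, 8]
disk_store, memory_store = CountingStore(records), CountingStore(records)
print(run_from_disk(disk_store, 10), "reads:", disk_store.reads)
print(run_in_memory(memory_store, 10), "reads:", memory_store.reads)
disk_store.close()
memory_store.close()
```

Both runs reach the same answer. The disk-based run reads its input once per iteration, while the cached run reads it only once.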
CASES
WHAT IT WILL DO TO US
SECURITY / PRIVACY
NSA PRISM
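The speaker notes for this slide mention yottabyte-scale storage that would have cost trillions. The arithmetic below shows why. It continues the decimal ladder of units from the PERSPECTIVES slide, assuming SI prefixes where each step is a factor of 1000, and uses a purely hypothetical price of $1 per terabyte of raw disk. Real disk prices were well above that, so the true figure would be higher.

```latex
1\ \text{YB} = 10^{24}\ \text{bytes} = \frac{10^{24}}{10^{12}}\ \text{TB} = 10^{12}\ \text{TB}
\qquad\Rightarrow\qquad
10^{12}\ \text{TB} \times \$1/\text{TB} = \$10^{12} \approx \text{one trillion dollars}
```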
PROFILING
VULNERABILITY
• Target
• Home Depot
• Michaels
• Blue Cross Blue Shield
• Sony Entertainment
SOCIETY
COMMERCE
AMAZON DASH
COMMERCE
AMAZON
CASES
WHAT IT WILL DO FOR US
SPORTS
SABERMETRICS (MONEYBALL)
95%: baseball stats created in the last five years
5%: all earlier baseball stats
PRODUCTIVITY
GOOGLE NOW
POLITICS
OBAMA CAMPAIGN 2012
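The editor's note for this slide describes the campaign's Media Optimizer pipeline. It scored likely target voters, joined that with TV ratings and ad prices, and then picked the ad buys that reached the most target voters per dollar. Below is a minimal Python sketch of that final ranking step only. The `AdSlot` type, the program names, and every number are hypothetical, and the campaign's actual system (which the note says fed results back into Vertica) very likely did far more than sort by a single ratio.

```python
from dataclasses import dataclass


@dataclass
class AdSlot:
    program: str
    price: float         # hypothetical cost of one spot, in dollars
    viewers: int         # hypothetical total audience for the program
    target_share: float  # hypothetical fraction of viewers who are target voters

    @property
    def target_viewers(self) -> float:
        return self.viewers * self.target_share

    @property
    def cost_per_target(self) -> float:
        """Dollars spent per targeted voter reached."""
        return self.price / self.target_viewers


def rank_ad_buys(slots):
    """Order candidate slots from most to least efficient (cheapest per target voter first)."""
    return sorted(slots, key=lambda s: s.cost_per_target)


slots = [
    AdSlot("Prime-time network show", price=100_000, viewers=2_000_000, target_share=0.05),
    AdSlot("Niche cable show", price=5_000, viewers=150_000, target_share=0.20),
    AdSlot("Late-night rerun", price=2_000, viewers=40_000, target_share=0.10),
]

for slot in rank_ad_buys(slots):
    print(f"{slot.program}: ${slot.cost_per_target:.2f} per target voter")
```

With these made-up numbers, the niche cable show ranks first. A cheap spot with a high share of target voters can beat an expensive broad one, which matches the note's point about niche programs.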
SCIENCE
MONTEREY BAY AQUARIUM RESEARCH INSTITUTE
HEALTH
APPLE RESEARCHKIT
… Stanford says that it would normally take a national, year-long effort to get that kind of scale. …
MORE READING
• http://www.domo.com/blog/2014/04/data-never-sleeps-2-0/
• http://www.redorbit.com/education/reference_library/general-2/history-of/1113190638/the-history-of-mobile-phone-technology/
• http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/
• http://www.wired.com/2015/04/robots-roam-earths-imperiled-oceans/?mbid=nl_041315
• http://www.allbusiness.com/what-does-your-supermarket-know-about-you-15611312-1.html
• http://www.geekwire.com/2015/baseball-analytics-mystery-mlb-team-uses-a-cray-supercomputer-to-crunch-data/
• http://www.geekwire.com/2015/this-big-data-startup-just-raised-cash-to-analyze-driver-behavior-creating-safety-scores-for-individual-motorists/?utm_source=GeekWire+Daily+Digest&utm_campaign=20eb1892b3-daily-digest-email&utm_medium=email&utm_term=04e93fc7dfd-20eb1892b3-233387065&mc_cid=20eb1892b3&mc_eid=7b61e5049a
• http://www.newyorker.com/culture/culture-desk/the-horror-of-amazons-new-dash-button
• https://www.amazon.com/oc/dash-button
• http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal
• http://www.businessinsider.com/big-data-is-growing-thanks-to-mobile-2013-1
• http://venturebeat.com/2015/04/03/how-microsofts-using-big-data-to-predict-traffic-jams-up-to-an-hour-in-advance/
• http://www.engadget.com/2015/04/13/ibm-watson-health-cloud/?utm_source=Feed_Classic_Full&utm_medium=feed&utm_campaign=Engadget&?ncid=rss_full

Big Data and Small Devices: What will it do for us and to us


Editor's Notes

  • #12 (NSA PRISM): This is the ultimate Big Data scenario; it’s bigger than big data. When the NSA was building the PRISM data center in Utah, it referred to yottabyte storage. Calculations at the time suggested that a storage array of that size would cost trillions of dollars.
  • #14 (Vulnerability): Each of these is a data breach in which customer data was stolen. In most cases the stolen data was things like credit card and address data. With breaches like Blue Cross, the scenario starts to darken. This is only the beginning: the more of what represents who you are goes online, the greater the risk of having that identity stolen.
  • #19 (Sports: Sabermetrics): Bolding notes that 95 percent of baseball stats have been created over the last five years, thanks to the growing number of data sensors and innovative methods of analyzing players. “They are gathering so much data that a single person with an Excel spreadsheet can no longer analyze, in a sophisticated way, all the data they have,” Bolding said. “They need bigger and bigger computers to be able to analyze the data.” As popularized by Michael Lewis’ Moneyball and the subsequent movie, using baseball data to drive decisions about player personnel, and ultimately to win more games, was a strategy first used successfully by the Oakland Athletics in the early 2000s.
  • #21 (Politics: Obama campaign 2012): The intent of Media Optimizer was to enable much more targeted ad purchases. Before Media Optimizer, TV ad buys were based on broad demographics, which was both costly and inefficient. With Media Optimizer in place, the campaign could use statistical analysis to identify target voters in the DNC database. Next, the voter data was enriched with demographic data from TV ratings and with advertisement pricing data. Finally, the results were fed back into Vertica and reanalyzed for further tuning. With an overall picture combining likely Obama voters, the shows they watch, and the prices of the ads (plus the analysis feedback loop), it was much easier to determine the most efficient ad buys. One result was that the Obama campaign purchased twice as many cable TV advertisements as the Romney campaign, many of them during niche programs aimed at the precise demographic slices the campaign was trying to reach.
  • #22 (Science: MBARI): MBARI has a fleet of ocean robots of three different kinds: autonomous machines that prowl the open oceans gathering data and let researchers monitor it in real time. The machines do not tire, and they cannot drown. They survive shark bites. They can roam for months on end, beaming a steady stream of data to scientists sitting safely onshore.