Big data primer

  1. A Big Data Primer. Stacia Misner. E-mail: smisner@datainspirations.com; Twitter: @StaciaMisner; Blog: blog.datainspirations.com
  2. Session Overview • What’s the Fuss? • What’s in the Big Data Stack? • Where Do I Start? Copyright © 2013 by Data Inspirations Inc. All rights reserved.
  3. What’s the Fuss? • Some Background… • Classic Data Analysis versus Big Data • Why Now? • Why Bother?
  4. Some Background… (chart: Google Trends for “Big Data”)
  5. Has Big Data Jumped the Shark? Volume, Velocity, Variety, Variability
  6. Is Big Data the Next Frontier?
  7. Classic Data Analysis… Uses Just a Subset (diagram: ETL into Data Warehouse & BI Solutions)
  8. Classic Data Analysis… Requires Structure (diagram: ETL into Data Warehouse & BI Solutions)
  9. Variety Includes Unstructured Data
  10. Big Data versus Traditional BI (source: http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars)
  11. Why Now? The Times… They Are A’Changin’: Cost of Storage Decreasing. 1970: 1 TB cost $1,000,000. 2013: 1 TB costs < $100 (direct-attached storage, not enterprise SAN!)
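The two price points on slide 11 imply a decrease of at least four orders of magnitude. A quick check of that arithmetic, using only the figures quoted on the slide (nominal dollars, not adjusted for inflation):

```python
# Storage cost figures as quoted on slide 11 (per terabyte, nominal US dollars,
# not adjusted for inflation).
cost_per_tb_1970 = 1_000_000
cost_per_tb_2013 = 100  # the slide says "< $100", so this is an upper bound

# Because the 2013 figure is an upper bound, the ratio is a lower bound.
min_decrease_factor = cost_per_tb_1970 / cost_per_tb_2013

print(f"Storage cost fell by a factor of at least {min_decrease_factor:,.0f}")
```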
  12. The Times… They Are A’Changin’: Data Volumes Increasing. All books: 15 TB. Daily tweets: 15 TB
  13. The Times… They Are A’Changin’: Processing Power Increasing. 3 billion base pairs to analyze. Then: 10 years (completed in 2003). Now: 1 week, at 1/10th the cost
  14. Why Now? Powerful, Scalable, Cheap, Elastic
  15. Why Bother? • Make more data available faster • Deliver access to more detailed, accurate information to adjust just in time • Segment customers at a more granular level for personalization of products and services • Perform more sophisticated analytics • Improve products (see http://wiki.apache.org/hadoop/PoweredBy). Case study: customer, product, and promotion data to personalized promotions. Before big data: 8 weeks. After big data: 1 week and dropping
  16. What’s In the Big Data Stack? • Key Differences • Hadoop Ecosystem • Hadoop and Analysis Services
  17. Key Differences • Basically Available, Soft-state, Eventually consistent (BASE) • Scale Out As Needed With Commodity Hardware • Impose Schema On Read
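The BASE properties on slide 17 can be illustrated with a toy model of two replicas of a key-value store. This is a minimal sketch, not how any particular Hadoop or NoSQL product implements replication: the `Replica` class, the integer timestamps, and the last-write-wins merge rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of a key-value store (hypothetical, for illustration)."""
    name: str
    # key -> (timestamp, value); the timestamp decides which write wins
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        # Accept the write locally without coordinating with other replicas
        # ("basically available"). Equal timestamps keep the current value;
        # real systems need an explicit tie-breaker.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Anti-entropy: pull every entry from another replica and keep the
        # newest version of each key (last-write-wins).
        for key, (ts, value) in other.data.items():
            self.write(key, value, ts)


a, b = Replica("a"), Replica("b")
a.write("cart:42", ["book"], timestamp=1)
b.write("cart:42", ["book", "pen"], timestamp=2)

# Before synchronization the replicas disagree ("soft state").
print(a.read("cart:42"), b.read("cart:42"))

# After exchanging state in both directions they converge ("eventually consistent").
a.merge_from(b)
b.merge_from(a)
print(a.read("cart:42"), b.read("cart:42"))
```

The trade-off is that a reader hitting replica `a` before the merge sees stale data; in exchange, neither write had to wait for the other node.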
  18. Hadoop Ecosystem: MapReduce, HDFS (note: this is only a subset of the ecosystem!)
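The MapReduce programming model named on slide 18 splits a job into a map phase, a framework-managed shuffle that groups intermediate pairs by key, and a reduce phase. The sketch below simulates that flow for a word count in plain Python on one machine; it is not Hadoop’s Java API, and the sample records are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Mapper: emit (word, 1) for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle/sort: group all intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reducer: combine all values for one key.
    return key, sum(values)


# In Hadoop, each block of an HDFS file would typically be handled by its own
# mapper, possibly on a different node; here the "blocks" are just strings.
records = ["big data big ideas", "data at rest", "data in motion"]

intermediate = chain.from_iterable(map_phase(r) for r in records)
counts = dict(reduce_phase(k, v) for k, v in sorted(shuffle(intermediate).items()))
print(counts)
```

Because mappers only see their own records and reducers only see one key at a time, both phases can be spread across many commodity machines.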
  19. Problem to Solve • Elasticity: ability to analyze structured and unstructured data; the DW imposes structure for questions we know we want answered; need the ability to incorporate other types of data on demand • Scale: low-cost commodity hardware; distributed workload
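“Impose schema on read” (slide 17) is what lets new types of data be incorporated on demand (slide 19): raw records are stored as they arrive, and structure is applied only when a query reads them. Hive table definitions over HDFS files follow this idea. The sketch below is a hypothetical illustration in plain Python; the record formats and the `(user, action, ms)` schema are invented for the example.

```python
import csv
import io
import json

# Raw data lands in storage as-is, in whatever shape the source produced.
# No schema is enforced at load time (contrast with ETL into a warehouse).
raw_store = [
    '{"user": "u1", "action": "click", "ms": 120}',
    '{"user": "u2", "action": "view"}',  # missing field: still stored
    "u3,purchase,450",                   # a different format: still stored
]


def read_with_schema(lines):
    """Apply a schema at read time: (user: str, action: str, ms: int | None).

    The schema and parsing rules here are hypothetical; the point is that they
    live in the query, so a new question can use a new schema over the same
    raw data.
    """
    for line in lines:
        if line.lstrip().startswith("{"):
            rec = json.loads(line)
            yield rec["user"], rec["action"], rec.get("ms")
        else:
            user, action, ms = next(csv.reader(io.StringIO(line)))
            yield user, action, int(ms)


rows = list(read_with_schema(raw_store))
for row in rows:
    print(row)
```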
  20. Hadoop & Analysis Services – High Latency
  21. Hadoop & Analysis Services – Medium Latency: Linked Server, Hive ODBC driver
  22. Hadoop & Analysis Services – Medium Latency: Analysis Management Objects (AMO) to push data into SSAS
  23. Hadoop & Analysis Services – Low Latency. Options: • Impala (Cloudera) • Spark and Shark (UC Berkeley) • Stinger (Hortonworks)
  24. Where Do I Start? • Big Data Lifecycle • Approaches
  25. Big Data Lifecycle: Discovery, Data Preparation, Model Planning, Model Building, Result Communication, Production. Discovery: look at internal/external processes. What is a challenge? Where could an overwhelming advantage be useful? Formulate a hypothesis
  26. Big Data Business Models
  27. Big Data Lifecycle: Data Preparation. Explore the data in a sandbox; condition the data
  28. Big Data Lifecycle: Model Planning. Decide on methods and models; examine the data for key variables
  29. Big Data Lifecycle: Model Building. Create data sets for testing, training, and production; set up the hardware environment
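Slide 29 calls for separate data sets for training and testing. One common way to build them at scale, shown here as a sketch rather than the author’s method, is to hash a stable record ID so each record always lands in the same set, regardless of which node processes it or in what order. The `assign_split` helper, the `cust-N` IDs, and the 20% test fraction are assumptions for illustration.

```python
import hashlib


def assign_split(record_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a record to 'train' or 'test' (hypothetical helper).

    Hashing the record ID (instead of calling random()) means the same record
    always lands in the same set, even when the job is re-run on a cluster
    or the input is processed in a different order.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


customer_ids = [f"cust-{i}" for i in range(1000)]
splits = {cid: assign_split(cid) for cid in customer_ids}
n_test = sum(1 for s in splits.values() if s == "test")
print(f"test: {n_test}, train: {len(customer_ids) - n_test}")
```

The test share will be close to, but not exactly, 20%; a held-back production set would typically be new data arriving after the model is built.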
  30. Big Data Lifecycle: Result Communication. Validate (or not) the hypothesis; share findings
  31. Big Data Lifecycle: Production. Pilot project; operationalize
  32. Approaches – Store and Analyze • Integrate and consolidate: better data quality; access to history; higher storage requirements and latency impact • Choose hardware: Massively Parallel Processing (PDW); Tabular – data compression; RDBMS – column-store; NoSQL – multiple variable data sources • Analyze data at rest
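Slide 32 lists column-store and data compression as hardware/engine choices for analyzing data at rest. A column layout keeps each column’s values together, and sorted or repetitive columns then compress well with techniques such as run-length encoding. The sketch below shows that idea on a made-up five-row table; it is an illustration of the general technique, not a description of the internals of PDW, Tabular, or any specific column-store engine.

```python
from itertools import groupby

# A small fact table stored row by row, as a row-oriented RDBMS would.
rows = [
    ("2013-06-01", "WA", 10),
    ("2013-06-01", "WA", 12),
    ("2013-06-01", "OR", 7),
    ("2013-06-02", "OR", 9),
    ("2013-06-02", "OR", 4),
]

# Column-store layout: one list per column instead of one tuple per row.
columns = {name: list(values) for name, values in zip(("date", "state", "qty"), zip(*rows))}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


encoded = {name: rle_encode(col) for name, col in columns.items()}
for name, pairs in encoded.items():
    print(name, pairs)

# Aggregating one column only touches that column's data.
print("total qty:", sum(columns["qty"]))
```

The `date` and `state` columns shrink to two runs each, while `qty` does not compress, which is why real engines choose an encoding per column.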
  33. Approaches – Analyze and Store • Filter and aggregate data before adding it to the DW: reduce action time (receipt of raw data to decision point) to attain greater business agility; lower storage and administrative overhead • Analyze data in motion (complex event processing)
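The “analyze and store” approach on slide 33 filters and aggregates a stream before anything reaches the warehouse. A minimal sketch of that pattern is a tumbling-window aggregate over a time-ordered stream: only one summary row per window and sensor would be stored. The event shape, the 60-second window, and the filter threshold are assumptions for illustration; production complex event processing engines offer far richer windowing.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Event = Tuple[int, str, float]  # (timestamp in seconds, sensor id, reading)


def filter_and_aggregate(events: Iterable[Event], window_s: int = 60,
                         threshold: float = 0.0) -> Iterator[Tuple[int, str, float, int]]:
    """Consume a time-ordered event stream and emit one summary row per
    (window, sensor): (window_start, sensor, average, count).

    Only these summaries would be written to the warehouse; raw events are
    discarded after processing. window_s and threshold are hypothetical.
    """
    current_window = None
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, sensor, value in events:
        if value < threshold:
            continue  # filter: drop readings we do not need to keep
        window = ts - ts % window_s
        if current_window is not None and window != current_window:
            # Window closed: emit its aggregates and reset the accumulators.
            for s in sorted(sums):
                yield current_window, s, sums[s] / counts[s], counts[s]
            sums.clear()
            counts.clear()
        current_window = window
        sums[sensor] += value
        counts[sensor] += 1
    # Flush the last open window when the stream ends.
    if current_window is not None:
        for s in sorted(sums):
            yield current_window, s, sums[s] / counts[s], counts[s]


stream = [(0, "a", 1.0), (10, "a", 3.0), (20, "b", -5.0), (30, "b", 4.0), (65, "a", 5.0)]
summaries = list(filter_and_aggregate(stream))
for row in summaries:
    print(row)
```

Five raw events become three stored rows; the negative reading is dropped before it costs any storage.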
  34. Overwhelmed? Prototype First! • Define a small project, focusing on one product, for example • Capture data for the subset of focus for a limited duration (one month) • Take action on the analytics and measure the resulting change (http://www.microsoft.com/bigdata)
  35. Session Review • What’s the Fuss? • What’s in the Big Data Stack? • Where Do I Start?
  36. Resources • Big data has jumped the shark (9/11/2011): www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/ • Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report): http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation • What a Big Data Model Looks Like: http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
  37. Resources • Architectures for Running SSAS on Data in Hadoop Hive: http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/
