Big data primer


  1. A Big Data Primer
     Stacia Misner
     E-mail: smisner@datainspirations.com
     Twitter: @StaciaMisner
     Blog: blog.datainspirations.com
  2. Session Overview
     • What’s the Fuss?
     • What’s in the Big Data Stack?
     • Where Do I Start?
     Copyright © 2013 by Data Inspirations Inc. All rights reserved.
  3. What’s the Fuss?
     • Some Background…
     • Classic Data Analysis versus Big Data
     • Why Now?
     • Why Bother?
  4. Some Background…
     Google Trends: “Big Data”
  5. Has Big Data Jumped the Shark?
     Volume, Velocity, Variety, Variability
  6. Is Big Data the Next Frontier?
  7. Classic Data Analysis …Uses Just a Subset
     Data Warehouse & BI Solutions, ETL
  8. Classic Data Analysis …Requires Structure
     Data Warehouse & BI Solutions, ETL
  9. Variety Includes Unstructured Data
  10. Big Data versus Traditional BI
      http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
  11. Why Now? The Times… They Are A’Changin’
      Cost of Storage Decreasing
      • 1970: 1 TB, $1,000,000
      • 2013: 1 TB, < $100
      Direct attached storage, not Enterprise SAN!
  12. The Times… They Are A’Changin’
      Data Volumes Increasing
      • All Books: 15 TB
      • Daily Tweets: 15 TB
  13. The Times… They Are A’Changin’
      Processing Power Increasing
      3 Billion Base Pairs to Analyze
      • Then…: 10 Years, Completed in 2003
      • Now…: 1 Week, At 1/10th the Cost
  14. Why Now?
      Powerful, Scalable, Cheap, Elasticity
  15. Why Bother?
      • Make more data available faster
      • Deliver access to more detailed, accurate information to adjust just-in-time
      • Segment customers at more granular level for personalization of products and services
      • Perform more sophisticated analytics
      • Improve products
      http://wiki.apache.org/hadoop/PoweredBy
      Case Study: Customer, Product, Promotion Data -> Personalized Promotions
      • Before Big Data: 8 weeks
      • After Big Data: 1 week and dropping
  16. What’s In the Big Data Stack?
      • Key Differences
      • Hadoop Ecosystem
      • Hadoop and Analysis Services
  17. Key Differences
      • Basically Available, Soft-state, Eventually consistent
      • Scale Out As Needed With Commodity Hardware
      • Impose Schema On Read
  18. Hadoop Ecosystem
      Note: This is only a subset of ecosystem!
      MapReduce, HDFS
  19. Problem to Solve
      • Elasticity
        o Ability to analyze structured, unstructured data
        o DW imposes structure for questions we know we want answered
        o Need ability to incorporate other types of data on demand
      • Scale
        o Low cost commodity hardware
        o Distributed workload
  20. Hadoop & Analysis Services – High Latency
  21. Hadoop & Analysis Services – Medium Latency
      Linked Server, Hive ODBC driver
  22. Hadoop & Analysis Services – Medium Latency
      Analysis Management Objects (AMO) to push data into SSAS
  23. Hadoop & Analysis Services – Low Latency
      Options:
      • Impala (Cloudera)
      • Spark and Shark (UC Berkeley)
      • Stinger (Hortonworks)
  24. Where Do I Start?
      • Big Data Lifecycle
      • Approaches
  25. Big Data Lifecycle
      Stages: Discovery, Data Preparation, Model Planning, Model Building, Result Communication, Production
      • Look at internal/external processes – What is a challenge? Where could overwhelming advantage be useful?
      • Formulate hypothesis
  26. Big Data Business Models
  27. Big Data Lifecycle
      • Explore the data in a sandbox
      • Condition the data
  28. Big Data Lifecycle
      • Decide on methods and models
      • Examine data for key variables
  29. Big Data Lifecycle
      • Create data sets for testing, training, and production
      • Set up hardware environment
  30. Big Data Lifecycle
      • Validate (or not) hypothesis
      • Share findings
  31. Big Data Lifecycle
      • Pilot project
      • Operationalize
  32. Approaches – Store and Analyze
      • Integrate and consolidate
        o Better data quality
        o Access to history
        o Higher storage requirements and latency impact
      • Choose hardware
        o Massively Parallel Processing (PDW)
        o Tabular – data compression
        o RDBMS – column-store
        o NoSQL – multiple variable data sources
      • Analyze data at rest
  33. Approaches – Analyze and Store
      • Filter and aggregate data before adding to DW
        o Reduce action time (receipt of raw data to decision point) to attain greater business agility
        o Lower storage and administrative overhead
      • Analyze data in motion (complex event processing)
  34. Overwhelmed? Prototype First!
      • Define a small project – focus on one product, for example
      • Capture data for the subset of focus for limited duration (one month)
      • Take action on analytics and measure resulting change
      http://www.microsoft.com/bigdata
  35. Session Review
      • What’s the Fuss?
      • What’s in the Big Data Stack?
      • Where Do I Start?
  36. Resources
      • Big data has jumped the shark (9/11/2011)
        o www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/
      • Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report)
        o http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
      • What a Big Data Model Looks Like
        o http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
  37. Resources
      • Architectures for Running SSAS on Data in Hadoop Hive
        o http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/
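
Slides 17 and 18 name two ideas that are easier to see in code: imposing a schema on read, and the MapReduce processing model. The following is a minimal single-process sketch in plain Python, not Hadoop itself and not anything shown in the deck. The sample records, the (date, product, quantity) layout, and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw records, stored as-is: nothing validated them on write.
RAW_LINES = [
    "2013-02-01,widget,3",
    "2013-02-01,gadget,1",
    "malformed line",
    "2013-02-02,widget,2",
]


def parse(line):
    """Impose an assumed (date, product, quantity) schema at read time.

    Returns None when the line does not fit the schema.
    """
    parts = line.split(",")
    if len(parts) != 3 or not parts[2].isdigit():
        return None
    return {"date": parts[0], "product": parts[1], "quantity": int(parts[2])}


def map_phase(line):
    """Map step: emit (product, quantity) pairs for lines that fit the schema."""
    record = parse(line)
    if record is not None:
        yield record["product"], record["quantity"]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over the raw lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


totals = run_job(RAW_LINES)
print(totals)  # {'widget': 5, 'gadget': 1}
```

The malformed line is kept in storage and simply skipped by this particular reader; a different question could read the same raw lines with a different schema. In a real cluster the map and reduce steps would run in parallel across many machines, which is the scale-out point made on slide 17.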
