Big data primer
Transcript

  • 1. A Big Data Primer
    Stacia Misner
    E-mail: smisner@datainspirations.com
    Twitter: @StaciaMisner
    Blog: blog.datainspirations.com
  • 2. Session Overview
      • What’s the Fuss?
      • What’s in the Big Data Stack?
      • Where Do I Start?
    Copyright © 2013 by Data Inspirations Inc. All rights reserved.
  • 3. What’s the Fuss?
      • Some Background…
      • Classic Data Analysis versus Big Data
      • Why Now?
      • Why Bother?
  • 4. Some Background…
    Google Trends: “Big Data”
  • 5. Has Big Data Jumped the Shark?
      • Volume
      • Velocity
      • Variety
      • Variability
  • 6. Is Big Data the Next Frontier?
  • 7. Classic Data Analysis …Uses Just a Subset
    Diagram labels: ETL; Data Warehouse & BI Solutions
  • 8. Classic Data Analysis …Requires Structure
    Diagram labels: ETL; Data Warehouse & BI Solutions
  • 9. Variety Includes Unstructured Data
  • 10. Big Data versus Traditional BI
    http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
  • 11. Why Now? The Times… They Are A’Changin’
    Cost of Storage Decreasing
      1970: 1 TB, $1,000,000
      2013: 1 TB, < $100
    Direct attached storage, not Enterprise SAN!
  • 12. The Times… They Are A’Changin’
    Data Volumes Increasing
      All Books: 15 TB
      Daily Tweets: 15 TB
  • 13. The Times… They Are A’Changin’
    Processing Power Increasing
    3 Billion Base Pairs to Analyze
      Then… 10 Years (Completed in 2003)
      Now… 1 Week (At 1/10th the Cost)
  • 14. Why Now?
    Powerful, Scalable, Cheap, Elasticity
  • 15. Why Bother?
      • Make more data available faster
      • Deliver access to more detailed, accurate information to adjust just-in-time
      • Segment customers at more granular level for personalization of products and services
      • Perform more sophisticated analytics
      • Improve products
    http://wiki.apache.org/hadoop/PoweredBy
    Case Study: Customer, Product, Promotion Data -> Personalized Promotions
      Before Big Data: 8 weeks
      After Big Data: 1 week and dropping
  • 16. What’s In the Big Data Stack?
      • Key Differences
      • Hadoop Ecosystem
      • Hadoop and Analysis Services
  • 17. Key Differences
      • Basically Available, Soft-state, Eventually consistent
      • Scale Out As Needed With Commodity Hardware
      • Impose Schema On Read
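    The "Impose Schema On Read" point can be illustrated with a minimal sketch. The event records, field names, and the read_with_schema helper below are hypothetical and do not come from the slides; the idea is only that raw data is stored as it arrives and a structure is applied when it is read.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_EVENTS = [
    '{"user": "a1", "amount": "19.99", "ts": "2013-03-01"}',
    '{"user": "b2", "amount": 5}',
    'not valid json at all',
]


def read_with_schema(lines):
    """Apply a schema at read time: keep user and amount, skip records that do not fit."""
    for line in lines:
        try:
            record = json.loads(line)
            yield {"user": str(record["user"]), "amount": float(record["amount"])}
        except (ValueError, KeyError, TypeError):
            # Malformed or incomplete records remain in storage; this reader skips them.
            continue


rows = list(read_with_schema(RAW_EVENTS))
```

    A different question could apply a different schema (for example, one that also reads ts) to the same stored lines without reloading anything.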
  • 18. Hadoop Ecosystem
    Note: This is only a subset of ecosystem!
      • MapReduce
      • HDFS
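    As a rough illustration of the MapReduce programming model named on this slide, here is a single-process word count. It is not Hadoop code: the function names are illustrative, and a real cluster would run the map and reduce steps in parallel across machines over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data primer", "Big data stack"])
```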
  • 19. Problem to Solve
      • Elasticity
          o Ability to analyze structured, unstructured data
          o DW imposes structure for questions we know we want answered
          o Need ability to incorporate other types of data on demand
      • Scale
          o Low cost commodity hardware
          o Distributed workload
  • 20. Hadoop & Analysis Services – High Latency
  • 21. Hadoop & Analysis Services – Medium Latency
    Diagram labels: Linked Server; Hive ODBC driver
  • 22. Hadoop & Analysis Services – Medium Latency
    Analysis Management Objects (AMO) to push data into SSAS
  • 23. Hadoop & Analysis Services – Low Latency
    Options:
      • Impala (Cloudera)
      • Spark and Shark (UC Berkeley)
      • Stinger (Hortonworks)
  • 24. Where Do I Start?
      • Big Data Lifecycle
      • Approaches
  • 25. Big Data Lifecycle
    Stages: Discovery, Data Preparation, Model Planning, Model Building, Result Communication, Production
    Discovery:
      • Look at internal/external processes – What is a challenge? Where could overwhelming advantage be useful?
      • Formulate hypothesis
  • 26. Big Data Business Models
  • 27. Big Data Lifecycle
    Data Preparation:
      • Explore the data in a sandbox
      • Condition the data
  • 28. Big Data Lifecycle
    Model Planning:
      • Decide on methods and models
      • Examine data for key variables
  • 29. Big Data Lifecycle
    Model Building:
      • Create data sets for testing, training, and production
      • Set up hardware environment
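    One possible way to create the testing, training, and production data sets mentioned on this slide is a seeded random split. The 60/20/20 proportions, the fixed seed, and the split_dataset name are assumptions made for illustration; the slides do not prescribe a method.

```python
import random


def split_dataset(records, train_pct=60, test_pct=20, seed=42):
    """Shuffle records reproducibly, then cut them into three disjoint sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Whatever remains is held back as the production set (an assumed convention).
    production = shuffled[n_train + n_test:]
    return train, test, production


train, test, production = split_dataset(range(100))
```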
  • 30. Big Data Lifecycle
    Result Communication:
      • Validate (or not) hypothesis
      • Share findings
  • 31. Big Data Lifecycle
    Production:
      • Pilot project
      • Operationalize
  • 32. Approaches – Store and Analyze
      • Integrate and consolidate
          o Better data quality
          o Access to history
          o Higher storage requirements and latency impact
      • Choose hardware
          o Massively Parallel Processing (PDW)
          o Tabular – data compression
          o RDBMS – column-store
          o NoSQL – multiple variable data sources
      • Analyze data at rest
  • 33. Approaches – Analyze and Store
      • Filter and aggregate data before adding to DW
          o Reduce action time (receipt of raw data to decision point) to attain greater business agility
          o Lower storage and administrative overhead
      • Analyze data in motion (complex event processing)
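    The "filter and aggregate before adding to DW" approach can be sketched as follows. The event shape (product, amount), the min_amount threshold, and the aggregate_before_load name are hypothetical; the point is only that the warehouse receives a small summary rather than every raw event.

```python
from collections import Counter


def aggregate_before_load(events, min_amount=1.0):
    """Drop events below a threshold and total the rest per product."""
    totals = Counter()
    for product, amount in events:
        if amount < min_amount:
            # Filter step: events below the threshold never reach the warehouse.
            continue
        # Aggregate step: keep one running total per product.
        totals[product] += amount
    return dict(totals)


raw_events = [("widget", 10.0), ("widget", 0.5), ("gadget", 3.0), ("widget", 2.5)]
summary = aggregate_before_load(raw_events)
```

    Only the two summary rows would then be loaded, which is where the lower storage and administrative overhead comes from.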
  • 34. Overwhelmed? Prototype First!
      • Define a small project – focus on one product, for example
      • Capture data for the subset of focus for limited duration (one month)
      • Take action on analytics and measure resulting change
    http://www.microsoft.com/bigdata
  • 35. Session Review
      • What’s the Fuss?
      • What’s in the Big Data Stack?
      • Where Do I Start?
  • 36. Resources
      • Big data has jumped the shark (9/11/2011)
          o www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/
      • Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report)
          o http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
      • What a Big Data Model Looks Like
          o http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
  • 37. Resources
      • Architectures for Running SSAS on Data in Hadoop Hive
          o http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/
