
Big data primer

A Big Data Primer


        	
  
Stacia Misner

E-mail: smisner@datainspirations.com
Twitter: @StaciaMisner
Blog: blog.datainspirations.com
Session Overview
•    What’s the Fuss?
•    What’s in the Big Data Stack?
•    Where Do I Start?




Copyright © 2013 by Data Inspirations Inc. All rights reserved.
What’s the Fuss?
•    Some Background…
•    Classic Data Analysis versus Big Data
•    Why Now?
•    Why Bother?




Some Background…




                Google Trends: “Big Data”


Has Big Data Jumped the Shark?

Volume, Velocity, Variety, Variability


Is Big Data the Next Frontier?




Classic Data Analysis …Uses Just a Subset

[Diagram: ETL, Data Warehouse & BI Solutions]




Classic Data Analysis …Requires Structure

[Diagram: ETL, Data Warehouse & BI Solutions]




Variety Includes Unstructured Data




Big Data versus Traditional BI




   http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
Why Now? The Times… They Are A’Changin’

Cost of Storage Decreasing

1970: 1 TB = $1,000,000
2013: 1 TB < $100 (direct attached storage, not Enterprise SAN!)

The Times… They Are A’Changin’

Data Volumes Increasing

All Books: 15 TB
Daily Tweets: 15 TB




The Times… They Are A’Changin’

Processing Power Increasing

3 Billion Base Pairs to Analyze
Then… 10 Years (completed in 2003)
Now… 1 Week (at 1/10th the cost)

Why Now?

Powerful, Scalable, Cheap, Elastic
Why Bother?
•    Make more data available faster
•    Deliver access to more detailed, accurate information to adjust just-in-time
•    Segment customers at more granular level for personalization of products and services
•    Perform more sophisticated analytics
•    Improve products

http://wiki.apache.org/hadoop/PoweredBy

Case Study
Customer, Product, Promotion Data -> Personalized Promotions
Before Big Data: 8 weeks
After Big Data: 1 week and dropping
What’s In the Big Data Stack?
•    Key Differences
•    Hadoop Ecosystem
•    Hadoop and Analysis Services




Key Differences

Scale Out As Needed With Commodity Hardware
Impose Schema On Read
BASE: Basically Available, Soft-state, Eventually consistent
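
To make "Impose Schema On Read" concrete, here is a minimal Python sketch. The raw records, the field names, and the `read_with_schema` helper are illustrative assumptions and do not come from the deck. The idea shown: raw lines are stored as they arrive, and a schema is applied only when a question is asked.

```python
import json

# Hypothetical raw events, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    '{"user": "ann", "amount": "12.50", "channel": "web"}',
    '{"user": "bob", "amount": "7"}',
    'not json at all',
]

def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) while reading; skip unparseable lines."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is tolerated at read time, not rejected at load time
        if not isinstance(record, dict):
            continue
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows

# Two different questions impose two different schemas on the same stored data.
spend = read_with_schema(RAW_LINES, {"user": str, "amount": float})
channels = read_with_schema(RAW_LINES, {"user": str, "channel": str})
```

A data warehouse would instead reject or reshape these records at load time (schema on write), so the second question would need a new ETL pass.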




Hadoop Ecosystem

Note: This is only a subset of ecosystem!

MapReduce
HDFS
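
As a rough illustration of the MapReduce programming model, the following single-process Python sketch counts words. It is not Hadoop code: the function names are assumptions, plain lists stand in for file splits that would normally live in HDFS, and a real cluster would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all input splits."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data primer", "Big Data stack"])
```

Because each map call sees only its own split and each reduce call sees only one key, the work can be spread over commodity machines without the steps needing to coordinate.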
  




Problem to Solve
•    Elasticity
      o    Ability to analyze structured, unstructured data
      o    DW imposes structure for questions we know we want answered
      o    Need ability to incorporate other types of data on demand
•    Scale
      o    Low cost commodity hardware
      o    Distributed workload




Hadoop & Analysis Services – High Latency




Hadoop & Analysis Services – Medium Latency




              Linked Server
              HiveODBC driver



Hadoop & Analysis Services – Medium Latency




              Analysis Management Objects
              (AMO) to push data into SSAS



Hadoop & Analysis Services – Low Latency




     Options:
     •  Impala (Cloudera)
     •  Spark and Shark (UC Berkeley)
     •  Stinger (Hortonworks)


Where Do I Start?
•    Big Data Lifecycle
•    Approaches




Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Discovery
•    Look at internal/external processes – What is a challenge? Where could overwhelming advantage be useful?
•    Formulate hypothesis



Big Data Business Models




Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Data Preparation
•    Explore the data in a sandbox
•    Condition the data



Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Model Planning
•    Decide on methods and models
•    Examine data for key variables



Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Model Building
•    Create data sets for testing, training, and production
•    Set up hardware environment
Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Result Communication
•    Validate (or not) hypothesis
•    Share findings



Big Data Lifecycle

[Lifecycle diagram: Discovery -> Data Preparation -> Model Planning -> Model Building -> Result Communication -> Production]

Production
•    Pilot project
•    Operationalize



Approaches – Store and Analyze
•    Integrate and consolidate
      o    Better data quality
      o    Access to history
      o    Higher storage requirements and latency impact
•    Choose hardware
      o    Massively Parallel Processing (PDW)
      o    Tabular – data compression
      o    RDBMS – column-store
      o    NoSQL – multiple variable data sources
•    Analyze data at rest

Approaches – Analyze and Store
•    Filter and aggregate data before adding to DW
      o    Reduce action time (receipt of raw data to decision point) to attain greater business agility
      o    Lower storage and administrative overhead
•    Analyze data in motion (complex event processing)
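
The "filter and aggregate before adding to DW" idea can be sketched in a few lines of Python. Everything here is an illustrative assumption: the event fields, the `event_stream` generator standing in for a live feed, and the `filter_and_aggregate` helper. It is not a complex event processing engine, only a picture of consuming events in motion and keeping the summary rather than the raw data.

```python
from collections import defaultdict

def filter_and_aggregate(events, min_amount=0.0):
    """Consume events one at a time and keep only per-product totals."""
    totals = defaultdict(float)
    for event in events:
        if event["amount"] <= min_amount:
            continue  # filter: drop events not worth storing
        totals[event["product"]] += event["amount"]  # aggregate: keep a running sum
    return dict(totals)

def event_stream():
    """Hypothetical stand-in for a live feed; yields events as they arrive."""
    yield {"product": "A", "amount": 10.0}
    yield {"product": "B", "amount": 0.0}
    yield {"product": "A", "amount": 5.0}
    yield {"product": "B", "amount": 2.5}

# Only this small summary would be loaded into the warehouse.
summary = filter_and_aggregate(event_stream())
```

Four raw events shrink to two summary rows here, which is the source of the lower storage and administrative overhead; the trade-off is that the discarded detail is no longer available as history.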
  




Overwhelmed? Prototype First!
•    Define a small project – focus on one product, for example
•    Capture data for the subset of focus for limited duration (one month)
•    Take action on analytics and measure resulting change




                     http://www.microsoft.com/bigdata




Session Review
•    What’s the Fuss?
•    What’s in the Big Data Stack?
•    Where Do I Start?




Resources
•    Big data has jumped the shark (9/11/2011)
      o      www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/
•    Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report)
      o      http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
•    What a Big Data Model Looks Like
      o      http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
Resources
•    Architectures for Running SSAS on Data in Hadoop Hive
      o    http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/





Got Big Data? Get OpenSplice!
 
Introduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeopleIntroduction To Big Data Analytics On Hadoop - SpringPeople
Introduction To Big Data Analytics On Hadoop - SpringPeople
 
Forrester
ForresterForrester
Forrester
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
Left Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsLeft Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise Analytics
 
8 douetteau - dataiku - data tuesday open source 26 fev 2013
8   douetteau - dataiku - data tuesday open source 26 fev 2013 8   douetteau - dataiku - data tuesday open source 26 fev 2013
8 douetteau - dataiku - data tuesday open source 26 fev 2013
 
Reducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with PostgresReducing Database Pain & Costs with Postgres
Reducing Database Pain & Costs with Postgres
 
Doc is a Four Letter Word
Doc is a Four Letter WordDoc is a Four Letter Word
Doc is a Four Letter Word
 
Big data by_mcal
Big data by_mcalBig data by_mcal
Big data by_mcal
 
Everything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data WarehouseEverything Has Changed Except Us: Modernizing the Data Warehouse
Everything Has Changed Except Us: Modernizing the Data Warehouse
 
Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013Integrating hadoop - Big Data TechCon 2013
Integrating hadoop - Big Data TechCon 2013
 
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
Hadoop World 2011: Big Data Analytics – Data Professionals: The New Enterpris...
 
Making Sense of Graph Databases
Making Sense of Graph DatabasesMaking Sense of Graph Databases
Making Sense of Graph Databases
 
INTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPINTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOP
 
Big Data: Infrastructure Implications for “The Enterprise of Things” - Stampe...
Big Data: Infrastructure Implications for “The Enterprise of Things” - Stampe...Big Data: Infrastructure Implications for “The Enterprise of Things” - Stampe...
Big Data: Infrastructure Implications for “The Enterprise of Things” - Stampe...
 

Big data primer

  • 1. A Big Data Primer. Stacia Misner. E-mail: smisner@datainspirations.com Twitter: @StaciaMisner Blog: blog.datainspirations.com
  • 2. Session Overview • What’s the Fuss? • What’s in the Big Data Stack? • Where Do I Start? Copyright © 2013 by Data Inspirations Inc. All rights reserved.
  • 3. What’s the Fuss? • Some Background… • Classic Data Analysis versus Big Data • Why Now? • Why Bother?
  • 4. Some Background… Google Trends: “Big Data”
  • 5. Has Big Data Jumped the Shark? Volume, Velocity, Variety, Variability
  • 6. Is Big Data the Next Frontier?
  • 7. Classic Data Analysis …Uses Just a Subset. ETL, Data Warehouse & BI Solutions
  • 8. Classic Data Analysis …Requires Structure. ETL, Data Warehouse & BI Solutions
  • 9. Variety Includes Unstructured Data
  • 10. Big Data versus Traditional BI http://blogs.forrester.com/brian_hopkins/11-08-29-big_data_brewer_and_a_couple_of_webinars
  • 11. Why Now? The Times… They Are A’Changin’. Cost of Storage Decreasing. 1970: 1 TB, $1,000,000. 2013: 1 TB, < $100. Direct attached storage, not Enterprise SAN!
  • 12. The Times… They Are A’Changin’. Data Volumes Increasing. All Books: 15 TB. Daily Tweets: 15 TB.
  • 13. The Times… They Are A’Changin’. Processing Power Increasing. 3 Billion Base Pairs to Analyze. Then: 10 Years, Completed in 2003. Now: 1 Week, At 1/10th the Cost.
  • 14. Why Now? Powerful, Scalable, Cheap, Elasticity
  • 15. Why Bother? • Make more data available faster • Deliver access to more detailed, accurate information to adjust just-in-time • Segment customers at more granular level for personalization of products and services • Perform more sophisticated analytics • Improve products. http://wiki.apache.org/hadoop/PoweredBy Case Study: Customer, Product, Promotion Data -> Personalized Promotions. Before Big Data: 8 weeks. After Big Data: 1 week and dropping.
  • 16. What’s In the Big Data Stack? • Key Differences • Hadoop Ecosystem • Hadoop and Analysis Services
  • 17. Key Differences. Basically Available, Soft-state, Eventually consistent. Scale Out As Needed With Commodity Hardware. Impose Schema On Read.
  • 18. Hadoop Ecosystem. Note: This is only a subset of ecosystem! MapReduce, HDFS
  • 19. Problem to Solve • Elasticity o Ability to analyze structured, unstructured data o DW imposes structure for questions we know we want answered o Need ability to incorporate other types of data on demand • Scale o Low cost commodity hardware o Distributed workload
  • 20. Hadoop & Analysis Services – High Latency
  • 21. Hadoop & Analysis Services – Medium Latency. Linked Server, Hive ODBC driver
  • 22. Hadoop & Analysis Services – Medium Latency. Analysis Management Objects (AMO) to push data into SSAS
  • 23. Hadoop & Analysis Services – Low Latency. Options: • Impala (Cloudera) • Spark and Shark (UC Berkeley) • Stinger (Hortonworks)
  • 24. Where Do I Start? • Big Data Lifecycle • Approaches
  • 25. Big Data Lifecycle. Stages: Discovery, Data Preparation, Model Planning, Model Building, Result Communication, Production. Discovery: Look at internal/external processes – What is a challenge? Where could overwhelming advantage be useful? Formulate hypothesis.
  • 26. Big Data Business Models
  • 27. Big Data Lifecycle. Data Preparation: Explore the data in a sandbox. Condition the data.
  • 28. Big Data Lifecycle. Model Planning: Decide on methods and models. Examine data for key variables.
  • 29. Big Data Lifecycle. Model Building: Create data sets for testing, training, and production. Set up hardware environment.
  • 30. Big Data Lifecycle. Result Communication: Validate (or not) hypothesis. Share findings.
  • 31. Big Data Lifecycle. Production: Pilot project. Operationalize.
  • 32. Approaches – Store and Analyze • Integrate and consolidate o Better data quality o Access to history o Higher storage requirements and latency impact • Choose hardware o Massively Parallel Processing (PDW) o Tabular – data compression o RDBMS – column-store o NoSQL – multiple variable data sources • Analyze data at rest
  • 33. Approaches – Analyze and Store • Filter and aggregate data before adding to DW o Reduce action time (receipt of raw data to decision point) to attain greater business agility o Lower storage and administrative overhead • Analyze data in motion (complex event processing)
  • 34. Overwhelmed? Prototype First! • Define a small project – focus on one product, for example • Capture data for the subset of focus for limited duration (one month) • Take action on analytics and measure resulting change http://www.microsoft.com/bigdata
  • 35. Session Review • What’s the Fuss? • What’s in the Big Data Stack? • Where Do I Start?
  • 36. Resources • Big data has jumped the shark (9/11/2011) o www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/ • Big data: The next frontier for innovation, competition, and productivity (aka The McKinsey report) o http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation • What a Big Data Model Looks Like o http://blogs.hbr.org/cs/2012/12/what_a_big-data_business_model.html
  • 37. Resources • Architectures for Running SSAS on Data in Hadoop Hive o http://thinknook.com/architectures-for-running-sql-server-analysis-service-ssas-on-data-in-hadoop-hive-2013-02-25/
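
The “Impose Schema On Read” idea from slide 17 and the MapReduce model named on slide 18 can be illustrated with a minimal Python sketch. It is not part of the original deck: the sample records, the field names, and the functions `read_with_schema`, `map_phase`, and `reduce_phase` are hypothetical, and everything runs in one process rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical raw event lines, stored exactly as they arrived.
# Nothing validated them on write, so one of them is malformed.
RAW_EVENTS = [
    "2013-03-01,alice,widget,2",
    "2013-03-01,bob,gadget,1",
    "malformed line",
    "2013-03-02,alice,gadget,5",
]


def read_with_schema(lines):
    """Apply a schema at read time and skip lines that do not fit it."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 4:
            continue  # wrong number of fields for this (assumed) schema
        date, customer, product, qty = parts
        try:
            yield {
                "date": date,
                "customer": customer,
                "product": product,
                "qty": int(qty),
            }
        except ValueError:
            continue  # quantity was not an integer


def map_phase(records):
    """Emit one (key, value) pair per record: (product, quantity)."""
    for record in records:
        yield record["product"], record["qty"]


def reduce_phase(pairs):
    """Sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


totals = reduce_phase(map_phase(read_with_schema(RAW_EVENTS)))
print(totals)
```

Because the schema lives in the reader rather than in the storage layer, a different question (for example, totals per customer) would only need a different reader or mapper; the stored lines stay untouched.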