Analyze This! Best Practices For Big And Fast Data

Judith Hurwitz, President
Hurwitz & Associates

Bill Schmarzo, CTO
EIM&A Practice, EMC Consulting



© Copyright 2012 EMC Corporation. All rights reserved.
What Is Big Fast Data?
The Transition in Data Management
Judith Hurwitz
What Is Big Fast Data?

Big Fast Data is the ability to
manage a huge volume of
disparate data at the right velocity
within the right timeframe
Characteristics of Big Fast Data
  • Must be verified based on
     accuracy and business
     context
  • Must incorporate a variety of
     data types, including
     structured and unstructured data
Why Is Big Fast Data Important?


• Businesses need to gain
  insights from massive
  amounts of stored data
• Businesses need to be able
  to make decisions faster to
  impact outcomes
• Need to find answers
  without asking the question

What Is The Business Looking For?


1. Ability to gain access to
   vast amounts of available
   data from multiple sources
2. Ability to identify anomalies
3. Ability to predict the future
4. Ability to react in real time
   based on analysis

How Did We Get Here?

• Early online commerce sites and search
  engines began pushing boundaries of data
  management
• Successful companies found ways to
  monetize huge volumes of customer data
  to upsell
• The massive data had to be managed
  efficiently and in the right context
Waves Of Data In Context With Usage Patterns

Relational Database
  Examples: System of record
  Characteristics: Used for structured, transactional data with strict definitional controls.

Content Management System
  Examples: Claims document management system, Web content management
  Characteristics: Used with unstructured/semi-structured text; derived value, context driven.

Data Warehouse
  Examples: Customer and account data warehouse
  Characteristics: Used for structured data. Subject-oriented system optimized for querying. Integrated, with well-defined parameters, optimized for storage, focused on timely access to corporate data.

Complex Event Processing/Streaming Data
  Examples: Monitoring sensor data in real time to determine process changes
  Characteristics: Large streams of data focused on managing and analyzing business processes.

In-Memory Databases
  Examples: Used in ecommerce engines to reduce latency and speed transaction processing
  Characteristics: Uses main memory to cache data to improve speed. Fast analytical processing that can transform decision making in real time or near real time.

Hadoop Software Framework
  Examples: Used to process massive amounts of highly distributed, disparate data, such as fraud processing and image processing
  Characteristics: A non-relational software framework based on Google’s MapReduce framework. It includes a distributed file system. Allows very large data files (both structured and unstructured data) to be distributed across all nodes of a very large grid of servers.

NoSQL Databases
  Examples: Designed to process massive amounts of data in a flexible form, for example in ecommerce
  Characteristics: Supports various database models, including graph, object, key-value, and document. Document oriented rather than relying on joins; scale-out model for scalability.
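
The Hadoop entry above refers to the MapReduce model. As a rough, single-process illustration of that model (it does not use Hadoop's API, and the function names map_phase, shuffle, reduce_phase and word_count are made up for this sketch), a word count can be written as a map step, a grouping step, and a reduce step. In a real cluster the framework would run the map and reduce steps on many nodes and handle the grouping between them.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input: each string plays the role of one record in a large file.
counts = word_count(["big fast data", "fast data wins"])
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across servers, which is the property the table attributes to Hadoop.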

How Infrastructure Supports The Reality Of Big Fast Data

 • Availability of commodity
   servers
 • Horizontal scaling because
   of virtualization
 • Emergence of Cloud
   Computing
 • Advanced data
   management including
   predictive analytics and
   big data analysis
Making Big Data Fast Data A Reality

• Create a well-defined business and IT strategy
• Focus on the business problem, such as identifying
  buying opportunities at point of engagement or reducing
  fraud through an early warning system
• Understand the characteristics of your own data that you
  need to leverage for the future
• Identify the bottlenecks in your current data architecture
• Create a strategy so you can use massive data at the
  right speed and in the right context to anticipate new
  opportunities
The Elements Of A Data Architecture

•   Foundational Data Services – support for relational, in-memory
    databases, structured and unstructured data
•   Middleware Services – allow for communication and integration
    between data sources
•   Big Data Analytics – ability to analyze huge volumes of data
•   Data Warehousing Capabilities – used to apply analytics to huge
    volumes of complex data
•   Management Services – deliver the right performance levels
•   Virtualized Infrastructure – ability to optimize the environment
•   Runtime Services – support for mobile computing and other user
    environments
The Business Initiative For Big Fast Data

•    Capture, transform, and
     manage huge volumes of
     information in near real time
•    Capture data at the point of
     creation and then combine data
     sources to create context to
     deliver on the business
     objective
•    Leverage data assets to gain a
     competitive advantage


The Business Potential Of Big Fast Data

Bill Schmarzo
CTO, EIM&A Practice
EMC Consulting




Big Fast Data Requires An Architecture For High-velocity Data To Accelerate Operational Execution

[Architecture diagram; labels: Mobile-Enabled Web Clients, Application Performance Manager, Cloud Application Platform, App Director Installer, Application Logic, In-memory Database, Fast Ingest, vFabric Data Director, Greenplum, Postgres, Oracle, Greenplum, Hadoop, Cloud Platform]

Key Architecture Capabilities
  • Scale out compute and storage
  • Distribution: real-time WAN
  • Data Diversity: SQL and NoSQL
  • Mobile enabled
  • In-memory computing
  • In-database analytics
  • Cloud friendly architecture
Big Fast Data Use Cases
              Algorithmic Stock Trading                       Identify risk and pricing nuances in stock trading
Real-time
              Ad Serving                                      Serve right ad to right person at the right time
              Cyber Security                                  Flag potential security breach behaviors and situations
              Fraud Detection                                 Identify potential fraud situations at purchase time
              High-end Product Failure                        Predict high-end product failures (planes, trains, power plants)
              Next Best Offers                                Recommend products based on current shopping occurrence
              Churn Detection                                 Flag customer behaviors that are indicative of attrition
              Medical Treatment                               Recommend appropriate medical treatments in urgent situations
Right-time
              Money Laundering                                Flag suspicious financial transactions
              Claims Adjudication                             Approve insurance claims at time of filing
              Loan/Insurance Approval                         Calculate financial scores and risks to approve loan or policy
              Oil & Gas Exploration                           Track sensor feeds to identify potential drilling problems
Use Case: Financial Trading And Real-time Operational Analytics

  • Develop risk and pricing algorithms against historical data in
    Greenplum Database using analytical methods such as linear
    regression, clustering, etc.
  • Serve up analytic results and scores to SQLFire for real-time
    execution
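
The two steps on this slide (fit a model against history, then serve precomputed scores for fast lookup) can be sketched in a few lines. This is only an illustration under stated assumptions: the history values, the symbols AAA and BBB, and the names fit_line, score_cache and lookup_score are invented for the example, and a plain Python dictionary stands in for the in-memory store. No Greenplum or SQLFire API is used or implied.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical historical data: (volatility, observed price spread).
history = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
intercept, slope = fit_line([v for v, _ in history], [s for _, s in history])

# Batch step: score each instrument once, away from the trading path.
current_volatility = {"AAA": 1.5, "BBB": 3.5}
score_cache = {
    symbol: intercept + slope * volatility
    for symbol, volatility in current_volatility.items()
}


def lookup_score(symbol):
    """Real-time step: a constant-time read of a precomputed score."""
    return score_cache.get(symbol)
```

The design point is the split itself: the expensive fitting runs against the full history in batch, while the execution path only performs a key lookup.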




Use Case: Retail Location-based Marketing And Next Best Offers

  • Develop analytic models on detailed customer loyalty and
    Point of Sale (POS) data to create “next best offer” scores
    for each customer
  • Leverage “right-time” feeds based upon customer geo location
    to deliver most appropriate offers




Use Case: Healthcare And Readmission Score At Initial Admission

Out of 1000 patients, 1124 admissions expected within next 12 months

  • Admissions increase with the level of cholesterol
  • Admissions decrease with the Max Heart Rate
  • Cholesterol and Max Heart Rate uncorrelated

  • Score patient at point of admission for the probability of
    readmission based upon patient history and current health factors
  • Create custom treatment and monitoring programs for high-risk
    patients
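
A scoring function of the kind this slide describes might look like the sketch below. The slide does not name a model, so the Poisson-style form is an assumption, and the coefficients and the 0.7 threshold are hypothetical values chosen only to reproduce the stated directions: expected admissions rise with cholesterol and fall with maximum heart rate. They are not fitted to any data.

```python
import math

# Hypothetical coefficients; only their signs come from the slide.
INTERCEPT = 0.5
COEF_CHOLESTEROL = 0.004      # effect per mg/dL of cholesterol
COEF_MAX_HEART_RATE = -0.01   # effect per beat/min of maximum heart rate


def expected_admissions(cholesterol, max_heart_rate):
    """Expected admissions in the next 12 months under a log-linear rate."""
    return math.exp(
        INTERCEPT
        + COEF_CHOLESTEROL * cholesterol
        + COEF_MAX_HEART_RATE * max_heart_rate
    )


def readmission_probability(cholesterol, max_heart_rate):
    """Chance of at least one admission, assuming Poisson-distributed counts."""
    return 1.0 - math.exp(-expected_admissions(cholesterol, max_heart_rate))


def is_high_risk(cholesterol, max_heart_rate, threshold=0.7):
    """Flag a patient for a custom treatment and monitoring program."""
    return readmission_probability(cholesterol, max_heart_rate) >= threshold
```

Summing expected_admissions over a patient population gives a figure of the same kind as the slide's "admissions expected" total, while readmission_probability gives the per-patient score used at the point of admission.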




Greenplum And EMC Consulting Provide Big Fast Data Strategy And Implementation Services

  • Vision Workshop – Identify big data analytics business use cases
  • Analytics Lab – Deploy analytics sandbox to quantify the business case
  • Analytics Operationalization – Identify current state, determine
    required state and conduct gap analysis to develop analytics
    implementation roadmap

Repeat the process for identified business cases
Questions and Answers




                    To type a question via WebEx, click on the Q&A tab
                             Please select “Ask: All Panelists”
                      to ensure your questions reach us. Thank you!

Learn More…
  • See us at…
     –    Oct. 16-17 O’Reilly Strata Rx Conference, Santa Clara, CA
              ▪   Oct. 16 9:40 am It’s an Exciting Time in the Industry
              ▪   Oct. 16 3:35 pm Big Fast Data in Health Sciences: A Panel of Experts Discusses What and Why
              ▪   Oct. 17 2:05 pm A Predictive Approach to Real-Time Detection of Fraud, Waste, and Abuse in Healthcare
     –    Oct. 23-25 O’Reilly Strata New York Conference
              ▪   Oct. 23 11:15 am Great Debate: The Old Models are Broken
     –    On-demand webinar: Transform Your BI and Data Warehouse for Big Data
     –    Upcoming webinar Sept. 18, 11am PT/2pm ET: Using Greenplum to Deliver Big Data Analytics

  • Contact Judith Hurwitz
     –    Email: judith.hurwitz@hurwitz.com
     –    LinkedIn: http://www.linkedin.com/pub/judith-hurwitz/0/18/405
     –    Twitter: @jhurwitz

  • Contact Bill Schmarzo
     –    Email: william.schmarzo@emc.com
     –    LinkedIn: http://www.linkedin.com/in/schmarzo
     –    Twitter: @schmarzo
     –    Blog: http://infocus.emc.com/author/william_schmarzo/
THANK YOU

