Anexinet Big Data

Solutions for Big Data Analytics
Big Data Defined



Volume
• Datasets that grow too large to manage easily in a traditional RDBMS
• TBs, PBs, ZBs

Velocity
• Large-volume streaming data that can overwhelm traditional BI & ETL processes

Variety
• Data sources extraneous to traditional business systems that can be unstructured and require text analytics

Value
• Big Data can have a transformational effect on business when the proper systems and processes are put in place

Big Data vs. Classic BI

• What is the difference between classic DW/BI and Big Data Analytics?
  * Businesses today treat data warehousing & business intelligence as a must-have reporting and
    operational capability
  * Businesses that are not fully mature in the BI lifecycle may struggle with Big Data

• Big Data projects look for untapped analytics, not BI dashboards
• SCALE: Think Volume, Variety and Velocity
  * Yahoo! uses Microsoft SQL Server & Analysis Services, with Hadoop, Oracle & Tableau
  * 38,000 machines distributed across 20 different clusters
  * 2-petabyte Hadoop cluster that feeds 1.2 terabytes of raw data each day into Oracle RAC
  * Data is compressed and 135 gigabytes of data per day is sent to a SQL Server 2008 R2 Analysis
    Services cube
  * Cube produces 24 terabytes of data each quarter
  * Source: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001707
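
The Yahoo! figures above imply a sizeable reduction between the raw feed and the cube. The sketch below works through that arithmetic in Python. It assumes decimal units (1 TB = 1,000 GB) and a 91-day quarter, neither of which is stated on the slide, and the function names are illustrative only. The resulting ratio reflects whatever compression and aggregation happen between the two stages; the slide does not break those apart.

```python
# Figures quoted on the slide.
RAW_TB_PER_DAY = 1.2        # raw data fed from Hadoop into Oracle RAC
CUBE_GB_PER_DAY = 135       # data sent on to the Analysis Services cube
CUBE_TB_PER_QUARTER = 24    # data the cube produces each quarter

# Assumptions (not from the slide).
GB_PER_TB = 1000            # decimal units
DAYS_PER_QUARTER = 91       # roughly 365 / 4


def reduction_ratio(raw_tb_per_day: float, cube_gb_per_day: float) -> float:
    """GB of raw data per 1 GB that reaches the cube."""
    return raw_tb_per_day * GB_PER_TB / cube_gb_per_day


def quarterly_tb(gb_per_day: float, days: int = DAYS_PER_QUARTER) -> float:
    """Convert a daily volume in GB into a quarterly volume in TB."""
    return gb_per_day * days / GB_PER_TB


ratio = reduction_ratio(RAW_TB_PER_DAY, CUBE_GB_PER_DAY)
raw_q = quarterly_tb(RAW_TB_PER_DAY * GB_PER_TB)
cube_in_q = quarterly_tb(CUBE_GB_PER_DAY)

print(f"Raw-to-cube reduction: about {ratio:.1f} to 1")
print(f"Raw data per quarter:  about {raw_q:.1f} TB")
print(f"Cube input per quarter: about {cube_in_q:.1f} TB "
      f"(cube output quoted as {CUBE_TB_PER_QUARTER} TB)")
```

Under these assumptions the daily feed shrinks by a little under 9 to 1 before it reaches the cube, while the cube's quoted quarterly output is larger than its quarterly input, which is consistent with a cube storing pre-computed aggregations.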
Scalable Big Data Platform Architecture



[Diagram: four platform stages, left to right]
• Hadoop: HDFS Cluster, MapReduce Framework
• Data Warehouse: MPP Database, Star Schemas
• Analytics: In-memory cubes, Analytical Columnstore Tables
• End User Reporting: Advanced in-memory analytics, Ad-hoc data discovery

© Copyright 2013 Anexinet Corp.
Go Beyond Dashboards. Provide Advanced Analytics.

• A large number of data points adds new business value
• Big Data advanced analytics requires a tool that can sample complex data sources
• Must provide quick aggregations of large data sets that are easily consumed by the human eye
• Must provide “data discovery” for ad-hoc analysis

[Tools pictured: Tableau, Microsoft Power View, QlikView]
Marketing Samples

• Enhance marketing campaigns with Big Data
• Social analytics, customer analytics, targeted marketing, brand sentiment
• Big Data has proven transformational for marketing organizations (Razorfish, Yahoo!, NBC, [x+1])

[Image: Web Analytics from Google Analytics]
Anexinet Big Data Offerings

Strategy Engagement
• Customer stakeholder interviews & interactive sessions
• Define Big Data Requirements
• Design Big Data Strategy
• Deliver Strategy & Roadmap Documents


Starter Solution
• Let Anexinet handle the hardest parts of a Big Data solution
  * Getting started
  * Collecting & processing data
  * Uncovering business value from Big Data


Big Data Project Engagement
• End-to-end Big Data project
  * Big Data Discovery
  * Big Data Platform
  * Big Data Analytics
  * Big Data Visualizations
Partnerships



Big Data Platforms
• EMC Greenplum
• Hortonworks (OSS, MSFT, HP)
• Cloudera (OSS, Oracle, HP)

Big Data Databases
• HP Vertica
• EMC Greenplum
• Microsoft PDW
• Oracle Exalytics
• Oracle Big Data Appliance

Big Data Visualizations
• QlikView
• Tableau
• Microsoft PowerPivot
• Microsoft Power View
A Credible Partner to Deploy Big Data Solutions



Security
• Ensure privacy of PII
• Conform Big Data solution to your enterprise security standards

Integration
• ETL / ELT
• Integrate Hadoop into your DW & Analytics environments
• Integrate Big Data into your IT investments

Configuration
• Configure the Big Data environment to maximize throughput, performance and analytics to meet your
  stated SLA goals

Governance
• Ensure Data Quality
• MDM
• Process Governance
Top Impediments to Successful Big Data Analytics
Big Data Buzzword Glossary
• Big Data: Think of the 3 V's, unstructured data, and data that is not currently managed in the DW. This is
  the data that companies need for game-changing analytics.
• Big Data Analytics: Business insights gained from mining Big Data to transform business processes.
• Columnar: Column-oriented databases used in Big Data scenarios because of their speed and
  compression capabilities, e.g. HP Vertica, HBase.
• Hadoop: Apache open-source framework for Big Data processing, made up of multiple components. The
  leading Big Data platform. Distributions are marketed by Cloudera & Hortonworks.
• In-memory DB: A database that resides fully in memory, eliminating disk I/O bottlenecks. Very important
  in Big Data Analytics systems, e.g. Microsoft PowerPivot, SSAS 2012, SAP HANA.
• MapReduce: Distributed data programming and processing framework. A key aspect of processing Big
  Data is running a MapReduce framework across distributed clusters of commodity servers. Available as
  open source in the Hadoop framework and in various Hadoop distributions.
• MPP: Massively Parallel Processing database engine, mostly used for data warehouse & BI workloads,
  e.g. SQL Server PDW, IBM Netezza, Teradata.
• NoSQL: Non-relational, schemaless data stores (key-value, document or wide-column) built for fast
  writes, typically with eventual consistency rather than full ACID guarantees. Big Data systems use these
  to store data arriving from sources that dump large amounts of data quickly, e.g. Cassandra, MongoDB.
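
To make the MapReduce entry above more concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using a word count as the example job. It is plain Python and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are hypothetical, and a real cluster would run the map and reduce steps in parallel across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input.
counts = run_job(["big data big analytics", "data discovery"])
print(counts)
```

Because each record is mapped independently and each key is reduced independently, both steps can be spread across separate machines, which is what lets the approach scale on commodity servers.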
