Using Hadoop Analytics to
Gain a Big Data Advantage

Jonathan Seidman, Solution Architect, Cloudera
Ian Fyfe, VP Product Marketing, Pentaho
Jeff Stacey, Director of GTM Strategy, Channel & Sales Development, Dell
Why big data matters to your business
    Jonathan Seidman, Cloudera




2   Confidential        Big Data Solutions

Explosive Data Growth

1.8 trillion gigabytes of data was created in 2011…

    •   More than 90% is unstructured data
    •   Approx. 500 quadrillion files
    •   Quantity doubles every 2 years

[Chart: gigabytes of data created (in billions), 2005 to 2015, y-axis 0 to 10,000; series: STRUCTURED DATA, UNSTRUCTURED DATA]
Source: IDC 2011
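
To make the "doubles every 2 years" claim concrete, here is a minimal sketch that extrapolates from the 2011 figure on this slide. The function name and the assumption of smooth exponential growth are illustrative only; the results are a naive projection, not IDC figures.

```python
# Naive extrapolation of the slide's figures: 1.8 trillion GB created in 2011,
# with the quantity doubling every 2 years. Smooth exponential growth is an
# assumption made for illustration; it is not stated by the source.
BASE_YEAR = 2011
BASE_GIGABYTES = 1.8e12
DOUBLING_PERIOD_YEARS = 2


def projected_gigabytes(year):
    """Return the projected gigabytes created in `year` under constant doubling."""
    elapsed_years = year - BASE_YEAR
    return BASE_GIGABYTES * 2 ** (elapsed_years / DOUBLING_PERIOD_YEARS)


if __name__ == "__main__":
    for year in (2011, 2013, 2015):
        billions = projected_gigabytes(year) / 1e9
        print(f"{year}: {billions:,.0f} billion GB")
```

Under these assumptions the 2015 projection is 7.2 trillion gigabytes (7,200 billion), which falls within the 0 to 10,000 billion range of the chart's axis.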




The ‘Big Data’ Phenomenon

       Big Data Drivers

       • The proliferation of data capture and creation technologies
       • Increased “interconnectedness” drives consumption (creating more data)
       • Inexpensive storage makes it possible to keep more, longer
       • Innovative software and analysis tools turn data into information

       [Diagram: More Content, More Devices, More Consumption, New & Better Information]

       Big Data encompasses not only the content itself, but how it’s consumed

       • Every gigabyte of stored content can generate a petabyte or more of transient data*
       • The information about you is much greater than the information you create

*Source: IDC 2011




The Opportunity: Quickly gain a competitive advantage

                   • Big opportunity to drive revenue, e.g.
                      – Predict customer behavior across all channels (Web site, social media, email, etc.)
                      – Understand and monetize customer behavior
                      – Predict customer churn
                   • Big opportunity to reduce costs, e.g.
                      – Networks – predict failure, neutralize attacks
                      – Machines/sensors – predict failures
                      – Financial risk management – reduce fraud, increase security

                   Use Cases
                   • Ecommerce – predict customer behavior across all channels to drive revenue
                   • E-gaming – understand and better monetize customer behavior
                   • Networks – predict failure, neutralize attacks to reduce costs
                   • Customers – predict churn, optimize revenue
                   • Machines/sensors – predict failures, reduce costs
                   • Financial risk management – reduce fraud, increase security


Big data challenges
    Ian Fyfe, Pentaho




Big Data Challenges


    Cost-effectively managing the volume, velocity and variety of data

    Deriving value across structured and unstructured data

    Adapting to context changes and integrating new data sources and types




The Current Solutions

Current database solutions are designed for structured data.

    •   Optimized to answer known questions quickly
    •   Schemas dictate form/context
    •   Difficult to adapt to new data types and new questions
    •   Expensive at petabyte scale

[Chart: gigabytes of data created (in billions), 2005 to 2015, y-axis 0 to 10,000; series: STRUCTURED DATA, UNSTRUCTURED DATA; callout label "10%" at the right edge]




Common Data Analytics Architecture

[Diagram components: DATA SOURCES; DATA COLLECTION; STORAGE ONLY GRID (ORIGINAL RAW DATA); TAPE ARCHIVE; ETL COMPUTE GRID; RDBMS (AGGREGATED DATA); BI REPORTS & INTERACTIVE APPS]

    • Offline data can’t be analyzed easily
    • Can’t explore original high fidelity data
    • Moving data to compute doesn’t scale

Leveraging big data for competitive advantage
     All




Success With Hadoop




Big Data Analytics at TravelTainment
    Multi-channel distribution platform for the travel industry

    “Pentaho Business Analytics fits perfectly into our open source Big Data environment.”
    -- Ibrahim Husseini, Director of Data Warehouse, TravelTainment

•        Business challenge: Inefficient and time-consuming reporting capabilities on big data
         sets with legacy system.

    Benefits
    • Ability to visualize its very large data volumes for reporting and analysis in such a way that non-technical users can also easily understand them
    • Can now run complex reports three times faster and with more flexibility than before
    • For the first time can offer clients user-friendly, self-service and ad-hoc reporting services, helping IT focus on their main business and not serve as support desk for reporting

    Why Pentaho
    • Capability to analyze data from Hadoop and Hive
    • Professional support for in-depth analysis
    • Self-service analysis and reporting for business customers
    • Cost-effective solution




Dell uncovers new insights and reduces IT costs by US$35 million with a
business intelligence solution designed for big data




Dell Business Intelligence Practice

    Accelerated customer shipment time by 33 percent
    Saved US$2 million by improving product quality
    Integrated data silos
    Reduced IT costs by US$35 million
    Increased agility




SecureWorks slashes the cost of storage with Dell | Cloudera Solution

Organization
SecureWorks is a true security partner to help protect your IT assets, comply
with regulations and reduce costs, without having to build your internal
security expertise from scratch.

Challenge
SecureWorks needed a highly scalable solution for
collecting, processing, and analyzing massive amounts of data collected from
customer environments.

Solution
The organization deployed the Dell™ | Cloudera® Solution with Cloudera’s
distribution of Apache® Hadoop® software, Dell-developed Crowbar software
framework, PowerEdge™ C2100 servers, Force10 switches, Dell and
Cloudera services in a solution based on a Dell reference architecture.

Benefits
• Reduced the cost of data storage to 23 cents per gigabyte
• Gained easy scalability for future growth
• Leveraged open source software and commodity hardware to reduce time
  to market
• Maintained high availability for critical services and flexibility to analyze
  structured and unstructured data

“Our storage cost per gigabyte is 23 cents. We thought we had great economics
previously when we were spending about seventeen dollars per gigabyte.”
Robert Scudiere, Director of Engineering, Dell SecureWorks

 Read the case study
 Watch the case study video
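
The storage figures quoted on this slide can be put side by side with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the previous cost is taken as exactly 17 dollars although the quote says "about seventeen dollars".

```python
def cost_reduction(old_cost_per_gb, new_cost_per_gb):
    """Return (reduction factor, percent saved) for a per-gigabyte storage cost."""
    factor = old_cost_per_gb / new_cost_per_gb
    percent_saved = (1 - new_cost_per_gb / old_cost_per_gb) * 100
    return factor, percent_saved


if __name__ == "__main__":
    # Figures from the slide: about $17 per gigabyte before, $0.23 per gigabyte after.
    factor, percent_saved = cost_reduction(17.00, 0.23)
    print(f"{factor:.1f}x lower cost per gigabyte ({percent_saved:.1f}% saved)")
```

With these inputs the per-gigabyte cost drops by a factor of roughly 74, or close to 99 percent.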
Must-haves of an effective big data solution
     Jeff Stacey, Dell
     16, then 24 to close

     & Jonathan Seidman,
     Cloudera




Big Data Solution Requirements


    Cost-effectively manage the volume, variety and velocity of data

    Process and analyze large, complex data sets…quickly

    Flexibly adapt to context changes and new data types




Why was Hadoop created?

Exploding data volumes & types LEADS TO dramatic changes in enterprise data management

[Diagram of data types: DIGITAL CONTENT, OPERATIONAL DATA, WEB LOGS, SOCIAL MEDIA, FILES, SMART GRIDS, TRANSACTIONAL DATA, AD IMPRESSIONS, R&D DATA]

With Hadoop, you can…

NEW OPPORTUNITY
    •   Extract more value
    •   From more data
    •   More cost effectively
    •   With greater flexibility

HARD PROBLEMS
    •   Deep analysis
    •   Exhaustive and detailed
    •   Sophisticated algorithms
    •   Quick results

BIG DATA
    •   Any kind
    •   From any source
    •   Structured and unstructured
    •   At scale

It’s difficult to handle data this diverse at this scale.
Traditional platforms can’t keep pace.




What is Apache Hadoop?

Hadoop is a platform for data storage and processing that is…
  Scalable
  Fault tolerant
  Open source

CORE HADOOP COMPONENTS
  Hadoop Distributed File System (HDFS): File sharing and data protection across physical servers
  MapReduce: Distributed computing across physical servers

Consolidates everything
 • A single repository for storing and mining any type of data
 • Not bound by a single schema

Excels at complex analysis
 • Scale-out architecture divides workloads across multiple nodes
 • Flexible file system eliminates ETL bottlenecks

Scales economically
 • Can be deployed on commodity hardware
 • Open source platform guards against vendor lock-in




Core Hadoop: HDFS
Self-healing, high bandwidth CLUSTERED STORAGE



     [Diagram: an incoming file split into blocks 1 to 5 passes through HDFS; each block is stored on three of the five nodes shown]

     HDFS breaks incoming files into blocks and stores them redundantly across the cluster
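
The idea of splitting a file into blocks and storing each block redundantly can be sketched in a few lines. This is a toy model under stated assumptions, not the HDFS implementation: the function names are hypothetical, the round-robin placement is a stand-in (real HDFS placement is decided by the NameNode and takes rack topology into account), and the replication factor of 3 simply mirrors the diagram.

```python
def split_into_blocks(data, block_size):
    """Split a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes using simple round-robin.

    Illustrative placement only; it is not the policy HDFS uses.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_node):
    """Return True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"0123456789", block_size=2)  # five blocks
    nodes = ["node1", "node2", "node3", "node4", "node5"]    # hypothetical node names
    placement = place_replicas(len(blocks), nodes)
    for block_id, replicas in placement.items():
        print(f"block {block_id + 1}: {replicas}")
    print("survives loss of node1:", readable_after_failure(placement, "node1"))
```

Because every block lives on three different nodes, losing any single node in this model still leaves two copies of each block, which is the property the "self-healing" label on the slide relies on.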



Core Hadoop: MapReduce
Framework for DISTRIBUTED COMPUTING



     [Diagram: inputs 1 to 5 pass through MR (MapReduce); the five nodes shown each hold three of the five blocks]

          Processes many jobs in parallel across many nodes and combines the results
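
The programming model behind this slide can be illustrated with a word count, the customary first MapReduce example. The sketch below is a single-process simulation in plain Python, not the Hadoop API: the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    counts = run_job(["big data", "big insights from big data"])
    print(counts)
```

Each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, which is what allows a framework to spread the work across many nodes and then combine the results.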



Major Hadoop Utilities

    Apache Hive: SQL-like language and metadata repository
    Apache Pig: High-level language for expressing data analysis programs
    Apache HBase: The Hadoop database. Random, real-time read/write access
    Hue: Browser-based desktop interface for interacting with Hadoop
    Apache ZooKeeper: Highly reliable distributed coordination service
    Oozie: Server-based workflow engine for Hadoop activities
    Flume: Distributed service for collecting and aggregating log and event data
    Sqoop: Integrating Hadoop with RDBMS
    Apache Whirr: Library for running Hadoop in the cloud




Hadoop in Production




The unrivaled leader in Hadoop
• Worldwide #1 distribution of Apache
  Hadoop
• 100% Open-Source Hadoop
  Distribution
• Largest contributor to the open source
  Hadoop ecosystem
     – Project founders from 8 of the 13
       leading Apache Projects
• Cloudera has more Apache committers
  on staff than any other company
• More than 100 enterprise & public
  sector customers across a wide variety
  of industries



Dell | Cloudera Solution with Pentaho


                         Dell Value
                         • Business intelligence practice
                         • Open & scalable infrastructure
                         • Certified and tested platforms
                         • Active community participation
                         • Crowbar deployment tool
                         • Reference Architecture
                         • Deployment Guide & Services
                         • Joint support with Cloudera
                         • Actual customers




Industry first: PowerEdge C8000
         Mix and match for the ultimate performance in a dense 4U package

 •     Speed up your most resource-intensive
       workloads by mixing and matching
       compute, storage and/or GPU nodes in the
       same 4U shared infrastructure chassis
 •     Get the cores, memory and I/O expansion you
       need for peak workload performance

 Great for: Big Data, Web 2.0/Hosting, HPC

 Mix & Match
     • Mix compute, storage and GPUs in the same 4U chassis
     • More workload flexibility, HD & I/O options than the HP SL6500 or Super Micro 6047R

 Get faster results with more compute power
     • Intel Xeon E5-2600 processors boost performance by 80%
     • Up to 135W support
     • 2x the I/O bandwidth with PCI Express Gen3

 Do more with less
     • Shared infrastructure reduces power & cooling costs by ~20%
     • Refresh with the latest components without having to replace the entire chassis



• Visual design for Hadoop
• Reduces skills requirements
• Deep integration with Hadoop
      – HDFS, MapReduce, Sqoop, Oozie
      – Runs as MapReduce in-Hadoop
• Easily connects Hadoop to other enterprise data sources
• Broadens Hadoop use to data analysts, business users and IT

[Diagram: Data Ingestion, Manipulation, Integration, Workflow; Reporting & Dashboards; Data Discovery Visualization; Predictive Analytics]
Fast Visual Development for Hadoop
                     Ingestion / Manipulation / Integration




                     Scheduling


                                                        Modeling




Discovery > Proof of Value > Deployment




Summary
             Dell | Cloudera Solution with Pentaho



                   Cost-effectively manage the volume, velocity and variety of data

                   Derive value across structured and unstructured data

                   Rapidly adapt to context changes and integrate new data sources and types




Q&A


     Ian Fyfe, Pentaho




Start getting big insights

     Jonathan Seidman, Cloudera
     jseidman@cloudera.com
     www.cloudera.com

     Ian Fyfe, Pentaho
     ifyfe@pentaho.com
     www.pentaho.com

     Jeff Stacey, Dell
     Hadoop@dell.com
     www.dell.com/hadoop




  • 1.
    Using Hadoop Analyticsto Gain a Big Data Advantage Jonathan Seidman, Solution Architect, Cloudera Ian Fyfe, VP Product Marketing, Pentaho Jeff Stacey, Director of GTM Strategy, Channel & Sales Development, Dell
  • 2.
    Why big data matters to your business Jonathan Seidman, Cloudera 2 Confidential Big Data Solutions 2
  • 3.
    Explosive Data Growth 10,000 GIGABYTES OF DATA CREATED (IN BILLIONS) 1.8 trillion gigabytes of data was created in 2011… • More than 90% is unstructured data • Approx. 500 quadrillion files 5,000 • Quantity doubles every 2 years 0 2005 2010 2015 STRUCTURED DATA UNSTRUCTURED DATA Source: IDC 2011 3 Confidential Big Data Solutions
  • 4.
    The ‗Big Data‘Phenomenon Big Data Drivers More Content More Devices • The proliferation of data capture and creation technologies • Increased ―interconnectedness‖ drives consumption (creating more data) More New & Better Consumption Information • Inexpensive storage makes it possible to keep more, longer • Innovative software and analysis tools turn data into information • Every gigabyte of stored content can generate Big Data encompasses not a petabyte or more of transient data* only the content itself, but how it’s consumed • The information about you is much greater than the information you create *Source: IDC 2011 4 Confidential Big Data Solutions
  • 5.
    The Opportunity: Quicklygain a competitive advantage Use Cases • Big opportunity to drive • Ecommerce – Predict revenue, e.g. customer behavior across – Predict customer behavior all channels to drive across all channels (Web revenue site, social media, email, etc.) • E-gaming – understand – Understand and monetize and better monetize customer behavior customer behavior – Predict customer churn • Networks – predict failure, neutralize attacks to reduce • Big opportunity to reduce costs costs, e.g. • Customers – predict churn, – Networks – predict optimize revenue failure, neutralize attacks • Machines/sensors – – Machines/sensors – predict predict failures, reduce failures costs – Financial risk management – • Financial risk reduce fraud, increase security management – reduce fraud, increase security 5 Confidential Big Data Solutions
  • 6.
    Big data challenges Ian Fyfe, Pentaho 6 Confidential Big Data Solutions 6
  • 7.
    Big Data Challenges Cost-effectively managing the volume, velocity and variety of data Deriving value across structured and unstructured data Adapting to context changes and integrating new data sources and types 7 Confidential Big Data Solutions
  • 8.
    The Current Solutions 10,000 GIGABYTES OF DATA CREATED (IN BILLIONS) Current Database Solutions are designed for structured data. • Optimized to answer known questions quickly 5,000 • Schemas dictate form/context • Difficult to adapt to new data types and new questions • Expensive at petabyte scale 0 10% 2005 2010 2015 STRUCTURED DATA UNSTRUCTURED DATA 8 Confidential Big Data Solutions
  • 9.
    Common Data AnalyticsArchitecture Offline data can‘t be analyzed easily TAPE ARCHIVE Can‘t explore original BI REPORTS & high fidelity data INTERACTIVE APPS STORAGE ONLY RDBMS GRID ETL COMPUTE GRID (AGGREGATED (ORIGINAL RAW DATA) DATA) Moving data to compute doesn‘t scale DATA COLLECTION DATA SOURCES 9 Confidential Big Data Solutions
  • 10.
    Leveraging big data for competitive advantage All 10 Confidential Big Data Solutions
  • 11.
    Success With Hadoop 11 Confidential Big Data Solutions
  • 12.
    Big Data Analyticsat TravelTainment Multi-channel distribution platform for the travel industry Pentaho Business Analytics fits perfectly into our open source Big Data environment.‖ -- Ibrahim Husseini, Director of Data Warehouse, TravelTainment • Business challenge: Inefficient and time consuming reporting capabilities on big data sets with legacy system. Benefits Why Pentaho • Ability to visualize its very large data volumes for reporting and analysis in such a way that non-technical users can also easily • Capability to analyze data from Hadoop understand them and Hive • Professional support for in-depth • Can now run complex reports three times faster and with more analysis flexibility than before • Self-service analysis and reporting for • For the first time can offer clients user-friendly, self-service and business customers ad-hoc reporting services helping IT focus on their main business and not serve as support desk for reporting • Cost effective solution 12 Confidential Big Data Solutions 1
  • 13.
    Dell uncovers newinsights and reduces IT costs by US$35 million with a business intelligence solution designed for big data Accelerated customer shipment time by 33 percent Dell Saved US$2 million by improving product quality Business Integrated data silos Intelligence Practice Reduced IT costs by US$35 million Increased agility 13 Confidential Big Data Solutions
  • 14.
    SecureWorks slashes the cost of storage with Dell | Cloudera Solution
    Organization
    SecureWorks is a true security partner to help protect your IT assets, comply with regulations and reduce costs, without having to build your internal security expertise from scratch.
    Challenge
    SecureWorks needed a highly scalable solution for collecting, processing, and analyzing massive amounts of data collected from customer environments.
    Solution
    The organization deployed the Dell™ | Cloudera® Solution with Cloudera’s distribution of Apache® Hadoop® software, Dell-developed Crowbar software framework, PowerEdge™ C2100 servers, Force10 switches, and Dell and Cloudera services in a solution based on a Dell reference architecture.
    “Our storage cost per gigabyte is 23 cents. We thought we had great economics previously when we were spending about seventeen dollars per gigabyte.” Robert Scudiere, Director of Engineering, Dell SecureWorks
    Benefits
    • Reduced the cost of data storage to 23 cents per gigabyte
    • Gained easy scalability for future growth
    • Leveraged open source software and commodity hardware to reduce time to market
    • Maintained high availability for critical services and flexibility to analyze structured and unstructured data
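To put the quoted per-gigabyte figures side by side, here is a minimal arithmetic sketch. The two prices come from the slide (the older one is quoted only as "about seventeen dollars"); the function names and the 1,000 GB example volume are hypothetical and used purely for illustration.

```python
# Per-gigabyte storage costs quoted on the slide (USD).
OLD_COST_PER_GB = 17.00   # "about seventeen dollars", so treat as approximate
NEW_COST_PER_GB = 0.23    # "23 cents"


def savings_ratio(old: float, new: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old / new


def percent_reduction(old: float, new: float) -> float:
    """Cost reduction expressed as a percentage of the old cost."""
    return (old - new) / old * 100


def total_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Total storage cost for a given volume at a given unit price."""
    return gigabytes * cost_per_gb


if __name__ == "__main__":
    ratio = savings_ratio(OLD_COST_PER_GB, NEW_COST_PER_GB)
    reduction = percent_reduction(OLD_COST_PER_GB, NEW_COST_PER_GB)
    print(f"Roughly {ratio:.1f}x cheaper, a {reduction:.1f}% reduction")

    # Hypothetical volume, not a figure from the case study.
    example_gb = 1_000
    print(f"Old: ${total_cost(example_gb, OLD_COST_PER_GB):,.2f}")
    print(f"New: ${total_cost(example_gb, NEW_COST_PER_GB):,.2f}")
```

With the quoted prices this works out to roughly a 74-fold difference, or a reduction of a little under 99 percent per gigabyte.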
  • 15.
    Must-haves of an effective big data solution
    Jeff Stacey, Dell (16, then 24 to close) & Jonathan Seidman, Cloudera
  • 16.
    Big Data Solution Requirements
    • Cost-effectively manage the volume, variety and velocity of data
    • Process and analyze large, complex data sets…quickly
    • Flexibly adapt to context changes and new data types
  • 17.
    Why was Hadoop created?
    Exploding data volumes & types LEADS TO dramatic changes in enterprise data management
    [Diagram: data types feeding BIG DATA: digital content, operational data, web logs, social media, files, smart grids, transactional data, ad impressions, R&D data]
    NEW OPPORTUNITY: With Hadoop, you can…
    • Extract more value
    • From more data
    • More cost effectively
    • With greater flexibility
    HARD PROBLEMS
    • Deep analysis
    • Exhaustive and detailed
    • Sophisticated algorithms
    • Quick results
    BIG DATA
    • Any kind
    • From any source
    • Structured and unstructured
    • At scale
    It’s difficult to handle data this diverse at this scale. Traditional platforms can’t keep pace.
  • 18.
    What is Apache Hadoop?
    Hadoop is a platform for data storage and processing that is…
    • Scalable
    • Fault tolerant
    • Open source
    CORE HADOOP COMPONENTS
    • Hadoop Distributed File System (HDFS): File sharing and data protection across physical servers
    • MapReduce: Distributed computing across physical servers
    Consolidates everything
    • A single repository for storing and mining any type of data
    • Not bound by a single schema
    Excels at complex analysis
    • Scale-out architecture divides workloads across multiple nodes
    • Flexible file system eliminates ETL bottlenecks
    Scales economically
    • Can be deployed on commodity hardware
    • Open source platform guards against vendor lock-in
  • 19.
    Core Hadoop: HDFS
    Self-healing, high-bandwidth CLUSTERED STORAGE
    [Diagram: a file of five blocks (1 to 5) passes through HDFS; each block is stored three times across the cluster nodes]
    HDFS breaks incoming files into blocks and stores them redundantly across the cluster
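The block-and-replica idea on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not HDFS code: the function names, the node names, and the simple round-robin placement are all hypothetical. Real HDFS uses much larger blocks and a rack-aware placement policy; the replication factor of 3 matches both the diagram and the HDFS default.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Toy round-robin policy (an assumption for illustration only):
    block b goes to nodes b, b+1, ..., wrapping around the node list.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]],
                    failed: set[str]) -> set[int]:
    """Blocks that still have at least one replica on a healthy node."""
    return {
        block for block, holders in placement.items()
        if any(node not in failed for node in holders)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"incoming file contents", block_size=5)
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    placement = place_replicas(len(blocks), nodes)
    for block, holders in placement.items():
        print(block, blocks[block], holders)
    # With three replicas per block, losing any two nodes loses no data.
    print(readable_blocks(placement, failed={"node1", "node2"}))
```

The design point the sketch illustrates is that redundancy is what makes the storage "self-healing": as long as one replica of every block survives, the file can still be read and the missing copies recreated elsewhere.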
  • 20.
    Core Hadoop: MapReduce
    Framework for DISTRIBUTED COMPUTING
    [Diagram: a job over five blocks (1 to 5) passes through MapReduce and is processed on the cluster nodes that hold those blocks]
    Processes many jobs in parallel across many nodes and combines the results
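The map, shuffle and reduce steps behind this slide can be illustrated with a word count, the usual introductory example. The sketch below is plain single-process Python, not the Hadoop API: the function names are hypothetical, and the "splits" stand in for the data blocks that a real cluster would process in parallel on separate nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits: list[str]) -> dict[str, int]:
    """Run map over every split, then shuffle, then reduce each key.

    Here the splits are processed one after another; on a cluster each
    map task would run on a different node.
    """
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big insights", "data data"]))
```

Because each map call looks only at its own split and each reduce call only at its own key, both phases can be spread across many nodes without coordination beyond the shuffle.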
  • 21.
    Major Hadoop Utilities
    • Apache Pig: High-level language for expressing data analysis programs
    • Apache Hive: SQL-like language and metadata repository
    • Apache HBase: The Hadoop database. Random, real-time read/write access
    • Hue: Browser-based desktop interface for interacting with Hadoop
    • Apache ZooKeeper: Highly reliable distributed coordination service
    • Oozie: Server-based workflow engine for Hadoop activities
    • Flume: Distributed service for collecting and aggregating log and event data
    • Sqoop: Integrating Hadoop with RDBMS
    • Apache Whirr: Library for running Hadoop in the cloud
  • 22.
    Hadoop in Production
  • 23.
    The unrivaled leader in Hadoop
    • Worldwide #1 distribution of Apache Hadoop
    • 100% open-source Hadoop distribution
    • Largest contributor to the open source Hadoop ecosystem
      – Project founders from 8 of the 13 leading Apache projects
    • Cloudera has more Apache committers on staff than any other company
    • More than 100 enterprise & public sector customers across a wide variety of industries
  • 24.
    Dell | Cloudera Solution with Pentaho
    Dell Value
    • Business intelligence practice
    • Open & scalable infrastructure
    • Certified and tested platforms
    • Active community participation
    • Crowbar deployment tool
    • Reference Architecture
    • Deployment Guide & Services
    • Joint support with Cloudera
    • Actual customers
  • 25.
    Industry first: PowerEdge C8000
    Mix and match for the ultimate performance in a dense 4U package
    • Speed up your most resource-intensive workloads by mixing and matching compute, storage and/or GPU nodes in the same 4U shared infrastructure chassis
    • Get the cores, memory and I/O expansion you need for peak workload performance
    Great for: Big Data, Web 2.0/Hosting, HPC
    Get faster results with more compute power
    • Intel Xeon E5-2600 processors boost performance by 80%
    • Up to 135W support
    • 2x the I/O bandwidth with PCI Express Gen3
    Mix & Match
    • Mix compute, storage and GPUs in the same 4U
    • More workload flexibility, HD & I/O options than the HP SL6500 or Super Micro 6047R
    Do more with less
    • Shared infrastructure chassis reduces power & cooling costs by ~20%
    • Refresh with the latest components without having to replace the entire chassis
  • 26.
    • Visual design for Hadoop
    • Reduces skills requirements
    • Deep integration with Hadoop
      – HDFS, MapReduce, Sqoop, Oozie
      – Runs as MapReduce in-Hadoop
    • Easily connects Hadoop to other enterprise data sources
    • Broadens Hadoop use to data analysts, business users and IT
    [Diagram labels: Reporting & Dashboards; Data Discovery / Visualization; Predictive Analytics; Data Ingestion, Manipulation, Integration, Workflow]
  • 27.
    Fast Visual Development for Hadoop
    [Screenshot labels: Ingestion / Manipulation / Integration; Scheduling; Modeling]
  • 28.
    Discovery > Proof of Value > Deployment
  • 29.
    Summary
    Dell | Cloudera Solution with Pentaho
    • Cost-effectively manage the volume, velocity and variety of data
    • Derive value across structured and unstructured data
    • Rapidly adapt to context changes and integrate new data sources and types
  • 30.
    Q&A
    Ian Fyfe, Pentaho
  • 31.
    Start getting big insights
    Jonathan Seidman, Cloudera: jseidman@cloudera.com, www.cloudera.com
    Ian Fyfe, Pentaho: ifyfe@pentaho.com, www.pentaho.com
    Jeff Stacey, Dell: Hadoop@dell.com, www.dell.com/hadoop

Editor's Notes

  • #6 Come up with something new – to the point of what they are looking for. Start with some stories: how real firms used our products, had a problem, solved it. Show how they can’t solve the problem with the tools they have.
  • #8 Business users are asking more sophisticated questions: explore data in more detail; combine a variety of data; extract actionable information and insight from it quickly. Traditional “big data” solutions: extremely expensive and/or not enough detail.
  • #13 TravelTainment, a provider of multi-channel distribution platforms for the travel industry, is using Pentaho Business Analytics for self-service analytics and reporting in a Big Data environment. With the continually booming online travel market, TravelTainment’s different clients required more insight into its data to help them plan promotions and other services. Before Pentaho, the company had acquired a set of legacy systems that had grown around individual products with limited reporting capabilities. As a result, reporting was inefficient and time consuming for IT. When TravelTainment decided to standardize on a single customer-focused reporting application, it chose Pentaho Business Analytics for the solution’s self-service reporting and ability to manage Big Data sets. Pentaho Reporting enables TravelTainment to run reports three times faster and with more flexibility than before. TravelTainment can now, for the first time, offer its clients user-friendly, self-service and ad-hoc reporting services. This also means that TravelTainment’s developer team can now fully concentrate on its main business, rather than having to serve as a support desk for reporting. With the success of this implementation, TravelTainment now plans to evaluate using Pentaho Data Integration (PDI) to move its data in and out of Hadoop.
  • #14 http://content.dell.com/us/en/enterprise/d/corporate~case-studies~en/Documents~2011-dell-bi-11003262.pdf.aspx
    Business need: With explosive data growth and the proliferation of data silos, Dell spent millions on data management without monetizing information. It needed to integrate enterprise data to improve information accuracy, cut costs, and uncover actionable insights.
    Solution: Dell Enterprise Business Intelligence (EBI) consultants helped design and deploy an integrated, global enterprise data warehouse solution, combining Teradata, Informatica, and other BI software with new and existing Dell infrastructure components.
    Benefits:
    • Accelerated customer shipment time by 33 percent and decreased the shipment backlog
    • Saved US$2 million by improving product quality and avoiding component replacements
    • Integrated data silos, offering an enterprise-wide view of information while reducing IT costs by US$35 million
    • Increased agility by providing information workers with self-service capabilities for accessing certified global data
  • #26 Introducing the four products that make up the PowerEdge C8000 series:
    • The PowerEdge C8000 4U shared infrastructure chassis
    • The PowerEdge C8220 single-wide compute sled
    • The PowerEdge C8220x double-wide GPU sled
    • The PowerEdge C8000x double-wide storage sled
    The PowerEdge C8000 chassis holds up to 8 single-wide compute sleds or 4 double-wide compute sleds. Each compute sled is equivalent to a standard server built with one or more processors, memory, network interface, baseboard management controller, and local hard drive storage. The C8000 will be the only 4U shared infrastructure on the market that gives customers compute, GPU, and storage options in one chassis with the ability for internal or external power. Zeus delivers the greatest amount of configuration flexibility and front-side serviceability. Zeus’ flexibility allows customers to standardize on a single architecture. By using the same common chassis design for a variety of configurations, the PowerEdge C8000 series can be scaled out, just like a versatile Lego block.
    The advantages of Zeus: By using the same basic building block over and over again, our customers can get the performance they need, with less deployment and maintenance time needed. This efficient use of IT resources plus the shared infrastructure savings help lower the total cost of ownership. Technology refresh cycles can be staggered to further reduce the total cost of ownership over several years.
  • #29 Emphasize results they can achieve! Go back to customer case studies.