Webinar | Using Hadoop Analytics to Gain a Big Data Advantage



Learn about:
Why big data matters to your business: realize revenue, increase customer loyalty, and pinpoint effective strategies
The business and technical challenges of big data solutions
How to leverage big data for competitive advantage
The “must haves” of an effective big data solution
Real-world examples of Cloudera, Pentaho and Dell big data solutions in action

Published in: Technology, Business
  • Come up with something new, to the point of what they are looking for. Start with some stories: how real firms used our products, had a problem, and solved it. Show how they can’t solve the problem with the tools they have.
  • Business users are asking more sophisticated questions: they want to explore data in more detail, combine a variety of data, and extract actionable information and insight from it quickly. Traditional “big data” solutions are extremely expensive and/or do not offer enough detail.
  • TravelTainment, a provider of multi-channel distribution platforms for the travel industry, is using Pentaho Business Analytics for self-service analytics and reporting in a Big Data environment. With the continually booming online travel market, TravelTainment’s different clients required more insight into its data to help them plan promotions and other services. Before Pentaho, the company had acquired a set of legacy systems that had grown around individual products with limited reporting capabilities. As a result, reporting was inefficient and time consuming for IT. When TravelTainment decided to standardize on a single customer-focused reporting application, it chose Pentaho Business Analytics for the solution’s self-service reporting and ability to manage Big Data sets. Pentaho Reporting enables TravelTainment to run reports three times faster and with more flexibility than before. TravelTainment can now, for the first time, offer its clients user-friendly, self-service and ad-hoc reporting services. This also means that TravelTainment’s developer team can now fully concentrate on its main business, rather than having to serve as a support desk for reporting. With the success of this implementation, TravelTainment now plans to evaluate using Pentaho Data Integration (PDI) to move its data in and out of Hadoop.
  • http://content.dell.com/us/en/enterprise/d/corporate~case-studies~en/Documents~2011-dell-bi-11003262.pdf.aspx Business need: With explosive data growth and the proliferation of data silos, Dell spent millions on data management without monetizing information. It needed to integrate enterprise data to improve information accuracy, cut costs, and uncover actionable insights. Solution: Dell Enterprise Business Intelligence (EBI) consultants helped design and deploy an integrated, global enterprise data warehouse solution, combining Teradata, Informatica, and other BI software with new and existing Dell infrastructure components. Benefits: • Accelerated customer shipment time by 33 percent and decreased the shipment backlog • Saved US$2 million by improving product quality and avoiding component replacements • Integrated data silos, offering an enterprise-wide view of information while reducing IT costs by US$35 million • Increased agility by providing information workers with self-service capabilities for accessing certified global data
  • Introducing the four products that make up the PowerEdge C8000 series: the PowerEdge C8000 4U shared infrastructure chassis, the PowerEdge C8220 single-wide compute sled, the PowerEdge C8220x double-wide GPU sled, and the PowerEdge C8000x double-wide storage sled. The PowerEdge C8000 chassis holds up to 8 single-wide compute sleds or 4 double-wide compute sleds. Each compute sled is equivalent to a standard server built with one or more processors, memory, a network interface, a baseboard management controller, and local hard drive storage. The C8000 will be the only 4U shared infrastructure on the market that gives customers compute, GPU, and storage options in one chassis with the ability for internal or external power. Zeus delivers the greatest amount of configuration flexibility and front-side serviceability. Zeus’ flexibility allows customers to standardize on a single architecture. By using the same common chassis design for a variety of configurations, the PowerEdge C8000 series can be scaled out, just like a versatile Lego block. The advantages of Zeus: By using the same basic building block over and over again, our customers can get the performance they need with less deployment and maintenance time. This efficient use of IT resources plus the shared infrastructure savings help lower the total cost of ownership. Technology refresh cycles can be staggered to further reduce the total cost of ownership over several years.
  • Emphasize results they can achieve! Go back to customer case studies.

    1. 1. Using Hadoop Analytics to Gain a Big Data Advantage. Jonathan Seidman, Solution Architect, Cloudera; Ian Fyfe, VP Product Marketing, Pentaho; Jeff Stacey, Director of GTM Strategy, Channel & Sales Development, Dell
    2. 2. Why big data matters to your business. Jonathan Seidman, Cloudera. Confidential Big Data Solutions
    3. 3. Explosive Data Growth. 1.8 trillion gigabytes of data was created in 2011: • More than 90% is unstructured data • Approx. 500 quadrillion files • Quantity doubles every 2 years. [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data.] Source: IDC 2011
    4. 4. The “Big Data” Phenomenon. Big Data Drivers: More Content, More Devices, More Consumption, New & Better Information. • The proliferation of data capture and creation technologies • Increased “interconnectedness” drives consumption (creating more data) • Inexpensive storage makes it possible to keep more, longer • Innovative software and analysis tools turn data into information. Big Data encompasses not only the content itself, but how it’s consumed: • Every gigabyte of stored content can generate a petabyte or more of transient data* • The information about you is much greater than the information you create. *Source: IDC 2011
    5. 5. The Opportunity: Quickly gain a competitive advantage. • Big opportunity to drive revenue, e.g.: predict customer behavior across all channels (Web site, social media, email, etc.); understand and monetize customer behavior; predict customer churn • Big opportunity to reduce costs, e.g.: networks (predict failure, neutralize attacks); machines/sensors (predict failures); financial risk management (reduce fraud, increase security). Use Cases: • Ecommerce: predict customer behavior across all channels to drive revenue • E-gaming: understand and better monetize customer behavior • Networks: predict failure, neutralize attacks to reduce costs • Customers: predict churn, optimize revenue • Machines/sensors: predict failures, reduce costs • Financial risk management: reduce fraud, increase security
    6. 6. Big data challenges. Ian Fyfe, Pentaho
    7. 7. Big Data Challenges • Cost-effectively managing the volume, velocity and variety of data • Deriving value across structured and unstructured data • Adapting to context changes and integrating new data sources and types
    8. 8. The Current Solutions. Current database solutions are designed for structured data. • Optimized to answer known questions quickly • Schemas dictate form/context • Difficult to adapt to new data types and new questions • Expensive at petabyte scale. [Chart: gigabytes of data created (in billions), 2005 to 2015, structured vs. unstructured data, with a 10% label.]
    9. 9. Common Data Analytics Architecture. [Diagram: data sources, data collection, storage-only grid (original raw data), ETL compute grid, RDBMS (aggregated data), BI reports & interactive apps, tape archive.] Callouts: • Offline data can’t be analyzed easily • Can’t explore original high fidelity data • Moving data to compute doesn’t scale
    10. 10. Leveraging big data for competitive advantage. All
    11. 11. Success With Hadoop
    12. 12. Big Data Analytics at TravelTainment. Multi-channel distribution platform for the travel industry. “Pentaho Business Analytics fits perfectly into our open source Big Data environment.” -- Ibrahim Husseini, Director of Data Warehouse, TravelTainment. Business challenge: Inefficient and time-consuming reporting capabilities on big data sets with legacy system. Benefits: • Ability to visualize its very large data volumes for reporting and analysis in such a way that non-technical users can also easily understand them • Can now run complex reports three times faster and with more flexibility than before • For the first time can offer clients user-friendly, self-service and ad-hoc reporting services, helping IT focus on their main business and not serve as support desk for reporting. Why Pentaho: • Capability to analyze data from Hadoop and Hive • Professional support for in-depth analysis • Self-service analysis and reporting for business customers • Cost-effective solution
    13. 13. Dell uncovers new insights and reduces IT costs by US$35 million with a business intelligence solution designed for big data. Dell Business Intelligence Practice: • Accelerated customer shipment time by 33 percent • Saved US$2 million by improving product quality • Integrated data silos • Reduced IT costs by US$35 million • Increased agility
    14. 14. SecureWorks slashes the cost of storage with Dell | Cloudera Solution. Organization: SecureWorks is a true security partner to help protect your IT assets, comply with regulations and reduce costs, without having to build your internal security expertise from scratch. Challenge: SecureWorks needed a highly scalable solution for collecting, processing, and analyzing massive amounts of data collected from customer environments. Solution: The organization deployed the Dell™ | Cloudera® Solution with Cloudera’s distribution of Apache® Hadoop® software, Dell-developed Crowbar software framework, PowerEdge™ C2100 servers, Force10 switches, and Dell and Cloudera services in a solution based on a Dell reference architecture. Benefits: • Reduced the cost of data storage to 23 cents per gigabyte • Gained easy scalability for future growth • Leveraged open source software and commodity hardware to reduce time to market • Maintained high availability for critical services and flexibility to analyze structured and unstructured data. “Our storage cost per gigabyte is 23 cents. We thought we had great economics previously when we were spending about seventeen dollars per gigabyte.” Robert Scudiere, Director of Engineering, Dell SecureWorks
    15. 15. Must-haves of an effective big data solution. Jeff Stacey, Dell (16, then 24 to close) & Jonathan Seidman, Cloudera
    16. 16. Big Data Solution Requirements • Cost-effectively manage the volume, variety and velocity of data • Process and analyze large, complex data sets…quickly • Flexibly adapt to context changes and new data types
    17. 17. Why was Hadoop created? Exploding data volumes & types LEADS TO dramatic changes in enterprise data management. Data sources: digital content, operational data, web logs, social media, files, smart grids, transactional data, ad impressions, R&D data. With Hadoop, you can… NEW OPPORTUNITY: • Extract more value • From more data • More cost effectively • With greater flexibility. HARD PROBLEMS: • Deep analysis • Exhaustive and detailed • Sophisticated algorithms • Quick results. BIG DATA: • Any kind • From any source • Structured and unstructured • At scale. It’s difficult to handle data this diverse at this scale. Traditional platforms can’t keep pace.
    18. 18. What is Apache Hadoop? Hadoop is a platform for data storage and processing that is… • Scalable • Fault tolerant • Open source. CORE HADOOP COMPONENTS: Hadoop Distributed File System (HDFS): file sharing and data protection across physical servers. MapReduce: distributed computing across physical servers. Consolidates everything: • A single repository for storing and mining any type of data • Not bound by a single schema. Excels at complex analysis: • Scale-out architecture divides workloads across multiple nodes • Flexible file system eliminates ETL bottlenecks. Scales economically: • Can be deployed on commodity hardware • Open source platform guards against vendor lock-in
    19. 19. Core Hadoop: HDFS. Self-healing, high-bandwidth clustered storage. HDFS breaks incoming files into blocks and stores them redundantly across the cluster. [Diagram: numbered file blocks replicated across cluster nodes.]
    20. 20. Core Hadoop: MapReduce. Framework for distributed computing. Processes many jobs in parallel across many nodes and combines the results. [Diagram: numbered blocks processed in place across cluster nodes.]
    21. 21. Major Hadoop Utilities • Apache Pig: high-level language for expressing data analysis programs • Apache Hive: SQL-like language and metadata repository • Apache HBase: the Hadoop database; random, real-time read/write access • Hue: browser-based desktop interface for interacting with Hadoop • Apache Zookeeper: highly reliable distributed coordination service • Oozie: server-based workflow engine for Hadoop activities • Flume: distributed service for collecting and aggregating log and event data • Sqoop: integrating Hadoop with RDBMS • Apache Whirr: library for running Hadoop in the cloud
    22. 22. Hadoop in Production
    23. 23. The unrivaled leader in Hadoop • Worldwide #1 distribution of Apache Hadoop • 100% Open-Source Hadoop Distribution • Largest contributor to the open source Hadoop ecosystem: project founders from 8 of the 13 leading Apache Projects • Cloudera has more Apache committers on staff than any other company • More than 100 enterprise & public sector customers across a wide variety of industries
    24. 24. Dell | Cloudera Solution with Pentaho. Dell Value: • Business intelligence practice • Open & scalable infrastructure • Certified and tested platforms • Active community participation • Crowbar deployment tool • Reference Architecture • Deployment Guide & Services • Joint support with Cloudera • Actual customers
    25. 25. Industry first: PowerEdge C8000. Mix and match for the ultimate performance in a dense 4U package. • Speed up your most resource-intensive workloads by mixing and matching compute, storage and/or GPU nodes in the same 4U shared infrastructure chassis • Get the cores, memory and I/O expansion you need for peak workload performance. Great for: Big Data, Web 2.0/Hosting, HPC. Get faster results with more compute power: • Intel Xeon E5-2600 processors boost performance by 80% • Up to 135W support • 2x the I/O bandwidth with PCI Express Gen3. Mix & Match: • Mix compute, storage and GPUs in the same 4U chassis • More workload flexibility, HD & I/O options than the HP SL6500 or Super Micro 6047R. Do more with less: • Shared infrastructure reduces power & cooling costs by ~20% • Refresh with the latest components without having to replace the entire chassis
    26. 26. • Visual design for Hadoop • Reduces skills requirements • Deep integration with Hadoop (HDFS, MapReduce, Sqoop, Oozie); runs as MapReduce in-Hadoop • Easily connects Hadoop to other enterprise data sources • Broadens Hadoop use to data analysts, business users and IT. [Diagram: Reporting & Dashboards; Data Discovery/Visualization; Predictive Analytics; Data Ingestion, Manipulation, Integration, Workflow.]
    27. 27. Fast Visual Development for Hadoop: Ingestion / Manipulation / Integration, Scheduling, Modeling
    28. 28. Discovery > Proof of Value > Deployment
    29. 29. Summary: Dell | Cloudera Solution with Pentaho • Cost-effectively manage the volume, velocity and variety of data • Derive value across structured and unstructured data • Rapidly adapt to context changes and integrate new data sources and types
    30. 30. Q&A. Ian Fyfe, Pentaho
    31. 31. Start getting big insights. Jonathan Seidman, Cloudera: jseidman@cloudera.com, www.cloudera.com; Ian Fyfe, Pentaho: ifyfe@pentaho.com, www.pentaho.com; Jeff Stacey, Dell: Hadoop@dell.com, www.dell.com/hadoop
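
Slides 19 and 20 describe the two core Hadoop ideas: HDFS breaks files into blocks and stores them redundantly across the cluster, and MapReduce processes those blocks in parallel and combines the results. The sketch below is a minimal single-process Python illustration of that flow, not Hadoop code and not from the original deck. The node names, the two-line block size, and the round-robin placement are assumptions chosen for readability; real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from collections import Counter

BLOCK_SIZE = 2    # lines per block (toy value; real HDFS blocks are tens of MB)
REPLICATION = 3   # copies kept of each block
NODES = ["node1", "node2", "node3", "node4", "node5"]  # hypothetical cluster


def split_into_blocks(lines, block_size=BLOCK_SIZE):
    """Break an incoming file (a list of lines) into fixed-size blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def place_replicas(num_blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def map_block(block):
    """Map step: count the words in one block."""
    return Counter(word.lower() for line in block for word in line.split())


def reduce_counts(partials):
    """Reduce step: combine the per-block counts into one result."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def word_count(lines, failed_node=None):
    """Run a word count over replicated blocks, optionally with one node down."""
    blocks = split_into_blocks(lines)
    placement = place_replicas(len(blocks))
    partials = []
    for block_id, block in enumerate(blocks):
        # Any live replica holder can serve the block; redundancy is what
        # lets the job finish when a single node has failed.
        live = [n for n in placement[block_id] if n != failed_node]
        if not live:
            raise RuntimeError(f"block {block_id} has no live replica")
        partials.append(map_block(block))
    return reduce_counts(partials)


if __name__ == "__main__":
    text = ["big data", "Big insights", "data at scale"]
    print(word_count(text))
    print(word_count(text, failed_node="node2"))
```

Because every block lives on three nodes, the count is the same with or without `node2`. That is the "self-healing" property from slide 19 in miniature. Running the map step per block, rather than shipping all data to one place, is the contrast with the "moving data to compute doesn’t scale" callout on slide 9.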