
Webinar | From Zero to Big Data Answers in Less Than an Hour – Live Demo Slides


Slides describing Cloudera and Karmasphere, and how their products can be combined to install a Hadoop cluster, import data, run queries, and generate results.


Published in: Technology


  • 1. March 13, 2011. From Zero to Big Data Answers in Less Than an Hour. Daniel Templeton | Cloudera Manager, Partner Program Adoption. Richard Guth | Karmasphere Chief Marketing Officer
  • 2. The ‘Big Data’ Phenomenon
      Big Data Drivers: More Content, More Devices, More Consumption, New & Better Information
      • The proliferation of data capture and creation technologies
      • Increased “interconnectedness” drives consumption (creating more data)
      • Inexpensive storage makes it possible to keep more, longer
      • Innovative software and analysis tools turn data into information
      Big Data encompasses not only the content itself, but how it’s consumed.
      • Every gigabyte of stored content can generate a petabyte or more of transient data*
      • The information about you is much greater than the information you create
      *Source: IDC 2011
      ©2011 Cloudera, Inc. All Rights Reserved.
  • 3. What is Apache Hadoop?
      Apache Hadoop is a platform for data storage and processing that is scalable, fault tolerant, and open source.
      CORE HADOOP COMPONENTS
      • Hadoop Distributed File System (HDFS): File Sharing & Data Protection Across Physical Servers
      • MapReduce: Distributed Computing Across Physical Servers
      Has the Flexibility to Store and Mine Any Type of Data
      • Ask questions across structured and unstructured data that were previously impossible to ask or solve
      • Not bound by a single schema
      Excels at Processing Complex Data
      • Scale-out architecture divides workloads across multiple nodes
      • Flexible file system eliminates ETL bottlenecks
      Scales Economically
      • Can be deployed on commodity hardware
      • Open source platform guards against vendor lock
  • 4. Who Is Cloudera?
      The trusted leader in Apache Hadoop.
      • Package the #1 distribution of Apache Hadoop in commercial and non-commercial environments
      • Roadmap control or influence over all Apache Hadoop-related projects
      • Top contributor to the Apache ecosystem overall
      • Tens of thousands of nodes under management
      We make Hadoop enterprise-easy.
      • A distribution of Apache Hadoop that is tested, certified and supported
      • A suite of management software for Hadoop administrators
      • Training and certification programs
      • Comprehensive support and consulting services
      Unrivaled knowledge and experience.
      • Founders, committers and contributors to Apache Hadoop and related projects
      • A wealth of experience in the design and delivery of enterprise software
      Strong executive team with proven abilities.
      • Mike Olson, CEO
      • Amr Awadallah, VP, Engineering
      • Kirk Dunn, COO
      • Mary Rorabaugh, VP, Finance
      • Jeff Hammerbacher, Chief Scientist
      • Charles Zedlewski, VP, Products
      • Doug Cutting, Chief Architect
      • Omer Trajman, VP, Customer Solutions
  • 5. CDH Overview
      The #1 commercial and non-commercial Apache Hadoop distribution.
      Complete, Integrated Hadoop Stack
      • File System Mount: FUSE-DFS
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: APACHE OOZIE
      • Scheduling: APACHE OOZIE
      • Metadata: APACHE HIVE
      • Languages / Compilers: APACHE PIG, APACHE HIVE
      • Data Integration: APACHE FLUME, APACHE SQOOP
      • Fast Read/Write Access: APACHE HBASE
      • Coordination: APACHE ZOOKEEPER
      CDH Components
      • Apache Hadoop – reliable, scalable distributed computing
      • Apache Hive – SQL-like language and metadata repository
      • Apache Pig – High level language for expressing data analysis programs
      • Apache HBase – Hadoop database for random, real-time read/write access
      • Apache Zookeeper – Highly reliable distributed coordination service
      • Apache Flume* – Distributed service for collecting and aggregating log and event data
      • Apache Whirr* – Library for running Hadoop in the cloud
      • Apache Sqoop* – Integrating Hadoop with RDBMS
      • Apache Oozie* – Server-based workflow engine for Hadoop activities
      • Fuse-DFS – Module within Hadoop for mounting HDFS as a traditional file system
      • Hue – Browser-based desktop interface for interacting with Hadoop
      * Currently undergoing Incubation at the Apache Software Foundation.
  • 6. Cloudera Enterprise
      Cloudera Enterprise makes open source Hadoop enterprise-easy
      • Simplify and Accelerate Hadoop Deployment
      • Reduce Adoption Costs and Risks
      • Lower the Cost of Administration
      • Increase Transparency and Control Over Hadoop
      • Leverage the Experience of Our Experts
      CLOUDERA ENTERPRISE COMPONENTS
      • Cloudera Manager: End-to-End Management Application for Apache Hadoop
      • Production-Level Support: Our Team of Experts On-Call to Help You Meet Your SLAs
      EFFECTIVENESS: Ensuring You Get Value From Your Hadoop Deployment
      EFFICIENCY: Enabling You to Affordably Run Hadoop in Production
  • 7. Big Data Intelligence Applications for Enterprise Data Professionals. www.karmasphere.com
      © Karmasphere 2012 All rights reserved
  • 8. About Karmasphere
      • Company: Pure-play, singularly focused on Big Data Intelligence and Analytics on Hadoop and NoSQL, in the cloud and on-premise.
      • Engineering Expertise: Hadoop, analytics, web analytics, business intelligence, visualizations, programming languages, compilers, architecture, mathematics, database
      • Management Experience: Google, Yahoo, Ask, Ning, Omniture, BEA, Oracle, Sybase, Actuate, Apple, Zend, Intel, BMC, Spotfire
  • 9. Karmasphere Mission
      Provide an EASY way to find INSIGHTS in Big Data to transform business
      Upcoming Skills Shortage
      “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions”
      Source: “Big Data: the next frontier for innovation, competition and productivity”, McKinsey, May 2011
  • 10. Karmasphere: Big Data Mining and Analytics ON Hadoop
  • 11. From Zero to Answers in 60 Minutes
      Our Process
      • Access any cloud or on-premise Cloudera CDH
      • Assemble and organize unstructured and structured data in Hadoop
      • Analyze the data using familiar SQL
      DEMO: Marketing Analyst for Retail Chain
      1. Connect to the preconfigured Cloudera CDH cluster
      2. Access our structured point of sale transactions data and bring up transactional data for lunch meals
      3. Correlate results with unstructured social media data to get some insight on our buyers and buying behavior
      4. Infer from these results on underperforming stores and come up with an action plan to increase sales for these stores
  • 12. www.karmasphere.com
  • 13. From Zero to Big Data Answers in Less Than an Hour
      The webinar recording will be made available shortly at:
      Contact Information:
      • 1 (888) 789-1488
      • 1 (650) 292-6100
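
Slide 3 describes MapReduce as distributed computing across physical servers, with a scale-out architecture that divides workloads across multiple nodes. The following is a minimal single-process sketch of that idea in plain Python, not Hadoop's actual API. The function names, the round-robin split, and the `num_nodes` parameter are illustrative assumptions, and the "nodes" here are just list slices rather than real servers.

```python
from collections import defaultdict
from itertools import chain


def split_into_chunks(records, num_nodes):
    """Divide the input across hypothetical worker nodes (round-robin)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one node's chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records, num_nodes=3):
    """Run map, shuffle, and reduce over the records, one chunk per node."""
    chunks = split_into_chunks(records, num_nodes)
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Illustrative input only; not data from the webinar.
    lines = ["big data answers", "big data on hadoop", "hadoop answers"]
    print(word_count(lines))
```

Because each chunk is mapped independently and the reduce step only needs values grouped by key, the result does not depend on how many chunks the input is split into. That independence is what lets a real cluster spread the map work across machines.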