Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise
 


This session will discuss what's new in the recently released CDH3 and Enterprise 3.5 products. We'll review how usage of Hadoop has evolved in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security and manageability.

    Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise Presentation Transcript

    • Cloudera’s Distribution Including Apache Hadoop & Cloudera Enterprise
    • Who has big data? Everyone!
      Categories shown: Advanced Analytics, Data Processing
      Web
      • Social network analysis
      • Clickstream sessionization
      Media
      • Content optimization
      • Clickstream sessionization
      Telco
      • Network analytics
      • Mediation
      Retail
      • Loyalty & promotions analysis
      • Data factory
      Financial
      • Fraud analysis
      • Trade reconciliation
      Federal
      • Entity analysis
      • SIGINT
      Biopharma
      • Sequence analysis
      • Annotation
    • When they started to get big data, what did Google build?
      Systems shown: Dremel, Evenflow, Sawzall, Bigtable, MySQL Gateway, MapReduce / GFS, Chubby
      Functions highlighted in turn:
      • Store data
      • Process data
      • Ingest data
      • Serve data
      • High level domain specific language
      • Chain together complex workloads
      • Schedule them
      • Columnar storage + metadata
      • End users query data
      • Coordinate within system
    • The pattern repeats…
      HiPal
      Databee
      Hive
      HBase
      Scribe
      Zookeeper
    • The pattern repeats…
      Oozie
      Hive
      Pig & Hive
      HBase
      Data Highway
      Zookeeper
    • The pattern repeats…
      Azkaban
      Pig
      Voldemort
      Sqoop
      Kafka
      Zookeeper
    • Formalized in CDH
      Cloudera’s Distribution Including Apache Hadoop
      Hue
      Oozie
      Hive
      Hive / Pig
      HBase
      Sqoop
      Flume
      Zookeeper
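
The preceding slides list ten functions and, for each company, the systems that fill them. A minimal Python sketch of that pattern follows. The pairing of each function with a specific Google system and CDH component is an assumption inferred from the order of the slide labels, not something the transcript states. The name `STACK` and the helper `cdh_counterparts` are hypothetical, introduced only for illustration.

```python
# Assumed pairing of each function named on the slides with a Google system
# and the CDH component that appears to fill the same role. The pairing is
# inferred from the slide order; the transcript does not state it.
STACK = {
    "store data": ("GFS", "HDFS"),
    "process data": ("MapReduce", "MapReduce"),
    "ingest data": ("MySQL Gateway", "Sqoop / Flume"),
    "serve data": ("Bigtable", "HBase"),
    "high level domain specific language": ("Sawzall", "Hive / Pig"),
    "chain together complex workloads": ("Evenflow", "Oozie"),
    "schedule them": ("Evenflow", "Oozie"),
    "columnar storage + metadata": ("Dremel", "Hive"),
    "end users query data": ("Dremel", "Hue"),
    "coordinate within system": ("Chubby", "Zookeeper"),
}


def cdh_counterparts(google_system):
    """Return the CDH components paired with a Google system, without repeats."""
    seen = []
    for google, cdh in STACK.values():
        if google == google_system and cdh not in seen:
            seen.append(cdh)
    return seen


if __name__ == "__main__":
    # Print the assumed function -> Google -> CDH table.
    for function, (google, cdh) in STACK.items():
        print(f"{function:38} {google:14} -> {cdh}")
```

Calling `cdh_counterparts("Evenflow")` under these assumptions returns a single entry, since both workflow chaining and scheduling are paired with Oozie.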
    • Cloudera’s product strategy
      Cloudera’s Distribution Including Apache Hadoop
      • Provide the reference distribution for the Apache Hadoop platform
        • Functionally complete
        • Performant and secure
        • Integrated & tested
        • Easy to trial & consume
        • 100% Apache licensed
        • Open to partners and the extended IT ecosystem
      Cloudera Enterprise
      • Provide a commercial solution that helps enterprises run Hadoop in production
        • Software & services
        • Increase transparency, consistency & reliability
        • Lower the cost & complexity of administration
        • Improve compliance with policies & processes
    • Cloudera’s Distribution including Apache Hadoop (CDH) is, among other things, Apache Hadoop code
      • The only code Cloudera includes for MapReduce, HDFS and Hadoop Common is code committed to the Apache Hadoop project
      • This means no forking, and conformance to an open standard
      • This is similarly the case with:
      • Apache Hive
      • Apache HBase
      • Apache Pig
      • and so on…
    • CDH is: Apache Hadoop people
      * Source – Apache, Cloudera & Yahoo JIRA, Q4 2010
    • CDH is something that works with the enterprise IT ecosystem
      Drivers, language enhancements, testing
      Sqoop framework, adapters
      More coming…
      Packaging, testing
    • CDH keeps improving to make Apache Hadoop easier to run in trial or production
      Each release keeps everything from earlier releases and adds:
      1Q 2011
      • Known issues & limitations
      4Q 2010
      • Security guide
      3Q 2010
      • Certified integrations
      • Predictable updates
      2Q 2010
      • Integrated system
      2009
      • Installation guide
      • Availability of support
      • Packaging
      • Patching
    • CDH3 is generally available!
      I/O performance improvements
      Job performance improvements
      Stability improvements
      Durability improvements
      Log data collection
      Database integration
      Web UI
      Authentication
      Indexing
      Expanded platform support – RHEL 6, SUSE 11, Maven
      Scheduling
      Workflow
      Replication
      Copyright 2011 Cloudera Inc. All rights reserved
    • Why Enterprise?
      Hadoop is a distributed system that presents unique operational challenges
      The fixed cost of managing internal patch & release infrastructure is prohibitive
      Hadoop skills & expertise are scarce
      Tracking community development efforts consistently is challenging
    • Cloudera Enterprise
      • Reduces the risks of running Hadoop in production
      • Improves consistency and compliance, and reduces administrative overhead
      Cloudera Management Suite
      • Authorization Manager
      • Activity Monitor (new)
      • Service Monitor
      • Resource Manager
      • Service & Configuration Manager (new)
      Support
      • Production support for CDH & certified integrations (Oracle, Netezza, Teradata, Greenplum, Aster Data)