Chicago Data Summit: Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise
 


This session will discuss what's new in the recently released CDH3 and Enterprise 3.5 products. We'll review how usage of Hadoop has evolved in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security and manageability.

Presentation Transcript

  • Cloudera’s Distribution Including Apache Hadoop & Cloudera Enterprise
  • Who has big data? Everyone!
    Use cases by industry (first bullet: Advanced Analytics; second bullet: Data Processing)
    Web
    • Social network analysis
    • Clickstream sessionization
    Media
    • Content optimization
    • Clickstream sessionization
    Telco
    • Network analytics
    • Mediation
    Retail
    • Loyalty & promotions analysis
    • Data factory
    Financial
    • Fraud analysis
    • Trade reconciliation
    Federal
    • Entity analysis
    • SIGINT
    Biopharma
    • Sequence analysis
    • Annotation
  • When they started to get big data, what did Google build?
    Dremel
    Evenflow
    Evenflow
    Dremel
    Sawzall
    Bigtable
    MySQL
    Gateway
    MapReduce / GFS
    Chubby
  • When they started to get big data, what did Google build?
    Store data
    Process data
    Ingest data
    Serve data
    High level domain specific language
    Chain together complex workloads
    Schedule them
    Columnar storage + metadata
    End users query data
    Coordinate within system
  • The pattern repeats…
    HiPal
    Databee
    Databee
    Hive
    Hive
    HBase
    Scribe
    Zookeeper
  • The pattern repeats…
    Oozie
    Oozie
    Hive
    Pig & Hive
    HBase
    Data Highway
    Zookeeper
  • The pattern repeats…
    Azkaban
    Azkaban
    Pig
    Voldemort
    Sqoop
    Kafka
    Zookeeper
  • Formalized in CDH
    Cloudera’s Distribution Including Apache Hadoop
    Hue
    Hue
    Oozie
    Oozie
    Hive
    Hive / Pig
    HBase
    Sqoop
    Flume
    Zookeeper
  • Cloudera’s product strategy
    • Provide the reference distribution for the Apache Hadoop platform
    • Functionally complete
    • Performant and secure
    • Integrated & tested
    • Easy to trial & consume
    • 100% Apache licensed
    • Open to partners and the extended IT ecosystem
    • Provide a commercial solution that helps enterprises run Hadoop in production
    • Software & services
    • Increase transparency, consistency & reliability
    • Lower the cost & complexity of administration
    • Improved compliance to policies & processes
    Cloudera’s Distribution Including Apache Hadoop
    Cloudera Enterprise
  • Cloudera’s Distribution including Apache Hadoop (CDH) is, among other things, Apache Hadoop code
    • The only code Cloudera includes for MapReduce, HDFS and Hadoop Common is code committed to the Apache Hadoop project
    • This means no forking, and conformance to an open standard
    • This is similarly the case with:
    • Apache Hive
    • Apache HBase
    • Apache Pig
    • and so on…
  • CDH is: Apache Hadoop people
    * Source – Apache, Cloudera & Yahoo JIRA, Q4 2010
  • CDH is something that works with the enterprise IT ecosystem
    Drivers, language enhancements, testing
    Sqoop framework, adapters
    More coming…
    Packaging, testing
  • CDH keeps improving to make Apache Hadoop easier to run in trial or production
    1Q 2011
    • Known issues & limitations
    • Security guide
    • Certified integrations
    • Predictable updates
    • Integrated system
    • Installation guide
    • Availability of support
    • Packaging
    • Patching
    4Q 2010
    • Security guide
    • Certified integrations
    • Predictable updates
    • Integrated system
    • Installation guide
    • Availability of support
    • Packaging
    • Patching
    3Q 2010
    • Certified integrations
    • Predictable updates
    • Integrated system
    • Installation guide
    • Availability of support
    • Packaging
    • Patching
    2Q 2010
    • Integrated system
    • Installation guide
    • Availability of support
    • Packaging
    • Patching
    2009
    • Installation guide
    • Availability of support
    • Packaging
    • Patching
  • CDH3 is generally available!
    I/O performance improvements
    Job performance improvements
    Stability improvements
    Durability improvements
    Log data collection
    Database integration
    Web UI
    Authentication
    Indexing
    Expanded platform support – RHEL 6, SUSE 11, Maven
    Scheduling
    Workflow
    Replication
    Copyright 2011 Cloudera Inc. All rights reserved
  • Why Enterprise?
    Hadoop is a distributed system that presents unique operational challenges
    The fixed cost of managing internal patch & release infrastructure is prohibitive
    Hadoop skills & expertise are scarce
    It is challenging to consistently track community development efforts
  • Cloudera Enterprise
    • Reduces the risks of running Hadoop in production
    • Improves consistency and compliance, and reduces administrative overhead
    Management Suite
    • Authorization Manager
    • Activity Monitor (new)
    • Service Monitor
    • Resource Manager
    • Service & Configuration Manager (new)
    Cloudera Management Suite
    • Production support for CDH & certified integrations (Oracle, Netezza, Teradata, Greenplum, Aster Data)