Hadoop summit cloudera keynote_v5

Hadoop Grows Up!

Doug Cutting – Hadoop project founder
June 29th 2010

Hadoop is Almost 5 Years Old!

A Career…

Copyright 2010 Cloudera Inc. All rights reserved

An Ecosystem…


A Market…


An Emerging Platform for Applications…
Graph analysis Machine learning Scientific Archive Security
Query & reporting Complex ETL Search quality Fraud detection
Clickstream analysis POS analysis Trade compliance And more…


Hadoop Started From Humble Beginnings…

• MapReduce and HDFS only

• Good for experienced Java programmers

• Limited application set


Innovation: the Secret to Hadoop Success

• Projects & components develop around Hadoop
• User base grows
• More applications are made possible

“Provide more levels of abstraction & automation for job creation”
“Provide common technical services”
“Make it easier to get data in & out”
“Cover more data movements – inserts, appends, etc.”


But Innovation Isn’t Free

• For every release of MapReduce and HDFS, there are >20 releases of related projects

[Figure: release-count chart, axis 0 to 20]

• Every component has its own schedule, versioning, dependencies & patch requirements

[Figure labels: HBase 0.89, HDFS 0.20, Pig 0.7, Hive 0.6, Oozie 2.0]

• Hadoop community likes to build 2-3 of everything

Announcing Cloudera’s Distribution for Hadoop v3
• Open source – 100% Apache licensed
• Simplified – Cloudera manages required versions & dependencies
• Integrated – all components interoperate
• Reliable – patched with fixes from future releases to improve stability

• Easy to consume – Debian, RPM, tarball, Virtual Machine, EC2, Rackspace, Softlayer


What’s New in CDH v3?
• Updates to existing Hadoop frameworks
• Pig 0.7
• Sqoop 1.0
• Hadoop 0.20S (planned)
• Support for 3 new related components
• HBase – with durability
• ZooKeeper
• Oozie – run workflows + support for Hive & Sqoop actions
• Introducing 2 new components
• Flume – collect streaming data with centralized configuration & guaranteed delivery
• Hue – web UI and SDK for Hadoop web applications

Charles Zedlewski,
Cloudera Product Management


Harnessing Hadoop Has Challenges

Skill Set – experts only

Complexity – more than ten components

Manageability – hard to configure, monitor & administer

Interoperability – limited support for DBMS & analytic tools

Announcing Cloudera Enterprise
• Reduces the risks of running Hadoop in production
• Improves consistency and compliance, and reduces administrative overhead

Management tools
• Monitoring & config for data integration
• Authorization mgmt & provisioning
• Resource mgmt

• Production support for CDH & certified integrations (e.g. Oracle, Vertica)

Demo


Some Announcements

• Party at our place
• Hackathon on CDH3 – applications, enhancements, open source contributions
• July 27th, 9:30am – 7:30pm
• For invite: hackathon@cloudera.com
• Free food & snacks

• Or stay home and read
• Hadoop: The Definitive Guide, second edition
• Available on October 12th at Hadoop World

Thank You!

• Stop by our table if you have questions!

