Webinar: The Future of Hadoop

With a community of over 500 contributors, Apache Hadoop and related projects are evolving at an ever-increasing rate. Join the co-creator of Apache Hadoop, Doug Cutting, and Cloudera’s Chief Scientist, Jeff Hammerbacher, for a discussion of the most exciting new features being developed by the Apache Hadoop community.



Presentation Transcript

  • The Future of Hadoop Doug Cutting | A Founder of Apache HadoopJeff Hammerbacher | Chief Scientist, Cloudera Welcome to the webinar! Audio/Telephone: +1 (215) 383-1016 Access Code: 421-634-457 Audio Pin: Shown after joining the Webinar Hadoop, Hbase, Pig, Hive, Bigtop, Avro, Flume & Whirr are trademark of the Apache Software Foundation
  • Housekeeping
    ▪ All lines are on mute
    ▪ Ask questions at any time using the Questions panel on GoToMeeting
    ▪ Slides and recording will be available on www.cloudera.com/events
    ©2011 Cloudera, Inc. All Rights Reserved.
  • Presentation Outline
    ▪ 1. Context
    ▪ 2. Apache Bigtop
    ▪ 3. Apache Hadoop Core
    ▪ 4. Apache HBase, Hive, and Pig
    ▪ 5. Other components
    ▪ Questions and Discussion
  • 1. Context
  • Context: Data
    ▪ 1.8 ZB will be created and replicated in 2011
      ▪ Up 9x in the last five years
      ▪ More than 90% of this data is unstructured
      ▪ Enterprises have some liability for 80% of this data
      ▪ Enterprises will spend $4T on managing data in 2011
      ▪ Source: IDC Digital Universe Report 2011
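
As a quick check on the growth figure above, a 9x increase over five years corresponds to a compound growth factor of roughly 1.55 per year. The short calculation below is only a back-of-the-envelope sketch: it assumes steady compound growth, which the slide does not state, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic for "1.8 ZB in 2011, up 9x in the last five years".
# Assumption (not from the slide): growth compounds at a constant annual rate.
volume_2011_zb = 1.8
growth_total = 9.0
years = 5

# Constant yearly factor that multiplies out to 9x after five years.
annual_factor = growth_total ** (1 / years)

# Volume implied five years earlier under the same assumption.
implied_start_zb = volume_2011_zb / growth_total

print(f"annual growth factor: {annual_factor:.2f}")
print(f"implied volume five years earlier: {implied_start_zb:.2f} ZB")
```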
  • Context: Hadoop
    ▪ Apache Hadoop and related software are designed for this world
    ▪ Volume
      ▪ Commodity hardware and open source software lower cost and increase capacity
    ▪ Velocity
      ▪ Data ingest speed aided by append-only and schema-on-read design
    ▪ Variety
      ▪ Multiple tools to structure, process, and access data
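
The "append-only and schema-on-read" point on this slide can be illustrated with a small sketch. This is not Hadoop code: the in-memory list stands in for an append-only file, and the function names `ingest` and `read_with_schema` are hypothetical. It only shows why ingest is cheap (nothing is parsed or validated on write) and where structure gets applied (at read time).

```python
import json

# Stand-in for an append-only file: raw lines are stored exactly as received.
raw_log = []

def ingest(line: str) -> None:
    """Append-only ingest: no parsing or validation happens on write."""
    raw_log.append(line)

def read_with_schema(fields):
    """Schema-on-read: project the requested fields while scanning the raw data."""
    for line in raw_log:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed input is tolerated and skipped at read time
        if isinstance(record, dict):
            yield {name: record.get(name) for name in fields}

ingest('{"user": "a", "bytes": 120}')
ingest('{"user": "b", "bytes": 80, "agent": "curl"}')
ingest('not json at all')

rows = list(read_with_schema(["user", "bytes"]))
print(rows)
```

A different reader could apply a different field list to the same raw data without rewriting anything that was already ingested.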
  • Context: Hadoop
  • Context: HDFS and MapReduce
    ▪ Apache Hadoop = HDFS + MapReduce
      ▪ Similar to kernel of an operating system
      ▪ Referred to as “Hadoop Core”
    ▪ Related components are often deployed with Hadoop
      ▪ For example: HBase, Hive, Pig, Oozie, Flume, Sqoop
      ▪ Together, these components form a “Hadoop Stack”
      ▪ Not all components must be deployed
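
For readers new to the MapReduce half of "HDFS + MapReduce", the programming model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using word count as the example; it does not use Hadoop's API, and all function names are illustrative.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data across the network; the sketch keeps only the data flow.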
  • Context: Bigtop
    ▪ What standards should all components follow?
    ▪ How can we ensure all components of the stack work together?
    ▪ How can we find the right version of each component?
    ▪ How can we make it easy to install an additional component?
  • 2. Apache Bigtop
  • Apache Bigtop
    ▪ Now incubating at Apache
    ▪ Hadoop ecosystem-wide project, including:
      ▪ Interoperability testing of components
      ▪ Packaging of compatible versions of components
    ▪ Like a Fedora, Debian, or CentOS for the Hadoop ecosystem
    ▪ Releases are not a single artifact
      ▪ Rather a set of interdependent, compatible components
  • Apache Bigtop
    ▪ Current components
      ▪ Hadoop
      ▪ HBase
      ▪ Hive
      ▪ Pig
      ▪ Oozie
      ▪ Sqoop
      ▪ Flume
      ▪ ZooKeeper
      ▪ Whirr
  • Apache Bigtop
    ▪ Outputs
      ▪ Source
      ▪ RPM
      ▪ Deb
    ▪ Tests
      ▪ Integration
      ▪ Package
      ▪ Smoke
    ▪ Release 0.1.0 under vote now!
  • 3. Apache Hadoop Core
  • Apache Hadoop Core
    ▪ Current stable releases based on branches from 0.20
    ▪ Upcoming release: 0.22
      ▪ Includes both security and new implementation of append
      ▪ Not expected to be run at scale or commercially supported
      ▪ Nearly ready for vote
    ▪ Upcoming release: 0.23
      ▪ Build and dependency management moved to Maven
      ▪ Branch to happen soon
  • HDFS
    ▪ Robustness
      ▪ HDFS-1073: Checkpointing of image and edits log
    ▪ Availability
      ▪ HDFS-1623: High availability
    ▪ Performance
      ▪ HDFS-941: Faster random reads
      ▪ HDFS-2080: Faster checksums
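
The slide does not say how checksums are made faster. As background for why their cost matters, HDFS verifies data using a checksum per fixed-size chunk, so checksum work is done on every read and write. The sketch below shows only that general idea with Python's `zlib.crc32`; the 512-byte chunk size and all function names are assumptions for illustration, and none of this is taken from HDFS-2080 itself.

```python
import zlib

# Assumed chunk size for illustration; real deployments make this configurable.
BYTES_PER_CHECKSUM = 512

def chunk_checksums(data: bytes, chunk_size: int = BYTES_PER_CHECKSUM):
    """Compute one CRC32 per fixed-size chunk of the data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]

def find_corrupt_chunks(data: bytes, expected, chunk_size: int = BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose checksum no longer matches."""
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]

block = bytes(range(256)) * 8          # 2048 bytes, i.e. four 512-byte chunks
stored_sums = chunk_checksums(block)   # computed once at write time

corrupted = bytearray(block)
corrupted[600] ^= 0xFF                 # flip one byte inside the second chunk

bad_chunks = find_corrupt_chunks(bytes(corrupted), stored_sums)
print(bad_chunks)
```

Per-chunk checksums let a reader pinpoint the damaged region and verify a partial read without checksumming the whole block.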
  • HDFS
    ▪ Scalability
      ▪ HDFS-1052: Federation of the NameNode
      ▪ Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
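
Federation splits the file system namespace across several NameNodes instead of one. One way to picture how a client finds the right NameNode is a mount table that maps path prefixes to servers. The sketch below is a simplified illustration under that assumption: the host names, the table contents, and the `resolve_namenode` function are hypothetical and are not HDFS's actual client API.

```python
# Hypothetical mount table: each path prefix is owned by one NameNode.
MOUNT_TABLE = {
    "/user": "namenode-1.example.com",
    "/data": "namenode-2.example.com",
    "/tmp": "namenode-3.example.com",
}

def resolve_namenode(path: str, mount_table=MOUNT_TABLE) -> str:
    """Return the NameNode owning the longest mount prefix that contains path."""
    best = None
    for prefix in mount_table:
        # Match whole path components only, so "/database" does not match "/data".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError(f"no NameNode owns {path!r}")
    return mount_table[best]

print(resolve_namenode("/user/doug/notes.txt"))
print(resolve_namenode("/data/logs/2011/08"))
```

Because each NameNode holds only its own part of the namespace, metadata load and memory use are spread across servers while DataNodes can still be shared.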
  • MapReduce
    ▪ Modularity
      ▪ MAPREDUCE-279: MapReduce 2.0
        ▪ Break JobTracker into ResourceManager and ApplicationMaster
        ▪ Replace TaskTracker with NodeManager
      ▪ Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
  • MapReduce
    ▪ Potential New Frameworks
      ▪ MAPREDUCE-2719: Distributed shell
      ▪ MAPREDUCE-2720: Distributed Java commands
      ▪ MPI: Communication-intensive parallelism
      ▪ Fast scans and aggregations
        ▪ OpenDremel
      ▪ Bulk Synchronous Parallel
        ▪ Giraph, Golden Orb, Hama, et al.
      ▪ Actor Model (streaming)
        ▪ S4, Akka, Storm, et al.
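
Of the frameworks listed, Bulk Synchronous Parallel is the least self-explanatory, so here is a minimal single-process sketch of the model: computation proceeds in supersteps, each vertex works only on messages received in the previous superstep, and a barrier separates one superstep from the next. The example propagates the maximum value through a small graph. It is not the API of Giraph, Golden Orb, or Hama, and the function name `bsp_max` is illustrative.

```python
def bsp_max(graph, values):
    """Propagate the maximum value to every vertex using BSP-style supersteps.

    graph maps each vertex to its list of neighbors (every neighbor must also
    be a key); values maps each vertex to its starting integer.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = {vertex: [] for vertex in graph}
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])

    # Later supersteps: run until no messages are in flight.
    while any(inbox.values()):
        outbox = {vertex: [] for vertex in graph}
        for vertex in graph:  # conceptually, these run in parallel
            if inbox[vertex]:
                candidate = max(inbox[vertex])
                if candidate > values[vertex]:
                    values[vertex] = candidate
                    for neighbor in graph[vertex]:
                        outbox[neighbor].append(candidate)
        inbox = outbox  # barrier: messages become visible in the next superstep
    return values

chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = bsp_max(chain, {"a": 3, "b": 1, "c": 2})
print(result)
```

Iterative graph algorithms of this kind fit MapReduce poorly because each iteration would be a separate job, which is one reason such frameworks are listed as candidates to run alongside it.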
  • 4. HBase, Hive, and Pig
  • Apache HBase
    ▪ Upcoming release: 0.92.0
    ▪ Server-side triggers
      ▪ HBASE-2000: Coprocessors
    ▪ Availability
      ▪ HBASE-1730/4213: Online schema changes
    ▪ Performance
      ▪ HBASE-3857: HFile 2.0
    ▪ HBase book in September!
  • Apache Hive
    ▪ Upcoming release: 0.8
    ▪ Data transfer
      ▪ HIVE-306: INSERT INTO
      ▪ HIVE-1918: EXPORT/IMPORT
    ▪ Indexes
      ▪ HIVE-1644: Automatically use indexes
      ▪ HIVE-1803: Bitmap indexes
    ▪ Data formats
      ▪ HIVE-895: Avro support
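
The bitmap index item may be unfamiliar, so the following sketch shows the general technique: keep one bitmap per distinct column value, with bit i set when row i holds that value, and answer multi-column filters by combining bitmaps with bitwise operations. It uses Python integers as bitmaps and is a generic illustration only; it does not reflect how HIVE-1803 stores or queries its indexes, and the function names are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to an integer whose bit i is set if row i has it."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_from_bitmap(bitmap):
    """Expand a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows

country = ["US", "JP", "US", "DE", "JP"]
status = ["ok", "ok", "err", "ok", "err"]

country_index = build_bitmap_index(country)
status_index = build_bitmap_index(status)

# WHERE country = 'US' AND status = 'ok' becomes a single bitwise AND.
us_ok = rows_from_bitmap(country_index["US"] & status_index["ok"])
# WHERE country = 'JP' OR status = 'err' becomes a bitwise OR.
jp_or_err = rows_from_bitmap(country_index["JP"] | status_index["err"])
print(us_ok, jp_or_err)
```

Bitmap indexes tend to work best on columns with few distinct values, where the bitmaps are small and combining them is cheap.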
  • Apache Pig
    ▪ Recent release: 0.9
    ▪ Scripting
      ▪ PIG-1479: Embedding Pig in Python
      ▪ PIG-1793: Macro expansion
    ▪ Debugging
      ▪ PIG-1712: ILLUSTRATE rework
    ▪ Data formats
      ▪ PIG-1748: Avro support
  • 5. Other Components
  • Other Components
    ▪ Apache Incubator
      ▪ Sqoop, Flume, and Oozie now incubating
      ▪ Whirr graduated to a top-level Apache project
    ▪ Apache Avro
      ▪ Interoperability with Protocol Buffers and Thrift
      ▪ Column-oriented file format
      ▪ Python MapReduce implementation
    ▪ Apache ZooKeeper
      ▪ Multi-update
      ▪ Kerberos authentication of clients
  • Q&A
    ▪ Visit www.hadoopworld.com
      ▪ November 8-9, 2011 in New York City
      ▪ Early bird discount ends September 5, 2011
    ▪ Enter Today: www.facebook.com/cloudera
      ▪ Click the “Be a Cloudera Hero for Apache Hadoop” tab
      ▪ Share what you think Apache Hadoop can do for you
      ▪ Win a personal hackathon with Doug Cutting in San Francisco, CA