Webinar: The Future of Hadoop

With a community of over 500 contributors, Apache Hadoop and related projects are evolving at an ever-increasing rate. Join the co-creator of Apache Hadoop, Doug Cutting, and Cloudera’s Chief Scientist, Jeff Hammerbacher, for a discussion of the most exciting new features being developed by the Apache Hadoop community.

Transcript of "Webinar: The Future of Hadoop"

1. The Future of Hadoop
Doug Cutting | A Founder of Apache Hadoop
Jeff Hammerbacher | Chief Scientist, Cloudera
Welcome to the webinar!
Audio/Telephone: +1 (215) 383-1016
Access Code: 421-634-457
Audio Pin: Shown after joining the Webinar
Hadoop, HBase, Pig, Hive, Bigtop, Avro, Flume, and Whirr are trademarks of the Apache Software Foundation.
2. Housekeeping
▪ All lines are on mute
▪ Ask questions at any time using the Questions panel on GoToMeeting
▪ Slides and recording will be available on www.cloudera.com/events
©2011 Cloudera, Inc. All Rights Reserved.
3. Presentation Outline
▪ 1. Context
▪ 2. Apache Bigtop
▪ 3. Apache Hadoop Core
▪ 4. Apache HBase, Hive, and Pig
▪ 5. Other components
▪ Questions and Discussion
4. Section 1: Context
5. Context: Data
▪ 1.8 ZB will be created and replicated in 2011
  ▪ Up 9x in the last five years
  ▪ More than 90% of this data is unstructured
  ▪ Enterprises have some liability for 80% of this data
  ▪ Enterprises will spend $4T on managing data in 2011
  ▪ Source: IDC Digital Universe Report 2011
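The "up 9x in five years" figure can be restated as an implied annual growth rate. Assuming smooth compound growth over the period and that the 9x compares the 2011 volume with the volume five years earlier (both assumptions; the slide gives only the total multiple), the arithmetic is:

```latex
\begin{aligned}
(1+g)^{5} &= 9 \\
g &= 9^{1/5} - 1 \approx 1.552 - 1 \approx 0.55 \quad (\text{about } 55\% \text{ per year}) \\
V_{\text{five years earlier}} &\approx \frac{1.8\ \text{ZB}}{9} = 0.2\ \text{ZB}
\end{aligned}
```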
6. Context: Hadoop
▪ Apache Hadoop and related software are designed for this world
▪ Volume
  ▪ Commodity hardware and open source software lower cost and increase capacity
▪ Velocity
  ▪ Data ingest speed aided by append-only and schema-on-read design
▪ Variety
  ▪ Multiple tools to structure, process, and access data
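A minimal sketch of the schema-on-read idea, assuming comma-separated text records. The write path only appends raw lines; each reader brings its own schema when it reads. The names `ingest`, `read_with_schema`, and the sample fields are hypothetical, not a Hadoop API; in practice, Hive table definitions laid over files in HDFS play the role of the read-time schema.

```python
import csv
from datetime import date

# Append-only store of raw text lines. Nothing is parsed or validated at
# write time, which keeps the ingest path cheap.
raw_log = []


def ingest(line):
    """Write path: append the raw line exactly as received."""
    raw_log.append(line)


def read_with_schema(schema):
    """Read path: apply `schema` to each stored line only when it is read.

    `schema` is a list of (field_name, converter) pairs. Each line is split
    as CSV and its fields are converted in order; extra fields are ignored.
    """
    for row in csv.reader(raw_log):
        yield {name: convert(value) for (name, convert), value in zip(schema, row)}


ingest("2011-08-30,alice,3")
ingest("2011-08-31,bob,5")

# Two readers interpret the same stored lines with different schemas.
clicks_schema = [("day", date.fromisoformat), ("user", str), ("clicks", int)]
users_schema = [("day", str), ("user", str)]

total_clicks = sum(r["clicks"] for r in read_with_schema(clicks_schema))
users = [r["user"] for r in read_with_schema(users_schema)]
```

Because no schema is enforced on write, a malformed line only surfaces as an error when some reader tries to convert it; that is the trade-off for fast ingest.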
7. Context: Hadoop
8. Context: HDFS and MapReduce
▪ Apache Hadoop = HDFS + MapReduce
  ▪ Similar to the kernel of an operating system
  ▪ Referred to as “Hadoop Core”
▪ Related components are often deployed with Hadoop
  ▪ For example: HBase, Hive, Pig, Oozie, Flume, Sqoop
  ▪ Together, these components form a “Hadoop Stack”
  ▪ Not all components must be deployed
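To make the MapReduce half of the "kernel" concrete, here is the classic word-count job simulated in a single Python process. It sketches the programming model only, assuming the input is already split into lines; in Hadoop the map and reduce functions run as tasks on many nodes, input splits come from HDFS, and the shuffle moves intermediate data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    # Reducers see keys in sorted order, mirroring the sort in Hadoop's shuffle.
    return dict(reduce_phase(k, vs) for k, vs in sorted(shuffle(intermediate).items()))


counts = word_count(["the quick fox", "the lazy dog", "The end"])
```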
9. Context: Bigtop
▪ What standards should all components follow?
▪ How can we ensure all components of the stack work together?
▪ How can we find the right version of each component?
▪ How can we make it easy to install an additional component?
10. Section 2: Apache Bigtop
11. Apache Bigtop
▪ Now incubating at Apache
▪ Hadoop ecosystem-wide project, including:
  ▪ Interoperability testing of components
  ▪ Packaging of compatible versions of components
▪ Like a Fedora, Debian, or CentOS for the Hadoop ecosystem
▪ Releases are not a single artifact
  ▪ Rather, a set of interdependent, compatible components
12. Apache Bigtop
▪ Current components
  ▪ Hadoop
  ▪ HBase
  ▪ Hive
  ▪ Pig
  ▪ Oozie
  ▪ Sqoop
  ▪ Flume
  ▪ ZooKeeper
  ▪ Whirr
13. Apache Bigtop
▪ Outputs
  ▪ Source
  ▪ RPM
  ▪ Deb
▪ Tests
  ▪ Integration
  ▪ Package
  ▪ Smoke
▪ Release 0.1.0 under vote now!
14. Section 3: Apache Hadoop Core
15. Apache Hadoop Core
▪ Current stable releases based on branches from 0.20
▪ Upcoming release: 0.22
  ▪ Includes both security and a new implementation of append
  ▪ Not expected to be run at scale or commercially supported
  ▪ Nearly ready for vote
▪ Upcoming release: 0.23
  ▪ Build and dependency management moved to Maven
  ▪ Branch to happen soon
16. HDFS
▪ Robustness
  ▪ HDFS-1073: Checkpointing of image and edits log
▪ Availability
  ▪ HDFS-1623: High availability
▪ Performance
  ▪ HDFS-941: Faster random reads
  ▪ HDFS-2080: Faster checksums
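HDFS-1073 concerns how the NameNode checkpoints its namespace, which it keeps as an image plus a log of edits. The toy model below shows only the general idea under simplifying assumptions: an in-memory dict stands in for the namespace, a list for the edit log, and `ToyNamespace` and its methods are hypothetical, not NameNode code. Recovery loads the image and replays the edits; a checkpoint folds the edits into a new image tagged with the last transaction ID it contains, then starts a fresh log.

```python
class ToyNamespace:
    """Toy namespace persisted as an image plus an edit log (illustration only)."""

    def __init__(self):
        self.image = {}        # last checkpointed namespace: path -> size
        self.image_txid = 0    # last transaction ID folded into the image
        self.edits = []        # (txid, op, path, size) recorded since the image
        self.next_txid = 1

    def log(self, op, path, size=0):
        """Record a mutation in the edit log (a real NameNode also syncs it to disk)."""
        self.edits.append((self.next_txid, op, path, size))
        self.next_txid += 1

    @staticmethod
    def _apply(state, op, path, size):
        if op == "create":
            state[path] = size
        elif op == "delete":
            state.pop(path, None)

    def current_state(self):
        """Rebuild the namespace as a restart would: load the image, replay edits."""
        state = dict(self.image)
        for _txid, op, path, size in self.edits:
            self._apply(state, op, path, size)
        return state

    def checkpoint(self):
        """Merge the edit log into a new image and start an empty log segment."""
        self.image = self.current_state()
        if self.edits:
            self.image_txid = self.edits[-1][0]
        self.edits = []


ns = ToyNamespace()
ns.log("create", "/a", 10)
ns.log("create", "/b", 20)
ns.checkpoint()
ns.log("delete", "/a")
```

Checkpointing keeps restart time bounded: without it, every restart would have to replay the entire history of edits.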
17. HDFS
▪ Scalability
  ▪ HDFS-1052: Federation of the NameNode
  ▪ Source of diagram: http://www.hortonworks.com/an-introduction-to-hdfs-federation/
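With federation, several independent NameNodes each own part of the namespace while sharing the same DataNodes for block storage, so a client needs a way to decide which NameNode serves a given path. One approach, which Hadoop's client-side ViewFs mount table follows, is longest-prefix matching on mount points. The sketch below illustrates only that lookup; `MountTable` and the NameNode hostnames are hypothetical, not ViewFs itself.

```python
class MountTable:
    """Route a path to a NameNode by its longest matching mount point (sketch)."""

    def __init__(self, mounts):
        # Check longer prefixes first so "/user/logs" wins over "/user".
        self.mounts = sorted(mounts.items(), key=lambda kv: len(kv[0]), reverse=True)

    def resolve(self, path):
        for prefix, namenode in self.mounts:
            # Match whole path components: "/users" must not match "/user".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                return namenode
        raise KeyError(f"no mount point for {path}")


table = MountTable({
    "/user": "hdfs://nn1.example.com:8020",
    "/user/logs": "hdfs://nn2.example.com:8020",
    "/": "hdfs://nn3.example.com:8020",
})
```

For example, `table.resolve("/user/logs/2011-08-30")` routes to nn2, while `/users` falls through to the root mount on nn3.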
18. MapReduce
▪ Modularity
  ▪ MAPREDUCE-279: MapReduce 2.0
    ▪ Break JobTracker into ResourceManager and ApplicationMaster
    ▪ Replace TaskTracker with NodeManager
  ▪ Source of diagram: http://www.odbms.org/download/dean-keynote-ladis2009.pdf
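The split on the slide separates cluster-wide resource allocation from per-application task coordination. The single-process toy below shows that division of labour under made-up assumptions: resources are memory only, placement is first-fit, and the class and method names are hypothetical rather than the actual APIs. NodeManagers report capacity, the ResourceManager only hands out containers, and each application's own ApplicationMaster decides how many to ask for and what to run in them.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent: reports free memory (GB) and tracks containers it runs."""
    host: str
    free_gb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Cluster-wide allocator: grants containers, knows nothing about MapReduce."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, gb):
        """Grant one container of `gb` GB on the first node with room, else None."""
        for node in self.nodes:
            if node.free_gb >= gb:
                node.free_gb -= gb
                container = (app_id, node.host, gb)
                node.containers.append(container)
                return container
        return None


class ApplicationMaster:
    """Per-application coordinator: requests containers for its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.granted = []

    def request_tasks(self, n_tasks, gb_per_task):
        """Ask for one container per task; return the total granted so far."""
        for _ in range(n_tasks):
            container = self.rm.allocate(self.app_id, gb_per_task)
            if container is None:
                break  # cluster is full; a real AM would wait and retry
            self.granted.append(container)
        return len(self.granted)


rm = ResourceManager([NodeManager("node1", 4), NodeManager("node2", 2)])
wordcount_am = ApplicationMaster("app_0001", rm)
granted = wordcount_am.request_tasks(n_tasks=4, gb_per_task=2)
```

Because the ResourceManager is framework-agnostic, the same allocation path could serve the non-MapReduce frameworks listed on the next slide.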
19. MapReduce
▪ Potential New Frameworks
  ▪ MAPREDUCE-2719: Distributed shell
  ▪ MAPREDUCE-2720: Distributed Java commands
  ▪ MPI: Communication-intensive parallelism
  ▪ Fast scans and aggregations
    ▪ OpenDremel
  ▪ Bulk Synchronous Parallel
    ▪ Giraph, Golden Orb, Hama, et al.
  ▪ Actor Model (streaming)
    ▪ S4, Akka, Storm, et al.
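Bulk Synchronous Parallel, the model behind Giraph and Hama, runs a computation as a series of supersteps separated by a global barrier. As a minimal single-process sketch, assuming a small in-memory graph (the function name and sample graph are hypothetical), the vertex program below propagates the maximum value: in each superstep a vertex reads only messages sent in the previous superstep, updates its value, and messages its neighbours only if the value changed. The job halts when no messages are in flight.

```python
def bsp_max_value(graph, values):
    """Propagate the maximum vertex value through `graph` in BSP supersteps.

    graph: vertex -> list of neighbours; values: vertex -> initial value.
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    # Superstep 0: every vertex is active and sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v in graph:
        for n in graph[v]:
            inbox[n].append(values[v])
    supersteps = 1
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        # Compute phase: vertices with no messages stay halted.
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                for n in graph[v]:
                    outbox[n].append(values[v])
        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1
    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result, steps = bsp_max_value(graph, {"a": 3, "b": 6, "c": 2})
```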
20. Section 4: HBase, Hive, and Pig
21. Apache HBase
▪ Upcoming release: 0.92.0
▪ Server-side triggers
  ▪ HBASE-2000: Coprocessors
▪ Availability
  ▪ HBASE-1730/4213: Online schema changes
▪ Performance
  ▪ HBASE-3857: HFile 2.0
▪ HBase book in September!
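Coprocessors let code run inside the region servers, and observer-style coprocessors behave like triggers around operations such as puts. The Python toy below shows that hook pattern only; all classes are hypothetical, and HBase's real coprocessor API is Java and differs in detail. Observers registered on a region get a callback before each write, where they can adjust the data, and one after it, where they can, for example, maintain a secondary index.

```python
class RegionObserver:
    """Base hook class: subclasses override only the callbacks they need."""

    def pre_put(self, region, row, cells):
        pass

    def post_put(self, region, row, cells):
        pass


class ToyRegion:
    """In-memory stand-in for a region that runs observers around each put."""

    def __init__(self, name, observers=()):
        self.name = name
        self.rows = {}
        self.observers = list(observers)

    def put(self, row, cells):
        for obs in self.observers:
            obs.pre_put(self, row, cells)   # may validate or rewrite cells in place
        self.rows.setdefault(row, {}).update(cells)
        for obs in self.observers:
            obs.post_put(self, row, cells)  # runs after the write is applied


class LowercaseEmailObserver(RegionObserver):
    """Pre-put trigger: normalise e-mail addresses before they are stored."""

    def pre_put(self, region, row, cells):
        if "info:email" in cells:
            cells["info:email"] = cells["info:email"].lower()


class EmailIndexObserver(RegionObserver):
    """Post-put trigger: maintain a reverse index from e-mail to row key."""

    def __init__(self):
        self.by_email = {}

    def post_put(self, region, row, cells):
        if "info:email" in cells:
            self.by_email[cells["info:email"]] = row


index = EmailIndexObserver()
users = ToyRegion("users", observers=[LowercaseEmailObserver(), index])
users.put("row1", {"info:email": "Alice@Example.com"})
```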
22. Apache Hive
▪ Upcoming release: 0.8
▪ Data transfer
  ▪ HIVE-306: INSERT INTO
  ▪ HIVE-1918: EXPORT/IMPORT
▪ Indexes
  ▪ HIVE-1644: Automatically use indexes
  ▪ HIVE-1803: Bitmap indexes
▪ Data formats
  ▪ HIVE-895: Avro support
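HIVE-1803 adds bitmap indexes. The general technique is sketched here under simplifying assumptions: uncompressed bitmaps held as Python integers, row positions rather than the file and block offsets a Hive index records, and hypothetical function names. One bitmap is stored per distinct column value, so a conjunctive filter reduces to a bitwise AND and a disjunctive one to a bitwise OR.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to a bitmap (int) of matching row positions."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(bitmap):
    """Decode a bitmap back into a sorted list of row positions."""
    positions = []
    position = 0
    while bitmap:
        if bitmap & 1:
            positions.append(position)
        bitmap >>= 1
        position += 1
    return positions


rows = [
    {"country": "US", "status": "active"},
    {"country": "DE", "status": "inactive"},
    {"country": "US", "status": "inactive"},
    {"country": "FR", "status": "active"},
]
by_country = build_bitmap_index(rows, "country")
by_status = build_bitmap_index(rows, "status")

# WHERE country = 'US' AND status = 'inactive' becomes a single bitwise AND.
hits = matching_rows(by_country["US"] & by_status["inactive"])
```

Bitmaps pay off for low-cardinality columns like these; for a column with many distinct values, most bitmaps would be nearly empty.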
23. Apache Pig
▪ Recent release: 0.9
▪ Scripting
  ▪ PIG-1479: Embedding Pig in Python
  ▪ PIG-1793: Macro expansion
▪ Debugging
  ▪ PIG-1712: ILLUSTRATE rework
▪ Data formats
  ▪ PIG-1748: Avro support
24. Section 5: Other Components
25. Other Components
▪ Apache Incubator
  ▪ Sqoop, Flume, and Oozie now incubating
  ▪ Whirr graduated to a top-level Apache project
▪ Apache Avro
  ▪ Interoperability with Protocol Buffers and Thrift
  ▪ Column-oriented file format
  ▪ Python MapReduce implementation
▪ Apache ZooKeeper
  ▪ Multi-update
  ▪ Kerberos authentication of clients
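ZooKeeper's multi-update work lets a client submit several operations that succeed or fail as a unit. A minimal in-memory sketch of those semantics follows; `ToyZooKeeper`, `multi`, and the tuple-based op format are hypothetical and do not mirror the real client API. Each op carries the version it expects, the ops are validated and applied to a staged copy in order, and the copy replaces the live state only if every op passes.

```python
class BadVersionError(Exception):
    pass


class ToyZooKeeper:
    """In-memory znode store: path -> (data, version)."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data):
        self.nodes[path] = (data, 0)

    def multi(self, ops):
        """Apply ("check", path, version) and ("set", path, data, version) ops atomically."""
        staged = dict(self.nodes)
        for op in ops:
            kind, path, expected_version = op[0], op[1], op[-1]
            if path not in staged or staged[path][1] != expected_version:
                raise BadVersionError(f"{kind} failed on {path}")
            if kind == "set":
                staged[path] = (op[2], staged[path][1] + 1)
        self.nodes = staged  # commit: reached only if every op passed


zk = ToyZooKeeper()
zk.create("/config/a", b"1")
zk.create("/config/b", b"1")

# Both nodes move to the new configuration together.
zk.multi([("set", "/config/a", b"2", 0), ("set", "/config/b", b"2", 0)])

# A stale version on one op rejects the whole batch; /config/a is untouched.
rejected = False
try:
    zk.multi([("set", "/config/a", b"3", 1), ("check", "/config/b", 0)])
except BadVersionError:
    rejected = True
```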
26. Q&A
Visit www.hadoopworld.com
• November 8-9, 2011 in New York City
• Early bird discount ends September 5, 2011
Enter Today: www.facebook.com/cloudera
• Click the “Be a Cloudera Hero for Apache Hadoop” tab
• Share what you think Apache Hadoop can do for you
• Win a personal hackathon with Doug Cutting in San Francisco, CA