MapReduce Container ReUse


  • Hortonworks Data Platform (HDP) is the only 100% open source Apache Hadoop distribution that provides a complete and reliable foundation for enterprises that want to build, deploy and manage big data solutions. It allows you to confidently capture, process and share data in any format, at scale on commodity hardware and/or in a cloud environment. As the foundation for the next generation enterprise data architecture, HDP delivers all of the necessary components to uncover business insights from the growing streams of data flowing into and throughout your business. HDP is a fully integrated data platform that includes the stable core functions of Apache Hadoop (HDFS and MapReduce), the baseline tools to process big data (Apache Hive, Apache HBase, Apache Pig) as well as a set of advanced capabilities (Apache Ambari, Apache HCatalog and High Availability) that make big data operational and ready for the enterprise.  Run through the points on left…

    1. MR Container Re-Use (not YARN specific). Siddarth Seth, Member of Technical Staff. © Hortonworks Inc. 2012
    2. Current AM Components (diagram; component labels only): Job, Task, TaskAttempt, Running TA Listener, Container Launcher, RMAllocator/Scheduler, RM, NM.
    3. TaskAttempt and Container operations are tightly coupled:
       – CLC (ContainerLaunchContext) construction and Container Launch invocation are handled by the TaskAttempt.
       – Container Launch is tied to the TaskAttempt (instead of container size, LocalResources).
       – Container shutdown.
    4. AM post 3902 (diagram; component labels only): Job, Task, TaskAttempt, Container, Node, Rack (?), Running JVM Listener, Running TA Listener, NM Communicator, AMScheduler, RM, NM.
    5. • Container and Node have their own states.
       • Containers interact with the NodeManager.
       • Tasks interact with the scheduler, which matches containers to task attempts.
       • Nodes take care of blacklisting, which simplifies the scheduler.
       • Easier to write a custom scheduler.
    6. Current State
       • Most of the AM functional changes are done (cleanup pending).
       • Task-side changes are required.
       • A re-use scheduler needs to be implemented.
    7. Facilitates
       • Common MapOutputBuffer for maps assigned to the same container.
       • Merging per-node or per-rack map output.
       • Custom Task Types.
    8. Hortonworks Data Platform
       Reduce risks and cost of adoption. Lower the total cost to administer and provision. Integrate with your existing ecosystem.
       • Simplify deployment to get started quickly and easily
       • Monitor and manage any size cluster with familiar console and tools
       • Only platform to include data integration services to interact with any data
       • Metadata services open the platform for integration with existing applications
       • Dependable high availability architecture
       • Tested at scale to future-proof your cluster growth
    9. Hortonworks Training: the expert source for Apache Hadoop training & certification
       Role-based Developer and Administration training
       – Coursework built and maintained by the core Apache Hadoop development team.
       – The “right” course, with the most extensive and realistic hands-on materials
       – Provide an immersive experience into real-world Hadoop scenarios
       – Public and Private courses available
       Comprehensive Apache Hadoop Certification
       – Become a trusted and valuable Apache Hadoop expert
    10. Next Steps?
        Download Hortonworks Data Platform. Use the getting started guide. Learn more… get support.
        • Expert role based training
        • Course for admins, developers and operators
        • Certification program
        • Custom onsite options
        Hortonworks Support
        • Full lifecycle technical support across four service levels
        • Delivered by Apache Hadoop Experts/Committers
        • Forward-compatible
    11. Thank You! Questions & Answers. Follow: @hortonworks. Read:
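
The split of responsibilities described on slide 5 (containers and nodes keep their own state, nodes handle blacklisting, and the scheduler only matches containers to task attempts) can be sketched roughly as below. This is a toy model in Python, not the actual MapReduce ApplicationMaster code: the class names `Node`, `Container` and `ReuseScheduler`, the `max_failures` threshold and the first-fit matching policy are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node model: it tracks its own failures and blacklisting."""
    name: str
    failures: int = 0
    max_failures: int = 3  # assumed threshold, not taken from the slides

    @property
    def blacklisted(self) -> bool:
        return self.failures >= self.max_failures

    def record_failure(self) -> None:
        self.failures += 1


@dataclass
class Container:
    """Hypothetical container model: it outlives any single task attempt."""
    container_id: str
    node: Node
    current_attempt: Optional[str] = None  # None means the container is idle

    @property
    def idle(self) -> bool:
        return self.current_attempt is None


class ReuseScheduler:
    """Toy re-use scheduler: first-fit matching of pending attempts to idle containers."""

    def __init__(self) -> None:
        self.containers: List[Container] = []
        self.pending: Deque[str] = deque()

    def add_container(self, container: Container) -> None:
        self.containers.append(container)

    def request(self, attempt_id: str) -> None:
        self.pending.append(attempt_id)

    def schedule(self) -> List[Tuple[str, str]]:
        """Assign pending attempts to idle containers on non-blacklisted nodes."""
        assignments = []
        for container in self.containers:
            if not self.pending:
                break
            # The scheduler only asks the node whether it is usable;
            # the blacklisting decision itself lives in Node.
            if container.idle and not container.node.blacklisted:
                attempt_id = self.pending.popleft()
                container.current_attempt = attempt_id
                assignments.append((attempt_id, container.container_id))
        return assignments

    def attempt_finished(self, container_id: str, succeeded: bool) -> None:
        """Free the container for re-use and report failures to its node."""
        for container in self.containers:
            if container.container_id == container_id:
                if not succeeded:
                    container.node.record_failure()
                container.current_attempt = None
                return
        raise KeyError(container_id)
```

In this sketch a container that finishes an attempt returns to the idle pool and can pick up the next pending attempt, which is the re-use behaviour the deck is aiming for. Because the scheduler never counts failures itself, swapping in a different matching policy only means replacing `schedule`.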