Hortonworks Data Platform (HDP) is the only 100% open source Apache Hadoop distribution that provides a complete and reliable foundation for enterprises that want to build, deploy and manage big data solutions. It allows you to confidently capture, process and share data in any format, at scale, on commodity hardware and/or in a cloud environment. As the foundation for the next generation enterprise data architecture, HDP delivers all of the necessary components to uncover business insights from the growing streams of data flowing into and throughout your business. HDP is a fully integrated data platform that includes the stable core functions of Apache Hadoop (HDFS and MapReduce), the baseline tools to process big data (Apache Hive, Apache HBase, Apache Pig), as well as a set of advanced capabilities (Apache Ambari, Apache HCatalog and High Availability) that make big data operational and ready for the enterprise.
Current AM Components
[Diagram: Job, Task, TaskAttempt, Running TA Listener, Container Launcher, RMAllocator/Scheduler, RM, NM]
• TaskAttempt and Container operations are tightly coupled
  – CLC (ContainerLaunchContext) construction and Container Launch invocation are handled by the TaskAttempt
  – Container Launch is tied to the TaskAttempt (instead of container size, LocalResources)
  – Container shutdown
AM post 3902
[Diagram: Job, Task, TaskAttempt, Container, Node, Rack (?), Running JVM Listener, Running TA Listener, NM Communicator, AMScheduler, RM, NM]
• Container and Node have their own states.
• Containers interact with the NodeManager.
• Tasks interact with the scheduler, which matches containers to task attempts.
• Nodes take care of blacklisting, which simplifies the scheduler.
• Easier to write a custom scheduler.
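The split of responsibilities above can be illustrated with a small, language-neutral sketch. It is not the Hadoop code: the class names (`Node`, `Container`, `SimpleScheduler`), the state values, and the failure threshold are all hypothetical. It only shows the idea that nodes and containers carry their own state, nodes blacklist themselves, and the scheduler is reduced to matching idle containers to pending task attempts.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    ACTIVE = "ACTIVE"
    BLACKLISTED = "BLACKLISTED"


class ContainerState(Enum):
    IDLE = "IDLE"
    RUNNING = "RUNNING"


@dataclass
class Node:
    """A node tracks its own failures and blacklists itself."""
    name: str
    max_failures: int = 2  # hypothetical threshold, not a Hadoop default
    failures: int = 0
    state: NodeState = NodeState.ACTIVE

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.state = NodeState.BLACKLISTED

    @property
    def usable(self) -> bool:
        return self.state is NodeState.ACTIVE


@dataclass
class Container:
    """A container has its own state, independent of any task attempt."""
    container_id: str
    node: Node
    state: ContainerState = ContainerState.IDLE


class SimpleScheduler:
    """Matches pending task attempts to idle containers on usable nodes."""

    def __init__(self) -> None:
        self.pending = deque()  # task attempt ids, oldest first
        self.containers = []

    def add_container(self, container: Container) -> None:
        self.containers.append(container)

    def add_task_attempt(self, attempt_id: str) -> None:
        self.pending.append(attempt_id)

    def schedule(self) -> dict:
        """Return {attempt_id: container_id} for every match made now."""
        assignments = {}
        for container in self.containers:
            if not self.pending:
                break
            # The scheduler keeps no blacklist of its own; it asks the node.
            if container.state is ContainerState.IDLE and container.node.usable:
                attempt_id = self.pending.popleft()
                container.state = ContainerState.RUNNING
                assignments[attempt_id] = container.container_id
        return assignments
```

Because the blacklisting decision lives in `Node`, a custom scheduler under these assumptions only needs to replace `schedule()`.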
Current State
• Most of the AM functional changes are done. (Cleanup pending)
• Task side changes are required.
• A re-use scheduler needs to be implemented.
Facilitates
• Common MapOutputBuffer for maps assigned to the same container.
• Merging per-node or per-rack map output.
• Custom Task Types.
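As a rough illustration of what merging per-node map output could look like, the sketch below assumes each map output for a given partition is already a run of `(key, value)` pairs sorted by key, and combines several such runs into one sorted run. The function name `merge_map_outputs` is hypothetical and the example says nothing about how the refactored AM or the tasks would actually perform the merge.

```python
import heapq
from operator import itemgetter


def merge_map_outputs(outputs):
    """Merge several key-sorted runs of (key, value) pairs into one sorted run.

    Assumes every run in `outputs` is already sorted by key. Pairs with equal
    keys keep the order of the runs they came from.
    """
    return list(heapq.merge(*outputs, key=itemgetter(0)))


# Two hypothetical map outputs produced on the same node.
map_1 = [("apple", 1), ("cherry", 1)]
map_2 = [("apple", 2), ("banana", 1)]
merged = merge_map_outputs([map_1, map_2])
```

A single merged run per node (or per rack) would mean fewer, larger segments for reducers to fetch than one segment per map.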