In this webinar we introduce the concepts of Hadoop and dive into some details unique to the Pivotal HD distribution, namely HAWQ, which brings ANSI-compliant SQL to Hadoop.
We also introduce the Spring for Apache Hadoop project, which simplifies developing Hadoop applications by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, Hive, and HBase. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch, enabling you to develop solutions for big data ingest/export and Hadoop workflow orchestration. The new Spring XD umbrella project is also introduced.
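The unified configuration model mentioned above is typically expressed through Spring's XML namespace support. A minimal sketch follows; the host name, paths, and mapper/reducer class names are placeholders, not values from this webinar:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- One place to point the application at the cluster -->
    <hdp:configuration>
        fs.defaultFS=hdfs://namenode:8020
    </hdp:configuration>

    <!-- A MapReduce job declared as an ordinary Spring bean -->
    <hdp:job id="wordCountJob"
             input-path="/data/in" output-path="/data/out"
             mapper="org.example.WordCountMapper"
             reducer="org.example.WordCountReducer"/>
</beans>
```

Because the job is just a bean, it can be wired into Spring Batch steps or Spring Integration flows like any other component.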
The client contacts the namenode with a request to write some data. The namenode responds with a list of data nodes to write to. The client connects to each data node and writes out four blocks, one per node.
After the file is closed, the data nodes traffic data among themselves to replicate each block in triplicate, all orchestrated by the namenode. In the event of a node failure, data can be accessed on the other nodes, and the namenode will re-replicate the affected blocks to other nodes.
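The write, replicate, and recover steps above can be modeled in a few lines. This is a toy simulation of the bookkeeping only; the class and method names are invented for illustration and are not the real Hadoop API:

```python
import itertools

# Default HDFS replication factor
REPLICATION = 3

class NameNode:
    """Toy namenode: tracks which datanodes hold each block."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self.block_map = {}              # block id -> set of datanode names

    def allocate(self, n_blocks):
        # Steps 1-3: client asks to write; namenode assigns one node per block.
        cycle = itertools.cycle(self.datanodes)
        for b in range(n_blocks):
            self.block_map[b] = {next(cycle)}

    def replicate(self):
        # Step 4: after close, blocks are copied until each has 3 replicas.
        for nodes in self.block_map.values():
            spare = [d for d in self.datanodes if d not in nodes]
            nodes.update(spare[:REPLICATION - len(nodes)])

    def fail(self, node):
        # Node failure: drop its replicas, then re-replicate from survivors.
        self.datanodes.remove(node)
        for nodes in self.block_map.values():
            nodes.discard(node)
        self.replicate()

nn = NameNode(["dn1", "dn2", "dn3", "dn4", "dn5"])
nn.allocate(4)    # client writes four blocks, one per node
nn.replicate()    # namenode orchestrates triplication
nn.fail("dn1")    # after a failure, every block still has 3 live replicas
assert all(len(nodes) == REPLICATION for nodes in nn.block_map.values())
```

The key point the simulation captures: the namenode holds only metadata (the block map), while the data itself flows between client and datanodes.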
Uses key/value pairs as input and output to both phases. Highly parallelizable paradigm – a natural choice for data processing on a Hadoop cluster.
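The key/value contract of the two phases can be sketched with a classic word count, run locally without Hadoop. The function names are illustrative; a real job would implement the framework's Mapper and Reducer interfaces instead:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(record):
    # Input: one (offset, line) pair; output: (word, 1) pairs.
    _, line = record
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(key, values):
    # Input: a key with all of its values; output: one (word, total) pair.
    return (key, sum(values))

def run_job(records):
    # Map, then shuffle/sort by key, then reduce -- as the framework would.
    intermediate = [kv for rec in records for kv in map_phase(rec)]
    intermediate.sort(key=itemgetter(0))
    return [reduce_phase(k, (v for _, v in group))
            for k, group in groupby(intermediate, key=itemgetter(0))]

print(run_job([(0, "big data"), (9, "big cluster")]))
# [('big', 2), ('cluster', 1), ('data', 1)]
```

Because each map call sees one record and each reduce call sees one key, both phases shard cleanly across the nodes of a cluster.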
Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
Xtensions Framework – support for ADS interfaces on external data providers (HBase, Avro, etc.).
Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.
Unified Storage Services (USS) and Unified Catalog Services (UCS) – support for tiered storage (hot, warm, cold) and integration of multiple data provider catalogs into a single interface.