Introduction to Hadoop at Data-360 Conference


A short introduction to Hadoop, mostly through live industry examples and scenarios.

Published in: Technology, Business

  3. 3. Hadoop is an open source, Java-based, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.
  4. 4. Flexibility: a single repository for storing and analyzing any kind of data, not bound by a schema. Scalability: a scale-out architecture divides the workload across multiple nodes using a flexible distributed file system. Low Cost: deployed on commodity hardware and an open source platform. Fault Tolerant: continues working even if node(s) go down.
  5. 5. A system that moves the computation to where the data is.
  6. 6. Hadoop Common, HDFS, MapReduce
  7. 7. Hadoop Common, HDFS, MapReduce
  8. 8. Cloudera Impala and Hortonworks Tez. Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to expedite data processing. Use cases include user collaborative filtering, user recommendations, clustering and classification.
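
The slides describe Hadoop as a system that moves computation to where the data is, with MapReduce as the processing component. The following is a minimal sketch of that idea in plain Python, not Hadoop's actual API: the node names, the `BLOCKS_BY_NODE` layout, and the function names are all hypothetical, chosen only for illustration. Each node maps over its own blocks, and only the small intermediate pairs are grouped and reduced.

```python
from collections import defaultdict

# Hypothetical cluster layout: which node stores which lines of text.
# Node names and contents are made up for illustration.
BLOCKS_BY_NODE = {
    "node-a": ["hadoop stores data", "hadoop processes data"],
    "node-b": ["data lives on many nodes"],
}


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(blocks_by_node):
    # Each node maps over its own local blocks, so in a real cluster
    # only the small (word, 1) pairs would travel, not the raw data.
    pairs = []
    for node, blocks in blocks_by_node.items():
        for line in blocks:
            pairs.extend(map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(BLOCKS_BY_NODE))
```

In this sketch everything runs in one process. In a real Hadoop job the map tasks would be scheduled on the machines that hold the HDFS blocks, and the shuffle would move data over the network.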