Introduction to Hadoop at Data-360 Conference
A short introduction to Hadoop mostly with live industry examples and scenarios.


  • Slides are for reference only; we can understand and learn more through live examples and live discussion. Let's see who is who in the room: How many coders? Program managers? Any Hadoop stories? How about: where is Hadoop headquartered?

Introduction to Hadoop at Data-360 Conference: Presentation Transcript

  • avkash@bigdataperspective.com
  • http://www.packtpub.com/using-cloudera-impala/book
  • http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
  • https://www.linkedin.com/in/avkashchauhan
  • Hadoop is an open-source (Java-based), scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across machines.
  • Flexibility: a single repository for storing and analyzing any kind of data, not bound by a schema. Scalability: a scale-out architecture divides the workload across multiple nodes using a flexible distributed file system. Low cost: deployed on commodity hardware and an open-source platform. Fault tolerance: keeps working even if node(s) go down.
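The scale-out idea above can be sketched with simple hash partitioning: each record's key deterministically picks a node, so the workload spreads across machines without a central index. This is a minimal illustrative sketch in plain Java (the class and method names are mine, not a Hadoop API).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionDemo {
    // Assign a record to one of n nodes by hashing its key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int nodeFor(String key, int nodes) {
        return Math.floorMod(key.hashCode(), nodes);
    }

    public static void main(String[] args) {
        int nodes = 3;
        List<String> keys = List.of("user-1", "user-2", "user-3", "user-4");

        // Group keys by the node they land on.
        Map<Integer, List<String>> placement = new HashMap<>();
        for (String k : keys) {
            placement.computeIfAbsent(nodeFor(k, nodes), n -> new ArrayList<>()).add(k);
        }
        System.out.println(placement);
    }
}
```

Because the same key always hashes to the same node, any machine can locate a record's home node independently; real systems refine this (e.g. with replication for fault tolerance), which this sketch omits.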
  • A system that moves computation to where the data is.
  • Core components: Hadoop Common, HDFS, MapReduce
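The MapReduce component can be illustrated with the classic word-count pattern. The sketch below simulates the map, shuffle, and reduce phases in a single JVM using plain Java rather than the Hadoop API (class and method names are mine); on a real cluster, Hadoop would run many map and reduce tasks on the nodes that hold the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\s+")) {
            if (!w.isEmpty()) out.add(Map.entry(w, 1));
        }
        return out;
    }

    // Shuffle + reduce phase: group pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : List.of("hadoop stores data", "hadoop processes data")) {
            pairs.addAll(map(line));
        }
        System.out.println(reduce(pairs)); // {data=2, hadoop=2, processes=1, stores=1}
    }
}
```

The mapper is stateless, so any node can process any block of input; the framework's job is distributing the tasks and shuffling intermediate pairs, which this single-process sketch collapses into a list.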
  • Cloudera Impala and Hortonworks Tez: Impala uses C++-based in-memory processing of HDFS data through SQL-like statements to expedite data processing. Use cases include collaborative filtering, user recommendations, clustering, and classification.