avkash@bigdataperspective.com
http://www.packtpub.com/using-cloudera-impala/book
http://www.amazon.com/Simplifying-Windows-Azure-HDInsight-Service/dp/0735673802
https://www.linkedin.com/in/avkashchauhan
Hadoop is an open-source, Java-based, scalable, fault-tolerant
platform for storing and processing large amounts of unstructured
data distributed across machines.
Flexibility
A single repository for storing and analyzing any kind of data, not bound by a schema
Scalability
A scale-out architecture divides the workload across multiple nodes using a flexible distributed file system
Low Cost
Deployed on commodity hardware; an open-source platform
Fault Tolerant
Continues working even if one or more nodes go down
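Scalability and fault tolerance both come from how HDFS stores data: a large file is cut into fixed-size blocks, and each block is kept on several nodes (three copies by default). The sketch below is a toy, single-process illustration of that idea. The tiny block size, the node names, and the round-robin placement are assumptions for readability; real HDFS uses much larger blocks and a rack-aware placement policy.

```python
# Assumed parameters for illustration only; real HDFS block size and
# replication factor are configurable and far larger for block size.
BLOCK_SIZE = 4    # bytes per block (tiny, so the example stays readable)
REPLICATION = 3   # copies kept of every block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement policy, not the HDFS one: it only guarantees
    that the copies of one block land on different nodes.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(nodes)
        placement[block_id] = [nodes[(start + k) % len(nodes)]
                               for k in range(replication)]
    return placement


def readable_blocks(placement: dict[int, list[str]], alive: set[str]) -> set[int]:
    """Blocks that still have at least one replica on a live node."""
    return {b for b, holders in placement.items()
            if any(n in alive for n in holders)}


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    blocks = split_into_blocks(b"hello hadoop cluster!")
    placement = place_replicas(len(blocks), nodes)
    # Simulate two nodes going down: every block still has a live copy.
    alive = set(nodes) - {"node1", "node3"}
    print(len(blocks), "blocks;", len(readable_blocks(placement, alive)), "still readable")
```

With three replicas spread over four nodes, any two nodes can fail and every block remains readable; adding nodes spreads both storage and work further.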
A system that moves computation to where the data is.
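"Moving computation to the data" means the scheduler tries to run each task on a node that already stores the block it reads, so only the small task code crosses the network. The greedy sketch below shows the idea under simple assumptions: the function name `schedule_tasks`, the block and node names, and the per-node slot counts are all hypothetical, and real Hadoop schedulers weigh many more factors.

```python
def schedule_tasks(
    block_locations: dict[str, list[str]], slots: dict[str, int]
) -> dict[str, tuple[str, bool]]:
    """Assign one task per block; return block -> (node, is_local).

    Greedy sketch: prefer a free node that already stores the block
    (no data crosses the network), otherwise fall back to any free node
    (the task must then read the block remotely).
    """
    free = dict(slots)  # copy so the caller's dict is left untouched
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in holders if free.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            remote = [n for n, s in free.items() if s > 0]
            if not remote:
                raise RuntimeError("no free task slots left")
            node, is_local = remote[0], False
        free[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    locations = {"b0": ["node1", "node2"], "b1": ["node2", "node3"], "b2": ["node1", "node3"]}
    free_slots = {"node1": 1, "node2": 1, "node3": 0, "node4": 1}
    plan = schedule_tasks(locations, free_slots)
    for block, (node, local) in plan.items():
        print(block, "->", node, "(local)" if local else "(remote read)")
```

Here two of the three tasks run where their data lives; the third has to read its block over the network because both nodes holding it are busy.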
Diagram: the Hadoop stack (Hadoop Common, HDFS, MapReduce), then the same stack extended with Cloudera Impala and Hortonworks Tez
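The MapReduce layer processes data in two user-written phases: a map phase that turns each input record into key/value pairs, and a reduce phase that combines all values sharing a key, with the framework grouping ("shuffling") the pairs in between. The classic word-count example is sketched below in plain, single-process Python; it only mimics the data flow and is not Hadoop's Java MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce over the input lines in one process."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    splits = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(splits))
```

On a cluster, each input split would be mapped on a different node and each key group reduced in parallel; the logic per record stays this simple.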
Impala uses C++-based, in-memory processing of HDFS data
through SQL-like statements to speed up data processing.
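To show the kind of SQL-like statement meant here, the sketch below runs an aggregate query with Python's built-in `sqlite3` module against an in-memory database. This is only a stand-in: SQLite is not Impala, and with Impala a similar `SELECT ... GROUP BY ... ORDER BY` statement would run against a table whose data files live in HDFS. The `page_views` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical query: page views per URL, most-viewed first (ties by URL).
QUERY = """
    SELECT url, COUNT(*) AS views
    FROM page_views
    GROUP BY url
    ORDER BY views DESC, url
"""


def views_per_url(records: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Load (user_id, url) rows into a throwaway in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not Impala
    try:
        conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", records)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("u1", "/home"), ("u2", "/home"), ("u1", "/docs"), ("u3", "/home")]
    for url, views in views_per_url(sample):
        print(url, views)
```

The point is the declarative style: the query states what to compute, and the engine decides how to scan and aggregate the data.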
Use cases include user-based collaborative filtering,
user recommendations, clustering, and classification.
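User-based collaborative filtering recommends items to a user based on what similar users liked. A minimal sketch, assuming ratings fit in a small in-memory dict and using cosine similarity over co-rated items: each unseen item's predicted rating is the similarity-weighted average of other users' ratings. This is plain Python for illustration, not the API of any Hadoop library, and the user and item names are made up.

```python
from math import sqrt

Ratings = dict[str, dict[str, float]]  # user -> {item: rating}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity computed over the items both users rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> list[str]:
    """Rank items the user has not rated by their predicted rating."""
    scores: dict[str, float] = {}
    weights: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], theirs)
        if sim <= 0:
            continue  # ignore users with no overlap or opposite taste
        for item, r in theirs.items():
            if item in ratings[user]:
                continue
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    # Predicted rating = similarity-weighted average of neighbours' ratings.
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    ratings = {
        "alice": {"m1": 5, "m2": 3},
        "bob": {"m1": 4, "m2": 3, "m3": 5},
        "carol": {"m1": 1, "m2": 5, "m4": 4},
    }
    print(recommend(ratings, "alice"))
```

At Hadoop scale, the same similarity and scoring steps would be split into distributed jobs rather than computed in one process.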
Introduction to Hadoop at Data-360 Conference

Editor's Notes

  • #12 Slides are for reference only; we learn more through live examples and discussion. Let's see who is who in the room. How many coders? Program managers? Any Hadoop stories? And where is Hadoop headquartered?