Slides are for reference only. We can understand and learn more about live example and live discussion.Lets see who is who in the room. How many coders? Program Managers? Any Hadoop stories? How about where is Hadoop headquarter?
Hadoop is an Open Source (Java based), “Scalable”, “fault
tolerant” platform for large amount of unstructured data storage
& processing, distributed across machines.
A Single Repo for storing
and analyzing any kind
of data not bounded by
divides workload across
multiple nodes using flexible
distributed file system
hardware & open
event if node(s) go
A system to move computation, where the data is.
Cloudera Impala Hortonworks Tez
Impala uses C++ based in-memory
processing of HDFS data through SQL
like statements to expedite the data
Use cases include user collaborative
filtering, user recommendations,
clustering and classification.
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.