Hadoop

Hadoop is a project run under the Apache Software Foundation. It is an efficient choice for managing very large volumes of data across clusters of machines.

  1. Structured, Unstructured and Complex Data Management: Amit Chaudhary (11MCA03), Karthik Iyer (11MCA05)
  2. 2. Hadoop What is this? Structure of this Is this unknown thing right for me? Where is this used?
  3. Any idea? (Idea SIM card)
  4. What is Hadoop? It is an open-source project of the Apache Software Foundation for processing large data sets. It was inspired by Google's MapReduce and Google File System (GFS) papers. It was originally conceived by Doug Cutting and, incidentally, is named after his son's toy elephant.
  5. What does "large data" mean? 1,000 kilobytes = 1 megabyte; 1,000 megabytes = 1 gigabyte; 1,000 gigabytes = 1 terabyte; 1,000 terabytes = 1 petabyte; 1,000 petabytes = 1 exabyte; 1,000 exabytes = 1 zettabyte; 1,000 zettabytes = 1 yottabyte; 1,000 yottabytes = 1 brontobyte; 1,000 brontobytes = 1 geopbyte (the last two are informal names, not SI units).
  6. So what's the big deal? Scalable: new nodes can be added as needed, without changing data formats. Flexible: it is schema-less and can absorb any type of data, structured or not, from any number of sources. Fault tolerant: if a node fails, the system redirects its work to another node.
  7. Hadoop = HDFS + MapReduce. HDFS: stores massive data sets on low-cost commodity storage. MapReduce: the programming model, first described by Google, for processing those data sets in parallel across the cluster.
  8. 8. HDFS It is a fault-tolerant storage system Able to store huge amounts of information It creates clusters of machines and coordinates work among them If one fails, it continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster
  9. 9. HDFS It manages storage on the cluster by breaking incoming files into pieces, called blocks Stores each of the blocks redundantly across the pool of servers It stores three complete copies of each file by copying each piece to three different servers
  10. How does it work?
  11. How does it work?
  12. Which companies are using it? LinkedIn, Walt Disney, Wal-mart, General Electric, Nokia, Bank of America, Foursquare
  13. Hadoop at Foursquare. Foursquare: Mobile + Location + Social Networking
  14. Is this unknown thing right for me?
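
Slide 7 names MapReduce without showing how it works. As an illustration only, here is a minimal single-process Python sketch of the classic word-count example: a map step emits (word, 1) pairs, a shuffle step groups the pairs by word, and a reduce step sums each group. The function names are hypothetical and this is not Hadoop's Java API; in real Hadoop the map tasks run in parallel on the machines holding each HDFS block, and the framework performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key (word).
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all the counts collected for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Chain the three phases; here they run sequentially in one process.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(docs))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Keeping map and reduce as pure functions of their inputs is what lets a framework run many copies of them on different machines and simply rerun a task elsewhere if a node fails.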

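Slides 8 and 9 describe HDFS splitting files into blocks, keeping three copies of each block on different servers, and carrying on when a machine fails. The Python sketch below models that bookkeeping under stated assumptions: a 128 MB block size (the default in Hadoop 2.x and later), a replication factor of 3, and simple round-robin placement instead of HDFS's real rack-aware policy. All function and server names are hypothetical; this is not HDFS code.

```python
# Assumptions for illustration: 128 MB blocks (the Hadoop 2.x+ default) and
# three replicas per block (the HDFS default replication factor).
BLOCK_SIZE = 128 * 1024 * 1024
REPLICATION = 3


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block; only the last block may be smaller."""
    count = (file_size + block_size - 1) // block_size  # integer ceiling
    return [min(block_size, file_size - i * block_size) for i in range(count)]


def place_replicas(num_blocks: int, servers: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct servers, round-robin.

    Real HDFS uses a rack-aware policy; round-robin is a simplification.
    """
    if len(servers) < replication:
        raise ValueError("need at least as many servers as replicas")
    return {
        block: [servers[(block + r) % len(servers)] for r in range(replication)]
        for block in range(num_blocks)
    }


def handle_failure(placement: dict[int, list[str]], failed: str,
                   servers: list[str]) -> list[str]:
    """Drop a failed server and re-copy its blocks onto live servers.

    Mutates `placement` in place and returns the live servers. If every live
    server already holds a copy, the block is left under-replicated.
    """
    live = [s for s in servers if s != failed]
    for holders in placement.values():
        if failed in holders:
            holders.remove(failed)
            # Copy the block to the first live server that lacks it.
            for candidate in live:
                if candidate not in holders:
                    holders.append(candidate)
                    break
    return live


if __name__ == "__main__":
    servers = ["node1", "node2", "node3", "node4"]
    blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
    placement = place_replicas(len(blocks), servers)
    print(placement)
    handle_failure(placement, "node2", servers)
    print(placement)
```

With four servers, the 300 MB file becomes blocks of 128 MB, 128 MB and 44 MB. After node2 fails, each block it held is copied to a live server that did not already have it, so every block again has three copies and no data is lost.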