Hadoop hive presentation
Hadoop seminar topic, Hadoop CSE, Hadoop ppt

Published in: Education

  1. Hadoop
  2. Agenda
      • Problems with traditional large-scale systems
      • Requirements for new approaches
      • What is Hadoop?
      • Why Hadoop?
      • Overview of Hadoop
      • HDFS
      • MapReduce
      • Applications
      • Conclusion
  3. Problems with traditional large-scale systems
      • Data volumes grow day by day
      • Network failures
      • Server failures
      • Loss of data
      • High cost
      • Distributed computing requires manual processing
  4. Requirements for new approaches
      • Data should be stored in a distributed manner and processed in parallel
      • High performance at low cost
      • Scalability
      • Simple to access and process
      • Fault tolerance
  5. What is Hadoop?
      • An open-source framework
      • Stores and processes large amounts of data on clusters of commodity hardware
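  6. Why Hadoop?
      • Accessible: runs on large clusters of commodity machines or on cloud computing services
      • Scalable: handles larger data by adding more nodes to the cluster
      • Robust: designed on the assumption that hardware fails frequently, and handles most such failures gracefully
      • Simple: lets users quickly write efficient parallel code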
  7. Overview of Hadoop
      • Handles three types of data: structured, semi-structured, and unstructured
      • Analyzes and processes large amounts of data (petabytes)
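To make the three categories concrete, here is a small Python sketch of what a record of each type might look like and how a program reads it. The sample records and function names are invented for illustration; nothing here is Hadoop-specific.

```python
import csv
import io
import json

# Hypothetical sample records, one per data type.
STRUCTURED = "id,name,age\n1,Asha,34\n2,Ravi,29\n"               # fixed columns (CSV)
SEMI_STRUCTURED = '{"id": 3, "name": "Meena", "tags": ["hdfs"]}'  # self-describing (JSON)
UNSTRUCTURED = "Server rebooted after disk failure on node 7."    # free text


def read_structured(text):
    """Every row has the same named columns, known in advance."""
    return list(csv.DictReader(io.StringIO(text)))


def read_semi_structured(text):
    """Field names travel with the record and may differ between records."""
    return json.loads(text)


def read_unstructured(text):
    """No schema at all: the program imposes its own interpretation (here, word tokens)."""
    return text.lower().replace(".", "").split()


print(read_structured(STRUCTURED)[0]["name"])      # Asha
print(read_semi_structured(SEMI_STRUCTURED)["tags"])  # ['hdfs']
print(read_unstructured(UNSTRUCTURED)[:2])          # ['server', 'rebooted']
```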
  8. Comparison with traditional databases
      RDBMS
      • Stores gigabytes (GBs) of data
      • Supports batch and interactive processing
      • Allows updates
      • Schemas must be defined
      • Only structured data
      Hadoop
      • Stores petabytes (PBs) of data
      • Batch processing only
      • Does not allow in-place updates; follows write once, read many (WORM)
      • Schemas not required
      • Supports all three types of data
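Two Hadoop-side properties in this comparison, write once, read many and no schema required up front, can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: `WriteOnceStore` and `read_with_schema` are hypothetical names, the store lives in memory, and the schema is a list of (field name, type) pairs applied only when the data is read (often called schema-on-read). It is not HDFS's API.

```python
class WriteOnceStore:
    """In-memory toy store: a path can be written once and read many times."""

    def __init__(self):
        self._files = {}

    def write(self, path, lines):
        # Refuse to overwrite: existing data is never updated in place.
        if path in self._files:
            raise PermissionError(f"{path} already exists; files are write-once")
        self._files[path] = tuple(lines)  # keep an immutable copy

    def read(self, path):
        return list(self._files[path])


def read_with_schema(store, path, schema):
    """Schema-on-read: raw comma-separated lines are stored as-is, and field
    names and types are applied only when the data is read."""
    rows = []
    for line in store.read(path):
        values = line.split(",")
        rows.append({name: cast(value) for (name, cast), value in zip(schema, values)})
    return rows


store = WriteOnceStore()
store.write("/logs/day1", ["alice,3", "bob,5"])
print(read_with_schema(store, "/logs/day1", [("user", str), ("clicks", int)]))
# [{'user': 'alice', 'clicks': 3}, {'user': 'bob', 'clicks': 5}]
```

A second `write` to the same path raises `PermissionError`, so in this model new data goes to a new path instead of modifying the old one.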
  9. Components
      Hadoop can be divided into two parts:
      1. HDFS (Hadoop Distributed File System)
      2. MapReduce programming model
  10. Hadoop Distributed File System
      • A distributed file system that runs on commodity hardware
      • Provides high-throughput access to application data
      • Suitable for applications that have large data sets
      • Designed to store very large amounts of data (terabytes or petabytes)
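One way to picture how a distributed file system spreads a large file over commodity machines: cut it into fixed-size blocks and keep several copies of each block on different nodes. The Python sketch below assumes a tiny block size and a simple round-robin placement for readability. HDFS itself defaults to 128 MB blocks (Hadoop 2 and later) and a replication factor of 3, and uses a rack-aware placement policy rather than round-robin. The function names and the node names (`dn1`, `dn2`, ... for DataNodes) are hypothetical.

```python
def split_into_blocks(data, block_size):
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Simplified round-robin placement: block i starts at node i (mod the node
    count). Real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(b"abcdefghij", block_size=4)
print(blocks)  # [b'abcd', b'efgh', b'ij']
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on several nodes, losing a single machine does not lose any data.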
  11. Core architectural goal of HDFS
      • An HDFS instance may consist of thousands of server machines, so at any given time some of them are likely to have failed
      • Detecting faults and recovering from them quickly and automatically is therefore a core goal
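Automated fault detection and recovery can be sketched in two steps: notice which machines have gone silent, then restore the missing copies of their data elsewhere. In HDFS, DataNodes send periodic heartbeats to the NameNode, which marks a node dead when its heartbeats stop and schedules re-replication of its blocks. The Python below is a simplified model of that idea; the timeout value, node names, and function names are assumptions for illustration.

```python
def find_dead_nodes(last_heartbeat, now, timeout):
    """Presume a node dead if its last heartbeat is more than `timeout` seconds old."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def re_replicate(placement, live_nodes, replication=3):
    """Drop replicas held by dead nodes, then copy each under-replicated
    block onto live nodes that do not already hold it."""
    repaired = {}
    for block_id, holders in placement.items():
        alive = [n for n in holders if n in live_nodes]
        if not alive:
            # Every copy was on a failed node; nothing is left to copy from.
            raise RuntimeError(f"block {block_id} lost: no live replica")
        candidates = sorted(n for n in live_nodes if n not in alive)
        while len(alive) < replication and candidates:
            alive.append(candidates.pop(0))
        repaired[block_id] = alive
    return repaired


last_heartbeat = {"dn1": 100.0, "dn2": 100.0, "dn3": 40.0, "dn4": 98.0}
dead = find_dead_nodes(last_heartbeat, now=105.0, timeout=30.0)
live = set(last_heartbeat) - dead
placement = {0: ["dn1", "dn2", "dn3"], 1: ["dn2", "dn3", "dn4"]}
print(dead)                           # {'dn3'}
print(re_replicate(placement, live))  # {0: ['dn1', 'dn2', 'dn4'], 1: ['dn2', 'dn4', 'dn1']}
```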
  12. MapReduce programming model
      • Applies a divide-and-conquer approach to the data
      • Schedules execution across a set of machines
      • Manages inter-process communication
      • Reducers process the output of all mappers to arrive at the final output
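The divide-and-conquer pattern on this slide can be sketched in plain Python: split the input into chunks, run one map task per chunk on a pool of workers, and let a single reduce step combine every mapper's output. The thread pool stands in for the set of cluster machines; the job (averaging numbers) and all function names are hypothetical choices for illustration, not Hadoop's API.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, num_splits):
    """Divide the input into at most `num_splits` roughly equal chunks."""
    size = max(1, -(-len(records) // num_splits))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_task(chunk):
    """Each map task sees only its own chunk and emits a partial result."""
    return (sum(chunk), len(chunk))


def reduce_task(partials):
    """The reducer receives the output of every map task and produces the final answer."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


def run_job(records, workers=4):
    chunks = split(records, workers)
    # The executor schedules map tasks across workers, standing in for cluster machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_task, chunks))
    return reduce_task(partials)


print(run_job(list(range(1, 101))))  # 50.5
```

The mappers emit (sum, count) pairs rather than per-chunk averages because averaging unequal-sized chunk averages would give the wrong result; each map output has to carry enough information for the reducer to combine it correctly.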
  13. MapReduce programming model (continued)
      – Map
        • A map() function processes a key/value pair to generate a set of intermediate key/value pairs
      – Reduce
        • A reduce() function merges all intermediate values associated with the same intermediate key
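The map() and reduce() roles above can be illustrated with word count, the example commonly used to introduce MapReduce. This single-process Python sketch runs the phases in order: map every input pair, group the intermediate pairs by key (the shuffle), and call reduce once per key. The names `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical, chosen to avoid shadowing Python's built-in `map`; a real Hadoop job would run the same phases across many machines.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """map(): one input key/value pair -> a list of intermediate (word, 1) pairs."""
    return [(word, 1) for word in text.lower().split()]


def reduce_fn(word, counts):
    """reduce(): merge all intermediate values that share the same key."""
    return (word, sum(counts))


def map_reduce(inputs):
    # Map phase: apply map_fn to every input key/value pair.
    intermediate = []
    for key, value in inputs:
        intermediate.extend(map_fn(key, value))
    # Shuffle phase: group intermediate values by their key.
    groups = defaultdict(list)
    for k, v in intermediate:
        groups[k].append(v)
    # Reduce phase: one reduce_fn call per distinct key (sorted for stable output).
    return dict(reduce_fn(k, groups[k]) for k in sorted(groups))


print(map_reduce([("doc1", "the cat sat"), ("doc2", "The dog sat")]))
# {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```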
  14. Applications
  15. References
      • Chuck Lam, Hadoop in Action
      • YouTube
      • Wikipedia
      • Google Images
  16. Conclusion