Hadoop hive presentation

  1. Hadoop
  2. Agenda
     • Problems with traditional large-scale systems
     • Requirements for new approaches
     • What is Hadoop?
     • Why Hadoop?
     • Overview of Hadoop
     • HDFS
     • MapReduce
     • Applications
     • Conclusion
  3. Problems with traditional large-scale systems
     • Data volumes grow day by day
     • Network failures
     • Server failures
     • Loss of data
     • High cost
     • Distributed computing requires manual coordination
  4. Requirements for new approaches
     • Data should be stored in a distributed manner and processed in parallel
     • High performance at low cost
     • Scalable
     • Simple to access and process
     • Fault tolerant
  5. What is Hadoop?
     • An open-source framework
     • Processes large amounts of data
  6. Why Hadoop?
     • Accessible
     • Scalable
     • Robust
     • Simple
  7. Overview of Hadoop
     • Handles three types of data: structured, semi-structured, and unstructured
     • Analyzes and processes very large amounts of data (petabytes)
  8. Comparison with traditional databases
     RDBMS:
     • Stores GBs of data
     • Supports both batch and interactive processing
     • Allows updates
     • Schemas must be defined
     • Only structured data
     Hadoop:
     • Stores PBs of data
     • Batch processing only
     • Does not allow updates; follows WORM (write once, read many)
     • Schemas not required
     • Supports all three types of data
  9. Components
     Hadoop can be divided into two parts:
     1. HDFS – Hadoop Distributed File System
     2. MapReduce – programming model
  10. Hadoop Distributed File System
      • A distributed file system that runs on commodity hardware
      • Provides high-throughput access to application data
      • Suitable for applications with large data sets
      • Designed to store very large amounts of data (terabytes to petabytes)
  11. Core architectural goals of HDFS
      • An HDFS instance may consist of thousands of server machines
      • Faults must be detected and recovered from quickly, in an automated manner
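     From an application's point of view, HDFS is accessed through Hadoop's Java
     FileSystem API. The sketch below assumes a cluster whose fs.defaultFS is
     already configured (e.g. via core-site.xml on the classpath); the path
     /user/demo/input.txt is a hypothetical example, not from the slides.

     ```java
     import java.io.BufferedReader;
     import java.io.InputStreamReader;

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FileSystem;
     import org.apache.hadoop.fs.Path;

     public class HdfsRead {
         public static void main(String[] args) throws Exception {
             // Picks up fs.defaultFS and other settings from core-site.xml
             Configuration conf = new Configuration();
             FileSystem fs = FileSystem.get(conf);

             // Open a file stored in HDFS and print it line by line
             Path file = new Path("/user/demo/input.txt"); // hypothetical path
             try (BufferedReader reader =
                      new BufferedReader(new InputStreamReader(fs.open(file)))) {
                 String line;
                 while ((line = reader.readLine()) != null) {
                     System.out.println(line);
                 }
             }
         }
     }
     ```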
  12. MapReduce programming model
      • MapReduce applies a divide-and-conquer approach to the data
      • Schedules execution across a set of machines
      • Manages inter-process communication
      • The reducer processes the output of all mappers and produces the final output
  13. MapReduce programming model
      – Map
      • The map() function processes a key/value pair to generate a set of
        intermediate key/value pairs
      – Reduce
      • The reduce() function merges all intermediate values associated with the
        same intermediate key
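     The classic word-count program illustrates both functions: map() emits
     (word, 1) for every word it sees, and reduce() sums the counts for each
     word. This is a minimal sketch against Hadoop's Mapper and Reducer base
     classes; the class names are illustrative, not from the slides.

     ```java
     import java.io.IOException;
     import java.util.StringTokenizer;

     import org.apache.hadoop.io.IntWritable;
     import org.apache.hadoop.io.LongWritable;
     import org.apache.hadoop.io.Text;
     import org.apache.hadoop.mapreduce.Mapper;
     import org.apache.hadoop.mapreduce.Reducer;

     public class WordCount {

         // map(): emit (word, 1) for every word in the input line
         public static class TokenizerMapper
                 extends Mapper<LongWritable, Text, Text, IntWritable> {
             private static final IntWritable ONE = new IntWritable(1);
             private final Text word = new Text();

             @Override
             protected void map(LongWritable key, Text value, Context context)
                     throws IOException, InterruptedException {
                 StringTokenizer itr = new StringTokenizer(value.toString());
                 while (itr.hasMoreTokens()) {
                     word.set(itr.nextToken());
                     context.write(word, ONE);
                 }
             }
         }

         // reduce(): sum all intermediate counts that share the same word
         public static class IntSumReducer
                 extends Reducer<Text, IntWritable, Text, IntWritable> {
             private final IntWritable result = new IntWritable();

             @Override
             protected void reduce(Text key, Iterable<IntWritable> values,
                                   Context context)
                     throws IOException, InterruptedException {
                 int sum = 0;
                 for (IntWritable val : values) {
                     sum += val.get();
                 }
                 result.set(sum);
                 context.write(key, result);
             }
         }
     }
     ```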
  14. Applications
  15. References
      • Hadoop in Action – Chuck Lam
      • YouTube
      • Wikipedia
      • Google Images
  16. Conclusion
