Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals

This Hadoop MapReduce tutorial covers MapReduce programming, MapReduce commands, MapReduce fundamentals, the Driver class, the Mapper class, the Reducer class, and the Job Tracker and Task Tracker.

By the end, you will have a strong grasp of Hadoop MapReduce basics.

PPT Agenda:

✓ Introduction to BIG Data & Hadoop
✓ What is MapReduce?
✓ MapReduce Data Flows
✓ MapReduce Programming

----------
What is MapReduce?

MapReduce is a programming framework for distributed processing of large data sets on clusters of commodity computers. It is based on the principle of parallel data processing: data is broken into smaller blocks that are processed simultaneously on different machines, rather than processed as a single block. This makes processing faster and more scalable. The native MapReduce API is written in Java, so MapReduce programs are typically written in Java.
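To make the map, shuffle and reduce steps concrete, here is a minimal in-memory word-count sketch in plain Java. It does not use the Hadoop API: the class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical, the "blocks" are plain strings standing in for HDFS blocks, and a parallel stream stands in for the cluster. A real Hadoop job would express the same two functions as Mapper and Reducer classes configured by a Driver class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory sketch of the MapReduce model (no Hadoop, no cluster).
class WordCountSketch {

    // Map phase: one input block -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String block) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : block.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: all counts for one word -> their total.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> blocks) {
        // Map every block independently; a parallel stream stands in for cluster nodes.
        List<Map.Entry<String, Integer>> intermediate = blocks.parallelStream()
                .flatMap(b -> map(b).stream())
                .collect(Collectors.toList());

        // Shuffle: group the intermediate values by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : intermediate) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }

        // Reduce: one call per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> blocks = List.of("the quick brown fox", "the lazy dog", "The end");
        System.out.println(run(blocks));
        // {brown=1, dog=1, end=1, fox=1, lazy=1, quick=1, the=3}
    }
}
```

Each block is mapped without looking at any other block, which is what lets the framework spread the work across machines; only the shuffle needs to bring matching keys together.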

----------
What are MapReduce Components?

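MapReduce has the following components:

1. Combiner: An optional "mini-reducer" that runs on each mapper's output and pre-aggregates records that share the same key (for example, partial totals by day, week, month or year) before the data is sent across the network. This reduces the volume of data the reducers must receive and process.

2. Job Tracker: The master service (in Hadoop 1.x) that accepts jobs, schedules their map and reduce tasks across the cluster's servers, preferring servers that already hold the input data, and monitors their progress.

3. Task Tracker: Runs on each worker server and executes the map and reduce tasks assigned to it by the Job Tracker, reporting its status back through regular heartbeats.

4. Reducer: Receives all intermediate values for a given key, gathered from the mappers on the different servers, and aggregates them into the final output.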
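The effect of a combiner can also be sketched without Hadoop. In this hypothetical example (names such as `CombinerSketch`, `sumByKey` and `run` are illustrative only), each split's (key, 1) records are optionally summed locally before being "shuffled". The final totals come out the same either way, but fewer records cross the simulated network when the combiner is on.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CombinerSketch {

    // Result of one simulated job: how many records were shuffled, and the final totals.
    static final class Result {
        final int shuffledRecords;
        final Map<String, Integer> totals;

        Result(int shuffledRecords, Map<String, Integer> totals) {
            this.shuffledRecords = shuffledRecords;
            this.totals = totals;
        }
    }

    // Mapper: one (key, 1) record per whitespace-separated token in the split.
    static List<Map.Entry<String, Integer>> mapSplit(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String key : split.trim().split("\\s+")) {
            if (!key.isEmpty()) {
                out.add(new SimpleEntry<>(key, 1));
            }
        }
        return out;
    }

    // Sums values per key. Used both as the combiner (per mapper) and the reducer (global).
    static Map<String, Integer> sumByKey(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return sums;
    }

    static Result run(List<String> splits, boolean useCombiner) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (String split : splits) {
            List<Map.Entry<String, Integer>> mapped = mapSplit(split);
            // With a combiner, each mapper sends one pre-summed record per key.
            if (useCombiner) {
                shuffled.addAll(sumByKey(mapped).entrySet());
            } else {
                shuffled.addAll(mapped);
            }
        }
        return new Result(shuffled.size(), sumByKey(shuffled));
    }

    public static void main(String[] args) {
        List<String> splits = List.of("mon mon tue", "tue tue wed");
        Result plain = run(splits, false);
        Result combined = run(splits, true);
        System.out.println(plain.shuffledRecords + " vs " + combined.shuffledRecords); // 6 vs 4
        System.out.println(plain.totals.equals(combined.totals));                      // true
    }
}
```

Reusing the summing logic as the combiner is only safe because addition is associative and commutative; a reduce function such as a direct average could not be reused as a combiner this way.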

----------
Applications of MapReduce

1. Data Mining
2. Document Indexing
3. Business Intelligence
4. Predictive Modelling
5. Hypothesis Testing

----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live, instructor-led training in BIG Data & Hadoop featuring real-time projects, 24/7 lifetime support & 100% placement assistance.

Email: sales@skillspeed.com
Website: https://www.skillspeed.com

----------

Slide 1: Big Data Insights using MapReduce (© 2015 BlueCamphor Technologies (P) Ltd., www.skillspeed.com)
Slide 2: Session Objectives
ᗍ Introduction to Big Data and Hadoop
ᗍ Understanding HDFS
ᗍ Introduction to MapReduce: MapReduce Fundamentals
ᗍ MapReduce Programming Tutorial
ᗍ BIG Data Analytics via MapReduce
ᗍ BIG Data & Hadoop Course Details
ᗍ Webinar by Skillspeed
Get Started with BIG Data & Hadoop
Slide 3: Big Data and its Challenges
Slide 4: Big Data and its Challenges
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data-processing applications. Systems and enterprises generate huge amounts of data, from terabytes to petabytes. Managing data at this scale is very difficult.
Slide 5: Who Generates Big Data?
Have you ever wondered how Google, Facebook or LinkedIn manage to store and use such huge volumes of data? Today, managing big data is becoming a problem for all of us.
Slide 6: Hadoop can be used to process such huge data easily. We will see how, but first let's understand what Hadoop is.
Slide 7: Hadoop and its Characteristics
Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an open-source data management technology with scale-out storage and distributed processing.
Hadoop characteristics: Flexible, Reliable, Economical, Scalable.
Slide 8: Why Hadoop?
How does Hadoop solve the big data challenges? The Hadoop platform is designed to address two big data problems: the size of data and the variety of data.
Slide 9: Hadoop Ecosystem (diagram): Flume (import/export of unstructured or semi-structured data), Sqoop (import/export of structured data), Apache Oozie (workflow), HDFS (Hadoop Distributed File System), Pig Latin (data analysis), Hive (DW system), MapReduce framework, HBase, other YARN frameworks (MPI, Giraph), YARN (cluster resource management).
Slide 10: MapReduce
Slide 11: MapReduce Scenario
Let us consider a real-life scenario to understand the importance of MapReduce in Hadoop. Suppose you are handling a project that has x tasks and takes one resource 100 hours to complete: 1 × 100 = 100 hours. With 10 resources working in parallel, it takes 100 / 10 = 10 hours.
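The arithmetic in this scenario can be written as an ideal speed-up formula. This assumes the work divides evenly among n resources and ignores coordination overhead; both are simplifying assumptions rather than claims from the slides.

```latex
T_n = \frac{T_1}{n}, \qquad
T_1 = 1 \times 100 = 100 \text{ hours}, \qquad
T_{10} = \frac{T_1}{10} = \frac{100}{10} = 10 \text{ hours}
```

In practice, the time spent splitting the work, moving data between machines and combining results means the real speed-up is usually smaller than n.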
Slide 12: MapReduce Scenario (continued)
Similarly (diagram): 100 hours of work, split 10 ways in parallel, takes 100 / 10 = 10 hours.
Slide 13: More Scenarios on MapReduce
Problem statement: Find the maximum stock market levels recorded over a span of 5 years.
Problem statement: De-identify personally identifiable information.
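The stock-market problem maps naturally onto this model: the map step turns each record into a (year, level) pair and the reduce step keeps the maximum per year. The sketch below assumes a hypothetical `YYYY-MM-DD,level` text format and uses made-up sample values. For brevity it collapses the shuffle and reduce into a single `Map.merge` call, so it illustrates the logic rather than being a Hadoop job; `MaxStockLevelSketch` and its methods are illustrative names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaxStockLevelSketch {

    // Map: parse one record in the assumed "YYYY-MM-DD,level" format into (year, level).
    // Returns null for malformed lines so they can be skipped.
    static Map.Entry<Integer, Double> map(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2 || parts[0].trim().length() < 4) {
            return null;
        }
        try {
            int year = Integer.parseInt(parts[0].trim().substring(0, 4));
            double level = Double.parseDouble(parts[1].trim());
            return Map.entry(year, level);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Grouping by year stands in for the shuffle; keeping the larger value is the reduce.
    static Map<Integer, Double> maxPerYear(List<String> lines) {
        Map<Integer, Double> max = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Integer, Double> kv = map(line);
            if (kv != null) {
                max.merge(kv.getKey(), kv.getValue(), Double::max);
            }
        }
        return max;
    }

    // Highest level across the whole span covered by the input (input must be non-empty).
    static double overallMax(Map<Integer, Double> perYear) {
        return Collections.max(perYear.values());
    }

    public static void main(String[] args) {
        // Made-up sample records, for illustration only.
        List<String> lines = List.of(
                "2011-03-01,5500.5",
                "2011-07-15,5820.0",
                "2012-01-10,5100.25",
                "not a record");
        Map<Integer, Double> perYear = maxPerYear(lines);
        System.out.println(perYear);             // {2011=5820.0, 2012=5100.25}
        System.out.println(overallMax(perYear)); // 5820.0
    }
}
```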
Slide 14: Traditional Solution (diagram)
The very big data set is split manually, grep is run on each split to find matches, and the per-split matches are concatenated with cat into a single list of all matches.
Slide 15: MapReduce Solution (diagram)
The very big input is split, the splits are processed by the MapReduce framework (Map, then Reduce), and the output is a single list of all matches.
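This "distributed grep" idea can be sketched in the same plain-Java style. Here each split is a list of lines, the map step keeps only the matching lines, and gathering the per-split results in order plays the role of `cat`. The class name, the sample log lines, and the use of a parallel stream in place of cluster nodes are all assumptions for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DistributedGrepSketch {

    // Map: scan one split and keep only the lines that match (grep on that split).
    static List<String> mapSplit(List<String> split, Pattern pattern) {
        return split.stream()
                .filter(line -> pattern.matcher(line).find())
                .collect(Collectors.toList());
    }

    // Run the map step over all splits (a parallel stream stands in for the cluster),
    // then gather the matches in split order, which plays the role of "cat".
    static List<String> grep(List<List<String>> splits, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return splits.parallelStream()
                .flatMap(split -> mapSplit(split, pattern).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("INFO start", "ERROR disk full"),
                List.of("INFO ok", "ERROR timeout"));
        System.out.println(grep(splits, "ERROR")); // [ERROR disk full, ERROR timeout]
    }
}
```

The difference from the traditional solution is who does the bookkeeping: the framework, not the user, splits the input, runs the per-split work, and collects the results.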
Slide 16: MapReduce Advantages
The two biggest advantages:
ᗍ It takes the processing to the data.
ᗍ It allows data to be processed in parallel.
(Diagram: map tasks run on the nodes that hold HDFS blocks a, b and c; nodes are grouped into racks within a data center.)
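"Taking processing to the data" is, in practice, a scheduling preference. The following is a simplified, hypothetical sketch of that preference: node-local first, then rack-local, then any free node. The `Node` type and `pickNode` method are illustrative only and do not reproduce the actual Job Tracker scheduler.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

class LocalitySketch {

    // Hypothetical worker node: its name and the rack it sits in.
    static final class Node {
        final String name;
        final String rack;

        Node(String name, String rack) {
            this.name = name;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Choose a free node for a map task whose input block is stored on replicaHolders:
    // 1) node-local: a free node that holds a replica,
    // 2) rack-local: a free node on the same rack as some replica,
    // 3) otherwise any free node (the block must then be read across racks).
    static Optional<Node> pickNode(List<Node> freeNodes, List<Node> replicaHolders) {
        Set<String> holderNames = replicaHolders.stream().map(n -> n.name).collect(Collectors.toSet());
        Set<String> holderRacks = replicaHolders.stream().map(n -> n.rack).collect(Collectors.toSet());
        for (Node n : freeNodes) {
            if (holderNames.contains(n.name)) {
                return Optional.of(n);
            }
        }
        for (Node n : freeNodes) {
            if (holderRacks.contains(n.rack)) {
                return Optional.of(n);
            }
        }
        return freeNodes.stream().findFirst();
    }

    public static void main(String[] args) {
        Node a = new Node("node-a", "rack-1");
        Node b = new Node("node-b", "rack-1");
        Node c = new Node("node-c", "rack-2");
        // The block lives only on node-a, which is busy, so a rack-local node is chosen.
        System.out.println(pickNode(List.of(c, b), List.of(a))); // Optional[node-b]
    }
}
```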
Slide 17: MapReduce Flow
1. Input data is stored on the data nodes.
2. The number of map tasks equals the number of input splits (one map task per split).
3. Mappers produce intermediate data.
4. Intermediate data is exchanged among nodes during the "shuffle".
5. All data with the same key goes to the same reducer.
6. Reducer output is stored at the output location.
(Diagram: map tasks on Node 1 and Node 2 read the input data and feed the reduce tasks.)
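Step 5 is what the partitioner guarantees. Hadoop's default `HashPartitioner` assigns a key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`; the sketch below applies that same formula in plain Java to route intermediate pairs from two mappers into reducer buckets. `ShuffleSketch` and its methods are hypothetical names, and the in-memory buckets stand in for the network transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit of the
    // hash code, then take the remainder by the number of reducers.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Route every (key, value) pair from every mapper to one reducer bucket,
    // grouping values by key inside each bucket.
    static List<Map<String, List<Integer>>> shuffle(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int numReducers) {
        List<Map<String, List<Integer>>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new TreeMap<>());
        }
        for (List<Map.Entry<String, Integer>> output : mapOutputs) {
            for (Map.Entry<String, Integer> kv : output) {
                buckets.get(partition(kv.getKey(), numReducers))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Intermediate data from two mappers (for example, on Node 1 and Node 2).
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("apple", 1), Map.entry("pear", 1)),
                List.of(Map.entry("apple", 1), Map.entry("fig", 1)));
        List<Map<String, List<Integer>>> buckets = shuffle(mapOutputs, 2);
        for (int r = 0; r < buckets.size(); r++) {
            System.out.println("reducer " + r + " receives " + buckets.get(r));
        }
    }
}
```

Because the bucket depends only on the key, both "apple" records land in the same reducer no matter which mapper produced them.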
Slide 18: What is Expected?
In this section, we will discuss HDFS and MapReduce questions that are commonly asked in interviews. This will help you gauge the importance of the topics under study.
Slide 19: Job Trends: Hadoop (chart)
Slide 20: Why SkillSpeed?
ᗍ Course curriculum from industry experts
ᗍ Instructor-led live virtual sessions
ᗍ Lifetime access to course content via LMS
ᗍ 100% placement assistance
ᗍ 24x7 support
Slide 21: Course Topics
Module 1: Introduction to Big Data and Hadoop
Module 2: HDFS Internals, Hadoop Configurations and Data Loading
Module 3: Introduction to MapReduce
Module 4: Advanced MapReduce Concepts
Module 5: Introduction to Pig
Module 6: Advanced Pig and Introduction to Hive
Module 7: Advanced Hive Concepts
Module 8: Extending Hive and HBase Introduction
Module 9: Advanced HBase and Oozie Introduction
Module 10: Project Set-up Discussion
Slide 22: Corporate Partners
Slide 23: Contact Us
Lines open 24/7. To know more about the course, please contact:
IND: +91-90660-20904
USA: 1866-607-6547 (Toll Free)
Or reach us at sales@skillspeed.com
