Hadoop map reduce

WDABT 2016 – BHARATHIAR
UNIVERSITY
K. Santhiya, Ph.D. Research Scholar, Dr. V. Bhuvaneswari, Asst. Professor, Dept. of Comp. Appl., Bharathiar
University, WDABT 2016

TAKE A CLOSER LOOK
AT
PRESENTED BY
K.SANTHIYA
PH.D RESEARCH SCHOLAR
UNDER THE GUIDANCE OF
DR.V.BHUVANESWARI
ASSISTANT PROFESSOR
DEPARTMENT OF COMPUTER APPLICATIONS
BHARATHIAR UNIVERSITY

AGENDA
• MAPREDUCE
•  ANALOGY
•  EXECUTION
•  HADOOP INTERACTION
•  BUILD MAPREDUCE PROGRAM IN ECLIPSE
• YARN
•  YARN DEFINITION
•  YARN REAL LIFE CONNECT
•  YARN INFRASTRUCTURE

WHY MAPREDUCE

MAP REDUCE

REAL TIME USES OF MAP
REDUCE

MR REAL – LIFE CONNECT

MAP REDUCE - ANALOGY

MAP REDUCE – ANALOGY CONTD.,

MAP REDUCE EXAMPLE

MAP EXECUTION

MAP EXECUTION – DISTRIBUTED TWO NODE
ENVIRONMENT

MAPREDUCE JOBS

HADOOP JOB WORK
INTERACTION

CHARACTERISTICS OF MR
• MapReduce is designed to handle very large-scale data, in
the range of petabytes and exabytes.
• It works well on write-once, read-many data, also
known as WORM data.
• MapReduce allows parallelism without mutexes, because
each task works on its own split of the data.
• The Map and Reduce operations are typically performed by
the same physical processors.
• Operations are provisioned near the data, as data locality is
preferred.
• Commodity hardware and storage are leveraged in
MapReduce.
• The runtime takes care of splitting and moving data for
operations.
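The "parallelism without mutexes" point can be illustrated with a small in-memory sketch. It uses plain Java streams, not the Hadoop API, and the class and method names (SplitSumSketch, mapSplit, run) are hypothetical. Each split is summed independently, so no lock is needed, and the partial sums are combined at the end.

```java
import java.util.Arrays;
import java.util.List;

public class SplitSumSketch {
    // "Map" step (hypothetical name): process one split on its own.
    // It touches no shared state, so no mutex is required.
    static long mapSplit(int[] split) {
        long partial = 0;
        for (int v : split) {
            partial += v;
        }
        return partial;
    }

    // "Reduce" step: combine the partial results of all splits.
    // parallelStream() stands in for the cluster running map tasks in parallel.
    static long run(List<int[]> splits) {
        return splits.parallelStream()
                     .mapToLong(SplitSumSketch::mapSplit)
                     .sum();
    }

    public static void main(String[] args) {
        List<int[]> splits = Arrays.asList(
                new int[] {1, 2, 3},
                new int[] {4, 5},
                new int[] {6});
        System.out.println(run(splits)); // prints 21
    }
}
```

In a real cluster the splits would be HDFS blocks and the runtime, not the program, would decide where each task runs.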

BUSINESS SCENARIO

SET UP ENVIRONMENT

SMALL DATA AND BIG DATA

UPLOADING SMALL & BIG DATA

BUILD MAPREDUCE PROGRAM

MAPREDUCE DEMO
• We will run an example that computes the value of
pi, which is a computation-intensive program. The first
argument indicates how many maps to create; here, we
use 10 mappers. The second argument indicates how
many samples are generated per map; here, we take 100
random samples. So this program uses 10 multiplied by
100, that is, 1,000 random points to estimate pi. We could
increase 100 to 10 million to improve accuracy.
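The arithmetic behind the demo can be sketched in plain Java. This is an illustration only, not the code of the Hadoop example: it uses ordinary pseudo-random points with a fixed seed, whereas the bundled example is understood to use a quasi-random sequence. The names PiEstimateSketch, countInside and estimatePi are hypothetical. Each "map" counts how many of its points fall inside the quarter circle, and the "reduce" adds the counts and scales by 4.

```java
import java.util.Random;

public class PiEstimateSketch {
    // One "map" task: draw `samples` points in the unit square and
    // count those that fall inside the quarter circle of radius 1.
    static long countInside(int samples, Random rng) {
        long inside = 0;
        for (int i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++;
            }
        }
        return inside;
    }

    // "Reduce": sum the per-map counts; inside/total approximates pi/4.
    static double estimatePi(int maps, int samplesPerMap) {
        Random rng = new Random(42L); // fixed seed (an assumption) for repeatable runs
        long inside = 0;
        for (int m = 0; m < maps; m++) {
            inside += countInside(samplesPerMap, rng);
        }
        long total = (long) maps * samplesPerMap;
        return 4.0 * inside / total;
    }

    public static void main(String[] args) {
        // 10 maps x 100 samples = 1,000 points, as in the demo.
        System.out.println(estimatePi(10, 100));
        // More samples per map should give a closer estimate.
        System.out.println(estimatePi(10, 100_000));
    }
}
```

With only 1,000 points the estimate is rough; raising the samples per map narrows the error, which is why the demo suggests a much larger second argument.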

HADOOP MR REQUIREMENTS

Create a New Project : Step 1

CHECKING HADOOP
ENVIRONMENT FOR MAPREDUCE

Build an MR Application Using Eclipse
and Run in Hadoop Cluster
Let’s build a MapReduce Java program in Eclipse and then run it in our Hadoop
cluster. In this demo, we will run Eclipse on the Windows development
machine and our Hadoop cluster will be on Ubuntu.
1. Launch Eclipse.
2. Enter the workspace location.
3. Click OK.
4. The Eclipse window will open.
5. Close the welcome screen of Eclipse.
6. Select the New menu item.
7. Select Java Project.
8. The New Java Project window opens.
9. We will build a WordCount program here to count the number of
times each word occurs in a particular file.

Build an MR Application Using Eclipse
and Run in Hadoop Cluster Contd.,
10. Enter the name of the project as ‘WordCount’ and click Finish.
11. Right click the WordCount project in the panel on the left.
12. Select New and then Class.
13. The New Java Class window opens.
14. Enter the name of the class as ‘WordCount’.
15. Click Finish.
16. Now, let’s copy the WordCount program from the MapReduce tutorial on
Hadoop’s website. You may go to Hadoop’s documentation or directly go
to the link being shown.
17. Copy the source code for the WordCount program and paste it into the
WordCount class.
18. You will notice a lot of compilation errors. Let’s fix the build path
now. Select the project WordCount.

19. Select the Project menu item.
20. Click Properties.
21. Under Java Build Path, open Libraries and click Add External JARs.
22. Browse to the unpacked Hadoop directory and go to the
share/hadoop/mapreduce directory.
23. Select the hadoop-mapreduce-client-core and hadoop-mapreduce-client-common
JAR files.
24. Now, go to the share/hadoop/common directory.
25. Select the hadoop-common JAR file.
26. The compilation errors should be gone by now.
27. Let’s now look at the various portions of this program.
28. The usual Java imports are at the top of the program.
29. Further, there are Hadoop and MapReduce related import statements.
Select the Description column header.

30. In the main method, we begin by setting the configuration of the MapReduce
job.
31. We set the Mapper class.
32. We set the Combiner class.
33. Similarly, we set the Reducer class.
34. We set the output key class.
35. We also set the output value class.
36. We set the input path to the source dataset.
37. We set the output path to the location where the results are desired.
38. Our Mapper class extends Mapper.
39. It has a map method, which takes a key and a value as arguments and writes
its output through the context.
40. In the WordCount logic, we just tokenize each line on whitespace and
extract the individual words.

41. Our Reducer class similarly extends Reducer.
42. The reduce method takes a key and an iterable list of values as arguments.
43. The final output is again written as key-value pairs.
44. Select the New menu item.
45. Let’s now build and export a JAR file to run this program on a Hadoop
cluster. Click the File menu and then Export.
46. The Export window opens.
47. Expand Java.
48. Select JAR file.
49. Click the Next button.
50. Enter the path and name of the JAR. In this case, let’s name it
‘WordCount.jar’.

51. Make sure that you select the project.
52. Now, let’s transfer this JAR to the Hadoop cluster. If you are using
Windows, you can use any SCP or FTP client such as WinSCP. Log in with
WinSCP using the IP address of the Hadoop Ubuntu cluster.
53. Enter the username of the Hadoop machine.
54. Enter the password.
55. Select the WordCount.jar file from the local Windows machine.
56. Using WinSCP, you can drag and drop it to the Ubuntu machine in the panel
on the right.
57. The Copy window opens.
58. Click the Copy button.

59. Now, run the WordCount program in the Hadoop cluster using the hadoop
jar command. Specify the input file on which WordCount is to be
applied and also the output result path.
60. View the results in the output directory.
61. You will notice an output file with a name like part-r-00000.
62. View the contents of this output file using the hadoop fs -cat command.
63. The output will have a count of each word’s occurrence in the input
dataset.
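The map and reduce logic described in steps 38 to 43 can be tried without a cluster. The sketch below is an in-memory imitation in plain Java, not the WordCount source from the Hadoop tutorial: it uses no Hadoop classes, and the names WordCountSketch, map, reduce and run are hypothetical. The map step tokenizes a line and emits (word, 1) pairs, the grouping loop stands in for the shuffle, and the reduce step sums the values for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCountSketch {
    // Map step: tokenize one line on whitespace and emit (word, 1) per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: sum the iterable list of values received for one key.
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: map every line, group values by key
    // (the shuffle), then reduce each group into a final (word, count) pair.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello map reduce");
        // Prints one "word<TAB>count" line per word, sorted by word.
        run(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the real program the same two methods live in classes that extend Mapper and Reducer, and a Combiner can reuse the reduce logic to pre-sum counts on each node before the shuffle.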

WHY YARN ?
YARN : Yet Another Resource Negotiator

WHAT IS YARN ?
YARN is a resource manager. It was created by separating
the processing engine and the management function of
MapReduce. It monitors and manages workloads,
maintains a multi-tenant environment, manages the high
availability features of Hadoop, and implements security
controls.

YARN – REAL LIFE CONNECT
• Limitations of MapReduce
• Architected by Yahoo
• Hadoop 2.0 provides a broader ecosystem with
– Spark for iterative processing
– Storm for stream processing
– Hadoop MapReduce for batch processing

YARN INFRASTRUCTURE

REFERENCES
• (2012) Carl W. Olofson, Dan Vesset.
Worldwide Hadoop – MapReduce Ecosystem
Software 2012-2016 Forecast [Online] Available
: http://www.idc.com/getdoc.jsp?
containerId=234294
• Philip Russom , " Big Data Analytics " ,
presented by tdwi , 2011
• K. Cukier, “Data, data everywhere,'' Economist,
vol. 394, no. 8671,pp. 3_16, 2010
WDABT 2016
