Hadoop MapReduce
1. SEMINAR ON
Android App Development
Trained by: Hewlett-Packard Education Services, Mumbai
Presented to: Mr. R.K. Banyal, Mr. Hukum Chand Saini
By: Urvashi Kataria
2. About HPES:
• American global IT company headquartered in Palo Alto,
California, US.
• Provider of products, software, technologies,
solutions and services to individuals as well as small
and medium-sized businesses.
• Major operations include HP Software, HP Financial
Services and Corporate Investments.
• Provides practical training in fields like Big Data,
Android App Dev, Embedded Systems, etc.
3. About Birthday Bash:
An Android application that lets you enjoy your own birthday as well as
those of your dear ones.
Save the days, get reminded of them, capture moments on the
day itself, get greeted by the app, and celebrate!
9. Why MapReduce?
• Large-scale data processing was difficult!
o Managing hundreds or thousands of processors
o Managing parallelization and distribution
o Reliable execution with easy data access
• MapReduce provides all of these, easily!
13. How Map and Reduce Work Together
14. Hadoop MapReduce: A Closer Look
[Figure: data flow on two nodes (Node 1, Node 2). On each node: files loaded from local HDFS store → InputFormat → Splits → RecordReaders (RR) → input (K, V) pairs → Map → intermediate (K, V) pairs → Partitioner → shuffling process (intermediate (K, V) pairs exchanged by all nodes) → Sort → Reduce → final (K, V) pairs → OutputFormat → writeback to local HDFS store.]
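The partition, shuffle and sort stages in the figure can be illustrated with a small single-process sketch. This is not Hadoop code: the function names `partition` and `shuffle`, the use of CRC32 as the hash, and the two in-memory "nodes" are all assumptions made for illustration. It only shows the idea that every intermediate pair with the same key ends up at the same reducer, whichever node produced it.

```python
import zlib
from typing import List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    """Pick the reducer index for a key.

    CRC32 is an assumed stand-in for the framework's hash function;
    it is used here because it is stable across Python processes.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs: List[List[Pair]], num_reducers: int) -> List[List[Pair]]:
    """Route every node's intermediate (K, V) pairs to the owning reducer."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for node_output in map_outputs:          # one list per simulated node
        for key, value in node_output:
            buckets[partition(key, num_reducers)].append((key, value))
    # Sort step: each reducer receives its pairs ordered by key.
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    # Hypothetical map outputs from Node 1 and Node 2.
    node1 = [("fox", 1), ("the", 1)]
    node2 = [("the", 1), ("dog", 1)]
    for index, bucket in enumerate(shuffle([node1, node2], num_reducers=2)):
        print(f"reducer {index}: {bucket}")
```

Both ("the", 1) pairs land in the same bucket even though they came from different nodes, which is what allows a reducer to compute a complete result per key.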
15. Algorithm
map(key, value):
    // key: document name; value: text of document
    for each word w in value:
        emit(w, 1)

reduce(key, values):
    // key: a word; values: an iterator over counts
    result = 0
    for each count v in values:
        result += v
    emit(key, result)

map(key=url, val=contents):
    for each word w in contents:
        emit(w, "1")

reduce(key=word, values=uniq_counts):
    // sum all "1"s in the values list
    emit result "(word, sum)"
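The word-count pseudocode above can be tried out locally with a minimal Python sketch. The names `map_fn`, `reduce_fn` and `run_word_count` are hypothetical, and the driver is a single-process stand-in for the framework (it does the grouping that Hadoop would do during shuffle and sort). Splitting on whitespace is also an assumption about what counts as a word.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in the text."""
    for word in text.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical local driver: map, group by key, then reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_fn(name, text):
            grouped[word].append(count)
    # Reduce each key in sorted order, mirroring the sort before reduce.
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog and the fox"}
    print(run_word_count(docs))
```

In a real job only the map and reduce functions would be written by the programmer; the grouping done here by `run_word_count` is handled by the framework.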
19. Service Providers
• Open Source
o Apache
• Commercial
o Cloudera
o Hortonworks
o MapR
o Amazon Elastic MapReduce (AWS)
o Microsoft HDInsight (Beta)
20. Advancements: MRv1 & MRv2
MRv2 (MapReduce Version 2)
• Splits the existing JobTracker’s roles into:
o Resource management
o Job lifecycle management
• MapReduce 2.0 provides several benefits over the existing
MapReduce framework:
o Better scalability, through distributed job lifecycle management
o Support for multiple Hadoop MapReduce API versions in a
single cluster
22. Advantages of MapReduce
• Distributed data and computation.
• Tasks are independent. Entire nodes can fail and restart.
• Linear scaling in the ideal case. It is designed to run on cheap
commodity hardware.
• Simple programming model. The end-user programmer
only writes the map and reduce tasks.
23. Disadvantages / Cases where MR isn’t a suitable choice:
• Real-time processing.
• It is not always easy to express every problem
as a MapReduce program.
• When your intermediate processes need to talk to each
other.
• When your processing requires a lot of data to be shuffled
over the network.
• When you need to handle streaming data. MR is best suited
to batch processing of huge amounts of data that you already
have.
25. RDBMS vs. Hadoop
                     Traditional RDBMS         Hadoop / MapReduce
Data Size            Gigabytes (Terabytes)     Petabytes (Exabytes)
Access               Interactive and Batch     Batch (NOT Interactive)
Updates              Read / Write many times   Write once, Read many times
Structure            Static Schema             Dynamic Schema
Integrity            High (ACID)               Low
Scaling              Nonlinear                 Linear
Query Response Time  Can be near immediate     Has latency (due to batch processing)