Radu Pastia: I have been working with Hadoop since I started the Big Data Team at Avira two years ago. At first I focused more on the operations side: sizing our new Hadoop cluster and setting it up to run smoothly. As our setup stabilized, I started delving deeper into data science and machine learning. I have been coding ever since I got my first home computer, which ran BASIC, and before Hadoop my background was in backend scripting for web-based applications.
6. Building a connector – The Right Way
[Figure: MapReduce components: InputFormat (InputSplit, RecordReader), Mapper, Partitioner, Reducer, OutputFormat (RecordWriter)]
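The diagram splits the input side of a connector into three roles. The InputFormat decides how the input is divided, each InputSplit describes one mapper's share, and a RecordReader turns a split into the key/value pairs the Mapper consumes. The OutputFormat and RecordWriter mirror this on the output side. The sketch below models that contract in plain Java so it runs without Hadoop on the classpath. Its method names echo Hadoop's getSplits, createRecordReader and nextKeyValue, but the interfaces are simplified stand-ins, not the real org.apache.hadoop.mapreduce API, and the in-memory ListInputFormat is a hypothetical example source.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the InputSplit, RecordReader and InputFormat roles.
// These are NOT the org.apache.hadoop.mapreduce classes; names only echo their methods.
interface SimpleSplit {
    long getLength(); // size hint a framework could use when scheduling
}

interface SimpleRecordReader<K, V> {
    boolean nextKeyValue(); // advance to the next record; false once the split is exhausted
    K getCurrentKey();
    V getCurrentValue();
}

interface SimpleInputFormat<K, V> {
    List<SimpleSplit> getSplits(int numSplits);                      // how the input is divided
    SimpleRecordReader<K, V> createRecordReader(SimpleSplit split);  // how one split is read
}

// Hypothetical connector over an in-memory list of rows.
final class ListInputFormat implements SimpleInputFormat<Integer, String> {
    private final List<String> rows;

    ListInputFormat(List<String> rows) { this.rows = rows; }

    // One split = the half-open row range [start, end).
    static final class RangeSplit implements SimpleSplit {
        final int start;
        final int end;
        RangeSplit(int start, int end) { this.start = start; this.end = end; }
        @Override public long getLength() { return end - start; }
    }

    @Override
    public List<SimpleSplit> getSplits(int numSplits) {
        List<SimpleSplit> splits = new ArrayList<>();
        int n = rows.size();
        for (int i = 0; i < numSplits; i++) {
            // Integer arithmetic keeps split sizes within one row of each other.
            int start = (int) ((long) n * i / numSplits);
            int end = (int) ((long) n * (i + 1) / numSplits);
            if (end > start) splits.add(new RangeSplit(start, end)); // skip empty splits
        }
        return splits;
    }

    @Override
    public SimpleRecordReader<Integer, String> createRecordReader(SimpleSplit split) {
        RangeSplit range = (RangeSplit) split;
        return new SimpleRecordReader<>() {
            private int pos = range.start - 1; // positioned just before the first row

            @Override public boolean nextKeyValue() { return ++pos < range.end; }
            @Override public Integer getCurrentKey() { return pos; }
            @Override public String getCurrentValue() { return rows.get(pos); }
        };
    }
}

// Stand-in for the framework: one "mapper" per split, fed by that split's reader.
final class MiniJobRunner {
    static List<String> run(SimpleInputFormat<Integer, String> format, int numSplits) {
        List<String> mapOutput = new ArrayList<>();
        for (SimpleSplit split : format.getSplits(numSplits)) {
            SimpleRecordReader<Integer, String> reader = format.createRecordReader(split);
            while (reader.nextKeyValue()) {
                // Trivial map function: tag each value with its key.
                mapOutput.add(reader.getCurrentKey() + ":" + reader.getCurrentValue());
            }
        }
        return mapOutput;
    }
}
```

The point of the separation is that the framework loop in MiniJobRunner never needs to know where the data lives; a connector only has to supply the splitting rule and the per-split reader.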
10. The InputFormat: From Input to Mapper
[Figure: the job is launched with --range 2014-09-01;2014-09-20 and --number_of_mappers 4. The days 2014-09-01 through 2014-09-20 are divided among the mappers; Input Split 1 covers 2014-09-01 to 2014-09-05. Record Reader 1 reads Input Split 1 day by day and emits key/value pairs such as (2014-09-01-A; record A), (2014-09-01-B; record B), …, (2014-09-05-B; record B), which are passed to the Mapper.]
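The figure implies that the connector's InputFormat turns --range and --number_of_mappers into contiguous blocks of days, one per split, and that each RecordReader then emits keys of the form <day>-<record id>. Below is a minimal sketch of both steps in plain Java, assuming an inclusive range and an even spread of days. The class DateRangeSplitter, its methods, and the fetchDay callback (which in a real connector would query the data source for one day) are hypothetical, not code from the talk.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper modelling the slide: date-range splits plus a per-split reader.
final class DateRangeSplitter {

    // One input split: an inclusive range of days [first, last].
    static final class DaySplit {
        final LocalDate first;
        final LocalDate last;
        DaySplit(LocalDate first, LocalDate last) { this.first = first; this.last = last; }
        @Override public String toString() { return first + ".." + last; }
    }

    // Parses a --range value such as "2014-09-01;2014-09-20" into {start, end}.
    static LocalDate[] parseRange(String range) {
        String[] parts = range.split(";");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <start>;<end>, got: " + range);
        }
        return new LocalDate[] {LocalDate.parse(parts[0].trim()), LocalDate.parse(parts[1].trim())};
    }

    // Divides the inclusive range into at most numMappers contiguous blocks of days
    // whose sizes differ by at most one day.
    static List<DaySplit> split(LocalDate start, LocalDate end, int numMappers) {
        long days = ChronoUnit.DAYS.between(start, end) + 1; // +1: both ends included
        List<DaySplit> splits = new ArrayList<>();
        for (int i = 0; i < numMappers; i++) {
            long from = days * i / numMappers;      // offset of the block's first day
            long to = days * (i + 1) / numMappers;  // offset one past the block's last day
            if (to > from) {
                splits.add(new DaySplit(start.plusDays(from), start.plusDays(to - 1)));
            }
        }
        return splits;
    }

    // Key format shown on the slide: the day, a dash, then a per-day record id.
    static String recordKey(LocalDate day, String recordId) {
        return day + "-" + recordId;
    }

    // Stand-in for a RecordReader: walks the split day by day and emits (key, record) pairs.
    // fetchDay is a hypothetical callback returning one day's records keyed by record id.
    static List<Map.Entry<String, String>> read(
            DaySplit split, Function<LocalDate, Map<String, String>> fetchDay) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (LocalDate d = split.first; !d.isAfter(split.last); d = d.plusDays(1)) {
            for (Map.Entry<String, String> e : fetchDay.apply(d).entrySet()) {
                out.add(Map.entry(recordKey(d, e.getKey()), e.getValue()));
            }
        }
        return out;
    }
}
```

With the slide's arguments (20 days, 4 mappers) this yields four five-day splits, the first covering 2014-09-01 to 2014-09-05 as Input Split 1 does in the figure. Whether the original connector used this exact even-split rule is an assumption; the essential idea is that splits are cut along a dimension the source can be queried by, so each mapper fetches only its own days.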