6. Accessibility
From MapReduce to Hive
1. Group By & Sum
Map -> Group By: collect by key
Reduce -> Group By: sum by key
SELECT key1, sum(key1) FROM Table GROUP BY key1;
2. UDF & UDAF
Map -> UDF
SELECT udf(key1) FROM Table;
Reduce -> UDAF
SELECT key1, udaf(key1) FROM Table GROUP BY key1;
7. Accessibility
From MapReduce to Hive
3. Transform
Map -> Transform
Reduce -> Transform
FROM (
  FROM records2
  MAP year, temperature, quality
  USING 'is_good_quality.py'
  AS year, temperature) map_output
REDUCE year, temperature
USING 'max_temperature_reduce.py'
AS year, temperature;
Source: Hadoop: The Definitive Guide
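With MAP ... USING and REDUCE ... USING, Hive streams rows to the named script as tab-separated lines and reads rows back the same way. The sketch below shows roughly what the two scripts could do. The function names, the set of "good" quality codes, and the 9999 missing-value marker are assumptions for illustration, not taken from this deck. A real script would read sys.stdin and print each result row.

```python
GOOD_QUALITY = {"0", "1", "4", "5", "9"}  # assumed set of "good" quality codes
MISSING = "9999"                          # assumed marker for a missing reading


def filter_good(lines):
    """Map step: keep (year, temperature) for rows with a usable reading."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed rows
        year, temp, quality = fields
        if temp != MISSING and quality in GOOD_QUALITY:
            yield f"{year}\t{temp}"


def max_by_year(lines):
    """Reduce step: emit the maximum temperature per year.

    Assumes the rows arrive grouped by year (sorted on the key).
    """
    current_year, current_max = None, None
    for line in lines:
        year, temp = line.split()
        temp = int(temp)
        if year != current_year:
            if current_year is not None:
                yield f"{current_year}\t{current_max}"
            current_year, current_max = year, temp
        else:
            current_max = max(current_max, temp)
    if current_year is not None:
        yield f"{current_year}\t{current_max}"


# Hypothetical input rows: year, temperature, quality code.
sample = ["1950\t0\t1", "1950\t22\t1", "1950\t9999\t1", "1949\t111\t1", "1949\t78\t2"]
mapped = sorted(filter_good(sample))  # sorting stands in for the shuffle
print(list(max_by_year(mapped)))
```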
9. Accessibility
insert overwrite table tmp_daily_recommendation partition(silo) select x.probe_mgmt_string, x.marang_made_spring,
x.marang_made_nm, x.bf_m2_series, x.chg_dev_silo, x.prob_simple, x.prob_simple_ko, x.prob_outs, x.prob_outs_ko, x.choosen,
x.silo from ( select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm, e.bf_m2_series, e.chg_dev_silo,
a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs as prob_outs, a.mouse_rank_outs
as prob_outs_ko, 'chg' as choosen, trim(' 20140927 ') as silo from (select distinct probe_mgmt_string from tmp_old_report) d
right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs, mouse_rank_simple, mouse_rank_outs from
tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2') and testman_sample_spring >= '021' and
months >= 15 and event_type = 'event1' and mouse_rank_simple <= mouse_rank_outs) a on a.probe_mgmt_string =
d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm, round(avg(s.bf_m2_series),0) as
bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else s.prev_intro_chg_silo end) as
chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo,
bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst where silo
= trim(' 20140927 ') union all select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo,
probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from
master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and
s.testman_sample_spring >= '21' and (s.bf_m2_series + s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string,
s.marang_made_spring, s.marang_made_nm) e on a.probe_mgmt_string = e.probe_mgmt_string left outer join (select
probe_mgmt_string, min(chg_silo) as chg_silo from master_mers_evthist where chg_silo between date_add(concat(substr(trim('
20140927 '),1,4),'-',substr(trim(' 20140927 '),5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and
((probe_chg_spring in ('1','2') and probe_chg_result_spring in ('11', '12')) or probe_chg_spring in ('1','2')) group by
probe_mgmt_string) c on a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string
is null order by prob_simple_ko asc limit 10000 union all select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm,
e.bf_m2_series, e.chg_dev_silo, a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs
as prob_outs, a.mouse_rank_outs as prob_outs_ko, 'out' as choosen, trim(' 20140927 ') as silo from (select distinct
probe_mgmt_string from tmp_old_report) d right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs,
mouse_rank_simple, mouse_rank_outs from tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2')
and testman_sample_spring >= '021' and months >= 15 and event_type = 'event1' and mouse_rank_simple > mouse_rank_outs) a on
a.probe_mgmt_string = d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm,
round(avg(s.bf_m2_series),0) as bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else
s.prev_intro_chg_silo end) as chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm,
prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring,
testman_sample_spring from master_mers_mst where silo = trim(' 20140927 ') union all select probe_mgmt_string,
marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series,
pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where
s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and s.testman_sample_spring >= '021' and (s.bf_m2_series +
s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm) e on
a.probe_mgmt_string = e.probe_mgmt_string left outer join (select probe_mgmt_string, min(chg_silo) as chg_silo from
master_mers_evthist where chg_silo between date_add(concat(substr(trim(' 20140927 '),1,4),'-',substr(trim(' 20140927 '),
5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and ((probe_chg_spring in ('1','2') and
probe_chg_result_spring in ('11','12')) or probe_chg_spring in ('31','32')) group by probe_mgmt_string) c on
a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string is null order by
prob_outs_ko asc limit 10000) x
12. Accessibility
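Setting per-queue quotas with the Fair Scheduler
[Figure: usage bars on a 0% / 50% / 100% axis for the queues tom, jerry, and default, marking Fair Share and Over Fair Share; values shown: 30%, 30%, 20%, 40%, 40%]
Load 1 (Transform)
Somewhat restrained when only a specific queue is in use
Still a problem when many queues are in use at the same time
Solves the problem of one user monopolizing the cluster
set mapred.job.queue.name=tom;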
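The idea of "fair share" versus "over fair share" can be illustrated with a weighted max-min allocation. A busy queue may borrow capacity that idle queues are not using, and is held to its share once the others become busy. This is a simplified sketch, not the Fair Scheduler's actual code. The weights 30/30/40 and the demand figures are hypothetical.

```python
def fair_allocation(capacity, demands, weights):
    """Weighted max-min fair split of `capacity` among queues with given demands."""
    alloc = {q: 0.0 for q in demands}
    active = {q for q, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[q] for q in active)
        satisfied = set()
        spent = 0.0
        for q in active:
            share = remaining * weights[q] / total_weight
            give = min(share, demands[q] - alloc[q])
            alloc[q] += give
            spent += give
            if demands[q] - alloc[q] <= 1e-9:
                satisfied.add(q)
        remaining -= spent
        if not satisfied:
            break  # every active queue took its full share; nothing left over
        active -= satisfied
    return alloc


weights = {"tom": 30, "jerry": 30, "default": 40}  # hypothetical quotas (%)

# Only tom is busy: it runs far over its fair share.
print(fair_allocation(100, {"tom": 100, "jerry": 0, "default": 0}, weights))

# Every queue is busy: each one is held to its fair share.
print(fair_allocation(100, {"tom": 100, "jerry": 100, "default": 100}, weights))
```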
13. Accessibility
SELECT code, sum(value) FROM Table GROUP BY code;
code | value
a | 1
a | 38
a | 45
b | 9
a | 34
a | 12
a | 78
Mapper, Mapper, Mapper
a -> Reducer
b -> Reducer
"Why won't it finish? It's stuck at 99%!"
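A likely reading of the 99% complaint is key skew. Every row with the same code goes to the same reducer, so the reducer for "a" receives almost all of the data while the reducer for "b" finishes at once. The sketch below replays the slide's table in plain Python. The function names are made up for illustration.

```python
from collections import Counter

# The rows from the slide: (code, value).
rows = [("a", 1), ("a", 38), ("a", 45), ("b", 9), ("a", 34), ("a", 12), ("a", 78)]


def reducer_load(rows):
    """Number of records each key sends to its reducer."""
    return Counter(code for code, _ in rows)


def group_sum(rows):
    """What SELECT code, sum(value) ... GROUP BY code computes."""
    totals = {}
    for code, value in rows:
        totals[code] = totals.get(code, 0) + value
    return totals


print(reducer_load(rows))  # records per reducer key
print(group_sum(rows))     # the query result
```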
14. Accessibility
select /*+ MAPJOIN(b) */ count(*)
from tableA a join tableB b on (a.id = b.id);
"It's far slower than it was before!"
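The MAPJOIN(b) hint asks Hive to load table b into memory on each mapper and join without a reduce phase. That helps only when b really is small, which may be why the hinted query on the slide ran slower. The following is a minimal in-memory sketch of the technique, not Hive's implementation. The function name and the sample rows are hypothetical.

```python
def map_join_count(big_rows, small_rows):
    """Count join matches by probing an in-memory hash table of the small table."""
    # Build phase: hash the small table (b) once, counting rows per id.
    small_index = {}
    for row in small_rows:
        small_index[row["id"]] = small_index.get(row["id"], 0) + 1
    # Probe phase: stream the big table (a) and look up each id.
    # No shuffle or reducer is needed.
    return sum(small_index.get(row["id"], 0) for row in big_rows)


table_a = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]
table_b = [{"id": 2}, {"id": 3}, {"id": 3}, {"id": 4}]
print(map_join_count(table_a, table_b))
```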
17. Accessibility
(Same query as on slide 9.)
21. Accessibility
(Same query as on slide 9.)
4 days 4 hours
25. Accessibility
(Same query as on slide 9.)
4 days 4 hours
2 days 2 hours
27. Expansion
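Current System Analysis
Job | Detail
Storage: Standardization | ETL, Cleansing, Lineage, etc.
Storage: Storage | 20 PB storage capacity
Storage: Supply | Smooth data supply (Real Time / Batch)
Processing: Analysis | Analysis-centric work in R, Python, etc.
Processing: DW/Realtime | Low Latency, Event Processing
Current status
Data Size (compressed / original)
Day | 50 TB / 250 TB
Year | 18.25 PB / 91.25 PB
Job Type
Storage | Standardization / Storage / Supply
Processing | Analysis / Real Time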
30. Expansion : Saturn (Storage)
Disk Fault at Datanode
High IO
Low IO
Eject!
RoundRobin Available Space
31. Expansion : Saturn (Storage)
High Temperature at Datanode
Disk Controller
Putting up with it and still using it
http://rlv.zcache.com/suppressed_laughing_yellow_smiley_face_stickers-r200e51f37ff941a38208de69f6c51657_v9waf_8byvr_512.jpg
40. Expansion : Yarn
For resource management across multiple processing engines
NodeManager
yarn.nodemanager.resource.memory-mb: total memory managed by the NodeManager
yarn.nodemanager.resource.cpu-vcores: number of CPU cores managed by the NodeManager
ResourceManager
yarn.scheduler.minimum-allocation-mb: minimum memory per container that can be allocated on each NodeManager
41. Expansion : Yarn
For resource management across multiple processing engines
ResourceManager: minimum memory 3 GB
NodeManager: total memory 3 GB, usable cores 18
Only 1 container was created per node
We ran like this for a week
Container: task JVM (e.g. one TaskTracker fork)
42. Expansion : Yarn
For resource management across multiple processing engines
ResourceManager: minimum memory 3 GB
NodeManager: total memory 54 GB, usable cores 18
Up to 18 containers can be created per node
Container: task JVM (e.g. one TaskTracker fork)
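The container counts on the last two slides follow from simple division. The sketch below reproduces them under two simplifying assumptions: each container receives exactly the minimum allocation, and each container uses one vcore. The function name is made up, and the real scheduler is more involved.

```python
def max_containers(node_memory_mb, node_vcores, min_allocation_mb, vcores_per_container=1):
    """Upper bound on concurrent containers per node (simplified)."""
    by_memory = node_memory_mb // min_allocation_mb    # limit from memory-mb
    by_cores = node_vcores // vcores_per_container     # limit from cpu-vcores
    return min(by_memory, by_cores)


GB = 1024  # MB per GB

# Misconfigured: 3 GB node memory with a 3 GB minimum allocation.
print(max_containers(3 * GB, 18, 3 * GB))

# Corrected: 54 GB node memory with the same 3 GB minimum allocation.
print(max_containers(54 * GB, 18, 3 * GB))
```

With these inputs the first call gives 1 container and the second gives 18, matching the slides.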
43. Expansion : Compress
We had heard Snappy was good
Capacity was plentiful, so we used it freely!
Raw data 250 TB/day -> Snappy 90 TB/day
32 PB/year
Then we ran short of capacity
Snappy 90 TB/day -> GZip 50 TB/day
The conversion took 2 months.
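The yearly volumes follow directly from the daily figures. This is a small sketch of the arithmetic, assuming decimal units (1 PB = 1000 TB) and a 365-day year. The function name is hypothetical.

```python
def yearly_pb(tb_per_day, days=365):
    """Yearly volume in PB for a given daily volume in TB (1 PB = 1000 TB)."""
    return tb_per_day * days / 1000


RAW_TB_PER_DAY = 250
for name, tb in [("raw", 250), ("snappy", 90), ("gzip", 50)]:
    ratio = tb / RAW_TB_PER_DAY
    print(f"{name}: {yearly_pb(tb):.2f} PB/year, {ratio:.0%} of raw size")
```

Snappy comes out at roughly 32.85 PB/year, close to the 32 PB/year on the slide. GZip comes out at 18.25 PB/year.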
44. Expansion : NameNode HA
We thought automatic failover meant we could relax
ZooKeeper timeout: 60 seconds
A NameNode GC took 3 minutes 30 seconds
It failed over to the Standby, but the Hadoop clients connected only to the original Active
Cluster-wide outage
ZooKeeper timeout: 10 minutes
45. Expansion : Too Many CLOSE_WAIT
MR V1 & DataNode
TaskTracker <-> DataNode
1. connect
2. request block
3. send block
4. close
CLOSE_WAIT / FIN_WAIT
The sockets do not go away even after 2 hours.
Client socket ports are exhausted
TT (TaskTracker) restart
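One way to watch for this condition is to count sockets per TCP state. The sketch below parses text in the column layout of Linux `netstat -tan` (protocol, queues, local address, foreign address, state). The sample lines and addresses are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter


def count_states(netstat_output):
    """Count TCP connections per state from `netstat -tan` style text."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Invented sample: a client host holding connections to a DataNode port.
sample = """\
Proto Recv-Q Send-Q Local Address   Foreign Address  State
tcp   1      0      10.0.0.2:41000  10.0.0.1:50010   CLOSE_WAIT
tcp   1      0      10.0.0.2:41001  10.0.0.1:50010   CLOSE_WAIT
tcp   0      0      10.0.0.2:41002  10.0.0.1:50010   ESTABLISHED
"""

print(count_states(sample))
```

A CLOSE_WAIT count that keeps growing means the local process has not closed sockets that the peer already closed. That matches the port exhaustion described above.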
47. Lessons Learned
Accessibility
1. It must be easy for anyone to access.
2. Even people who cannot program need to understand how it works.
3. If it is easy, many people will use it.
4. Everyone is becoming an analyst.
48. Lessons Learned
Expansion
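1. The network is like Hadoop's blood vessels.
2. It is still too early to use Yarn: there are too many configuration settings, and their interdependencies are too complex.
3. With Hadoop HA, the clients must be checked as well.
4. Even with HA, Hadoop is not truly safe.
5. Yahoo's 2,000 nodes probably had small disks.
6. There is still a lot left to do.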
50. Approximate Query Engine
Like BlinkDB
Bounded response time:
select sum(val1) from table where key='a' within 3 seconds
Bounded error:
select sum(val1) from table where key='a' error rate 10%
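The underlying trade-off can be sketched with plain uniform sampling: sum a random sample and scale it up to the full table size. A larger sample takes longer and gives a tighter estimate. This is only an illustration of the idea, not how BlinkDB is implemented. The function name and the data are hypothetical.

```python
import random


def approximate_sum(values, sample_fraction, seed=0):
    """Estimate sum(values) from a uniform random sample, scaled to full size."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, n)
    return sum(sample) * len(values) / n


values = list(range(1, 100001))  # stand-in for the val1 column
exact = sum(values)
for fraction in (0.001, 0.01, 0.1):
    estimate = approximate_sum(values, fraction)
    error = abs(estimate - exact) / exact
    print(f"sample {fraction:.1%}: estimate {estimate:.0f}, relative error {error:.2%}")
```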
51. Approximate Query Engine
Zoomable Data Navigation
select sum(val1) from table where age between 1 and 10 within 1 seconds
select sum(val1) from table where age between 1 and 10 within 10 seconds