[2A3]Big Data Launching Episodes

NAVER D2
Big Data Launching 
Episodes 
안성화 Manager / Data Tech Lab 
SK Telecom
CONTENTS 
1. Accessibility 
2. Expansion 
3. Lessons Learned 
4. Future
1. Accessibility 
SKT's first Hadoop system
GroupBy & Sum
10 GB/Hour, 100 MB/Hour
Accessibility 
From MapReduce to Hive
1. Group By & Sum
   Map: collect records by Group By key
   Reduce: sum by Group By key
   Select key1, sum(key1) From Table Group by key1;
2. UDF & UDAF
   Map: UDF
   Reduce: UDAF
   Select udf(key1) From Table;
   Select key1, udaf(key1) From Table Group by key1;
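The Map / Reduce split behind Group By & Sum can be sketched in a few lines of Python. This is only an in-memory illustration, not how Hive or Hadoop is implemented; the function names and the sample rows are made up for the example.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for key, value in rows:
        yield key, value


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the collected values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(rows):
    """Equivalent of: SELECT key, sum(value) FROM rows GROUP BY key."""
    return reduce_phase(shuffle(map_phase(rows)))


# Hypothetical sample rows, for illustration only.
print(group_by_sum([("a", 1), ("b", 2), ("a", 3)]))
```

In Hive the same three steps are generated from the single Group By statement, which is the accessibility gain the slide points at.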
Accessibility 
From MapReduce to Hive
3. Transform
   Map: Transform
   Reduce: Transform
   FROM (
     FROM records2
     MAP year, temperature, quality
     USING 'is_good_quality.py'
     AS year, temperature) map_output
   REDUCE year, temperature
   USING 'max_temperature_reduce.py'
   AS year, temperature;
   (Source: Hadoop: The Definitive Guide)
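A script used with MAP ... USING reads whitespace-separated rows and writes tab-separated rows. The sketch below shows what a filter such as is_good_quality.py might look like. The missing-value marker "9999" and the accepted quality codes are assumptions modeled on the book's weather example, and the function names are hypothetical.

```python
import re


def is_good(temperature, quality):
    """Keep a reading unless it is the missing-value marker or has a rejected quality code."""
    # "9999" and the code set [01459] are assumptions, not taken from the slide.
    return temperature != "9999" and re.match(r"[01459]", quality) is not None


def filter_lines(lines):
    """Turn 'year temperature quality' rows into 'year<TAB>temperature' rows."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip malformed rows
        year, temperature, quality = fields
        if is_good(temperature, quality):
            yield f"{year}\t{temperature}"


# As a streaming script, the rows would come from standard input:
#   import sys
#   for out in filter_lines(sys.stdin):
#       print(out)
for out in filter_lines(["1950\t22\t1", "1950\t9999\t1", "1949\t111\t2"]):
    print(out)
```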
Accessibility 
MapReduce에서 Hive로 
4. GUI (Hue) 
http://gethue.com/wp-content/uploads/2014/03/hue-3.6.png
Accessibility 
insert overwrite table tmp_daily_recommendation partition(silo) select x.probe_mgmt_string, x.marang_made_spring, 
x.marang_made_nm, x.bf_m2_series, x.chg_dev_silo, x.prob_simple, x.prob_simple_ko, x.prob_outs, x.prob_outs_ko, x.choosen, 
x.silo from ( select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm, e.bf_m2_series, e.chg_dev_silo, 
a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs as prob_outs, a.mouse_rank_outs 
as prob_outs_ko, 'chg' as choosen, trim(' 20140927 ') as silo from (select distinct probe_mgmt_string from tmp_old_report) d 
right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs, mouse_rank_simple, mouse_rank_outs from 
tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2') and testman_sample_spring >= '021' and 
months >= 15 and event_type = 'event1' and mouse_rank_simple <= mouse_rank_outs) a on a.probe_mgmt_string = 
d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm, round(avg(s.bf_m2_series),0) as 
bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else s.prev_intro_chg_silo end) as 
chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, 
bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst where silo 
= trim(' 20140927 ') union all select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, 
probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from 
master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and 
s.testman_sample_spring >= '21' and (s.bf_m2_series + s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string, 
s.marang_made_spring, s.marang_made_nm) e on a.probe_mgmt_string = e.probe_mgmt_string left outer join (select 
probe_mgmt_string, min(chg_silo) as chg_silo from master_mers_evthist where chg_silo between date_add(concat(substr(trim(' 
20140927 '),1,4),'-',substr(trim(' 20140927 '),5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and 
((probe_chg_spring in ('1','2') and probe_chg_result_spring in ('11', '12')) or probe_chg_spring in ('1','2')) group by 
probe_mgmt_string) c on a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string 
is null order by prob_simple_ko asc limit 10000 union all select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm, 
e.bf_m2_series, e.chg_dev_silo, a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs 
as prob_outs, a.mouse_rank_outs as prob_outs_ko, 'out' as choosen, trim(' 20140927 ') as silo from (select distinct 
probe_mgmt_string from tmp_old_report) d right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs, 
mouse_rank_simple, mouse_rank_outs from tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2') 
and testman_sample_spring >= '021' and months >= 15 and event_type = 'event1' and mouse_rank_simple > mouse_rank_outs) a on 
a.probe_mgmt_string = d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm, 
round(avg(s.bf_m2_series),0) as bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else 
s.prev_intro_chg_silo end) as chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm, 
prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, 
testman_sample_spring from master_mers_mst where silo = trim(' 20140927 ') union all select probe_mgmt_string, 
marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, 
pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where 
s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and s.testman_sample_spring >= '021' and (s.bf_m2_series + 
s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm) e on 
a.probe_mgmt_string = e.probe_mgmt_string left outer join (select probe_mgmt_string, min(chg_silo) as chg_silo from 
master_mers_evthist where chg_silo between date_add(concat(substr(trim(' 20140927 '),1,4),'-',substr(trim(' 20140927 '), 
5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and ((probe_chg_spring in ('1','2') and 
probe_chg_result_spring in ('11','12')) or probe_chg_spring in ('31','32')) group by probe_mgmt_string) c on 
a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string is null order by 
prob_outs_ko asc limit 10000) x
Accessibility 
6491.4 
3200
Accessibility 
Job ID | … | Map % | Map Total | Maps Completed
Job_1  | … | 12%   | 120,000   | 12,000
Job_2  | … | 0%    | 512       | 0
…      | … | …     | …         | …
Accessibility 
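Set per-queue quotas with the Fair Scheduler
Queues: tom, jerry, default
[Chart: Fair Share vs. Over Fair Share per queue on a 0% to 100% axis; values shown: 30%, 30%, 20%, 40%, 40%; Load 1 (Transform)]
Somewhat curbed when only one particular queue is in use
Still a problem when several queues are in use at the same time
Solves the problem of one user monopolizing the cluster
set mapred.job.queue.name=tom;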
Accessibility 
SELECT code, sum(value) FROM Table GROUP BY code;

code | value
a    | 1
a    | 38
a    | 45
b    | 9
a    | 34
a    | 12
a    | 78

Mapper, Mapper, Mapper -> Reducer (a), Reducer (b)
"Why won't it finish at 99%?!!!"
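The stall at 99% is a key-skew effect: nearly every row carries the key a, so one reducer receives almost all of the work. A small sketch using the rows from the slide follows. The partition function is a deterministic toy stand-in chosen for the example, not Hadoop's actual partitioner.

```python
from collections import Counter

# (code, value) rows as shown on the slide.
ROWS = [("a", 1), ("a", 38), ("a", 45), ("b", 9), ("a", 34), ("a", 12), ("a", 78)]


def partition(key, num_reducers):
    """Toy partitioner (an assumption): byte sum of the key modulo the reducer count."""
    return sum(key.encode("utf-8")) % num_reducers


def rows_per_reducer(rows, num_reducers):
    """Count how many rows land on each reducer when rows are partitioned by key."""
    return Counter(partition(key, num_reducers) for key, _ in rows)


load = rows_per_reducer(ROWS, num_reducers=2)
print(load)  # one reducer gets six rows, the other gets one
```

The job as a whole can only finish when the most heavily loaded reducer finishes, however quickly the others complete.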
Accessibility 
select /*+ MAPJOIN(b) */ count(*) from tableA a join tableB b on (a.id = b.id);
"It's much slower than before!!!"
Accessibility 
hadoop fs -text xxx.snappy > xxx.gzip
hadoop fs -put xxx.gzip /
[Illustration: opaque compressed bytes of xxx.gzip]
UnSplittable!!
Only 1 Mapper!!
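Why an unsplittable file ends up with a single mapper can be shown with rough arithmetic. This is a simplified estimate for one input file. The 128 MiB split size and the 10 GiB file size are assumptions chosen for the example.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimated_mappers(file_size_bytes, split_size_bytes, splittable):
    """Rough number of map tasks for a single input file."""
    if not splittable:
        # A file that cannot be split must be read from start to end by one task.
        return 1
    return max(1, math.ceil(file_size_bytes / split_size_bytes))


# Hypothetical 10 GiB file with an assumed 128 MiB split size.
print(estimated_mappers(10 * GIB, 128 * MIB, splittable=True))
print(estimated_mappers(10 * GIB, 128 * MIB, splittable=False))
```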
Accessibility 
http://www.bbc.co.uk/bitesize/ks3/maths/shape_space/2d_shapes/revision/3/
Accessibility 
(The tmp_daily_recommendation query shown earlier, repeated on this slide.)
Accessibility 
1,000,000,000,000 
100 TB 
4 days 4 hours
Accessibility 
(The tmp_daily_recommendation query shown earlier, repeated on this slide.)
4 days 4 hours
http://www.ldn.net.au/wp-content/uploads/2013/09/boost-your-marketing-effectiveness.png
January, February, March, April 
http://fc06.deviantart.net/fs70/f/2009/357/f/c/Hopelessness_by_sarafim.jpg
Accessibility 
From Hive to Tajo/Impala
[Chart: Data Size / Date, April to July, vertical axis 0 to 100; series: Impala, Tajo]
Accessibility 
(The tmp_daily_recommendation query shown earlier, repeated on this slide.)
4 days 4 hours 
2 days 2 hours
2. Expansion
Expansion 
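Current System Analysis

Job | Detail
Storage
  Standardization | ETL, Cleansing, Lineage, etc.
  Storage         | 20 PB storage capacity
  Supply          | Smooth data supply (Real Time / Batch)
Processing
  Analysis        | Analysis-centric: R, Python, etc.
  DW/Realtime     | Low Latency, Event Processing

Current status
Data Size (compressed / original)
  Day  | 50 TB / 250 TB
  Year | 18.25 PB / 91.25 PB
Job Type
  Storage    | standardization / storage / supply
  Processing | analysis / Real Time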
Expansion 
BigBang!
Jupiter (Analysis): 3 PB
Saturn (Storage): 20 PB
Neptune (Real Time): 1 PB
Flume
Expansion 
Saturn (Storage) Cluster Topology
[Diagram: racks of nodes with labels AS, DS, 2G, 40G, 10G X 2 Bonding, Rack awareness, 4TB X 12]
Expansion : Saturn (Storage) 
Disk Fault at Datanode 
High IO 
Low IO 
Eject! 
RoundRobin Available Space
Expansion : Saturn (Storage) 
High Temperature at Datanode 
Disk Controller 
Living with it for now 
http://rlv.zcache.com/suppressed_laughing_yellow_smiley_face_stickers-r200e51f37ff941a38208de69f6c51657_v9waf_8byvr_512.jpg
Expansion : Saturn (Storage) + Flume 
Saturn (Storage) Cluster Topology + Flume 
Flume 
Compress & Send / 1 minute 
Dynamic Frequency Scaling 
Maximum Performance
Expansion : Saturn (Storage) + Flume 
Saturn (Storage) Cluster Topology + Flume 
Flume 
Sending… 
Sudden Fault Disk at DataNode 
Eventual Sending using SSD
Expansion : Neptune (DW) 
Neptune (DW) Cluster Topology
[Diagram: racks of nodes with labels AS, DS, 10G, 40G, X 2, 20G Bonding, Rack awareness, 1TB X 23 SAS]
Expansion : Neptune (DW) 
Network items that must be checked when bandwidth is high
20G Bonding
$ ifconfig
…
eth0 Link encap:Ethernet HWaddr 38:EA:A7:38:53:24
     UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
     RX packets:7170284853 errors:2456 dropped:25019 overruns:0 frame:2456
     TX packets:31088639355 errors:0 dropped:0 overruns:0 carrier:0
     collisions:0 txqueuelen:10000
     RX bytes:41081083208513 (37.3 TiB) TX bytes:40786177694493 (37.0 TiB)
…
GBIC
$ ethtool -S eth0
…
rx_crc_errors : 2456
…
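The RX counters in the ifconfig output can be turned into rates, which makes it easier to compare nodes or to watch a link over time. A minimal sketch that parses the RX line copied from the slide follows; the helper names are hypothetical.

```python
import re

# RX line copied from the ifconfig output on the slide.
RX_LINE = "RX packets:7170284853 errors:2456 dropped:25019 overruns:0 frame:2456"


def parse_counters(line):
    """Parse the 'name:value' counters from one ifconfig RX/TX line."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", line)}


def rate(count, packets):
    """Fraction of packets affected; 0.0 when no packets were seen."""
    return count / packets if packets else 0.0


rx = parse_counters(RX_LINE)
print(f"RX error rate: {rate(rx['errors'], rx['packets']):.2e}")
print(f"RX drop rate:  {rate(rx['dropped'], rx['packets']):.2e}")
```

The absolute rates here are small. What the slide draws attention to is that the errors and frame counters match rx_crc_errors (2456).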
Expansion : Neptune (DW) 
Network items that must be checked when bandwidth is high
https://c0da80aa54a5e1ed7d2b945327c31140a345bfe8.googledrive.com/host/0BxotWZXnwSAGSS1qRE02eWVrU28/2013-07-kernel-networking-ring-buffer.png
Expansion : Neptune (DW) 
Network items that must be checked when bandwidth is high
# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:        4096  (maximum)
RX Mini:   0
RX Jumbo:  0
TX:        4096
Current hardware settings:
RX:        512   (current)
RX Mini:   0
RX Jumbo:  0
TX:        512
Frequent frame packet drops occur
# ethtool -G eth0 rx 2048
Expansion : Neptune (DW) 
Network items that must be checked when bandwidth is high
[Diagram: socket with Receive Buffer (R) and Send Buffer (S); labels: rmem_max, wmem_max, tcp_mem, tcp_rmem, tcp_wmem]
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 204800 204800 16777216
net.ipv4.tcp_wmem = 204800 204800 16777216
Expansion 
BigBang!
Jupiter (Analysis): 3 PB
Saturn (Storage): 20 PB
Neptune (Real Time): 1 PB
Flume
Expansion : Yarn 
For resource management across multiple processing engines
NodeManager
  yarn.nodemanager.resource.memory-mb: total memory managed by the NodeManager
  yarn.nodemanager.resource.cpu-vcores: number of CPU cores managed by the NodeManager
ResourceManager
  yarn.scheduler.minimum-allocation-mb: minimum memory per container that can be allocated on each NodeManager
Expansion : Yarn 
For resource management across multiple processing engines
ResourceManager: minimum memory 3G
NodeManager: total memory 3G, usable cores 18
Only 1 container created per node
Ran like this for a week
Container = Task JVM (e.g., one TaskTracker fork)
Expansion : Yarn 
For resource management across multiple processing engines
ResourceManager: minimum memory 3G
NodeManager: total memory 54G, usable cores 18
Up to 18 containers can be created per node
Container = Task JVM (e.g., one TaskTracker fork)
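The difference between the two configurations comes down to one division. The sketch below is a simplified upper bound that assumes every container asks for the minimum allocation and one vcore. Real scheduling also depends on the scheduler in use and on the sizes actually requested.

```python
GB_IN_MB = 1024


def max_containers(node_memory_mb, min_allocation_mb, vcores):
    """Upper bound on containers per node, limited by memory and by cores.

    Assumes each container uses the minimum allocation and one vcore.
    """
    by_memory = node_memory_mb // min_allocation_mb
    return min(by_memory, vcores)


# First configuration: NodeManager memory 3G, minimum allocation 3G, 18 cores.
print(max_containers(3 * GB_IN_MB, 3 * GB_IN_MB, 18))
# Corrected configuration: NodeManager memory 54G, minimum allocation 3G, 18 cores.
print(max_containers(54 * GB_IN_MB, 3 * GB_IN_MB, 18))
```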
Expansion : Compress 
We had heard that Snappy was good
Capacity was plentiful, so we used it freely!
Raw Data 250 TB/day -> Snappy 90 TB/day
32 PB/year
Capacity ran short
Snappy 90 TB/day -> GZip 50 TB/day
It took 2 months.
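The capacity problem follows from the daily volumes. A quick check against the 20 PB storage capacity quoted for Saturn, assuming 365 days per year and decimal units (1 PB = 1000 TB):

```python
STORAGE_CAPACITY_PB = 20  # Saturn storage capacity from the earlier slide


def yearly_pb(tb_per_day, days=365, tb_per_pb=1000):
    """Yearly volume in PB for a given daily volume in TB (decimal units assumed)."""
    return tb_per_day * days / tb_per_pb


for name, tb_per_day in [("Raw", 250), ("Snappy", 90), ("GZip", 50)]:
    volume = yearly_pb(tb_per_day)
    fits = volume <= STORAGE_CAPACITY_PB
    print(f"{name}: {volume:.2f} PB/year, fits in {STORAGE_CAPACITY_PB} PB: {fits}")
```

Under these assumptions Snappy comes to just under 33 PB per year, in line with the roughly 32 PB/year on the slide, while GZip stays below the 20 PB capacity.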
Expansion : NameNode HA 
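We thought automatic failover meant we could relax
Zookeeper Timeout: 60 seconds
NameNode GC took 3 minutes 30 seconds
Failover to the Standby happened, but the Hadoop clients kept connecting only to the original Active
Whole-cluster outage
Zookeeper Timeout: 10 minutes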
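The failure reduces to a comparison between the GC pause and the ZooKeeper timeout. The sketch below states that comparison with the numbers from the slide. It is a simplification that treats the NameNode as unresponsive for the whole pause, and the function name is hypothetical.

```python
def triggers_failover(pause_seconds, session_timeout_seconds):
    """A stall longer than the ZooKeeper timeout lets the session expire, which starts a failover."""
    return pause_seconds > session_timeout_seconds


GC_PAUSE = 3 * 60 + 30  # 3 minutes 30 seconds, from the slide

print(triggers_failover(GC_PAUSE, 60))       # original timeout: 60 seconds
print(triggers_failover(GC_PAUSE, 10 * 60))  # changed timeout: 10 minutes
```

A longer timeout avoids failover during a long GC pause, at the cost of slower detection when the Active NameNode really is down.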
Expansion : Too Many CLOSE_WAIT 
MR V1 & Datanode
TaskTracker -> DataNode
1. connect
2. request block
3. send block
4. close
CLOSE_WAIT / FIN_WAIT
The sockets do not go away within 2 hours.
Client socket ports exhausted
TT Restart
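One way to watch for this on a Linux node is to count sockets in the CLOSE_WAIT state. The sketch below parses text in the /proc/net/tcp format, where the st column holds the state as a hex code and 08 means CLOSE_WAIT. The sample rows are invented for the example.

```python
CLOSE_WAIT = "08"  # TCP state code for CLOSE_WAIT in /proc/net/tcp


def count_close_wait(proc_net_tcp_text):
    """Count sockets in CLOSE_WAIT from text in /proc/net/tcp format."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Column order: sl, local_address, rem_address, st, ...
        if len(fields) > 3 and fields[3] == CLOSE_WAIT:
            count += 1
    return count


# Invented sample rows, for illustration only.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:C350 0100007F:C396 08 00000000:00000001 00:00000000 00000000  1000        0 1111
   1: 0100007F:C396 0100007F:C350 05 00000000:00000000 00:00000000 00000000  1000        0 1112
   2: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1113
"""

print(count_close_wait(SAMPLE))
```

On a live node the input would be the contents of /proc/net/tcp; a count that keeps growing points at the port exhaustion described above.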
3. Lessons Learned
Lessons Learned 
Accessibility 
1. It must be easy for anyone to access. 
2. Even people who cannot program must understand how it works. 
3. If it is easy, many people will use it. 
4. Everyone is becoming an analyst.
Lessons Learned 
Expansion 
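1. The network is like Hadoop's blood vessels. 
2. It is still too early to use Yarn. There are too many configuration settings, and their interdependencies are too complex. 
3. For Hadoop HA, the clients must be verified as well. 
4. Even with HA, Hadoop is not truly safe. 
5. Yahoo's 2,000 nodes probably had small disks. 
6. There is still much to do.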
4. Future
Approximate Query Engine 
Like BlinkDB
select sum(val1) from table where key='a' within 3 seconds
select sum(val1) from table where key='a' Error Rate 10%
Approximate Query Engine 
Zoomable Data Navigation 
select sum(val1) from table where age between 1 and 10 within 1 seconds
select sum(val1) from table where age between 1 and 10 within 10 seconds
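The idea of trading accuracy for response time can be illustrated with plain uniform sampling: read only a fraction of the rows and scale the partial sum up. This is a toy sketch, not how BlinkDB or any planned engine works. The function name, parameters, and data are hypothetical.

```python
import random


def approximate_sum(values, sample_fraction, seed=0):
    """Estimate sum(values) from a uniform random sample, scaled by the inverse sampling rate."""
    if not values:
        return 0.0
    rng = random.Random(seed)
    sample_size = max(1, int(len(values) * sample_fraction))
    sample = rng.sample(values, sample_size)
    return sum(sample) * len(values) / sample_size


# Hypothetical data: a larger fraction costs more reading time but gives a tighter estimate.
data = [float(i % 10) for i in range(10_000)]
print("exact:     ", sum(data))
print("1% sample: ", approximate_sum(data, 0.01))
print("10% sample:", approximate_sum(data, 0.10))
```

A query such as "within 1 seconds" would correspond to a small sample and "within 10 seconds" to a larger one, which is the zooming behavior the slide describes.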
Q&A
THANK YOU
1 of 53

Recommended

From Tensorflow Graph to Tensorflow Eager by
From Tensorflow Graph to Tensorflow EagerFrom Tensorflow Graph to Tensorflow Eager
From Tensorflow Graph to Tensorflow EagerGuy Hadash
585 views22 slides
Dynamic Mesh in OpenFOAM by
Dynamic Mesh in OpenFOAMDynamic Mesh in OpenFOAM
Dynamic Mesh in OpenFOAMFumiya Nozaki
120.8K views111 slides
CFD for Rotating Machinery using OpenFOAM by
CFD for Rotating Machinery using OpenFOAMCFD for Rotating Machinery using OpenFOAM
CFD for Rotating Machinery using OpenFOAMFumiya Nozaki
88.4K views90 slides
Adding Statistical Functionality to the DATA Step with PROC FCMP by
Adding Statistical Functionality to the DATA Step with PROC FCMPAdding Statistical Functionality to the DATA Step with PROC FCMP
Adding Statistical Functionality to the DATA Step with PROC FCMPJacques Rioux
967 views30 slides
3장 자동적으로 움직이는 게임 에이전트 생성법_2 by
3장 자동적으로 움직이는 게임 에이전트 생성법_23장 자동적으로 움직이는 게임 에이전트 생성법_2
3장 자동적으로 움직이는 게임 에이전트 생성법_2suitzero
471 views19 slides
openFrameworks 007 - 3D by
openFrameworks 007 - 3DopenFrameworks 007 - 3D
openFrameworks 007 - 3Droxlu
23K views30 slides

More Related Content

What's hot

openFrameworks 007 - graphics by
openFrameworks 007 - graphicsopenFrameworks 007 - graphics
openFrameworks 007 - graphicsroxlu
24.6K views37 slides
openFrameworks 007 - video by
openFrameworks 007 - videoopenFrameworks 007 - video
openFrameworks 007 - videoroxlu
19K views18 slides
Stabilizer: Statistically Sound Performance Evaluation by
Stabilizer: Statistically Sound Performance EvaluationStabilizer: Statistically Sound Performance Evaluation
Stabilizer: Statistically Sound Performance EvaluationEmery Berger
17.2K views152 slides
Mnistauto 5 by
Mnistauto 5Mnistauto 5
Mnistauto 5Ali Rıza SARAL
78 views31 slides
Assignment 3 by
Assignment 3Assignment 3
Assignment 3Ayesha Bhatti
245 views3 slides
Simple, fast, and scalable torch7 tutorial by
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorialJin-Hwa Kim
143 views25 slides

What's hot(7)

openFrameworks 007 - graphics by roxlu
openFrameworks 007 - graphicsopenFrameworks 007 - graphics
openFrameworks 007 - graphics
roxlu24.6K views
openFrameworks 007 - video by roxlu
openFrameworks 007 - videoopenFrameworks 007 - video
openFrameworks 007 - video
roxlu19K views
Stabilizer: Statistically Sound Performance Evaluation by Emery Berger
Stabilizer: Statistically Sound Performance EvaluationStabilizer: Statistically Sound Performance Evaluation
Stabilizer: Statistically Sound Performance Evaluation
Emery Berger17.2K views
Simple, fast, and scalable torch7 tutorial by Jin-Hwa Kim
Simple, fast, and scalable torch7 tutorialSimple, fast, and scalable torch7 tutorial
Simple, fast, and scalable torch7 tutorial
Jin-Hwa Kim143 views

[2A3]Big Data Launching Episodes

  • 3. Big Data Launching Episodes 안성화 Manager / Data Tech Lab SK Telecom
  • 4. CONTENTS 1. Accessibility 2. Expansion 3. Lessons Learned 4. Future
  • 5. 1. Accessibility 10 GB/Hour 100 MB/Hour GroupBy & Sum SKT's first Hadoop system
  • 6. Accessibility: from MapReduce to Hive. 1. Group By & Sum: Map: collect by Group By key; Reduce: sum by Group By key. SELECT key1, sum(key1) FROM Table GROUP BY key1; 2. UDF & UDAF: Map: UDF; Reduce: UDAF. SELECT udf(key1) FROM Table; SELECT key1, udaf(key1) FROM Table GROUP BY key1;
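The Group By & Sum mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (map emits key/value pairs, the shuffle collects them by key, reduce sums them); the function names `map_phase`, `shuffle`, `reduce_phase`, and `group_by_sum` are made up for this sketch and are not Hadoop or Hive APIs.

```python
from collections import defaultdict


def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for key, value in rows:
        yield key, value


def shuffle(pairs):
    """Shuffle: collect every value that shares a group-by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the collected values of each key."""
    return {key: sum(values) for key, values in groups.items()}


def group_by_sum(rows):
    """Rough equivalent of SELECT key, sum(value) ... GROUP BY key."""
    return reduce_phase(shuffle(map_phase(rows)))


totals = group_by_sum([("a", 1), ("b", 2), ("a", 3)])
```

In Hive the same result comes from a single GROUP BY statement, which is the accessibility gain the slide points at.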
  • 7. Accessibility: from MapReduce to Hive. 3. Transform: Map: Transform; Reduce: Transform. FROM (FROM records2 MAP year, temperature, quality USING 'is_good_quality.py' AS year, temperature) map_output REDUCE year, temperature USING 'max_temperature_reduce.py' AS year, temperature; (Source: Hadoop: The Definitive Guide)
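A script used with MAP ... USING reads tab-separated rows on standard input and writes tab-separated rows on standard output. The sketch below shows what a filter like 'is_good_quality.py' might look like. The missing-value sentinel `9999` and the set of accepted quality codes are assumptions based on the weather dataset used in the book, and `filter_good` is a name chosen for this sketch.

```python
import re

MISSING = "9999"                       # assumed sentinel for a missing reading
GOOD_QUALITY = re.compile(r"[01459]")  # assumed set of acceptable quality codes


def filter_good(lines):
    """Yield 'year<TAB>temperature' for rows with a usable reading."""
    for line in lines:
        fields = line.strip().split()
        if len(fields) != 3:
            continue  # skip malformed rows
        year, temperature, quality = fields
        if temperature != MISSING and GOOD_QUALITY.fullmatch(quality):
            yield f"{year}\t{temperature}"


sample = ["1950\t22\t1\n", "1950\t9999\t1\n", "1949\t111\t2\n"]
kept = list(filter_good(sample))
```

As a standalone script, it would iterate over `filter_good(sys.stdin)` and print each result, so Hive can stream rows through it.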
  • 8. Accessibility: from MapReduce to Hive. 4. GUI (Hue) http://gethue.com/wp-content/uploads/2014/03/hue-3.6.png
  • 9. Accessibility insert overwrite table tmp_daily_recommendation partition(silo) select x.probe_mgmt_string, x.marang_made_spring, x.marang_made_nm, x.bf_m2_series, x.chg_dev_silo, x.prob_simple, x.prob_simple_ko, x.prob_outs, x.prob_outs_ko, x.choosen, x.silo from ( select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm, e.bf_m2_series, e.chg_dev_silo, a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs as prob_outs, a.mouse_rank_outs as prob_outs_ko, 'chg' as choosen, trim(' 20140927 ') as silo from (select distinct probe_mgmt_string from tmp_old_report) d right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs, mouse_rank_simple, mouse_rank_outs from tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2') and testman_sample_spring >= '021' and months >= 15 and event_type = 'event1' and mouse_rank_simple <= mouse_rank_outs) a on a.probe_mgmt_string = d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm, round(avg(s.bf_m2_series),0) as bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else s.prev_intro_chg_silo end) as chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst where silo = trim(' 20140927 ') union all select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and s.testman_sample_spring >= '21' and (s.bf_m2_series + s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm) e on 
a.probe_mgmt_string = e.probe_mgmt_string left outer join (select probe_mgmt_string, min(chg_silo) as chg_silo from master_mers_evthist where chg_silo between date_add(concat(substr(trim(' 20140927 '),1,4),'-',substr(trim(' 20140927 '),5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and ((probe_chg_spring in ('1','2') and probe_chg_result_spring in ('11', '12')) or probe_chg_spring in ('1','2')) group by probe_mgmt_string) c on a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string is null order by prob_simple_ko asc limit 10000 union all select a.probe_mgmt_string, e.marang_made_spring, e.marang_made_nm, e.bf_m2_series, e.chg_dev_silo, a.mouse_prob_simple as prob_simple, a.mouse_rank_simple as prob_simple_ko, a.mouse_prob_outs as prob_outs, a.mouse_rank_outs as prob_outs_ko, 'out' as choosen, trim(' 20140927 ') as silo from (select distinct probe_mgmt_string from tmp_old_report) d right outer join (select probe_mgmt_string, mouse_prob_simple, mouse_prob_outs, mouse_rank_simple, mouse_rank_outs from tmp_comp_post_prob_new where silo = trim(' 20140927 ') and pado_spring in ('1','2') and testman_sample_spring >= '021' and months >= 15 and event_type = 'event1' and mouse_rank_simple > mouse_rank_outs) a on a.probe_mgmt_string = d.probe_mgmt_string join (select s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm, round(avg(s.bf_m2_series),0) as bf_m2_series, max(case when s.prev_intro_chg_silo like '#%' then s.probe_test_silo else s.prev_intro_chg_silo end) as chg_dev_silo from (select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, probe_st_spring, testman_sample_spring from master_mers_mst where silo = trim(' 20140927 ') union all select probe_mgmt_string, marang_made_spring, marang_made_nm, prev_intro_chg_silo, probe_test_silo, bf_m2_series, bf_m3_series, bf_m4_series, pado_spring, 
probe_st_spring, testman_sample_spring from master_mers_mst_enc where silo = trim(' 2014-09-27 ')) s where s.pado_spring in ('1','2') and s.probe_st_spring = 'song' and s.testman_sample_spring >= '021' and (s.bf_m2_series + s.bf_m3_series + s.bf_m4_series) >= 60000 group by s.probe_mgmt_string, s.marang_made_spring, s.marang_made_nm) e on a.probe_mgmt_string = e.probe_mgmt_string left outer join (select probe_mgmt_string, min(chg_silo) as chg_silo from master_mers_evthist where chg_silo between date_add(concat(substr(trim(' 20140927 '),1,4),'-',substr(trim(' 20140927 '), 5,2),'-',substr(trim(' 20140927 '),7,2)),-60) and trim(' 20140927 ') and ((probe_chg_spring in ('1','2') and probe_chg_result_spring in ('11','12')) or probe_chg_spring in ('31','32')) group by probe_mgmt_string) c on a.probe_mgmt_string = c.probe_mgmt_string where c.probe_mgmt_string is null and d.probe_mgmt_string is null order by prob_outs_ko asc limit 10000) x
  • 11. Accessibility: job list (columns: Job ID, …, Map %, Map Total, Maps Completed). Job_1: Map % 12%, Map Total 120,000, Maps Completed 12,000. Job_2: Map % 0%, Map Total 512, Maps Completed 0. …
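  • 12. Accessibility: setting per-queue quotas with the Fair Scheduler. Queues: tom, jerry, default. [Chart: Fair Share vs. Over Fair Share per queue on a 0% to 100% scale; labeled values 30%, 30%, 20%, 40%, 40%.] Load 1 (Transform). Somewhat restrained when only one particular queue is in use; still a problem when several queues are in use at the same time. Solves the monopolization problem. set mapred.job.queue.name=tom;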
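The idea of a per-queue fair share can be illustrated with a weighted max-min allocation. This is a simplified sketch, not the Hadoop Fair Scheduler's actual implementation. The weights (40/40/20) and demands below are hypothetical values chosen for the example and are not read off the slide's chart.

```python
def fair_shares(capacity, demands, weights):
    """Weighted max-min fair split of `capacity` among queues.

    A queue never receives more than it demands; capacity left over by
    lightly loaded queues is redistributed to the queues that still want more.
    """
    alloc = {queue: 0.0 for queue in demands}
    active = {queue for queue, demand in demands.items() if demand > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[queue] for queue in active)
        satisfied = set()
        spent = 0.0
        for queue in active:
            offer = remaining * weights[queue] / total_weight
            give = min(offer, demands[queue] - alloc[queue])
            alloc[queue] += give
            spent += give
            if alloc[queue] >= demands[queue] - 1e-9:
                satisfied.add(queue)
        remaining -= spent
        if not satisfied:
            break  # every active queue took its full offer
        active -= satisfied
    return alloc


weights = {"tom": 40, "jerry": 40, "default": 20}
alone = fair_shares(100, {"tom": 100, "jerry": 0, "default": 0}, weights)
busy = fair_shares(100, {"tom": 100, "jerry": 100, "default": 100}, weights)
```

With only `tom` submitting work, the sketch lets that queue run over its share; when all three queues are busy, each is held to its weighted share.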
  • 13. Accessibility: SELECT code, sum(value) FROM Table GROUP BY code; Table rows (code | value): a | 1, a | 38, a | 45, b | 9, a | 34, a | 12, a | 78. Three Mappers feed two Reducers, one for key a and one for key b. "Why doesn't it finish at 99%?!!!"
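The rows on this slide show a skewed key distribution, which can be reproduced with a toy partitioner. The sketch below routes each row to a reducer by its key and counts the rows per reducer. The byte-sum partitioner is an assumption made to keep the example deterministic; it is not Hadoop's real hash partitioner.

```python
from collections import Counter


def reducer_load(rows, num_reducers):
    """Count how many rows each reducer receives when rows are routed by key."""
    load = Counter()
    for code, _value in rows:
        # Toy partitioner: sum of the key's UTF-8 bytes modulo the reducer count.
        partition = sum(code.encode("utf-8")) % num_reducers
        load[partition] += 1
    return load


rows = [("a", 1), ("a", 38), ("a", 45), ("b", 9), ("a", 34), ("a", 12), ("a", 78)]
load = reducer_load(rows, num_reducers=2)
# Reducer 1 (key "a") receives 6 rows; reducer 0 (key "b") receives 1.
```

When one key dominates, the reducer that owns it is still working long after the others have finished, which is one common reason a job appears to hang near completion.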
  • 14. Accessibility: select /*+ MAPJOIN(b) */ count(*) from tableA a join tableB b on (a.id = b.id); "It's far slower than before!!!"
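A map-side join works by loading the hinted table into memory on every mapper and streaming the other table past it. The sketch below shows that build-and-probe pattern for the count(*) query; `map_join_count` and the sample rows are hypothetical. The slide does not say why the query got slower. One plausible cause, offered here only as an assumption, is that the hinted table was too large to hold in memory efficiently.

```python
def map_join_count(big_rows, small_rows):
    """Count join matches on `id`, keeping only the small table in memory."""
    # Build phase: id -> number of rows in the small table with that id.
    small_counts = {}
    for row in small_rows:
        small_counts[row["id"]] = small_counts.get(row["id"], 0) + 1
    # Probe phase: stream the big table and look each id up in memory.
    return sum(small_counts.get(row["id"], 0) for row in big_rows)


table_a = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 3}]
table_b = [{"id": 2}, {"id": 2}, {"id": 4}]
matches = map_join_count(table_a, table_b)
```

The build phase is repeated by every mapper, so the approach pays off only while the in-memory table stays small.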
  • 15. Accessibility: hadoop fs -text xxx.snappy > xxx.gzip; hadoop fs -put xxx.gzip / [Illustration: unreadable compressed file contents.] UnSplittable!! Only 1 Mapper!!
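A gzip file cannot be split, so one mapper has to read it from start to finish no matter how large it is. The sketch below estimates the mapper count under that rule. It is a simplification: real split calculation has more inputs, and `estimate_mappers` and the 128 MiB split size are assumptions for this example.

```python
import math


def estimate_mappers(file_size, split_size, splittable):
    """Rough mapper count for one input file (sizes in bytes)."""
    if file_size <= 0:
        return 0
    if not splittable:
        return 1  # e.g. a gzip file: one mapper reads the whole file
    return math.ceil(file_size / split_size)


GIB = 1024 ** 3
MIB = 1024 ** 2
plain = estimate_mappers(10 * GIB, 128 * MIB, splittable=True)
gzipped = estimate_mappers(10 * GIB, 128 * MIB, splittable=False)
```

For a 10 GiB input, the splittable case spreads the work over 80 mappers while the unsplittable case leaves a single mapper with all of it.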
  • 17. Accessibility [The same tmp_daily_recommendation query as on slide 9.]
  • 19. Accessibility 1,000,000,000,000 100 TB
  • 21. Accessibility [The same tmp_daily_recommendation query as on slide 9.] 4 days 4 hours
  • 23. January, February, March, April http://fc06.deviantart.net/fs70/f/2009/357/f/c/Hopelessness_by_sarafim.jpg
  • 24. Accessibility: From Hive to Tajo/Impala. [Chart: Data Size / Date, y-axis 0 to 100, x-axis April to July; series: Impala, Tajo]
  • 25. Accessibility [the tmp_daily_recommendation query from the preceding query slide, shown again] 4 days 4 hours 2 days 2 hours
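  • 27. Expansion: Current System Analysis. Job Detail: Storage: Standardization (ETL, Cleansing, Lineage, etc.), Storage (20 PB storage capacity), Supply (smooth data supply, Real Time / Batch); Processing: Analysis (analysis-centric, R, Python, etc.), DW/Realtime (Low Latency, Event Processing). Current status: Data Size (compressed / original): Day 50 TB / 250 TB, Year 18.25 PB / 91.25 PB. Job Type: Storage (standardization/storage/supply), Processing (analysis/Real Time)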
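The yearly figures on this slide follow directly from the daily ones. A minimal sketch of the arithmetic, assuming a 365-day year and decimal units (1 PB = 1,000 TB); the function name is made up for illustration:

```python
DAYS_PER_YEAR = 365   # assumption: plain calendar year
TB_PER_PB = 1000      # assumption: decimal units, which match the slide's numbers


def yearly_pb(daily_tb: float) -> float:
    """Convert a daily data volume in TB into a yearly volume in PB."""
    return daily_tb * DAYS_PER_YEAR / TB_PER_PB


compressed_pb = yearly_pb(50)    # compressed volume per day from the slide
original_pb = yearly_pb(250)     # original (uncompressed) volume per day
print(f"compressed: {compressed_pb} PB/year, original: {original_pb} PB/year")
```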
  • 28. Expansion: Jupiter (Analysis) 3 PB; BigBang!; Saturn (Storage) 20 PB; Neptune (Real Time) 1 PB; Flume
  • 29. Expansion: Saturn (Storage) Cluster Topology: 2G, 40G, DS, 10G x 2 Bonding, Rack awareness, 4 TB x 12, AS
  • 30. Expansion: Saturn (Storage). Disk Fault at Datanode: High IO, Low IO, Eject! RoundRobin, Available Space
  • 31. Expansion: Saturn (Storage). High Temperature at Datanode Disk Controller. Putting up with it and still using it for now. http://rlv.zcache.com/suppressed_laughing_yellow_smiley_face_stickers-r200e51f37ff941a38208de69f6c51657_v9waf_8byvr_512.jpg
  • 32. Expansion: Saturn (Storage) + Flume. Saturn (Storage) Cluster Topology + Flume. Flume: Compress & Send / 1 minute. Dynamic Frequency Scaling, Maximum Performance
  • 33. Expansion: Saturn (Storage) + Flume. Saturn (Storage) Cluster Topology + Flume. Flume Sending… Sudden Fault Disk at DataNode. Eventual Sending using SSD
  • 34. Expansion: Neptune (DW). Neptune (DW) Cluster Topology: 10G, 40G x 2, 20G Bonding, Rack awareness, 1 TB x 23 SAS, DS, AS
  • 35. Expansion: Neptune (DW). Network items that must be checked when bandwidth is high. 20G Bonding:
      $ ifconfig
      …
      eth0 Link encap:Ethernet HWaddr 38:EA:A7:38:53:24
      UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
      RX packets:7170284853 errors:2456 dropped:25019 overruns:0 frame:2456
      TX packets:31088639355 errors:0 dropped:0 overruns:0 carrier:0
      collisions:0 txqueuelen:10000
      RX bytes:41081083208513 (37.3 TiB) TX bytes:40786177694493 (37.0 TiB)
      …
    GBIC:
      $ ethtool -S eth0
      …
      rx_crc_errors : 2456
      …
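The counters on this slide can be read programmatically rather than by eye. A minimal sketch, assuming the older `ifconfig` output format shown on the slide (`RX packets:N errors:N dropped:N`); the function name and the returned keys are hypothetical:

```python
import re

# Matches the legacy ifconfig RX counter line, as printed on the slide.
RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+)")


def rx_error_stats(ifconfig_text: str) -> dict:
    """Extract RX packet, error, and drop counters plus their ratios."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no 'RX packets:' line found")
    packets, errors, dropped = (int(group) for group in match.groups())
    return {
        "packets": packets,
        "errors": errors,
        "dropped": dropped,
        # Guard against a freshly reset interface with zero packets.
        "error_ratio": errors / packets if packets else 0.0,
        "drop_ratio": dropped / packets if packets else 0.0,
    }


SAMPLE = "RX packets:7170284853 errors:2456 dropped:25019 overruns:0 frame:2456"
stats = rx_error_stats(SAMPLE)
print(stats)
```

The ratios are tiny here, which is why the raw nonzero error and drop counts are the signal worth alerting on.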
  • 36. Expansion: Neptune (DW). Network items that must be checked when bandwidth is high. https://c0da80aa54a5e1ed7d2b945327c31140a345bfe8.googledrive.com/host/0BxotWZXnwSAGSS1qRE02eWVrU28/2013-07-kernel-networking-ring-buffer.png
  • 37. Expansion: Neptune (DW). Network items that must be checked when bandwidth is high.
      # ethtool -g eth0
      Ring parameters for eth0:
      Pre-set maximums:
      RX: 4096 (maximum)
      RX Mini: 0
      RX Jumbo: 0
      TX: 4096
      Current hardware settings:
      RX: 512 (current)
      RX Mini: 0
      RX Jumbo: 0
      TX: 512
    Frequent frame/packet drops occurred.
      # ethtool -G eth0 rx 2048
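Comparing the current ring sizes against the pre-set maximums can be automated. A minimal sketch, assuming `ethtool -g` output laid out as on the slide; the function names are hypothetical, and the sample text omits the slide's "(maximum)" and "(current)" annotations:

```python
def parse_ring_params(text: str) -> dict:
    """Parse `ethtool -g` style output into {'max': {...}, 'current': {...}}."""
    result = {"max": {}, "current": {}}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                result[section][key.strip()] = int(value)
    return result


def undersized_rings(params: dict) -> dict:
    """Return rings whose current size is below the hardware maximum."""
    return {
        name: (current, params["max"][name])
        for name, current in params["current"].items()
        if name in params["max"] and current < params["max"][name]
    }


SAMPLE = """Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 512
RX Mini: 0
RX Jumbo: 0
TX: 512
"""
undersized = undersized_rings(parse_ring_params(SAMPLE))
print(undersized)
```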
  • 38. Expansion: Neptune (DW). Network items that must be checked when bandwidth is high. Diagram labels: rmem_max, wmem_max, tcp_mem, socket, Receive Buffer, Send Buffer, tcp_rmem, tcp_wmem. Settings:
      net.core.rmem_max = 16777216
      net.core.wmem_max = 16777216
      net.ipv4.tcp_rmem = 204800 204800 16777216
      net.ipv4.tcp_wmem = 204800 204800 16777216
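One common way to reason about socket buffer limits like these is the bandwidth-delay product (bandwidth times round-trip time). The slide does not say how 16777216 was chosen, so the following is only a sketch of that general rule of thumb; the round-trip times are assumptions, not measurements from this cluster:

```python
MAX_BUFFER_BYTES = 16777216   # maximum from the slide's settings (16 MiB)
LINK_GBIT_S = 20              # 20G bonding, from the topology slide


def bdp_bytes(bandwidth_gbit_s: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the link full."""
    bytes_per_second = bandwidth_gbit_s * 1e9 / 8
    return round(bytes_per_second * rtt_ms / 1e3)


# Hypothetical round-trip times; real values would have to be measured.
for rtt_ms in (0.1, 1.0, 10.0):
    needed = bdp_bytes(LINK_GBIT_S, rtt_ms)
    verdict = "fits" if needed <= MAX_BUFFER_BYTES else "exceeds"
    print(f"RTT {rtt_ms} ms: BDP {needed} bytes ({verdict} the 16 MiB maximum)")
```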
  • 39. Expansion: Jupiter (Analysis) 3 PB; BigBang!; Saturn (Storage) 20 PB; Neptune (Real Time) 1 PB; Flume
  • 40. Expansion: Yarn, for resource management across multiple processing engines. NodeManager: yarn.nodemanager.resource.memory-mb is the total memory managed by the NodeManager; yarn.nodemanager.resource.cpu-vcores is the number of CPU cores managed by the NodeManager. Resource Manager: yarn.scheduler.minimum-allocation-mb is the minimum memory per Container that can be allocated on each NodeManager.
  • 41. Expansion: Yarn, for resource management across multiple processing engines. Resource Manager: minimum memory 3G. NodeManager: total memory 3G, usable cores 18. Only 1 Container was created per node, and the cluster ran this way for a week. Container: Task JVM (e.g. one TaskTracker fork)
  • 42. Expansion: Yarn, for resource management across multiple processing engines. Resource Manager: minimum memory 3G. NodeManager: total memory 54G, usable cores 18. 18 Containers can be created per node. Container: Task JVM (e.g. one TaskTracker fork)
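The difference between the two configurations comes down to integer division of node memory by the minimum allocation. A minimal sketch under a simplifying assumption: only memory limits the container count, and every container is requested at exactly the minimum size. The function name is hypothetical, and CPU cores are not modeled:

```python
def containers_per_node(node_memory_mb: int, min_allocation_mb: int) -> int:
    """Upper bound on minimum-size containers that fit in a node's memory.

    node_memory_mb    ~ yarn.nodemanager.resource.memory-mb
    min_allocation_mb ~ yarn.scheduler.minimum-allocation-mb
    """
    if min_allocation_mb <= 0:
        raise ValueError("minimum allocation must be positive")
    return node_memory_mb // min_allocation_mb


GB = 1024  # MB per GB

# Misconfigured week: 3G total memory with a 3G minimum allocation.
print(containers_per_node(3 * GB, 3 * GB))    # 1
# Corrected setting: 54G total memory with the same 3G minimum.
print(containers_per_node(54 * GB, 3 * GB))   # 18
```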
  • 43. Expansion: Compress. We heard Snappy was good and had plenty of capacity, so we used it freely! Raw Data 250 TB/day became Snappy 90 TB/day, or 32 PB/year. Capacity ran short. Snappy 90 TB/day was changed to GZip 50 TB/day. It took 2 months.
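To see why the codec choice mattered, the daily volumes can be set against the 20 PB storage capacity quoted earlier in the deck. A rough sketch, assuming decimal units and ignoring replication, existing data, and retention policy (none of which the slides specify); the function names are hypothetical:

```python
CAPACITY_PB = 20     # Saturn (Storage) capacity from the deck
RAW_TB_PER_DAY = 250


def compression_ratio(compressed_tb: float, raw_tb: float = RAW_TB_PER_DAY) -> float:
    """Fraction of the raw size that remains after compression."""
    return compressed_tb / raw_tb


def days_until_full(daily_tb: float, capacity_pb: float = CAPACITY_PB) -> float:
    """Days until capacity is consumed, ignoring replication and retention."""
    return capacity_pb * 1000 / daily_tb


for codec, daily_tb in (("Snappy", 90), ("GZip", 50)):
    print(
        f"{codec}: ratio {compression_ratio(daily_tb):.2f}, "
        f"full after about {days_until_full(daily_tb):.0f} days"
    )
```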
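  • 44. Expansion: NameNode HA. We thought Automatic FailOver meant we could relax. Zookeeper Timeout: 60 seconds. A NameNode GC took 3 minutes 30 seconds. The cluster failed over to the Standby, but the Hadoop clients kept connecting only to the original Active. The result was a whole-cluster outage. Zookeeper Timeout: 10 minutes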
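The two halves of this incident can be modeled separately: a pause longer than the timeout triggers failover, and a client that knows only one NameNode address cannot follow it. This is a deliberately simplified sketch, not Hadoop's actual failover logic; the host names and both functions are hypothetical:

```python
from typing import Callable, Optional, Sequence


def failover_triggered(pause_s: float, timeout_s: float) -> bool:
    """A pause longer than the timeout makes the Active look dead."""
    return pause_s > timeout_s


def pick_namenode(
    candidates: Sequence[str], is_active: Callable[[str], bool]
) -> Optional[str]:
    """Return the first configured address that currently reports Active."""
    for address in candidates:
        if is_active(address):
            return address
    return None


GC_PAUSE_S = 3 * 60 + 30                           # 3 minutes 30 seconds
print(failover_triggered(GC_PAUSE_S, 60))          # True with the 60 s timeout
print(failover_triggered(GC_PAUSE_S, 10 * 60))     # False with 10 minutes

# After failover, only the former Standby (hypothetical "nn2") is Active.
active_now = {"nn2"}
print(pick_namenode(["nn1"], active_now.__contains__))          # None
print(pick_namenode(["nn1", "nn2"], active_now.__contains__))   # nn2
```

A client configured with only the original Active gets no usable NameNode after failover, which matches the outage described on the slide.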
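  • 45. Expansion: Too Many CLOSE_WAIT. MR V1 & Datanode, between TaskTracker and DataNode: 1. connect 2. request block 3. send block 4. close. CLOSE_WAIT and FIN_WAIT sockets did not go away within 2 hours. Client socket ports were exhausted. Fix: TT Restart
  • 47. Lessons Learned: Accessibility. 1. Anyone should be able to access the system easily. 2. Even people who cannot program should understand how it works. 3. If it is easy, many people will use it. 4. Everyone is becoming an analyst.
  • 48. Lessons Learned: Expansion. 1. The network is like the blood vessels of Hadoop. 2. It is still too early to use Yarn: there are too many settings, and their interdependencies are too complex. 3. For Hadoop HA, the clients must always be checked as well. 4. Even with Hadoop HA, the cluster is not truly safe. 5. Yahoo's 2,000 nodes probably had small disks. 6. There is still a lot of work to do.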
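A buildup like this can be spotted before ports run out by counting sockets per TCP state. A minimal sketch that parses text in the Linux `/proc/net/tcp` format, where the fourth column is a hex state code and `08` is CLOSE_WAIT; the sample rows are made up for illustration:

```python
from collections import Counter

# Hex state codes used in /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state from /proc/net/tcp style text."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        counts[TCP_STATES.get(fields[3].upper(), fields[3])] += 1
    return counts


# Fabricated sample rows; on a real host, read the text from /proc/net/tcp.
SAMPLE = """  sl  local_address rem_address   st tx_queue rx_queue
   0: 0100007F:C35A 00000000:0000 0A 00000000:00000000
   1: 0100007F:9C40 0100007F:C35A 08 00000000:00000000
   2: 0100007F:9C41 0100007F:C35A 08 00000000:00000000
"""
counts = count_tcp_states(SAMPLE)
print(counts["CLOSE_WAIT"], counts["LISTEN"])
```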
  • 50. Approximate Query Engine, like BlinkDB: select sum(val1) from table where key='a' within 3 seconds; select sum(val1) from table where key='a' Error Rate 10%
  • 51. Approximate Query Engine Zoomable Data Navigation select sum(val1) from table where age between 1 and 10 within 1 seconds select sum(val1) from table where age between 1 and 10 within 10 seconds
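The idea behind both slides is to trade accuracy for latency by reading only part of the data. A minimal sketch using plain uniform sampling, which is an illustrative stand-in and not how BlinkDB or any planned engine is described in this deck; the function name, the `fraction` parameter, and the synthetic data are assumptions:

```python
import random
from typing import Sequence


def approximate_sum(values: Sequence[float], fraction: float, seed: int = 0) -> float:
    """Estimate sum(values) from a uniform random sample, scaled back up."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not values:
        return 0.0
    rng = random.Random(seed)
    sample_size = max(1, int(len(values) * fraction))
    sample = rng.sample(values, sample_size)
    return sum(sample) * len(values) / sample_size


# Synthetic data with a narrow value range, so any sample stays close.
values = [98 + (i % 5) for i in range(10000)]
exact = sum(values)

# A larger fraction stands in for a larger time budget ("zooming in").
for fraction in (0.01, 0.1, 1.0):
    estimate = approximate_sum(values, fraction)
    error = abs(estimate - exact) / exact
    print(f"fraction {fraction}: estimate {estimate:.0f}, relative error {error:.4f}")
```

Reading all rows reproduces the exact answer, while smaller fractions return sooner with a bounded error on this narrow-range data.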
  • 52. Q&A