'Big Data Real-Time Data Analysis'

as seen through a real-time URL UV/PV aggregation case

Daum Communications, 유대은
moongtook@daumcorp.com
moongtook@hanmail.net
moongtook@gmail.com
Big Data Analytics
Batch vs Real Time
Big data analytics - Batch (Hadoop)

Query = Function (All Data)

Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
http://www.slideshare.net/Hadoop_Summit/realtime-analytics-with-storm

MapReduce Job = Function (All Data)
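
As a rough illustration of "Query = Function (All Data)", the sketch below recomputes PV and UV per URL from scratch over the full log. The (user id, URL) record format and the name batch_uv_pv are assumptions made for this example and do not come from the slides; a real Hadoop job would express the same computation as map and reduce steps.

```python
from collections import defaultdict


def batch_uv_pv(all_logs):
    """Recompute PV and UV per URL by scanning every record (batch style)."""
    pv = defaultdict(int)
    visitors = defaultdict(set)
    for user_id, url in all_logs:
        pv[url] += 1                 # every hit counts as a page view
        visitors[url].add(user_id)   # distinct users give unique visitors
    return {url: {"pv": pv[url], "uv": len(visitors[url])} for url in pv}


# Hypothetical sample data: (user id, URL) pairs.
logs = [("u1", "/a"), ("u2", "/a"), ("u1", "/a"), ("u1", "/b")]
result = batch_uv_pv(logs)
```

Every query rescans the whole input, which is simple and robust but means the answer is only as fresh as the last full run.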

Big data analytics - Real Time (Storm)

Query = Function (Data Stream)

Watch the data stream and analyze it immediately, in real time
Fast, incremental algorithms
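
To make "incremental algorithm" concrete, here is a minimal sketch that updates PV and UV one event at a time instead of rescanning all data. The class name IncrementalCounter and the (user id, URL) event shape are assumptions for illustration only.

```python
class IncrementalCounter:
    """Keeps running PV/UV per URL and updates them one event at a time."""

    def __init__(self):
        self.pv = {}    # url -> page views so far
        self.seen = {}  # url -> set of user ids seen so far

    def update(self, user_id, url):
        """Apply one event and return the current (pv, uv) for that URL."""
        self.pv[url] = self.pv.get(url, 0) + 1
        self.seen.setdefault(url, set()).add(user_id)
        return self.pv[url], len(self.seen[url])


counter = IncrementalCounter()
first = counter.update("u1", "/a")
second = counter.update("u1", "/a")
third = counter.update("u2", "/a")
```

Each update touches only the state for one URL, so an answer is available right after every event arrives.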

Topology = Function (Data Stream)

Storm is a good infrastructure for watching a data stream
and processing the data in real time
https://github.com/nathanmarz/storm
spout: a source of streams
bolt: consumes any number of input streams and does some processing

http://www.infoq.com/presentations/Storm
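
The spout and bolt roles can be sketched with plain Python generators. This is not the Storm API; the function names log_spout and count_bolt and the "user url" line format are hypothetical and only illustrate how a source feeds a processing step.

```python
def log_spout(lines):
    """Spout role: a source of tuples, one per raw log line."""
    for line in lines:
        user_id, url = line.split()
        yield (user_id, url)


def count_bolt(stream):
    """Bolt role: consumes an input stream and emits the running PV per URL."""
    pv = {}
    for _user_id, url in stream:
        pv[url] = pv.get(url, 0) + 1
        yield (url, pv[url])


# Wiring a spout into a bolt plays the role of a (very small) topology.
topology_output = list(count_bolt(log_spout(["u1 /a", "u2 /a", "u1 /b"])))
```

In Storm itself the wiring is declared as a topology and the pieces run distributed across a cluster; the generators above only show the data flow.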
Storm - cluster
Distributed real-time computation infrastructure
Case study: real-time URL UV/PV aggregation
Log collection

https://github.com/moongtook/kestrel_tail
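
The slides do not show how the collector works, so the following is only a sketch of the general idea the repository name suggests: tail a log file and push new lines onto a queue. Python's in-process queue.Queue stands in for a Kestrel queue, and the names read_new_lines and ship are hypothetical.

```python
import queue


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial last line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


def ship(path, offset, q):
    """Push newly appended lines onto the queue and return the new offset."""
    lines, offset = read_new_lines(path, offset)
    for line in lines:
        q.put(line)
    return offset
```

A collector would call ship in a polling loop and persist the offset, so a restart resumes where it stopped.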
Log analysis
Fetch one log entry
Increment the UV/PV counts for the URL

Inside Redis
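
The slides do not say which Redis data structures were used, so this is a sketch under an assumption: a counter per URL and minute for PV, plus a set of visitor ids to decide whether UV should rise. FakeRedis is an in-memory stand-in that mimics the return values of the Redis INCR and SADD commands; the key names are hypothetical.

```python
class FakeRedis:
    """In-memory stand-in for the two Redis commands used below."""

    def __init__(self):
        self.counters = {}
        self.sets = {}

    def incr(self, key):
        """Like INCR: add 1 and return the new value."""
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key]

    def sadd(self, key, member):
        """Like SADD with one member: return 1 if newly added, else 0."""
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


def count_hit(r, url, user_id, minute):
    """Record one hit and return the current (pv, uv) for that URL and minute."""
    pv = r.incr(f"pv:{url}:{minute}")
    # UV rises only when this visitor is new for the URL in this minute.
    if r.sadd(f"visitors:{url}:{minute}", user_id):
        uv = r.incr(f"uv:{url}:{minute}")
    else:
        uv = r.counters.get(f"uv:{url}:{minute}", 0)
    return pv, uv


r = FakeRedis()
hits = [count_hit(r, "/a", u, "20:01") for u in ("u1", "u1", "u2")]
```

The same key scheme could be repeated with hourly and daily time buckets to obtain the other granularities.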
Storing the URL UV/PV counts

Cassandra column family
(figure: each row key holds super columns; each super column holds column name / column value pairs)

row key 1
  super column 1
    column name 1: column value
    column name 2: column value
    ...
  super column 2
    column name 1: column value
    column name 2: column value
    ...
  ...
row key 2
  super column 1
    ...
  super column 2
    ...
  ...
...
Row key: md5(reversed url) + date

Henessy column family schema
(figure: one row per URL and date, with super columns labeled by time of day)

6ed6a80a162365e78e2716d49508d974_2012-10-24
  ...
  20:01  minutely_pv=212  minutely_uv=202  hourly_pv=5220  hourly_uv=4576  daily_pv=233997  daily_uv=155723
  20:02  minutely_pv=151  ...
  ...
bc2ed9981fae01adda327bcd7e2a3576_2012-10-24
  ...
  20:01  minutely_pv=388  minutely_uv=383  hourly_pv=9839  hourly_uv=8163  daily_pv=597338  daily_uv=299751
  20:02  minutely_pv=364  ...
  ...
...
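
The row key rule "md5(reversed url) + date" can be sketched as follows. What "reversed url" means is not spelled out in the slides; this example assumes it means reversing the host labels (media.daum.net becomes net.daum.media), which is a guess, and the helper names reverse_url and row_key are hypothetical. The underscore separator and date format follow the sample keys shown in the schema.

```python
import hashlib
from datetime import date
from urllib.parse import urlsplit


def reverse_url(url):
    """Assumed meaning of 'reversed url': reverse the host labels, keep the rest."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    return host + rest


def row_key(url, day):
    """Build '<md5 hex digest>_<YYYY-MM-DD>' for one URL and one day."""
    digest = hashlib.md5(reverse_url(url).encode("utf-8")).hexdigest()
    return f"{digest}_{day.isoformat()}"


key = row_key("http://media.daum.net/a/b?x=1", date(2012, 10, 24))
```

Hashing gives fixed-length keys regardless of URL length, and the date suffix gives each URL a separate row per day.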
Also storing in Greenplum

For search, aggregation, and ranking
Only the content whose UV/PV changed during the past minute...
Secondary Index Pattern
(figure: one row per minute; the column names are the row keys of URLs, and the column values are null)

2012-10-24_20_01
  6ed6a80a162365e78e2716d49508d974_2012-10-24: null
  bc2ed9981fae01adda327bcd7e2a3576_2012-10-24: null
  ...
2012-10-24_20_02
  6ed6a80a162365e78e2716d49508d974_2012-10-24: null
  bc2ed9981fae01adda327bcd7e2a3576_2012-10-24: null
  ...
2012-10-24_20_03
  6ed6a80a162365e78e2716d49508d974_2012-10-24: null
  bc2ed9981fae01adda327bcd7e2a3576_2012-10-24: null
  ...
...
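
One way to read the secondary index pattern: each minute bucket records which URL rows changed, so an export job can look up only those rows. The sketch below models this with plain dicts; the names changed_index, mark_changed, and keys_to_export are hypothetical, and the actual export to Greenplum is not shown.

```python
from collections import defaultdict

# minute bucket -> {row key: None}; only the column names carry information.
changed_index = defaultdict(dict)


def mark_changed(minute_bucket, row_key):
    """Record that the counts of row_key changed during minute_bucket."""
    changed_index[minute_bucket][row_key] = None


def keys_to_export(minute_bucket):
    """Return the row keys that changed in the given minute, in sorted order."""
    return sorted(changed_index.get(minute_bucket, {}))


mark_changed("2012-10-24_20_01", "bc2ed9981fae01adda327bcd7e2a3576_2012-10-24")
mark_changed("2012-10-24_20_01", "6ed6a80a162365e78e2716d49508d974_2012-10-24")
mark_changed("2012-10-24_20_01", "bc2ed9981fae01adda327bcd7e2a3576_2012-10-24")
to_export = keys_to_export("2012-10-24_20_01")
```

Marking the same key twice is harmless because the column name is simply overwritten, which keeps the index write idempotent.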
Fault-tolerant
A fault-tolerant system is a system that keeps
processing normally even when some of its
components fail. - Wikipedia

http://ko.wikipedia.org/wiki/장애_허용_시스템
http://en.wikipedia.org/wiki/Fault-tolerant_design
Human Fault-tolerant

Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
http://strataconf.com/strata2013/public/schedule/detail/27610
Lambda architecture

Big data - Nathan Marz and James Warren, http://www.manning.com/marz/
Twitter summingbird - https://speakerdeck.com/sritchie/summingbird-streaming-mapreduce-at-twitter
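
In the Lambda architecture as described in the cited book, a query is answered by merging a batch view (precomputed from all data up to the last batch run) with a realtime view (covering only the data that arrived since). A minimal sketch for PV counts, assuming both views are plain dicts with hypothetical names:

```python
def merged_pv(url, batch_view, realtime_view):
    """Answer a PV query by combining the batch view with the realtime view."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


# Hypothetical views: the batch layer counted up to its last run,
# the realtime layer counted only the events that arrived afterwards.
batch_view = {"/a": 233997, "/b": 1200}
realtime_view = {"/a": 212, "/c": 5}

total_a = merged_pv("/a", batch_view, realtime_view)
```

PV merges by simple addition; UV does not, because the same visitor may appear in both views, so a mergeable structure such as a set of visitor ids would be needed for it.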
The end!