https://twitter.com/search?q=m.news.naver.com%2Fcomment&src=recent_search_click
12
. .
:
2006.04.26 ~ 2018.05.25 ( : 2018.10)
:
6 ( , , , 

/ , , IT/ ) 30 ,



,
- hashing : 2015 12
13
https://github.com/zaemyung/crawl-naver-news-and-comments
14
15
• 2009 ,
. 

. 1)
1) https://namu.wiki/w/
16
• 2010 . 

.
• .
. 1)
1) https://namu.wiki/w/
17
• 2012 1 1800
, . 

, 

. 1)
• 2012 .
SNS . 2)
1) https://www.wikitree.co.kr/main/news_view.php?id=71675
2) https://namu.wiki/w/
18
• 2016 10 , JTBC pc
.
• , 1)
.
1) https://www.mk.co.kr/news/society/view/2018/05/294952/
19
• 2016
.
20
30
.
21
!
22
…
“ ” ?
23
?
.
•
10 . UI , top 10
.
.
•
top 10 top 10
. top 10
.
24
,
criteria .
1) .
, top 10 . 

top (?) . ( )
2) , , .
, top 10 

.
25
criteria 1: top 10
369 .
, , , IT, .
26
criteria 1: top 10
27
criteria 1: top 10
top user ,
top user . ( …)
28
criteria 1: top 10
top user .
,
.
, ,
.
,
, .
29
criteria 2: abnormal
top user .
,
.
, ,
.
,
, .
?
30
criteria 2: abnormal
top user .
,
.
, ,
.
,
, .
top 10
31
criteria 2: abnormal
100% .
top 10 ,
.
, top 10 .
top 10 .
33
criteria 2: abnormal
top user top 10 X histogram GMM(n=1) .
( n fitting Appendix A )
mean: 25.0, std: 20.5
90% 

top 10 





0.01%
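The single-Gaussian fit can be turned into a rough tail-probability check. The sketch below is only an illustration, not code from the talk: it plugs the reported mean (25.0) and std (20.5) into Python's `statistics.NormalDist` and asks how unlikely a given X would be if every top user followed that one distribution. Reading X as the percentage of a user's comments that reached the top 10 is an assumption taken from the Appendix A table, and `upper_tail` is a hypothetical helper name.

```python
from statistics import NormalDist

# Parameters reported for the GMM(n=1) fit on the histogram of X.
FITTED = NormalDist(mu=25.0, sigma=20.5)


def upper_tail(x, dist=FITTED):
    """Probability of observing a value >= x under the fitted normal."""
    return 1.0 - dist.cdf(x)


if __name__ == "__main__":
    # X values quoted in the deck for user 1 and user 2.
    for name, x in [("user 1", 96), ("user 2", 93)]:
        print(f"{name}: X = {x}, upper-tail probability = {upper_tail(x):.6f}")
```

A normal fit to a ratio bounded by 0 and 100 is itself an approximation, so these tail values are best read as a relative ranking of how abnormal a user is rather than as exact probabilities.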
34
user 1 (X = 96)
line ,  line ,  line top 10 .
user 1
35
title article date user 1 top comments
, ‘29 ’
2016-11-25
15:12:00
.
.
“ ,
”
2016-11-27
16:04:00 . .
“ , … 

”( )
2016-11-27
16:46:00
.
.
.
‘ ’… ‘ ’
2016-12-06
12:34:00
.
… …
“ ”…
2016-12-06
12:45:00
.
… …
36
title article date 1njDA top comments
, ‘ ’ 2017-03-04
08:00:00
?
.
85 , 3000 …
2017-03-05
09:00:00
?
.
[ ] 42.6% vs 37.2%…
47.6% vs 43.3%
2017-04-10
09:15:00
.
.
“ … ”
2017-04-22
08:01:00
.
. ~
!!!
37
title section article date user 1 top comments
[ ]
society 2017-03-30 10:18:00
.
‘
’
politics 2017-03-30 10:22:00
.
38
user 2 (X = 93)
line ,  line ,  line top 10 .
user 2
39
title article date user 2 top comments
‘ ’… “ ”
2017-06-25
20:20:00
~~~~
“ , … ·
”( )
2017-06-28
12:18:00
×
- …
2017-06-28
20:52:00
~~~~????
, ‘ ’
2017-06-29
12:24:00
, … 2017-06-29
13:54:00 ~
, ‘ ’
( )
2017-06-30
13:23:00
~ ~
‘ ’ … FTA ‘
’( )
2017-07-01
10:05:00 ~~~~~~~~
40
title article date user 2 top comments
“4 ”… ( )
2018-03-06
20:24:00
“ · ·
”
2018-03-07
15:28:00
“ …
”
2018-03-07
16:53:00
“ … ,
”( )
2018-03-07
17:16:00
[ ] ,
2018-03-09
09:11:00
41
!
42
!
?
43
.
“ ” rule .
44
,
rule ?
45
2019 9 , 5 .
- : -
- : / ( + )
-
-
-
.
47
Q. ?
48
?
: 344
: 80.4%
: 300
: 95.2%
49
, .
50
?
YouTube, Facebook, Reddit .
. ,
1) .
2) default upvote , .
Reddit
Facebook
YouTube
52
Best
Best ranking Wilson score ,
vote smoothing metric
, benefit .
$$
w^{-} = \max\left\{0,\ \frac{2n\hat{p} + z^{2} - z\sqrt{z^{2} + 4n\hat{p}(1-\hat{p})}}{2(n + z^{2})}\right\} = \text{wilson score}
$$
53
Wilson Score
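The voting process is treated as a Bernoulli process, with n the total number of votes and $\hat{p}$ the observed upvote ratio.
When n is large, the central limit theorem lets $\hat{p}$ be approximated by a normal distribution.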
54
Wilson Score
.̂p p
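$$
p = \hat{p} \pm z\sqrt{\frac{p(1-p)}{n}}
$$

$$
\left(1 + \frac{z^{2}}{n}\right)p^{2} - \left(2\hat{p} + \frac{z^{2}}{n}\right)p + \hat{p}^{2} = 0
$$

$$
p = \frac{2n\hat{p} + z^{2} \pm z\sqrt{z^{2} + 4n\hat{p}(1-\hat{p})}}{2(n + z^{2})}
$$

$$
w^{-} = \max\left\{0,\ \frac{2n\hat{p} + z^{2} - z\sqrt{z^{2} + 4n\hat{p}(1-\hat{p})}}{2(n + z^{2})}\right\} = \text{wilson score}
$$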
55
Wilson Score
import numpy as np

# ref: http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
def best(up, down):
    """Lower bound of the Wilson score interval for the upvote ratio."""
    try:
        z = 1.96  # 95% confidence level
        n = up + down
        p_up = up / n
        p_down = 1 - p_up
        denominator = 2 * (n + z**2)
        numerator = 2 * n * p_up + z**2 - z * np.sqrt(z**2 + 4 * n * p_up * p_down)
        lower = numerator / denominator
    except ZeroDivisionError:
        # no votes yet
        lower = 0
    return max(0, lower)
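To see why the lower bound is preferred over the raw upvote ratio, the sketch below compares the two on a few vote counts. It is a stdlib-only restatement of the same formula (the name `wilson_lower_bound` and the sample vote counts are illustrative assumptions, not part of the talk): a single upvote has a perfect ratio but a low score, while the same 90% ratio scores higher as the number of votes grows.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the upvote ratio."""
    n = up + down
    if n == 0:
        return 0.0  # no votes yet
    p_hat = up / n
    numerator = 2 * n * p_hat + z**2 - z * math.sqrt(z**2 + 4 * n * p_hat * (1 - p_hat))
    return max(0.0, numerator / (2 * (n + z**2)))


if __name__ == "__main__":
    # Hypothetical vote counts: (upvotes, downvotes).
    for up, down in [(1, 0), (90, 10), (900, 100)]:
        ratio = up / (up + down)
        print(f"{up:>4} {down:>4}  ratio={ratio:.3f}  score={wilson_lower_bound(up, down):.3f}")
```

With 1091 upvotes and 55 downvotes this returns 0.938 after rounding, the value listed for the top comment in the Best example.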
56
Best
«MB ‘ ’ “ ”» 1)
upvotes  downvotes  best score  upvote ratio
1091     55         0.938       0.952
562      39         0.936       0.935
252      14         0.933       0.947
565      38         0.933       0.937
595      37         0.932       0.941
686      43         0.931       0.941
4146     317        0.926       0.929
296      14         0.921       0.955
302      13         0.921       0.959
919      51         0.92        0.947
1) https://news.naver.com/main/ranking/read.nhn?rankingType=popular_day&oid=001&aid=0009819820&date=20180117&type=1&rankingSectionId=100&rankingSeq=27
57
Controversial
.
.
$$
\text{controversial} = \frac{\text{match} \cdot \log(\text{match} + 1)}{|\#\text{upvote} - \#\text{downvote}| + 1}, \quad \text{where } \text{match} = \min(\#\text{upvote}, \#\text{downvote})
$$
58
Controversial
upvote downvote controversial score
1001 1000 3454.38
999 1000 3450.42
100 100 461.52
101 100 230.76
1000 700 15.24
130 100 14.89
100 130 14.89
1 1 0.69
1 2 0.35
$$
\text{controversial} = \frac{\text{match} \cdot \log(\text{match} + 1)}{|\#\text{upvote} - \#\text{downvote}| + 1}, \quad \text{where } \text{match} = \min(\#\text{upvote}, \#\text{downvote})
$$
59
Controversial
import math

def controversial(upvote, downvote):
    match = min(upvote, downvote)
    top = match * math.log(match + 1)
    bottom = abs(upvote - downvote) + 1
    return float(top) / bottom
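A quick way to get a feel for this metric is to run it on a handful of vote pairs. The sketch below restates the function in compact form and prints scores for some of the pairs from the score table; the choice of pairs is only for illustration. It shows two properties of the formula: the score is symmetric in upvotes and downvotes, and it collapses quickly once the gap between the two grows.

```python
import math


def controversial(upvote, downvote):
    """match * log(match + 1) / (|upvote - downvote| + 1)."""
    match = min(upvote, downvote)
    return match * math.log(match + 1) / (abs(upvote - downvote) + 1)


if __name__ == "__main__":
    pairs = [(1001, 1000), (1000, 700), (130, 100), (100, 130), (1, 1)]
    for up, down in pairs:
        print(f"{up:>5} {down:>5} {controversial(up, down):>9.2f}")
```

Because only the absolute difference and the smaller count enter the formula, (130, 100) and (100, 130) get the same score.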
60
Controversial
« “ ”… “ ”( )» 1)
userId comments
50y20 , ‘ ’ 26 26
1N1GK . 81 85
1a1lV , ? ? 22 22
1huSl
n ?n n
n
33 31
akMSD ; 11 11
32rhW . . 10 10
43lgl . 66 57
1qBRj .n . 15 14
aaTK5
.n n .n
.
8 8
5dcgw
?? .
8 8
1) https://news.naver.com/main/ranking/read.nhn?rankingType=popular_day&oid=001&aid=0009878303&date=20180210&type=1&rankingSectionId=100&rankingSeq=6
* :
62
New controversial
controversial :
controversiality .
:
: = 6.5 : 3.5 .
new controversial upvote downvote .


vote
. upvote downvote wilson score upvote
downvote vote 0 1 upvote downvote
.

63
New controversial
import math

def new_controversial(upvote, downvote):
    # Wilson scores of the upvote and downvote ratios (best() from the
    # Wilson Score slide), weighted 3.5 : 6.5
    p_up = best(upvote, downvote) * 3.5
    p_down = best(downvote, upvote) * 6.5
    match = min(p_up, p_down)
    top = match * math.log(match + 1)
    bottom = abs(p_up - p_down) + 1
    return float(top) / bottom

# original controversial, for comparison
def controversial(upvote, downvote):
    match = min(upvote, downvote)
    top = match * math.log(match + 1)
    bottom = abs(upvote - downvote) + 1
    return float(top) / bottom
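The effect of the 3.5 and 6.5 weights can be checked numerically. The sketch below is a self-contained, stdlib-only restatement (the helper name `wilson_lower_bound` and the vote counts are assumptions for illustration): when upvote : downvote is close to 6.5 : 3.5, the two weighted terms nearly cancel, so the denominator stays near 1 and the score peaks there instead of at an even split.

```python
import math


def wilson_lower_bound(pos, neg, z=1.96):
    """Wilson score lower bound for pos / (pos + neg)."""
    n = pos + neg
    if n == 0:
        return 0.0
    p_hat = pos / n
    num = 2 * n * p_hat + z**2 - z * math.sqrt(z**2 + 4 * n * p_hat * (1 - p_hat))
    return max(0.0, num / (2 * (n + z**2)))


def new_controversial(upvote, downvote):
    # Same structure as controversial(), applied to weighted Wilson scores.
    p_up = wilson_lower_bound(upvote, downvote) * 3.5
    p_down = wilson_lower_bound(downvote, upvote) * 6.5
    match = min(p_up, p_down)
    return match * math.log(match + 1) / (abs(p_up - p_down) + 1)


if __name__ == "__main__":
    # Hypothetical vote counts with the same total and different splits.
    for up, down in [(6500, 3500), (5000, 5000), (3500, 6500)]:
        print(up, down, round(new_controversial(up, down), 3))
```

Since the inputs are Wilson scores between 0 and 1 rather than raw counts, the resulting values are much smaller than those of the original controversial score and are only comparable with each other.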
64
New controversial
« “ ”… “ ”( )» 1)
userId comments
4gnUX , ! 16 11
3TWS
0
~~ ?
11 7
4SOr4 . !! 9 6
5qpx3 n .n 9 6
2tUo3 !! ~~ 8 5
R0Yj
?n !
? …
8 5
IIus 6 4
PqIi ..n . 6 4
1ceFq 6 4
5Vnqo ? . 6 4
1) https://news.naver.com/main/ranking/read.nhn?rankingType=popular_day&oid=001&aid=0009878303&date=20180210&type=1&rankingSectionId=100&rankingSeq=6
* :
* :
65
Best antipathy
Best ,
Best antipathy wilson score .
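Best antipathy:

$$
w^{-}_{neg} = \max\left\{0,\ \frac{2n(1-\hat{p}) + z^{2} - z\sqrt{z^{2} + 4n\hat{p}(1-\hat{p})}}{2(n + z^{2})}\right\}
$$

Best:

$$
w^{-} = \max\left\{0,\ \frac{2n\hat{p} + z^{2} - z\sqrt{z^{2} + 4n\hat{p}(1-\hat{p})}}{2(n + z^{2})}\right\}
$$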
66
Best antipathy
import numpy as np

def best_anti(up, down):
    """Lower bound of the Wilson score interval for the downvote ratio."""
    try:
        z = 1.96  # 95% confidence level
        n = up + down
        p_up = up / n
        p_down = 1 - p_up
        denominator = 2 * (n + z**2)
        numerator = 2 * n * p_down + z**2 - z * np.sqrt(z**2 + 4 * n * p_up * p_down)
        lower = numerator / denominator
    except ZeroDivisionError:
        # no votes yet
        lower = 0
    return max(0, lower)
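Since the only change from Best is that the downvote ratio replaces the upvote ratio in the first term, Best antipathy should equal the Best score with its arguments swapped. The sketch below checks that numerically with stdlib-only versions of both functions; the vote pairs are illustrative and the compact function bodies are a restatement, not the talk's code.

```python
import math

Z = 1.96  # 95% confidence level


def best(up, down, z=Z):
    """Wilson score lower bound for the upvote ratio."""
    n = up + down
    if n == 0:
        return 0.0
    p_up = up / n
    num = 2 * n * p_up + z**2 - z * math.sqrt(z**2 + 4 * n * p_up * (1 - p_up))
    return max(0.0, num / (2 * (n + z**2)))


def best_anti(up, down, z=Z):
    """Wilson score lower bound for the downvote ratio."""
    n = up + down
    if n == 0:
        return 0.0
    p_up = up / n
    p_down = 1 - p_up
    num = 2 * n * p_down + z**2 - z * math.sqrt(z**2 + 4 * n * p_up * p_down)
    return max(0.0, num / (2 * (n + z**2)))


if __name__ == "__main__":
    for up, down in [(0, 5), (0, 4), (1, 6), (0, 3)]:
        print(up, down, round(best_anti(up, down), 4), round(best(down, up), 4))
```

In practice this means a single Wilson score function is enough: calling it as `best(down, up)` gives the antipathy ranking.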
67
Best antipathy
.
68
Best antipathy
« “ ”… “ ”( )» 1)
userId comments
3uhFA 0 5
BoU7 .. ?? 0 4
1cvun !n ! 0 4
zg5j … ? 5 ~! 0 4
5OPN
. . ,
. .
0 4
qNLK
. 50 10 1-1
? ?
1 6
6lxkz 0 3
3lBvF - ! 0 3
1jqva .. 0 3
1) https://news.naver.com/main/ranking/read.nhn?rankingType=popular_day&oid=001&aid=0009878303&date=20180210&type=1&rankingSectionId=100&rankingSeq=6
69
criteria
detect
reddit
1) Best
2) New controversial
3) Best anti
70
Future Work
metric .
( )
metric .
1)
2)
…
71
.
.
72
.
,
.
73
.
,
.
" " " " ,
" " " " .
https://inmoonlight.github.io .
Q & A
76
References
1. https://www.viewsnnews.com/article?q=74392
2. http://news1.kr/articles/634736
3. http://www.speconomy.com/news/articleView.html?idxno=16882
4. http://sports.khan.co.kr/bizlife/sk_index.html?art_id=201704170922013&sec_id=560901
5. http://www.donga.com/news/article/all/20180828/91705745/1
6. https://news.joins.com/article/22539863
7. https://www.wikitree.co.kr/main/news_view.php?id=71675
8. https://redditblog.com/2009/10/15/reddits-new-comment-sorting-system/
9. http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
10. https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval
11. https://www.reddit.com/r/NoStupidQuestions/comments/3xmlh8/what_does_something_being_labeled_controversial/?sort=confidence
77
Appendix A: GMM
GMM n_components silhouette score .
78
Appendix A: GMM
score n_components=2 2 gaussian fitting .
userId  cluster 1 prob  cluster 2 prob  top ratio (%)
user 1  2.27E-16        1               95.60
user 2  2.56E-15        1               93.08
user 3  9.31E-14        1               89.21
user 4  4.96E-11        1               82.00
user 5  8.11E-01        0.188898        41.27
user 6  1.00E+00        0.000219        15.05

(KO) 온라인 뉴스 댓글 플랫폼을 흐리는 어뷰저 분석기 / (EN) Online news comments analysis revealing public opinion manipulators and possible solutions
