Link Prediction Methods: Concepts and Applications
1. Link Prediction Methods: Concepts and Applications
Kyunghoon Kim
UNIST Mathematical Sciences
kyunghoon@unist.ac.kr
2015. 9. 3.
Kyunghoon Kim (UNIST) Network Link Prediction 2015. 9. 3. 1 / 86
2. About me
Speaker
Kyunghoon Kim (Graduate Student)
UNIST (Ulsan National Institute of Science and Technology)
Mathematical Sciences, School of Natural Sciences
Lab
Adviser : Bongsoo Jang
Homepage : http://amath.unist.ac.kr
“Be the light that shines the world with science and technology.”
3. Contents
1 Social Network
2 Link Prediction
Research Trend
Definition
Framework
Example
Theory
3 Link Prediction with Python
4 Demo
4. Social Network
A social network is a social structure made up of
a set of social actors (such as individuals or organizations)
and a set of the dyadic ties (or interactions, relations) between these actors.
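The definition above maps directly onto a graph object. A minimal sketch with NetworkX, assuming made-up actor names and ties that are not part of the original slides:

```python
import networkx as nx

# Hypothetical actors and ties, chosen only for illustration.
actors = ["Alice", "Bob", "Carol", "Dave"]
ties = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave")]

G = nx.Graph()              # undirected: each tie is a dyadic (two-party) relation
G.add_nodes_from(actors)    # the set of social actors
G.add_edges_from(ties)      # the set of ties between the actors

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.neighbors("Bob")))
```

An undirected graph is used here because the ties are treated as symmetric; a directed relation (for example "follows") would call for nx.DiGraph instead.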
5. Social Network : Internet
Ref: http://supraliminalsolutions.com/
6. Social Network : Information exchange
Ref: https://niftynotcool.files.wordpress.com/2013/12/internet-wallpaper-hd.jpg
7. Social Network : Degree Centrality
8. Social Network : Betweenness Centrality
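The figures for slides 7 and 8 did not survive extraction. As a rough stand-in, the sketch below computes both centrality measures on a small hypothetical tree (the graph and node names are assumptions, not the network shown in the talk):

```python
import networkx as nx

# Hypothetical five-node tree: A is a hub, D bridges E to the rest.
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])

# Degree centrality: a node's degree divided by (n - 1).
degree = nx.degree_centrality(G)

# Betweenness centrality: the share of shortest paths between other
# node pairs that pass through a node (normalized by default).
betweenness = nx.betweenness_centrality(G)

for node in G:
    print(node, round(degree[node], 3), round(betweenness[node], 3))
```

In this toy graph A scores highest on both measures, while D has a low degree but a non-zero betweenness because every path to E runs through it.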
9. Social Network : IoT (Internet of Things)
Ref: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=XB&infotype=PM&appname=GBSE_GB_TI_USEN&htmlfid=GBE03620USEN&attachment=GBE03620USEN.PDF
10. Social Network : Problem
Non-trivial task
Incompleteness
Dynamic
11. Research Trend of Link Prediction
Keyword “link prediction social network”
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015): 1-38.
12. Application of Link Prediction
1 Recommender systems (links)
Friend recommendation ('12)
Co-author recommendation ('07)
Product recommendation for online shopping malls ('11)
Patent recommendation ('13)
Cross-disciplinary collaborator recommendation ('12)
Contact recommendation ('11)
13. Application of Link Prediction
2 Complex systems research (links)
Network evolution studies ('02)
Website link prediction ('02)
14. Application of Link Prediction
3 Applications in other fields (links)
Healthcare ('12)
Protein networks ('12)
Detection of anomalous communication ('09)
15. Research Trend of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
16. Research Trend of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
17. Research Trend of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
18. Research Trend of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
19. Research Trend of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015)
20. Definition of Link Prediction
In social networks, link prediction means
predicting links that are missing from the current network
predicting links that will newly appear or disappear in the future network
21. Definition of Link Prediction
22. Definition of Link Prediction
23. Definition of Link Prediction
Given a social network
G(V, E) at time t,
link prediction is the task of finding
links that will appear or disappear (at t′ > t)
missing or unobserved links (at t).
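One way to read this definition in code: every node pair that is unconnected at time t is a candidate link, and a later snapshot tells us which links appeared or disappeared. The snapshots below are hypothetical and exist only to illustrate the bookkeeping:

```python
import networkx as nx

# Hypothetical snapshot G(V, E) observed at time t.
G_t = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4)])

# Candidate links: node pairs with no edge at time t. Each one may be a
# missing/unobserved link at t, or a link that appears at some t' > t.
candidates = sorted(tuple(sorted(pair)) for pair in nx.non_edges(G_t))

# Hypothetical later snapshot at t' > t, used only to label the outcome.
G_t_prime = nx.Graph([(1, 3), (2, 3), (3, 4), (2, 4)])

appeared = [pair for pair in candidates if G_t_prime.has_edge(*pair)]
disappeared = [edge for edge in G_t.edges() if not G_t_prime.has_edge(*edge)]

print(candidates, appeared, disappeared)
```

A link predictor assigns a score to each candidate pair; the later snapshot plays the role of ground truth when evaluating those scores.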
24. Framework of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.” Science China Information Sciences 58.1 (2015): 1-38.
25. Link Prediction Example : Terrorist Networks
26. Link Prediction Example : Terrorist Networks
Problems of criminal network analysis
1 Incompleteness - the inevitability of missing nodes and links that the
investigators will not uncover.
2 Fuzzy boundaries - the difficulty in deciding who to include and who
not to include.
3 Dynamic - these networks are not static, they are always changing.
http://pear.accc.uic.edu/ojs/index.php/fm/article/view/941/863
27. Link Prediction Example : Terrorist Networks
Several summaries of data about the hijackers in major newspapers
Sydney Morning Herald, 2001
Washington Post, 2001
From 2 to 6 weeks after the event, it appeared that a new relationship
or node was added to the network on a daily basis.
28. Link Prediction Example : Terrorist Networks
29. Link Prediction Example : Terrorist Networks
30. Link Prediction Example : Terrorist Networks
31. Link Prediction Example : Terrorist Networks
32. Link Prediction Example : Terrorist Networks
33. Link Prediction Example : Terrorist Networks
35. Categories of Link Prediction
Wang, Peng, et al. “Link prediction in social networks: the state-of-the-art.”
41. How large a matrix can we handle?
NetworkX uses a “dictionary of dictionaries of dictionaries” as its basic network data structure.
Advantages of the dict-of-dicts-of-dicts data structure:
Finding and removing edges takes two dictionary look-ups.
Preferred over lists because of fast lookup with sparse storage.
Preferred over sets because data can be attached to edges.
G[u][v] returns the edge attribute dictionary.
n in G tests if node n is in graph G.
for n in G: iterates through the graph.
for nbr in G[n]: iterates through neighbors.
https://networkx.github.io/documentation/latest/reference/introduction.html
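The access patterns listed on this slide can be tried on a tiny graph. This is a sketch with made-up node names and a made-up weight attribute:

```python
import networkx as nx

G = nx.Graph()
G.add_edge("u", "v", weight=3)   # data attached to the edge
G.add_edge("u", "w")

edge_data = G["u"]["v"]          # two look-ups: node "u", then neighbor "v"
has_u = "u" in G                 # membership test
nodes = [n for n in G]           # iterating the graph yields its nodes
nbrs = [nbr for nbr in G["u"]]   # iterating G[n] yields the neighbors of n

print(edge_data, has_u, nodes, nbrs)

G.remove_edge("u", "w")          # removal is also a dictionary operation
print("w" in G["u"])
```

Because only existing edges are stored, memory grows with the number of edges rather than with the square of the number of nodes, which is what makes this layout suitable for sparse networks.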
42. How large a matrix can we handle?
43. How large a matrix can we handle?
Million-scale graph analytics frameworks
SNAP : http://snap.stanford.edu/snappy/index.html
Billion-scale graph analytics frameworks
Apache Hama : https://hama.apache.org/ (introduction)
Pegasus : http://www.cs.cmu.edu/~pegasus/
s2graph : https://github.com/daumkakao/s2graph (slides)
Graph Database
Neo4j : http://neo4j.com/
OrientDB : http://orientdb.com/
44. Basic books for studying networks
1 Networks: An Introduction by Mark Newman
2 Linked: The New Science of Networks by Albert-László Barabási (Korean edition: 링크 : 21세기를 지배하는 네트워크 과학)
45. Warm-up for link prediction
1 NumPy : a module optimized for computation speed
2 Pandas : data structures
46. NumPy: Numerical Python
Multidimensional arrays
1 Stored in contiguous memory and implemented in C
2 A single data type per array
3 An operation is applied to every element of the array at once
http://www.numpy.org/
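The three properties above can be checked directly on a small array. A minimal sketch (the array contents are arbitrary):

```python
import numpy as np

a = np.arange(5, dtype=np.float64)

contiguous = a.flags["C_CONTIGUOUS"]  # 1: elements sit in one contiguous block
dtype = a.dtype                       # 2: one data type shared by all elements
b = a * 1.1                           # 3: applied to every element, no Python loop

# Mixed int/float input is converted to a single common type.
mixed = np.array([1, 2.5, 3])

print(contiguous, dtype, b, mixed.dtype)
```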
47. NumPy: Numerical Python
import timeit
import numpy as np
b = list(np.arange(1e7))  # plain Python list of 10 million floats
tic = timeit.default_timer()
for index, value in enumerate(b):
    b[index] = value * 1.1  # element-by-element update in a Python loop
toc = timeit.default_timer()
print(toc - tic)
1.82178592682
48. NumPy: Numerical Python
import numpy as np
import timeit
a = np.arange(1e7)  # ndarray of 10 million floats
b = list(a)         # the same values as a plain Python list
tic = timeit.default_timer()
a = a * 1.1         # one vectorized operation over the whole array
toc = timeit.default_timer()
print(toc - tic)
0.029629945755
Depending on how it is used, an operation on an ndarray is much faster than the same operation on a list.
49. Pandas: Python Data Analysis Library
50. Pandas / get data yahoo
%pylab inline
import pandas as pd
import pandas.io.data  # removed from later pandas releases; the pandas-datareader package took over this module
import datetime
start = datetime.datetime(2015, 1, 1); end = datetime.datetime(2015, 8, 26)
# The ticker list is cut off on the slide; only the visible part is reproduced.
text = """A, AAPL, AMCC, AMD, AMGN, AMKR, AMNT.OB, AMZN, APC, ASOG.P"""
text = text.replace(' ', '').split(',')
# Keep only the tickers that contain no '.'.
corps = []
for t in text:
    if '.' not in t:
        corps.append(t)
Code : https://goo.gl/8ddrnS
51. Pandas / get data yahoo
df = pd.io.data.get_data_yahoo(corps, start=start, end=end)
df['Adj Close'].head()
52. Pandas / Return Value
returns = df['Adj Close'].pct_change()  # day-over-day percentage change of the adjusted close
corr = returns.corr()
corr
53. Pandas / Correlation
bm = corr>0.5
bm.astype(int)
54. Pandas / Convert to array
mat = bm.astype(int).values
mat
55. NetworkX / from numpy matrix
import networkx as nx
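# from_numpy_matrix was removed in NetworkX 3.0; nx.from_numpy_array(mat) is the replacement there.
graph = nx.from_numpy_matrix(mat)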
graph = nx.relabel_nodes(graph, dict(enumerate(bm.columns)))
nx.draw(graph, with_labels=True)
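The steps on slides 52 to 55 (returns, correlation, threshold, adjacency matrix, graph) can be run end to end without downloading data. The sketch below is an assumption-laden stand-in: the tickers AAA, BBB and CCC and their prices are synthetic, and nx.from_numpy_array is used in place of the slide's from_numpy_matrix so that it runs on current NetworkX. It also zeroes the diagonal, since every series correlates perfectly with itself and would otherwise produce self-loops.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Synthetic prices for three made-up tickers (not real market data).
rng = np.random.default_rng(0)
common = rng.normal(0, 0.01, 200)  # daily shock shared by AAA and BBB
prices = pd.DataFrame({
    "AAA": 100 * np.cumprod(1 + common + rng.normal(0, 0.001, 200)),
    "BBB": 100 * np.cumprod(1 + common + rng.normal(0, 0.001, 200)),
    "CCC": 100 * np.cumprod(1 + rng.normal(0, 0.01, 200)),  # independent series
})

returns = prices.pct_change()        # daily returns
corr = returns.corr()                # pairwise correlation of returns
bm = corr > 0.5                      # boolean matrix: strongly correlated pairs
mat = bm.astype(int).values.copy()   # 0/1 adjacency matrix
np.fill_diagonal(mat, 0)             # drop self-correlation (no self-loops)

graph = nx.from_numpy_array(mat)
graph = nx.relabel_nodes(graph, dict(enumerate(bm.columns)))
print(sorted(graph.edges()))
```

The 0.5 threshold is taken from the slide; a different cut-off changes how dense the resulting network is.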
64. mecab api
import Umorpheme.morpheme as um
from collections import OrderedDict
s = '유니스트는 울산에 있습니다'  # "UNIST is in Ulsan"
server = 'http://information.center/api/korean'
apikey = ''  # Register at http://information.center/korean
data = um.analyzer(s, server, apikey, '유니스트,UNIST', 1)
temp = {}
for key, value in data.items():
    temp[int(key)] = value  # integer keys so the tokens sort in numeric order
data = OrderedDict(sorted(temp.items()))
for i, j in data.items():
    print(i, j['data'], j['feature'])
0 유니스트 CUSTOM
1 는 JX
2 울산 NNP
3 에 JKB
4 있 VV
5 습니다 EC
65. For more details on Pandas..
66. Basic definitions for link prediction
Γ(x) : the set of neighbors of node x
|Γ(x)| : the number of neighbors of node x
67. Common neighbors
Common neighbors:
CN(u, v) = |Γ(u) ∩ Γ(v)|
Note: the graph shown here depicts a hypothetical situation, not a real one.
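The score CN(u, v) can be computed straight from the neighbor sets. A minimal sketch on a hypothetical graph (node names and the helper function name are illustrative assumptions):

```python
import networkx as nx

# Hypothetical graph, for illustration only.
G = nx.Graph([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("b", "e")])

def common_neighbors_score(G, u, v):
    """CN(u, v) = |Γ(u) ∩ Γ(v)|, where Γ(x) is the neighbor set of x."""
    return len(set(G[u]) & set(G[v]))

score_ab = common_neighbors_score(G, "a", "b")   # a and b share c and d
score_ae = common_neighbors_score(G, "a", "e")   # a and e share nothing

print(score_ab, score_ae)
```

Under this measure, the unconnected pair with the most shared neighbors is the most likely future link, so (a, b) ranks above (a, e) here.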
72. Displaying Korean text
pip install --upgrade git+https://github.com/koorukuroo/networkx_for_unicode
import networkx as nx
import matplotlib.font_manager as fm
fp1 = fm.FontProperties(fname="./NotoSansKR-Regular.otf")
nx.set_fontproperties(fp1)  # not part of stock NetworkX; presumably added by the fork installed above
G = nx.Graph()
G.add_edge(u'한국어', u'영어')  # node labels meaning "Korean" and "English"
nx.draw(G, with_labels=True)
73. Preferential attachment
Preferential attachment:
PA(u, v) = |Γ(u)| · |Γ(v)|
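The preferential attachment score is the product of the two node degrees. A minimal sketch on a hypothetical graph, using NetworkX's nx.preferential_attachment, which by default scores every unconnected pair and yields (u, v, score) triples:

```python
import networkx as nx

# Hypothetical graph: a has degree 3, d has degree 2, the rest degree 1.
G = nx.Graph([("a", "b"), ("a", "c"), ("a", "d"), ("d", "e")])

# Score all unconnected pairs; sort each pair so the keys are predictable.
scores = {tuple(sorted((u, v))): p for u, v, p in nx.preferential_attachment(G)}

# The same value computed by hand from the definition.
manual_ae = G.degree("a") * G.degree("e")

best_pair = max(scores, key=scores.get)
print(best_pair, scores[best_pair], manual_ae)
```

High-degree nodes dominate this ranking regardless of whether the two nodes share any neighbors, which is the main difference from the common-neighbors score.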
74. Preferential attachment
import heapq
# pos is the node layout computed earlier (not shown on this slide), e.g. pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=500, node_color='yellow')
nx.draw_networkx_edges(G, pos, alpha=0.2)
nx.draw_networkx_labels(G, pos, font_size=20);
selected_lines = []
for u in G.nodes():
    # Score every non-neighbor v of u and keep the five highest-scoring pairs.
    preds = nx.preferential_attachment(G, [(u, v) for v in nx.non_neighbors(G, u)])
    largest = heapq.nlargest(5, preds, key=lambda x: x[2])
    for l in largest:
        selected_lines.append(l)
subG = nx.Graph()
for line in selected_lines:
    print(line[0], line[1], line[2])
    if line[2] > 1:
        subG.add_edge(line[0], line[1])
# Reuse the original node positions so predicted edges overlay the drawing.
pos_subG = dict()
for s in subG.nodes():
    pos_subG[s] = pos[s]
nx.draw_networkx_edges(subG, pos_subG, edge_color='red')
76. Preferential attachment
import heapq
import numpy as np
degree = nx.degree_centrality(G)
# Node size scales with degree centrality; pos is the node layout computed earlier (not shown on this slide).
nx.draw_networkx_nodes(G, pos, node_color='yellow', nodelist=list(degree.keys()),
                       node_size=np.array(list(degree.values())) * 10000)
nx.draw_networkx_edges(G, pos, alpha=0.2)
nx.draw_networkx_labels(G, pos, font_size=20);
selected_lines = []
for u in G.nodes():
    # Score every non-neighbor v of u and keep the five highest-scoring pairs.
    preds = nx.preferential_attachment(G, [(u, v) for v in nx.non_neighbors(G, u)])
    largest = heapq.nlargest(5, preds, key=lambda x: x[2])
    for l in largest:
        selected_lines.append(l)
subG = nx.Graph()
for line in selected_lines:
    print(line[0], line[1], line[2])
    if line[2] > 1:
        subG.add_edge(line[0], line[1])
# Reuse the original node positions so predicted edges overlay the drawing.
pos_subG = dict()
for s in subG.nodes():
    pos_subG[s] = pos[s]
nx.draw_networkx_edges(subG, pos_subG, edge_color='red')