Chunol (추놀) Episode 4: Classifying Movies

Let's Play with Recommendations (추천아 놀자), Episode 4
Classifying Movies
Starting soon

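Classifying Movies
Group similar movies together using each movie's 19 genre flags
- Dataset: MovieLens movie genre information
- Clustering algorithm: k-means
- Similarity between movies' genre vectors: cosine similarity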

Classifying Movies – Dataset

Classifying Movies – Data Extraction (genre information only)
movie id | movie title | 19 genre flags (unknown | Action | Adventure | Animation | ...)
1|Toy Story (1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
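A minimal Python sketch of this extraction step, assuming rows shaped like the ones above. The genre names follow the MovieLens 100K genre order (`unknown` first); `GENRES`, `parse_movie` and `genre_names` are illustrative names, not part of any library. Taking the *last* 19 fields as the flags is an assumption that works both for these trimmed rows and for full `u.item` rows, which carry extra date and URL fields before the genres.

```python
# Hypothetical helpers: parse MovieLens-style rows into (id, title, genre vector).
# Assumption: the 19 genre flags are the last 19 pipe-separated fields.
GENRES = [
    "unknown", "Action", "Adventure", "Animation", "Children's", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def parse_movie(line):
    """Return (movie_id, title, genre_vector) for one pipe-delimited row."""
    fields = line.strip().split("|")
    movie_id, title = int(fields[0]), fields[1]
    vector = [float(flag) for flag in fields[-len(GENRES):]]
    return movie_id, title, vector


def genre_names(vector):
    """List the genres whose flag is set to 1."""
    return [name for name, flag in zip(GENRES, vector) if flag == 1.0]


rows = [
    "1|Toy Story (1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0",
    "2|GoldenEye (1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0",
    "3|Four Rooms (1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0",
    "4|Get Shorty (1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0",
]
movies = [parse_movie(row) for row in rows]

for movie_id, title, vector in movies:
    print(movie_id, title, genre_names(vector))
```

Decoded this way, Toy Story comes out as Animation, Children's and Comedy, and GoldenEye as Action, Adventure and Thriller. When reading the full `u.item` file, opening it with `encoding="latin-1"` is usually needed, since some titles are not valid UTF-8.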

Classifying Movies – Similarity Between Movies
Toy Story (1995): |0|0|0|1|1|1|0|0|0|0|0|1|0|0|0|0|1|0|0
GoldenEye: |0|1|1|0|0|0|0|0|0|0|0|1|0|0|0|0|1|0|0
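Cosine similarity is cos(a, b) = (a · b) / (|a| |b|). For 0/1 genre vectors the dot product is simply the number of shared genres, so the score grows with genre overlap. A small sketch using the genre rows from the extraction slide (the function name is illustrative):

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# 19 genre flags per movie, copied from the extracted MovieLens rows
toy_story = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
goldeneye = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
get_shorty = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(cosine_similarity(toy_story, goldeneye))   # no genre in common
print(cosine_similarity(toy_story, get_shorty))  # share only Comedy
```

Toy Story and GoldenEye share no genre, so their similarity is 0. Toy Story and Get Shorty share one of three genres each, giving 1 / (√3 · √3) ≈ 0.333.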

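Classifying Movies – K-Means Clustering
K-means is an algorithm that partitions the given data into K clusters. In this deck, "nearest" means highest cosine similarity.
① Choose K, the number of clusters.
② Group each point with the nearest of K arbitrarily chosen cluster centers.
③ Compute a new mean for each cluster.
④ Regroup each point with the nearest of the new mean centers.
⑤ Repeat steps ③ and ④ until nothing changes.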
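The five steps above can be sketched in Python as follows. Following the deck, this sketch assigns each point by highest cosine similarity and updates centroids with the plain arithmetic mean. Textbook k-means uses Euclidean distance instead, so this is a cosine-based variant (closely related to spherical k-means, which additionally rescales each centroid to unit length). The function names and the `max_iter` and `seed` parameters are illustrative assumptions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans_cosine(vectors, k, max_iter=100, seed=None):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Steps 1-2: k is given; start from k randomly chosen data points
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: each point joins the centroid it is most similar to
        new_labels = [
            max(range(k), key=lambda c: cosine_similarity(v, centroids[c]))
            for v in vectors
        ]
        # Step 5: stop once no point changes cluster
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its members
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_vector(members)
    return labels, centroids


# Usage: 19-flag genre vectors from the extracted rows, clustered with K = 2
titles = ["Toy Story", "GoldenEye", "Four Rooms", "Get Shorty"]
vectors = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
labels, centroids = kmeans_cosine(vectors, k=2, seed=0)
for title, label in zip(titles, labels):
    print(label, title)
```

Because the starting centroids are random, the cluster numbering (and occasionally the grouping) can differ between runs; fixing `seed` makes a run repeatable.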

Classifying Movies – Clustering
K-Means process
- Build the dataset (vectors)
[0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, Copycat (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Shanghai Triad (Yao a yao yao
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Dead Man Walking (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, Richard III (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, Seven (Se7en) (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, Postino, Il (1994)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
...
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, From Dusk Till Dawn (1996)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, White Balloon, The (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, Antonia's Line (1995)]
[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, Angels and Insects (1995)]
[0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, Muppet Treasure Island (1996)]
[0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, Braveheart (1995)]

K-Means process
- Set the number of clusters
3 (K = 3)

K-Means process
- Choose the initial centroids: chosen at random
(Slide figure label: cluster 1 centroid)
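Random initialization can be sketched as below. Because many MovieLens titles share exactly the same genre flags (for example the Drama-only films in the list above), this sketch samples from the *distinct* vectors so that no two starting centroids coincide; that de-duplication is a design choice of this example, not something the slides specify. The function name and `seed` parameter are illustrative.

```python
import random


def pick_initial_centroids(vectors, k, seed=None):
    """Pick k distinct data vectors at random to serve as starting centroids."""
    # dict.fromkeys keeps the first occurrence of each vector, in order
    distinct = list(dict.fromkeys(tuple(v) for v in vectors))
    if len(distinct) < k:
        raise ValueError(f"need at least {k} distinct vectors, got {len(distinct)}")
    rng = random.Random(seed)
    return [list(v) for v in rng.sample(distinct, k)]


# 19-flag genre vectors (Dead Man Walking and Mr. Holland's Opus are both Drama-only)
movies = [
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Toy Story
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # GoldenEye
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # Four Rooms
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Dead Man Walking
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Mr. Holland's Opus
]
centroids = pick_initial_centroids(movies, k=3, seed=42)
print(centroids)
```

A common refinement is k-means++ seeding, which spreads the starting centroids apart instead of picking them uniformly at random.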

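K-Means process
- Measure the similarity between centroids 1, 2, 3 and each movie in the dataset
(Slide figure: Toy Story (1995) compared with the cluster 1 centroid and the other centroids; similarity values shown: 0.95, 0.85, 0.98)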
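The measurement step for one movie, sketched with cosine similarity: compute the similarity to every centroid and keep the highest. With the slide's illustrative values 0.95, 0.85 and 0.98 (read here as centroids 1, 2 and 3), Toy Story would join centroid 3. The helper names are illustrative, and the three centroids below are simply other movies' vectors chosen for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def nearest_centroid(vector, centroids):
    """Return (index, similarity) of the centroid most similar to vector."""
    sims = [cosine_similarity(vector, c) for c in centroids]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


toy_story = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
centroids = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # GoldenEye's vector
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # Four Rooms' vector
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Get Shorty's vector
]
index, similarity = nearest_centroid(toy_story, centroids)
print(f"Toy Story -> centroid {index + 1} (similarity {similarity:.3f})")
```

Here the only centroid sharing a genre with Toy Story is the third one (Comedy), so this prints `Toy Story -> centroid 3 (similarity 0.333)`.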

K-Means process
- Group each movie into the cluster of its nearest (most similar) centroid
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Copycat (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Dead Man Walking (1995)]
0.0, 0.0, 0.0, 1.0, 0.0, Richard III (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Seven (Se7en) (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]

K-Means process
- Compute the center (mean) of each cluster
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
Resulting centroids (partial values as shown on the slide):
0.0, 0.3, 1.0, 0.0, 0.1,
0.0, 1.0, 1.0, 0.8, 0.0,
0.9, 0.0, 1.0, 0.0, 0.3,
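Recomputing a centroid is a component-wise average of its members. Because every genre flag is 0 or 1, each centroid component is the fraction of the cluster's movies that carry that genre, so a value such as 0.3 would mean roughly 30% of the members have it. A minimal sketch (the function name and the example cluster are illustrative):

```python
def mean_centroid(members):
    """Average the member vectors component by component."""
    if not members:
        raise ValueError("cannot compute the centroid of an empty cluster")
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


# A hypothetical cluster of three movies (19 genre flags each)
cluster = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # GoldenEye
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],  # Four Rooms
    [0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # Get Shorty
]
centroid = mean_centroid(cluster)
print([round(x, 2) for x in centroid])
```

Action and Thriller come out at about 0.67 because two of the three movies have them, while Adventure, Comedy and Drama come out at about 0.33.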

K-Means process
- Use the new mean values as the centroids
Cluster 1: new centroid 1
0.0, 0.3, 1.0, 0.0, 0.1,
0.0, 1.0, 1.0, 0.8, 0.0,
0.9, 0.0, 1.0, 0.0, 0.3,

K-Means process
- Run the clustering again with the new centroids
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Copycat (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Dead Man Walking (1995)]
0.0, 0.0, 0.0, 1.0, 0.0, Richard III (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Seven (Se7en) (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]

K-Means process
- Recompute the centers of the clustered dataset
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
0.0, 0.3, 1.0, 0.0, 0.1,
0.0, 1.0, 1.0, 0.8, 0.0,
0.9, 0.0, 1.0, 0.0, 0.3,

K-Means process
- Compare the previous centroids with the new ones
- Repeat the clustering while they keep changing
Previous centroid 1
0.0, 0.3, 1.0, 0.0, 0.1,
New centroid 1
0.0, 0.3, 1.0, 0.0, 0.1,

K-Means process
- Compare the previous centroids with the new ones
- Stop clustering once they no longer change
Previous centroid 1
0.0, 0.3, 1.0, 0.0, 0.1,
New centroid 1
0.0, 0.3, 1.0, 0.0, 0.1,
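The termination test compares each previous centroid with its new counterpart. Comparing floating-point means with a small tolerance is safer than exact equality; the `tol` value here is an arbitrary assumption, and the function name is illustrative. Stopping once no movie changes cluster (step ⑤) detects the same fixed point, since unchanged assignments produce unchanged means.

```python
import math


def centroids_converged(old, new, tol=1e-6):
    """True if every component of every centroid moved by at most tol."""
    if len(old) != len(new):
        return False
    return all(
        len(c_old) == len(c_new)
        and all(math.isclose(a, b, rel_tol=0.0, abs_tol=tol) for a, b in zip(c_old, c_new))
        for c_old, c_new in zip(old, new)
    )


# Centroid 1 values as shown on the slides (partial vectors)
previous = [[0.0, 0.3, 1.0, 0.0, 0.1]]
current = [[0.0, 0.3, 1.0, 0.0, 0.1]]
print(centroids_converged(previous, current))                      # unchanged: stop
print(centroids_converged(previous, [[0.0, 0.4, 1.0, 0.0, 0.1]]))  # moved: repeat
```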

Classifying Movies – Final Result
Lion King, The (1994)
Snow White and the Seven Dwarfs (1937)
All Dogs Go to Heaven 2 (1996)
Bedknobs and Broomsticks (1971)
Sound of Music, The (1965)
Robert A. Heinlein's The Puppet Masters (1994)
Blade Runner (1982)
Aristocats, The (1970)
Flipper (1996)
Wallace & Gromit: The Best of Aardman Animation (1996)
Kansas City (1996)
Homeward Bound: The Incredible Journey (1993)
20,000 Leagues Under the Sea (1954)
Brazil (

GoldenEye (1995)
Rumble in the Bronx (1995)
Bad Boys (1995)
Strange Days (1995)
Natural Born Killers (1994)
Stargate (1994)
Fugitive, The (1993)
Jurassic Park (1993)

Thank you.
Broadcast channel: Afreecatv.com/goodvc
Blog: goodvc78.postach.io

Uploaded by choi kyumin
