The document discusses clustering movies by genre using k-means. It extracts genre information from a movie dataset and represents each movie as a vector of its genres. It then applies k-means with k = 3 to group movies with similar genre vectors. The document outlines the k-means process: initialize the cluster centroids randomly, assign each movie to the cluster of its most similar centroid, recompute each centroid from its members, and repeat until the centroids stop changing.
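13. Classifying Movies – Clustering
K-Means Process
- Measure the similarity between Centroids 1, 2, and 3 and each item in the data set
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, Toy Story (1995)]
Similarity calculation:
Cluster 1 (Centroid 1): 0.95
Cluster 2 (Centroid 2): 0.85
Cluster 3 (Centroid 3): 0.98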
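The slide does not say which similarity measure produces these scores. As a minimal sketch, assuming cosine similarity over the genre vectors, the comparison of one movie against three centroids could look like this. The movie vector, the centroid values used here, and the function name `cosine_similarity` are illustrative assumptions, not taken from the original code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical 5-genre vector for one movie (assumption, not from the slides).
movie = [0.0, 0.0, 1.0, 0.0, 1.0]

# Three illustrative centroids (assumed values for this sketch).
centroids = [
    [0.0, 0.3, 1.0, 0.0, 0.1],
    [0.0, 1.0, 1.0, 0.8, 0.0],
    [0.9, 0.0, 1.0, 0.0, 0.3],
]

similarities = [cosine_similarity(movie, c) for c in centroids]
for i, s in enumerate(similarities, start=1):
    print(f"Centroid {i}: similarity {s:.2f}")
```

A higher score means the movie's genre profile points in nearly the same direction as the centroid, regardless of vector length.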
14. Classifying Movies – Clustering
K-Means Process
- Group each item into the cluster of its nearest centroid
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Copycat (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Dead Man Walking (1995)]
0.0, 0.0, 0.0, 1.0, 0.0, Richard III (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Seven (Se7en) (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 1 (Centroid 1)
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 2 (Centroid 2)
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
Cluster 3 (Centroid 3)
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
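The grouping step can be sketched as picking, for each movie, the centroid with the highest similarity score. In the sketch below, the Toy Story scores (0.95, 0.85, 0.98) are the ones shown on slide 13. The scores for the other two movies and the helper name `nearest_cluster` are made up for illustration.

```python
def nearest_cluster(similarities):
    """Return the 1-based number of the centroid with the highest similarity."""
    best = max(range(len(similarities)), key=lambda i: similarities[i])
    return best + 1

# Similarity of each movie to Centroids 1, 2, 3.
scores = {
    "Toy Story (1995)": [0.95, 0.85, 0.98],            # values from slide 13
    "Twelve Monkeys (1995)": [0.40, 0.90, 0.10],       # hypothetical values
    "Usual Suspects, The (1995)": [0.80, 0.30, 0.60],  # hypothetical values
}

clusters = {1: [], 2: [], 3: []}
for title, sims in scores.items():
    clusters[nearest_cluster(sims)].append(title)

for cluster_id, titles in clusters.items():
    print(f"Cluster {cluster_id}: {titles}")
```

With these scores Toy Story lands in Cluster 3, because 0.98 is its highest similarity.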
15. Classifying Movies – Clustering
K-Means Process
- Compute the center (mean) of the data in each cluster
Cluster 1 (Centroid 1)
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 2 (Centroid 2)
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
Cluster 3 (Centroid 3)
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
Center of cluster 1: 0.0, 0.3, 1.0, 0.0, 0.1,
Center of cluster 2: 0.0, 1.0, 1.0, 0.8, 0.0,
Center of cluster 3: 0.9, 0.0, 1.0, 0.0, 0.3,
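The center of a cluster is the component-wise mean of its members' vectors. A minimal sketch follows, using hypothetical 5-genre vectors. The function name `centroid_of` and the sample rows are assumptions for illustration, not the slide's data.

```python
def centroid_of(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical genre vectors for the four members of one cluster.
cluster_members = [
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
]

new_centroid = centroid_of(cluster_members)
print(new_centroid)  # [0.0, 0.25, 1.0, 0.0, 0.25]
```

Each coordinate of the result is the fraction of cluster members that carry that genre, which is why centroid values fall between 0 and 1 even though the inputs are 0 or 1.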
16. Classifying Movies – Clustering
K-Means Process
- Use the new centers as the centroids
Cluster 1 (new Centroid 1): 0.0, 0.3, 1.0, 0.0, 0.1,
Cluster 2 (new Centroid 2): 0.0, 1.0, 1.0, 0.8, 0.0,
Cluster 3 (new Centroid 3): 0.9, 0.0, 1.0, 0.0, 0.3,
17. Classifying Movies – Clustering
K-Means Process
- Run the clustering again with the new centroids
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Copycat (1995)]
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Dead Man Walking (1995)]
0.0, 0.0, 0.0, 1.0, 0.0, Richard III (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Seven (Se7en) (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 1 (Centroid 1)
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 2 (Centroid 2)
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
Cluster 3 (Centroid 3)
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
18. Classifying Movies – Clustering
K-Means Process
- Recompute the center (mean) of the data in each cluster
Cluster 1 (Centroid 1)
0.0, 0.0, 1.0, 0.0, 0.0, Usual Suspects, The (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mighty Aphrodite (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Mr. Holland's Opus (1995)]
Cluster 2 (Centroid 2)
0.0, 1.0, 0.0, 0.0, 0.0, Twelve Monkeys (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Babe (1995)]
Cluster 3 (Centroid 3)
0.0, 0.0, 0.0, 0.0, 0.0, Toy Story (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, GoldenEye (1995)]
0.0, 0.0, 1.0, 0.0, 0.0, Four Rooms (1995)]
0.0, 0.0, 0.0, 0.0, 0.0, Get Shorty (1995)]
Center of cluster 1: 0.0, 0.3, 1.0, 0.0, 0.1,
Center of cluster 2: 0.0, 1.0, 1.0, 0.8, 0.0,
Center of cluster 3: 0.9, 0.0, 1.0, 0.0, 0.3,
19. Classifying Movies – Clustering
K-Means Process
- Compare the previous centers with the new centers
- Repeat the clustering
Previous center (Centroid 1)
0.0, 0.3, 1.0, 0.0, 0.1,
New center (Centroid 1)
0.0, 0.3, 1.0, 0.0, 0.1,
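The comparison of previous and new centers can be sketched as a coordinate-by-coordinate check. The slides do not state a tolerance, so the `tol` parameter, the function name `centroids_converged`, and the `moved` example are assumptions. The `previous` values are the Centroid 1 values shown on the slide.

```python
def centroids_converged(old, new, tol=1e-6):
    """True if every coordinate of every centroid changed by at most tol."""
    return all(
        abs(o - n) <= tol
        for old_c, new_c in zip(old, new)
        for o, n in zip(old_c, new_c)
    )

previous = [[0.0, 0.3, 1.0, 0.0, 0.1]]   # Centroid 1 as shown on the slide
unchanged = [[0.0, 0.3, 1.0, 0.0, 0.1]]
moved = [[0.0, 0.4, 1.0, 0.0, 0.1]]      # hypothetical shifted center

print(centroids_converged(previous, unchanged))  # True
print(centroids_converged(previous, moved))      # False
```

When the check returns False the assignment and recompute steps run again. When it returns True the clustering can stop.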
20. Classifying Movies – Clustering
K-Means Process
- Compare the previous centers with the new centers
- Stop the clustering
Previous center (Centroid 1)
0.0, 0.3, 1.0, 0.0, 0.1,
New center (Centroid 1)
0.0, 0.3, 1.0, 0.0, 0.1,
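Putting the steps together, the whole loop can be sketched as follows. This is a sketch under stated assumptions, not the original implementation: cosine similarity is assumed as the measure (standard k-means uses Euclidean distance), and the names `kmeans`, `assign`, `recompute`, the parameters `max_iter`, `tol`, `seed`, and the toy genre vectors are all hypothetical.

```python
import math
import random

def cosine_similarity(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign(points, centroids):
    """For every point, the index of its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda j: cosine_similarity(p, centroids[j]))
        for p in points
    ]

def recompute(points, labels, centroids):
    """Mean of each cluster's members; an empty cluster keeps its old centroid."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))
    return new_centroids

def kmeans(points, k=3, max_iter=100, tol=1e-9, seed=0):
    """Random initialization, then assign and recompute until the centroids stop moving."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = recompute(points, labels, centroids)
        moved = max(
            abs(a - b)
            for old_c, new_c in zip(centroids, new_centroids)
            for a, b in zip(old_c, new_c)
        )
        centroids = new_centroids
        if moved <= tol:
            break
    return centroids, assign(points, centroids)

# Hypothetical 5-genre vectors (not the slide's data).
points = [
    [1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0],
]

centroids, labels = kmeans(points, k=3)
print(labels)
```

Because the starting centroids are sampled at random, a different `seed` can lead to a different grouping. The `max_iter` cap guarantees the loop ends even if the centers never settle.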
21. Classifying Movies – Final Result
Lion King, The (1994)
Snow White and the Seven Dwarfs (1937)
All Dogs Go to Heaven 2 (1996)
Bedknobs and Broomsticks (1971)
Sound of Music, The (1965)
Robert A. Heinlein's The Puppet Masters (1994)
Blade Runner (1982)
Aristocats, The (1970)
Flipper (1996)
Wallace & Gromit: The Best of Aardman Animation (1996)
Kansas City (1996)
Homeward Bound: The Incredible Journey (1993)
20,000 Leagues Under the Sea (1954)
Brazil (
GoldenEye (1995)
Rumble in the Bronx (1995)
Bad Boys (1995)
Strange Days (1995)
Natural Born Killers (1994)
Stargate (1994)
Fugitive, The (1993)
Jurassic Park (1993)