From Power Chord to the Power of Models - Oredev

From Power Chords to the Power of Models
@aliostad
Ali Kheyrollahi

> stackoverflow
> £1.5 bln
global fashion
destination
> 35% every year

Local pop music: “Cheelee pom!”
Boney M: “Rasputin”
Blondie: “Heart of Glass”

Data Source - Wiki
4,990,279 English Articles
37,583,879 Articles

Data Source - Wiki vs Britannica
Feng Zhu (assistant prof at Harvard):
“There has been lots of research on the accuracy of
Wikipedia, and the results are mixed—some studies
show it is just as good as the experts, others show
[that] Wikipedia is not accurate at all.”
“… the editors [of Britannica] are still not
found to be more objective than the crowd
in articles that are sufficiently revised.”

Data Source - Wikipedia in scholarly papers
[Chart: values by year, 2005 to 2014; y-axis from 0 to 180,000. Source: Google Scholar]

Data Acquisition - Wiki
[Diagram: List of Rock Genres → Rock Genres → Rock Artists; steps labelled "Capture Links", "Store HTML" and "Store"; built with Python scripts and Postgres]
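
The "Capture Links" step could look roughly like the sketch below. It uses only the standard library and a made-up HTML fragment; the class name, the `/wiki/` filter and the sample page are assumptions for illustration, not the scripts used for the talk.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that look like wiki article links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep article links only; skip namespaced pages such as "File:".
        if href and href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def capture_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical stored fragment standing in for a genre list page.
sample_html = (
    '<ul><li><a href="/wiki/Krautrock">Krautrock</a></li>'
    '<li><a href="/wiki/File:X.png">img</a></li>'
    '<li><a href="#top">top</a></li></ul>'
)
links = capture_links(sample_html)
print(links)
```

Storing the raw HTML first and extracting links from the stored copy keeps the extraction repeatable without fetching the page again.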

Data Source - Content vs. Data
hyphen: U+002D
figure dash: U+2012
minus sign: U+2212
em dash: U+2014
en dash: U+2013
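
One way to turn such content into data is to fold the dash look-alikes into a plain hyphen before parsing. This is a minimal sketch; the choice to map every variant to U+002D, and the sample string, are assumptions.

```python
# Dash-like code points mapped to the ASCII hyphen (U+002D).
# Treating them all as equivalent is an assumption that suits genre
# names and year ranges, not every kind of text.
DASHES = {
    "\u2012": "-",  # figure dash
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
    "\u2015": "-",  # horizontal bar
    "\u2212": "-",  # minus sign
}
DASH_TABLE = str.maketrans(DASHES)


def normalize_dashes(text):
    """Replace every dash variant in text with a plain hyphen."""
    return text.translate(DASH_TABLE)


raw = "Post\u2013rock, 1964\u20141970, Synth\u2012pop"
clean = normalize_dashes(raw)
print(clean)
```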

Data Exploration
“I personally … literally just look at the screen,
just like the matrix”
Claudia Perlich, multi-award winner Data
Scientist

Data Exploration
“… the dirty little secret that I have won all of them
because I have found something wrong with the
data… I would like to play around with dataset and
get intimately familiar with dataset and its
properties.”
Claudia Perlich

Album Genre
http://wiki-rock.azurewebsites.net/top10-album-genres.html

Data Models Model
Mathematical representation of a concept
based on parameters that impact that
concept
• Rating of a native app
• Stackoverflow score
• Credit score
• Fraud check

Data Models Model
“All models are wrong… but
some are useful.”
George Box

Data Models Graph 101
Social Network Analysis
and Graph Theory
• Nodes/vertices and edges/lines
• Directedness:
• Directed
• Undirected
• Degree, InDegree/OutDegree
• Weight
A B

Data Models Centrality
[Figure: example graph with node degrees 1, 2, 4, 2, 2, 1]
Same degree, different betweenness
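
A small sketch of "same degree, different betweenness" using networkx. The graph is a toy example (two triangles joined through a bridge node) and not the one pictured on the slide.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([
    ("a", "b"), ("b", "c"), ("a", "c"),  # first triangle
    ("c", "d"), ("d", "e"),              # "d" bridges the two triangles
    ("e", "f"), ("f", "g"), ("e", "g"),  # second triangle
])

degree = dict(g.degree())
# normalized=False keeps raw counts of shortest paths through each node.
betweenness = nx.betweenness_centrality(g, normalized=False)

print(degree["a"], degree["d"])
print(betweenness["a"], betweenness["d"])
```

Nodes "a" and "d" both have two neighbours, but only "d" sits on shortest paths between the two triangles, so only "d" gets a non-zero betweenness score.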

Graph Codez
import networkx as nx

g = nx.Graph()  # use nx.DiGraph() for a directed graph
g.add_edge('a', 'b')
g.add_edge('b', 'c')
…
print(len(g['b']))  # degree of node 'b'
c = nx.betweenness_centrality(g, normalized=True)
# c -> dictionary of node names and their score

Modelling Influence using Wiki

Data Models Cited Influence
Howlin’ Wolf (1940) → Captain Beefheart (1964)
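
A minimal sketch of ranking artists by out-degree, assuming each cited influence is stored as an (influencer, influenced) pair. Only the first pair comes from the slide; the `artist_*` names are placeholders.

```python
from collections import Counter

# (influencer, influenced) pairs; each pair is one directed edge.
# Only the first pair is from the slide, the rest are placeholders.
influences = [
    ("Howlin' Wolf", "Captain Beefheart"),
    ("Howlin' Wolf", "artist_x"),
    ("Captain Beefheart", "artist_y"),
    ("Howlin' Wolf", "artist_y"),
]

# Out-degree: how many artists cite this one as an influence.
out_degree = Counter(influencer for influencer, _ in influences)

for artist, count in out_degree.most_common():
    print(f"{artist} => {count}")
```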

Most influential Rock Artists Based on out-degree
The Beatles => 188
Black Sabbath => 127
Led Zeppelin => 118
Jimi Hendrix => 114
Bob Dylan => 94
Pink Floyd => 86
Iron Maiden => 77
Metallica => 77
The Rolling Stones => 66
The Beach Boys => 65
Neil Young => 63
Nirvana => 62
Slayer => 60
Queen => 59

Most influential Rock Artists Based on Betweenness Centrality
Jimi Hendrix => 53476.2014921
The Beatles => 47511.7957531
Bob Dylan => 38107.0298185
Led Zeppelin => 32701.7223273
Nirvana => 29733.9066836
Metallica => 29356.6009213
Queen => 28989.2844223
Robert Smith => 28880.670718
Elvis Presley => 28463.2891497
Slade => 27656.487307
Iron Maiden => 22449.6697023
Ramones => 22437.6112965
Rush => 21125.9481602
Neil Young => 19913.887522

Most influential Artists Based on Betweenness Centrality
Heavy Metal
Metallica => 566.06
Iron Maiden => 419.21
Corey Taylor => 146.0
Led Zeppelin => 122.73
Slipknot => 116.58
King Diamond => 94.7
Machine Head => 85.12
Rush => 70.41
Black Sabbath => 68.0
Van Halen => 54.56
Deep Purple => 53.5
Megadeth => 42.63
Guns N' Roses => 24.25

Alternative Rock
Nirvana => 490.08
Muse => 114.5
Weezer => 97.33
Pixies => 94.17
Sonic Youth => 78.5
Rivers Cuomo => 69.5
Siouxsie and the Banshees => 51.67
The Smiths => 51.5
Jeff Buckley => 46.17
The Offspring => 43.0
Placebo => 42.0
My Chemical Romance => 34.0
The Smashing Pumpkins => 32.33

Progressive Rock
Rush => 54.0
Marillion => 34.0
Pink Floyd => 33.0
Yes => 20.0
Porcupine Tree => 19.5
Dream Theater => 19.0
Chris Squire => 16.5
Primus => 15.0
Tool => 12.0
Mahavishnu Orchestra => 8.0
Geddy Lee => 7.0
Neil Peart => 5.0
Keith Emerson => 5.0

Data Models Page Rank
The Beatles => 0.00837723421839
Blind Lemon Jefferson => 0.00837369035189
Josh White => 0.00824945015047
Bessie Smith => 0.00717743996144
Louis Armstrong => 0.00692897940193
James P. Johnson => 0.00628676810257
Little Richard => 0.00584677302727
Muddy Waters => 0.005773172933
Tampa Red => 0.00572032424174
Robert Johnson => 0.00523579252974
Big Bill Broonzy => 0.00516075834679
Moon Mullican => 0.0050657751593
Black Sabbath => 0.00498789229732
Elvis Presley => 0.00497932058047
Duke Ellington => 0.00465800760107
Bo Diddley => 0.0044496675634
Jimmy Page => 0.00437658472459
Frank Zappa => 0.00431978608953
Miles Davis => 0.00396303890974
Jimi Hendrix => 0.00391117233916
Sister Rosetta Tharpe => 0.00390833570401
Bing Crosby => 0.00385435213525
Bob Dylan => 0.00358608821536
James Brown => 0.00349870931123
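
A rough sketch of how PageRank scores can be computed by power iteration. The slides do not say which implementation or parameters were used; the damping factor of 0.85, the iteration count, the edge direction (influenced → influencer, so rank flows toward sources of influence) and the `band_*` names are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Toy edges pointing from the influenced band to its influencer.
edges = [("band_c", "band_b"), ("band_d", "band_b"), ("band_b", "band_a")]
scores = pagerank(edges)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name} => {score:.4f}")
```

With this edge direction an early artist who influenced other influential artists collects rank from all of them, which would explain why pre-rock names can score highly.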

Weighted graph Album Genres
[Diagram: Krautrock, Psychedelic Rock and Experimental Rock connected pairwise, each edge with weight 1]
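
A minimal sketch of building such a weighted genre graph, assuming each album lists its genres and every pair of genres on the same album adds 1 to the weight of the edge between them. The album names and the second album's genres are placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical albums and their listed genres.
album_genres = {
    "album_1": ["Krautrock", "Psychedelic Rock", "Experimental Rock"],
    "album_2": ["Krautrock", "Experimental Rock"],
}

edge_weights = Counter()
for genres in album_genres.values():
    # Sort so that (a, b) and (b, a) count as the same undirected edge.
    for pair in combinations(sorted(set(genres)), 2):
        edge_weights[pair] += 1

for (genre_a, genre_b), weight in sorted(edge_weights.items()):
    print(f"{genre_a} - {genre_b}: {weight}")
```

A single album with three genres yields three edges of weight 1; further albums sharing a pair of genres push that pair's weight up.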

Genre Affinity
Indie Rock
Shoegazing
Alternative Rock
Dream Pop
22
25
24
12
Post-rock

Genre Affinity
Gothic Metal
Doom Metal
Black Metal
Heavy Metal
13
34
27
12
Stoner Metal

Clustering in Networks
[Figure: example graph with nodes 1 to 5]

Adjacency Matrix (Similarity Matrix)
     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0

Degree Matrix
     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3

Spectral Clustering:
Using Eigenvectors of the Laplacian Matrix

Degree Matrix
     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3
−
Adjacency Matrix (Similarity Matrix)
     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0
=
Laplacian Matrix
     u1  u2  u3  u4  u5
u1    2  -1   0   0  -1
u2   -1   3  -1  -1   0
u3    0  -1   2   0  -1
u4    0  -1   0   2  -1
u5   -1   0  -1  -1   3
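
The same subtraction can be checked with NumPy. This sketch rebuilds the Laplacian from the adjacency matrix shown on the slide and computes its eigenvalues and eigenvectors; using `np.linalg.eigh` is a choice made here for illustration, since the Laplacian is symmetric.

```python
import numpy as np

# Adjacency matrix for nodes u1..u5, as on the slide.
A = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
])

# Degree matrix: row sums of A on the diagonal.
D = np.diag(A.sum(axis=1))

# Laplacian matrix.
L = D - A

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(L)

print(L)
print(np.round(eigenvalues, 6))
```

The smallest eigenvalue of a graph Laplacian is always 0; spectral clustering works with the eigenvectors belonging to the next smallest eigenvalues.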

Eigenvector: a vector v whose direction does not change when it is
multiplied by matrix A; the result is v scaled by a scalar λ
(Av = λv)
u1 u2 u3 u4 u5
-0.7 0.3 -0.2 -0.1 0.7
-0.7 0.3 -0.2 -0.1 0.7

Spectral Clustering Codez
from sklearn.cluster import spectral_clustering
import numpy as np

# n: the collection of nodes; A starts as an all-zero square matrix
A = [[0.0 for x in n] for x in n]
… # build adjacency matrix
res = spectral_clustering(np.array(A), n_clusters=n_clusters)
# res -> array of cluster labels, one per node, e.g. [1,1,0,5,…]

Spectral Clustering Results
Folk Rock
Country Rock
Blues
Folk
Country
Americana
Roots Rock
Blues Rock
Southern Rock
Power Metal
Progressive Metal
Symphonic Metal
Black Metal
Melodic Death Metal
Groove Metal
Nu Metal
Thrash Metal
Death Metal
Metalcore
Industrial Metal
Gothic Metal
Christian Metal
Doom Metal
Speed Metal
Alternative Rock
Indie Rock
New Wave
Synthpop
Electronica
Rock
R&B
Pop
Pop Rock
Funk
Soul
Heavy Metal
Hard Rock
Alternative Metal

word2vec Model
Skip-gram: a proximity-based probability model trained
using Neural Networks (Deep Learning)
Pink Floyd were an English rock band formed in London
X XX
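
A small sketch of the proximity idea behind skip-gram: each word is paired with the words within a fixed window around it, and those pairs become training examples. The window size of 2, the lower-casing and the joined token `pink_floyd` are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for words within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# The sentence from the slide, lower-cased, with the band name as one token.
tokens = ["pink_floyd", "were", "an", "english", "rock",
          "band", "formed", "in", "london"]
pairs = skipgram_pairs(tokens, window=2)

print([context for target, context in pairs if target == "rock"])
```

A model trained on these pairs learns vectors in which words that share contexts end up close together.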

word2vec Representation
[Figure: one-hot vector for "rock" (00000001000000000000); one-hot vectors labelled Pink Floyd, band, formed and London (0010000000000, 0000000010000, 0000000000010, 1000000000000); dense values 0.9 0.1 0.2 0.4 / 0.1 0.1 0.8 0.1 / 0.1 0.4 0.1 0.2; label "pop"]

Album Genre Model
Fun Happy Saturday
We Are Friends Electronic
Frozen Blood In My Veins
Redneck Dance
Chaos and Mayhem
Basement Dub
Sentiment Analysis in text
Predicting the genre based on name of the album

Deep Learning Basics
0) A method of supervised learning
1) Traditional Neural Networks with many layers
2) Often uses convolution as the node function
3) Training on Big Data can take weeks even on GPU
4) Huge success attributed to improved training,
powerful computation and above all Big Data
5) Pooling, Dropout and local connections important

Deep Learning TensorFlow
“Wish you were here”
=> [123, 101, 42, 1969 ]
=> [123, 101, 42, 1969, 0, 0, 0, … 0 ]
Rock
=> [0, 0, 0, 1, 0, 0, 0, 0 ]
=> [[100000000000],[000000010000], … ]
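
The encoding steps above can be sketched in plain Python, before any TensorFlow code is involved. The word ids are the ones shown on the slide; the padded length of 8, the genre list with placeholder names and the helper names are assumptions.

```python
def encode_title(title, vocabulary, max_len):
    """Map each word to its integer id, then right-pad with 0 up to max_len."""
    ids = [vocabulary[word] for word in title.lower().split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(index, size):
    """Return a list of `size` zeros with a 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


# Word ids as on the slide; a real vocabulary would cover every title word.
vocabulary = {"wish": 123, "you": 101, "were": 42, "here": 1969}
# Eight genre slots with "Rock" at index 3; the other names are placeholders.
genres = ["genre_0", "genre_1", "genre_2", "Rock",
          "genre_4", "genre_5", "genre_6", "genre_7"]

x = encode_title("Wish you were here", vocabulary, max_len=8)
y = one_hot(genres.index("Rock"), len(genres))

print(x)
print(y)
```

Padding gives every title the same length, so titles can be stacked into a single input array; the one-hot label is the target the network is trained to predict.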

References
•All pictures from wikipedia.org used under Creative Commons
•Source of all data is from wikipedia.org collected online using a single call and then stored and processed
•Efficient Estimation of Word Representations in Vector Space. Mikolov et al. http://arxiv.org/abs/1301.3781
•Gensim's word2vec
•networkx lib
•word2vec blog post (500K docs): Five crazy abstractions my Deep Learning word2vec model just did
•word2vec on Rock music blog: Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus
•code for word2vec on wiki data
•Highcharts: highcharts
•word2vec paper: PDF
•Automatic real-time road marking recognition using a feature-driven approach PDF
•Video of the road marking recognition: here and here and here
•Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)
•Deep Learning articles
•code for Deep Learning genre analysis
•…
