From Power Chords to the Power of Models
Ali Kheyrollahi
@aliostad
> stackoverflow
> £1.5 bln global fashion destination
> 35% every year
8
Local pop music
9
Local pop music
“Cheelee pom!”
10
Boney M
“Rasputin”
11
Blondie
“Heart of Glass”
Infobox
Free text
Links
Data Acquisition
Data Source - Wiki
4,990,279 English Articles
37,583,879 Articles
Data Source - Wiki vs Britannica
Feng Zhu (assistant professor at Harvard):
“There has been lots of research on the accuracy of
Wikipedia, and the results are mixed—some studies
show it is just as good as the experts, others show
[that] Wikipedia is not accurate at all.”
“… the editors [of Britannica] are still not
found to be more objective than the crowd
in articles that are sufficiently revised.”
Data Source - Wikipedia in scholar papers
[Chart: Wikipedia in scholarly papers per year, 2005 to 2014; y-axis from 0 to 180,000. Source: Google Scholar]
Data Acquisition - Wiki
List of Rock Genres → Rock Genres → Rock Artists
At each step: store HTML, capture links
Tools: Python scripts, Postgres
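The "capture links" step can be sketched with the Python standard library. This is only an illustration, not the speaker's script: the class name `WikiLinkParser`, the helper `capture_links`, the rule for filtering article links and the sample HTML are all assumptions.

```python
from html.parser import HTMLParser


class WikiLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like wiki article links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumed rule: keep /wiki/ article links, skip namespaced pages
        # such as File: or Category: and skip external links.
        if href and href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)


def capture_links(html):
    """Return the article links found in one stored HTML page."""
    parser = WikiLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical stored page fragment
sample = (
    '<ul><li><a href="/wiki/Krautrock">Krautrock</a></li>'
    '<li><a href="/wiki/File:X.png">img</a></li>'
    '<li><a href="https://example.org">ext</a></li></ul>'
)
print(capture_links(sample))
```

Each captured link would then be fetched and stored in turn, moving from the genre list to genre pages to artist pages.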
Data Source - Content vs. Data
Hyphen (hyphen-minus): U+002D
Figure dash: U+2012
En dash: U+2013
Em dash: U+2014
Minus sign: U+2212 (U+2015 is the horizontal bar)
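One way to turn such content into data is to fold the dash variants into a plain hyphen before parsing. This is a minimal sketch; the mapping and the function name `normalise_dashes` are assumptions, not the speaker's code.

```python
# Map dash-like code points to the ASCII hyphen-minus (U+002D).
DASHES = {
    0x2012: "-",  # figure dash
    0x2013: "-",  # en dash
    0x2014: "-",  # em dash
    0x2015: "-",  # horizontal bar
    0x2212: "-",  # minus sign
}


def normalise_dashes(text):
    """Replace every dash variant listed in DASHES with '-'."""
    return text.translate(DASHES)


# A year range written with an en dash
print(normalise_dashes("1965\u20131968"))
```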
Data Exploration
Data Exploration
“I personally … literally just look at the screen,
just like the matrix”
Claudia Perlich, multi-award-winning data scientist
Data Exploration
“… the dirty little secret that I have won all of them
because I have found something wrong with the
data… I would like to play around with dataset and
get intimately familiar with dataset and its
properties.”
Claudia Perlich
Album Genre
http://wiki-rock.azurewebsites.net/top10-album-genres.html
Data Models
Data Models Model?!
Data Models Model
Mathematical representation of a concept
based on parameters that impact that
concept
• Rating of a native app
• Stackoverflow score
• Credit score
• Fraud check
“All models are wrong… but
some are useful.”
George Box
Data Models Model
Data Models Graph 101
Social Network Analysis
and Graph Theory
• Nodes/vertices and edges/lines
• Directedness:
• Directed
• Undirected
• Degree, InDegree/OutDegree
• Weight
A B
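The terms above can be made concrete with a tiny directed, weighted graph. This is a sketch only; the class `TinyDiGraph` and its methods are hypothetical names for illustration, not part of networkx.

```python
class TinyDiGraph:
    """Minimal directed graph with weighted edges (illustration only)."""

    def __init__(self):
        self.out_edges = {}  # node -> {target: weight}
        self.in_edges = {}   # node -> {source: weight}

    def add_edge(self, source, target, weight=1):
        self.out_edges.setdefault(source, {})[target] = weight
        self.in_edges.setdefault(target, {})[source] = weight

    def out_degree(self, node):
        return len(self.out_edges.get(node, {}))

    def in_degree(self, node):
        return len(self.in_edges.get(node, {}))

    def degree(self, node):
        # Total degree counts edges in both directions
        return self.in_degree(node) + self.out_degree(node)


g = TinyDiGraph()
g.add_edge("A", "B", weight=3)  # directed edge A -> B with weight 3
g.add_edge("C", "B")
print(g.in_degree("B"), g.out_degree("B"), g.degree("B"))
```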
Data Models Centrality
[Figure: example graph with node degrees 1, 2, 4, 2, 2, 1. Nodes with the same degree can have different betweenness.]
Graph Codez
import networkx as nx
g = nx.Graph()  # use nx.DiGraph() for a directed graph
g.add_edge('a', 'b')
g.add_edge('b', 'c')
# …
print(len(g['b']))  # degree
c = nx.betweenness_centrality(g, normalized=True)
# c -> dictionary of node names and their score
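To see what a betweenness score measures, here is a dependency-free sketch of Brandes' algorithm for an unweighted, undirected graph. It is an illustration under stated assumptions (a hypothetical five-node path graph, unnormalised scores), not the networkx implementation.

```python
from collections import deque


def betweenness(adj):
    """Unnormalised betweenness for an undirected graph given as {node: [neighbours]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                    # nodes in order of discovery
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:                  # accumulate dependencies, farthest first
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends
    return {v: b / 2 for v, b in score.items()}


# Path a-b-c-d-e: b and c both have degree 2, yet c lies on more shortest paths
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
bc = betweenness(path)
print(bc["b"], bc["c"])
```

In this example b and c have the same degree but different betweenness, which is the point of the centrality slide.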
Modelling Influence using Wiki
Data Models Cited Influence
Howlin’ Wolf
Captain Beefheart
1940 1964
Data Models Cited Influence
Most influential Rock Artists Based on out-degree
The Beatles => 188
Black Sabbath => 127
Led Zeppelin => 118
Jimi Hendrix => 114
Bob Dylan => 94
Pink Floyd => 86
Iron Maiden => 77
Metallica => 77
The Rolling Stones => 66
The Beach Boys => 65
Neil Young => 63
Nirvana => 62
Slayer => 60
Queen => 59
Data Models Cited Influence
Most influential Rock Artists Based on Betweenness Centrality
Jimi Hendrix => 53476.2014921
The Beatles => 47511.7957531
Bob Dylan => 38107.0298185
Led Zeppelin => 32701.7223273
Nirvana => 29733.9066836
Metallica => 29356.6009213
Queen => 28989.2844223
Robert Smith => 28880.670718
Elvis Presley => 28463.2891497
Slade => 27656.487307
Iron Maiden => 22449.6697023
Ramones => 22437.6112965
Rush => 21125.9481602
Neil Young => 19913.887522
Data Models Cited Influence
Most influential Artists Based on Betweenness Centrality
Heavy Metal
Metallica => 566.06
Iron Maiden => 419.21
Corey Taylor => 146.0
Led Zeppelin => 122.73
Slipknot => 116.58
King Diamond => 94.7
Machine Head => 85.12
Rush => 70.41
Black Sabbath => 68.0
Van Halen => 54.56
Deep Purple => 53.5
Megadeth => 42.63
Guns N' Roses => 24.25
Alternative Rock
Nirvana => 490.08
Muse => 114.5
Weezer => 97.33
Pixies => 94.17
Sonic Youth => 78.5
Rivers Cuomo => 69.5
Siouxsie and the Banshees => 51.67
The Smiths => 51.5
Jeff Buckley => 46.17
The Offspring => 43.0
Placebo => 42.0
My Chemical Romance => 34.0
The Smashing Pumpkins => 32.33
Progressive Rock
Rush => 54.0
Marillion => 34.0
Pink Floyd => 33.0
Yes => 20.0
Porcupine Tree => 19.5
Dream Theater => 19.0
Chris Squire => 16.5
Primus => 15.0
Tool => 12.0
Mahavishnu Orchestra => 8.0
Geddy Lee => 7.0
Neil Peart => 5.0
Keith Emerson => 5.0
Data Models PageRank
The Beatles => 0.00837723421839
Blind Lemon Jefferson => 0.00837369035189
Josh White => 0.00824945015047
Bessie Smith => 0.00717743996144
Louis Armstrong => 0.00692897940193
James P. Johnson => 0.00628676810257
Little Richard => 0.00584677302727
Muddy Waters => 0.005773172933
Tampa Red => 0.00572032424174
Robert Johnson => 0.00523579252974
Big Bill Broonzy => 0.00516075834679
Moon Mullican => 0.0050657751593
Black Sabbath => 0.00498789229732
Elvis Presley => 0.00497932058047
Duke Ellington => 0.00465800760107
Bo Diddley => 0.0044496675634
Jimmy Page => 0.00437658472459
Frank Zappa => 0.00431978608953
Miles Davis => 0.00396303890974
Jimi Hendrix => 0.00391117233916
Sister Rosetta Tharpe => 0.00390833570401
Bing Crosby => 0.00385435213525
Bob Dylan => 0.00358608821536
James Brown => 0.00349870931123
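PageRank scores like these sum to 1 across all nodes and reward nodes that are pointed at by other well-ranked nodes. Below is a minimal power-iteration sketch. It assumes edges run from the influenced artist to the cited influence, that dangling nodes spread their rank evenly, and a damping factor of 0.85; the band names are hypothetical and none of this is confirmed as the speaker's setup.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank on {node: [targets]}; returns {node: score}."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling node (cites nobody): spread its rank over all nodes
                share = damping * rank[v] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


# Hypothetical citations: key cites each artist in its list as an influence
cites = {"band_c": ["band_b"], "band_b": ["band_a"], "band_d": ["band_a"]}
ranks = pagerank(cites)
print(max(ranks, key=ranks.get))
```

With edges in this direction rank flows back towards early influences, which would be consistent with older blues artists scoring highly in the list above.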
Other Models
Weighted graph Album Genres
Krautrock, Psychedelic Rock, Experimental Rock
Edge weights: 1, 1, 1
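A weighted genre graph of this kind can be built by counting how often two genres are listed on the same album. This is a sketch under that assumption; the function name `genre_edge_weights` and the second album are hypothetical.

```python
from collections import Counter
from itertools import combinations


def genre_edge_weights(albums):
    """Count co-occurrences of genre pairs; each album adds 1 per pair."""
    weights = Counter()
    for genres in albums:
        # Sort so that (a, b) and (b, a) land on the same undirected edge
        for a, b in combinations(sorted(set(genres)), 2):
            weights[(a, b)] += 1
    return weights


albums = [
    ["Krautrock", "Psychedelic Rock", "Experimental Rock"],  # from the slide
    ["Krautrock", "Experimental Rock"],                      # hypothetical album
]
weights = genre_edge_weights(albums)
print(weights[("Experimental Rock", "Krautrock")])
```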
Genre Affinity
Genres: Indie Rock, Shoegazing, Alternative Rock, Dream Pop, Post-rock
Edge weights (figure): 22, 25, 24, 12
Genre Affinity
Genres: Gothic Metal, Doom Metal, Black Metal, Heavy Metal, Stoner Metal
Edge weights (figure): 13, 34, 27, 12
Clustering in Networks
Adjacency Matrix (Similarity Matrix)
     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0

Degree Matrix
     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3

[Figure: graph with nodes 1, 5, 4, 2, 3]
Clustering in Networks
Spectral Clustering:
Using Eigenvectors of the Laplacian Matrix
Laplacian Matrix = Degree Matrix − Adjacency Matrix (Similarity Matrix)

Degree Matrix
     u1  u2  u3  u4  u5
u1    2   0   0   0   0
u2    0   3   0   0   0
u3    0   0   2   0   0
u4    0   0   0   2   0
u5    0   0   0   0   3

Adjacency Matrix (Similarity Matrix)
     u1  u2  u3  u4  u5
u1    0   1   0   0   1
u2    1   0   1   1   0
u3    0   1   0   0   1
u4    0   1   0   0   1
u5    1   0   1   1   0

Laplacian Matrix
     u1  u2  u3  u4  u5
u1    2  -1   0   0  -1
u2   -1   3  -1  -1   0
u3    0  -1   2   0  -1
u4    0  -1   0   2  -1
u5   -1   0  -1  -1   3
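The subtraction on this slide can be reproduced in a few lines. This is a sketch using plain lists; the function name `laplacian` is an assumption, while the adjacency values are the slide's five-node example with zeros on the diagonal.

```python
def laplacian(adjacency):
    """Return L = D - A, where D is the diagonal matrix of row sums of A."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    return [
        [(degrees[i] if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


# Adjacency matrix for u1..u5 from the slide
A = [
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
]
L = laplacian(A)
for row in L:
    print(row)
```

Every row of the Laplacian sums to zero, which is a quick sanity check on a hand-built matrix.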
Clustering in Networks
Eigenvector: a non-zero vector v whose direction does not change
when it is multiplied by matrix A; the result is v scaled by a
scalar λ (Av = λv)
u1 u2 u3 u4 u5
-0.7 0.3 -0.2 -0.1 0.7
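The definition Av = λv can be checked numerically against the Laplacian of the five-node example. The test vectors below were chosen for this sketch and are not taken from the slide; the helper names `mat_vec` and `is_eigenpair` are likewise assumptions.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def is_eigenpair(matrix, vec, lam, tol=1e-9):
    """True if matrix @ vec equals lam * vec, component by component."""
    product = mat_vec(matrix, vec)
    return all(abs(p - lam * x) <= tol for p, x in zip(product, vec))


# Laplacian matrix of the five-node example
LAPLACIAN = [
    [2, -1, 0, 0, -1],
    [-1, 3, -1, -1, 0],
    [0, -1, 2, 0, -1],
    [0, -1, 0, 2, -1],
    [-1, 0, -1, -1, 3],
]

print(is_eigenpair(LAPLACIAN, [1, 1, 1, 1, 1], 0))   # constant vector, λ = 0
print(is_eigenpair(LAPLACIAN, [0, 1, 0, 0, -1], 3))  # u2 against u5, λ = 3
print(is_eigenpair(LAPLACIAN, [1, 0, 0, 0, 0], 2))   # not an eigenvector
```

Spectral clustering groups nodes using the components of such eigenvectors; the constant vector with λ = 0 carries no grouping information, so the useful ones are those with the next-smallest eigenvalues.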
Spectral Clustering Codez
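from sklearn.cluster import spectral_clustering
import numpy as np
# n is assumed to be the list of nodes (e.g. genre names)
A = [[0.0 for _ in n] for _ in n]
# … build adjacency matrix
# n_clusters must be passed by keyword in recent scikit-learn releases
res = spectral_clustering(np.array(A), n_clusters=n_clusters)
# res -> array of cluster indices, one per node, e.g. [1,1,0,5,…]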
Spectral Clustering Results
Folk Rock
Country Rock
Blues
Folk
Country
Americana
Roots Rock
Blues Rock
Southern Rock
Power Metal
Progressive Metal
Symphonic Metal
Black Metal
Melodic Death Metal
Groove Metal
Nu Metal
Thrash Metal
Death Metal
Metalcore
Industrial Metal
Gothic Metal
Christian Metal
Doom Metal
Speed Metal
Alternative Rock
Indie Rock
New Wave
Synthpop
Electronica
Rock
R&B
Pop
Pop Rock
Funk
Soul
Heavy Metal
Hard Rock
Alternative Metal
Intelligent Models
word2vec Model
Skip-gram: a proximity-based probability model trained
using a shallow Neural Network (often grouped with Deep Learning)
Pink Floyd were an English rock band formed in London
X XX
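Skip-gram training data consists of (word, nearby word) pairs taken from a sliding window. Here is a sketch of how such pairs could be generated from the sentence above, assuming whitespace tokenisation and a window of 2 on each side; the function name `skipgram_pairs` is hypothetical and this is not how Gensim is implemented internally.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and its neighbours."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "Pink Floyd were an English rock band formed in London"
tokens = sentence.lower().split()
# Context words the model would learn to predict from "rock"
rock_context = [c for t, c in skipgram_pairs(tokens) if t == "rock"]
print(rock_context)
```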
word2vec Representation
rock => one-hot vector: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
Context words (Pink Floyd, band, formed, London) as one-hot vectors:
0010000000000, 0000000010000, 0000000000010, 1000000000000
Dense values: 0.9 0.1 0.2 0.4 0.1 0.1 0.8 0.1 0.1 0.4 0.1 0.2
pop
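The step from a one-hot vector to a dense vector is a matrix multiplication that simply picks one row of the weight matrix. This is a sketch with a four-word vocabulary and arbitrary illustrative weights; neither is taken from a trained model.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at the given index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def vec_times_matrix(vec, matrix):
    """Row vector times matrix, using plain lists."""
    cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix))) for c in range(cols)]


vocab = ["pink floyd", "rock", "band", "london"]  # hypothetical vocabulary
W = [  # one row of (arbitrary) weights per word
    [0.9, 0.1, 0.2],
    [0.4, 0.1, 0.1],
    [0.8, 0.1, 0.1],
    [0.4, 0.1, 0.2],
]
rock = one_hot(vocab.index("rock"), len(vocab))
print(vec_times_matrix(rock, W))  # same as W[1], the dense vector for "rock"
```

After training, words used in similar contexts end up with similar rows, which is what makes nearest-neighbour queries on the vectors meaningful.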
word2vec Demo
Album Genre Model
Fun Happy Saturday
We Are Friends Electronic
Frozen Blood In My Veins
Redneck Dance
Chaos and Mayhem
Basement Dub
Sentiment Analysis in text
Predicting the genre based on name of the album
Deep Learning Basics
0) A method of supervised learning
1) Traditional Neural Networks with many layers
2) Often uses convolution as the node function
3) Training on Big Data can take weeks even on GPU
4) Huge success attributed to improved training, powerful computation and above all Big Data
5) Pooling, Dropout and local connections important
Deep Learning Topology
Deep Learning TensorFlow
“Wish you were here”
=> [123, 101, 42, 1969 ]
=> [123, 101, 42, 1969, 0, 0, 0, … 0 ]
Rock
=> [0, 0, 0, 1, 0, 0, 0, 0 ]
=> [[100000000000],[000000010000], … ]
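The encoding on this slide can be sketched without TensorFlow: words become ids, the id list is zero-padded to a fixed length, and the genre label becomes a one-hot vector. The vocabulary, the genre list, the padding length and the id reserved for unknown words are all assumptions chosen to reproduce the numbers shown.

```python
def encode_title(title, vocab, max_len, unknown_id=1):
    """Map words to ids, truncate to max_len, then pad with zeros."""
    ids = [vocab.get(word, unknown_id) for word in title.lower().split()]
    ids = ids[:max_len]
    return ids + [0] * (max_len - len(ids))


def one_hot(index, size):
    """Vector of zeros with a single 1 at the given index."""
    vec = [0] * size
    vec[index] = 1
    return vec


# Hypothetical vocabulary and genre list
vocab = {"wish": 123, "you": 101, "were": 42, "here": 1969}
genres = ["Pop", "Folk", "Metal", "Rock", "Blues", "Jazz", "Punk", "Soul"]

features = encode_title("Wish you were here", vocab, max_len=8)
label = one_hot(genres.index("Rock"), len(genres))
print(features)
print(label)
```

The last line of the slide suggests each word id is then expanded into its own one-hot vector before it reaches the network.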
Deep Learning Demo
Wrap-up
References
•All pictures from wikipedia.org used under Creative Commons
•All data is from wikipedia.org, collected online using a single call and then stored and processed
•Efficient Estimation of Word Representations in Vector Space. Mikolov et al. http://arxiv.org/abs/1301.3781
•Gensim's word2vec
•networkx lib
•word2vec blog post (500K docs): Five crazy abstractions my Deep Learning word2vec model just did
•word2vec on Rock music blog: Daft Punk+Tool=Muse: word2vec model trained on a small Rock music corpus
•code for word2vec on wiki data
•Highcharts: highcharts
•word2vec paper: PDF
•Automatic real-time road marking recognition using a feature-driven approach PDF
•Video of the road marking recognition: here and here and here
•Future of Programming - Rise of the Scientific Programmer (and fall of the craftsman)
•Deep Learning articles
•code for Deep Learning genre analysis
•…

From Power Chord to the Power of Models - Oredev
