musiclopedia
♯ ♮ ♭ ♬ ♫ ♪ ♩
♩ ♪ ♫ ♬ ♭ ♮ ♯
discover the world of music
motivation
demo
pipeline
data sources:
- MusicBrainz (artists, dates)
- CommonCrawl (1.4 TB, 240 million records)
stages:
- Storage & Batch processing
- Data Store & FrontEnd
clusters
Storage & Batch processing: 4 x m4.large (8 GB RAM ea. & 6 TB SSD total)
- HDFS namenode, Spark driver
- HDFS datanode, Spark executor
- HDFS datanode, Spark executor
- HDFS datanode, Spark executor
Data Store & FrontEnd: 3 x m4.large (32 GB SSD total)
- Flask server
- OrientDB master
- OrientDB master
data flow
header:
WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210
WARC-Date: 2014-08-02T09:52:13Z
WARC-Record-ID:
WARC-Refers-To:
WARC-Block-Digest: sha1:JROHLCS5SKMBR6XY46WXREW7RXM64EJC
Content-Type: text/plain
Content-Length: 6724
content:
Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was
an American jazz and song vocalist who interpreted much of the Great
American Songbook...
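A record like the one above can be split into its header fields and its plain-text content with a few lines of Python. This is only a sketch: the function name parse_wet_record is hypothetical, and it assumes a single record whose header and content are separated by one blank line. It is not the code used in the project.

```python
from __future__ import annotations


def parse_wet_record(record: str) -> tuple[dict[str, str], str]:
    """Split one WET record into (header fields, plain-text content)."""
    # Normalise line endings so CRLF and LF input behave the same.
    text = record.replace("\r\n", "\n")
    # Assumption: the first blank line separates header from content.
    head, _, content = text.partition("\n\n")
    lines = head.split("\n")
    headers = {"version": lines[0].strip()}  # e.g. "WARC/1.0"
    for line in lines[1:]:
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers, content.strip()


sample = (
    "WARC/1.0\n"
    "WARC-Type: conversion\n"
    "WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210\n"
    "WARC-Date: 2014-08-02T09:52:13Z\n"
    "Content-Type: text/plain\n"
    "Content-Length: 6724\n"
    "\n"
    'Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was\n'
    "an American jazz and song vocalist who interpreted much of the Great\n"
    "American Songbook...\n"
)
headers, content = parse_wet_record(sample)
print(headers["WARC-Target-URI"])
print(content[:15])
```

The two values needed downstream are headers["WARC-Target-URI"] (the page address) and content (the text to search for artist names).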
data flow
www.biography.com/people/ella-fitzgerald-9296210, Ella Fitzgerald
www.oldies.com/product-view/47037M.html, Louis Armstrong
bojack.org/2007/06/knock_a_few_bucks_off.html, John Coltrane
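The (url, artist) pairs above could be produced by scanning each page's text for every name in the artist catalog. The sketch below is one simple way to do that in plain Python. The function find_artists and the three-name catalog are hypothetical, the match is a whole-word regular expression, and nothing here is taken from the project's actual Spark job.

```python
from __future__ import annotations

import re


def find_artists(url: str, content: str, artists: list[str]) -> list[tuple[str, str]]:
    """Return one (url, artist) pair per catalog name found in the page text."""
    # Drop the scheme so the output matches the "www.site.com/path" form above.
    short_url = re.sub(r"^https?://", "", url)
    pairs = []
    for name in artists:
        # Whole-word, case-sensitive match. Short names such as "Air" or
        # "Chicago" would still match ordinary uses of those words.
        if re.search(r"\b" + re.escape(name) + r"\b", content):
            pairs.append((short_url, name))
    return pairs


catalog = ["Ella Fitzgerald", "Louis Armstrong", "John Coltrane"]
page_url = "http://www.biography.com/people/ella-fitzgerald-9296210"
page_text = (
    'Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was '
    "an American jazz and song vocalist"
)
pairs = find_artists(page_url, page_text, catalog)
print(pairs)
```

In a batch setting the same function could be applied to every record, for example as the body of a flat-map step, but that wiring is left out here.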
challenges
~1.4 TB, 274M websites, 1000 artists
- How to find the bands: Air, The Clash, Chicago?
- Norah Jones vs Miles Davis?
about me
B.Sc. EE, Universidad de Chile
M.Mus. Music Technology, NYU
data specs
Artist catalog:
- MusicBrainz database (~1,000,000 entries)
  → Jazz subset (1,000 entries)
Artist relationship metric:
- CommonCrawl July 2015 log (~145 TB)
  → Uncompressed '.wet' files (~1.5 TB)
model
[Figure: graph linking web pages (labeled W1, W2, and so on) to the artists they mention: John Coltrane, Norah Jones, Miles Davis]

        Miles  John  Norah  Total
Miles     5      2     2      9
John      2      4     1      7
Norah     2      1     9     12
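Link counts of this kind could be derived from (url, artist) pairs roughly as sketched below. The slides do not define how a link is counted, so this assumes that a link between two artists is a page mentioning both, and that an artist's own count is the number of pages mentioning that artist. The function count_links and the toy input are hypothetical and do not reproduce the table's numbers.

```python
from collections import defaultdict
from itertools import combinations


def count_links(pairs):
    """Count pages per artist and pages shared by each pair of artists."""
    # Group artists by the page they were found on.
    artists_by_page = defaultdict(set)
    for url, artist in pairs:
        artists_by_page[url].add(artist)

    single = defaultdict(int)  # artist -> number of pages mentioning it
    shared = defaultdict(int)  # frozenset({a, b}) -> pages mentioning both
    for artists in artists_by_page.values():
        for a in artists:
            single[a] += 1
        for a, b in combinations(sorted(artists), 2):
            shared[frozenset((a, b))] += 1
    return dict(single), dict(shared)


# Hypothetical toy input, not the data behind the table.
toy_pairs = [
    ("w1", "John Coltrane"),
    ("w5", "John Coltrane"),
    ("w5", "Miles Davis"),
    ("w7", "Miles Davis"),
    ("w2", "Norah Jones"),
]
single, shared = count_links(toy_pairs)
print(single)
print(shared)
```

Keying the shared counts by a frozenset makes the count symmetric, so the order in which two artists are looked up does not matter.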
Average links between any two artists “X” = (2+2+1)/3 = 1.667
Average links for a single artist “Y” = (9+7+12)/3 = 9.333
=> Average percentage “Z” = X/Y ≈ 17.9 %
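bool areConnected(artist A, artist B) {
    // share of B's links that involve A, and share of A's links that involve B
    aCountsInB = countLinks(A, B) / countLinks(B)
    bCountsInA = countLinks(A, B) / countLinks(A)
    // connected if the mean share exceeds C times the average percentage Z
    if (mean(aCountsInB, bCountsInA) > C * Z)
        return true
    return false
}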
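The arithmetic and the test above can be checked against the table with a short Python sketch. The counts are copied from the table, countLinks(A) is taken to be the Total column (as in the computation of Y), and the value of the constant C is not given on the slide, so c=1.0 below is an assumption chosen only for illustration.

```python
from itertools import combinations
from statistics import mean

# Link counts copied from the table (off-diagonal cells and Total column).
shared = {
    frozenset(("Miles", "John")): 2,
    frozenset(("Miles", "Norah")): 2,
    frozenset(("John", "Norah")): 1,
}
total = {"Miles": 9, "John": 7, "Norah": 12}

X = mean(shared.values())  # average links between any two artists
Y = mean(total.values())   # average links for a single artist
Z = X / Y                  # average percentage, exactly 5/28


def are_connected(a, b, c=1.0):
    """Python mirror of areConnected; c=1.0 is an assumed value for C."""
    links = shared[frozenset((a, b))]
    a_counts_in_b = links / total[b]
    b_counts_in_a = links / total[a]
    return mean([a_counts_in_b, b_counts_in_a]) > c * Z


print(round(X, 3), round(Y, 3), round(100 * Z, 1))
for a, b in combinations(total, 2):
    print(a, b, are_connected(a, b))
```

With c = 1.0 this marks Miles/John and Miles/Norah as connected and John/Norah as not connected. Dividing by each artist's own total is what keeps a heavily linked artist such as Norah from dominating the comparison.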
