6. [Figure: pipeline and cluster layout; labels: data sources, (artists, dates), clusters]
data sources: MusicBrainz (1.4 TB, 240 million records)
4 x m4.large (8 GB RAM each, 6 TB SSD total):
  HDFS namenode + Spark driver
  3 x (HDFS datanode + Spark executor)
3 x m4.large (32 GB SSD total):
  Flask server
  2 x OrientDB master
7. data flow
header:
WARC/1.0
WARC-Type: conversion
WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210
WARC-Date: 2014-08-02T09:52:13Z
WARC-Record-ID:
WARC-Refers-To:
WARC-Block-Digest: sha1:JROHLCS5SKMBR6XY46WXREW7RXM64EJC
Content-Type: text/plain
Content-Length: 6724

content:
Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was
an American jazz and song vocalist who interpreted much of the Great
American Songbook...
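A WARC record is a block of "Name: value" header lines followed by a blank line and the content block. The sketch below splits one record into those two parts. It is a minimal illustration that assumes the record has already been isolated as a string; the function name parse_warc_record is hypothetical, the sample abbreviates the record above (the empty ID fields are left out), and a production job would more likely rely on an existing WARC reader such as the warcio library.

```python
# Minimal sketch: split one WARC/1.0 record into header fields and content.
# Assumes the record is already isolated as a string and that a blank line
# separates header and content. Content-Length is NOT checked here.

def parse_warc_record(record: str) -> tuple[dict[str, str], str]:
    # WARC uses CRLF line endings; normalise them to LF first
    text = record.replace("\r\n", "\n")
    header_text, _, content = text.partition("\n\n")
    lines = header_text.split("\n")
    if not lines[0].startswith("WARC/"):
        raise ValueError("not a WARC record: missing version line")
    headers: dict[str, str] = {"version": lines[0].strip()}
    for line in lines[1:]:
        # Split on the first colon only, so values like http://... stay intact
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers, content


sample = (
    "WARC/1.0\r\n"
    "WARC-Type: conversion\r\n"
    "WARC-Target-URI: http://www.biography.com/people/ella-fitzgerald-9296210\r\n"
    "WARC-Date: 2014-08-02T09:52:13Z\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 6724\r\n"
    "\r\n"
    'Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was ...'
)

headers, content = parse_warc_record(sample)
print(headers["WARC-Target-URI"])  # http://www.biography.com/people/ella-fitzgerald-9296210
print(content[:15])                # Ella Fitzgerald
```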
8. data flow
www.biography.com/people/ella-fitzgerald-9296210, Ella Fitzgerald
www.oldies.com/product-view/47037M.html, Louis Armstrong
bojack.org/2007/06/knock_a_few_bucks_off.html, John Coltrane
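The (URL, artist) pairs above pair a page address with an artist named on that page. The slides do not say how names were matched, so the following is only a sketch under stated assumptions: artist names come from a known list (for example, names exported from MusicBrainz), a name counts as present if it appears as a whole-word, case-sensitive match in the page text, and the URL scheme is stripped to match the format shown. The names extract_pairs and strip_scheme and the artist list are hypothetical.

```python
import re

# Hypothetical helpers: turn one page (URI + plain text) into (URL, artist) pairs.

def strip_scheme(uri: str) -> str:
    # The pairs above omit the scheme, e.g. "www.biography.com/..."
    return re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", uri)


def extract_pairs(uri: str, text: str, artists: list[str]) -> list[tuple[str, str]]:
    url = strip_scheme(uri)
    pairs = []
    for name in artists:
        # Word boundaries avoid matching a name inside a longer word
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            pairs.append((url, name))
    return pairs


artists = ["Ella Fitzgerald", "Louis Armstrong", "John Coltrane"]  # hypothetical list
text = 'Ella Fitzgerald, known as the "First Lady of Song" and "Lady Ella," was ...'
for url, artist in extract_pairs(
    "http://www.biography.com/people/ella-fitzgerald-9296210", text, artists
):
    print(f"{url}, {artist}")
# prints: www.biography.com/people/ella-fitzgerald-9296210, Ella Fitzgerald
```

Exact string matching is the simplest choice; it misses aliases such as "Lady Ella", which a real pipeline would likely need to handle.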
17. model
         Miles  John  Norah  Total
Miles        5     2      2      9
John         2     4      1      7
Norah        2     1      9     12
Average links between any two distinct artists “X” = (2+2+1)/3 ≈ 1.667
Average total links for a single artist (row totals) “Y” = (9+7+12)/3 ≈ 9.333
=> Average percentage “Z” = X/Y = 5/28 ≈ 17.9 %
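def are_connected(a, b, count_links, c, z):
    # count_links(a, b): links between a and b; count_links(a): all links of a
    a_counts_in_b = count_links(a, b) / count_links(b)  # share of B's links involving A
    b_counts_in_a = count_links(a, b) / count_links(a)  # share of A's links involving B
    # Connected if the mean share exceeds C (a tuning multiplier) times the baseline Z
    return (a_counts_in_b + b_counts_in_a) / 2 > c * z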
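As a worked check, the sketch below applies the model to the three-artist table on this slide: it recomputes X, Y and Z and then tests every pair. It assumes that a table cell is countLinks(A, B), that a row total is countLinks(A), and that the multiplier C, which the slide leaves unspecified, is 1.0; the names counts, total and are_connected are illustrative.

```python
from itertools import combinations

# Co-occurrence counts from the table; a row total plays the role of countLinks(A)
counts = {
    "Miles": {"Miles": 5, "John": 2, "Norah": 2},
    "John":  {"Miles": 2, "John": 4, "Norah": 1},
    "Norah": {"Miles": 2, "John": 1, "Norah": 9},
}


def total(a: str) -> int:
    return sum(counts[a].values())


artists = list(counts)
pairs = list(combinations(artists, 2))
x = sum(counts[a][b] for a, b in pairs) / len(pairs)  # (2+2+1)/3
y = sum(total(a) for a in artists) / len(artists)     # (9+7+12)/3
z = x / y                                             # 5/28, about 0.179


def are_connected(a: str, b: str, c: float = 1.0) -> bool:
    a_share_of_b = counts[a][b] / total(b)
    b_share_of_a = counts[a][b] / total(a)
    return (a_share_of_b + b_share_of_a) / 2 > c * z


for a, b in pairs:
    print(a, b, are_connected(a, b))
# Miles John True
# Miles Norah True
# John Norah False
```

With C = 1, Miles and John (mean share about 0.254) and Miles and Norah (about 0.194) clear the 0.179 baseline, while John and Norah (about 0.113) do not. Raising C to 1.5 lifts the threshold to about 0.268, which would disconnect all three pairs.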