seevl: Data-driven music discovery

Alexandre Passant, co-founder, CEO, MDG Web ltd
http://seevl.net // @seevl // alex@seevl.net // @terraces

LA SemWeb & WebSpeed Meet-up, 2 October 2012
Cross Campus, Santa Monica

• Knowledge Engineering
• Social Web & Enterprise 2.0
• Sensor Networks & Real-Time

[Figure: example RDF graph. It shows :alex foaf:topic_interest dbpedia:Beastie_Boys, together with the nodes dbpedia:Bad_Brains, dbpedia:Hardcore_Punk, dbpedia:Black_Flag_(band), dbpedia:Adam_Yauch, dbpedia:B._B._King and dbpedia:Category:American_vegetarians, connected by p:associatedActs, p:genre, p:currentMembers and skos:subject edges.]

Our approach: SLADE

• Semantic LAyer for Data Exploration
• A framework to build data-driven apps
• ETL from existing sources / APIs
• Search, discovery, recommendations
• Data access / API
• Generic, config-based, domain-agnostic

The pipeline

Data extraction and interlinking

Entity-centric semantic knowledge base
Web data sources (artists, genres, labels, locations...)

Storage

REST-ful interface

Search, discovery and recommendation
seevl products engine, on top of our graph database

Challenges
• Some technical challenges faced when building SLADE and seevl.net
• Data models: Choosing the right schemas
• Data access: SPARQL or API or ... ?
• Scalability: Caching and optimisation strategies
• User Experience: User-centric design

RDF since day one
• RDF?
• Agile model (ideal when iterating)
• Intuitive aspect of graph modelling
• Standard toolkits (SPARQL / HTTP)
• OWL? RDFS?
• Minor use of inference (type, hierarchies)

Artist data
• Music Ontology
• Labels, Genres, Influences, Origins ...
• Collaborations between artists
• Activity period (add-on)
• Additional models/mappings
• e.g. Bio Vocabulary (birth/death), FOAF...

Social activities
• SIOC & SIOC-actions
• Social graph / sub-graph
• Action-centric activities (like, listen)
• Inferring user’s taste profile
• Top artist, genres, labels
• Using latest actions

Similarity / Recsys
• Graph-based similarities
• Data-driven recommendations
• Ranking using weight-factors
• Explanations / tracking
• The Similarity Ontology
• Domain-agnostic

Provenance
• Keep track of every statement in the ETL
• Origin, type and time of extraction
• With a low number of additional triples
• Introducing “data-slices”
• Multiple slices (=subgraphs) per resource
• Quick updates (DELETE / INSERT)

Provenance and graphs
GRAPH svl:seevl_id/wikipedia/facts/extract
{
  svl:seevl_id mo:genre svl:BntvuZAy .
  svl:seevl_id/wikipedia/extract dc:created "2012-10-25" ;
    rdfs:seeAlso wikipedia:Social_Distortion .
}
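
The "quick updates (DELETE / INSERT)" idea can be sketched as follows: because each slice is its own named graph, refreshing it amounts to dropping that graph and inserting the fresh triples. This is a minimal Python sketch, not seevl's actual code; the function name `slice_update` and the `http://example.org/` IRIs are hypothetical, and it assumes slices are addressed by full IRIs.

```python
def slice_update(graph_iri, triples):
    """Build a SPARQL 1.1 Update request that replaces one slice.

    graph_iri: IRI of the named graph holding the slice (assumed layout:
               one graph per resource and source).
    triples:   iterable of triple strings already written in Turtle syntax.
    """
    body = "\n".join("    " + t for t in triples)
    return (
        "DROP SILENT GRAPH <%s> ;\n"
        "INSERT DATA {\n"
        "  GRAPH <%s> {\n%s\n  }\n"
        "}" % (graph_iri, graph_iri, body)
    )


# Hypothetical IRIs, for illustration only
update = slice_update(
    "http://example.org/slices/artist1/wikipedia/facts",
    ["<http://example.org/artist1> <http://purl.org/ontology/mo/genre> "
     "<http://example.org/genre1> ."],
)
print(update)
```

Only the triples of one slice are touched, so the rest of the resource description stays in place while a single source is re-extracted.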

SPARQL
• Pros
• W3C Standard, Powerful
• HTTP-based w/ SPARQL Protocol
• SPARQL Update in 1.1
• Cons
• Learning curve for non-RDF people

URI patterns + JSON-LD
• Pre-defined URIs mapped to SPARQL
query patterns, returning JSON-LD data
• Search queries or resources description
• Content-negotiation or ?_format=json
• GET and POST
• POST => SPARQL UPDATE
• GET => SPARQL SELECT / ASK

JSON-LD

• JSON for Linking Data
• The best of both worlds
• JSON serialization, works with any parser
• Additional semantics (URIs, typed links,
etc.) with JSON-LD parsers
• Use of context/mappings to avoid URIs

Search

• /entity/?property=value
• JSON-LD mappings used in URI templates
• Works with literals, dates, resources
• Ranking algorithm / alpha-ranking
• Patterns defined in a single config file

Search (text)
• /entity/?prefLabel=clash&type=artist&_sort=count_desc
• Translated into
SELECT ?x WHERE {
?x a mo:artist ; skos:prefLabel ?label .
?label bif:contains "clash" .
}

Search (relations)
• /entity/?genre=BntvuZAy&type=artist
• Translated into
SELECT ?x WHERE {
?x a mo:artist ; mo:genre svl:BntvuZAy .
}
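
The translation from query parameters to SPARQL shown on the two search slides could be driven by a small mapping table. The following is a sketch under assumptions: `MAPPINGS` and `to_sparql` are hypothetical names, the prefixes are those used on the slides, and values are inserted without the validation a real service would need.

```python
from urllib.parse import parse_qsl

# Hypothetical config: query parameter -> (predicate, kind of value)
MAPPINGS = {
    "type": ("a", "class"),
    "genre": ("mo:genre", "resource"),
    "prefLabel": ("skos:prefLabel", "text"),
}


def to_sparql(query_string):
    """Turn 'genre=...&type=...' into a SELECT query (no input validation)."""
    clauses = []
    for i, (name, value) in enumerate(parse_qsl(query_string)):
        if name.startswith("_"):         # control parameters such as _sort
            continue
        predicate, kind = MAPPINGS[name]
        if kind == "class":
            clauses.append("?x a mo:%s ." % value)
        elif kind == "resource":
            clauses.append("?x %s svl:%s ." % (predicate, value))
        else:                            # free text: bind the literal, then match it
            var = "?v%d" % i
            clauses.append("?x %s %s ." % (predicate, var))
            clauses.append('%s bif:contains "%s" .' % (var, value))
    return "SELECT ?x WHERE {\n  " + "\n  ".join(clauses) + "\n}"


print(to_sparql("genre=BntvuZAy&type=artist"))
```

Sorting and ranking (the `_sort` parameter) are left out of this sketch.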

Resource description
• Patterns mapped to resource URI to
retrieve subset of the resource description
• /entity/seevl_id/infos
• /entity/seevl_id/facts
• /entity/seevl_id/links
• /entity/seevl_id/related(/related_id)

Is SPARQL fast enough?
• SPARQL is very powerful, but can be slow
• Some simple queries may lead to deep
graph patterns or traversal queries
depending on the modelling
• FILTERs (e.g. text and date based queries)
are expensive
• Not all triple-stores are equal

Splitting queries
• “List all resources sharing common
property-values with the current one,
whatever that property is”
• Fits in a single SPARQL query
• Doesn’t properly scale
• Becomes faster when splitting the query
and recomposing results via internal scripts
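
The split-and-recompose idea can be sketched with a toy in-memory triple list standing in for the triple store. Everything here is hypothetical (`TRIPLES`, `match`, `related`); each call to `match` represents one small query with a single triple pattern, and the recomposition happens in application code.

```python
from collections import Counter

# Toy stand-in for the triple store: (subject, property, value)
TRIPLES = [
    ("a", "genre", "punk"), ("a", "origin", "nyc"),
    ("b", "genre", "punk"), ("b", "origin", "nyc"),
    ("c", "genre", "punk"), ("c", "origin", "london"),
]


def match(s=None, p=None, o=None):
    """Stands in for one small query; None acts as a wildcard."""
    return [t for t in TRIPLES
            if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


def related(resource):
    """One lookup per property-value pair of `resource`, then merge:
    count how many property-values each other resource shares."""
    shared = Counter()
    for _, prop, value in match(s=resource):
        for other, _, _ in match(p=prop, o=value):
            if other != resource:
                shared[other] += 1
    return shared.most_common()


print(related("a"))
```

Whether many small queries beat one large query depends on the store and the data, so this shows the shape of the approach rather than a performance claim.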

SPARQL: splitting queries
                 Direct SPARQL     Property-slicing   Complete-slicing
Artist           Queries  Time     Queries  Time      Queries  Time
Ramones          1        139.97   20       109.51    66       37.84
Johnny Cash      1        257.81   30       152.60    135      75.35
U2               1        155.53   22       122.91    70       44.03
The Clash        1        146.43   20       110.84    79       42.61
Bad Religion     1        104.08   23       86.49     97       47.35
The Aggrolites   1        145.92   13       114.52    28       28.33
Janis Joplin     1        230.88   27       151.00    98       62.81
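
The relative gain of each strategy can be read off the table by dividing the direct time by the sliced time. The short Python sketch below does that arithmetic on the figures above; the slide gives no unit for the times, which does not matter for ratios. The names `TIMES` and `speedup` are introduced here for illustration.

```python
# (direct, property-slicing, complete-slicing) times from the table above
TIMES = {
    "Ramones": (139.97, 109.51, 37.84),
    "Johnny Cash": (257.81, 152.60, 75.35),
    "U2": (155.53, 122.91, 44.03),
    "The Clash": (146.43, 110.84, 42.61),
    "Bad Religion": (104.08, 86.49, 47.35),
    "The Aggrolites": (145.92, 114.52, 28.33),
    "Janis Joplin": (230.88, 151.00, 62.81),
}


def speedup(direct, sliced):
    """How many times faster the sliced strategy is than the direct query."""
    return direct / sliced


for artist, (direct, by_property, complete) in TIMES.items():
    print("%-15s property x%.2f  complete x%.2f"
          % (artist, speedup(direct, by_property), speedup(direct, complete)))
```

For every artist in the table, complete-slicing is more than twice as fast as the single direct query, despite issuing many more queries.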

SPARQL + Redis
• Started by using Memcache to store query
results (e.g. “?x genre $y”)
• Good, but costly for the first user
• Then, materialising results in-memory using
Redis as a key-value cache system
• Low indexing time (a few minutes on a laptop)
• Increasing query-performance, real-time

SPARQL + Redis

• Redis
• HSET to define entities (minimal data)
• ZADD to store ordered sets of key-values,
with our own ranking scheme
• ZRANGE to retrieve w/ correct order
• Everything in memory, instant query results

SPARQL + Redis
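# Entity hash: the minimal data needed to render a result
self.redis.hset(entity, 'uri', uri)
self.redis.hset(entity, 'prefLabel', prefLabel)
self.redis.hset(entity, 'description', description)
# One sorted set per pattern; redis-py 3+ takes a {member: score} mapping
self.redis.zadd('genre:BntvuZAy', {entity: score})
...
# Ranked members, lowest score first, together with their scores
self.redis.zrange(pattern, start, end, withscores=True)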
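
The materialisation step can be tried without a Redis server using a small in-memory stand-in. The sketch below is hypothetical throughout: `FakeRedis` only mimics the three calls used on this slide with redis-py 3.x style signatures, and `materialise` and its row layout are assumptions about how query results might be fed into the index.

```python
class FakeRedis:
    """In-memory stand-in for hset / zadd / zrange. Not a Redis client."""

    def __init__(self):
        self.hashes, self.zsets = {}, {}

    def hset(self, name, key, value):
        self.hashes.setdefault(name, {})[key] = value

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)

    def zrange(self, name, start, end, withscores=False):
        # Order by score, then member, as Redis sorted sets do
        ranked = sorted(self.zsets.get(name, {}).items(),
                        key=lambda item: (item[1], item[0]))
        stop = None if end == -1 else end + 1   # Redis ranges are inclusive
        page = ranked[start:stop]
        return page if withscores else [member for member, _ in page]


def materialise(store, rows):
    """rows: (entity_key, label, genre_id, score), e.g. from SPARQL results."""
    for entity, label, genre, score in rows:
        store.hset(entity, "prefLabel", label)
        store.zadd("genre:%s" % genre, {entity: score})


store = FakeRedis()
materialise(store, [("e1", "Artist one", "g1", 2.0),
                    ("e2", "Artist two", "g1", 1.0)])
print(store.zrange("genre:g1", 0, -1, withscores=True))
```

All ranking work happens at indexing time, so a lookup at request time is a single ordered read.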

User-experience
• Interfaces for graph-based/semantic data
• Don’t need to be ugly!
• As long as they’re built for users first
• Focus on vertical-UX, rather than SemWeb-UX
• Check best practices in the domain
• Involve HCI / non-SemWeb people

Lessons learnt
• Don’t reinvent the wheel, check existing
stacks and use what fits for the job
• Make it simple for your developers, using
REST-ful interfaces and design patterns
• Accept compromises, be pragmatic
• Think of users / create personas who are not
SemWeb geeks when designing the UX

Questions?
http://seevl.net // @seevl
alex@seevl.net // @terraces
