This document discusses data marketplaces and the potential benefits of linked data for data marketplaces. It provides an overview of several existing data marketplaces including Factual, InfoChimps, Azure DataMarket, Freebase, Socrata, and Kasabi. These marketplaces vary in their data domains, models, sizes, monetization approaches, and tools for data access. The document also outlines benefits of the semantic web and linked data for data marketplaces, such as unified data representation, global identifiers, interlinked datasets, and easy integration of existing linked open data. However, challenges include ensuring data quality and performing large-scale data integration across different schemas.
2. Contents
• Introduction
• Data Marketplaces
– Factual, InfoChimps, Azure DataMarket, Freebase, Socrata, Kasabi
– DataMarket, Timetric, xIgnite
• Data Marketplaces for Linked Data
(Linked) Data Marketplaces Jan 2011 #2
3. INTRODUCTION
4. Definitions
• Data-as-a-Service (DaaS)
– “Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer. Additionally, the emergence of service-oriented architecture (SOA) has rendered the actual platform on which the data resides also irrelevant” (Wikipedia)
• Data Marketplaces
– “Services that make it easy to find data from a range of secondary data sources, then consume the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers” (DataMarket.com)
5. Data Marketplaces properties
• Proposed classification by Bauereiss & Fensel
1. Data domain
2. Population of content
3. Community management
4. Operating party
5. Pricing models
6. Data exchange
• Some additional differentiating characteristics
– Data model, Data size, Data export
– Branded marketplaces, SLA
– Query languages, Data tools
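The dimensions above amount to a simple record schema per marketplace. As an illustration only (the class and field names are my own, not from Bauereiss & Fensel), the classification could be modelled as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataMarketplace:
    """Profile of a data marketplace along the classification
    dimensions listed above. Field names are illustrative."""
    name: str
    data_domain: str           # e.g. "all purpose", "statistical data"
    population: str            # how content gets in: crawling, uploads, aggregation
    operating_party: str
    pricing_model: str         # e.g. "pay-per-use", "subscription"
    data_model: str = "tabular"
    query_languages: List[str] = field(default_factory=list)

# Example: Factual, as described on the following slides
factual = DataMarketplace(
    name="Factual",
    data_domain="local data, travel, finance, ...",
    population="crawling + public sources + community uploads",
    operating_party="Factual Inc.",
    pricing_model="pay-per-use (per API call)",
    data_model="tabular",
    query_languages=["REST API"],
)
print(factual.data_model)  # tabular
```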
8. Factual (2)
• Data domain
– Travel, finance, sports, autos, movies, music, TV, books, health, food, politics, education, science, arts, …
– High quality local data
• USA, Germany, France, Italy, UK, Japan, Switzerland, Australia, …
• Used by Facebook Places
• Data population
– Crawling the web
– Public data sources
– Community contributions
• Upload XLS/ODS, CSV
9. Factual (3)
• Data model
– tabular
– Taxonomy of 400 categories
• 13 Level 1 categories: Arts, Automotive, Business, Government, …
• Data size – 500,000 datasets
• Company info
– Factual Inc. (USA)
– $27M VC funding so far
10. Factual (4)
• Monetization model
– Pricing model not finalised yet (currently free)
– Pay-per-use pricing (per API call) with subscriptions
• Companies that contribute data will have a fee reduction
• Data access options
– REST API
• Read from table, Add/Write to table, Get schema info
– Web applications
• Read/write raw data from a web page (JavaScript)
• Web widgets for visualising, filtering and sorting data
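As a sketch of the pay-per-use, per-API-call table access described above: the host, path and parameter names below are placeholders (the slide does not give Factual's actual URL scheme), but a table-read request would be built along these lines:

```python
from urllib.parse import urlencode

# Hypothetical endpoint shape -- host, path segments and parameter
# names are placeholders, not Factual's documented URL scheme.
def read_table_url(table_id: str, api_key: str, limit: int = 20) -> str:
    """Build a read request URL for a tabular, key-authenticated REST API."""
    params = urlencode({"api_key": api_key, "limit": limit})
    return f"https://api.factual.example/tables/{table_id}/read?{params}"

url = read_table_url("restaurants-us", api_key="MY_KEY", limit=5)
print(url)
```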
11. Factual (5)
• Data tools
– AutoClipper – find tables on the web
– PageClipper – extract tabular data from a web page
– FactClipper – find individual facts (query templates)
13. InfoChimps (2)
• Data domain
– All purpose
• Including data from Freebase, Wikipedia infoboxes, CKAN, Twitter, Data.gov, Data.gov.uk, GeoNames, …
• Data population
– Public datasets
– User submitted datasets
• Data model is dataset specific
• 10,000+ datasets organised in 13 collections
14. InfoChimps (3)
• Company info
– InfoChimps (USA)
– $1.6M VC funding so far
– Acquired DataMarketplace in 12/2010
• Monetization model
– Charge data sellers
• Data sellers choose the price & licensing of their data
• Charge for data storage
• 30% commission for InfoChimps on each sale
15. InfoChimps (4)
• Monetization model (2)
– Charge data buyers
• Baboon – free, 100K API calls / mo
• Brass Monkey – $20/mo, 500K API calls / mo
• Silverback – $250/mo, 2M API calls / mo
• Golden Ape – $4,000/mo, 15M API calls / mo
• Data access options
– REST API
• api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE
– YQL tables
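The URL pattern above can be instantiated mechanically. Only the pattern itself (`api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE`) comes from the slide; the dataset path, method and parameter values below are made-up placeholders:

```python
from urllib.parse import urlencode

def infochimps_url(dataset: str, method: str, **params) -> str:
    """Instantiate the documented request pattern:
    api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE
    """
    query = urlencode(params)
    return f"http://api.infochimps.com/{dataset}/{method}.json?{query}"

# Placeholder dataset/method names, shown only to illustrate the pattern
url = infochimps_url("soc/net/tw", "influence", screen_name="infochimps")
print(url)
```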
17. Azure DataMarket (2)
• Data domain
– All purpose, incl. Data.gov, UN data, Wolfram|Alpha, ESRI
• Data population
– Data publishers (need prior approval)
• Data can be stored on SQL Azure, Azure Storage or 3rd party clouds (via Data Access Layers)
• Data model
– Depends on the dataset and the storage, but always presented as OData to consumers
• Data size – 90 datasets
19. Azure DataMarket (4)
• Company info
– Microsoft
• Monetization model
– Subscription for data buyers (limited/unlimited API calls)
• Access options
– OData (feeds, queries, updates)
• Data tools
– Service Explorer
– Excel add-in (find, purchase, consume data)
– Integration with SQL Server Reporting Services / Integration Services
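Since every dataset is surfaced as OData, consumers query it with standard OData system query options such as `$top` and `$filter`. A sketch; the service root and entity set below are placeholders, while the query-option names are standard OData:

```python
from typing import Optional
from urllib.parse import quote

def odata_query(service_root: str, entity_set: str,
                top: Optional[int] = None,
                filter_expr: Optional[str] = None) -> str:
    """Build an OData feed URL with optional $top / $filter options.
    The service root is a placeholder, not a real DataMarket endpoint."""
    url = f"{service_root}/{entity_set}?$format=json"
    if top is not None:
        url += f"&$top={top}"
    if filter_expr is not None:
        url += f"&$filter={quote(filter_expr)}"
    return url

url = odata_query("https://api.datamarket.example/v1", "CrimeStatistics",
                  top=10, filter_expr="Year eq 2010")
print(url)
```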
21. DataMarket (2)
• Data domain
– Statistical data from 2,000 providers, incl. UN, Eurostat, World Bank, US agencies, BP, FIFA, …
• Data population
– Data aggregation (2,000 data providers)
• Data size
– 13K datasets, 100M time series, 600M facts
• Company info
– DataMarket (Iceland)
22. DataMarket (3)
• Monetization model
– Charge data sellers
• Free datasets – $249/mo; Paid datasets – 25% commission; Branded datasets – $699/mo + commission
– Charge data buyers
• Free – 50 API calls/mo; $99 – 500 API calls/mo; $299 – 10K API calls/mo; $799 – 100K API calls/mo
• Data access
– REST API
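Given the buyer tiers above, choosing a plan is a simple lookup: pick the cheapest tier whose monthly call quota covers the expected volume. A sketch using the prices from the slide:

```python
# Buyer tiers as listed above: (monthly price in USD, API calls included)
TIERS = [(0, 50), (99, 500), (299, 10_000), (799, 100_000)]

def cheapest_plan(calls_per_month: int):
    """Return (price, quota) of the cheapest tier covering the volume,
    or None if no listed tier is large enough."""
    for price, quota in TIERS:
        if calls_per_month <= quota:
            return price, quota
    return None

print(cheapest_plan(400))     # (99, 500)
print(cheapest_plan(20_000))  # (799, 100000)
```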
24. Socrata (2)
• Data domain
– Business, education, government data
• Data population
– Uploads from data publishers
• Data size
– 13K datasets
• Data model
– tabular
25. Socrata (3)
• Company info
– Socrata (USA)
• Monetization model
– Charge data buyers (“Plans starting at $499 per month”)
• Basic – 100K API calls/mo + 50GB traffic; Plus – 250K API calls/mo
+ 250GB traffic; Premium – 1M API calls/mo + 1.2TB traffic;
Ultimate – 10M API calls/mo + 5TB traffic
• Data access
– REST API (Socrata Open Data API)
– Data export (XLS, CSV, RDF, XML)
– RSS updates
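Socrata Open Data API requests address a dataset by its resource identifier on the publisher's domain. A minimal sketch; the domain and the dataset id below are hypothetical:

```python
# Sketch of a Socrata Open Data API (SODA) request. The domain and the
# dataset identifier are illustrative; real datasets live on a
# publisher's Socrata domain under a short resource id.
from urllib.parse import urlencode

def soda_url(domain, dataset_id, **params):
    """Build a SODA resource URL with query parameters."""
    query = urlencode(params)
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

url = soda_url("data.example.gov", "abcd-1234", **{"$limit": 5})
print(url)
```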
27. Kasabi (2)
• Data domain
– All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
– Public datasets
– User submitted datasets
• Data size
– 55 datasets
• Data model
– RDF
28. Kasabi (3)
• Company info
– Talis (UK)
• Monetization model
– Charge data consumers
– Data hosting is free
• Data access
– SPARQL / Linked Data endpoint
– REST API
– Additional APIs
– PHP & Ruby client libraries
30. Freebase (2)
• Data domain
– General purpose
• Data model
– Graph (RDF dumps available)
• Data population
– Community curated data (licensed as CC-BY)
– Import of public data sources (Wikipedia, MusicBrainz,
WordNet, LoC, …)
• Data size
– 20M entities
31. Freebase (3)
• Company info
– Metaweb (USA), now Google
• Monetization model
– Free for 100K read API calls per day (10K write)
– Paid for higher volumes
• Data access
– REST API
– Linked Data endpoint (http://rdf.freebase.com)
– Triple uploader / RDF dumps
– Acre (application hosting platform)
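Freebase reads were typically expressed in MQL, a JSON query-by-template language where empty values ask the service to fill in results. A sketch of preparing an mqlread call against the (since retired) public endpoint:

```python
# Sketch of a Freebase MQL read request. MQL queries are JSON templates;
# empty values ("album": []) are filled in by the service. The endpoint
# existed at the time but has since been retired.
import json
from urllib.parse import urlencode

query = [{"type": "/music/artist", "name": "The Beatles", "album": []}]
envelope = {"query": query}
url = "http://api.freebase.com/api/service/mqlread?" + urlencode(
    {"query": json.dumps(envelope)})
print(url)
```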
32. Freebase (4)
• Data tools
– Web based – schema editor, review queue, viewers, …
– GridWorks (Google Refine)
• Exploring, data cleaning, transformation of tabular data
• Map data to Freebase schema & RDF export (3rd party extension)
– Acre
• Application hosting platform
– User-contributed JavaScript code (executed on the JVM via Rhino)
• Access & store data directly into Freebase
34. timetric (2)
• Data domain
– Economic data
• Data population
– Aggregated data from the world's leading sources of economic data (World Bank, Eurostat, …)
– User uploaded data
• Data size
– 2.5M public statistics
35. timetric (3)
• Company info
– Timetric Ltd. (UK)
• Monetization model
– Free public datasets
– Paid exclusive datasets
• Data access
– REST API
37. xIgnite (2)
• Data domain
– Financial data
• Data population
– Aggregated data from leading sources (Dow Jones, Thomson Reuters, stock exchanges, …)
– Public datasets (national banks, SEC, Federal Reserve, …)
– User uploaded data
• Company info
– Xignite (USA)
38. xIgnite (3)
• Monetization model
– Paid subscriptions
• Data access
– Web services (REST/SOAP)
39. Coming soon…
• BuzzData
– www.buzzdata.com / @buzzdata
– Company: BuzzData
40. Data marketplaces – features summary
• Data
– Data model, domain, export options
• Monetization
– Charge buyers / charge sellers
– Free API call quotas
– Branded marketplaces & service level agreements
• For developers
– REST API; query language
– Tools for data management / integration
– Application hosting
42. LINKED DATA + MARKETPLACES
43. Linked Data cloud (Sep 2010)
(c) R. Cyganiak and A. Jentzsch
44. Benefits of Linked Data for Data Marketplaces
• Unified data representation model (RDF)
– Easy consumption of the data
• Global identifiers for all objects (URI)
– Makes incremental data integration & federation easier
• Interlinked datasets
– New data added to the marketplace can be integrated
with existing data
– Network effects
• Data marketplace interoperability
– Data from different marketplaces can be easily integrated
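The integration benefit of global URIs can be shown with a toy example: when two datasets name the same entity by the same URI, merging is a plain union of triples with no key-reconciliation step. Triples are modeled here as Python tuples; the property names are illustrative:

```python
# Toy illustration of why global URIs ease integration: two datasets
# that use the same URI for an entity merge by simple set union.
# Property names and values below are illustrative.
dbpedia_facts = {
    ("http://dbpedia.org/resource/Reykjavik", "population", "120000"),
    ("http://dbpedia.org/resource/Reykjavik", "country", "Iceland"),
}
geo_facts = {
    ("http://dbpedia.org/resource/Reykjavik", "lat", "64.14"),
}
merged = dbpedia_facts | geo_facts  # union; the shared URI links the facts
print(len(merged))  # 3 triples about one globally identified entity
```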
45. Benefits of Linked Data for Data Marketplaces (2)
• Derived knowledge / facts
– RDF inference of additional implicit facts
– (see FactForge and LinkedLifeData)
• Rich queries
– SPARQL offers unmatched query expressivity
• Easy import of existing LOD datasets
– Linked Open Data cloud already includes 200+ datasets
with 20+ billion RDF triples
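The "rich queries" point can be illustrated by preparing a SPARQL protocol request, which is an HTTP GET with the query in a ?query= parameter. The endpoint URL below is hypothetical; the query itself is standard SPARQL:

```python
# Sketch of a SPARQL protocol request (HTTP GET, query=... parameter).
# The endpoint is hypothetical; the DBpedia ontology URIs are real but
# used here only as an example of an expressive filter query.
from urllib.parse import urlencode

sparql = """
SELECT ?city ?population WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://dbpedia.org/ontology/populationTotal> ?population .
  FILTER (?population > 1000000)
}
"""
url = "http://example.org/sparql?" + urlencode({"query": sparql})
print(url[:72])
```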
46. Linked Data for marketplaces – challenges
• Quality of data
– Different (public) datasets may come with inconsistent or contradictory data
– Quality more important than quantity
• Large scale data integration
– Ontology (schema) mapping of different datasets &
vocabularies
• Licensing
– Some datasets come with “CC-BY-NC” or unclear licensing
• Billing
– API calls / SPARQL queries with varying computational
cost
47. Linked Data for marketplaces – challenges (2)
• Operations
– Service Level guarantees
– Availability & scalability challenges
• Most Linked Data endpoints at present are neither scalable nor reliably available
49. LinkedLifeData & FactForge
• FactForge
– Integrates some of the most central LOD datasets
– General-purpose information (not specific to a domain)
– 1.2 billion explicit and 1 billion inferred statements
– The largest upper-level knowledge base
– http://www.FactForge.net
• Linked Life Data
– 25 of the most popular life-science datasets
– 2.7 billion explicit and 1.4 billion inferred statements
– http://www.LinkedLifeData.com
50. Strategic questions
• Monetization strategy
– Which (linked) datasets can be monetized?
– Charge buyers / charge sellers / free quota
– Branded marketplaces
• Community building
– Crowdsource the data curation to the community
– How to provide incentives to data curators?
51. Strategic questions (2)
• Operations
– How to ensure Service Level guarantees?
– How to deal with licensing issues?
– Account management, metering, billing
• Platform
– RDF database – data volume, query volume
– ETL tools
– Curation tools
– Data export & consumption
52. Data monetization with WebServius
(c) WebServius
• Benefits
– user management, quotas & restrictions
– Metering, pricing, billing
– Security, scalability, SLAs
53. Q&A
Questions?
@ontotext