This document discusses data marketplaces and the potential benefits of linked data for data marketplaces. It provides an overview of several existing data marketplaces including Factual, InfoChimps, Azure DataMarket, Freebase, Socrata, and Kasabi. These marketplaces vary in their data domains, models, sizes, monetization approaches, and tools for data access. The document also outlines benefits of the semantic web and linked data for data marketplaces, such as unified data representation, global identifiers, interlinked datasets, and easy integration of existing linked open data. However, challenges include ensuring data quality and performing large-scale data integration across different schemas.
2. Contents
• Introduction
• Data Marketplaces
– Factual, InfoChimps, Azure DataMarket, Freebase, Socrata, Kasabi
– DataMarket, Timetric, xIgnite
• Data Marketplaces for Linked Data
(Linked) Data Marketplaces Jan 2011 #2
3. INTRODUCTION
4. Definitions
• Data-as-a-Service (DaaS)
– “Like all members of the "as a Service" (XaaS) family, DaaS is based on the concept that the product, data in this case, can be provided on demand to the user regardless of geographic or organizational separation of provider and consumer. Additionally, the emergence of service-oriented architecture (SOA) has rendered the actual platform on which the data resides also irrelevant” (Wikipedia)
• Data Marketplaces
– “Services that make it easy to find data from a range of secondary data sources, then consume the data in a usable and unified format. Several of these services are trying to create marketplaces for data, envisioning that data providers can offer their data sets for sale to data seekers” (DataMarket.com)
5. Data Marketplaces properties
• Proposed classification by Bauereiss & Fensel
1. Data domain
2. Population of content
3. Community management
4. Operating party
5. Pricing models
6. Data exchange
• Some additional differentiating characteristics
– Data model, Data size, Data export
– Branded marketplaces, SLA
– Query languages, Data tools
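The dimensions above amount to a simple record schema per marketplace. As an illustration only (the class and field names are my own, not from Bauereiss & Fensel), the classification could be modelled as:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataMarketplace:
    """Profile of a data marketplace along the classification
    dimensions listed above. Field names are illustrative."""
    name: str
    data_domain: str           # e.g. "all purpose", "statistical data"
    population: str            # how content gets in: crawling, uploads, aggregation
    operating_party: str
    pricing_model: str         # e.g. "pay-per-use", "subscription"
    data_model: str = "tabular"
    query_languages: List[str] = field(default_factory=list)

# Example: Factual, as described on the following slides
factual = DataMarketplace(
    name="Factual",
    data_domain="local data, travel, finance, ...",
    population="crawling + public sources + community uploads",
    operating_party="Factual Inc.",
    pricing_model="pay-per-use (per API call)",
    data_model="tabular",
    query_languages=["REST API"],
)
print(factual.data_model)  # tabular
```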
8. Factual (2)
• Data domain
– Travel, finance, sports, autos, movies, music, TV, books, health, food, politics, education, science, arts, …
– High quality local data
• USA, Germany, France, Italy, UK, Japan, Switzerland, Australia, …
• Used by Facebook Places
• Data population
– Crawling the web
– Public data sources
– Community contributions
• Upload XLS/ODS, CSV
9. Factual (3)
• Data model
– tabular
– Taxonomy of 400 categories
• 13 Level 1 categories: Arts, Automotive, Business, Government, …
• Data size – 500,000 datasets
• Company info
– Factual Inc. (USA)
– $27M VC funding so far
10. Factual (4)
• Monetization model
– Pricing model not finalised yet (currently free)
– Pay-per-use pricing (per API call) with subscriptions
• Companies that contribute data will have a fee reduction
• Data access options
– REST API
• Read from table, Add/Write to table, Get schema info
– Web applications
• Read/write raw data from a web page (JavaScript)
• Web widgets for visualising, filtering and sorting data
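As a sketch of the pay-per-use, per-API-call table access described above: the host, path and parameter names below are placeholders (the slide does not give Factual's actual URL scheme), but a table-read request would be built along these lines:

```python
from urllib.parse import urlencode

# Hypothetical endpoint shape -- host, path segments and parameter
# names are placeholders, not Factual's documented URL scheme.
def read_table_url(table_id: str, api_key: str, limit: int = 20) -> str:
    """Build a read request URL for a tabular, key-authenticated REST API."""
    params = urlencode({"api_key": api_key, "limit": limit})
    return f"https://api.factual.example/tables/{table_id}/read?{params}"

url = read_table_url("restaurants-us", api_key="MY_KEY", limit=5)
print(url)
```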
11. Factual (5)
• Data tools
– AutoClipper – find tables on the web
– PageClipper – extract tabular data from a web page
– FactClipper – find individual facts (query templates)
13. InfoChimps (2)
• Data domain
– All purpose
• Including data from Freebase, Wikipedia infoboxes, CKAN, Twitter, Data.gov, Data.gov.uk, GeoNames, …
• Data population
– Public datasets
– User submitted datasets
• Data model is dataset specific
• 10,000+ datasets organised in 13 collections
14. InfoChimps (3)
• Company info
– InfoChimps (USA)
– $1.6M VC funding so far
– Acquired DataMarketplace in 12/2010
• Monetization model
– Charge data sellers
• Data sellers choose the price & licensing of their data
• Charge for data storage
• 30% commission for InfoChimps on each sale
15. InfoChimps (4)
• Monetization model (2)
– Charge data buyers
• Baboon – free, 100K API calls / mo
• Brass Monkey – $20/mo, 500K API calls / mo
• Silverback – $250/mo, 2M API calls / mo
• Golden Ape – $4,000/mo, 15M API calls / mo
• Data access options
– REST API
• api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE
– YQL tables
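The URL pattern above can be instantiated mechanically. Only the pattern itself (`api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE`) comes from the slide; the dataset path, method and parameter values below are made-up placeholders:

```python
from urllib.parse import urlencode

def infochimps_url(dataset: str, method: str, **params) -> str:
    """Instantiate the documented request pattern:
    api.infochimps.com/DATASET/METHOD.json?PARAM=VALUE
    """
    query = urlencode(params)
    return f"http://api.infochimps.com/{dataset}/{method}.json?{query}"

# Placeholder dataset/method names, shown only to illustrate the pattern
url = infochimps_url("soc/net/tw", "influence", screen_name="infochimps")
print(url)
```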
17. Azure DataMarket (2)
• Data domain
– All purpose, incl. Data.gov, UN data, Wolfram|Alpha, ESRI
• Data population
– Data publishers (need prior approval)
• Data can be stored on SQL Azure, Azure Storage or 3rd party clouds (via Data Access Layers)
• Data model
– Depends on the dataset and the storage, but always presented as OData to consumers
• Data size – 90 datasets
19. Azure DataMarket (4)
• Company info
– Microsoft
• Monetization model
– Subscription for data buyers (limited/unlimited API calls)
• Access options
– OData (feeds, queries, updates)
• Data tools
– Service Explorer
– Excel add-in (find, purchase, consume data)
– Integration with SQL Server Reporting Services / Integration Services
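Since every dataset is surfaced as OData, consumers query it with standard OData system query options such as `$top` and `$filter`. A sketch; the service root and entity set below are placeholders, while the query-option names are standard OData:

```python
from typing import Optional
from urllib.parse import quote

def odata_query(service_root: str, entity_set: str,
                top: Optional[int] = None,
                filter_expr: Optional[str] = None) -> str:
    """Build an OData feed URL with optional $top / $filter options.
    The service root is a placeholder, not a real DataMarket endpoint."""
    url = f"{service_root}/{entity_set}?$format=json"
    if top is not None:
        url += f"&$top={top}"
    if filter_expr is not None:
        url += f"&$filter={quote(filter_expr)}"
    return url

url = odata_query("https://api.datamarket.example/v1", "CrimeStatistics",
                  top=10, filter_expr="Year eq 2010")
print(url)
```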
21. DataMarket (2)
• Data domain
– Statistical data from 2,000 providers, incl. UN, Eurostat, World Bank, US agencies, BP, FIFA, …
• Data population
– Data aggregation (2,000 data providers)
• Data size
– 13K datasets, 100M time series, 600M facts
• Company info
– DataMarket (Iceland)
22. DataMarket (3)
• Monetization model
– Charge data sellers
• Free datasets – $249/mo; Paid datasets – 25% commission; Branded datasets – $699/mo + commission
– Charge data buyers
• Free – 50 API calls/mo; $99 – 500 API calls/mo; $299 – 10K API calls/mo; $799 – 100K API calls/mo
• Data access
– REST API
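Given the buyer tiers above, choosing a plan is a simple lookup: pick the cheapest tier whose monthly call quota covers the expected volume. A sketch using the prices from the slide:

```python
# Buyer tiers as listed above: (monthly price in USD, API calls included)
TIERS = [(0, 50), (99, 500), (299, 10_000), (799, 100_000)]

def cheapest_plan(calls_per_month: int):
    """Return (price, quota) of the cheapest tier covering the volume,
    or None if no listed tier is large enough."""
    for price, quota in TIERS:
        if calls_per_month <= quota:
            return price, quota
    return None

print(cheapest_plan(400))     # (99, 500)
print(cheapest_plan(20_000))  # (799, 100000)
```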
24. Socrata (2)
• Data domain
– Business, education, government data
• Data population
– Uploads from data publishers
• Data size
– 13K datasets
• Data model
– tabular
25. Socrata (3)
• Company info
– Socrata (USA)
• Monetization model
– Charge data buyers (“Plans starting at $499 per month”)
• Basic – 100K API calls/mo + 50GB traffic; Plus – 250K API calls/mo
+ 250GB traffic; Premium – 1M API calls/mo + 1.2TB traffic;
Ultimate – 10M API calls/mo + 5TB traffic
• Data access
– REST API (Socrata Open Data API)
– Data export (XLS, CSV, RDF, XML)
– RSS updates
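Socrata Open Data API requests address a dataset by its resource identifier on the publisher's domain. A minimal sketch; the domain and the dataset id below are hypothetical:

```python
# Sketch of a Socrata Open Data API (SODA) request. The domain and the
# dataset identifier are illustrative; real datasets live on a
# publisher's Socrata domain under a short resource id.
from urllib.parse import urlencode

def soda_url(domain, dataset_id, **params):
    """Build a SODA resource URL with query parameters."""
    query = urlencode(params)
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

url = soda_url("data.example.gov", "abcd-1234", **{"$limit": 5})
print(url)
```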
27. Kasabi (2)
• Data domain
– All purpose, incl. DBpedia, GeoNames, BBC Linked Data, …
• Data population
– Public datasets
– User submitted datasets
• Data size
– 55 datasets
• Data model
– RDF
28. Kasabi (3)
• Company info
– Talis (UK)
• Monetization model
– Charge data consumers
– Data hosting is free
• Data access
– SPARQL / Linked Data endpoint
– REST API
– Additional APIs
– PHP & Ruby client libraries
30. Freebase (2)
• Data domain
– General purpose
• Data model
– Graph (RDF dumps available)
• Data population
– Community curated data (licensed as CC-BY)
– Import of public data sources (Wikipedia, MusicBrainz,
WordNet, LoC, …)
• Data size
– 20M entities
31. Freebase (3)
• Company info
– Metaweb (USA), now Google
• Monetization model
– Free for 100K read API calls per day (10K write)
– Paid for higher volumes
• Data access
– REST API
– Linked Data endpoint (http://rdf.freebase.com)
– Triple uploader / RDF dumps
– Acre (application hosting platform)
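Freebase reads were typically expressed in MQL, a JSON query-by-template language where empty values ask the service to fill in results. A sketch of preparing an mqlread call against the (since retired) public endpoint:

```python
# Sketch of a Freebase MQL read request. MQL queries are JSON templates;
# empty values ("album": []) are filled in by the service. The endpoint
# existed at the time but has since been retired.
import json
from urllib.parse import urlencode

query = [{"type": "/music/artist", "name": "The Beatles", "album": []}]
envelope = {"query": query}
url = "http://api.freebase.com/api/service/mqlread?" + urlencode(
    {"query": json.dumps(envelope)})
print(url)
```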
32. Freebase (4)
• Data tools
– Web based – schema editor, review queue, viewers, …
– GridWorks (Google Refine)
• Exploring, data cleaning, transformation of tabular data
• Map data to Freebase schema & RDF export (3rd party extension)
– Acre
• Application hosting platform
– User-contributed JavaScript code (executed on the JVM via Rhino)
• Access & store data directly into Freebase
34. timetric (2)
• Data domain
– Economic data
• Data population
– Aggregated data from the world's leading sources of economic data (World Bank, Eurostat, …)
– User uploaded data
• Data size
– 2.5M public statistics
35. timetric (3)
• Company info
– Timetric Ltd. (UK)
• Monetization model
– Free public datasets
– Paid exclusive datasets
• Data access
– REST API
37. xIgnite (2)
• Data domain
– Financial data
• Data population
– Aggregated data from leading sources (Dow Jones, Thomson Reuters, stock exchanges, …)
– Public datasets (national banks, SEC, Federal Reserve, …)
– User uploaded data
• Company info
– Xignite (USA)
38. xIgnite (3)
• Monetization model
– Paid subscriptions
• Data access
– Web services (REST/SOAP)
39. Coming soon…
• BuzzData
– www.buzzdata.com / @buzzdata
– Company: BuzzData
40. Data marketplaces – features summary
• Data
– Data model, domain, export options
• Monetization
– Charge buyers / charge sellers
– Free API call quotas
– Branded marketplaces & service level agreements
• For developers
– REST API; query language
– Tools for data management / integration
– Application hosting
42. LINKED DATA + MARKETPLACES
43. Linked Data cloud (Sep 2010)
(c) R. Cyganiak and A. Jentzsch
44. Benefits of Linked Data for Data Marketplaces
• Unified data representation model (RDF)
– Easy consumption of the data
• Global identifiers for all objects (URI)
– Makes incremental data integration & federation easier
• Interlinked datasets
– New data added to the marketplace can be integrated
with existing data
– Network effects
• Data marketplace interoperability
– Data from different marketplaces can be easily integrated
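The integration benefit of global URIs can be shown with a toy example: when two datasets name the same entity by the same URI, merging is a plain union of triples with no key-reconciliation step. Triples are modeled here as Python tuples; the property names are illustrative:

```python
# Toy illustration of why global URIs ease integration: two datasets
# that use the same URI for an entity merge by simple set union.
# Property names and values below are illustrative.
dbpedia_facts = {
    ("http://dbpedia.org/resource/Reykjavik", "population", "120000"),
    ("http://dbpedia.org/resource/Reykjavik", "country", "Iceland"),
}
geo_facts = {
    ("http://dbpedia.org/resource/Reykjavik", "lat", "64.14"),
}
merged = dbpedia_facts | geo_facts  # union; the shared URI links the facts
print(len(merged))  # 3 triples about one globally identified entity
```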
45. Benefits of Linked Data for Data Marketplaces (2)
• Derived knowledge / facts
– RDF inference of additional implicit facts
– (see FactForge and LinkedLifeData)
• Rich queries
– SPARQL offers unmatched query expressivity
• Easy import of existing LOD datasets
– Linked Open Data cloud already includes 200+ datasets
with 20+ billion RDF triples
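The "rich queries" point can be illustrated by preparing a SPARQL protocol request, which is an HTTP GET with the query in a ?query= parameter. The endpoint URL below is hypothetical; the query itself is standard SPARQL:

```python
# Sketch of a SPARQL protocol request (HTTP GET, query=... parameter).
# The endpoint is hypothetical; the DBpedia ontology URIs are real but
# used here only as an example of an expressive filter query.
from urllib.parse import urlencode

sparql = """
SELECT ?city ?population WHERE {
  ?city a <http://dbpedia.org/ontology/City> ;
        <http://dbpedia.org/ontology/populationTotal> ?population .
  FILTER (?population > 1000000)
}
"""
url = "http://example.org/sparql?" + urlencode({"query": sparql})
print(url[:72])
```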
46. Linked Data for marketplaces – challenges
• Quality of data
– Different (public) datasets may come with inconsistent or contradictory data
– Quality more important than quantity
• Large scale data integration
– Ontology (schema) mapping of different datasets &
vocabularies
• Licensing
– Some datasets come with “CC-BY-NC” or unclear licensing
• Billing
– API calls / SPARQL queries with varying computational
cost
47. Linked Data for marketplaces – challenges (2)
• Operations
– Service Level guarantees
– Availability & scalability challenges
• Most Linked Data endpoints at present are neither scalable nor reliably available
49. LinkedLifeData & FactForge
• FactForge
– Integrates some of the most central LOD datasets
– General-purpose information (not specific to a domain)
– 1.2 billion explicit and 1 billion inferred statements
– The largest upper-level knowledge base
– http://www.FactForge.net
• Linked Life Data
– 25 of the most popular life-science datasets
– 2.7 billion explicit and 1.4 billion inferred statements
– http://www.LinkedLifeData.com
50. Strategic questions
• Monetization strategy
– Which (linked) datasets can be monetized?
– Charge buyers / charge sellers / free quota
– Branded marketplaces
• Community building
– Crowdsource the data curation to the community
– How to provide incentives to data curators?
51. Strategic questions (2)
• Operations
– How to ensure Service Level guarantees?
– How to deal with licensing issues?
– Account management, metering, billing
• Platform
– RDF database – data volume, query volume
– ETL tools
– Curation tools
– Data export & consumption
52. Data monetization with WebServius
(c) WebServius
• Benefits
– user management, quotas & restrictions
– Metering, pricing, billing
– Security, scalability, SLAs
53. Q&A
Questions?
@ontotext