2. 14:45 … Hot Drinks
15:00 … Welcome
15:10 … The Project
15:20 … The Findings
15:35 … Conclusions, Roadmap
15:45 … Guest Presentation: Data Market Austria
16:00 … Get your Book and hand over to Vienna Open Data MeetUp
6. Sabrina Kirrane (WU, Privacy and Sustainable Computing Lab)
Julia Neuschmid (IDC Austria)
Mihai Lupu (Research Studios Austria)
Elmar Kiesling (TU, Linked Data Lab)
Thomas Thurner (SWC + School of Data)
8. PROPEL 8
Emerging concept for data exchange and integration
Based on standard web technologies
Shifting away from a predominantly academic perspective, we
conceive Linked Data as a promising disruptive technology for
enterprise data management.
Source: blog.backand.com
Linked Data
9. The project goal
Survey industry and market needs, technological challenges,
and open research questions on the use of Linked Data in a
business context.
FFG ICT of the Future 2014/2015
Exploratory study
Project duration Nov 2015 – Dec 2016
Consortium: IDC Austria, Vienna University of Technology (TU Wien),
Vienna University of Economics and Business (WU), Semantic Web Company
10. Approach
Which industries are
the most likely to
adopt LD
technologies?
What are the key
drivers, inhibitors and
needs in data
management from a
demand side
perspective?
11. Approach
What
recommendations are
necessary for
enterprises, policy
makers and
researchers
in order to propel the
adoption of LD in
enterprises?
What are
technological and
standardisation
opportunities and
challenges?
14. Sectoral Analysis of LD Potential
Goal:
• Exploratory sectoral assessment of Linked Data adoption potential
• Alignment between Linked Data paradigm and industry characteristics
• Broad high-level, theoretical perspective
Methods:
• Industry classification: NACE rev. 2 top level sections,
with selective use of more detailed classes
• Extensive literature research
• Analysis of statistical data on industry characteristics
(R&D intensity, ICT spending, ...)
• Industry expert interviews
• Internal validation survey
17. High potential sectors
✅ Highly networked
✅ Strong (potential) impact
of ICT-based innovation
✅ Data- and ICT-intense
☑ Global scope
☑ Knowledge-intense
☑ Complex operations
☑ Relatively open
☑ Some uptake of web
technologies
18. Medium potential sectors
✅ Highly networked
✅ ICT- and data-intense
✅ Strong (potential) impact
of ICT-based innovation
☑ Highly internationalized
☑ Complex operations
❌ Have not embraced
openness
❌ Limited uptake of web
technologies
19. Lower potential sectors
✅ Structural characteristics
mostly favorable
❌ Moderate ICT dynamics
❌ Have not embraced
openness
❌ Trailing web technology
uptake
20. Results
Broad potential for ELD across a large spectrum of industries
Focus on “openness” and “web-centric positioning” in
academic discussions may inhibit enterprise adoption
Virtually all sectors in developed economies exhibit structural
characteristics that favor LD adoption:
• Actors in a highly networked global economy
• Increasingly data-driven and knowledge-intense
• Cross-organizational operations
However, various sectors
• are laggards in the technological dimensions and
• have untapped potential for ICT-based innovation
22. Market Forces
Economy:
• Positive economic development in Austria leads to growth in IT spending, and
we expect investments in solutions for data and information management
Efficiency:
• Organizations focus primarily on costs. Data and information management
solutions and LD can have positive effects in terms of transforming businesses,
increasing efficiency and driving growth
Digital Transformation:
• Data and information management is a key asset for digital transformation, and
concepts around Linked Data can support the transformation process
23. Market Forces
Culture:
• A missing innovation culture in some organisations might inhibit the uptake
of new technologies
Data driven networked global economy:
• Growing need to break up silos, and to share data across organizational
boundaries.
Digital life of citizens:
• High Internet adoption and user demands for new digital products and services
lead to redefinition and expansion of services.
24. Market Forces
Technology:
• New technologies like cloud, big data, IoT and cognitive computing/machine
learning change the way our data is managed.
Data security and privacy:
• Common barriers to adoption of new technology; at the same time security
concerns provide an opportunity for solution providers to generate revenue out of
their security solutions and services.
Regulations:
• General Data Protection Regulation forces organizations to take a fresh look at
how they manage their data.
25. Big efforts for data and information management
Demand-side analysis
29. Interviews
23 interviews:
Domains
Consulting, Engineering, Environment, Finance and Insurance,
Government, Healthcare, ICT, IT, Media, Pharmaceutical, Professional
Services, Real Estate, Research, Startup, Tourism, Transports & Logistics
Roles
Business Intelligence, CEO, Chief Engineer, Data and Systems Architect,
Data Scientist, Director Information Management, Enterprise Architect,
Founder, General Secretary, Governance, Risk & Compliance Manager,
Head of Communications and Media, Head of Development, Head of HR,
Head of R&D, Innovation Manager, Information Architect, IT Project
Manager, Management, Managing director, Marketing Analyst, Principal
System Analyst, Project Coordinator, Researcher, Technical Specialist
Note: Instead of explaining to them what
ELD is, we gathered their
technology/research expectations from
a more general SW perspective
30. Technologies in need…
Analytics
Computational
linguistics & NLP
Concept tagging &
annotation
Data integration
Data management
Dynamic data /
streaming
Extraction, data
mining, text mining,
entity extraction
Logic, formal
languages &
reasoning
Human-Computer
Interaction &
visualization
Knowledge
representation
Machine learning
Ontology/thesaurus
/taxonomy
management
Quality &
Provenance
Recommendations
Robustness,
scalability,
optimization and
performance
Searching,
browsing &
exploration
Security and
privacy
System engineering
35. Semantic Web/Linked Data over time…
Early adopters:
MITRE
Chevron
British Telecom
Boeing
Ordnance Survey
Eli Lilly
Pfizer
Agfa
Food and Drug Administration
National Institutes of Health
Software adopters/products:
Oracle
Adobe
Altova
OpenLink
TopQuadrant
Software AG
Aduna Software
Protégé
SAPHIRE
37. LD Adopters - Companies
[Bar chart: Conference sponsors that appear in papers, 2006-2015. X-axis: Companies (Google, Oracle, Yahoo, SAP, IEEE Intelligent Systems, Franz, Bing, Expert System, IBM Research, PoolParty); y-axis: Occurrences (0 to 1600).]
39. Semantic Web/Linked Data over time…
The authors claim that “early research has
transitioned into these larger, more
applied systems, today’s Semantic Web
research is changing: It builds on the
earlier foundations but it has generated a
more diverse set of pursuits”.
48. Activities
Dimensions: Awareness and Education; Legal and Policy; Funding; Technological Innovation; Research
Short-term:
• Cluster stakeholders and efforts
• Get momentum from new funding lines
• Supporting studies and pilot projects
Medium-term:
• Develop key foundational technologies
• Institutional and technological focus on key issues and domains
Long-term:
• Support emerging Linked Enterprise Data ecosystems
• Establish centers of excellence
• Position Austria as a hotspot for LED research and innovation
50. "Use the power of ELD!"
Many industries are facing disruptive change
Even conservative industries see a need for a "two speed IT"
Linked Data can be both a disruptive force and a means to
respond to disruptive change
Key ELD technologies are mature and have been successfully
applied in many domains
Linked Data is agile and flexible
ELD is an enabler for product, process and business model
innovation!
51. "ELD is the backbone for the
developing content industry"
Linked Data is particularly relevant for online businesses
(media, e-commerce, etc.)
ELD provides a platform to generate and leverage economic
network effects typical for these industries
Tools to enrich digital products and make them
interchangeable within a broader digital environment
52. "We need to align research
priorities and practical needs"
Continued fundamental basic research necessary, but:
Industry needs should be reflected in applied
research agendas
More courage to apply cutting-edge technologies in
industry needed!
53. "ELD has to convince stakeholders
to embrace change"
Technological, behavioural and cultural adoption barriers
New skill sets required
To instigate change, ELD must
..make sense from a business perspective
→ clear business cases, fast returns, tangible, quantifiable benefits
..lower entry barriers
• by playing well with existing infrastructure
• through open source/freemium/experimental models
..address security, privacy, and compliance concerns
54. "Need to support [and subsidize]
emerging ELD ecosystems"
Prototypical example of a technology with strong economic network effects
Flagship implementations and pioneering projects are key to furthering the
growth of ELD in Austria.
Both financial and infrastructural support are necessary in order to
accelerate the development of the sector.
Core preparatory steps include:
• Base infrastructures (stores, services, data) to build solutions on top
• Project related funding
55. Julia Neuschmid | jneuschmid@idc.com
Thank you!
www.linked-data.at
58. Linked Data from 10,000 feet...
• Best practices for publishing and connecting
structured data on the Web
• Goal: Creating a global data space
Web of Documents → Web of Data
59. ... and up-close
Graph-based data model that captures statements about
things in the world
Subject-predicate-object triples
Use of URIs as globally unique identifiers
Example triple (full URIs):
http://example.com/alice  http://xmlns.com/foaf/0.1/knows  http://example.com/bob
Abbreviated with prefixes:
:alice  foaf:knows  :bob
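A minimal sketch of how the abbreviated form relates to the full URIs in this example. The prefix table is an assumption read off the slide (the empty prefix is taken to stand for http://example.com/), and the helper names `expand` and `expand_triple` are hypothetical:

```python
# Prefix table: the foaf namespace is the one shown on the slide; mapping the
# empty prefix to http://example.com/ is an assumption based on the example URIs.
PREFIXES = {
    "": "http://example.com/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'foaf:knows' into a full URI."""
    prefix, _, local_name = prefixed_name.partition(":")
    return PREFIXES[prefix] + local_name


def expand_triple(triple):
    """Expand subject, predicate and object of a (s, p, o) tuple."""
    return tuple(expand(term) for term in triple)


short_triple = (":alice", "foaf:knows", ":bob")
full_triple = expand_triple(short_triple)
print(full_triple)
```

Because every term expands to a globally unique URI, two data sets that both mention http://example.com/alice are talking about the same thing without any prior coordination.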
60. Principles
Anyone can…
• publish data
• create URIs
• choose or create vocabularies to represent their data
• refer to Linked Data published by others
Result:
• Decentralized data infrastructure (> 650,000 datasets)
• Machine-readable and machine-discoverable data sets
• Bottom-up "pay as you go" data integration
61. Key ideas
Explicit semantics, Web of data, Graph-based, Network effects, Global data space,
Bottom-up, Flexible, Agile, Machine readable, Interoperable, Ad-hoc integration,
Linking, Decentralized, Inference, Discovery, Emergent, Open
68. ELD and LED
Enterprise Linked Data:
Internal use of LD technologies within organizations, e.g.,
• to integrate heterogeneous systems at the data level
• for advanced content/knowledge/… management
• as a basis for innovative products and services
Linked Enterprise Data:
• Cross-organizational data integration
• Data markets and data ecosystems
• Decentralized infrastructure for a networked economy
70. Linked Data vs. Open Data
Overlaps:
• Openness is a core principle in the design of LD
• Many Linked Data sets published under an open license
→ Linked Open Data and LD often used interchangeably
Key differences:
• Linked Data technologies can be used without publishing data –
e.g., for internal and external data integration.
• Not all open data will ever be linked (the majority will remain in
formats such as csv, txt etc.)
71. Linked Data vs. “The” Semantic Web
Overlaps:
• "LD is the Semantic Web done right" (Tim Berners-Lee)
• The Semantic Web is made up of Linked Data.
• Linked Data is based on Semantic Web standards.
Key Differences:
• Semantic Web was all about "semantifying" the web; Linked Data is
based on web standards (URIs, HTTP), but doesn't center around web
pages.
• LD is a more pragmatic "bottom-up" approach.
• "Linked Data is mainly about publishing structured data in RDF using URIs
rather than focusing on the ontological level or inference."
M. Hausenblas "Exploiting Linked Data For Building Web Applications"
IEEE Internet Computing, 2009
72. Linked Data vs Big Data
Overlaps:
• LD as a whole is big (*)
• No rigid up-front (e.g., relational) data model
• Big Data technologies (e.g., Hadoop) are used to handle LD
• LD can represent knowledge extracted from big unstructured data
Key Differences:
• Individual linked data sets are typically not "big" per se
(e.g., English DBpedia dump currently < 5 GB)
• LD is structured and semantically explicit,
"big data lakes" are typically neither
• Big data is based on distributed data infrastructures within an organization
(e.g., Hadoop clusters); LD creates a decentralized, globally distributed data
infrastructure
* http://lodlaundromat.org as per 2016-05-10
73. Linked Data vs Knowledge Graphs
Examples: Facebook Open Graph, Google's Knowledge Graph
74. Linked Data vs Knowledge Graphs
Overlaps:
• Knowledge Graphs also represent explicit semantics in a
graph-based data model
• Both are often used to facilitate semantic search
• Knowledge graphs can use open standards (e.g., RDFa)
Key differences:
• Proprietary (data and technologies), closed "ecosystem"
• Tightly integrated with services
• Typically not published externally → no way to link to
75. References
Videos:
Tim Berners-Lee: The next Web of open, linked data (16:52)
Linked Data (and the Web of Data)
Manu Sporny: What is Linked Data (12:09)
Michael Hausenblas: Quick Linked Data Intro (3:14)
Annenberg Networks Theory Seminar with Tim Berners-Lee
Metaweb (now defunct): Words vs entities
Tutorial:
Linda Project: Linked Data Primer
Articles:
C. Bizer, T. Heath, and T. Berners-Lee. Linked Data - The Story So Far. International Journal on
Semantic Web and Information Systems, 5(3):1 – 22, 2009.
Books:
T. Pellegrini, H. Sack, and S. Auer, Eds., Linked Enterprise Data. Heidelberg: Springer Berlin, 2014.
T. Heath and C. Bizer. Linked Data - Evolving the Web into a Global Data Space.
Morgan & Claypool, 2011.
EUCLID Project Consortium (2014). Using Linked Data Effectively.
Hitzler, Rudolph, Krötzsch (2009). Foundations of Semantic Web Technologies. Chapman & Hall/CRC
76. Linked Data Principles
1. Use URIs to identify things
2. Use HTTP URIs so that people can look up
those names
3. When someone looks up a URI, provide
useful information, using the standards
(RDF, SPARQL)
4. Include links to other URIs so that they can
discover more things
Design Issues: Linked Data notes, Tim Berners-Lee
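Principles 2 and 3 amount to an ordinary HTTP lookup that asks for RDF instead of HTML. The following is a sketch only: the DBpedia URI is an illustrative choice, requesting the Turtle serialization via the `Accept` header is one common option among several, and the function names are hypothetical. The request is built but not sent, so the snippet runs without network access.

```python
import urllib.request


def build_lookup_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET that asks for RDF in Turtle syntax."""
    return urllib.request.Request(uri, headers={"Accept": "text/turtle"})


def look_up(uri: str) -> str:
    """Dereference the URI and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_lookup_request(uri)) as response:
        return response.read().decode("utf-8")


# Illustrative URI only; any HTTP URI that serves RDF would do.
request = build_lookup_request("http://dbpedia.org/resource/Vienna")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `look_up(...)` on a URI returned inside the response is what principle 4 ("include links to other URIs") makes possible.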
77. The Semantic Web Technology Stack
http://bnode.org/blog/2009/07/08/the-semantic-web-not-a-piece-of-cake
78. Selected Linked Data
Standards/Technologies
URIs + HTTP:
• Web infrastructure that provides global identifiers for all objects
RDF:
• provides a generic graph-based data model for describing things
• various serializations
RDFS and OWL
• Basis for the definition of vocabularies
(i.e., collections of classes and properties)
• Expressed in RDF
• Facilitates inference (using reasoning engines)
SPARQL:
• Graph pattern-based query language (and protocol) for RDF data
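To illustrate what "graph pattern-based" means, here is a toy matcher in plain Python. It is not a SPARQL engine: the data, the `:carol` resource and the function names are invented for illustration, and only conjunctive triple patterns with `?variables` are handled. The query corresponds roughly to `SELECT ?x ?y WHERE { :alice foaf:knows ?x . ?x foaf:knows ?y }`.

```python
# Tiny in-memory graph; :carol and the name literal are made up for this sketch.
GRAPH = [
    (":alice", "foaf:knows", ":bob"),
    (":bob", "foaf:knows", ":carol"),
    (":alice", "foaf:name", '"Alice"'),
]


def is_variable(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Try to match one triple pattern against one triple.

    Returns an extended copy of `binding`, or None if the triple does not fit.
    """
    result = dict(binding)
    for pattern_term, triple_term in zip(pattern, triple):
        if is_variable(pattern_term):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(pattern_term, triple_term) != triple_term:
                return None
        elif pattern_term != triple_term:
            return None
    return result


def query(graph, patterns):
    """Evaluate a list of triple patterns as a conjunction (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


friends_of_friends = query(
    GRAPH,
    [(":alice", "foaf:knows", "?x"), ("?x", "foaf:knows", "?y")],
)
print(friends_of_friends)
```

The shared variable `?x` is what joins the two patterns, which is the same idea SPARQL uses to traverse a graph.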
79. Vocabularies
Many vocabularies beyond those defined in the RDF standard
Collections of defined relationships and classes of resources
Vocabulary definition and reuse is a key semantic web principle
Adapted from Euclid learning materials by Barry Norton
Best practices:
• Terms from well-known vocabularies
should be reused wherever possible
• New terms should be defined only if you
cannot find required terms in existing
vocabularies
• Feel free to mix terms from different
vocabularies and to extend the
vocabularies with additional terms in
your own namespace
80. Examples of common Vocabularies
Vocabulary | Description | Classes and Relationships
Friend-of-a-Friend (FOAF) | Vocabulary for describing people. | foaf:Person, foaf:Agent, foaf:name, foaf:knows, foaf:member
Dublin Core (DC) | Defines general metadata attributes. | dc:FileFormat, dc:MediaType, dc:creator, dc:description
Semantically-Interlinked Online Communities (SIOC) | Vocabulary for representing online communities. | sioc:Community, sioc:Forum, sioc:Post, sioc:follows, sioc:topic
Music Ontology (MO) | Provides terms for describing artists, albums and tracks. | mo:MusicArtist, mo:MusicGroup, mo:Signal, mo:member, mo:record
Simple Knowledge Organization System (SKOS) | Vocabulary for representing taxonomies and loosely structured knowledge. | skos:Concept, skos:inScheme, skos:definition, skos:example
Adapted from Euclid learning materials by Barry Norton
81. Linked Data from an Application
Development Perspective
Data is self-describing (applications can dereference
URIs that identify vocabulary terms in order to find
their definition)
Use of HTTP as standardized data access
mechanism and RDF as a standardized data model
simplifies data access compared to Web APIs,
which rely on heterogeneous data models and
access interfaces
Web of Data is open, i.e., applications do not have
to be implemented against a fixed set of data
sources, but can discover new data sources at run-
time by following RDF links.
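The "discover new data sources at run-time" idea can be sketched as a follow-your-nose crawl. To keep the example self-contained, the Web is replaced by a hypothetical in-memory dictionary (`FAKE_WEB`); in a real application `dereference` would perform an HTTP lookup. All URIs and function names here are invented for illustration.

```python
from collections import deque

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical stand-in for the Web: URI -> triples served when that URI is looked up.
FAKE_WEB = {
    "http://example.com/alice": [
        ("http://example.com/alice", FOAF_KNOWS, "http://example.org/bob"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", FOAF_KNOWS, "http://example.net/carol"),
    ],
    "http://example.net/carol": [],
}


def dereference(uri):
    """Return the triples 'published' at a URI (empty if nothing is served there)."""
    return FAKE_WEB.get(uri, [])


def crawl(seed, max_documents=10):
    """Breadth-first traversal that follows object URIs found in retrieved triples."""
    visited = set()
    queue = deque([seed])
    collected = []
    while queue and len(visited) < max_documents:
        uri = queue.popleft()
        if uri in visited:
            continue
        visited.add(uri)
        for subject, predicate, obj in dereference(uri):
            collected.append((subject, predicate, obj))
            # Objects that are HTTP URIs are candidate links to further data.
            if obj.startswith("http://") and obj not in visited:
                queue.append(obj)
    return collected, visited


triples, discovered = crawl("http://example.com/alice")
print(len(triples), sorted(discovered))
```

The application only knows the seed URI up front; the hosts example.org and example.net are found by following links, and `max_documents` is a simple guard against unbounded crawling.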
Editor's Notes
Project partners
Co-financing
To connect data from different sources
With the term enterprise linked data we mean to link data from different sources in a business context within an enterprise; this data can be open or closed for external stakeholders.
E.g. link different information systems such as enterprise resource planning, customer relationship management, supply chain management, emails, the web, social media, other sources, etc.
As a basis for innovative products and services, to increase efficiency and productivity, and to drive business.
1: The characterisation of industries/domains according to defined criteria in order to identify the industries with highest potential for the adoption of the (E)LD concept.
2: An investigation into the data and information management challenges that industries are currently facing, and their formulation as use cases.
3: Analysis of the Linked Data community, its current research and development activities as well as open challenges in regards to LD technologies and standards.
4: The development of an integrated roadmap based on the industry, market, and Linked Data community analysis.
Target: private sector enterprises of different sizes, the research community and policy makers
In this phase of the project, we deliberately did not look into particular practical use cases in various industries, but aimed to assess the general susceptibility of various sectors to the Linked Data paradigm.
The goal was to characterize different sectors along a set of dimensions that indicate how well their structural and technological profile aligns with characteristics of the Linked Data paradigm.
In later stages of the project, we then narrowed our analysis to particularly promising application domains and use cases, but in this phase the goal was to take a broad high-level perspective
Alignment between Linked Data paradigm and industry characteristics:
- Linked Data creates a semantically explicit global information space and is therefore useful for industries that are highly internationalized, networked, knowledge-intense, data-driven etc.
Based on such rationales, we developed a set of working hypotheses that I'll explain in a minute
Methods:
To develop these working hypotheses, we relied extensively on literature research
The individual industry characterizations were also mostly informed by desk research, but also through statistical data on industry characteristics (such as R&D intensity, IT spending)
Explain rationale:
Internationalization
→ The original goal of Linked Data was to create a global information space -> more useful in industries with geographic dispersion
Knowledge-intensity:
The Semantic web has a strong tradition in knowledge representation, and of course industries where knowledge is important should be more susceptible towards adopting technologies like linked data that help them to manage it
Operational complexity: coordination of many activities, cooperative processes, interactions etc.
→ need for common understanding and a joint infrastructure
Network: inter-organizational information flows, need to exchange data
Openness: is a core value in the LOD community; a starting point of this project was that these technologies can be applied in a not fully open environment
Still, industries that are characterized
Linked Data characteristics:
Decentralization
Linking, sharing and reuse
Self-descriptiveness
Flexibility and extensibility
Openness
Networked:
- information flows across organizational boundaries
- need to share and integrate information within and between organizations
- Within and across industries
Data infrastructure
The overall economic development influences the IT investments in Austria, slightly positive outlook
For 84% of respondents, efforts for data and information management are rather big or even very big; 8% are overloaded by their efforts.
The biggest challenges in data management are the cooperation between IT-departments and Lines of Business (LOBs), inconsistency in the business terminology, immature technology, and low data quality.
Future perspectives for growth are to integrate data from different sources, to create consistency between data and eliminate duplication, and to track communication with customers along different streams and channels (e.g. CRM, e-mail, social media, etc.).
From the interviews we derived approximately 60 user stories and mapped them to the 18 Foundations listed earlier
https://docs.google.com/spreadsheets/d/1x3s0r8Wlg5rt9paa5CwoP0MYwHPqlLj66WbClctbV9w/edit#gid=0
I selected 4 and added them here; however, there are others in case you need them
Result of a stakeholder workshop we conducted with those people…
These are the foundations that we use for mapping the user stories from WP2 to the topics from WP3, therefore it is good to introduce them here
Analytics
Computational linguistics & NLP
Concept tagging & annotation (personally, I’m not convinced about this one)
Data integration
Data management
Dynamic data / streaming
Extraction, data mining, text mining, entity extraction (personally, I’m not convinced, given the overlap with NLP)
Logic, formal languages & reasoning
Human-Computer Interaction & visualization
Knowledge representation
Machine learning
Ontology/thesaurus/taxonomy management
Quality
Recommendations
Robustness, scalability, optimization and performance
Searching, browsing & exploration
Security and privacy
System engineering
A not-yet-very-scientific approach… still hope this is interesting, maybe a bit controversial to discuss here!
Outlines quite clearly what they thought back then the Semantic Web should be…
Terms to do with Agents and Web Services from conference/journal dictionary – there is no high level foundation so we might need to merge some terms
ontologies ontology management ontology engineering ontology languages
agents web services software agents services agent-based
… agents research is definitely not going up; our community has largely been dominated by ontologies.
Might need data from before 2006; the Semantic Web services topic was already on the decline in 2006.
The year of “Linked Data”
DBpedia
A lot of company use cases that have used SW mentioned:
Top 10 Companies plot from Sponsor Dictionary
Can look over companies per year FORM_
We identified a number of threats to the development of an Enterprise Linked Data ecosystem in the Austrian environment that we have limited control over.
LEGAL AND POLICY: Inconsistent legal standards across the EU can be a major stumbling block for the development of Linked Data ecosystems. Here, we are somewhat dependent upon policy-making at the European level.
STANDARDISATION: Standardization bodies, most notably of course the W3C, may focus their efforts elsewhere. So far, enterprise-related topics have not ranked particularly high in their priorities.
TECHNOLOGICAL INNOVATION: Broader developments in the research domain, such as the long-term demographics of the Semantic Web research community, i.e., the number of suitably qualified science and engineering students entering postgraduate studies.
FUNDING: And of course, we as researchers are always afraid of funding cuts and changes in funding priorities.
AWARENESS:
In a sense, the main weakness of the Austrian Enterprise Linked Data space is that it does not really exist
This can be attributed to a lack of awareness and education
As this project shows, there is a community in both academia and industry, but both we as researchers and the industry partners in the project are primarily active internationally
This is quite natural in the scientific domain and can be considered a strength for companies like SWC whose client base is mostly international, but not having a "home market" to address is also a weakness.
Other weaknesses include the fact that the relevant industry is not headquartered in Austria.
In terms of funding, there are high access barriers for funding of research projects
And difficulty to fund industry-wide action and adoption at an international level
Now, we also found that there are significant strengths.
AWARENESS: Strengths include the small, but very active community.
This is reflected in the attendance of meetups such as this one.
STANDARDIZATION:
In terms of standardization, Austria is, I think, disproportionately well represented in W3C groups and bodies, which we consider another strength.
TECHNOLOGY AND RESEARCH:
In terms of technology and research, I think it is fair to say that we are fairly well positioned and that technological innovation in research and practice is a key strength.
FUNDING:
Funding is of course always an issue to be concerned about, but overall Austria is well-positioned to fund basic and applied research related to Linked Data.
Applied research also fosters collaborations and knowledge transfer between universities and industry.
LEGAL AND POLICY: ?????????????????????????????????????????????????????
A major opportunity is the know-how concentrated at universities and research institutions.
Standardization:
- certification scheme: interoperability of tools, vendors etc. (e.g., SPARQL)
We also consider the EU General Data Protection Regulation, with its strong implications for Linked Data, not just a challenge but an opportunity for the European data infrastructure market in general and the Austrian market in particular
In the short term, we recommend measures to ensure the visibility of Austrian Linked Data initiatives in an international context
that are necessary to propel the potential of Linked Data in enterprises.
Foster multidisciplinary teams: to make sure the technical research is informed by societal and business requirements
To drive technological innovation, develop centers of excellence
Goals:
Provide a platform for Austrians to showcase technological results from nationally and internationally funded projects
Support the setup and development of centers, starting with Linked Data and moving beyond to Big Data Analytics and the Internet of Things
Flagship projects on ELD legal clearing centers:
- demonstrators based on national use cases
- broaden findings and solutions to a transnational blueprint
- couple outcomes of the flagship projects with the standardization efforts
Ability to respond to disruptive threats
Agile in terms of data modeling and integration
proven successful in data and information management
The redevelopment of companies, coupled with disruptive developments, presents companies with the opportunity to adopt ELD, facilitating new business narratives, new processes and new products.
Given that ELD has its strength as an underpinning technology for new business narratives,
Community has also been actively involved in the development of tools (ontologies, RDF, NLP) that
There is a strong history of fundamental theoretic research in logic and knowledge representation.
Important, but if we want the technologies we develop to be relevant, we need to align research priorities with practical needs
More customer involvement in business processes and (open) innovation
needs to be improved to foster adoption
In existing environments
While well-established industries have their own home-grown IT environments, ELD does not have this level of specialisation and thus has to be adapted beforehand.
need to be overcome.
For instance, it took a long time to convince industry to use SQL for data management. To the same extent there is now resistance against “non-relational” models like Linked Data, where new standards and new skills (like SPARQL) are required.
issues which are still a weakness of ELD.
This includes centres of excellence
Turning web of documents into a Web of Data (or: web of things in the world, described by data on the Web)
Create links between data from different sources.
Traditionally, data published on the Web has been made available as raw dumps in formats such as CSV or XML, or marked up as HTML tables, sacrificing much of its structure and semantics
In the conventional hypertext Web, the nature of the relationships between two linked documents is implicit
Huge decentralized knowledge base of machine-accessible data
Uniquely identifying web objects (documents, images, named-entities, facts, …)
Enabling the discovery & interlinking of web objects through semantic metadata
Open access to data
RDF (subject-predicate-object triples)
Subject and object of a triple are URIs that each identify a resource (the object may also be a string literal)
Predicate specifies how the subject and object are related, also represented by a URI
Still good practice to use established terminology
Term mapping, ontology alignment
Mixture of using common vocabularies together with data source-specific terms that are connected by mappings as deemed necessary
→ Parallel use of arbitrary (self-describing) vocabularies
→ different URIs for same entities, resolved via sameAs links
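One way to picture "resolved via sameAs links" is to group URIs that are declared equivalent and pick one representative per group. The sketch below uses a small union-find structure; the URIs are invented, and choosing the lexicographically smallest URI as representative is an arbitrary assumption made for this example, not an established convention.

```python
def canonical_uris(same_as_links):
    """Map every URI that occurs in a sameAs link to one representative of its group."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for left, right in same_as_links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Assumption for this sketch: the smaller string becomes the representative.
            keep, merge = sorted((root_left, root_right))
            parent[merge] = keep

    return {uri: find(uri) for uri in list(parent)}


# Hypothetical links: three sources use three different URIs for the same person.
links = [
    ("http://example.com/alice", "http://example.org/people#alice"),
    ("http://example.org/people#alice", "http://example.net/id/a-smith"),
]
canonical = canonical_uris(links)

triples = [
    ("http://example.net/id/a-smith", "http://xmlns.com/foaf/0.1/name", "Alice"),
]
# Rewrite subjects and objects so that all statements use the representative URI.
merged = [
    (canonical.get(s, s), p, canonical.get(o, o)) for s, p, o in triples
]
print(merged)
```

After the rewrite, statements from all three sources attach to a single node, which is what makes bottom-up integration work without agreeing on identifiers in advance.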
DBpedia:
One of the first and most prominent nodes in the LOD cloud
Community effort to extract structured (“infobox”) information from Wikipedia
Provides a SPARQL endpoint to the dataset
Interlinks the DBpedia dataset with other datasets on the web
Certain data sets such as DBpedia or GeoNames serve as linking hubs
GeoNames:
Database of 10+ million geographical names in various languages
Wikidata
Rather than extracting data from Wikipedia, Wikidata is a user-curated source of structured information that is included in Wikipedia
Includes more than 38 billion facts expressed as triples (we’ll get to that) from over 650,000 datasets.
More pragmatic bottom-up approach
Ontologies still important for data integration and particular applications, but the idea of agents using ontologies to reason about the global knowledge autonomously had to be given up for now.
We will discuss all that in more detail, but for now you can think of Linked Data as a lightweight, somewhat pragmatic, bottom-up approach as opposed to the grand vision that the Semantic Web is or was.
Basic recipe for publishing and connecting data using the infrastructure of the Web while adhering to its architecture and standards.
Semantic Web Stack defined as a layer model by the W3C
Every layer can access the functionality of the layers below
Every layer extends the functionality
Semantic web: one of the reasons the semantic web didn’t come to be as envisioned was that no global schema exists and people cannot be expected to agree on terms completely
Linked Data Paradigm:
- You don’t need to agree on all the terms
- 'term cherry-picking' approach
- bottom-up