The Web of Linked Open Data (LOD) is the most relevant achievement of the Semantic Web. Initially proposed by Tim Berners-Lee in a seminal paper published in Scientific American in 2001, the Semantic Web envisions a web where software agents can interact with large volumes of structured, easy-to-process data. Users now have at their disposal the first mature results of this vision. Among them, and probably the most significant, are the different LOD initiatives and projects that publish open data in standard formats such as RDF.
This presentation provides an overview and comparison of different LOD initiatives in the area of patent information, and analyses potential opportunities for building new information services based on widely available datasets of patent information. The information is based on interviews conducted with innovation agents and on the analysis of the professional literature and current implementations.
LOD opportunities are not restricted to information aggregators; they also extend to end-users and innovation agents who must face the difficulties of dealing with large amounts of data. In both cases, the opportunities offered by LOD need to be assessed, as LOD has become a standard, universal method to distribute, share and access data.
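The abstract above describes aggregating patent data published as RDF. As a purely illustrative sketch (none of these identifiers or predicates come from the presentation; they are invented), RDF-style triples pooled from different sources can be queried by simple pattern matching:

```python
# Minimal sketch of querying RDF-style triples (subject, predicate, object).
# All identifiers and prefixes below are hypothetical, for illustration only.
TRIPLES = [
    ("pat:EP1234", "dc:title", "Solar cell coating"),
    ("pat:EP1234", "ipc:class", "H01L"),
    ("pat:EP1234", "dc:creator", "org:AcmeLabs"),
    ("pat:US5678", "ipc:class", "H01L"),
    ("org:AcmeLabs", "foaf:based_near", "dbpedia:Madrid"),
]

def match(s=None, p=None, o=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# All patents in (hypothetical) IPC class H01L, regardless of source:
patents = [s for s, _, _ in match(p="ipc:class", o="H01L")]
print(patents)  # ['pat:EP1234', 'pat:US5678']
```

Real LOD datasets would be queried with SPARQL against published endpoints, but the pattern-matching idea is the same.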
"Big data" is a broad term that encompasses a wide range of data and contents. Big data offers new approaches to analysis and decision making. At first glance big data and IP may seem to be opposites, but have more in common than one may think. This talk will focus on how big data will impact, and be impacted, by IP. One of the biggest promises in big data is the possibility to re-use data produced via different sources, create new services or predict the future, via the analysis of correlations. In this context, how can companies protect information assets and analytical skills? What are the new skills required to search and analyze in real time a big amount of datasets ? Big data will change not only patents information, but will also generate new types of patents.
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d... (Connected Data World)
As one of the largest financial institutions worldwide, JP Morgan relies on data to drive its day-to-day operations against an ever-evolving regulatory regime. Our global data landscape poses particular challenges for effectively maintaining data governance and metadata management.
The Data strategy at JP Morgan intends to:
a) generate business value
b) adhere to regulatory & compliance requirements
c) reduce barriers to access
d) democratize access to data
In this talk, we show how JP Morgan leverages semantic technologies to drive the implementation of our data strategy. We demonstrate how we exploit knowledge graph capabilities to answer:
1) What Data do I need?
2) What Data do we have?
3) Where does my Data come from?
4) Where should my Data come from?
5) What Data should be shared most?
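Question 3 above (data lineage) is a natural fit for graph traversal. The sketch below is not JP Morgan's implementation; it is a minimal illustration, with invented dataset names, of how "Where does my data come from?" reduces to walking `derived_from` edges in a lineage graph:

```python
# Minimal sketch: answer "Where does my data come from?" by walking
# "derived_from" edges in a lineage graph. Dataset names are invented.
from collections import deque

DERIVED_FROM = {              # dataset -> upstream sources it is derived from
    "risk_report": ["trades_clean"],
    "trades_clean": ["trades_raw", "ref_data"],
    "trades_raw": [],
    "ref_data": [],
}

def provenance(dataset):
    """Return every upstream dataset reachable from `dataset`."""
    seen, queue = set(), deque(DERIVED_FROM.get(dataset, []))
    while queue:
        d = queue.popleft()
        if d not in seen:
            seen.add(d)
            queue.extend(DERIVED_FROM.get(d, []))
    return seen

print(sorted(provenance("risk_report")))
# ['ref_data', 'trades_clean', 'trades_raw']
```

In a real knowledge graph the same question would be a property-path query (e.g. over PROV-style relations) rather than hand-rolled traversal.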
"Big data" is a broad term that encompasses a wide range of data and contents. Big data offers new approaches to analysis and decision making. At first glance big data and IP may seem to be opposites, but have more in common than one may think. This talk will focus on how big data will impact, and be impacted, by IP. One of the biggest promises in big data is the possibility to re-use data produced via different sources, create new services or predict the future, via the analysis of correlations. In this context, how can companies protect information assets and analytical skills? What are the new skills required to search and analyze in real time a big amount of datasets ? Big data will change not only patents information, but will also generate new types of patents.
Enterprise Data Governance: Leveraging Knowledge Graph & AI in support of a d...Connected Data World
As one of the largest financial institutions worldwide, JP Morgan is reliant on data to drive its day-to-day operations, against an ever evolving regulatory regime. Our global data landscape possesses particular challenges of effectively maintaining data governance and metadata management.
The Data strategy at JP Morgan intends to:
a) generate business value
b) adhere to regulatory & compliance requirements
c) reduce barriers to access
d) democratize access to data
In this talk, we show how JP Morgan leverages semantic technologies to drive the implementation of our data strategy. We demonstrate how we exploit knowledge graph capabilities to answer:
1) What Data do I need?
2) What Data do we have?
3) Where does my Data come from?
4) Where should my Data come from?
5) What Data should be shared most?
Linked data for Enterprise Data Integration (Sören Auer)
The Web is evolving into a Web of Data. In parallel, the intranets of large companies will evolve into data intranets based on the Linked Data principles. Linked Data has the potential to complement the SOA paradigm with a light-weight, adaptive data integration approach.
Data Catalog in Denodo Platform 7.0: Creating a Data Marketplace with Data Vi... (Denodo)
Watch Alberto's session from Fast Data Strategy on-demand here: https://buff.ly/2wByS41
Gartner’s recently published report “Data Catalogs Are the New Black in Data Management Analytics” emphasizes the importance of data catalogs.
Watch this session to learn more about:
• The vision behind the Denodo Data Catalog
• How to maximize information value with the Denodo Data Catalog
• Why it is essential to combine data delivery with a data catalog
Knowledge graphs are what all businesses are now on the lookout for. But what exactly is a knowledge graph and, more importantly, how do you get one? Do you get it as an out-of-the-box solution, or do you have to build it (or have someone else build it for you)? With the help of our knowledge graph technology experts, we have created a step-by-step list of how to build a knowledge graph. It will properly expose and enforce the semantics of the semantic data model via inference, consistency checking and validation, and thus offer organizations many more opportunities to transform and interlink data into coherent knowledge.
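To make the inference step mentioned above concrete, here is a toy sketch (not from the original material; all class and instance names are invented) of deriving implied types from `rdfs:subClassOf`-style statements, the simplest kind of knowledge-graph inference:

```python
# Minimal forward-chaining sketch: derive the types implied by a
# subClassOf hierarchy. All names below are hypothetical.
SUBCLASS_OF = {
    "DesignPatent": "Patent",
    "Patent": "IPRight",
}
TYPES = {"pat:D001": "DesignPatent"}

def inferred_types(instance):
    """Return the asserted class plus every superclass it implies."""
    cls = TYPES[instance]
    result = [cls]
    while cls in SUBCLASS_OF:      # walk up the class hierarchy
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result

print(inferred_types("pat:D001"))  # ['DesignPatent', 'Patent', 'IPRight']
```

Production triplestores perform this kind of entailment (plus consistency checking and validation) natively; the point here is only that inferred facts are derived, not stored.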
This presentation, held during the SEMANTiCS conference, introduces Ontos' current achievements towards a streaming-based text mining solution using Deep Learning and Semantic Web technologies.
Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers.
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014.
Thanks to Atanas Kiryakov for this presentation, I just cut it to size.
Using the Semantic Web Stack to Make Big Data Smarter (Matheus Mota)
This presentation will discuss how just a few parts of the Semantic Web Cake can already boost your analytics by making your (big) data smarter and even more connected.
Powerful Information Discovery with Big Knowledge Graphs – The Offshore Leaks ... (Connected Data World)
Borislav Popov's slides from his lightning talk at Connected Data London. Borislav, a Director of Business Development at Ontotext, presented Ontotext's approach to tackling the Panama Papers leak, using a technology that mixes Semantic Web and graph databases.
SKOS - 2007 Open Forum on Metadata Registries - NYC (jonphipps)
A brief introduction to SKOS (Simple Knowledge Organization Systems) and its usage in the NSDL Metadata Registry, with some discussion of current challenges.
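As a toy illustration of the SKOS model mentioned above (not taken from the talk; the concept labels are invented), `skos:broader` links each concept to a more general one, so a thesaurus is just a graph you can walk upward:

```python
# Toy sketch of a SKOS concept scheme: each entry maps a concept to its
# skos:broader (more general) concept. Labels are invented for illustration.
BROADER = {
    "optical instruments": "instruments",
    "microscopes": "optical instruments",
    "electron microscopes": "microscopes",
}

def broader_chain(concept):
    """Follow skos:broader links from a concept up to its top concept."""
    chain = []
    while concept in BROADER:
        concept = BROADER[concept]
        chain.append(concept)
    return chain

print(broader_chain("electron microscopes"))
# ['microscopes', 'optical instruments', 'instruments']
```

Real SKOS adds labels, notes and mapping relations (`skos:exactMatch`, etc.), but broader/narrower chains like this are the backbone used for query expansion and browsing.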
Nelson Piedra, Janneth Chicaiza and Jorge López, Universidad Técnica Particular de Loja; Edmundo Tovar, Universidad Politécnica de Madrid; and Oscar Martínez, Universitas Miguel Hernández
Explore the advantages of using linked data with OERs.
Morning session talk at the second Keystone Training School, "Keyword Search in Big Linked Data", held in Santiago de Compostela.
https://eventos.citius.usc.es/keystone.school/
Expressing Concept Schemes & Competency Frameworks in CTDL (Credential Engine)
This presentation is focused on how the Credential Engine can access 3rd party resource data stores and recipes for mapping and publishing competency frameworks as Linked Data.
Decentralised identifiers and knowledge graphs (vty)
Building an Operating System for Open Science: data integration challenges, Dataverse data repository and knowledge graphs. Lecture by Slava Tykhonov, DANS-KNAW, for the Journées Scientifiques de Rochebrune 2023 (JSR'23).
Ontologies, controlled vocabularies and Dataverse (vty)
Presentation on Semantic Web technologies for the Dataverse Metadata Working Group, run by the Institute for Quantitative Social Science (IQSS) of Harvard University.
3. Implementation with NOSQL databases: Document Databases (MongoDB).pptx (RushikeshChikane2)
This chapter gives information about document-based databases and graph-based databases: their basic structures, features, applications, limitations and use cases.
Presentation by Dr. Getaneh Alemu (Solent University, United Kingdom) at the II Congress on Information, Communication and Research (CICI 2018), "Metadata and Information Organization". Faculty of Philosophy and Letters, Universidad Autónoma de Chihuahua, Mexico. Event organized by the academic body "Estudios de la Información" and the disciplinary group "Información, Lenguaje, Comunicación y Desarrollo Sostenible". 29 October 2018.
Web of Data as a Solution for Interoperability. Case Studies (Sabin Buraga)
The paper draws several considerations regarding the use of Web of Data (Semantic Web) technologies, such as metadata vocabularies and ontological constructs, to increase the degree of interoperability within distributed systems. A number of case studies are presented to express the knowledge in a platform- and programming-language-independent manner.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal... (Dr. Haxel Consult)
Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a project with partner Fraunhofer SCAI in the life sciences, where a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t... (Dr. Haxel Consult)
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of 'green' IP analytics research. A series of patent analytics reports looking at green technologies have been published, and analysis has been conducted of the UK's Green Channel scheme for accelerated processing of green patent applications. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK 'green' trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... (Dr. Haxel Consult)
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart, Linda Ander... (Dr. Haxel Consult)
In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In processing technical English texts, a multi-word unit method is often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without the invention of an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. For the final outline, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state-of-the-art within the patent text mining research community in 2022.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... (Dr. Haxel Consult)
In 2013 we witnessed an evolutionary change in the NLP field thanks to the introduction of space embeddings which, with the use of deep learning architectures, achieved human-level performance in many NLP tasks. With the introduction of the attention mechanism in 2017 the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will provide some initial results from a paper currently under review that gives insight into hyperparameter tuning during the generation of embeddings.
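The core mechanic of embedding-based search described above can be sketched in a few lines. This is a deliberately tiny illustration, not the presenter's system: the 3-dimensional vectors are invented, whereas real embeddings have hundreds of dimensions and come from a trained model.

```python
# Toy sketch of embedding-based search: rank documents by cosine
# similarity between a query vector and precomputed document vectors.
# Vectors and document ids are hypothetical.
import math

DOC_VECTORS = {
    "doc_patents": [0.9, 0.1, 0.0],
    "doc_trademarks": [0.1, 0.9, 0.0],
    "doc_sports": [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec):
    """Return document ids ranked by similarity to the query vector."""
    return sorted(DOC_VECTORS,
                  key=lambda d: cosine(query_vec, DOC_VECTORS[d]),
                  reverse=True)

print(search([0.8, 0.2, 0.0]))  # 'doc_patents' ranks first
```

Relevance-based engines instead score lexical overlap (e.g. BM25); the comparison in the talk is between these two ranking signals.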
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e... (Dr. Haxel Consult)
10 years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in... (Dr. Haxel Consult)
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology, ... (Dr. Haxel Consult)
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of new emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement that makes Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and at what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al... (Dr. Haxel Consult)
Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.
At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of NLP in use at CIPO and the insights we have learned applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.
We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.
AI-SDV 2022: Extracting information from tables in documents, Holger Keibel (K... (Dr. Haxel Consult)
In our customer projects involving automated document processing, we often encounter document types that provide crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i... (Dr. Haxel Consult)
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The research data is usually published as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the FAIR data principles, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited, Jay Ven Eman (CE... (Dr. Haxel Consult)
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video either physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their document database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To build the system and support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides coverage of over 160 countries for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally, a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the Swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in patent, technical and business intelligence research, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization... (Dr. Haxel Consult)
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are essential to assess and manage the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight... (Dr. Haxel Consult)
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
1. Patents, Semantics and
Open Innovation
The role of LOD in a business directory for
knowledge intensive industries
Nice, 20-OCT-2015
Ricardo Eito-Brun
reito@uc3m.es
2. Patents, Semantic and Open
Innovation. LOD
Reasons leading to this research:
◦ Semantic Web technologies and applications, in particular LOD
publishing, constitute a preliminary step towards Information
Systems interoperability.
◦ Having access to distributed data, hosted by different agents and
repositories, opens new possibilities for research in multiple areas.
◦ In the particular case of patent information:
What possibilities do we have if we can aggregate and
analyse these data together with other datasets?
Is it feasible to improve the way we access patent
information?
Can we devise innovative user interfaces to browse and search
patent collections?
3. Patents, Semantics and Open
Innovation. LOD
Schema:
◦ The LOD promises. Potential benefits, technologies and
standards for data encoding and interoperability.
◦ LOD in the world of patents. A review of major milestones.
◦ Overview of current research: what researchers are doing.
◦ Case Study: the particular case of Web-based Innovation
Platforms and digital repositories.
4. Linked Open Data:
benefits, technologies and standards
LOD has become the main application of the SMW approach.
Semantic Web (SMW)
◦ Proposed by Tim Berners-Lee in “The Semantic Web: A New Form
of Web Content That Is Meaningful to Computers Will
Unleash a Revolution of New Possibilities”, Scientific American,
2001.
◦ SMW is about having a more intelligent web, made up of
documents that could be easily processed by computers and
software agents with no human intervention.
◦ SMW data should be “exposed” or published in a machine-
readable format.
◦ Computers should be able to understand the meaning of data.
5. Linked Open Data:
benefits, technologies and standards
W3C SMW Activity:
◦ “The Semantic Web provides a common framework that
allows data to be shared and reused across application,
enterprise, and community boundaries. […]
◦ The Semantic Web is about two things.
It is about common formats for integration and combination of
data drawn from diverse sources…
It is also about language for recording how the data relates to
real-world objects.”*
*http://www.w3.org/2001/sw/
6. Linked Open Data:
benefits, technologies and standards
SMW is presented as an extension of the traditional web.
While content in the traditional web is published for humans, content
in the SMW is published for software programs that can interpret
it and derive new data, information and knowledge.
Initial SMW initiatives were focused on improving software agents’
capabilities to solve information management problems:
◦ "Mom needs to see a specialist and then has to have a series of physical
therapy sessions. Biweekly or something. Lucy instructed her Semantic
Web agent through her handheld Web browser. The agent promptly
retrieved information about Mom's prescribed treatment from the
doctor's agent, looked up several lists of providers, and checked for the
ones in-plan for Mom's insurance within a 20-mile radius of her home…”
7. Linked Open Data:
benefits, technologies and standards
SMW pillars:
◦ URIs or IRIs to provide unique identifiers to resources.
◦ XML to encode and transfer information.
◦ RDF as a vocabulary (commonly serialized in XML) to encode
metadata describing resources.
◦ RDF-S as a means to structure the metadata about resources
(What can be asserted for a resource of a specific type).
◦ OWL as an extension of RDF/RDF-S with additional capabilities
to express constraints on data.
◦ Additional languages to express the rules that govern reasoning
on SMW data.
8. Linked Open Data:
benefits, technologies and standards
RDF proposes a vocabulary (set of tags) to express metadata about
any type of resource.
RDF data can be expressed in XML or in other alternative formats.
An RDF file usually encloses metadata about a specific resource,
e.g.: person, document, institution, company, event…
Resources are identified by unique identifiers (URIs).
◦ URIs are used to ensure that metadata about the same entity are grouped
together.
◦ In case different applications use different identifiers for the same entity, it
is possible to keep the equivalences between the different identifiers.
9. Linked Open Data:
benefits, technologies and standards
◦ Unique identifier for the resource, expressed as a URI.
◦ Equivalences with other identifiers proposed in other contexts
(owl:sameAs).
◦ Metadata about the resource, with clearly defined meaning.
◦ Resources are given a type (rdf:type).
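The figure on this slide is not reproduced in the text; as a minimal sketch (all URIs and values below are invented for illustration, not taken from any real dataset), such an RDF description can be written in Turtle as:

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Unique identifier for the resource, expressed as a URI
<http://example.org/person/1234>
    rdf:type   foaf:Person ;                                  # resources are given a type
    owl:sameAs <http://example.org/other-context/p-1234> ;    # equivalence with an identifier from another context
    foaf:name  "Jane Doe" .                                   # metadata with clearly defined meaning
```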
10. Linked Open Data:
benefits, technologies and standards
RDF records about resources can be linked or related.
The value of a specific metadata field or property may refer to the
identifier of another resource.
This allows having sets of structured, linked data.
For example:
◦ Subjects or Topics in a classification code may have unique Ids.
◦ The “Subject” metadata field in a document will take as a value the ID of the
referred topic.
◦ Personal or corporate authors may have unique Ids.
◦ The “Author” metadata field in a document will take as a value the ID of its
personal/corporate author.
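As a hedged illustration of this linking pattern (again with invented identifiers), in Turtle:

```turtle
@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The document's metadata take resource IDs, not literal strings, as values
<http://example.org/document/42>
    dc:subject <http://example.org/topic/semantic-web> ;  # ID of the referred topic
    dc:creator <http://example.org/person/1234> .         # ID of the personal author

# The topic is itself a described, linkable resource
<http://example.org/topic/semantic-web>
    rdfs:label "Semantic Web" .
```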
11. Linked Open Data:
benefits, technologies and standards
◦ dc:language refers to the ID of the English language.
◦ dc:subject refers to the ID of a classification code taken from the
DDC system.
◦ dc:subject also refers to the ID of a topic taken from the LCSH
system.
12. Linked Open Data:
benefits, technologies and standards
SMW standards and languages are not limited to RDF.
◦ RDF-S provides a way to define “schemas” for metadata,
in other words, what properties/metadata can we use to
describe entities of a specific type.
◦ SKOS provides a way to encode “subject headings”,
“thesauri” or “classification schemas” used to indicate the
topics the documents are about.
◦ Specific vocabularies to indicate which properties are
available to provide metadata on resources: e.g. Dublin
Core.
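To make the RDF-S idea concrete, here is a minimal schema sketch (the ex: namespace, class and property names are invented for illustration):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/schema#> .

ex:Book a rdfs:Class .             # a type of resource we want to describe

ex:hasAuthor a rdf:Property ;
    rdfs:domain ex:Book ;          # this property describes Books…
    rdfs:range  rdfs:Resource .    # …and its value is another resource
```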
13. Linked Open Data:
benefits, technologies and standards
◦ Properties starting with dc: and dct: are taken from the Dublin
Core vocabulary, which provides a set of metadata elements.
◦ dc:subject points to LCSH/DDC topics that are expressed – in
some other place on the web – using SKOS.
◦ A separate RDF-S document states which properties can be
used when providing metadata about resources of type “Book”.
14. Linked Open Data:
benefits, technologies and standards
RDF statements build a “graph” of resources, properties
and values.
As the amount of metadata collected about the different entities
grows, the graph is expanded.
The RDF model for representing information enables browsing and
discovery mechanisms that go beyond traditional search/browse capabilities.
15. SKOS:
conceptual structures for the SMW
Another vocabulary closely related to the SMW and LOD.
Used to:
◦ Encode “subject headings” or “classification schemas” in XML
format.
◦ Encode relationships between these conceptual structures (e.g.
equivalences between classes of different classification schemas)
◦ Provide list of topics to which document descriptions can be
linked.
Concepts within a SKOS-encoded schema are related to each other
by relationships like <broader> , <narrower> or <related>.
Labels can be given to concepts (linguistic equivalences, authorized,
not authorized, deprecated…).
Concepts can also be annotated.
16. SKOS:
conceptual structures for the SMW
◦ Each concept has a separate skos:Concept element, identified by
a URI.
◦ skos:related points to other concepts with a related meaning.
◦ skos:prefLabel and skos:altLabel provide linguistic labels for the
concept.
◦ skm:UF points to deprecated concepts.
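Since the slide’s figure is not reproduced, the structure it describes can be sketched in standard SKOS (concept URIs and labels are invented for illustration; the slide’s skm:UF relation is omitted because it is not part of the W3C SKOS vocabulary):

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/thesaurus/> .

ex:patent a skos:Concept ;                    # each concept is a separate skos:Concept with a URI
    skos:prefLabel "Patent"@en ;              # authorized label
    skos:prefLabel "Patente"@es ;             # linguistic equivalence
    skos:altLabel  "Letters patent"@en ;      # non-authorized label
    skos:broader   ex:intellectual-property ; # link to a broader concept
    skos:related   ex:trademark ;             # link to a concept with related meaning
    skos:scopeNote "Exclusive rights granted for an invention."@en .  # annotation
```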
17. SKOS:
conceptual structures for the SMW
SKOS has become one of the key points in SMW initiatives.
Organizations usually start putting their controlled vocabularies /
classification schemes online using SKOS.
Then, “bibliographic” descriptions are linked to SKOS topics as a
second stage.
This provides an initial pair of linked data sets.
But SKOS becomes truly powerful when we take advantage of its
capability to express relationships between different classification
schemas.
This opens the opportunity of cross-searching different repositories.
18. Semantic Web: standards
SPARQL
SPARQL is a W3C standard that defines a query language to search
for information within RDF graphs.
SPARQL is to the SMW what SQL is to relational databases
like Oracle, MySQL, PostgreSQL…
Collections of RDF documents within a repository can be searched
using “SPARQL end points”.
SPARQL end points are aimed at software agents and software
applications.
Queries are constructed dynamically by software agents, and results
are returned in XML for further processing.
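As a small illustration (the graph pattern and topic URI are invented for the example, not a query against any of the endpoints mentioned later), a SPARQL query over Dublin Core metadata might look like:

```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>

# Retrieve up to 10 documents indexed under a given topic, with their titles
SELECT ?doc ?title
WHERE {
  ?doc dc:subject <http://example.org/topic/semantic-web> ;
       dc:title   ?title .
}
LIMIT 10
```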
20. Semantic Web:
technologies…
Technologies and tools used to deal with SMW standards and
concepts can be classified in these groups:
◦ Editors, to help define RDF-S schemas.
◦ Conversion tools, to move existing data into the RDF format.
◦ RDF repositories or “triple stores”, to support:
The storage of large data sets
Bulk downloads,
Human browsing
SPARQL searching.
◦ Specific tools to manage controlled vocabularies and generate
SKOS representations.
21. Linked Open Data (LOD)
SMW at work…
“Linked Data is simply about using the Web to create typed links
between data from different sources. These may be databases
maintained by two organisations, or heterogeneous systems within
one organisation that, historically, have not easily interoperated at the
data level.”
“… Linked Data refers to data published on the Web in such a way
that it is machine-readable, its meaning is explicitly defined, it is
linked to other external data sets, and can in turn be linked to from
external data sets.”
Tim Berners-Lee (2006)
22. Linked Open Data (LOD)
SMW at work…
Linked Data Principles:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using
the standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more
things.
These principles are “rules” or “recommendations” on how to publish
LOD data on the web.
23. Linked Open Data (LOD)
SMW at work…
There is a graphical display created by Richard Cyganiak and Anja
Jentzsch showing published data sets, http://lod-cloud.net/
24. Linked Open Data (LOD)
SMW at work…
Conditions to be included in this catalog:
◦ Data available via URIs through http or https.
◦ Data published in RDF format (any serialization method: RDFa, RDF/XML, Turtle,
N-Triples).
◦ Dataset must have at least 1000 RDF statements.
◦ Dataset must contain links to at least one of the datasets in the diagram (at least
50 links).
◦ Dataset must be available via an RDF dump or through a SPARQL endpoint.
It is also possible to get data about the number of published LOD datasets at:
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/
http://datahub.io is another site where you can identify datasets.
It contains 31 datasets related to patents, including EPO, USPTO,
KIPO, UK Patent…
26. Linked Open Data (LOD)
SMW at work…
To check the use of specific vocabularies/metadata, the Open
Knowledge Foundation has hosted the LOV (Linked Open Vocabularies)
site since 2012:
http://lov.okfn.org/dataset/lov/
Additional LOD-related tools (search engines) include:
◦ http://watson.kmi.open.ac.uk/WatsonWUI/
◦ http://swoogle.umbc.edu/
◦ http://ws.nju.edu.cn/falcons/ontologysearch/index.jsp?query
27. LOD and Patents
Early, Academic Initiatives
NSF (SciSIP) – Science of Science and Innovation Policy
Discussions at the 2011 “Patent Data Workshop” to support quantitative
studies on innovation.
◦ Highlighted the effort of the USPTO Patent Dashboard and Data Visualization
Center to study causes of innovation and outcomes of programs to
stimulate innovation.
SciWire platform* that ingests and links metadata about patents and
grants to explore the R&D landscape.
AKSW (Agile Knowledge Engineering and Semantic Web) to
publish US Patents:
◦ SPARQL End Point: http://us.patents.aksw.org/sparql
◦ June 2014, 187 million triples.
◦ Proposed RDF schema with basic data based on dc, foaf and its own
schema.
*Haak, Laurel; Baker, David; Probus, Matt: Creating a Data Infrastructure for Tracking
Knowledge Flow. 2012
28. LOD and Patents
Academic Initiatives
Subramanian, S. (2013)* analyzed the conversion of USPTO patents into
RDF format, and their merging with DBpedia data to provide
“consolidated / merged search results” (enriched patent data).
Singhi, M., Ding, Y.** also merged USPTO patent data from the
SDB (Scholarly Database at Indiana Univ.) with DBpedia entries for
locations in a common database.
The SDB database includes 26 million records (4 million US patents
in the 1976-2010 period), including MEDLINE and NSF documents
about research grants.
SDB uses patent information as part of its R&D analysis with the
Sci2 bibliometric tool.
* Subramanian, S.; Dhilpe, S. Yalamanchi, U. Exploiting Linked Data and Big Data for Semantic Patent Discovery.
COEN 296, Aug. 2013.
**Singhi, M., Ding, Y. Linking US Patent Data with Wikipedia.
*** Bäumer et al., Linked Open Data for Scientific Data Sets. KONVENS, 2014. Hildesheim, Germany.
29. LOD and Patents
Academic Initiatives
Heinz Nixdorf Institute (Paderborn University) and KISTI (Korea
Institute of Science and Technology Information)*:
◦ Make scientific data available through RDF.
◦ Use of the D2R/D2RQ server and converter and the RelFinder
visualization tool. Pilot project with 60 researchers and 400 related
publications.
Zaveri, A. et al. (2012)** describe their conversion of the USPTO
patents into RDF.
◦ The USPTO patent full-text data is available for download in XML format from
2002 onwards.
◦ From 1976 to 2001, data are available in plain text.
◦ Each week the USPTO releases a zipped file of all patents accepted in that week.
◦ Each year ca. 52 files are published, each containing about 5,000 patents.
◦ They developed an “ontology” or schema to encode patent information.
* Bäumer et al., Linked Open Data for Scientific Data Sets. KONVENS, 2014. Hildesheim, Germany.
** Zaveri et al. (2012). Publishing and Interlinking the USPTO Patent Data. Semantic Web Journal. 24/09/2014.
30. LOD and Patents
Academic Initiatives
Dongmin Seo et al. (2011) designed InSciTe, a technology
opportunity discovery (TOD) service to support decision-making on
R&D planning.
It used RDF data (including patent information) to analyze and
visualize relations between technologies and agents.
◦ Trends and predictions,
◦ Relationship,
◦ Roadmaps,
◦ Competitors and collaborators.
Data set included 3,100,000 patents from US, Europe and Japan.
31. LOD and Patents
Academic Initiatives
Zaveri et al. (2011) described an interesting study on the use of
linked open data to assess the impact of research on the
biomedical area.
They analyzed data for 20 European countries over a 10-year
period (1999 to 2009).
The data set included data from Eurostat and World Bank LOD
datasets.
Input data included the number of Biotechnology patent
applications submitted to EPO.
Zaveri et al. (2013). Using Linked Data to evaluate the impact of Research and
Development in Europe: a Structural Equation Model. LNCS 8219, pp 244-259
32. LOD and Patents
Official Initiatives
USPTO,
◦ In 2014 it developed an 18-month roadmap for open data initiatives.
◦ In April 2015, it published the “Report of Findings from an Open Data
Roundtable with the U.S. Patent & Trademark Office”.
◦ No specific reference to RDF or “linked open data”
◦ Bulk download through http://patents.reedtech.com/, Google Patents.
◦ Plans and achievements:
PatentsView prototype for patent visualization (5 million U.S. patents).
Electronic Data Hosting. Repository of public bulk patent and
trademark data
Assignment Search. Searchable database containing all recorded
Patent Assignment information dating back to August 1980.
33. LOD and Patents
Official Initiatives
EPO (European Patent Office),
◦ OPS (Open Patent Service)** has long provided XML patent data
via REST-based web services.
◦ Queries built on the CQL query language.
◦ Data coming from the EPODOC, EPOQUE (full-text) and BNS (image)
databases (same sources and coverage for bibliographic data as
Espacenet).
◦ Bibliographic data, legal status, facsimile images, CPC classification,
character-coded full text, register and family.
◦ Well-documented query interface for developers.
◦ For large datasets, bulk download is available.
Kallas, P. (2006). Open Patent Service. World Patent Information
Volume 28, Issue 4, December 2006, Pages 296–304
34. LOD and Patents
Official Initiatives
EPO (European Patent Office),
◦ In the specific LOD context, the http://epo.publicdata.eu/ dataset
contains around 22 million triples.
◦ Based on an OWL encoded schema/ontology.
◦ Current pilot based on the conversion of 100,000 EP applications and
the CPC hierarchy (250,000 technical classification symbols) into RDF
triples.
◦ A LOD user interface is provided to view the data in a user-friendly way
without programming, plus a SPARQL endpoint and bulk downloads.
◦ Export to JSON, text, XML or Turtle.
◦ Technical terms extracted from the abstracts are linked to DBpedia.
◦ States (geographical units) from addresses are mapped to
nuts.geovocab.org, and language codes to the Library of Congress.
**Information provided by Martin Kraker (EPO)
35. LOD and Patents
Official Initiatives
Intellectual Property Government Open Data, IPGOD (Australia).
◦ Announced in October 2014.
◦ Available via the Australian government's data portal at data.gov.au.
◦ It covers more than a century of Australian patent, trade mark, and
design data.
◦ Information on the application process for each right.
◦ They have also created a unique set of identifiers to link the data to
external information on companies.
◦ IPGOD includes the PATSTAT “application identifier”, so its data can be
linked directly to PATSTAT.
◦ Harmonised names of rights holders in Australia.
◦ Available through: https://data.gov.au/organization/ip-australia
◦ Detailed, well-documented data model**.
“Linked and open patent data: Australia and Korea moving forward” Patent Information News. EPO. Issue 1/2015.
**http://www.ipaustralia.gov.au/uploaded-files/reports/IP_Government_Open_Data_Paper_-_Final.pdf
36. LOD and Patents
Official Initiatives
OEPM (Spain) Open Data.
◦ Part of the datos.gob.es initiative.
◦ http://datos.gob.es/catalogo/catalogo-opendata-de-oficina-
espanola-de-patentes-marcas-oepm
◦ It provides bulk download of data in XML (plus PDF) based on
the WIPO ST.36 standard for encoding data in XML.
◦ No SPARQL end point.
37. LOD and Patents
Official Initiatives
KIPO (Korean Intellectual Property Office), as part of the IP5 initiative,
started dissemination of patent information in XML in July 2014.
◦ KIPRIS tool, http://plus.kipris.or.kr/eng/main.do
◦ Related to Open Government Data in South Korea.
◦ Patents is one of 16 strategic areas in this program.
◦ Key questions: what information to share, how to share it, and how to support utilization.
◦ How to share: bulk, via API or LOD in XML format (WIPO ST.96 Std.).
◦ How to support utilization: Applicant Name standardization.
◦ IP-Biz Integrated service to connect patent and Business data.
◦ API, Bulk download.
“Linked and open patent data: Australia and Korea moving forward” Patent Information News. EPO. Issue
1/2015.
38. LOD and Patents
A brief summary
Different data sets currently available, but…
◦ Open data (publicly made accessible) does not mean “linked
data”.
◦ XML publishing is not the same as RDF publishing.
◦ Adding links to other entities (companies, people or topics) is a
must-have to talk about “linked” open data.
◦ RDF-S standardization is also an interesting choice.
◦ Publishing data in RDF format is just a first step… to allow target
communities figure out how to use the data.
39. LOD and Patents
Analysis of potential applications
LOD data are useful when they are linked to other data to enhance
the capability of finding additional information.
Different use cases are being considered to exploit existing patent
data sets, in particular:
◦ Enhance institutional repositories with patent data (bulk import or
access through APIs).
Prototype under development to integrate the UCA repository with OEPM data.
Integration with the DSpace repository software to gather patent details.
◦ Integrate patent data into Web-based Innovation Platforms and
Business Directories.
Checking the “innovation capabilities” of agents (persons, entities)
Dynamic building of an “innovation profile” collecting and merging data about
patents, projects and papers.
40. In innovation, linear models have been replaced by
collaborative models
These models are based on feedback and interactions between
different partners.
This trend has evolved toward the Open Innovation model or
paradigm (Chesbrough 2003).
Today, innovation management (IM) is seen as a:
◦ non-linear,
◦ evolutionary,
◦ interactive process between the company and its environment,
◦ that requires the close collaboration of different agents.
LOD and Patents
Analysis of potential applications
41. Principles of Open Innovation
Valuable ideas may come from both inside and outside the
company
It is the consequence of several factors:
◦ knowledge specialization,
◦ availability of highly skilled workers,
◦ increasing capabilities of suppliers,
◦ and the difficulty of mastering in-house all the aspects involved in a
successful innovation life cycle.
◦ Different “knowledge streams” must be managed: market, scientific and
technical, and social knowledge.
Different agents participate at different levels in generating the
knowledge that provides the inputs to create innovations.
This results in complex interfaces between them.
LOD and Patents
Analysis of potential applications
42. OI depends on the ability to cooperate with other partners who
are developing innovation.
Agents need to give visibility to their innovation capabilities and
achievements (products or services, skills, etc.) in a global
context.
In some sectors, e.g. aerospace, big companies need to set up
agreements with other companies to ensure “geographical return
on investment”.
What tools do we have to give visibility to our company?
◦ Business directories
◦ Collaborative, Innovation platforms.
LOD and Patents
Analysis of potential applications
43. Current company/business directories and databases focus on
“contact details” and “financial data”
They are mostly oriented to assessing the “health of companies
from an economic perspective”.
◦ Do these directories offer the data to support OI planning
activities?
◦ Do they provide data to identify areas of expertise and
previous experience?
◦ How easy is it to identify partners for specific projects?
LOD and Patents
Analysis of potential applications
44. Restrictions of current business directories fall in these areas:
◦ They are not exhaustive, and some of them exclude SMEs/VSEs.
◦ They classify companies by large, general areas of activity.
◦ They focus on company financial characteristics: income, sales, audits,
investors, employees….
Missing information:
◦ Product and service descriptions.
◦ Previous experience in projects.
◦ Technical achievements, patents.
◦ Experience and compliance with regulations and standards.
◦ Assessment of intellectual capital (employees with specific profiles),
areas of expertise, etc.
LOD and Patents
Analysis of potential applications
45. And the Collaborative innovation platforms?
Innovation platforms are web-based collective workspaces to
leverage the innovation process.
◦ Registered companies post “challenges”, with a technical description of
the problem to solve.
◦ Participants can propose their solutions to the problem.
◦ The company that posted the challenge selects the most suitable
solution.
Main constraints:
◦ Partners are identified in the specific context of a problem.
◦ They do not support partners’ assessments, but just the assessment of
the proposed solutions.
◦ Innovation life cycle requires a “long-term partnership”.
LOD and Patents
Analysis of potential applications
46. We can conclude that a different type of “directory” may be
useful to support and foster collaboration and innovation:
◦ With data not only about companies, but also about individual researchers,
university departments and research groups.
◦ Additional data: work experience, technical achievements (patents,
technical papers, products).
◦ With a high level of specialization to characterize content (areas of expertise
and achievements).
May Web 3.0 technologies be useful?
◦ Business directories are mainly Web 1.0 tools.
◦ Innovation platforms are mainly Web 2.0 tools.
◦ Hypothesis: this is a data integration problem.
LOD and Patents
Analysis of potential applications
47. A preliminary survey allowed us to identify these information items:
◦ Company contact details, including lines of business and activities.
◦ Areas of knowledge/expertise, going further in detail.
◦ Description of the company facilities and resources.
◦ Achievements:
Projects
Products and services.
Technologies
Patents
Papers
◦ Other entities the company has worked with in collaborative projects
◦ References and customers, linked to achievements.
LOD and Patents
Analysis of potential applications
48. These data items contribute to a metadata infrastructure that can
be used in two different ways :
◦ to identify and assess partners in a global context
◦ to assess the relevance of incoming ideas sent in response to
“innovation challenges”.
LOD and Patents
Analysis of potential applications
49. Reusable ontologies:
◦ FOAF, DC, SIOC, SKOS, OBI, VIVO, Organization ontology, Core
Business Vocabulary.
◦ Idea Ontology for Innovation Management (Riedl et al., 2009)
◦ OntoGate (Bullinger, 2008)
Modelling the idea assessment and selection process.
◦ GI2MO Ontology (Westerski et al. 2010)
Formalization of data to describe ideas and associated information.
◦ Iteams Ontology (Ning, 2006)
Covers goals, actions, teams, results and community.
◦ Innovation Management Ontology (Elbassiti, 2014)
Our Research
Metadata Infrastructure
50. Target:
◦ Having a prototype of a “Semantic-enabled” repository of agents
(companies, researchers and groups) and related achievements to
demonstrate how these tools can support OI initiatives.
◦ Two lines of work: biomedical engineering and aerospace.
◦ Geographical scope: Spain.
Phases:
◦ Identification of information needs.
◦ Data capture in relational structure (2000 main entities).
◦ Vocabulary selection for data encoding.
◦ Loading data into repository.
◦ Linking data to external sources.
◦ Building user interfaces including dynamic searching of remote sites.
Our Research
Phases
51. Patents contribution:
◦ Patents are part of the entity / person achievements.
◦ Patents provide “linguistic clues” to identify skills, competences and
areas of knowledge and build search/browse systems.
Our Research
Phases