With the ever-increasing volume of data to be searched, the techniques of semantic search will be key to successful prior art searching. These techniques consist of methods for understanding the searcher`s intent, disambiguating the contextual meaning of (search) terms and ultimately improving search accuracy by generating more relevant results. This
presentation explores how far the EPO has come in enabling some of those key elements through projects such as Annotated Patent Literature and Enhanced Ranking. It also introduces even more sophisticated models based on machined-learned algorithms that might help to shape the future of search at the EPO.
Averbis is specialized in the area of text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence – with the result that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy offered by manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
Advances in text mining, analytics and machine learning are transforming our applications and enabling ever more powerful applications, yet most applications and platforms are designed to deal with a single (normalized) language. Hence as our applications and platforms are increasingly required to ingest international content, the challenge becomes to find ways to normalize content to a single language without compromising quality. An extension of this question in terms of such applications is also how we define quality in this context and what, if any, bi-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively and will show what current emerging technology offers, what to expect and what not to expect and provide an introductory practical guide to handling localization in the context of data mining and analytics.
II-SV 2017: How to effectively monitor Technological Developments in IPDr. Haxel Consult
Modern, cutting-edge developments are not reflected in current patent classification systems, which tend to catalogue established technologies. Identifying patent portfolios in such emerging fields proves a challenging job for patent and technology experts.
Going beyond the mere identification of new IP, additional value may be added using a regional geographic weighting combined with consolidated portfolio owner information.
Effective monitoring of the technological field is achieved by training active-learning search engines to hunt for highly relevant patent documents, thus keeping IP portfolios for emerging technologies up to date. The system we have developed permits extremely accurate updates with drastically reduced noise and with low workload which have proven to be invaluable in a world of drastically increasing data blur.
II-SDV 2017: Spotting the Stars in your Galaxy of Patent DataDr. Haxel Consult
Analysis and visualisation of a “galaxy” of patent data can present challenges in being able to spot the “stars”. Being able to drawn meaningful conclusions from any patent landscape relies on the quality and comprehensiveness of the data input, as well as having features and functionality to visualize the landscape accurately and being able to focus on an area of interest.
Delivered from the perspective of an experienced patent analyst, we will use case studies to discuss the challenges in creating a meaningful patent landscape and the recent innovative features and functionality in PatBase which can help, including:
Using Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
Text-mining to automatically identify and highlight chemical, physical, genetic and medical concepts within any full text patent.
Being able to efficiently identify the exact location of a chemical entity anywhere in the full text.
The ability to easily review and filter patent citations based on their relevance, origin and assignee.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the patent landscape for any technical area.
Gridlogics is a leading provider of products and custom software solutions for patent research, management, data analysis and project management. Our products leverage the latest techniques in information retrieval, data mining and visualizations to help clients globally in deriving actionable intelligence from the masses of patent data.
II-PIC 2017: Gain insight into technical, legal and business information thro...Dr. Haxel Consult
Feinäugle Roland (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information in the various innovation stages. This result gives further impetus to the EPO's patent information strategy, aimed at supporting the economy by providing the access to a wealth of patent-related information, be it from its own patent granting process or its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO’s patent information products and services. We will discuss how they can be used for technology-specific searches, for getting insight into the legal and procedural status of an application during and after its grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO’s free-of-charge Espacenet as well as our flagship product PATSTAT, both with a worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services as well as our RESTful Open Patent Services.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
Applications of machine learning on NLP tasks today receive a lot of attention and have been shown to yield state of the art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constaints of real-world projects: Real-world requirements, fast throughput, reasonably low requirements in terms of training corpus size and high quality results. What we observe is a general trend towards open source - also our components are open source. With the software being mostly freely available, among the key success criteria for many NLP projects today therefore is first and foremost the necessary expertise required to combine, tune and apply open source components.
Averbis is specialized in the area of text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence – with the result that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy offered by manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
Advances in text mining, analytics and machine learning are transforming our applications and enabling ever more powerful applications, yet most applications and platforms are designed to deal with a single (normalized) language. Hence as our applications and platforms are increasingly required to ingest international content, the challenge becomes to find ways to normalize content to a single language without compromising quality. An extension of this question in terms of such applications is also how we define quality in this context and what, if any, bi-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively and will show what current emerging technology offers, what to expect and what not to expect and provide an introductory practical guide to handling localization in the context of data mining and analytics.
II-SV 2017: How to effectively monitor Technological Developments in IPDr. Haxel Consult
Modern, cutting-edge developments are not reflected in current patent classification systems, which tend to catalogue established technologies. Identifying patent portfolios in such emerging fields proves a challenging job for patent and technology experts.
Going beyond the mere identification of new IP, additional value may be added using a regional geographic weighting combined with consolidated portfolio owner information.
Effective monitoring of the technological field is achieved by training active-learning search engines to hunt for highly relevant patent documents, thus keeping IP portfolios for emerging technologies up to date. The system we have developed permits extremely accurate updates with drastically reduced noise and with low workload which have proven to be invaluable in a world of drastically increasing data blur.
II-SDV 2017: Spotting the Stars in your Galaxy of Patent DataDr. Haxel Consult
Analysis and visualisation of a “galaxy” of patent data can present challenges in being able to spot the “stars”. Being able to drawn meaningful conclusions from any patent landscape relies on the quality and comprehensiveness of the data input, as well as having features and functionality to visualize the landscape accurately and being able to focus on an area of interest.
Delivered from the perspective of an experienced patent analyst, we will use case studies to discuss the challenges in creating a meaningful patent landscape and the recent innovative features and functionality in PatBase which can help, including:
Using Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
Text-mining to automatically identify and highlight chemical, physical, genetic and medical concepts within any full text patent.
Being able to efficiently identify the exact location of a chemical entity anywhere in the full text.
The ability to easily review and filter patent citations based on their relevance, origin and assignee.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the patent landscape for any technical area.
Gridlogics is a leading provider of products and custom software solutions for patent research, management, data analysis and project management. Our products leverage the latest techniques in information retrieval, data mining and visualizations to help clients globally in deriving actionable intelligence from the masses of patent data.
II-PIC 2017: Gain insight into technical, legal and business information thro...Dr. Haxel Consult
Feinäugle Roland (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information in the various innovation stages. This result gives further impetus to the EPO's patent information strategy, aimed at supporting the economy by providing the access to a wealth of patent-related information, be it from its own patent granting process or its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO’s patent information products and services. We will discuss how they can be used for technology-specific searches, for getting insight into the legal and procedural status of an application during and after its grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO’s free-of-charge Espacenet as well as our flagship product PATSTAT, both with a worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services as well as our RESTful Open Patent Services.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
Applications of machine learning on NLP tasks today receive a lot of attention and have been shown to yield state of the art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constaints of real-world projects: Real-world requirements, fast throughput, reasonably low requirements in terms of training corpus size and high quality results. What we observe is a general trend towards open source - also our components are open source. With the software being mostly freely available, among the key success criteria for many NLP projects today therefore is first and foremost the necessary expertise required to combine, tune and apply open source components.
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...Dr. Haxel Consult
Parthiban Srinivasan (VINGYANI, India)
When new technologies become easier to use, they transform industries. That's what's happening with artificial intelligence (AI) and big data. Machine learning is often described as a type of AI where computers learn to do something without being programmed to do it. Deep learning, a subset of machine learning, is proving to work especially well on classification. Big breakthroughs happen when what is suddenly possible meets what is desperately needed. For years, patent analysts have been searching and reviewing terabytes of information, not only patents but also non-patent information. Not only to find prior art but also to identify patents of interest, rate their quality, assess the potential value of patent clusters, and identify potential business partners or infringers. With the rapid increase in the number of patent documents worldwide, demand for their automatic clustering/categorization has grown significantly. Many information science researchers have started to experiment with machine learning tools, but the adoption in the patent information space has been sporadic. In this talk, we aim to review the prevailing machine learning techniques and present several sample implementations by various research groups. We will also discuss how data science compares with machine learning, deep learning, AI, statistics and applied mathematics.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You find more information at https://kairntech.com/
Averbis is specialized in the area of text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence – with the result that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy offered by manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-SDV 2017: Applications of RNN (Recurrent Neural Networks) within Machine T...Dr. Haxel Consult
Pierre Bernassau will present the state of the art in Artificial Intelligence and Recurrent Neural Networks applied to natural language and in particular in the Machine Translation domain. This disruptive technology does not shift previous practices based on rules and statistics, but it makes new fields possible. New fields for end-users that can apply MT technologies on new languages, text styles, documents, or messages with a ‘good-enough’ result. But also new fields in terms of good practices, where new projects, new workflow, new applications are addressed that expand de facto the MT market.
With the experience of several recent projects done by his consulting team, Pierre will explain the best practices to apply Neural Technologies within the NLP field.
SciBite is an award-winning leading provider of semantic solutions for the life sciences industry. Our fast, scalable easy-to-use semantic technologies understand the complexity and variability of content within life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure accuracy and reliability of high-quality results. Headquartered in the UK, we support our customers with additional sites in the US and Japan.
More infos at: www.scibite.com
II-PIC 2017: Gain insight into technical, legal and business information thro...Dr. Haxel Consult
Feinäugle Roland (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information in the various innovation stages. This result gives further impetus to the EPO's patent information strategy, aimed at supporting the economy by providing the access to a wealth of patent-related information, be it from its own patent granting process or its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO’s patent information products and services. We will discuss how they can be used for technology-specific searches, for getting insight into the legal and procedural status of an application during and after its grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO’s free-of-charge Espacenet as well as our flagship product PATSTAT, both with a worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services as well as our RESTful Open Patent Services.
II-PIC 2017: The Use of Patent Information for Innovation and Competitive Int...Dr. Haxel Consult
Greg Harrop-Griffiths (minesoft, UK)
Patent data is a critical source of information to stimulate innovation and for competitive intelligence. Patents are often the first and only source of disclosure of a new invention and hence, ignoring them will only delay innovation and give an incomplete competitive intelligence picture.
Delivered from the perspective of an experienced patent analyst, we will use case studies to describe the use of patent data to compile a competitive landscape, to stimulate innovation by learning from others and to help identify valuable IP in a portfolio. We will discuss the challenges in using patents for competitive intelligence and the recent innovative features and functionality in PatBase which can help, including:
Using thesauri, semantic and non-patent literature searching to compile a comprehensive competitive landscape
The use of Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
Text-mining to automatically identify and highlight concepts within any full text patent.
Citation analysis to identify key competitors, collaborators or potential infringers.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the competitive landscape for any technical area.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto...Dr. Haxel Consult
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies lead us to a working hybrid approach.
II-SDV 2017: The Next Era: Deep Learning for Biomedical ResearchDr. Haxel Consult
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals. We do not differentiate between these two types of signals in artificial neural networks. They are essentially a series of advanced statistics based exercises that review the past to indicate the likely future. Another buzzword that was used for the last few years across all industries is “big data”. In biomedical and health sciences, both unstructured and structured information constitute "big data". On the one hand deep learning needs lot of data whereas “big data" has value only when it generates actionable insight. Given this, these two areas are destined to be married. The couple is made for each other. The time is ripe now for a synergistic association that will benefit the pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss these techniques for their usefulness in biomedical and health informatics.
II-PIC 2017: To err is human – growing in experience as a patent information ...Dr. Haxel Consult
Stephen Adams (Magister Ltd., UK)
Many people who start in patent information work believe that it is possible to become fully competent simply by gathering new facts about search sources, platforms or procedures. Whilst this sort of knowledge certainly contributes to growing competence, it is not sufficient in itself. The truly experienced professional is one who has been doing the job long enough to have encountered a variety of exceptions to the normal rules of search and retrieval, and who has learnt to identify and circumvent errors in an informed way. In short, experience is gained more by making mistakes, and recovering from them, than it is by proceeding smoothly through the challenges of a routine search. This presentation will give some examples of what a good searcher needs to have in their skill set, and why formal education will never prepare you for every eventuality
II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...Dr. Haxel Consult
Market overview of patent information analytics, my take on strategic decsions, which data sources, and a live demo of, how to visualize open patent information with Tableau 10.
Deep SEARCH 9 is Data Analysis for the Web.
Combined indexing and search of structured and unstructured sources from the Surface Web, the Deep Web and the Dark Web and corporate data ensure that employees in every part of the organization have access to the information they need without having to use public search engines.
A powerful Application Builder for creating 360° information applications to bring information and analytics together and deliver them to users in a new SEARCH experience.
Advanced content analytics to aggregate, analyze and visualize unstructured (natural language) content to reveal hidden insights and patterns.
Deep SEARCH 9 is the only Web scale analytic search solution that offers corporations a proprietary and anonymous search solution by combining web crawling, content analytics, data linking and search.
Reinventing Laboratory Data To Be Bigger, Smarter & FasterOSTHUS
• Big Data technologies, especially Data Lakes are spreading across many industries at the moment with the hopes that they will provide unprecedented capabilities for data integration and data analytics
• In spite of the popularity and promise of these technology approaches, many early adopters are not seeking immediate solutions to their complex problems. Answers are not simply appearing – this talk will explore this issue more thoroughly
• Of the 4 V’s of Big Data, Data Variety and Data Veracity (uncertainty) are of increasing importance. These can cause barriers to successful integration strategies , which, in turn, can lead to poorly performing analytics.
• The problems of Variety and Veracity can be tackled using a new form of Data Science which combines formal ontologies with statistical heuristics. This talk will explore some key features of these approaches and how they can be developed together in symbiosis – leading to complex models that allow for improved analytics – or as we call it Big Analysis.
• The end result is improved capture of data types/sources, from laboratory instrument data, to clinical data, to regulatory rules & submissions, all the way to business drivers for the enterprise. In the end providing advanced analytics capabilities that can be built as modules and expand across an enterprise.
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...Dr. Haxel Consult
Parthiban Srinivasan (VINGYANI, India)
When new technologies become easier to use, they transform industries. That's what's happening with artificial intelligence (AI) and big data. Machine learning is often described as a type of AI where computers learn to do something without being programmed to do it. Deep learning, a subset of machine learning, is proving to work especially well on classification. Big breakthroughs happen when what is suddenly possible meets what is desperately needed. For years, patent analysts have been searching and reviewing terabytes of information, not only patents but also non-patent information. Not only to find prior art but also to identify patents of interest, rate their quality, assess the potential value of patent clusters, and identify potential business partners or infringers. With the rapid increase in the number of patent documents worldwide, demand for their automatic clustering/categorization has grown significantly. Many information science researchers have started to experiment with machine learning tools, but the adoption in the patent information space has been sporadic. In this talk, we aim to review the prevailing machine learning techniques and present several sample implementations by various research groups. We will also discuss how data science compares with machine learning, deep learning, AI, statistics and applied mathematics.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You find more information at https://kairntech.com/
Averbis is specialized in the area of text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence – with the result that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy offered by manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-SDV 2017: Applications of RNN (Recurrent Neural Networks) within Machine T...Dr. Haxel Consult
Pierre Bernassau will present the state of the art in Artificial Intelligence and Recurrent Neural Networks applied to natural language and in particular in the Machine Translation domain. This disruptive technology does not shift previous practices based on rules and statistics, but it makes new fields possible. New fields for end-users that can apply MT technologies on new languages, text styles, documents, or messages with a ‘good-enough’ result. But also new fields in terms of good practices, where new projects, new workflow, new applications are addressed that expand de facto the MT market.
With the experience of several recent projects done by his consulting team, Pierre will explain the best practices to apply Neural Technologies within the NLP field.
SciBite is an award-winning leading provider of semantic solutions for the life sciences industry. Our fast, scalable easy-to-use semantic technologies understand the complexity and variability of content within life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure accuracy and reliability of high-quality results. Headquartered in the UK, we support our customers with additional sites in the US and Japan.
More infos at: www.scibite.com
II-PIC 2017: Gain insight into technical, legal and business information thro...Dr. Haxel Consult
Feinäugle Roland (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information in the various innovation stages. This result gives further impetus to the EPO's patent information strategy, aimed at supporting the economy by providing the access to a wealth of patent-related information, be it from its own patent granting process or its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO’s patent information products and services. We will discuss how they can be used for technology-specific searches, for getting insight into the legal and procedural status of an application during and after its grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO’s free-of-charge Espacenet as well as our flagship product PATSTAT, both with a worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services as well as our RESTful Open Patent Services.
II-PIC 2017: The Use of Patent Information for Innovation and Competitive Int...Dr. Haxel Consult
Greg Harrop-Griffiths (minesoft, UK)
Patent data is a critical source of information to stimulate innovation and for competitive intelligence. Patents are often the first and only source of disclosure of a new invention and hence, ignoring them will only delay innovation and give an incomplete competitive intelligence picture.
Delivered from the perspective of an experienced patent analyst, we will use case studies to describe the use of patent data to compile a competitive landscape, to stimulate innovation by learning from others and to help identify valuable IP in a portfolio. We will discuss the challenges in using patents for competitive intelligence and the recent innovative features and functionality in PatBase which can help, including:
Using thesauri, semantic and non-patent literature searching to compile a comprehensive competitive landscape
The use of Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
Text-mining to automatically identify and highlight concepts within any full text patent.
Citation analysis to identify key competitors, collaborators or potential infringers.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the competitive landscape for any technical area.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto...Dr. Haxel Consult
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies lead us to a working hybrid approach.
II-SDV 2017: The Next Era: Deep Learning for Biomedical ResearchDr. Haxel Consult
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals. We do not differentiate between these two types of signals in artificial neural networks. They are essentially a series of advanced statistics based exercises that review the past to indicate the likely future. Another buzzword that was used for the last few years across all industries is “big data”. In biomedical and health sciences, both unstructured and structured information constitute "big data". On the one hand deep learning needs lot of data whereas “big data" has value only when it generates actionable insight. Given this, these two areas are destined to be married. The couple is made for each other. The time is ripe now for a synergistic association that will benefit the pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss these techniques for their usefulness in biomedical and health informatics.
II-PIC 2017: To err is human – growing in experience as a patent information ...Dr. Haxel Consult
Stephen Adams (Magister Ltd., UK)
Many people who start in patent information work believe that it is possible to become fully competent simply by gathering new facts about search sources, platforms or procedures. Whilst this sort of knowledge certainly contributes to growing competence, it is not sufficient in itself. The truly experienced professional is one who has been doing the job long enough to have encountered a variety of exceptions to the normal rules of search and retrieval, and who has learnt to identify and circumvent errors in an informed way. In short, experience is gained more by making mistakes, and recovering from them, than it is by proceeding smoothly through the challenges of a routine search. This presentation will give some examples of what a good searcher needs to have in their skill set, and why formal education will never prepare you for every eventuality
II-SDV 2017: How Visualisation of Open Patent Data can help with Strategic De...Dr. Haxel Consult
Market overview of patent information analytics, my take on strategic decsions, which data sources, and a live demo of, how to visualize open patent information with Tableau 10.
Deep SEARCH 9 is Data Analysis for the Web.
Combined indexing and search of structured and unstructured sources from the Surface Web, the Deep Web and the Dark Web and corporate data ensure that employees in every part of the organization have access to the information they need without having to use public search engines.
A powerful Application Builder for creating 360° information applications to bring information and analytics together and deliver them to users in a new SEARCH experience.
Advanced content analytics to aggregate, analyze and visualize unstructured (natural language) content to reveal hidden insights and patterns.
Deep SEARCH 9 is the only Web scale analytic search solution that offers corporations a proprietary and anonymous search solution by combining web crawling, content analytics, data linking and search.
Reinventing Laboratory Data To Be Bigger, Smarter & FasterOSTHUS
• Big Data technologies, especially Data Lakes are spreading across many industries at the moment with the hopes that they will provide unprecedented capabilities for data integration and data analytics
• In spite of the popularity and promise of these technology approaches, many early adopters are not seeking immediate solutions to their complex problems. Answers are not simply appearing – this talk will explore this issue more thoroughly
• Of the 4 V’s of Big Data, Data Variety and Data Veracity (uncertainty) are of increasing importance. These can cause barriers to successful integration strategies , which, in turn, can lead to poorly performing analytics.
• The problems of Variety and Veracity can be tackled using a new form of Data Science which combines formal ontologies with statistical heuristics. This talk will explore some key features of these approaches and how they can be developed together in symbiosis – leading to complex models that allow for improved analytics – or as we call it Big Analysis.
• The end result is improved capture of data types/sources, from laboratory instrument data, to clinical data, to regulatory rules & submissions, all the way to business drivers for the enterprise. In the end providing advanced analytics capabilities that can be built as modules and expand across an enterprise.
El nuevo superordenador Mare Nostrum y el futuro procesador europeoAMETIC
Presentación a cargo de Mateo Valero, del BSC-CNC, en el 33er Encuentro de la Economía Digital y las Telecomunicaciones organizado por AMETIC y Santander Empresas en colaboración con la UIMP
Healthcare professionals are concerned that Electronic Patient Records systems have become islands of information. Little or no interoperability exists. Major factors are cost, complexity, and maintainability.
The impact is that patient details are held on paper resulting in missing files, bad patient outcomes due to incomplete information and even patient deaths.
Through the application of semantics the interoperability problem is reduced to one of plugging in EPR systems and simply configuring them to send their native messages to one another. PPEPR does the rest. It manages the differences in standards, formats, and enables mapping between them.
Presentation to ImmPort Science Meeting, February 27, 2014 on the proper treatment of value sets in the Immport Immunology Database and Analysis Portal
Conference: 13th IEEE International Conference on Industrial Informatics, INDIN 2015. Cambridge, UK – July 22-24 2015
Title of the paper: Towards processing and reasoning streams of events in knowledge-driven manufacturing execution systems
Authors: Borja Ramis Ferrer, Sergii Iarovyi, Andrei Lobov, José L. Martinez Lastra
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
This presentation demonstrates the functionality provided by the Logical Model Designer (LMD) and Snow Owl tools, which enables terminology to be bound to the Singapore Logical Information Model.
Abstract:
A critical enabler in the journey towards semantic interoperability in Singapore is the Singapore "˜Logical Information Model' (LIM). The LIM is a model of the healthcare information shared within Singapore, and is defined as a set of reusable "˜archetypes' for each clinical concept (e.g. Problem/Diagnosis, Pharmacy Order). These archetypes are then constrained and composed into "˜templates' to support specific use cases.
The Singapore LIM harmonises the semantics of the information structures with the terminology, using multiple types of terminology bindings, including semantic, value domain and constraint bindings. Value domain bindings are defined to both national "˜reference terminology' (used for querying nationally-collated data), as well as to a variety of "˜interface terminologies' used within local clinical systems (required to enforce conformance-compliance rules over message specifications generated from the LIM). To support the diversity of pre-coordination captured in local interface terms, "˜design patterns' are included in the LIM, based on the SNOMED CT concept model. These design patterns represent a logical model of meaning for a specific concept, and allow more than one split between the information model and the terminology model to be represented in a semantically-consistent manner.
This presentation will demonstrate the "˜Logical Model Designer' (LMD) - an Eclipse-based tool that is being used to maintain Singapore's Logical Information Model. A number of features of the LMD tooling will be demonstrated, with a specific focus on how the information structure is bound to the terminology via an interface to the Snow Owl platform. Value Domains are defined as reference sets within Snow Owl and then linked to the information structures defined in the LMD.
Please see our website http://b2i.sg for further information.
The slides present recent results of a study on 5G declared standard essential patents (SEPs). The study link: https://www.iam-media.com/who-leading-5g-patent-race
For any questions on the data please contact: info@iplytics.com or go to: www.iplytics.com
Benchmarking Commercial RDF Stores with Publications Office DatasetGhislain Atemezing
The slides present a benchmark of RDF stores with real-world datasets and queries from the EU Publications Office (PO). The study compares the performance of four commercial triple stores: Stardog 4.3 EE, GraphDB 8.0.3 EE, Oracle 12.2c and Virtuoso 7.2.4.2 with respect to the following requirements: bulk loading, scalability, stability and query execution.
Speeding up information extraction programs: a holistic optimizer and a learn...INRIA-OAK
A wealth of information produced by individuals and organizations is expressed in natural language text. Text lacks the explicit structure that is necessary to support rich querying and analysis. Information extraction systems are sophisticated software tools to discover structured information in natural language text. Unfortunately, information extraction is a challenging and time-consuming task.
In this talk, I will first present our proposal to optimize information extraction programs. It consists of a holistic approach that focuses on: (i) optimizing all key aspects of the information extraction process collectively and in a coordinated manner, rather than focusing on individual subtasks in isolation; (ii) accurately predicting the execution time, recall, and precision for each information extraction execution plan; and (iii) using these predictions to choose the best execution plan to execute a given information extraction program.
Then, I will briefly present a principled, learning-based approach for ranking documents according to their potential usefulness for an extraction task. Our online learning-to-rank methods exploit the information collected during extraction, as we process new documents and the fine-grained characteristics of the useful documents are revealed. Then, these methods decide when the ranking model should be updated, hence significantly improving the document ranking quality over time.
This is joint work with Gonçalo Simões, INESC-ID and IST/University of Lisbon, and Pablo Barrio and Luis Gravano from Columbia University, NY.
Scaling Up AI Research to Production with PyTorch and MLFlowDatabricks
PyTorch, the popular open-source ML framework, has continued to evolve rapidly since the introduction of PyTorch 1.0, which brought an accelerated workflow from research to production.
EUGM 2014 - Alfonso Pozzan (Aptuit): Expanding the scope of “literature data”...ChemAxon
Data associated to chemical structures reported in literature are very important in drug design as they expand the scope of in-house generated knowledge. Public and commercially available databases allow scientist to access an increasing number of such information. However, alternative sources of data are still contained in document format and difficult to extract like in patents. Document to structure tools can be very helpful in this area; the objective of the talk will be to highlight some of these aspects in particular from a drug discovery angle.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach to store detailed knowledge in many domains. Recent advances in NLP allow to enrich Knowledge Graphs through automated analysis of large volumes of literature, reducing a lot the efforts in traditional manual information capturing. In our presentation we report the approach taken in a project with partner Fraunhofer SCAI in the life sciences where a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
to help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports have been published looking at green technologies, and analysis of how the UK’s Green Channel scheme for accelerated processing of green patent applications has been conducted. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness to market factor that patent data does not, and complementary trade mark analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In processing technical English texts, a multi-word unit method is often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without the invention of an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. For the final outline, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state-of-the-art within the patent text mining research community in 2022.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
In 2013 we witnessed an evolutionary change in the NLP field evolved thanks to the introduction of space embeddings that, with the use of deep learning architectures, achieved human-level performances in many NLP tasks. With the introduction of the Attention mechanism in 2017 the results were further improved and, as result, embeddings are quickly becoming the de facto standards in solving many NLP problems. In this presentation, you will learn how generate and use space embedding for search purposes and provide comparison metrics to more traditional relevance-based search engines. Moreover, I will provide some initial results on a paper currently under review that provides an insight on hyperparameter tuning during the generation of embeddings.
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
10 years in the making. How real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web-crawling and delivery of targeted intelligence that helps R&D; intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches for concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVM’s, Topic Modeling, etc..). Sensing the competition, Natural Language Processing responded with deployment of new emerging techniques (sematic networks, finite state automata, etc..). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.
At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of NLP in use at CIPO and the insights we have learned applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.
We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables as they do not capture the non-sequential relations inside them (e.g. interpret the content of a table cell relative to its column title, interpret line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products in the market for a few highly specific types of tabular documents, there is no general approach out there. The main cause for this is the fact that table structures can be encoded by a heterogenous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
Most scientific journals request, that the complete set of research data is published simultaneously with the peer-reviewed paper. The publication of the research data usually is carried out as so-called "Supplementary Material", attached to the original paper, or on a "Research Data Repository". Both forms have in common, that the data is published usually unstructured and not in an uniform machine processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented, in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable for preprints, peer-review papers, diploma and doctoral theses and is particularly suitable for open access publications. Moreover, the presentation highlights correspondent activities, which were released in scientific publications recently.
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video either physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used including a video to text to tagging system connected to their documents database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To build the system and support the manuscript and video submission there is a metadata extraction program which pulls and inserts the metadata into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides over 160 countries coverage for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
The everyday use of AI-driven algorithms for data search, analysis and synthesis comes with important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are inevitable to assess and manage the challenges of neuronal network based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
# Internet Security: Safeguarding Your Digital World
In the contemporary digital age, the internet is a cornerstone of our daily lives. It connects us to vast amounts of information, provides platforms for communication, enables commerce, and offers endless entertainment. However, with these conveniences come significant security challenges. Internet security is essential to protect our digital identities, sensitive data, and overall online experience. This comprehensive guide explores the multifaceted world of internet security, providing insights into its importance, common threats, and effective strategies to safeguard your digital world.
## Understanding Internet Security
Internet security encompasses the measures and protocols used to protect information, devices, and networks from unauthorized access, attacks, and damage. It involves a wide range of practices designed to safeguard data confidentiality, integrity, and availability. Effective internet security is crucial for individuals, businesses, and governments alike, as cyber threats continue to evolve in complexity and scale.
### Key Components of Internet Security
1. **Confidentiality**: Ensuring that information is accessible only to those authorized to access it.
2. **Integrity**: Protecting information from being altered or tampered with by unauthorized parties.
3. **Availability**: Ensuring that authorized users have reliable access to information and resources when needed.
## Common Internet Security Threats
Cyber threats are numerous and constantly evolving. Understanding these threats is the first step in protecting against them. Some of the most common internet security threats include:
### Malware
Malware, or malicious software, is designed to harm, exploit, or otherwise compromise a device, network, or service. Common types of malware include:
- **Viruses**: Programs that attach themselves to legitimate software and replicate, spreading to other programs and files.
- **Worms**: Standalone malware that replicates itself to spread to other computers.
- **Trojan Horses**: Malicious software disguised as legitimate software.
- **Ransomware**: Malware that encrypts a user's files and demands a ransom for the decryption key.
- **Spyware**: Software that secretly monitors and collects user information.
### Phishing
Phishing is a social engineering attack that aims to steal sensitive information such as usernames, passwords, and credit card details. Attackers often masquerade as trusted entities in email or other communication channels, tricking victims into providing their information.
### Man-in-the-Middle (MitM) Attacks
MitM attacks occur when an attacker intercepts and potentially alters communication between two parties without their knowledge. This can lead to the unauthorized acquisition of sensitive information.
### Denial-of-Service (DoS) and Distributed Denial-of-Service (DDoS) Attacks
This 7-second Brain Wave Ritual Attracts Money To You.!nirahealhty
Discover the power of a simple 7-second brain wave ritual that can attract wealth and abundance into your life. By tapping into specific brain frequencies, this technique helps you manifest financial success effortlessly. Ready to transform your financial future? Try this powerful ritual and start attracting money today!
Bridging the Digital Gap Brad Spiegel Macon, GA Initiative.pptxBrad Spiegel Macon GA
Brad Spiegel Macon GA’s journey exemplifies the profound impact that one individual can have on their community. Through his unwavering dedication to digital inclusion, he’s not only bridging the gap in Macon but also setting an example for others to follow.
1.Wireless Communication System_Wireless communication is a broad term that i...JeyaPerumal1
Wireless communication involves the transmission of information over a distance without the help of wires, cables or any other forms of electrical conductors.
Wireless communication is a broad term that incorporates all procedures and forms of connecting and communicating between two or more devices using a wireless signal through wireless communication technologies and devices.
Features of Wireless Communication
The evolution of wireless technology has brought many advancements with its effective features.
The transmitted distance can be anywhere between a few meters (for example, a television's remote control) and thousands of kilometers (for example, radio communication).
Wireless communication can be used for cellular telephony, wireless access to the internet, wireless home networking, and so on.
APNIC Foundation, presented by Ellisha Heppner at the PNG DNS Forum 2024APNIC
Ellisha Heppner, Grant Management Lead, presented an update on APNIC Foundation to the PNG DNS Forum held from 6 to 10 May, 2024 in Port Moresby, Papua New Guinea.
Multi-cluster Kubernetes Networking- Patterns, Projects and GuidelinesSanjeev Rampal
Talk presented at Kubernetes Community Day, New York, May 2024.
Technical summary of Multi-Cluster Kubernetes Networking architectures with focus on 4 key topics.
1) Key patterns for Multi-cluster architectures
2) Architectural comparison of several OSS/ CNCF projects to address these patterns
3) Evolution trends for the APIs of these projects
4) Some design recommendations & guidelines for adopting/ deploying these solutions.
guildmasters guide to ravnica Dungeons & Dragons 5...
II-SDV 2017: Towards Semantic Search at the European Patent Office
1. Dr. Alexander Klenner-Bajaja 30-31 March 2017Data Scientist Requirements Engineering-Solution Design | Dir. 2.8.3.3
Towards semantic search at the European
Patent Office
II-SDV 2017
2. European Patent Office
About me
Alexander Klenner-Bajaja
▪ Bioinformatics at Goethe University Frankfurt
▪ PhD ETH Zurich & Goethe University –
in Cheminformatics
▪ PostDoc at Fraunhofer Society SCAI in Cologne
− Chemical entity recognition in patents
▪ Data Scientist, Search & Knowledge, DG2, EPO
− automated search
− search benchmarking
− new search technologies
2
2006
2011
2013
current
3. European Patent Office
▪ Anecdotal Evidence and the need for a Benchmarking Environment
▪ Keywords and query generation in (automatic) search scenarios
(Project ERa)
▪ Introducing machine learned semantic search technologies and
(automatic) query expansion
▪ Introducing terminology based semantics within the APL project
▪ (Automated) Search result confidence
3
4. European Patent Office
Anecdotal Evidence and Benchmarking
▪ How can two different search
strategies be compared?
▪ Historically decision are taken
based on expert feedback
▪ Valuable information that can
be complemented with
measurements
▪ Distribution shown as Box-plot
4
median:
5 citations
in search
reports
5. European Patent Office
Search Benchmarking
5
▪ We implemented a holistic prototyping and benchmark system
▪ We have access to all distributed system in the EPO IT landscape
and can connect them in "visual" workflows
6. European Patent Office
Automated Queries exploiting the shown tools
6
Application
Automated
Query
Formulation
relevant documents
Evaluation and
Feedback
Search
PriorArt
7. European Patent Office
Benchmarking Environment
▪ From anecdotal to statistical evidence
▪ "Towards Automated Prior Art Search" (TAPAS)
7
Patent Corpus
Citations
Split Date
Applications
Method 1 Method 2
Ø Recall@100: 0.25 Ø Recall@100: 0.17
Search
Corpus
Result 1 Result 2
Data Set
Generation
Retrospective
Search
trec_eval trec_eval
8. European Patent Office
▪ Retrieval success measured by different metrics
▪ Computed by the trec_eval tool
▪ Average over all n simulated applications
Benchmarking Environment
8
Top 50 Documents Returned Cited Documents not Returned
Recall @ 50
Hit Rate @ 50
9. European Patent Office
Benchmarking Environment
9
▪ Search reports for about 40 million simple patent families
▪ The relevant documents are mentioned in the search report as
either
– X or A citations
▪ Only few citations per document (2 X-category, 3 A-category)
0 40x106patent documents
X X
A
relevant
10. European Patent Office
Benchmarking Environment
10
▪ Search reports for about 40 million simple patent families
▪ The relevant documents are mentioned in the search report as either
– X or A citations
▪ Only few citations per document (2 X-category, 3 A-category)
▪ But more "soft" information available: Returned during search (R), put
aside for detailed inspection (Ins) , printed (Pr), stored for citation (Cs)
relevance
0 80x106patent documents
X X
A
R
Ins
Pr
11. European Patent Office
▪ Anecdotal Evidence and the need for a Benchmarking Environment
▪ Keywords and query generation in (automatic) search scenarios
(Project ERa)
▪ Introducing machine learned semantic search technologies and
(automatic) query expansion
▪ Introducing terminology based semantics within the APL project
▪ (Automated) Search result confidence
11
12. European Patent Office
Term-Based Search
▪ Can we hope to find all citations with keywords?
overlap in vocabulary (non-stopwords)
12
Application Citation
Overlap
13. European Patent Office
Term-Based Search
13
▪ However: also overlap with random documents
▪ Random: Y-Scrambling
CitedDocumentsRandomDocuments
Frequency
Overlapping Terms
All Terms Nouns, no Stop Words Keywords (TF)
Overlapping Terms Overlapping Terms
Overlapping Terms Overlapping Terms Overlapping Terms
Frequency
Frequency
Frequency
Frequency
Frequency
14. European Patent Office
Term-Based Search fully automated
14
▪ How good can we get?
▪ More Like This on Title / Abstract / Description / Claims
▪ Percentage of citations found in the top k
Top 100 Top 1 000 Top 10 000 Top 100 000
0%
29%
57%
80%
15. European Patent Office
Term-Based Search fully automated
▪ Can we close the gap?
... using semantic search?
15
Top 100 Top 1 000 Top 10 000 Top 100 000
0%
29%
57%
80%
16. European Patent Office
▪ Anecdotal Evidence and the need for a Benchmarking Environment
▪ Keywords and query generation in (automatic) search scenarios
(Project ERa)
▪ Introducing machine learned semantic search technologies and
(automatic) query expansion
▪ Introducing terminology based semantics within the APL project
▪ (Automated) Search result confidence
16
18. European Patent Office
Automated Query Expansion
▪ Analyze word embeddings
▪ Words with similar meaning have similar context
▪ Train neural network words represented by vectors
▪ Words with similar meaning have similar vectors
allows distance calculations between words
semantic similarity
18
Word
eating apples is healthy
Context
eating fruits is healthy
apple = [0.253, 1.793, ..., 5.555, 3,142]
tomato
apple fruit
banana
steak
liver
19. European Patent Office
Automated Query Expansion
19
Patent
Corpus
CPC w2v model
A01B M1
A02B M2
A02C M3
H05K MN-1
All MN
Query Word
top X
relevant
words per
class
20. European Patent Office
Automated Query Expansion
A61K
A virus is a small infectious agent
that replicates only inside the
living cells of other organisms.
G06F
A computer virus is a type of
malicious software program
("malware") that, when executed,
replicates by reproducing itself
(copying its own source code) or
infecting other computer programs
by modifying them.
20
hiv, viral, human
immunodeficiency, viral
genome, genome, replication,
pathogen, adenovirus,
replicating infected, virus infection, malicious,
anti-virus, malware, virus-
scanning, virus worm, macro virus,
virus scanner
21. European Patent Office
Semantics through Machine Learning
Example Queries: descriptions from Wikipedia
.
21
A virus is a small infectious
agent that replicates only
inside the living cells of other
organisms.
A computer virus is a type of
malicious software program
("malware") that, when executed,
replicates by reproducing itself
(copying its own source code) or
infecting other computer programs
by modifying them.
Electrical digital
data processing
Preparations for medical,
dental or toilet purposes
22. European Patent Office
Automated Query Expansion
Inter model similarity in
CPC A Human Necessities
based on vocabularies
22
A47KA45DA47L
23. European Patent Office
Automated Query Expansion
▪ Word2Vec Models applied to (100) extracted TF-IDF Keywords
23
"Touch sensing system and display apparatus"
EP2869168A120150506
TF-IDF Keywords
(10 of 100)
TF-IDF
Scores
Word2Vec Enrichments
(G06F)
max 5 terms, at least Cosine 0.6
possessory
sensing
touch
bridge
nodes
shared
electrodes
controller
instructing
masters
memory
sensor
algorithm
senses
data
...
49.01
33.04
31.36
27.88
27.59
26.86
25.52
23.18
22.11
21.23
20.39
20.38
20.02
20.02
19.91
...
possessory
sensing OR sensed OR sensor OR touch
touch OR touched OR touch screen OR touches OR touch panel OR touching
bridge OR bridges OR pci bus OR bus OR interfaces 22a-22b OR pci-pci bridge
nodes OR node OR nodes n2
shared OR share OR sharing OR non-shared
electrodes OR electrode OR electrodes y2
controller OR control OR controllers OR bus OR unit OR processor
instructing OR instruct OR instructs
masters
memory OR ram OR non-volatile OR memories OR processor OR flash
sensor OR sensors OR sensing OR humidity sensor OR sensed OR sensor senses
algorithm OR algorithms
senses OR sensed
data OR stored OR stores OR store OR storing
...
24. European Patent Office
Automated Query Expansion
▪ Query Enrichment brings increased performance
EP2869168A120150506
24
"Touch sensing system and display apparatus"
Recall@100: 0
(0 of 2)
Recall@100: 1
(2 of 2)
Keyword
Extraction
W2V
Enrichment
TF-IDF Query
W2V Query
25. European Patent Office
Automated Query Expansion
▪ Documents found with increased Term Overlap with Query
25
39100958
48143418
Recall@100: 0
(0 of 2)
Recall@100: 1
(2 of 2)
39 80
43 80
Query Citation
TF-IDF
Query
W2V Query
26. European Patent Office
Automated Query Expansion
▪ Word2Vec Expansion Terms in Overlap
▪ Re-Suggestion of known Query Terms
26
outputs
apparatus
screen
increasing
serially
signal OR signals OR circuit OR output OR clock OR outputs
controlling OR control OR controlled OR apparatus
display OR screen OR displays OR displaying OR displayed OR lcd
increases OR increase OR decreases OR increased OR increasing OR decrease
decreasing OR increasing OR decrease OR increase OR decreases OR decreased
parallel OR serially
New Overlap Original Keyword Word2Vec Enrichment
transmission transmitted OR transmits OR received OR transmit OR transmission OR transmitting
reception OR transmission OR transmitted OR transmits
transmitting OR receiving OR transmitted OR transmission OR sending OR transmit
Term Original Keyword Word2Vec Enrichment
▪ Should this result in higher weight?
▪ Models heavily suggest inflections
27. European Patent Office
▪ Anecdotal Evidence and the need for a Benchmarking Environment
▪ Keywords and query generation in (automatic) search scenarios
(Project ERa)
▪ Introducing machine learned semantic search technologies and
(automatic) query expansion
▪ Introducing terminology based semantics within the APL project
▪ (Automated) Search result confidence
27
28. European Patent Office
Introducing semantics through annotations
We are in the process of annotating the full prior art collection with
normalised
▪ Chemical Entities
▪ Physical Units
▪ Citations
▪ Controlled Terminologies (e.g. MeSH)
to enable a semantic search of those entities.
28
Patent
Collection
Patent
Collection
Annotation Pipeline
Aspirin
400 mm
solar panel
29. European Patent Office
Introducing annotations and semantics
▪ Our annotation platform's key elements
− Based on the open source framework UIMA
− Plug and play of new components (analysis engines)
− Quality Evaluation Workbench
− Knowledge Base for controlled vocabulary
− Scaling architecture
− Data and annotations stored in noSQL databases
29
Application Citation
acetylsalicylic acid
(D001241)
matching terms through
controlled vocabulary
MeSH Unique ID: D001241
Aspirin
(D001241)
30. European Patent Office
Introducing annotations and semantics
Project in Execution - Platform installed and tested at this moment
30
1. Define (sub)set of documents to work on
2. Define an annotation Pipeline
31. European Patent Office
Word2Vec & APL
31
Application Citation
Aspirin acetylsalicylic acid
matching terms through
controlled vocabulary
Aspirin = acetylsalicylic acid
(would have been
False Negative)
Virus Virus
separating identical terms
with different meaning
Virus = Virus
(would have been
False Positive)
32. European Patent Office
▪ Anecdotal Evidence and the need for a Benchmarking Environment
▪ Keywords and query generation in (automatic) search scenarios
(Project ERa)
▪ Introducing machine learned semantic search technologies and
(automatic) query expansion
▪ Introducing terminology based semantics within the APL project
▪ (Automated) Search result confidence
32
33. European Patent Office
Identify Successful (Automated) Searches
Machine-Learning: look at the documents
33
Word2Vec Heatmap
Sequence Alignment
(Bioinformatics)
document1document2
document1document2
document 0
document 0
document 0
document 0
Citation
Random
34. European Patent Office
Can we identify successful searches? (Machine learned)
▪ Prediction Model for Citability based on Distribution Statistics of
heat maps
▪ 10-fold Cross Validation
(Y-Scrambling)
34
[mean, variance, skewness, kurtosis, min, q1, q2, q3, max]
Random ForestMulti-Layer Perceptron
▪ ensemble of decision trees
▪ bootstrap aggregating
▪ random selection of features
Training Set
Validation Set
Round 1
Error
Rate 1
Round 2
Error
Rate 2
Round 3
Error
Rate 3
Round 10
Error
Rate 10
~8% ~5%
Predicted
Relevance
35. European Patent Office
Identify Successful (Automated) Searches
▪ Meta Data: same author as in application occurs in result set
▪ Interesting Item Set Mining & Subgroup Discovery
35
Searches with same author in
result set (35%)
Recall@100 0.2357
Searches without same author in
result set (65%)
Recall@100 0.1209
All searches (100%)
Recall@100 0.1608
Only documents from same author
Recall@100 0.0884
MLT Search
0 100
Document Rank
Recall
39. European Patent Office
Conclusion and outline
▪ We are exploiting state-of-the-art semantic- and other search
technologies
▪ We have created a prototyping and benchmarking environment to
evaluate the performance and quality of a new search algorithm
learning from prior searches
▪ Bringing all these new technologies together in a productive search
environment is the biggest challenge ahead
▪ Semantic (text) search is one small mosaic piece in the complex
search environment at the EPO
39
40. European Patent Office
Acknowledgments
▪ The Search & Knowledge Directorate
− Domenico Golzio Director
− Stefan Klocke Head of Department
− Volker Hähnke Ranking and Confidence
− Matthias Wirth Neo4J, CPC classification
− Pavel Goncharik Ansera, ranking, lucene
▪ Examiners
− Robert Herman ERa
− Ype Kingma ERa
− Markus Arndt ERa
− Christos Stergio APL
40