This document provides an overview of LexisNexis' IP products: IP Data Direct, TotalPatent, PatentOptimizer, and Patent Advisor. IP Data Direct is a patent search API that delivers search results and patent documents. TotalPatent allows users to research patents for activities like competitive intelligence, prior art searches, and prosecution. PatentOptimizer is a tool that helps draft stronger patents and identify weaknesses. Patent Advisor provides information on pending US patent applications to help manage portfolios and prosecution costs. The document outlines the benefits and capabilities of each product.
The document describes Intellixir, a web system for technology and competitive intelligence. It was originally created in 1997 by the French Atomic Agency and Intellixir was founded in 2002. Intellixir collects and analyzes data from multiple sources to provide insights that drive innovation for its over 60 business clients. The system allows users to collect, consolidate, analyze, collaborate on, and share data and insights through dashboards and reports. It provides capabilities for visualizing data through graphs and statistics.
1. Boehringer Ingelheim Pharma GmbH & Co. KG's Scientific Information Center developed its own web crawler called SEARCHCORPORA to access information not available on public search engines, including university spin-offs, competitor activities, and internal company databases.
2. SEARCHCORPORA allows the Scientific Information Center to build custom searchable indexes of targeted websites and documents to help identify new technology opportunities and monitor competitors. Automatic alerts of relevant news can also be configured.
3. The Scientific Information Center implements a workflow to offer SEARCHCORPORA services to customers, including specifying project scopes, crawling and analyzing information, and providing a search interface and scheduled updates. Future plans include expanding the
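The crawl-index-search workflow in the steps above can be sketched in miniature. This is an illustrative toy with hard-coded stand-in pages (SEARCHCORPORA's internals are not public), not the actual implementation:

```python
# Illustrative crawl-index-search-alert sketch; pages are stand-ins.
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: term -> set of page URLs."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    """Return URLs containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def check_alerts(index, alert_queries, seen):
    """Report newly matching pages for each saved alert query."""
    hits = {}
    for q in alert_queries:
        new = search(index, q) - seen
        if new:
            hits[q] = new
    return hits

pages = {
    "https://example.org/spinoff": "university spinoff develops novel catalyst",
    "https://example.org/news": "competitor announces new catalyst plant",
}
index = build_index(pages)
print(search(index, "catalyst"))  # both URLs match
```

A scheduled update would amount to re-crawling the target sites, rebuilding the index, and running `check_alerts` against the previously seen result set.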
The document summarizes a vendor presentation about updates to their patent database. It notes that the database now contains over 640,000 documents and 290 million sequences, including over 55,000 Chinese documents and 43,000 new Canadian documents. A new full text search capability was also introduced, allowing combination of sequence and text searches for life science applications. The presentation invites attendees to a live demo and talk on Tuesday at 4pm to learn more.
The document summarizes recent updates and enhancements to PatBase, a patent search and analytics platform. Key updates include expanded full text patent coverage for several countries including Russia, Spain, Colombia, and Israel. Enhancements improve the speed of Chinese patent data updates, expand machine translations, and introduce new tools for assignee analysis and monitoring legal status changes. New modules allow for customized alerts, mobile access, and integration of third-party data.
VantagePoint is analysis software that transforms structured text data into actionable intelligence through industry-leading tools for importing, cleaning, analyzing, and reporting data. The latest version of VantagePoint features improved performance and usability, new features and visualizations, as well as an enhanced automation tool called Super Profile that allows users to create customizable, flexible, repeatable and distributable data tables with one click.
This document discusses open source tools for graph and map visualization. It begins with an agenda that includes open source graphs, maps, and demos of the AklaBox and Thermolabo platforms. It then covers various open source mapping tools like OpenStreetMap, and charting/graphing tools like FusionCharts, JFreeChart, Google Charts, and BIRT charts. Statistical graphing tools like R, Weka, and R Shiny are also mentioned. The document demonstrates some maps and graphs as examples. It concludes with discussions of how AklaBox and Thermolabo integrate graphs and maps and how Thermolabo is transforming temperature monitoring data into valuable decision information.
Slalom Consulting is a business and technology consulting firm with over 2,700 consultants across 16 offices in North America and London. Their primary service areas include data visualization, customer and marketing analytics, predictive modeling, data mining, Alteryx, and technical architecture. The document discusses data science and analytical methods like reporting/visualization, market basket analysis, customer lifetime value, and attrition analysis that Slalom utilizes to provide insights for their clients.
This document discusses using Elasticsearch to store and analyze patent data. It describes how Elasticsearch provides a document store, full text search engine, and real-time analytics capabilities. Examples are given of using Elasticsearch for a patent document store, patent search engine, log store, and real-time log analysis of patent data. The document concludes by thanking the audience.
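As a rough illustration of how such a patent search could be expressed, the sketch below builds an Elasticsearch-style query body combining a full-text match with a date filter. The field names (`abstract`, `publication_date`) and index name are assumptions for illustration, not the presenter's actual schema:

```python
# Sketch of an Elasticsearch bool query for a patent document store.
# Field names are assumed; a real index would define its own mapping.
import json

def patent_query(text, date_from=None, size=10):
    """Build a bool query: full-text match on the abstract,
    optionally filtered by publication date."""
    query = {
        "size": size,
        "query": {
            "bool": {
                "must": [{"match": {"abstract": text}}],
            }
        },
    }
    if date_from:
        query["query"]["bool"]["filter"] = [
            {"range": {"publication_date": {"gte": date_from}}}
        ]
    return query

body = patent_query("lithium battery electrode", date_from="2010-01-01")
print(json.dumps(body, indent=2))
# With the official Python client this would be sent as, e.g.:
#   es.search(index="patents", body=body)
```

Placing the date clause in `filter` rather than `must` keeps it out of relevance scoring, which is the usual idiom for exact constraints.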
The document provides information about Search Technologies, a leading independent IT services firm specializing in enterprise search and big data search solutions. It details their expertise in Microsoft Search, Google Search Appliance, and open source search technologies. It also describes their content processing framework Aspire, connectors for integrating various content sources, and query processing language QPL.
The document discusses text mining and how it has come of age by providing speed to insight using real-time and temporal data. It highlights challenges like the increasing amounts of unstructured data from different sources and formats. Examples show how text mining can help with tasks like identifying disease comorbidities, extracting risk factors, and performing opposition searching across multiple data sources and time periods. The conclusion is that text mining demonstrates clear value for the pharmaceutical and healthcare sectors through time to insight, real-time data analysis, and use of temporal data.
Klaus Kater of black swan presents on analytic search technology to aggregate and analyze data from multiple sources, including the surface web, deep web, and corporate resources. Black swan's SEARCHCORPUS indexes crawled documents and extracted structured data, annotating documents with context. The system lets users pull results through search interfaces, enrich existing data with crawled information, and push profile-driven notifications. It features graphical tools for designing filter chains and documenting projects, as well as crawling, extraction, analysis, administration, and deployment capabilities.
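The filter-chain idea can be sketched as a pipeline of generator steps, each consuming a document stream and yielding the documents it keeps or enriches. The function names and document shape below are illustrative, not black swan's actual API:

```python
# Toy filter chain: each step is a generator over a document stream.
def language_filter(docs, lang="en"):
    for doc in docs:
        if doc.get("lang") == lang:
            yield doc

def keyword_filter(docs, keyword):
    for doc in docs:
        if keyword in doc["text"].lower():
            yield doc

def annotate_source(docs, source):
    for doc in docs:
        yield {**doc, "source": source}  # enrich with context

def run_chain(docs, *steps):
    """Thread the document stream through each step in order."""
    for step in steps:
        docs = step(docs)
    return list(docs)

crawled = [
    {"lang": "en", "text": "New sensor patent filed"},
    {"lang": "de", "text": "Neues Sensorpatent"},
    {"lang": "en", "text": "Quarterly earnings report"},
]
kept = run_chain(
    crawled,
    language_filter,
    lambda d: keyword_filter(d, "patent"),
    lambda d: annotate_source(d, "surface-web"),
)
print(kept)  # one English document mentioning "patent", tagged with its source
```

Because each step is lazy, chains like this scale to large crawls: documents stream through one at a time rather than being materialized at every stage.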
This document summarizes Dr. Kai Simon's work on large-scale patent classification at the European Patent Office. It discusses how Averbis was selected in 2015 to use text mining to pre-classify unpublished patents and re-classify published patents if the classification system changes. The process involves classifying patents into over 250,000 classification codes across 250 departments, presenting a big data and fast response time challenge that text mining can help address.
This document analyzes patent data related to smart city technologies from 1998-2012. It finds that over 100,000 patent applications were published, with growth increasing over time. Almost half of patents were for smart buildings, followed by smart energy networks. The top patenting offices were from China, US, Korea, and Japan. Electrical engineering was the most common technological domain. French applicants most commonly filed in France and at the EPO, focusing especially on smart energy networks.
This document summarizes a presentation about Text and Data Mining (TDM) and the DirectPath solution from Copyright Clearance Center. The DirectPath solution provides researchers with a centralized way to access licensed full-text content in XML format from multiple publishers for use in TDM projects through a web interface and API. It aims to streamline the content retrieval and licensing process for TDM by normalizing formats, managing licenses, and allowing customization of text analysis and indexing. The solution is designed to support applications like drug discovery and competitive intelligence by facilitating information retrieval and knowledge discovery from large article corpora.
II-SDV 2015 The International Information Conference on Search, Data Mining a... (Dr. Haxel Consult)
The II-SDV meeting takes place in Nice in April 2016 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
II-SDV 2017 in Nice - The International Information Conference on Search, Dat... (Dr. Haxel Consult)
The 2017 II-SDV Conference in Nice, 24 - 25 April 2017
The II-SDV meeting takes place in Nice in April 2017 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management (Dr. Haxel Consult)
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal... (Dr. Haxel Consult)
Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a project with partner Fraunhofer SCAI in the life sciences, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information on cause-effect relations between proteins, genes, drugs and diseases has been encoded in BEL (the Biological Expression Language) and imported into a graph database to build an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to simply rerunning the analysis on newly published literature.
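A much-simplified sketch of loading such cause-effect statements into a graph structure is shown below. Real BEL has a far richer grammar, handled by dedicated tooling such as PyBEL; the statements and entity names here are illustrative only:

```python
# Simplified loading of 'subject RELATION object' statements into a graph.
# Real BEL parsing is handled by dedicated tools (e.g. PyBEL).
from collections import defaultdict

def load_triples(statements):
    """Parse 'subject RELATION object' lines into an adjacency map."""
    graph = defaultdict(list)
    for line in statements:
        subject, relation, obj = line.split(maxsplit=2)
        graph[subject].append((relation, obj))
    return graph

statements = [
    "p(HGNC:BDNF) increases bp(depression_remission)",
    "a(CHEBI:fluoxetine) increases p(HGNC:BDNF)",
]
graph = load_triples(statements)

# Updating the graph amounts to rerunning the load on newly published
# statements and merging the edges in:
for node, edges in load_triples(
    ["p(HGNC:IL6) decreases p(HGNC:BDNF)"]
).items():
    graph[node].extend(edges)
print(dict(graph))
```

This mirrors the update model described above: the graph is a pure function of the analyzed literature, so refreshing it means re-analyzing new documents and merging the resulting edges.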
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t... (Dr. Haxel Consult)
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports has been published looking at green technologies, and analysis of the UK’s Green Channel scheme for accelerated processing of green patent applications has been conducted. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness to market that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... (Dr. Haxel Consult)
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander... (Dr. Haxel Consult)
This document profiles Linda Andersson, the CEO of Artificial Researcher. It provides details on her background, awards, research fields, and academic merits. It then discusses how domain knowledge enables artificial intelligence systems to be smarter by allowing them to understand language and text in particular domains at a deeper level. Finally, it provides an overview of Artificial Researcher's natural language processing and text mining technologies and services for tasks like passage retrieval, ontology generation, and semantic search.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... (Dr. Haxel Consult)
In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings which, combined with deep learning architectures, achieved human-level performance on many NLP tasks. With the introduction of the Attention mechanism in 2017 the results improved further and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. I will also present some initial results from a paper currently under review that provides insight into hyperparameter tuning during the generation of embeddings.
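The contrast between lexical (term-overlap) matching and embedding-based matching can be shown with a toy example. The 3-dimensional vectors below are hand-made stand-ins; real systems use trained embeddings (e.g. from a transformer encoder) with hundreds of dimensions:

```python
# Toy contrast: lexical overlap fails on synonyms; embeddings do not.
# The vectors are hand-made for illustration, not trained.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def lexical_overlap(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Pretend embeddings place synonyms near each other in vector space.
vectors = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.88, 0.12, 0.05],
    "banana": [0.0, 0.2, 0.95],
}

print(lexical_overlap("car", "automobile"))           # 0.0 -- no shared terms
print(cosine(vectors["car"], vectors["automobile"]))  # close to 1.0
print(cosine(vectors["car"], vectors["banana"]))      # close to 0.0
```

This is the core argument for embedding-based search: "car" and "automobile" share no tokens, so a purely lexical engine scores them zero, while their embedding similarity is high.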
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e... (Dr. Haxel Consult)
Ten years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in... (Dr. Haxel Consult)
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,... (Dr. Haxel Consult)
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement that makes Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and at what might happen in the future when we blend the interpretation of language with pattern prediction.
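One classic statistical technique for surfacing the "what" of a document is TF-IDF term weighting, from the Machine Learning side of the divide described above. A minimal sketch over a toy corpus (illustrative only):

```python
# TF-IDF concept extraction: rank a document's terms by how much more
# frequent they are in it than across the whole collection.
import math
from collections import Counter

def tfidf_top_terms(docs, doc_id, k=2):
    """Return the k highest tf-idf terms of one document."""
    n = len(docs)
    df = Counter()                        # document frequency per term
    for doc in docs:
        df.update(set(doc.lower().split()))
    tf = Counter(docs[doc_id].lower().split())  # term frequency
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return [t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]]

docs = [
    "the patent describes a lithium battery electrode",
    "the patent describes a neural network accelerator",
    "the report covers quarterly revenue and the outlook",
]
print(tfidf_top_terms(docs, 0))  # terms distinctive of document 0
```

Common words like "the" score zero (they appear in every document), while terms unique to one document float to the top; this captures the statistical intuition, but not the linguistic understanding that NLP approaches add.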
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al... (Dr. Haxel Consult)
This document discusses using natural language processing on trademark text data to gain insights. It presents research on how trademark activity changed during COVID-19, detecting emerging trends in trademarks over time, and classifying trademarks by industry. The research uses techniques like topic modeling and deep learning classifiers to analyze trademarks and identify patterns. The analysis of trademarks can provide economic indicators and reveal where businesses are focusing their innovation and market presence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K... (Dr. Haxel Consult)
In our customer projects involving automated document processing, we often encounter document types that provide crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
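One of the layout heuristics mentioned (column boundaries signaled by aligned white space) can be sketched for plain-text tables. This is a toy heuristic for illustration, not the presenters' actual extraction pipeline:

```python
# Infer column boundaries in fixed-width text from runs of whitespace
# that are shared by every row.
def split_columns(lines, min_gap=2):
    """Split rows at character positions that are whitespace in all lines,
    counting only runs of at least `min_gap` spaces as separators."""
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    gap = [all(row[i] == " " for row in padded) for i in range(width)]

    sep = [False] * width
    i = 0
    while i < width:            # mark sufficiently wide shared-space runs
        if gap[i]:
            j = i
            while j < width and gap[j]:
                j += 1
            if j - i >= min_gap:
                for k in range(i, j):
                    sep[k] = True
            i = j
        else:
            i += 1

    spans, start = [], None     # [start, end) extents of the columns
    for i, is_sep in enumerate(sep + [True]):
        if not is_sep and start is None:
            start = i
        elif is_sep and start is not None:
            spans.append((start, i))
            start = None
    return [[row[s:e].strip() for s, e in spans] for row in padded]

table = [
    "Name        Country      Patents",
    "Acme Corp   US           1520",
    "Beta GmbH   DE           310",
]
for row in split_columns(table):
    print(row)
```

The `min_gap` threshold is what keeps single spaces inside a cell (like the one in "Acme Corp") from being mistaken for column separators; real pipelines combine several such cues, including ruling lines and alignment.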
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i... (Dr. Haxel Consult)
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The publication of the research data is usually carried out as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the FAIR data principles, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. The presentation also highlights corresponding activities reported in recent scientific publications.
This document discusses using Elasticsearch to store and analyze patent data. It describes how Elasticsearch provides a document store, full text search engine, and real-time analytics capabilities. Examples are given of using Elasticsearch for a patent document store, patent search engine, log store, and real-time log analysis of patent data. The document concludes by thanking the audience.
The document provides information about Search Technologies, a leading independent IT services firm specializing in enterprise search and big data search solutions. It details their expertise in Microsoft Search, Google Search Appliance, and open source search technologies. It also describes their content processing framework Aspire, connectors for integrating various content sources, and query processing language QPL.
The document discusses text mining and how it has come of age by providing speed to insight using real-time and temporal data. It highlights challenges like the increasing amounts of unstructured data from different sources and formats. Examples show how text mining can help with tasks like identifying disease comorbidities, extracting risk factors, and performing opposition searching across multiple data sources and time periods. The conclusion is that text mining demonstrates clear value for the pharmaceutical and healthcare sectors through time to insight, real-time data analysis, and use of temporal data.
Klaus Kater of black swan presents on analytic search technology to aggregate and analyze data from multiple sources including the surface web, deep web, and corporate resources. Black swan's SEARCHCORPUS indexes crawled documents and extracted structured data, annotating documents with context. The system allows users to pull search interfaces, pimp existing data with crawled information, and push profile-driven notifications. It features graphical tools to design filter chains and document projects, as well as crawling, extraction, analysis, administration, and deployment capabilities.
This document summarizes Dr. Kai Simon's work on large-scale patent classification at the European Patent Office. It discusses how Averbis was selected in 2015 to use text mining to pre-classify unpublished patents and re-classify published patents if the classification system changes. The process involves classifying patents into over 250,000 classification codes across 250 departments, presenting a big data and fast response time challenge that text mining can help address.
This document analyzes patent data related to smart city technologies from 1998-2012. It finds that over 100,000 patent applications were published, with growth increasing over time. Almost half of patents were for smart buildings, followed by smart energy networks. The top patenting offices were from China, US, Korea, and Japan. Electrical engineering was the most common technological domain. French applicants most commonly filed in France and at the EPO, focusing especially on smart energy networks.
This document summarizes a presentation about Text and Data Mining (TDM) and the DirectPath solution from Copyright Clearance Center. The DirectPath solution provides researchers with a centralized way to access licensed full-text content in XML format from multiple publishers for use in TDM projects through a web interface and API. It aims to streamline the content retrieval and licensing process for TDM by normalizing formats, managing licenses, and allowing customization of text analysis and indexing. The solution is designed to support applications like drug discovery and competitive intelligence by facilitating information retrieval and knowledge discovery from large article corpora.
II-SDV 2015 The International Information Conference on Search, Data Mining a...Dr. Haxel Consult
he II-SDV meeting takes place in Nice in April 2016 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
II-SDV 2017 in Nice - The International Information Conference on Search, Dat...Dr. Haxel Consult
The 2017 II-SDV Conference in Nice, 24 - 25 April 2017
The II-SDV meeting takes place in Nice in April 2017 for an intensive two days. Venue is the Hotel Plaza in central Nice. The meeting provides an international forum for those in the field of advanced search applications, data and text mining, and visualization technology. The primary focus is on tools for intelligence and the meeting examines the requirements of specialists in scientific and technical information.
The meeting will be of interest to those who wish to update themselves and keep in touch with the leading edge of information search and analysis technologies; it features approximately 22 speakers for the two days. There will be an adjacent, focused exhibition to complement the conference programme.
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
This document describes a metadata list of patent holders in various countries and regions compiled by Muchiu (Henry) Chang. The list is sorted geographically and includes patent holder names from Canada, China, Hong Kong, Macao, Taiwan, the Middle East, and Europe between 2009-2022. Key features include Chinese-English compatibility and use of open source intelligence. The list has previously been utilized by the Region of Peel in Ontario, Canada.
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach for storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In this presentation we report on the approach taken in a project with our partner Fraunhofer SCAI in the life sciences, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information on cause-effect relations between proteins, genes, drugs and diseases has been encoded in BEL (the Biological Expression Language) and imported into a graph database to build an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to simply rerunning the analysis on newly published literature.
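The pipeline described above can be pictured with a toy in-memory graph. This is an illustrative sketch only: the triples are invented placeholders, not real BEL statements, and the project itself uses a proper graph database rather than Python dictionaries.

```python
from collections import defaultdict

# Minimal in-memory knowledge graph built from cause-effect triples.
# The triples below are invented placeholders, not real BEL statements.
triples = [
    ("DrugA", "decreases", "ProteinX"),
    ("ProteinX", "increases", "DiseaseY"),
    ("GeneZ", "increases", "ProteinX"),
]

graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))

def effects_of(entity):
    """Return all (relation, target) edges leaving an entity."""
    return graph.get(entity, [])

# Updating the graph amounts to re-running extraction on newly
# published literature and appending the resulting triples.
print(effects_of("ProteinX"))  # [('increases', 'DiseaseY')]
```

The update property the abstract highlights falls out naturally: new literature yields new triples, and loading them is an append, not a rebuild.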
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t... (Dr. Haxel Consult)
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports have been published looking at green technologies, and analysis has been conducted of the UK’s Green Channel scheme for accelerated processing of green patent applications. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc... (Dr. Haxel Consult)
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander... (Dr. Haxel Consult)
This document profiles Linda Andersson, the CEO of Artificial Researcher. It provides details on her background, awards, research fields, and academic merits. It then discusses how domain knowledge enables artificial intelligence systems to be smarter by allowing them to understand language and text in particular domains at a deeper level. Finally, it provides an overview of Artificial Researcher's natural language processing and text mining technologies and services for tasks like passage retrieval, ontology generation, and semantic search.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w... (Dr. Haxel Consult)
In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings which, combined with deep learning architectures, achieved human-level performance on many NLP tasks. With the introduction of the Attention mechanism in 2017 the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will present some initial results from a paper currently under review that provides insight into hyperparameter tuning during the generation of embeddings.
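The contrast drawn here can be illustrated with a toy example. The three-dimensional vectors below are invented stand-ins for real embeddings (which have hundreds of dimensions and come from a trained model); the point is that cosine similarity in embedding space can rank documents for a query that shares no keywords with them.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy 3-dimensional "embeddings" (invented for illustration).
docs = {
    "car maintenance": [0.90, 0.10, 0.20],
    "automobile repair": [0.85, 0.15, 0.25],
    "cooking recipes": [0.10, 0.90, 0.30],
}
query = [0.88, 0.12, 0.22]  # stand-in embedding of "fixing my vehicle"

# A keyword-overlap relevance score for "fixing my vehicle" would be zero
# against all three documents; embedding similarity still ranks the two
# related ones first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)  # 'cooking recipes' comes last
```

A relevance engine such as BM25 and an embedding index are not mutually exclusive; the comparison metrics in the talk treat them as alternative rankers over the same corpus.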
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e... (Dr. Haxel Consult)
10 years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in... (Dr. Haxel Consult)
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,... (Dr. Haxel Consult)
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement that makes Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and at what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al... (Dr. Haxel Consult)
This document discusses using natural language processing on trademark text data to gain insights. It presents research on how trademark activity changed during COVID-19, detecting emerging trends in trademarks over time, and classifying trademarks by industry. The research uses techniques like topic modeling and deep learning classifiers to analyze trademarks and identify patterns. The analysis of trademarks can provide economic indicators and reveal where businesses are focusing their innovation and market presence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K... (Dr. Haxel Consult)
In our customer projects involving automated document processing, we often encounter document types that provide crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signalled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
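One of the layout cues mentioned above, column boundaries signalled by white space, can be sketched naively: a character position that is blank in every row of a space-aligned plain-text table is a candidate boundary. Real table extraction must combine many such cues; this toy handles only the simplest case and is not taken from any product described in the talk.

```python
# Example space-aligned table (invented data).
lines = [
    "Name      Dose    Unit",
    "Aspirin   500     mg  ",
    "Ibuprofen 200     mg  ",
]

width = max(len(l) for l in lines)
padded = [l.ljust(width) for l in lines]

# A column boundary candidate: a character position blank in every row.
blank = [all(row[i] == " " for row in padded) for i in range(width)]

def cells(row):
    """Split one row into cells at the shared blank columns."""
    out, cur = [], ""
    for ch, is_blank in zip(row, blank):
        if is_blank:
            if cur.strip():
                out.append(cur.strip())
            cur = ""
        else:
            cur += ch
    if cur.strip():
        out.append(cur.strip())
    return out

table = [cells(r) for r in padded]
print(table[1])  # ['Aspirin', '500', 'mg']
```

Even this toy shows why running-text analytics fails on tables: the meaning of "500" comes from its vertical alignment under "Dose", not from the characters around it in reading order.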
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i... (Dr. Haxel Consult)
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The research data is usually published as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded as part of the publication process, following the FAIR data principles. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr... (Dr. Haxel Consult)
This document discusses using AI tools to improve patent search and analysis. It provides metrics on how well an AI system called IPscreener can retrieve patent citations compared to examiners. The metrics show recall rates increase with longer input text and when users provide additional context. Machine translation negatively impacts performance, but the AI can help users navigate patents by selecting relevant text segments. The goal is for AI to boost innovation by improving how users search for and understand prior art.
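For context, recall here measures the share of examiner citations that the system also retrieves. A minimal sketch with invented citation numbers (not figures from the IPscreener evaluation):

```python
# Invented example sets: citations an examiner raised against an
# application, and documents an AI prior-art search returned.
examiner_citations = {"EP1234567", "US7654321", "WO2020123456"}
ai_results = {"EP1234567", "US7654321", "US1111111", "EP9999999"}

# Recall = found examiner citations / all examiner citations.
recall = len(examiner_citations & ai_results) / len(examiner_citations)
print(f"recall = {recall:.2f}")  # 2 of 3 citations found -> 0.67
```

Longer input text and added user context enlarge the effective query, which is why the abstract reports recall rising with both.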
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE... (Dr. Haxel Consult)
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, whether physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information than can easily be communicated via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and solutions implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their document database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides patent coverage for over 160 countries, trademark coverage for over 200 authorities and design coverage for over 90 authorities. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally, a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the Swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization... (Dr. Haxel Consult)
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are indispensable for assessing and managing the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight... (Dr. Haxel Consult)
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
II-SDV 2015, 20 - 21 April, in Nice
1. An Overview of the Enterprise Search Market, & Current Best Practices
Iain Fletcher
ifletcher@Searchtechnologies.com
April 20, 2015
2. Agenda
• A brief overview of the current enterprise search market
• The convergence of search with analytics disciplines
• Likely future architectures for search applications
4. High-level Search Engine Classifications
1. Part of a portfolio, many are recently acquired technologies
– E.g. SharePoint, HP Autonomy, IBM/Vivisimo, Dassault/Exalead
2. Stand-alone specialists, often bought to address specific apps
– E.g. GSA, Coveo, Attivio, Sinequa, Recommind
3. Open source, with or without support or proprietary add-ons
– Raw: E.g. Lucene, Solr, Elasticsearch
– With support/add-ons: E.g. LucidWorks, Cloudera Search, Elastic
4. Cloud-based services, typically based on open source technology
– E.g. Amazon Cloudsearch, MS Azure search
5. Market Observations
The dominant market share is with SharePoint, open source, and the Google Search Appliance
• SharePoint 2013 search is credible, and bundled
– Search teams are under pressure to use it, or to provide a compelling reason to do otherwise
• Solr and Elasticsearch are robust and reliable
– Thanks to very wide-spread deployment
• The Google brand sells search – and a lot of GSAs have been shipped during the past few years
6. Functional Observations
• Core indexing / searching is generally fast and reliable
– Search is a maturing technology
• Key differences remain in peripheral functionality, such as content processing prior to indexing. For example:
– Coveo, Attivio, Sinequa all have well-developed indexing pipelines, UI tools, and a range of data connectors
– SharePoint and GSA have limited content processing functionality and rely on 3rd parties for connectivity
– Solr, Elasticsearch, AWS Cloudsearch and Azure search don’t provide a formal indexing pipeline, UI, or connectors
7. Further Observations
• The search engines with less focus on peripheral issues (such as content processing and connectivity) have dominant market share
• Connectivity remains challenging, especially when combined with continual data growth
• The movement of data sets to the cloud adds further complexity
– Hybrid indexing environments will be with us for some years
8. Content Processing / Text Analysis Examples
• Normalization
– Names, dates, synonyms, spelling
• Entity identification and resolution
• Additional metadata from content analysis
• Categorization
• Document vector extraction
• Splitting and concatenation
• Dupe & near-dupe detection
• Link analysis
• Ingesting external signals
• Security enforcement and analysis
[Diagram: a search index enriched with security, category, and metadata fields]
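As an illustration of one item on this list, near-duplicate detection is often framed as set similarity over word shingles. This is a naive sketch, not the technique of any engine named on the slide; production systems use MinHash or SimHash to scale, and the 0.3 threshold below is arbitrary.

```python
def shingles(text, k=3):
    """Set of k-word shingles (overlapping word n-grams) of a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the shingle sets of two texts."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
doc3 = "enterprise search is a maturing technology"

print(jaccard(doc1, doc2) > 0.3)   # True: near-duplicates
print(jaccard(doc1, doc3) == 0.0)  # True: unrelated
```

Running this kind of analysis before indexing, rather than at query time, is exactly the "content processing prior to indexing" differentiator discussed on slide 6.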
9. Future Directions
So what will search architectures look like in the future?
Important Influences:
• The need for organizational and analytical agility
• The convergence of search and (“big data”) analytics
• Continual growth in data volumes, and churn in repository / storage fashions
10. Converging Architectures
Let’s take a brief look at:
1. The “Big Data Architecture”, evangelized by IBM, Cloudera, etc.
2. Contemporary Search Architectures
Background Info
12. The Traditional Search Architecture
[Diagram: Content Sources (Employee Directory, CMS, File Share, etc.) → Connectors → Integrated Search Engine (Index Pipeline → Index → Search → UI)]
Designed for Unstructured Content
13. The Traditional Search Architecture
[Diagram: the same architecture, highlighting the re-index loop]
• A few documents-per-second?
• There are only 2.6 million seconds in a month
• If you change something significant in the index pipeline, you will need to re-index
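The slide's arithmetic is worth spelling out. With assumed figures (a 50-million-document corpus ingested at 5 documents per second, both invented for illustration), a full re-index through the traditional pipeline takes months:

```python
# The arithmetic behind the slide's warning. Corpus size and ingest
# rate are illustrative assumptions, not figures from the talk.
docs = 50_000_000                    # corpus size (assumed)
docs_per_second = 5                  # pipeline ingest rate (assumed)
seconds_per_month = 30 * 24 * 3600   # 2,592,000 — the slide's "2.6 million"

months = docs / docs_per_second / seconds_per_month
print(f"{months:.1f} months to re-index")  # 3.9 months to re-index
```

This is why the next slide decouples content processing from the index: changing the pipeline should not force another months-long crawl of the source repositories.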
14. A Better Search Architecture
• Re-indexing rates greatly improved
• “Touch-time” with repositories can be managed autonomously
[Diagram: Content Sources (Employee Directory, CMS, etc.) → Connectors → Content Processing / Staging Repository (with iterative development) → Index Pipeline → Search Engine (Index, Search); the re-index loop now runs from the staging repository]
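The staging idea can be sketched in a few lines: content is fetched from source repositories once into a staging store, and re-indexing becomes a local re-run of the processing step. All names and data here are illustrative, not part of any product:

```python
# Sketch of the staging pattern: repository "touch-time" happens once,
# at ingest; the processing pipeline can then be re-run freely during
# iterative development without re-fetching from the sources.
staging = {}  # doc_id -> raw content, fetched once

def fetch(doc_id):
    # Stand-in for a connector hitting a CMS or file share.
    return f"raw content of {doc_id}"

def ingest(doc_ids):
    for d in doc_ids:
        staging[d] = fetch(d)  # the only repository touch-time

def process(transform):
    # "Re-indexing" is just re-running the transform over staging.
    return {d: transform(raw) for d, raw in staging.items()}

ingest(["doc1", "doc2"])
index_v1 = process(str.upper)  # first pipeline version
index_v2 = process(str.lower)  # pipeline changed: no re-fetch needed
print(index_v2["doc1"])
```

The two `process` calls model the slide's point: a significant pipeline change costs one pass over local storage, not another crawl of every source system.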
15. The Future Architecture?
[Diagram: the staged architecture above, with the Content Processing / Staging Repository running on Hadoop]
• This environment will encourage ever more sophisticated content processing
• We expect much innovation in text analytics during the next few years
• Driven by cheap, easily available processing power
• The deliverable is a richer search index
16. The Future Architecture
[Diagram: the same Hadoop-based staging architecture]
• Google.com has worked something like this for 10+ years
17. An Integrated Search/Analytics Architecture
[Diagram: Content Sources (CMS, file system, OSINT) via Connectors / Crawlers, and Data Sources (data warehouse, logfiles, etc.) via ETL, feed a Hadoop-based Content Processing / Staging Repository with iterative development; rapid, ad hoc indexing serves multiple Search and Analysis applications]
• Encourages agile exploitation of data and content resources
18. Summary
• Search and Analytics are tending towards the same architecture
• Autonomous connectivity and content processing systems simplify and de-risk projects
• The “search index” is a mature technology, and becoming a commodity
– Thanks to open source alternatives setting high standards
• The centre of attention is shifting from the index to the content preparation
– This perhaps fits well with the profile of dominant market leaders: SharePoint, GSA, Solr, Elasticsearch….
19. Conclusion
• The foundation of great search and analytical applications is a clean, rich and detailed index
• Much of the innovation during the next years will be in content analytics
– The architecture discussed makes it easy to adopt new ideas and products
– And it promotes agility, experimentation, and innovation
• In a data-driven world, agility is vital
20. And finally…. the analyst quote:
“Enterprise Search Can Bring Big Data Within Reach” *
• Multiple, purpose-built indexes that are derived from enriched content are necessary.
http://blogs.gartner.com/darin-stewart/2014/04/01/enterprise-search-can-bring-big-data-within-reach/
* Darin Stewart, Enterprise Search Can Bring Big Data Within Reach, April 2014 blog
21. An Overview of the Current Enterprise Search Market, & Current Best Practices
Iain Fletcher
ifletcher@Searchtechnologies.com
April 20, 2015
Thank you!