Advances in text mining, analytics and machine learning are enabling ever more powerful applications, yet most applications and platforms are designed to deal with a single (normalized) language. As our applications and platforms are increasingly required to ingest international content, the challenge becomes finding ways to normalize content to a single language without compromising quality. An extension of this question is how we define quality in this context, and what by-products, if any, a localization effort can produce that may enhance the usefulness of the application.
Using patent searching as an example use case, this talk will review the challenges and possible solution approaches for handling localization effectively. It will show what current emerging technology offers, what to expect and what not to expect, and will provide an introductory practical guide to handling localization in the context of data mining and analytics.
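The first step of such a normalization effort can be sketched as a language-routing pipeline: detect each document's language, then translate everything that is not already in the pivot language. The sketch below uses a crude stopword-overlap heuristic for detection; the stopword lists, function names and the pluggable `translate` callback are illustrative assumptions, not part of any particular product.

```python
# Crude language routing: detect by stopword overlap, then translate
# anything that is not already in the target (pivot) language.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in"},
    "de": {"der", "und", "die", "ist", "in"},
    "fr": {"le", "et", "la", "est", "dans"},
}

def detect_language(text):
    """Pick the language whose stopword list overlaps the text most."""
    words = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(STOPWORDS[lang] & words))

def normalize(docs, target="en", translate=lambda text, src: text):
    """Route every non-target doc through a caller-supplied translator."""
    out = []
    for text in docs:
        lang = detect_language(text)
        out.append(text if lang == target else translate(text, lang))
    return out

lang = detect_language("der Motor ist in der Zeichnung")   # German sample
```

A real pipeline would replace both pieces: a trained language identifier instead of stopword overlap, and an MT system behind the `translate` hook.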
II-SDV 2017: Applications of RNN (Recurrent Neural Networks) within Machine T... – Dr. Haxel Consult
Pierre Bernassau will present the state of the art in Artificial Intelligence and Recurrent Neural Networks applied to natural language, and in particular to the Machine Translation domain. This disruptive technology does not merely displace previous practices based on rules and statistics; it opens up new fields. New fields for end-users, who can apply MT technologies to new languages, text styles, documents or messages with a 'good-enough' result; but also new fields in terms of good practices, where new projects, new workflows and new applications are being addressed that de facto expand the MT market.
Drawing on the experience of several recent projects carried out by his consulting team, Pierre will explain best practices for applying neural technologies within the NLP field.
II-SDV 2017: Semantic Search Jargon - A Short Guide – Dr. Haxel Consult
In the early 1990s, the term 'semantic' appeared in the context of text retrieval tools. However, the idea of semantics was there from the very beginning of Information Retrieval as a research field (i.e. the computer-assisted identification of relevant documents), as the articles of Vannevar Bush ('As We May Think') and Luhn ('The automatic creation of literature abstracts') from the 1940s and '50s show.
So where are we now in terms of semantics? The 'latent semantic indexing' of the 1990s faded away, and the first decade of the millennium enthusiastically studied semantic web technologies. Now, in the second decade, 'deep learning' is the new star. In this talk I will give a high-level overview of what has been done already, particularly in the patent domain, what the main techniques are, and in which directions the scientific community is looking today. Ultimately, there is no single answer to the question 'What is semantic search?'. Instead, my aim is to empower the audience to ask the right questions the next time somebody mentions the term.
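To make the 'latent semantic indexing' of the 1990s concrete: LSI factorizes a term-document matrix with a truncated SVD so that documents sharing no literal terms can still be close in a low-rank "concept" space. The following is a toy sketch with an invented five-term vocabulary, not a production implementation.

```python
# Toy LSI: factorize a term-document matrix with a truncated SVD and
# compare documents in the resulting low-rank "concept" space.
import numpy as np

# Rows = terms, columns = 4 tiny documents (doc 3 is about patents).
# Vocabulary order: car, automobile, engine, patent, claim.
A = np.array([
    [2, 0, 1, 0],   # car
    [0, 2, 1, 0],   # automobile
    [1, 1, 2, 0],   # engine
    [0, 0, 0, 2],   # patent
    [0, 0, 0, 1],   # claim
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                     # keep the 2 strongest concepts
doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T    # documents in concept space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Docs 0 ("car") and 1 ("automobile") share no literal term, yet both
# co-occur with "engine", so LSI places them close together; doc 3
# ("patent", "claim") stays far away.
sim_car_auto = cos(doc_vecs[0], doc_vecs[1])
sim_car_patent = cos(doc_vecs[0], doc_vecs[3])
```

Here `sim_car_auto` comes out near 1 while `sim_car_patent` stays near 0, even though documents 0 and 1 have no word in common, which is exactly the effect LSI was sold on.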
II-SDV 2017: Towards Semantic Search at the European Patent Office – Dr. Haxel Consult
With the ever-increasing volume of data to be searched, the techniques of semantic search will be key to successful prior art searching. These techniques consist of methods for understanding the searcher's intent, disambiguating the contextual meaning of (search) terms and ultimately improving search accuracy by generating more relevant results. This presentation explores how far the EPO has come in enabling some of those key elements through projects such as Annotated Patent Literature and Enhanced Ranking. It also introduces even more sophisticated models based on machine-learned algorithms that might help shape the future of search at the EPO.
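One of the elements named above, disambiguating the contextual meaning of search terms, can be illustrated with a simplified Lesk-style heuristic: pick the sense whose signature words overlap the rest of the query most. The sense inventory below is a hand-made toy for illustration, not the EPO's actual method.

```python
# Simplified Lesk-style disambiguation: the sense of an ambiguous query
# term is chosen by overlap between the query context and each sense's
# signature words. The thesaurus is invented toy data.
SENSES = {
    "cell": {
        "biology": {"membrane", "dna", "protein", "culture"},
        "battery": {"electrode", "anode", "lithium", "charge"},
        "telecom": {"antenna", "handover", "base", "station"},
    }
}

def disambiguate(term, query_words):
    """Pick the sense whose signature overlaps the query context most."""
    context = set(query_words) - {term}
    senses = SENSES[term]
    return max(senses, key=lambda s: len(senses[s] & context))

sense = disambiguate("cell", "lithium cell electrode coating".split())
```

With the query "lithium cell electrode coating", the battery sense wins because two of its signature words appear in the surrounding query context.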
II-SDV 2017: Approaches of Web Information Analysis in a Day to Day Work Envi... – Dr. Haxel Consult
Web scraping, content filtering, tagging and feeding web data into the day-to-day work environment take many different shapes and require an additional software stack that blends well with existing big data analysis, text analysis and search technology.
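A minimal scrape-filter-tag step of the kind described above might look as follows, using only Python's standard library; the HTML snippet and the keyword-to-tag rules are invented placeholders.

```python
# Minimal scrape -> filter -> tag step: extract paragraphs from HTML,
# then attach tags based on simple keyword rules.
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []
    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")
    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False
    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

TAG_RULES = {"patent": "IP", "lithium": "battery-tech"}   # toy rules

def scrape_and_tag(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    return [(text, sorted({tag for kw, tag in TAG_RULES.items()
                           if kw in text.lower()}))
            for text in parser.paragraphs]

items = scrape_and_tag(
    "<html><p>New lithium cathode patent filed.</p>"
    "<p>Weather was sunny.</p></html>")
```

In a production stack the extractor would be a crawler and the rules a trained classifier, but the shape of the pipeline stays the same.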
II-SDV 2017: Spotting the Stars in your Galaxy of Patent Data – Dr. Haxel Consult
Analysis and visualisation of a “galaxy” of patent data can present challenges in being able to spot the “stars”. Drawing meaningful conclusions from any patent landscape relies on the quality and comprehensiveness of the data input, as well as on features and functionality to visualize the landscape accurately and to focus on an area of interest.
Delivered from the perspective of an experienced patent analyst, we will use case studies to discuss the challenges in creating a meaningful patent landscape and the recent innovative features and functionality in PatBase which can help, including:
- Using Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
- Text-mining to automatically identify and highlight chemical, physical, genetic and medical concepts within any full text patent.
- Efficiently identifying the exact location of a chemical entity anywhere in the full text.
- Easily reviewing and filtering patent citations based on their relevance, origin and assignee.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the patent landscape for any technical area.
Averbis specializes in text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents quickly, estimate their relevance for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence: it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and matches the accuracy of manual monitoring, while reducing manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize the inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-SDV 2017: What is Innovation and how can we measure it? – Dr. Haxel Consult
Innovation means many different things to many people: ask five people and you will likely get ten answers. But all agree that it is a key driver behind the success of organizations and the growth of economies, and that it makes major contributions to addressing global problems. This presentation will examine various analytical methods and possible metrics for measuring innovation and determining the relative performance of organizations. The challenges involved in assessing innovation, and how these can be addressed, will be explored. The pros and cons of the metrics identified will also be discussed, with a view to identifying a practical method for assessing innovation.
II-SDV 2017: How to effectively monitor Technological Developments in IP – Dr. Haxel Consult
Modern, cutting-edge developments are not reflected in current patent classification systems, which tend to catalogue established technologies. Identifying patent portfolios in such emerging fields therefore proves challenging for patent and technology experts.
Beyond the mere identification of new IP, additional value can be added by combining a regional geographic weighting with consolidated portfolio-owner information.
Effective monitoring of a technological field is achieved by training active-learning search engines to hunt for highly relevant patent documents, thus keeping IP portfolios for emerging technologies up to date. The system we have developed permits extremely accurate updates with drastically reduced noise and a low workload, which has proven invaluable in a world of ever-increasing data blur.
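The active-learning loop described above can be sketched as uncertainty sampling: each round, the documents the current model is least sure about are sent to the expert, and their labels update the model. The toy corpus and the keyword-weight "model" below are invented stand-ins for a real relevance engine.

```python
# Active learning by uncertainty sampling over a toy patent corpus.
from collections import Counter

# Toy corpus: relevant docs mention "battery", irrelevant ones "textile".
docs = ([f"battery electrode b{i}" for i in range(10)] +
        [f"textile weaving t{i}" for i in range(10)])
truth = {d: ("battery" in d) for d in docs}

weights = Counter()              # word -> learned relevance weight

def score(doc):
    return sum(weights[w] for w in doc.split())

def uncertainty(doc):
    return abs(score(doc))       # scores near zero are least certain

labeled = set()
for _ in range(3):               # three expert-review rounds
    pool = [d for d in docs if d not in labeled]
    # Ask the (simulated) expert about the most uncertain documents.
    for doc in sorted(pool, key=uncertainty)[:4]:
        labeled.add(doc)
        delta = 1 if truth[doc] else -1      # expert label
        for w in doc.split():
            weights[w] += delta

predicted_relevant = [d for d in docs if score(d) > 0]
```

After only twelve expert judgements the keyword weights separate the two groups cleanly; the point of the technique is exactly this labeling economy.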
Gridlogics is a leading provider of products and custom software solutions for patent research, management, data analysis and project management. Our products leverage the latest techniques in information retrieval, data mining and visualizations to help clients globally in deriving actionable intelligence from the masses of patent data.
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision – Dr. Haxel Consult
The global data sphere, consisting of machine data and human data, is growing exponentially, reaching the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence – in its newer Machine Learning variants – bypasses the need to understand a system when modelling it; however, this convenience comes with extremely high energy consumption.
The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles:
1. Findability of Information – when truth is hard to find, fake news rule
2. Von Neumann Gap – when processors cannot process faster, then we need more of them (energy)
3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time
4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control
The current approach to overcoming these limitations is to train on larger and larger data sets using more and more processing nodes. Instead, AI algorithms should be optimized for efficiency rather than precision, which disqualifies statistical modelling as a brute-force approach for language applications. As a replacement for statistical modelling and arithmetic, set theory and geometry seem a much better choice, as they allow the direct processing of words instead of their occurrence counts – which is exactly what the human brain does with language, using only 7 watts!
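The set-theoretic alternative argued for here can be illustrated with sparse binary "fingerprints": each word owns a set of semantic feature bits, a text's fingerprint is the union of its words' bits, and similarity is plain set overlap (Jaccard) rather than statistics over occurrence counts. The feature bits below are invented toy data, not the speaker's actual representation.

```python
# Sparse binary fingerprints: similarity as pure set overlap, with no
# floating-point statistics involved. Feature bits are invented.
WORD_BITS = {
    "car":     {1, 2, 3, 4},
    "vehicle": {2, 3, 4, 5},
    "engine":  {3, 4, 6},
    "banana":  {20, 21, 22},
    "fruit":   {21, 22, 23},
}

def fingerprint(text):
    """A text's fingerprint is the union of its words' feature bits."""
    return set().union(*(WORD_BITS[w] for w in text.split()))

def jaccard(a, b):
    return len(a & b) / len(a | b)

sim_near = jaccard(fingerprint("car engine"), fingerprint("vehicle"))
sim_far = jaccard(fingerprint("car engine"), fingerprint("banana fruit"))
```

Because the operations are unions and intersections over small bit sets, this style of comparison is cheap in exactly the way the talk's efficiency argument demands.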
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo... – Dr. Haxel Consult
Parthiban Srinivasan (VINGYANI, India)
When new technologies become easier to use, they transform industries. That is what is happening with artificial intelligence (AI) and big data. Machine learning is often described as a type of AI where computers learn to do something without being explicitly programmed to do it. Deep learning, a subset of machine learning, is proving to work especially well for classification. Big breakthroughs happen when what is suddenly possible meets what is desperately needed. For years, patent analysts have been searching and reviewing terabytes of information – not only patents but also non-patent information – not only to find prior art, but also to identify patents of interest, rate their quality, assess the potential value of patent clusters, and identify potential business partners or infringers. With the rapid increase in the number of patent documents worldwide, demand for their automatic clustering and categorization has grown significantly. Many information science researchers have started to experiment with machine learning tools, but adoption in the patent information space has been sporadic. In this talk, we review the prevailing machine learning techniques and present several sample implementations by various research groups. We will also discuss how data science compares with machine learning, deep learning, AI, statistics and applied mathematics.
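The automatic categorization demand mentioned above is often met first with simple supervised text classifiers. The sketch below is a minimal naive Bayes classifier over word counts, with invented training titles and categories, far simpler than the deep models the talk surveys.

```python
# Minimal naive Bayes text classifier with add-one smoothing,
# standing in for automatic patent categorization.
import math
from collections import Counter, defaultdict

# Invented training data: (title words, customer-specific category).
train = [
    ("battery electrode lithium cathode", "energy"),
    ("lithium anode charge cycle", "energy"),
    ("antibody protein binding assay", "biotech"),
    ("gene sequence protein expression", "biotech"),
]

counts = defaultdict(Counter)            # category -> word counts
for text, cat in train:
    counts[cat].update(text.split())

vocab_size = len(set().union(*counts.values()))

def classify(text):
    """Return the category with the highest smoothed log-likelihood."""
    def log_score(cat):
        c, total = counts[cat], sum(counts[cat].values())
        return sum(math.log((c[w] + 1) / (total + vocab_size))
                   for w in text.split())
    return max(counts, key=log_score)
```

Real patent classifiers add richer features (classifications, citations, claims structure), but this skeleton is the baseline the deep-learning results are measured against.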
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data... – Dr. Haxel Consult
Applications of machine learning to NLP tasks receive a lot of attention today and have been shown to yield state-of-the-art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constraints of real-world projects: fast throughput, reasonably low training-corpus requirements and high-quality results. What we observe is a general trend towards open source – our components, too, are open source. With the software being mostly freely available, the key success criterion for many NLP projects today is therefore first and foremost the expertise required to combine, tune and apply open-source components.
II-PIC 2017: Gain insight into technical, legal and business information thro... – Dr. Haxel Consult
Roland Feinäugle (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information at the various innovation stages. This result gives further impetus to the EPO's patent information strategy, which aims to support the economy by providing access to a wealth of patent-related information, be it from its own patent granting process or from its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO's patent information products and services. We will discuss how they can be used for technology-specific searches, for gaining insight into the legal and procedural status of an application during and after the grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO's free-of-charge Espacenet as well as our flagship product PATSTAT, both with worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services, as well as our RESTful Open Patent Services.
Combined indexing and search of structured and unstructured sources from the Surface Web, the Deep Web, the Dark Web and corporate data ensure that employees in every part of the organization have access to the information they need without having to use public search engines.
A powerful Application Builder creates 360° information applications that bring information and analytics together and deliver them to users in a new SEARCH experience.
Advanced content analytics aggregate, analyze and visualize unstructured (natural language) content to reveal hidden insights and patterns.
Deep SEARCH 9 is the only Web-scale analytic search solution that offers corporations a proprietary and anonymous search experience by combining web crawling, content analytics, data linking and search.
II-PIC 2017: The Use of Patent Information for Innovation and Competitive Int... – Dr. Haxel Consult
Greg Harrop-Griffiths (Minesoft, UK)
Patent data is a critical source of information for stimulating innovation and for competitive intelligence. Patents are often the first and only disclosure of a new invention; ignoring them will only delay innovation and give an incomplete competitive intelligence picture.
Delivered from the perspective of an experienced patent analyst, we will use case studies to describe the use of patent data to compile a competitive landscape, to stimulate innovation by learning from others and to help identify valuable IP in a portfolio. We will discuss the challenges in using patents for competitive intelligence and the recent innovative features and functionality in PatBase which can help, including:
- Using thesauri, semantic searching and non-patent literature searching to compile a comprehensive competitive landscape.
- Using Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
- Text-mining to automatically identify and highlight concepts within any full text patent.
- Citation analysis to identify key competitors, collaborators or potential infringers.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the competitive landscape for any technical area.
SciBite is an award-winning provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within the life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure the accuracy and reliability of results. Headquartered in the UK, we support our customers from additional sites in the US and Japan.
More info at: www.scibite.com
Deep SEARCH 9 is Data Analysis for the Web.
Watch this recorded webinar by Richard Mallah, Director of Advanced Analytics, to learn more about advances in text analytics and how our Anzo Unstructured platform marries unstructured text with structured data from a wide variety of sources, allowing our customers to gain significant insights and competitive advantage by extracting meaning and value from their documents and data more easily and efficiently.
II-SDV 2017: What is Innovation and how can we measure it?Dr. Haxel Consult
Innovation means many different things to many people. Ask five people and you will likely get ten answers. But all agree that it is a key driver behind the success of organizations, the growth of economies and provides major contributions in addressing global problems. This presentation will examine various analytical methods and possible metrics for measuring innovation and determining relative performance of organizations. The challenges involved in assessing innovation and how these can be addressed will be explored. The pros and cons associated with the metrics identified will also be discussed with a view to identifying a practical method for assessing innovation.
II-SV 2017: How to effectively monitor Technological Developments in IPDr. Haxel Consult
Modern, cutting-edge developments are not reflected in current patent classification systems, which tend to catalogue established technologies. Identifying patent portfolios in such emerging fields proves a challenging job for patent and technology experts.
Going beyond the mere identification of new IP, additional value may be added using a regional geographic weighting combined with consolidated portfolio owner information.
Effective monitoring of the technological field is achieved by training active-learning search engines to hunt for highly relevant patent documents, thus keeping IP portfolios for emerging technologies up to date. The system we have developed permits extremely accurate updates with drastically reduced noise and with low workload which have proven to be invaluable in a world of drastically increasing data blur.
Gridlogics is a leading provider of products and custom software solutions for patent research, management, data analysis and project management. Our products leverage the latest techniques in information retrieval, data mining and visualizations to help clients globally in deriving actionable intelligence from the masses of patent data.
AI-SDV 2021: Francisco Webber - Efficiency is the New PrecisionDr. Haxel Consult
The global data sphere, consisting of machine data and human data, is growing exponentially reaching the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence – a newer variant of Machine Learning – bypasses the need to understand a system when modelling it; however, this convenience comes with extremely high energy consumption.
The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles:
1. Findability of Information – when truth is hard to find, fake news rule
2. Von Neumann Gap – when processors cannot process faster, then we need more of them (energy)
3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time
4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control
The current approach to overcoming these limitations is to use larger and larger data sets on more and more processing nodes for training. AI algorithms should be optimized for efficiency rather than precision. In this case, statistical modelling should be disqualified as a brute force approach for language applications. When replacing statistical modelling and arithmetic, set theory and geometry seem to be a much better choice as it allows the direct processing of words instead of their occurrence counts, which is exactly what the human brain does with language – using only 7 Watts!
Averbis is specialized in the area of text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence – with the result that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy offered by manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo...Dr. Haxel Consult
Parthiban Srinivasan (VINGYANI, India)
When new technologies become easier to use, they transform industries. That's what's happening with artificial intelligence (AI) and big data. Machine learning is often described as a type of AI where computers learn to do something without being programmed to do it. Deep learning, a subset of machine learning, is proving to work especially well on classification. Big breakthroughs happen when what is suddenly possible meets what is desperately needed. For years, patent analysts have been searching and reviewing terabytes of information, not only patents but also non-patent information. Not only to find prior art but also to identify patents of interest, rate their quality, assess the potential value of patent clusters, and identify potential business partners or infringers. With the rapid increase in the number of patent documents worldwide, demand for their automatic clustering/categorization has grown significantly. Many information science researchers have started to experiment with machine learning tools, but the adoption in the patent information space has been sporadic. In this talk, we aim to review the prevailing machine learning techniques and present several sample implementations by various research groups. We will also discuss how data science compares with machine learning, deep learning, AI, statistics and applied mathematics.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
Applications of machine learning to NLP tasks receive a lot of attention today and have been shown to yield state-of-the-art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constraints of real-world projects: real-world requirements, fast throughput, reasonably small training corpora and high-quality results. We observe a general trend towards open source; our components are open source as well. With the software mostly freely available, one of the key success criteria for many NLP projects today is therefore first and foremost the expertise required to combine, tune and apply open source components.
II-PIC 2017: Gain insight into technical, legal and business information thro...Dr. Haxel Consult
Feinäugle Roland (European Patent Office, Austria)
A recent study commissioned by the EPO on the involvement of (patent) information in the innovation process in industry underlines the role of patent information in the various innovation stages. This result gives further impetus to the EPO's patent information strategy, aimed at supporting the economy by providing access to a wealth of patent-related information, be it from its own patent granting process or its collection of worldwide bibliographic, legal status, procedural and full-text data.
In the presentation we highlight some key findings of the study and give an overview of the EPO’s patent information products and services. We will discuss how they can be used for technology-specific searches, for getting insight into the legal and procedural status of an application during and after its grant procedure, and for the statistical analysis of bulk data for business intelligence purposes. The talk covers the EPO’s free-of-charge Espacenet as well as our flagship product PATSTAT, both with a worldwide coverage. It also highlights the ever-improving European Patent Register and its accompanying services as well as our RESTful Open Patent Services.
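The Open Patent Services mentioned above can be queried over plain HTTPS. Below is a minimal sketch of building such a search request URL; the endpoint path, the parameter names and the authentication step (omitted here) are assumptions that should be verified against the official OPS documentation:

```python
from urllib.parse import urlencode

# Illustrative only: endpoint path, parameter names and the (omitted)
# OAuth token exchange should be checked against the OPS documentation.
OPS_BASE = "https://ops.epo.org/3.2/rest-services"

def build_search_url(cql_query, start=1, end=25):
    # OPS searches use CQL-style queries, e.g. ti="wind turbine"
    params = urlencode({"q": cql_query, "Range": f"{start}-{end}"})
    return f"{OPS_BASE}/published-data/search?{params}"

url = build_search_url('ti="wind turbine"')
```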
Combined indexing and search of structured and unstructured sources from the Surface Web, the Deep Web and the Dark Web and corporate data ensure that employees in every part of the organization have access to the information they need without having to use public search engines.
A powerful Application Builder for creating 360° information applications to bring information and analytics together and deliver them to users in a new SEARCH experience.
Advanced content analytics to aggregate, analyze and visualize unstructured (natural language) content to reveal hidden insights and patterns.
Deep SEARCH 9 is the only Web scale analytic search solution that offers corporations a proprietary and anonymous search solution by combining web crawling, content analytics, data linking and search.
II-PIC 2017: The Use of Patent Information for Innovation and Competitive Int...Dr. Haxel Consult
Greg Harrop-Griffiths (minesoft, UK)
Patent data is a critical source of information to stimulate innovation and for competitive intelligence. Patents are often the first and only source of disclosure of a new invention and hence, ignoring them will only delay innovation and give an incomplete competitive intelligence picture.
Delivered from the perspective of an experienced patent analyst, we will use case studies to describe the use of patent data to compile a competitive landscape, to stimulate innovation by learning from others and to help identify valuable IP in a portfolio. We will discuss the challenges in using patents for competitive intelligence and the recent innovative features and functionality in PatBase which can help, including:
Using thesauri, semantic and non-patent literature searching to compile a comprehensive competitive landscape
The use of Analytics for customised, multidimensional analysis and to visually compare multiple datasets.
Text-mining to automatically identify and highlight concepts within any full text patent.
Citation analysis to identify key competitors, collaborators or potential infringers.
This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to interrogate and visualize the competitive landscape for any technical area.
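At its simplest, the citation analysis listed above reduces to counting in-links in a citation graph; the pairs below are invented for illustration and do not reflect any real patent family:

```python
from collections import Counter

# Invented (citing_patent, cited_patent) pairs.
citations = [
    ("US1", "US9"), ("US2", "US9"), ("US3", "US9"),
    ("US2", "US8"), ("US4", "US8"),
    ("US5", "US7"),
]

# Heavily cited patents point to key prior art; frequent citers of your
# own portfolio point to potential competitors, collaborators or infringers.
cited_count = Counter(cited for _, cited in citations)
most_cited = cited_count.most_common(2)  # [("US9", 3), ("US8", 2)]
```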
SciBite is an award-winning leading provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure the accuracy and reliability of high-quality results. Headquartered in the UK, we support our customers with additional sites in the US and Japan.
More info at: www.scibite.com
Deep SEARCH 9 is Data Analysis for the Web.
Watch this recorded webinar by Richard Mallah, Director of Advanced Analytics, to learn more about advancements in Text Analytics and how our Anzo Unstructured platform helps marry unstructured text with structured data from a wide variety of sources, allowing our customers to gain significant insights and competitive advantage by more easily and efficiently extracting meaning and value from the documents and the data.
Presented by Wes Caldwell, Chief Architect, ISS, Inc.
The customers in the Intelligence Community and Department of Defense that ISS services have a big data challenge. The sheer volume of data being produced and ultimately consumed by large enterprise systems has grown exponentially in a short amount of time. Providing analysts the ability to interpret meaning, and act on time-critical information is a top priority for ISS. In this session, we will explore our journey into building a search and discovery system for our customers that combines Solr, OpenNLP, and other open source technologies to enable analysts to "Shrink the Haystack" into actionable information.
Chatbots have entered our lives almost without us noticing. Little do we realize that when that little window pops up asking if we need support or help, it could just be a chatbot that we are talking to...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
The Semantic Engine is a custom search engine deployable on top of large, non-native language corpora that goes beyond keyword search and does NOT require translation. The large, on-the-fly calculations essential to making this an effective search engine necessitated development on a distributed platform capable of processing large volumes of unstructured data.
Hear how the low barrier to entry provided by Apache Spark allowed the Novetta Solutions team to focus on the hard analytical challenges presented by their data, without having to spend much time grappling with the inherent difficulties normally associated with distributed computing.
Join Concept Searching and partner C/D/H for this thought-provoking webinar on what intelligent enterprise search should be.
Our solution is unique in the marketplace, and overcomes the limitations of other enterprise search engines. It was originally deployed as an enterprise search solution for engineers and support staff.
This webinar will focus on how one unified view of all unstructured, semi-structured, and structured data assets, including 2D and 3D images, can be integrated into the search interface, with previewers and navigational aids.
Both business and technical professionals will benefit from this session:
• Understand how the technology works, and how it can be set up with a platform and search engine of choice
• See how search returns results, and provides visual and navigational aids for all information retrieved
• Watch how to select an image based on color, size, or shape
• Learn how any business or artificial intelligence applications can benefit from the multi-term metadata created
• Find out why the search framework provides a responsive user interface for any tablet, PC or mobile device
Precision Content™ Tools, Techniques, and Technologydclsocialmedia
This webinar will explore fundamental principles for writing and structuring content for the enterprise. Attendees will learn how to approach information typing for structured authoring for more concise and reusable content.
It is almost impossible to escape the topic of Data Science. While the core of Data Science has remained the same over the last decade, its emergence to the forefront has been spurred by both the availability of new data types and a true realization of the value that it delivers. In this session, we will provide an overview of data science and the different classes of machine learning algorithms, and deliver an end-to-end demonstration of machine learning using Hadoop. Audience: Developers, Data Scientists, Architects and System Engineers.
Recording: https://hortonworks.webex.com/hortonworks/lsr.php?RCID=4175a7421d00257f33df146f50c41af8
Introduction to Enterprise Search. A two-hour class to introduce Enterprise Search. It covers:
The problems enterprise search can solve
History of (web) search
How do we search and find?
Current state of Enterprise Search + stats
Technical concept
Information quality
Feedback cycle
Five dimensions of Findability
An Introduction to Natural Language ProcessingTyrone Systems
Learn about how Natural Language Processing in AI can be used and how it applies to you in the real world.
You can learn about NLP concepts, pre-processing steps, vectorization methods, and generative and unsupervised methods. All the resources are available for you to grow your knowledge and skills in this Natural Language Processing webinar!
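The pre-processing and vectorization steps mentioned above can be sketched with a tiny TF-IDF example; the documents are invented and the pre-processing is deliberately minimal (real pipelines add stop-word removal, stemming, and so on):

```python
import math
from collections import Counter

docs = [
    "machine learning improves patent search",
    "patent search needs good preprocessing",
    "deep learning models need large data",
]

def preprocess(text):
    # Minimal pre-processing: lowercase plus whitespace tokenization.
    return text.lower().split()

tokenized = [preprocess(d) for d in docs]
# Document frequency: in how many documents each word appears.
df = Counter(w for toks in tokenized for w in set(toks))
N = len(docs)

def tfidf(tokens):
    tf = Counter(tokens)
    # Words frequent in this doc but rare in the corpus score highest.
    return {w: (c / len(tokens)) * math.log(N / df[w]) for w, c in tf.items()}

vecs = [tfidf(t) for t in tokenized]
```

Note how "machine" (unique to one document) outweighs "patent" (shared by two), which is exactly the weighting intuition behind TF-IDF vectorization.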
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach to store detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report the approach taken in a project with partner Fraunhofer SCAI in the life sciences, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
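Conceptually, such a graph boils down to (subject, relation, object) triples. The sketch below uses invented entities, not data from the actual project, and a plain dict in place of a graph database:

```python
# Invented cause-effect triples of the kind BEL encodes.
triples = [
    ("DrugA", "decreases", "ProteinX"),
    ("GeneZ", "increases", "ProteinX"),
    ("ProteinX", "increases", "DiseaseY"),
]

# Adjacency map: entity -> list of (relation, target) edges.
graph = {}
for subj, rel, obj in triples:
    graph.setdefault(subj, []).append((rel, obj))

def neighbors(entity):
    return graph.get(entity, [])

# Updating the graph amounts to appending triples extracted from
# newly published literature, mirroring the rerun-the-analysis workflow.
```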
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports has been published looking at green technologies, and analysis has been conducted of how the UK’s Green Channel scheme for accelerated processing of green patent applications is being used. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness to market that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In technical English texts, multi-word units are often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without inventing an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools using supervised machine learning algorithms, due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduced performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, which is essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. Finally, we will also demonstrate how these tools can be integrated into a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state of the art within the patent text mining research community in 2022.
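One simple way to surface candidate multi-word units of the kind described is to count repeated bigrams; the snippet of "patentese" below is invented, and real systems would use statistical association measures and linguistic filters rather than a raw frequency threshold:

```python
from collections import Counter

# Invented patentese snippet; fixed technical terms like
# "fastening means" recur as multi-word units.
text = ("the fastening means engages the support member and "
        "the fastening means secures the support member")

tokens = text.split()
bigrams = Counter(zip(tokens, tokens[1:]))

# Naive heuristic: any bigram seen more than once is a candidate term.
candidates = [bg for bg, n in bigrams.items() if n > 1]
```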
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings which, combined with deep learning architectures, achieved human-level performance in many NLP tasks. With the introduction of the attention mechanism in 2017, the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will provide some initial results from a paper currently under review that offers insight into hyperparameter tuning during the generation of embeddings.
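The core contrast between the two search styles can be shown with toy three-dimensional vectors; real embeddings have hundreds of dimensions and come from a trained model, so these numbers are purely illustrative:

```python
import math

# Toy "embeddings": values are invented for illustration only.
emb = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Embedding search matches synonyms that share no surface tokens...
sim_syn = cosine(emb["car"], emb["automobile"])
sim_far = cosine(emb["car"], emb["banana"])

# ...whereas exact keyword matching gives "automobile" zero overlap
# with the query term "car".
keyword_overlap = len({"car"} & {"automobile"})
```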
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
10 years in the making. How real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches for concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of new emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.
At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of NLP in use at CIPO and the insights we have learned applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.
We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables, as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or treating line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach. The main cause is that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
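Once the cell grid has been recovered (the hard, layout-dependent part this talk addresses), interpreting each cell relative to its column title is straightforward; the values below are invented:

```python
# A recovered cell grid: first row holds the column titles.
grid = [
    ["Compound", "Dose",  "Effect"],
    ["A-101",    "5 mg",  "inhibition"],
    ["B-202",    "10 mg", "none"],
]

# Interpret every cell relative to its column title by zipping each
# data row with the header row.
header, *rows = grid
records = [dict(zip(header, row)) for row in rows]
```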
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The research data is usually published as so-called "Supplementary Material" attached to the original paper, or on a research data repository. Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
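What "machine-processable" might look like in practice: a structured record that links the data back to its publication. The field names below are illustrative, not a standard schema:

```python
import json

# Sketch of a FAIR-style machine-readable data record; field names
# are invented for illustration, not taken from any standard.
record = {
    "doi": "10.1234/example",  # link back to the publication of origin
    "title": "Example dataset",
    "variables": [{"name": "temperature", "unit": "K"}],
    "license": "CC-BY-4.0",
}

# A uniform serialization is what makes downstream mining possible.
serialized = json.dumps(record)
```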
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, either physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey considerably more information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their documents database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To build the system and support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms, so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides coverage of over 160 countries for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the Swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in patent, technical and business intelligence research, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are essential to assess and manage the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
2. Agenda
• Who we are and what we do
• Setting the scene – an architecture for our discussion and the key challenges
• The localization workflow, and why content localization and search are intertwined
• Illustration with a practical example
• Summary & recommendations
4. MARKETS AND SOLUTIONS
• eCommerce and Online Travel
Automated, high-volume localization of complex product catalogue information as well as user-generated content and reviews
• Online Research System and Digital Publishing
Automated, high-volume tagging, language processing, translation and transliteration of legal, intellectual property, scientific, financial and business information content, as well as generation of relevant metadata
• Government & Intelligence
Automated, high-volume language identification, entity and entity-relationship recognition, sentiment analysis, linking, and translation and transliteration of various information sources
• Technology & Enterprise
Complex language processing, tagging, enrichment and localization
• Localization Industry
Support of complex and high-volume localization
• Media and Subtitling
Subtitle extraction and production from different sources, support for re-writing source material for subtitling, localization and post-editing, automated placement in frames and improvement
• eDiscovery
Automated, high-volume content tagging, localization and discovery for litigation data gathering, analysis and support
7. HOW DO I KNOW WHAT TO “ASK” FOR?
[Diagram: unstructured and structured data feeding a search “engine”]
The user-side questions:
• How do I construct the right query / search?
• How do I know what keywords to use?
Possible aids:
• Semantic or concept search
• Keyword lists
• Domain classifications
• Keyword-based domain classification (AI)
• …
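One of the aids listed above, keyword-based domain classification, can be sketched very simply: score the user's query tokens against per-domain keyword lists and suggest the best-matching domains. The domain names and keyword sets below are illustrative assumptions, not part of any specific product.

```python
# Hypothetical sketch: rank domains by keyword overlap with a query,
# to help a user work out what to "ask" for.
DOMAIN_KEYWORDS = {
    "pharma": {"myeloma", "tumor", "refractory", "dosage"},
    "telecom": {"antenna", "bandwidth", "modulation", "handover"},
}

def suggest_domains(query: str) -> list[str]:
    """Return domains ordered by how many query tokens hit their keyword list."""
    tokens = set(query.lower().split())
    scores = {d: len(tokens & kws) for d, kws in DOMAIN_KEYWORDS.items()}
    return [d for d, s in sorted(scores.items(), key=lambda x: -x[1]) if s > 0]

print(suggest_domains("refractory multiple myeloma tumor size"))  # -> ['pharma']
```

A production system would replace the hand-written keyword sets with lists derived from domain corpora or a trained classifier, but the shape of the problem is the same.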
8. HOW DO WE DEAL WITH MULTI-LINGUAL CONTENT?
[Diagram: unstructured and structured data feeding a search “engine”]
Option 1: Normalize to a single language
• What domain? How do we maintain quality, and what is quality? What language do we normalize to?
Option 2: Cross-lingual search
• What kind of data? Is normalization or transliteration needed? How do we deal with variants?
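Option 1 above can be sketched as a small routing step in front of the index: detect the language of each document and translate anything that is not already in the chosen pivot language. This is a minimal sketch under stated assumptions; `detect_language` and `translate` are stand-ins for real components (a trained language identifier and an MT engine), and the tiny lookup table covers only the demo phrase.

```python
# Minimal sketch of "normalize to a single language" before indexing.
PIVOT = "en"

def detect_language(text: str) -> str:
    # Stand-in: a real system would use a trained language identifier.
    return "de" if any(c in "äöüß" for c in text) else "en"

def translate(text: str, src: str, tgt: str) -> str:
    # Stand-in for an MT engine call; covers only this demo phrase.
    demo = {("de", "en"): {"Größe des Tumors": "tumor size"}}
    return demo.get((src, tgt), {}).get(text, text)

def normalize_for_index(doc: str) -> str:
    """Route a document: pass through if already in the pivot, else translate."""
    lang = detect_language(doc)
    return doc if lang == PIVOT else translate(doc, lang, PIVOT)

print(normalize_for_index("Größe des Tumors"))  # -> "tumor size"
```

Option 2 (cross-lingual search) would instead leave the documents alone and expand or translate the *query* at search time; the trade-offs between the two are exactly the questions listed on the slide.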
9. THE GENERIC LOCALIZATION WORKFLOW
1. Extraction – extract from source format to text or XML
2. Enrichment – identifying entities and entity relationships, adding metadata, sentiment analysis, etc.
3. Translation – translation and/or transliteration, normalizing terminology, maintaining metadata
4. Enrichment – post-translation corrections, additional enrichment and classification, etc.
5. Delivery – delivery to user / application, with or without enrichments
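The five stages above can be sketched as a simple function pipeline. Every stage body here is a placeholder; in practice each would call real extraction, NER/enrichment, MT and post-editing components, but the composition is the point.

```python
# The generic localization workflow as a composed pipeline (placeholder stages).
def extract(raw: bytes) -> str:          # 1. source format -> text or XML
    return raw.decode("utf-8")

def enrich_pre(text: str) -> dict:       # 2. entities, metadata, sentiment
    return {"text": text, "entities": []}

def translate(doc: dict) -> dict:        # 3. translation / transliteration
    doc["translated"] = doc["text"]      # stand-in: identity "translation"
    return doc

def enrich_post(doc: dict) -> dict:      # 4. corrections, classification
    doc["classified"] = True
    return doc

def deliver(doc: dict) -> dict:          # 5. hand off to user / application
    return doc

def localization_pipeline(raw: bytes) -> dict:
    return deliver(enrich_post(translate(enrich_pre(extract(raw)))))

result = localization_pipeline(b"some source document")
```

Modelling the workflow this way makes the key architectural property explicit: metadata added in stage 2 must survive stages 3–5, which is why each stage passes the whole document record along rather than just the text.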
- Translation naturally provides the translated source, using either statistical or neural machine translation
- However, the by-products and translation capabilities that are interesting in this context are:
- The ability to normalize terminology
- Pre-processing and enriching content prior to translation (tagging, conversion, …)
- Using the term analysis generated during the engine build
Example – variant MT outputs for the same Slovak source terms (English glosses added):
• Extrémne problémy / extrémne problémy / extrémne problémy / extrémnej problémy (“extreme problems”)
• refraktérnym mnohopočetným myelómom / refraktérnym mnohopočetným myelómom / refraktérnym mnohopočetným myelómom / žiaruvzdorné myelómom je mladších (“refractory multiple myeloma”; the last variant picks the wrong, material sense of “refractory”)
• veľkosti nádoru / veľkosť nádoru / veľkosti nádoru / veľkosti nádoru (“tumour size”)
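Terminology normalization, the by-product mentioned above, can be sketched as a mapping from observed variant surface forms to one canonical term. The mapping below is hand-written for the Slovak variants shown; in a real system it would be derived from the term analysis produced during the engine build, and the chosen canonical forms are an assumption for illustration.

```python
# Hedged sketch: collapse divergent MT output variants onto canonical terms.
CANONICAL = {
    "refraktérnym mnohopočetným myelómom": "refraktérny mnohopočetný myelóm",
    "žiaruvzdorné myelómom je mladších": "refraktérny mnohopočetný myelóm",
    "veľkosti nádoru": "veľkosť nádoru",
    "extrémnej problémy": "extrémne problémy",
}

def normalize_term(surface: str) -> str:
    """Map a variant surface form to its canonical term; pass unknowns through."""
    return CANONICAL.get(surface.strip(), surface)

print(normalize_term("veľkosti nádoru"))    # -> "veľkosť nádoru"
print(normalize_term("extrémnej problémy")) # -> "extrémne problémy"
```

Normalizing terms this way before indexing means a single query form matches all the variants an MT engine produced, which is exactly why translation by-products matter for search quality.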