The global data sphere, consisting of machine data and human data, is growing exponentially, reaching the order of zettabytes. In comparison, the processing power of computers has been stagnating for many years. Artificial Intelligence – a newer variant of Machine Learning – bypasses the need to understand a system when modelling it; however, this convenience comes at the cost of extremely high energy consumption.
The complexity of language makes statistical Natural Language Understanding (NLU) models particularly energy hungry. Since most of the zettabyte data sphere consists of human data, such as texts or social networks, we face four major obstacles:
1. Findability of Information – when truth is hard to find, fake news rule
2. Von Neumann Gap – when processors cannot process faster, then we need more of them (energy)
3. Stuck in the Average – when statistical models generate a bias toward the majority, innovation has a hard time
4. Privacy – if user profiles are created “passively” on the server side instead of “actively” on the client side, we lose control
The current approach to overcoming these limitations is to use larger and larger data sets on more and more processing nodes for training. Instead, AI algorithms should be optimized for efficiency rather than precision. By that measure, statistical modelling is disqualified as a brute-force approach for language applications. As a replacement for statistical modelling and arithmetic, set theory and geometry seem a much better choice, as they allow the direct processing of words instead of their occurrence counts – which is exactly what the human brain does with language, using only 7 Watts!
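The set-theoretic alternative hinted at above can be illustrated with a toy sketch. The word "fingerprints" below are entirely invented for illustration: words are represented as sparse sets of context features, and similarity is plain set overlap rather than arithmetic on occurrence counts.

```python
# Toy sketch: words as sparse sets of context features; similarity by set overlap.
# The vocabulary and feature sets below are invented for illustration only.

fingerprints = {
    "dog": {"animal", "pet", "bark", "fur", "leash"},
    "cat": {"animal", "pet", "fur", "purr", "claw"},
    "car": {"vehicle", "engine", "wheel", "road"},
}

def overlap_similarity(w1, w2):
    """Jaccard overlap of two word fingerprints: a set operation, no counting."""
    a, b = fingerprints[w1], fingerprints[w2]
    return len(a & b) / len(a | b)

print(overlap_similarity("dog", "cat") > overlap_similarity("dog", "car"))  # True
```

The comparison needs only intersection and union of small sets, which is the kind of direct word-level operation the abstract contrasts with statistical counting.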
AI-SDV 2021: Jay ven Eman - implementation-of-new-technology-within-a-big-pha...Dr. Haxel Consult
Synonyms break search! How? Why is this important? What a synonym is and how it breaks search will be explained with real-world examples. AI-based solutions are proposed, and relevant standards are identified. How synonym solutions should be used for search is explained. Learn what you can do yourself. Tools help, but it doesn’t have to be complicated or expensive. It is as straightforward as setting priorities!
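How a missing synonym breaks search can be shown in a few lines. This is a minimal sketch with invented documents and a hand-made synonym ring; a real system would draw its synonyms from a maintained thesaurus rather than a hard-coded dictionary.

```python
# Minimal sketch: exact-match search misses documents that use a synonym;
# expanding the query with a synonym ring restores recall.
# Documents and synonyms are invented for illustration.

docs = {
    1: "the attorney filed the brief",
    2: "the lawyer reviewed the contract",
}
synonyms = {"attorney": {"attorney", "lawyer"}, "lawyer": {"attorney", "lawyer"}}

def search(term, expand=False):
    terms = synonyms.get(term, {term}) if expand else {term}
    return sorted(d for d, text in docs.items()
                  if any(t in text.split() for t in terms))

print(search("attorney"))               # [1] -- misses document 2
print(search("attorney", expand=True))  # [1, 2] -- finds both
```

The unexpanded query silently loses half the relevant results, which is exactly the failure mode the talk describes.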
AI-SDV 2021: Heiko Wongel - Machine learning tools in patent searching - are ...Dr. Haxel Consult
• The rapidly growing amount of patent documentation will soon no longer be manageable with conventional search methods - possibly this is already the case today.
• For a couple of years now, the application of machine learning (ML) methods has been discussed as a potential solution for reducing human effort in searching large amounts of patent data. While some promising projects and ideas in this direction have been presented by different sources, the real breakthrough for ML as a standard and widely accepted patent search method has not happened yet.
• The presentation looks into the challenges that still exist in this area, especially as far as practical applicability and acceptance by users is concerned, using the Intergator Smart Search project as an example.
What do PLI, MetOpera, ASCO, and PLOS have in common? Content management and content discovery needed major improvements. Users were not getting the results they needed. The content production team – editors, managing editors, the whole team – could no longer cope with the volume and variety. Content quality was suffering. Brief discussions of each organization’s challenges set the stage for AI-based, human-curated solutions. What worked, what didn’t, and the how and the why will be presented.
AI-SDV 2020: Special Hypertext Information Treatment ...Dr. Haxel Consult
With all new technologies and intelligence one may think that all information issues will be solved in the (near) future. However, one of the most fundamental issues at hand is that with the lack of reliable, quality information there is no useable output to work with in the first place. This presentation looks at the global challenges that we are still faced with today relating to content that will keep us from truly intelligent discovery in the future if nothing is done.
Kairntech combines technologies from natural language processing (NLP) and machine learning to support clients in analysing large amounts of text-based information.
You find more information at https://kairntech.com/
AILANI is a novel and unique semantic search enterprise solution for fast, easy and comprehensive knowledge discovery. It combines semantic modelling, ontologies, linguistics and artificial intelligence (AI) algorithms in a self-refining system that delivers results based on the inter-related meaning of facts. AILANI not only allows phrase searches and structured queries; it also offers its users a unique hybrid natural language question answering system combining machine learning algorithms with semantic network-based "prior knowledge" inference. It integrates seamlessly with existing infrastructure and helps leverage knowledge buried both in decade-old data and in data derived from news feeds and clinical trials, providing real-time semantic analysis of breaking news. For the pharmaceutical industry it is critical to stay up to date with the latest clinical trials news for decision-making in drug development. Integrating the relevant data and using ontology-based refiners enables fast and efficient retrieval of information about the clinical competitive landscape.
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
Applications of machine learning on NLP tasks today receive a lot of attention and have been shown to yield state-of-the-art results on a wide range of tasks. We describe several cases where machine learning is deployed productively under the usual constraints of real-world projects: fast throughput, reasonably low requirements in terms of training corpus size, and high-quality results. What we observe is a general trend towards open source – our components, too, are open source. With the software being mostly freely available, the key success criterion for many NLP projects today is therefore first and foremost the expertise required to combine, tune and apply open source components.
IC-SDV 2018: Aleksandar Kapisoda (Boehringer) Using Machine Learning for Auto...Dr. Haxel Consult
Focusing on the significance of targets is one of the key drivers for quality of web search.
Filtering targeted companies based on the significance of their business model for the expected search results was one of our “nice to haves” last year.
Evaluating a number of artificial intelligence approaches based on neural networks, classical machine learning and semantic technologies led us to a working hybrid approach.
AI is Not Magic: It’s Time to Demystify and Apply Srinivasan Parthiban (VINGY...Dr. Haxel Consult
The term Artificial Intelligence was first coined in 1956, and since then the technology has progressed, disappointed, and re-emerged. Now the prediction is that AI will add $16 trillion to the global economy by 2030. AI is becoming as fundamental as electricity, the internet, and mobile were once they entered the mainstream. Not having an AI strategy in 2020 will be like not having a mobile strategy in 2010, or an Internet strategy in 2000. As a result of technology advancements, AI-related patent applications have surged over recent years. The patent searchers, information professionals and bioinformatics researchers who have been involved in collecting, organizing and analysing data are starting to move up the ladder with AI. Of course, AI can help you, your business, your employees, and your customers, but you need a prescriptive approach to harness its power and put AI to work. This presentation will take a glimpse under the hood of AI and look into some recent trends in data and analytics that are relevant to information professionals.
AI-SDV 2020: AI-augmented Question Answering and Semantic Search for Life Sci...Dr. Haxel Consult
We have created structure-based chemical ontologies that are used to classify chemical compounds automatically. These classifications can be used successfully in semantic search engines to find all representatives of a chemical class. In the present paper we demonstrate use cases in which these chemical classes serve as features in typical machine learning approaches.
Thus, we have used the co-occurrence of chemical compounds with biological and physico-chemical properties in scientific articles to train models that predict properties of novel compounds that did not occur in those training sets. Examples include the prediction of hepatotoxicity and of bioavailability. In principle, one can use any property that is found in the textual vicinity of compounds to build such predictive models. Criteria will be presented for judging the quality and predictive power of such models.
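The co-occurrence idea can be sketched in a few lines. The corpus, compound names and class memberships below are entirely hypothetical; a real model would train a proper classifier over ontology-class features rather than score raw counts.

```python
from collections import Counter

# Sketch: count how often each chemical class co-occurs with a property term
# in a toy corpus, then score a novel compound via its class memberships.
# All data here is invented for illustration.

corpus = [
    ("nitrosamine", "hepatotoxicity"),
    ("nitrosamine", "hepatotoxicity"),
    ("sugar_alcohol", "bioavailability"),
]
compound_classes = {"NDMA": {"nitrosamine"}, "xylitol": {"sugar_alcohol"}}

cooc = Counter(corpus)  # (class, property) -> co-occurrence count

def score(compound, prop):
    """Sum co-occurrence counts of the compound's classes with the property."""
    return sum(cooc[(cls, prop)] for cls in compound_classes[compound])

print(score("NDMA", "hepatotoxicity"))    # 2 -- class seen with the property
print(score("xylitol", "hepatotoxicity")) # 0 -- no supporting evidence
```

A compound absent from the training texts still gets a score, because the prediction flows through its ontology class rather than the compound name itself.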
Averbis specializes in text mining and machine-learning-based patent monitoring. We help our clients screen large numbers of patents in no time, estimate their relevancy for the company and automatically classify them into customer-specific categories. Our approach is based on artificial intelligence, so that it learns from and imitates the behavior of IP professionals. Compared to conventional rule-based approaches, our approach is up to 400% more accurate and achieves the same accuracy as manual monitoring. At the same time, it reduces manual patent monitoring intervention by up to 80%. Thanks to Information Discovery, we enable IP professionals to reduce backlogs, improve staff efficiency and minimize inconsistencies associated with patent monitoring, ultimately improving the experience both for you and your customers.
Python for Data Science | Python Data Science Tutorial | Data Science Certifi...Edureka!
( Python Data Science Training : https://www.edureka.co/python )
This Edureka video on "Python For Data Science" explains the fundamental concepts of data science using Python. It will also help you to analyze, manipulate and implement machine learning using various Python libraries such as NumPy, Pandas and Scikit-learn.
This video helps you to learn the below topics:
1. Need of Data Science
2. What is Data Science?
3. How Python is used for Data Science?
4. Data Manipulation in Python
5. Implement Machine Learning using Python
6. Demo
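As a taste of what the listed topics cover, here is a minimal NumPy sketch of "implementing machine learning using Python": ordinary least squares on made-up data. The tutorial itself uses Pandas and Scikit-learn for the fuller workflow; this is only a dependency-light illustration.

```python
import numpy as np

# Made-up data: y = 2x + 1 with no noise, so least squares recovers it exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a bias column; solve min ||A w - y|| for w = (slope, intercept).
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]

print(round(slope, 6), round(intercept, 6))  # 2.0 1.0
```

The same fit done with `sklearn.linear_model.LinearRegression` would return identical coefficients; NumPy just makes the underlying linear algebra visible.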
Subscribe to our channel to get video updates. Hit the subscribe button above.
Check out our Python Training Playlist: https://goo.gl/Na1p9G
Applications of Data Science in Drug Discovery, Financial Services, Project Management, Human Resources and Marketing.
By Dr. Laila Alabidi at the JOSA Data Science Meetup on 17/8/2019.
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
Advances in text mining, analytics and machine learning are transforming our applications and enabling ever more powerful capabilities, yet most applications and platforms are designed to deal with a single (normalized) language. Hence, as our applications and platforms are increasingly required to ingest international content, the challenge becomes finding ways to normalize content to a single language without compromising quality. An extension of this question is how we define quality in this context and what, if any, by-products a localization effort can produce that may enhance the usefulness of the application.
This talk will, using patent searching as an example use case, review the challenges and possible solution approaches for handling localization effectively and will show what current emerging technology offers, what to expect and what not to expect and provide an introductory practical guide to handling localization in the context of data mining and analytics.
In this talk, we introduce the Data Scientist role, differentiate investigative and operational analytics, and demonstrate a complete Data Science process using Python ecosystem tools like IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn. We also touch on the use of Python in a Big Data context, using Hadoop and Spark.
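The investigative half of that process can be illustrated without any heavy dependencies. The event log below is invented; the talk itself would do this grouping with Pandas on real data.

```python
from collections import defaultdict

# Toy event log (invented): investigative analytics here means ad-hoc
# grouping and summarising to answer a one-off question.
events = [
    {"user": "a", "ms": 120},
    {"user": "a", "ms": 80},
    {"user": "b", "ms": 200},
]

totals = defaultdict(int)
for e in events:  # plain-Python equivalent of df.groupby("user")["ms"].sum()
    totals[e["user"]] += e["ms"]

print(dict(totals))  # {'a': 200, 'b': 200}
```

Operational analytics would wrap the same aggregation in a scheduled, monitored pipeline; the distinction is in how the computation is run, not in the computation itself.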
“Artificial Intelligence” covers a wide range of technologies today, including those that enable machine vision, affective computing, deep learning, and natural language processing. As advances increase, so do expectations. We now see a rush to add “AI inside” for applications and appliances in almost every domain. The reality is that some firms will have mega-hits with AI-enabled applications, and many more will suffer setbacks based on flawed adoption strategies.
This webinar will present an assessment of key AI technologies today, and help participants identify promising applications based on matching requirements to mature-enough technologies.
A number of recent milestones in AI have rekindled the faith that human-grade computer intelligence can fuel the next technological revolution. In parallel and almost independently, the job role of Data Scientist rose to become one of the hottest tickets in the technology sector. Despite the obvious overlap between the domains of Data Science and Artificial Intelligence, the two approaches are sufficiently distinct that choosing the wrong one might cause a product to fail or a hiring process to go wrong. This presentation will offer some clarity and best practices with regard to understanding what data analysis requirements you really have, as opposed to what you think you have.
Biology, medicine, physics, astrophysics, chemistry: all these scientific domains need to process large amounts of data with more and more complex software systems. For achieving reproducible science, there are several challenges ahead involving multidisciplinary collaboration and socio-technical innovation, with software at the center of the problem. Despite the availability of data and code, several studies report that the same data analyzed with different software can lead to different results. I see this problem as a manifestation of deep software variability: many factors (operating system, third-party libraries, versions, workloads, compile-time options and flags, etc.), themselves subject to variability, can alter the results, up to the point that they can dramatically change the conclusions of some scientific studies. In this keynote, I argue that deep software variability is both a threat and an opportunity for reproducible science. I first outline some works on (deep) software variability, reporting preliminary evidence of complex interactions between variability layers. I then link the ongoing work on variability modelling to deep software variability in the quest for reproducible science.
A sentient network - How High-velocity Data and Machine Learning will Shape t...Wenjing Chu
Dell's Distinguished Engineer Wenjing Chu discusses innovations in applying Machine Learning to solve challenges in Telco/Communication Services, and predicts that the future is a Sentient Network powered by Machine Learning that can handle real-time high-velocity data.
Presentation at Circuits of Profit and NetONets in Budapest on 6 and 7 June 2011.
The videos in the presentation are missing from this SlideShare copy. They will be available later.
Big Data to SMART Data : Process scenario
Scenario for implementing a process that transforms raw data into exploitable, representative data, covering streaming, distributed systems, messaging, storage in a NoSQL environment, and graphic visualization of the data within a Big Data ecosystem, using the following technologies:
Apache Storm, Apache Zookeeper, Apache Kafka, Apache Cassandra, Apache Spark and Data-Driven Document.
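The Big-Data-to-SMART-data flow described above can be mimicked with a dependency-free generator pipeline. This is a conceptual sketch only: in the real scenario, Kafka topics, Spark jobs and Cassandra tables take the place of these Python stand-ins.

```python
# Conceptual sketch of the pipeline: ingest -> clean/transform -> store.
# Each stage is a plain Python construct standing in for Kafka, Spark
# and a NoSQL store; the data records are invented for illustration.

def ingest(raw_messages):  # stand-in for consuming a Kafka topic
    for msg in raw_messages:
        yield msg

def transform(stream):     # stand-in for a Spark streaming job
    for msg in stream:
        if msg.get("value") is not None:  # drop unusable records
            yield {"key": msg["key"], "value": float(msg["value"])}

store = {}                 # stand-in for a Cassandra-like table

raw = [{"key": "t1", "value": "3.5"}, {"key": "t2", "value": None}]
for record in transform(ingest(raw)):
    store[record["key"]] = record["value"]

print(store)  # {'t1': 3.5}
```

The point of the sketch is the shape of the flow: each stage consumes a stream and emits a cleaner one, so only "SMART" (validated, typed) data reaches storage.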
SAMOA: A Platform for Mining Big Data Streams (Apache BigData Europe 2015)Nicolas Kourtellis
A general overview of the APACHE SAMOA platform for mining big data streams using machine learning algorithms running on distributed stream processing platforms such as Apache STORM, Apache Flink, Apache Samza and Apache Apex.
Results are shown from experimentation with VHT, the Vertical Hoeffding Tree proposed in "VHT: Vertical Hoeffding Tree." N. Kourtellis, G. De Francisci Morales, A. Bifet, A. Murdopo. IEEE BigData 2016.
Presentation in APACHE BIG DATA Europe 2015
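The Hoeffding bound at the heart of VHT and other Hoeffding trees is easy to state: after n observations of a quantity with range R, the true mean lies within epsilon of the observed mean with probability 1 − delta, where epsilon = sqrt(R² ln(1/delta) / 2n). A small sketch of the standard formula (the split-decision framing follows the Hoeffding-tree literature; the numbers below are arbitrary):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)); shrinks as observations grow."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# A Hoeffding tree splits a node once the gain gap between the two best
# attributes exceeds epsilon -- the choice is then correct with probability
# at least 1 - delta, without ever revisiting the data.
eps_small_n = hoeffding_bound(1.0, 1e-7, 200)
eps_large_n = hoeffding_bound(1.0, 1e-7, 20000)
print(eps_large_n < eps_small_n)  # True: more data -> tighter bound
```

This bound is what lets a streaming tree commit to split decisions on a single pass, which is the property the distributed VHT experiments exploit.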
Digital Transformation and Innovation on http://denreymer.com
- Merging the Real World and the Virtual World
- Intelligence Everywhere
- The New IT Reality Emerges
http://www.gartner.com//it/content/2940400/2940420/january_15_top_10_technology_trends_2015_dcearley.pdf
Synergy of Human and Artificial Intelligence in Software EngineeringTao Xie
Keynote Talk by Tao Xie at International NSF sponsored Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE 2013) http://promisedata.org/raise/2013/
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a project with partner Fraunhofer SCAI in the life sciences, where a knowledge graph organising detailed facts about psychiatric diseases has been computed.
Information of cause-effect relations between proteins, genes, drugs and diseases has been encoded in the BEL (Biological Expression Language) and imported into a Graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on the newly published literature.
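The update-by-rerun idea can be sketched with a minimal in-memory graph. The triples below are hypothetical and only loosely styled after BEL cause-effect statements; a production system would use a real graph database.

```python
# Minimal knowledge graph: (subject, relation, object) triples, rebuilt by
# "rerunning the analysis" over whichever corpus of statements is current.
# Entity and relation names are invented for illustration.

def build_graph(statements):
    graph = {}
    for subj, rel, obj in statements:
        graph.setdefault(subj, set()).add((rel, obj))
    return graph

corpus_v1 = [("GeneA", "increases", "ProteinB")]
corpus_v2 = corpus_v1 + [("DrugX", "decreases", "ProteinB")]  # newly published

g1 = build_graph(corpus_v1)
g2 = build_graph(corpus_v2)  # the "update" is simply a rerun over more text
print(sorted(g2))            # ['DrugX', 'GeneA']
```

Because the graph is a pure function of the extracted statements, keeping it current reduces to rerunning extraction on newly published literature, exactly as the abstract describes.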
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.
IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:
further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
to help innovators best protect and commercialise their green tech innovations both at home and internationally.
The UKIPO has been developing a broad portfolio of 'green' IP analytics research. A series of patent analytics reports on green technologies has been published, and analysis has been conducted of the UK's Green Channel scheme for accelerated processing of green patent applications. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK 'green' trade marks, identified using a machine learning algorithm, provides a commercialisation angle to our research.
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In processing technical English texts, a multi-word unit method is often deployed as a word-formation strategy to expand the working vocabulary, i.e., introducing a new concept without the invention of an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed the mentioned text mining tools. For the final outline, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state-of-the-art within the patent text mining research community in 2022.
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
In 2013 we witnessed an evolutionary change in the NLP field thanks to the introduction of space embeddings which, with the use of deep learning architectures, achieved human-level performance in many NLP tasks. With the introduction of the attention mechanism in 2017 the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, with comparison metrics against more traditional relevance-based search engines. Moreover, I will share some initial results from a paper currently under review that provides insight on hyperparameter tuning during the generation of embeddings.
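The contrast between the two search styles can be sketched in a few lines of pure Python (all weights and vectors below are made up for illustration): a relevancy engine scores by weighted term overlap, while an embedding engine scores by vector proximity, so paraphrases that share no terms can still match.

```python
import math
from collections import Counter

def tfidf_score(query, doc, idf):
    """Toy relevancy score: sum of idf weights of terms shared by query and doc."""
    q, d = Counter(query.split()), Counter(doc.split())
    return sum(idf.get(t, 0.0) * d[t] for t in q if t in d)

def cosine(u, v):
    """Embedding score: cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Illustrative idf table and tiny 3-d "embeddings"; real systems use trained,
# high-dimensional vectors.
idf = {"car": 1.0, "insurance": 2.0, "vehicle": 1.0, "cover": 1.5}
emb = {"car insurance": [0.9, 0.1, 0.2], "vehicle cover": [0.85, 0.15, 0.25]}

print(tfidf_score("car insurance", "vehicle cover", idf))  # 0 (no shared terms)
print(cosine(emb["car insurance"], emb["vehicle cover"]))  # high similarity
```

The paraphrase "vehicle cover" scores zero under term overlap but very close under the embedding metric, which is the core behaviour the talk compares.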
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
10 years in the making: how real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web crawling and delivery of targeted intelligence that help R&D-intensive companies gain a competitive advantage.
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity
Susanne Tropf (Syngenta, Switzerland)
Kornel Marko (Averbis, Germany)
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches to concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, topic modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of new emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.
At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of NLP in use at CIPO and the insights we have learned applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.
We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables as they do not capture the non-sequential relations inside them (e.g. interpret the content of a table cell relative to its column title, interpret line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach out there. The main cause for this is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
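One way to honour the non-sequential relations described here (interpreting each cell relative to its column title) is to rebuild header-keyed records before any text analytics run. A minimal Python sketch, assuming an upstream layout-analysis step (the hard part, not shown) has already detected the cell grid:

```python
def table_to_records(grid):
    """Turn a detected table grid (list of rows, first row = header) into
    header-keyed records, so each cell is read relative to its column title."""
    header, *rows = grid
    return [dict(zip(header, row)) for row in rows]

# Hypothetical output of an upstream cell-detection step.
grid = [
    ["Invoice No", "Date", "Amount"],
    ["A-1001", "2022-03-01", "120.00"],
    ["A-1002", "2022-03-05", "87.50"],
]

print(table_to_records(grid)[0]["Amount"])  # 120.00
```

With records in this shape, downstream extraction can ask for "the Amount of invoice A-1001" instead of scanning a flattened text stream.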
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The publication of the research data is usually carried out as so-called "Supplementary Material" attached to the original paper, or on a "Research Data Repository". Both forms have in common that the data is usually published unstructured and not in a uniform, machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses, and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities recently reported in scientific publications.
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, whether physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.
More companies are using archivable video for internal communication about the various research projects, product developments, test results, and more that are being considered, in progress, or completed. Showing how an experiment was conducted can convey information that is very difficult to communicate via text. How do you find a company video that might be helpful for your project?
A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their document database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To support manuscript and video submission, a metadata extraction program pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.
Copyright Clearance Center
A pioneer in voluntary collective licensing, CCC (Copyright Clearance Center) helps organizations integrate, access, and share information through licensing, content, software, and professional services. With expertise in copyright and information management, CCC and its subsidiary RightsDirect collaborate with stakeholders to design and deliver innovative information solutions that power decision-making by helping people integrate and navigate data sources and content assets. CCC recently acquired the assets and technology of Deep SEARCH 9 (DS9), a knowledge management platform that leverages machine learning to help customers perform semantic search, tag content, and discover new insights.
Lighthouse IP is the world’s leading provider of intellectual property content. The core business of Lighthouse IP is sourcing and creating content from the world’s most challenging authorities. Specialized in IP data, Lighthouse IP provides over 160 countries coverage for patents, over 200 authorities for trademarks and over 90 authorities for designs. Lighthouse IP data is available via several partners. The company is headquartered in Schiphol-Rijk in the Netherlands and has offices in the United States, China, Thailand, Vietnam, Egypt, Indonesia and Belarus. Globally a team of 150 experts works on the creation of this unique data collection.
CENTREDOC was created in 1964 as the technical information center of the swiss watchmaking industry. Building on a strong team of engineers, CENTREDOC now offers a complete range of services and solutions for the monitoring of strategic, technological and competitive information. CENTREDOC is also a leader in the research of patent, technical and business intelligence, and offers consulting expertise in the implementation of monitoring solutions.
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are essential to assess and manage the challenges of neural-network-based AI. A workshop report.
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
What if there was a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data would be harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich, scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!
AI-SDV 2021: Francisco Webber - Efficiency is the New Precision
1. Efficiency is the New Precision
Semantic Supercomputing in the Zettabyte age
Francisco De Sousa Webber
Co-Founder & CEO
f.webber@cortical.io
2. Big Bang: Data Explosion
[Chart: global data volume growing from terabytes through petabytes and exabytes to zettabytes by 2025, across the Mainframe/Mini, PC/Client, Internet and Virtualisation eras; data types shown: transactional data, human files, social interactions, machine-generated data (IoT)]
3. Current Status: ML & AI is the Answer … Is it?
- (Text) content increases
- Productivity decreases
- Performance of the current Von Neumann computer platform stagnates
6. Von Neumann Computing Limitations
[Diagram: computing unit (processor with control unit and arithmetic unit), memory with address logic, and the I/O world, connected over a shared bus]
- Address bottleneck: every instruction passes over the bus
- 1000+ instructions, sequential access
8. Statistical Machine Learning Limitations
[Diagram: use-case data and annotated data flow in as training and test data; a training engine produces a use-case model, which an inference engine serves for the data out-flow]
Pain points: insufficient data, slow training, model imprecision, inference latency, manual effort
9. Statistical AI & ML Problem: Efficiency
We need to improve the principle, not just increase the computational power:
- Findability issue
- Von Neumann gap
- Exponential power consumption
- Million-model multiverse
Analogy: a steam car (the initial principle) and a hydrogen fuel-cell car (the latest principle) are both cars, for people, using "water gas"; the difference between them is efficiency.
10. Statistical Modelling: Findability Issue
[Diagram: the user sees only the Google "virtual view" of the internet; outside the Google field of view, the actual internet forms a "blind spot"]
- The internet is growing; the visible internet grows slowly, while the invisible internet grows fast
- The number of new pages grows faster than the number of keywords pointing at them
11. Statistical Modelling: Von Neumann Gap
[Chart, 1980–2020: the data amount grows faster than processing speed, opening an ever wider gap]
The current computing paradigm is insufficient for the growing data load:
- Increased error rates
- Increased power consumption
- Increased processing delays
12. Statistical Modelling: Exponential Energy Need
[Chart, 2018–2030, 0–8%]
- The current global energy consumption of computing devices equals that of global air transport
- In 2030 the global energy consumption of computing devices will reach that of global automobile transportation
13. Statistical Modelling: Million Model Multiverse
Every local use case requires:
- Individually collected and prepared local training data
- Individually labeled data for supervised learning, with a local gold standard
- An individually trained local statistical model
Result: no network effects.
14. Statistical Modelling: Technology Impact
- Findability issue → fake news: when it is hard to find information, it is also hard to find the truth
- Von Neumann gap → climate change: green computing is beyond the Von Neumann gap
- Stuck in the average → innovation gap: statistics averages; innovation is not made by majorities
- Phased ML user profiles → populism: statistical ML models facilitate opinion meddling
15. The Solution: Semantic Folding
- Based on recent findings in neuroscience
- Implemented as an unsupervised machine learning approach
- Replaces complex statistical modelling with analogical computation
16. Semantic Folding: Analogical Computation
Context: bank, account, holder, payment, tax, in-house, manager
- "signed contract" vs. "done deal": 36% fingerprint overlap (similar meanings)
- "star trek" vs. "done deal": 1% fingerprint overlap (different meanings)
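Since semantic fingerprints are sparse binary representations, the similarity judgement above reduces to counting shared set bits. A minimal Python sketch; the bit positions and resulting percentages are invented for illustration (the slide's actual figures are 36% and 1%), and the real Retina uses a far larger semantic space:

```python
def overlap(fp_a, fp_b):
    """Percentage of fp_a's set bits that also appear in fp_b."""
    return 100 * len(fp_a & fp_b) // len(fp_a)

# Hypothetical fingerprints: sets of active bit positions in a semantic space.
signed_contract = {3, 17, 42, 99, 204, 511, 768, 901, 1500, 2047}
done_deal       = {3, 17, 42, 99, 204, 600, 768, 901, 1500, 3000}
star_trek       = {5, 80, 204, 777, 1234, 2500, 3111, 3500, 3900, 4000}

print(overlap(signed_contract, done_deal))  # 80: similar meanings
print(overlap(signed_contract, star_trek))  # 10: different meanings
```

Because the comparison is a set intersection (bitwise AND on real hardware), it needs no arithmetic on occurrence counts, which is the deck's point about analogical computation.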
17. Semantic Fingerprinting
- Training of the semantic space: reference material → semantic word fingerprint dictionary
- Converting text into semantic fingerprints: use-case data → semantic text fingerprint
- Comparing semantic fingerprints
18. Level 1: Word Fingerprints
Fingerprint generation for "organ" aggregates its contexts:
- Context 1 (anatomy): liver, heart, muscle, endothelia, body, anatomy
- Context 2a (composers): composer, baroque, music, score, Johann Sebastian Bach
- Context 2b (instruments): piano, guitar, trombone, flute, trumpet, quartet, music
- Context 3 (architecture): church, altar, baroque, architecture, renaissance
19. Level 2: Text Fingerprints
Example: "organs and pianos are musical instruments"
The word fingerprints (1, 2, 3, 4) are combined by aggregation + sparsification into a single text fingerprint.
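The aggregation + sparsification step can be sketched as: count how often each bit is set across the word fingerprints, then keep only the most frequently hit bits so the text fingerprint stays as sparse as its inputs. The bit positions and the `keep` parameter below are illustrative assumptions, not Cortical.io's actual encoding:

```python
from collections import Counter

def text_fingerprint(word_fps, keep=4):
    """Aggregate word fingerprints, then sparsify by keeping the `keep`
    most frequently set bits."""
    counts = Counter(bit for fp in word_fps for bit in fp)
    return {bit for bit, _ in counts.most_common(keep)}

# Hypothetical word fingerprints for "organs and pianos are musical instruments".
organ       = {1, 5, 9, 12}
piano       = {5, 9, 14, 20}
musical     = {5, 9, 12, 33}
instruments = {5, 14, 33, 40}

print(sorted(text_fingerprint([organ, piano, musical, instruments])))
```

Bits shared by many words (here 5 and 9) survive sparsification, so the text fingerprint concentrates on the meaning the words have in common.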
20. Many Languages – One Semantic Fingerprint
Concepts and their representations are stable across languages:
philosophy (EN), philosophie (FR), filosofía (ES), философия (RU), فلسفة (AR), 哲學 (ZH)
21. NLU Primitive 1: Semantic Search
[Diagram: an example document serves as the query; a similarity engine matches it against a document index and returns a result set of the most similar documents, ranked along the user's information need]
22. NLU Primitive 2: Semantic Classification
[Diagram: a semantic space trained for compliance; a semantic filter fingerprint (SH1_email AND SH2_email AND SH3_email) sorts incoming emails such as Email 12, Email 189 and Email 2443 into a positive and a negative class]
23. NLU Primitive 3: Keyword Extraction
Input text: "Socrates 470/469 – 399 BC was a classical Greek (Athenian) philosopher credited as one of the founders of Western philosophy. He is an enigmatic figure known chiefly through the accounts of classical writers, especially the writings of his students Plato and Xenophon and the plays of his contemporary Aristophanes. Plato's dialogues are among the most comprehensive accounts of Socrates to survive from antiquity, though it is unclear the degree to which Socrates himself is hidden behind his best disciple, Plato."
Aggregation of word fingerprints yields a text fingerprint; keywords are extracted by maximizing for similarity:
["plato", "socrates", "philosopher", "aristophanes", "antiquity", "writings", "xenophon", "dialogues", "disciple", "philosophy"]
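The "maximize for similarity" step can be sketched as ranking candidate words by the overlap of their word fingerprint with the document's text fingerprint. All fingerprints below are invented toy values:

```python
def extract_keywords(text_fp, word_fps, k=3):
    """Rank candidate words by overlap of their fingerprint with the
    text fingerprint; return the top k as keywords."""
    ranked = sorted(word_fps, key=lambda w: len(word_fps[w] & text_fp), reverse=True)
    return ranked[:k]

# Hypothetical fingerprints; text_fp would come from aggregating the document.
text_fp = {5, 9, 12, 14, 33}
word_fps = {
    "plato":    {5, 9, 12, 50},
    "socrates": {5, 9, 33, 60},
    "weather":  {70, 80, 90, 100},
}

print(extract_keywords(text_fp, word_fps, k=2))  # ['plato', 'socrates']
```

Words whose fingerprints barely intersect the text fingerprint (here "weather") fall to the bottom of the ranking and are never selected.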
24. Semantic Folding based Machine Learning
Example input: a text passage on snoring remedies (mechanical devices such as splints and braces, nasal strips, continuous positive airway pressure, surgical procedures, and side-sleeping advice).
The Retina Engine feeds downstream algorithms (SVM, random forest, DL network, algorithm 1 … algorithm n) for classification, clustering, prediction, generating, computing and analyzing.
25. Cortical.io Engines
Semantic Engine capabilities: semantic search, semantic annotation, document classification/clustering, keyword extraction, context term generation, information discovery, expert finding, text analytics, risk analysis, business intelligence, lease/credit agreements.
26. Hardware Acceleration for Semantic Folding
- Match one query fingerprint against an unlimited number of document fingerprints: enterprise search, discovery search, web search, social media profile search
- Match one filter fingerprint against a stream of incoming fingerprints: real-time document classification, email filtering and routing, deep packet inspection, social media topic detection
Throughput: desktop, each board searches up to 1 billion fingerprints per second; enterprise, each server up to 10 billion; web scale, each rack up to 100 billion.
27. Semantic Super Computing Platform
[Architecture diagram]
- Retina System on a Xilinx Alveo board in an x86 host (storage, CPU cores, memory): Retina Engine (converter, similar-term, context and compare modules; Retina database), Retina Search (document index, fingerprint matcher, search re-ranker), Retina Filter (filter bank, fingerprint matcher, filter re-ranker)
- Application server: administration app, email-filter, semantic-search and further apps with their APIs, plus a management & monitoring API
- Integration layer: identity & access management and SMTP, RDB, CMS, DMS, file-service, web-service and BPM connectors
28. Comparing the Leading NLU Approaches
[Chart: precision (70–90%) vs. speed on a logarithmic axis for classification of the Enron email dataset, Farmer set. Systems compared: Retina Engine on CPU and on FPGA (1x Xilinx Alveo 250), pure keyword baseline (scikit-learn TfidfVectorizer, CPU), FastText (official Facebook pre-trained model, CPU), Doc2Vec and Word2Vec (Gensim implementation, pre-trained Google model, CPU), and BERT (PyTorch bert-base-uncased) on CPU and GPU (AWS g3.8xlarge EC2)]
Data: "The Enron Email Corpus", archived 2011-03-08 at the Wayback Machine; retrieved March 5, 2011.
29. Demonstrated Semantic Folding Use Cases
- Banking: e-mail & chat compliance monitoring; credit risk analysis
- CRM: customer intent analysis
- Legal: contract intelligence; regulatory process optimization
- Financial services: investment signal extraction from news streams
- Life sciences: information discovery
- Media: viewer stream analytics
- Automotive: handbook search; car supplier management; consolidation of car terminology
- Technical support: support intelligence
- Social media: organic topic mining
- Commerce: catalogue management & automation
- Human resources: job description / resume matching
30. NLU by Semantic Folding
- Simplicity: one algorithm, one operator, one data format
- Compositionality: words, sentences, paragraphs, documents
- Analogy: normalized representation, bitwise similarity
- Modelability: unsupervised semantic model generation
- Efficiency: small amounts of reference data
- Scalability: one semantic model, many use cases
- Replicability: same use case in a new domain
- Inspectability: refinement, debugging, verification
- Robustness: "graceful failing"
31. Market Potential – Semantic Folding
[Chart, log scale: the global data sphere (zettabytes) split into transactional data, machine-generated/sensor data, human-generated data and social media data; within these, ML data and text ML data delimit the Semantic Folding potential market]
Contact: info@cortical.io