We hosted a tutorial on Knowledge-infused Deep Learning at the 31st ACM Hypertext Conference on July 14. Broadly, the tutorial covered applications of Broad- and Community-based Knowledge Graphs in Education, Clinical and Social-Media Healthcare, Pandemic, and Cryptomarkets.
We theorized the concept of Knowledge-infusion and showed its importance for achieving explainability and performance gains. We extended the idea of "Knowledge-infused Deep Learning" to Autonomous Driving, Cyber Social Harms, and the DarkWeb.
The tutorial presentation, with relevant resources and references, is available online at http://kidl2020.aiisc.ai.
PyData Salamanca knowledge infusion in healthcare | Manas Gaur
The talk describes a paradigm of knowledge-infused learning in healthcare for explainability, interpretability, and traceability of outcomes, thus bridging the gap between AI and clinical settings and developing architectures that are of clinical relevance.
The emergence in recent years of initiatives like the Linked Open Data (LOD) has led to a significant increase in the amount of structured semantic data on the Web. In this paper we argue that the shareability and wider reuse of such data can very often be hampered by the existence of vagueness within it, as this makes the data’s meaning less explicit. Moreover, as a way to reduce this problem, we propose a vagueness metaontology that may represent in an explicit way the nature and characteristics of vague elements within semantic data.
Towards Purposeful Reuse of Semantic Datasets Through Goal-Driven Summarization | Panos Alexopoulos
The emergence in recent years of initiatives like the Linked Open Data (LOD) has led to a significant increase in the amount of structured semantic data on the Web. Nevertheless, the wider reuse of such public semantic data is inhibited by the difficulty users face in deciding whether a given dataset is actually suitable for their needs. This is because semantic datasets typically cover diverse domains, do not follow a unified way of organizing knowledge, and may differ in a number of dimensions. With that in mind, in this paper we report our work in progress on a goal-driven dataset summarization approach that may facilitate better understanding and reuse-oriented evaluation of available semantic data.
In this talk we will summarise some of the detectable trends in AI beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the enabling infrastructures, challenges, and opportunities in the construction of the next generation of AI systems. The talk will focus on Natural Language Processing (NLP) as an AI sub-domain and will link to the research at the AI Systems Lab at the University of Manchester.
Effective Semantics for Engineering NLP Systems | Andre Freitas
Provide a synthesis of the emerging representation trends behind NLP systems.
Shift in perspective:
Effective engineering (task driven, scalable) instead of sound formalism.
Best-effort representation.
Knowledge Graphs (Frege revisited)
Information Extraction & Text Classification
Distributional Semantic Models
Knowledge Graphs & Distributional Semantics
(Distributional-Relational Models)
Applications of DRMs
KG Completion
Semantic Parsing
Natural Language Inference
An introduction to research in the area of machine reasoning at our Applied AI Institute, Deakin University, Australia, covering visual and social reasoning, neural Turing machines, and System 2.
Building AI Applications using Knowledge Graphs | Andre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWC | Valentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible, not-so-distant future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches and human computing.
Kalpa Gunaratna's Ph.D. dissertation defense: April 19 2017
The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the relationships and classes are defined. Today, there exist large knowledge graphs in the research community (e.g., encyclopedic datasets like DBpedia and Yago) and corporate world (e.g., Google knowledge graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web for consumption, it leads to information overload, and hence proper summarization (and presentation) techniques need to be explored.
In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single entity level and (ii) the multiple entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers importance, which is computed by combining popularity and uniqueness, and diversity of facts getting selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques and form facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and facets using Information Retrieval (IR) ranking techniques to pick the highest ranked facts from these facets for the summary. The important and unique contribution of this approach is that because of its generation of facets, it adds diversity into entity summaries, making them comprehensive. For creating multiple entity summaries, we simultaneously process facts belonging to the given entities using combinatorial optimization techniques. In this process, we maximize diversity and importance of facts within each entity summary and relatedness of facts between the entity summaries. The proposed approach uniquely combines semantic expansion, graph-based relatedness, and combinatorial optimization techniques to generate relatedness-based multi-entity summaries.
Complementing the entity summarization approaches, we introduce a novel approach using light Natural Language Processing (NLP) techniques to enrich knowledge graphs by adding type semantics to literals.
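The facet-based selection described for single-entity summaries can be illustrated with a minimal sketch. It is not the FACES implementation: it assumes that facet assignments and fact scores have already been computed (in the dissertation these come from clustering and IR-style ranking), and the function name, tuple layout, and example facts are hypothetical. The sketch only shows why drawing the top fact from each facet in turn yields a more diverse summary than taking the globally highest-scored facts.

```python
from collections import defaultdict


def faceted_summary(facts, k):
    """Pick up to k facts, taking the best remaining fact from each facet in turn.

    facts: list of (fact, facet_id, score) tuples. Facet ids and scores are
    assumed to be precomputed; they are hypothetical inputs for this sketch.
    """
    by_facet = defaultdict(list)
    for fact, facet, score in facts:
        by_facet[facet].append((score, fact))
    # Rank facts inside each facet, best first.
    for ranked in by_facet.values():
        ranked.sort(reverse=True)
    # Rank facets by the score of their best fact.
    facet_order = sorted(by_facet, key=lambda f: by_facet[f][0][0], reverse=True)

    summary = []
    rank = 0  # 0 = best fact of each facet, 1 = second best, ...
    while len(summary) < k:
        added = False
        for facet in facet_order:
            if rank < len(by_facet[facet]):
                summary.append(by_facet[facet][rank][1])
                added = True
                if len(summary) == k:
                    break
        if not added:  # every facet is exhausted
            break
        rank += 1
    return summary


# Hypothetical facts about one entity, grouped into three facets.
facts = [
    ("birthPlace: Honolulu", "place", 0.9),
    ("residence: Washington", "place", 0.8),
    ("hometown: Chicago", "place", 0.75),
    ("spouse: Michelle Obama", "family", 0.7),
    ("party: Democratic", "politics", 0.6),
]
print(faceted_summary(facts, 3))
```

Ranking by score alone would fill a three-fact summary with the three "place" facts; the round-robin over facets returns one fact from each facet instead.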
This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.
Social media provides a natural platform for the dynamic emergence of citizen (as) sensor communities, where citizens share information, express opinions, and engage in discussions. Often such an Online Citizen Sensor Community (CSC) has stated or implied goals related to the workflows of organizational actors with defined roles and responsibilities. For example, a community of crisis response volunteers may inform the prioritization of responses to resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in a CSC, there are challenges related to information overload for organizational actors, including finding reliable information providers and finding actionable information from citizens. This threatens the awareness and articulation of workflows needed to enable cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing in both types of messages for asking and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professional) for interpreting user-generated data of citizen sensors. Interdisciplinary research involving social and computer sciences is essential to address these socio-technical issues in CSC and to allow better accessibility to user-generated data at a higher level of information abstraction for organizational actors. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes (a) identification of action-related seeking-offering intent behaviors from short, unstructured text documents using a classification model based on both declarative and statistical knowledge, (b) matching of intentions about seeking and offering, and (c) engagement models of users and groups in CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic network connections in the user interaction networks. The results show that fusing top-down knowledge-driven and bottom-up data-driven approaches improves modeling efficiency over conventional bottom-up approaches alone for modeling intent and engagement. Applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement for spreading critical information about prioritized needs, ensuring that citizens donate only required supplies. The engagement interface application also won the United Nations ICT agency ITU's Young Innovator 2014 award.
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra... | Panos Alexopoulos
The advent and wide proliferation of the Social Web in recent years has promoted the concept of social interaction as an important influencing factor in the way enterprises and organizations conduct business. Among the fields influenced is that of Enterprise Knowledge Management, where the adoption of social computing approaches aims at increasing, and maintaining at high levels, the active participation of users in the organization's knowledge management activities. An important challenge here is achieving the right balance between the informality of socially generated data and the required formality of enterprise knowledge. In this context, we focus on the problem of mining vague knowledge from social content generated within an enterprise framework, and we propose a learning framework based on microblogging and fuzzy ontologies.
Semantic Relation Classification: Task Formalisation and Refinement | Andre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations, and could thus be used as a foundation for automatic classification of semantic relations.
This talk provides a speculative contemplation of philosophical topics that might arise with brain-machine interface technology and explores the new ways that individuals and society might self-enact as a result. Brain-machine interfaces that could be pervasive, continuous, and widely-adopted suggest interesting new possibilities for our future selves. From a philosophical perspective, these possibilities concern the definition of what it is to be human, our current existence and interaction with reality, and how all of this could be dramatically different in a scenario of digitally-linked cloudmind collaborations. This talk looks at some of the foundational ontological questions of how the progression of the existence of the classic human might evolve. Perhaps the most pressing question that currently-minded potential adopters have is how to avoid getting irreparably pulled into a groupmind. To protect against this, there could be an expansion and letting go of the term and concepts of personal identity, and humans as a unit of organization, in favor of instead self-relying on a decentralized permissioning structure like blockchain technology for managing empowered and resilient crowdmind participations.
A Deep Learning Model to Predict Congressional Roll Call Votes from Legislati... | mlaij
Developments in natural language processing (NLP) techniques, convolutional neural networks (CNNs), and long short-term memory networks (LSTMs) allow for a state-of-the-art automated system capable of predicting the status (pass/fail) of congressional roll call votes. The paper introduces a custom hybrid model labeled "Predict Text Classification Network" (PTCN), which takes legislation as input and outputs a prediction of the document's classification (pass/fail). The convolutional layers and the LSTM layers automatically recognize features from the input data's latent space. The PTCN's custom architecture provides elements enabling adaptation to the input's variance through adjustment of the kernel weights over time. On the document level, the model reported an average evaluation of 67.32% using 10-fold cross-validation. The results suggest that the model can recognize congressional voting behaviors from the associated legislation's language. Overall, the PTCN provides a solution with competitive performance relative to related systems targeting congressional roll call votes.
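The 10-fold cross-validation protocol mentioned in the abstract can be sketched in plain Python. This is an illustration of the general protocol only, not the paper's evaluation code: `k_fold_indices`, `cross_validate`, and the `evaluate` callback are hypothetical names, and the folds here are contiguous and unshuffled, whereas a real experiment would typically shuffle or stratify the documents first.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, evaluate, k=10):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds.

    evaluate is a hypothetical callback that trains a model on train_idx and
    returns its score (for example, accuracy) on test_idx.
    """
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Each document is used for testing exactly once across the ten folds.
folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])
```

The reported figure in such a setup is the mean of the ten per-fold scores, which is what `cross_validate` returns.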
MalayIK: An Ontological Approach to Knowledge Transformation in Malay Unstruc... | IJECEIAES
An enormous number of unstructured documents written in the Malay language are available on the web and intranets. However, unstructured documents cannot be queried in simple ways, hence the knowledge contained in such documents can neither be used by automatic systems nor be understood easily and clearly by humans. This paper proposes a new approach that uses an ontology to transform knowledge extracted from Malay unstructured documents by identifying, organizing, and structuring the documents into an interrogative structured form. A Malay knowledge base, the MalayIK corpus, is developed and used to test the MalayIK-Ontology against Ontos, an existing data extraction engine. The experimental results from MalayIK-Ontology show a significant improvement in knowledge extraction over the Ontos implementation. This shows that a clear knowledge organization and structuring concept is able to increase understanding, which leads to a potential increase in the sharing and reuse of concepts among the community.
Comparative Study on Lexicon-based sentiment analysers over Negative sentiment | AI Publications
Sentiment Analysis, or Opinion Mining, is one of the latest trends in social listening, which is presently reshaping commercial organisations. It is a significant task of Natural Language Processing (NLP). Product review data is widely available on social media such as Twitter and Facebook and on e-commerce sites such as Amazon and Alibaba. An organisation can gain insight into a customer's view of a product, or into the type of opinion the product has generated in the market, and can then take reactive or preventive measures accordingly. While analysing the above, we have found that negative opinion has a stronger effect on customers' minds than positive opinion. Also, negative opinions are more viral in terms of diffusion. Our present work is based on a comparison of two available rule-based sentiment analysers, VADER and TextBlob, on domain-specific product review data from Amazon.co.in. It investigates which has higher accuracy in classifying negative opinions. Our research has found that VADER’s negative polarity sentiment classification accuracy is higher than TextBlob's.
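To make the idea of a rule-based, lexicon-driven analyser concrete, here is a deliberately tiny sketch. It is neither VADER nor TextBlob and does not reproduce their lexicons or rules: the word scores in `LEXICON`, the `NEGATIONS` set, and the function names are all made up for illustration. It shows only the general mechanism such tools share, which is summing per-word polarity scores and adjusting for simple cues such as negation.

```python
# Hypothetical mini-lexicon: word -> polarity score (made-up values).
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "broken": -1.5,
}
NEGATIONS = {"not", "never", "no"}


def polarity(text):
    """Sum lexicon scores, flipping the sign of a word that follows a negation."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    total = 0.0
    for i, tok in enumerate(tokens):
        score = LEXICON.get(tok, 0.0)
        if score and i > 0 and tokens[i - 1] in NEGATIONS:
            score = -score  # "not good" counts as negative
        total += score
    return total


def label(text):
    """Map the summed score to a coarse sentiment class."""
    s = polarity(text)
    if s < 0:
        return "negative"
    if s > 0:
        return "positive"
    return "neutral"


reviews = [
    "The battery is terrible and the screen arrived broken.",
    "Not good",
    "Great phone, love it",
]
print([label(r) for r in reviews])
```

Comparing two analysers on negative reviews, as the study does, then amounts to running each one over reviews labelled negative and counting how many each classifies as negative.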
Automatic detection of online abuse and analysis of problematic users in wiki... | Melissa Moody
For their 2019 capstone project, DSI Master of Science in Data Science students Charu Rawat, Arnab Sarkar, and Sameer Singh proposed a framework to understand and detect online abuse in the English Wikipedia community.
Rawat, Sarkar, and Singh received the award for Best Paper in the Data Science for Society category at the 2019 Systems and Information Engineering Design Symposium (SIEDS). In "Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia," the team presented an analysis of user misconduct in Wikipedia and a system for the automated early detection of inappropriate behavior.
Data Collection Methods for Building a Free Response Training Simulation | Melissa Moody
Master of Science in Data Science capstone project researchers Vaibhav Sharma, Beni Shpringer, and Michael Yang, along with UVA School of Engineering M.S. student Martin Bolger and Ph.D. students Sodiq Adewole and Erfaneh Gharavi, sought to develop new methods for collecting, generating, and labeling data to aid in the creation of educational, free-input dialogue simulations.
Semantics of the Black-Box: Using knowledge-infused learning approach to make... | Amit Sheth
Keynote at the IEEE ICSC Workshop on Semantic Machine Learning (#SML21: https://ist.gmu.edu/~hpurohit/events/sml21/#keynote):
Video of SML21: https://www.youtube.com/watch?v=cx-l0XDk9Tw
The recent series of deep learning innovations have shown enormous potential to impact individuals and society, both positively and negatively. The deep learning models utilizing massive computing power and enormous datasets have significantly outperformed prior historical benchmarks on increasingly difficult, well-defined research tasks across technology domains such as computer vision, natural language processing, signal processing, and human-computer interactions. However, the Black-Box nature of deep learning models and their over-reliance on massive amounts of data condensed into labels and dense representations pose challenges for the system’s interpretability and explainability. Furthermore, deep learning methods have not yet been proven in their ability to effectively utilize relevant domain knowledge and experience critical to human understanding. This aspect is missing in early data-focused approaches and necessitated knowledge-infused learning and other strategies to incorporate computational knowledge. Rapid advances in our ability to create and reuse structured knowledge as knowledge graphs make this task viable. In this talk, we will outline how knowledge, provided as a knowledge graph, is incorporated into the deep learning methods using knowledge-infused learning. We then discuss how this makes a fundamental difference in the interpretability and explainability of current approaches and illustrate it with examples relevant to a few domains.
Introducing research works in the area of machine reasoning at our Applied AI Institute, Deakin University, Australia. Covering visual & social reasoning, neural Turing machine and System 2.
Building AI Applications using Knowledge GraphsAndre Freitas
Goals of this Tutorial:
Provide a broad view of the multiple perspectives underlying knowledge graphs.
Show knowledge graphs as a foundation for building AI systems.
Method:
Focus on the contemporary and emerging perspectives.
Sampling exemplar approaches and infrastructures on each of these emerging perspectives (not an exhaustive survey).
Fueling the future with Semantic Web patterns - Keynote at WOP2014@ISWCValentina Presutti
I will claim that Semantic Web Patterns can drive the next technological breakthrough: they can be key for providing intelligent applications with sophisticated ways of interpreting data. I will picture scenarios of a possible not so far future in order to support my claim. I will argue that current Semantic Web Patterns are not sufficient for addressing the envisioned requirements, and I will suggest a research direction for fixing the problem, which includes the hybridisation of existing computer science pattern-based approaches, and human computing.
Kalpa Gunaratna's Ph.D. dissertation defense: April 19 2017
The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the relationships and classes are defined. Today, there exist large knowledge graphs in the research community (e.g., encyclopedic datasets like DBpedia and Yago) and corporate world (e.g., Google knowledge graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web for consumption, it leads to information overload, and hence proper summarization (and presentation) techniques need to be explored.
In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single entity level and (ii) the multiple entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers importance, which is computed by combining popularity and uniqueness, and diversity of facts getting selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques and form facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and facets using Information Retrieval (IR) ranking techniques to pick the highest ranked facts from these facets for the summary. The important and unique contribution of this approach is that because of its generation of facets, it adds diversity into entity summaries, making them comprehensive. For creating multiple entity summaries, we simultaneously process facts belonging to the given entities using combinatorial optimization techniques. In this process, we maximize diversity and importance of facts within each entity summary and relatedness of facts between the entity summaries. The proposed approach uniquely combines semantic expansion, graph-based relatedness, and combinatorial optimization techniques to generate relatedness-based multi-entity summaries.
Complementing the entity summarization approaches, we introduce a novel approach using light Natural Language Processing (NLP) techniques to enrich knowledge graphs by adding type semantics to literals.
This paper discusses the “Fine-Grained
Sentiment Analysis on Financial Microblogs
and News” task as part of
SemEval-2017, specifically under the
“Detecting sentiment, humour, and truth”
theme. This task contains two tracks, where
the first one concerns Microblog messages
and the second one covers News Statements
and Headlines. The main goal behind both
tracks was to predict the sentiment score for
each of the mentioned companies/stocks.
The sentiment scores for each text instance
adopted floating point values in the range
of -1 (very negative/bearish) to 1 (very
positive/bullish), with 0 designating neutral
sentiment. This task attracted a total of 32
participants, with 25 participating in Track
1 and 29 in Track 2.
Social media provides a natural platform for dynamic emergence of citizen (as) sensor communities, where the citizens share information, express opinions, and engage in discussions. Often such a Online Citizen Sensor Community (CSC) has stated or implied goals related to workflows of organizational actors with defined roles and responsibilities. For example, a community of crisis response volunteers, for informing the prioritization of responses for resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in CSC, there are challenges related to information overload for organizational actors, including finding reliable information providers and finding the actionable information from citizens. This threatens awareness and articulation of workflows to enable cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing in both types of messages for asking and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professional) for interpreting user-generated data of citizen sensors. Interdisciplinary research involving social and computer sciences is essential to address these socio-technical issues in CSC, and allow better accessibility to user-generated data at higher level of information abstraction for organizational actors. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes a.) 
identification of action related seeking-offering intent behaviors from short, unstructured text documents using both declarative and statistical knowledge based classification model, b.) matching of intentions about seeking and offering, and c.) engagement models of users and groups in CSC to prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic network connections in the user interaction networks. The results show an improvement in modeling efficiency from the fusion of top-down knowledge-driven and bottom-up data-driven approaches than from conventional bottom-up approaches alone for modeling intent and engagement. Several applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement for spreading critical information of prioritized needs to ensure donation of only required supplies by the citizens. The engagement interface application also won the United Nations ICT agency ITU's Young Innovator 2014 award.
Learning Vague Knowledge From Socially Generated Content in an Enterprise Fra...Panos Alexopoulos
The advent and wide proliferation of the Social Web in recent years has promoted the concept of social interaction as an important influencing factor of the way enterprises and organizations conduct business. Among the fields influenced is that of Enterprise Knowledge Management, where adoption of social computing approaches aims at increasing and maintaining at high levels the active participation of users in the organization's knowledge management activities. An important challenge towards this is the achievement of the right balance between informalities of socially generated data and the required formality of enterprise knowledge. In this context, we focus on the problem of mining vague knowledge from social content generated within an enterprise framework and we propose a learning framework based on microblogging and fuzzy ontologies.
Semantic Relation Classification: Task Formalisation and RefinementAndre Freitas
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, can capture a wide range of relations, and could thus be used as a foundation for automatic classification of semantic relations.
This talk provides a speculative contemplation of philosophical topics that might arise with brain-machine interface technology and explores the new ways that individuals and society might self-enact as a result. Brain-machine interfaces that could be pervasive, continuous, and widely-adopted suggest interesting new possibilities for our future selves. From a philosophical perspective, these possibilities concern the definition of what it is to be human, our current existence and interaction with reality, and how all of this could be dramatically different in a scenario of digitally-linked cloudmind collaborations. This talk looks at some of the foundational ontological questions of how the progression of the existence of the classic human might evolve. Perhaps the most pressing question that currently-minded potential adopters have is how to avoid getting irreparably pulled into a groupmind. To protect against this, there could be an expansion and letting go of the term and concepts of personal identity, and humans as a unit of organization, in favor of instead self-relying on a decentralized permissioning structure like blockchain technology for managing empowered and resilient crowdmind participations.
A Deep Learning Model to Predict Congressional Roll Call Votes from Legislati...mlaij
Developments in natural language processing (NLP) techniques, convolutional neural networks (CNNs), and long short-term memory networks (LSTMs) allow for a state-of-the-art automated system capable of predicting the status (pass/fail) of congressional roll call votes. The paper introduces a custom hybrid model labeled "Predict Text Classification Network" (PTCN), which takes legislation as input and outputs a prediction of the document's classification (pass/fail). The convolutional layers and the LSTM layers automatically recognize features from the input data's latent space. The PTCN's custom architecture provides elements enabling adaptation to the input's variance from adjustment to the kernel weights over time. On the document level, the model reported an average evaluation of 67.32% using 10-fold cross-validation. The results suggest that the model can recognize congressional voting behaviors from the associated legislation's language. Overall, the PTCN provides a solution with competitive performance to related systems targeting congressional roll call votes.
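For readers unfamiliar with how a score averaged over 10-fold cross-validation is obtained, the following is a minimal sketch of the procedure in plain Python. It does not reproduce the PTCN: the majority-class baseline is a hypothetical stand-in for any train-and-score routine, and all function names are assumptions made for illustration.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(samples, labels, train_and_score, k=10):
    """Average the score returned by train_and_score over k folds.

    Assumes len(samples) >= k so that no test fold is empty.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        train = [(samples[i], labels[i]) for i in train_idx]
        test = [(samples[i], labels[i]) for i in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


def majority_baseline(train, test):
    """Hypothetical stand-in model: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(1 for _, label in test if label == majority) / len(test)
```

Calling `cross_validate(documents, outcomes, majority_baseline)` would return the baseline's mean score over the ten held-out folds; a real model would be plugged in through the same `train_and_score` callback.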
MalayIK: An Ontological Approach to Knowledge Transformation in Malay Unstruc...IJECEIAES
A very large number of unstructured documents written in the Malay language is available on the web and on intranets. However, unstructured documents cannot be queried in simple ways, so the knowledge they contain can neither be used by automatic systems nor be understood easily and clearly by humans. This paper proposes a new approach that uses an ontology to transform knowledge extracted from Malay unstructured documents by identifying, organizing, and structuring the documents into an interrogative structured form. A Malay knowledge base, the MalayIK corpus, is developed and used to test the MalayIK-Ontology against Ontos, an existing data extraction engine. The experimental results from MalayIK-Ontology show a significant improvement in knowledge extraction over the Ontos implementation. This shows that clear knowledge organization and structuring can increase understanding, which leads to a potential increase in the sharing and reuse of concepts among the community.
Comparative Study on Lexicon-based sentiment analysers over Negative sentimentAI Publications
Sentiment Analysis, or Opinion Mining, is one of the latest trends in social listening and is presently reshaping commercial organisations. It is a significant task of Natural Language Processing (NLP). Product review data is widely available on social media such as Twitter and Facebook and on e-commerce sites such as Amazon and Alibaba. From this data, an organisation can gain insight into a customer's view of a product or into the type of opinion the product has generated in the market, and can take reactive or preventive measures accordingly. While analysing the above, we found that negative opinion has a stronger effect on customers' minds than positive opinion. Negative opinions also spread more virally. Our present work compares two available rule-based sentiment analysers, VADER and TextBlob, on domain-specific product review data from Amazon.co.in. It investigates which of the two has higher accuracy in classifying negative opinions. Our research found that VADER's negative polarity sentiment classification accuracy is higher than TextBlob's.
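The kind of comparison described above can be sketched as follows. The two scorers are hypothetical toy lexicon-based analysers, not VADER or TextBlob, and the word lists and reviews are invented for illustration; the point is only how accuracy on known-negative reviews can be measured for two rule-based analysers side by side.

```python
# Toy lexicons (assumptions for illustration, far smaller than real ones).
POSITIVE_WORDS = {"good", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "broken", "worst"}
NEGATIONS = {"not", "never", "no"}


def simple_scorer(text):
    """Sum word polarities, ignoring context."""
    tokens = text.lower().split()
    return sum((t in POSITIVE_WORDS) - (t in NEGATIVE_WORDS) for t in tokens)


def negation_aware_scorer(text):
    """Sum word polarities, flipping a word's sign when a negation precedes it."""
    tokens = text.lower().split()
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE_WORDS) - (token in NEGATIVE_WORDS)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    return score


def negative_accuracy(scorer, negative_reviews):
    """Fraction of known-negative reviews that the scorer labels negative."""
    hits = sum(1 for review in negative_reviews if scorer(review) < 0)
    return hits / len(negative_reviews)


reviews = [
    "terrible battery and poor screen",
    "not good at all",
    "the worst purchase",
    "never great",
]
simple_result = negative_accuracy(simple_scorer, reviews)
aware_result = negative_accuracy(negation_aware_scorer, reviews)
```

On these four invented reviews the context-free scorer misses the two negated phrases, while the negation-aware scorer catches them, which illustrates one reason rule-based analysers can differ on negative opinions.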
Automatic detection of online abuse and analysis of problematic users in wiki...Melissa Moody
For their 2019 capstone project, DSI Master of Science in Data Science students Charu Rawat, Arnab Sarkar, and Sameer Singh proposed a framework to understand and detect such abuse in the English Wikipedia community.
Rawat, Sarkar, and Singh received the award for Best Paper in the Data Science for Society category at the 2019 Systems & Information Design Symposium (SIEDS). In "Automatic Detection of Online Abuse and Analysis of Problematic Users in Wikipedia," the team presented an analysis of user misconduct in Wikipedia and a system for the automated early detection of inappropriate behavior.
Data Collection Methods for Building a Free Response Training SimulationMelissa Moody
Master of Science in Data Science capstone project researchers Vaibhav Sharma, Beni Shpringer, and Michael Yang, along with UVA School of Engineering M.S. student Martin Bolger and Ph.D. students Sodiq Adewole and Erfaneh Gharavi, sought to develop new methods for collecting, generating, and labeling data to aid in the creation of educational, free-input dialogue simulations.
Semantics of the Black-Box: Using knowledge-infused learning approach to make...Amit Sheth
Keynote at the IEEE ICSC Workshop on Semantic Machine Learning (#SML21: https://ist.gmu.edu/~hpurohit/events/sml21/#keynote):
Video of SML21: https://www.youtube.com/watch?v=cx-l0XDk9Tw
The recent series of deep learning innovations have shown enormous potential to impact individuals and society, both positively and negatively. The deep learning models utilizing massive computing power and enormous datasets have significantly outperformed prior historical benchmarks on increasingly difficult, well-defined research tasks across technology domains such as computer vision, natural language processing, signal processing, and human-computer interactions. However, the Black-Box nature of deep learning models and their over-reliance on massive amounts of data condensed into labels and dense representations pose challenges for the system’s interpretability and explainability. Furthermore, deep learning methods have not yet been proven in their ability to effectively utilize relevant domain knowledge and experience critical to human understanding. This aspect is missing in early data-focused approaches and necessitated knowledge-infused learning and other strategies to incorporate computational knowledge. Rapid advances in our ability to create and reuse structured knowledge as knowledge graphs make this task viable. In this talk, we will outline how knowledge, provided as a knowledge graph, is incorporated into the deep learning methods using knowledge-infused learning. We then discuss how this makes a fundamental difference in the interpretability and explainability of current approaches and illustrate it with examples relevant to a few domains.
Knowledge Graphs and their central role in big data processing: Past, Present...Amit Sheth
Keynote at CODS-COMAD 2020, Hyderabad, India, 06 Jan 2020: https://cods-comad.in/keynotes.html
Abstract : Early use of knowledge graphs, before the start of this century, related to building a knowledge graph manually or semi-automatically and applying them for semantic applications, such as search, browsing, personalization, and advertisement. Taalee/Semagix Semantic Search in 2000 had a KG that covered many domains and supported search with an equivalent of today’s infobox. Along with the growth of big data, machine learning became the preferred technique for searching, analyzing and deriving insights from such data. We observed the complementary nature of bottom-up (machine learning-driven) and top-down (semantic, knowledge graph and planning based) techniques. Recently we have seen growing efforts involving the shallow use of a knowledge graph to improve the semantic and conceptual processing of data. The future promises deeper and congruent incorporation or integration of the knowledge graphs in the learning techniques (which we call knowledge-infused learning), where knowledge graphs combining statistical AI (bottom-up) and symbolic AI learning techniques (top-down) play a critical role in hybrid and integrated intelligent systems. Throughout this talk, we will provide real-world examples, products, and applications where the knowledge graph played a pivotal role.
The talk describes a paradigm of knowledge-infused learning in healthcare for explainability, interpretability, and traceability of outcomes, thus bridging the gap between AI and clinical settings and developing architectures that are of clinical relevance.
In this lecture we explore how big datasets can be used with the Weka workbench and what other issues are currently under discussion in the real world, for example big data applications, predictive linguistic analysis, new platforms, and new programming languages.
https://asonam.cpsc.ucalgary.ca/2021/speakers.php
With the increasing legalization of medical and recreational use of substances, more research is needed to understand the association between mental health and user behavior related to drug consumption. Specifically, drug overdose and substance use-related mental health issues have become two major topics that are widely discussed on social media platforms. Big social media data has the potential to provide deeper insights about these associations to public health analysts for making policy decisions. Multiple national population surveys have found that about half of those who experience a mental illness during their lives will also experience a substance use disorder, and vice versa. The communications related to addiction and mental health are complex to process and understand given their language and contextual characteristics. Surface-level data analysis alone is not sufficient to understand the complex nature of relationships in the addiction and mental health context. Moreover, dark web vendors have been using social media as a new marketplace for drugs. Social media users also discuss the novel drugs emerging in dark web marketplaces and the associated side effects and health conditions. These communications become complex when researchers try to annotate them or link them to a specific mental health entity. Considering the significant sensitivity of such communications and the need to protect user privacy on social media, a potential solution requires reliable algorithms for modeling such communications. We demonstrate the value of incorporating domain-specific knowledge in natural language understanding to identify the relationship between mental health and drug addiction. We discuss end-to-end knowledge-infused deep learning frameworks that leverage a pre-trained language representation model and a domain-specific declarative knowledge source to extract entities and their relationships jointly. Our model is further tailored to focus on the entities mentioned in the sentence, where an ontology is used to locate the target entity's position. We also demonstrate how including knowledge-aware representations alongside language models makes it possible to extract associations between drugs and mental health conditions.
Explainable AI is not yet Understandable AIepsilon_tud
Keynote of Dr. Nava Tintarev at RCIS'2020. Decision-making at individual, business, and societal levels is influenced by online content. Filtering and ranking algorithms such as those used in recommender systems are used to support these decisions. However, it is often not clear to a user whether the advice given is suitable to be followed, e.g., whether it is correct, whether the right information was taken into account, or if the user’s best interests were taken into consideration. In other words, there is a large mismatch between the representation of the advice by the system versus the representation assumed by its users. This talk addresses why we (might) want to develop advice-giving systems that can explain themselves, and how we can assess whether we are successful in this endeavor. This talk will also describe some of the state-of-the-art in explanations in a number of domains (music, tweets, and news articles) that help link the mental models of systems and people. However, it is not enough to generate rich and complex explanations; more is required in order to understand and be understood. This entails among other factors decisions around which information to select to show to people, and how to present that information, often depending on the target users and contextual factors
Designing a Complete Network Security Policy
Learning Outcomes: At the end of the assignment, student should be able:
· To have an understanding of the network security issues in organizations and how to solve them by developing and applying a network security policy, which contains different security ...
Full day lectures @International University, HCM City, Vietnam, May 2019. Review of AI in 2019; outlook into the future; empirical research in AI; introduction to AI research at Deakin University
Presentation given at the HEA Social Sciences learning and teaching summit 'Exploring the implications of ‘the era of big data’ for learning and teaching'.
A blog post outlining the issues discussed at the summit is available via: http://bit.ly/1lCBUIB
A DEVELOPMENT FRAMEWORK FOR A CONVERSATIONAL AGENT TO EXPLORE MACHINE LEARNIN...mlaij
This study aims to introduce a discussion platform and curriculum designed to help people understand how
machines learn. Research shows how to train an agent through dialogue and understand how information
is represented using visualization. This paper starts by providing a comprehensive definition of AI literacy
based on existing research and integrates a wide range of different subject documents into a set of key AI
literacy skills to develop a user-centered AI. This functionality and structural considerations are organized
into a conceptual framework based on the literature. Contributions to this paper can be used to initiate
discussion and guide future research on AI learning within the computer science community.
Design considerations for machine learning systemAkemi Tazaki
Critical commentary based on my professional experience in designing apps with artificial intelligence and on desktop research. Presentation slides for Botscampe 2016.
ACM Hypertext and Social Media Conference Tutorial on Knowledge-infused Deep Learning
1. Knowledge-infused Deep Learning
Artificial Intelligence
Institute
Tutorial
Amit Sheth, Manas Gaur, Ugur Kursuncu, Ruwan Wickramarachchi, Shweta Yadav
Artificial Intelligence Institute, University of South Carolina, USA
Check Tutorial site for latest slides: http://kidl2020.aiisc.ai/
2. Tutorial Thesis
Broad Vision: How do you make a system more intelligent?
Without Domain Knowledge
With Domain Knowledge
Motivational Interviewing
3. Tutorial Thesis
How to gain deep understanding of the content?
Figure labels: Social Media (>Millions); Deep Clustering; Neural Parsing; Shallow Semantics; Deeper Semantics; KG; Context understanding.
Example terms in the figure: agitation, nervous, panicky, repeated panic attacks, anxiety, Sleep Disorder, Circadian Rhythm Disorder.
[Lin 2020, Kitaev 2018]
https://github.com/facebookresearch/deepcluster
[Gaur 2020]
4. Knowledge in computing
The role of knowledge in computing has long been recognized, at least since Vannevar Bush’s seminal 1945 piece, As We May Think.
● Enhanced (semantic) applications such as search, browsing, personalization, recommendation, advertisement, and summarization.
● Improve integration of data, including data of diverse modalities and from diverse sources.
● Empower/enhance ML and NLP techniques. Use as a knowledge transfer mechanism across domains and between humans and machines.
● Improve automation and support intelligent human-like behavior and activities that may involve conversations or question-answering and robots.
~2000
~2025
Focus:
From small data to big data.
Data alone is not enough.
[Domingos 2012].
Knowledge will propel
machine understanding of
content. [Sheth, et al. 2017]
5. Tutorial Thesis
Interpretability + Traceability → Explainability
Ethics, Bias, and False Alarms
Deeper Understanding of Content including
Context Understanding
What is the right knowledge graph to use?
[Semantic, Cognitive, Perceptual Computing, Sheth 2015]
Structured and
Unstructured
Data
Models
Knowledge
Graph/Base
Compute
Application/
Workflow
David Cox Talk: Neurosymbolic AI
F. Lécué: On the Role of Knowledge Graphs in Explainable AI A Machine Learning Perspective
6. About the Tutorial
All About the Knowledge Graphs
Knowledge-infused Deep Learning
Knowledge-infusion: Cyber Social Threats
Knowledge-infusion: Autonomous Driving
Knowledge-infusion: DarkNet
HT-2020 Tutorial: A. Sheth, M. Gaur, U. Kursuncu, R. Wickramarachchi, & S. Yadav Knowledge-infused Deep Learning
7. All About Knowledge Graphs
Artificial Intelligence
Institute
Amit Sheth
amit@sc.edu
@amit_p
9. Definition
A Knowledge Graph (KG) is structured knowledge in a graph representation (in many cases a labeled property graph, RDF, or one of its variants). We cannot escape the classic expressivity-computability trade-off.
The community is still debating the exact definition.
Key differentiator: Relationships (“relationships at the heart of semantics”).
Different/Related forms:
● Ontology: Knowledge graph after human curation of entities and relations; “ontological commitment”, richer KR
● Knowledge Base: flattened graph
● Lexicons: Small application-specific flattened graph
● Knowledge Networks (KN) integrate and combine knowledge (usually captured as KGs) to serve a network (community); could be from and serve multiple domains.
Knowledge Graphs and Knowledge Networks: The Story in Brief
10. Expressiveness Range: Knowledge Representation and Ontologies
[Figure: spectrum of knowledge representation expressiveness, from Simple Taxonomies to Expressive Ontologies: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints.]
Examples placed along the spectrum: Wordnet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, GlycO, SWETO, Pharma, KEGG, TAMBIS, BioPAX, EcoCyc.
Ontology Dimensions After McGuinness and Finin
12. Creation & Use of Knowledge ~2000
First commercial semantic search/browsing/… on the Web and
for the content on the Web using KG. Term used for KR:
WorldModel, Ontology
13. Proliferation Broad-based &
Domain-Specific KGs
13
Examples of General Purpose Knowledge Graphs
1. DBpedia [Auer 2007, Lehmann 2015]
2. Yago [Rebele 2016]
3. Freebase [Bollacker 2008]
4. ConceptNet [Speer 2017]
5. Knowledge Vault [Dong 2014]
6. NELL [Mitchell 2018]
7. Wikidata [Vrandečić 2014]
Example of Healthcare-specific Knowledge Graphs
1. SNOMED-CT [ACL Chang 2020]
2. Unified Medical Language System (UMLS) [Yip 2019]
3. DataMed [JAMIA Chen 2018]
4. International Classification of Diseases (ICD-10)
[JAMIA Choi 2016]
5. DrugBank, Rx-NORM and MedDRA [ BMC Celebi 2019]
6. Drug Abuse Ontology [BMI Cameron 2013]
Many are also community-developed.
14. Variety of Sources for Large-scale KG and
in Different Representation
14
Linked Open Data (LOD) Schema.org (schema.org)
Data Commons
Knowledge Graph (DCKG)
(datacommons.org)
Wikidata (wikidata.org)
https://github.com/datacommonsorg/api-python
https://dumps.wikimedia.org/wikidatawiki/entities/
https://lov4iot.appspot.com/
https://github.com/schemaorg/schemaorg
15. Domain-specific knowledge extraction from LOD
Linked Open
Data (LOD)
Book related
information?
Filter relevant datasets
Extract relevant portion
of a data set
Project
Gutenberg
DBpedia
DBTropes
Books, Countries, Drugs
Books, movie, games
Books
Book
specific
DBpedia
Book
specific
DBTropes
Lalithsena, Sarasi, et al. "Automatic domain identification for linked open data." Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on. Vol. 1. IEEE, 2013.
Sarasi Lalithsena, Pavan Kapanipathi and Amit Sheth "Harnessing relationships for domain-specific subgraph extraction: A recommendation use case." Big Data (Big Data), 2016 IEEE International Conference on. IEEE, 2016.
16. Enterprise Knowledge Graphs are also very popular
16
KG-enabled Web and Enterprise Applications: Google, Amazon, Microsoft, Siemens, LinkedIn, Airbnb, eBay, and Apple, as well as smaller companies (e.g. ezDI, Franz, Metaphactory/Metaphacts, Semantic Web Company, Mondeca, Stardog, Diffbot, Siren).
Enterprise KG development service is also available (Maana).
Industry-Scale Knowledge Graphs: Lessons and Challenges (Communications of the ACM, August 2019)
17. Why Knowledge Graphs: Shortcomings of
Deep Learning (DL)
Trivial Case for Classification
Text: I sometimes wonder how many alcoholics are relapsing under the lockdowns (former alcoholic).
Question: Does the person have an addiction?
Answer: Yes
Not Trivial
Text: Then others that insisted that what I have is depression even though manic episodes aren't characteristic to depression. I dread having to retread all this again because the clinic where I get my mental health addressed is closing down due to loss in business caused by the pandemic
Question: Does the person suffer from depression?
Answer: Yes (Correct: No)
Disjunctive Questions
Question: Are you feeling nervous or anxious or on edge?
Question: Is the feeling of restlessness due to stress or anxiety?
Question: Does an employee own a company or work for a company?
Research in this direction: Query2Box and Multi-hop Reasoning [Ren ICLR 2020, Lin EMNLP 2018]
Covid context
Generic context
Bottom line: Most state-of-the-art deep learning approaches are not integrated with prior knowledge. This tutorial is about strategies for doing so.
[David Cox] [Marcus 2018]
https://www.digitaltrends.com/cool-tech/neuro-symbolic-ai-the-future/
18. Why Knowledge Graphs: Shortcomings of DL
● Graph Convolutional Neural Networks (GCN) are blind to relation types. For example:
<shelter-in-place causes anxiety> and <shelter-in-place prevents anxiety> have similar
representations in GCN.
● Deep Clustering over unlabeled data exploits the inherent latent semantics to generate
diverse and cohesive clusters. But, interpretability of the clusters requires Knowledge Graphs.
18
ODKG: Opioid
Drug Knowledge
Graph
[Kamdar 2019]
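The first point on this slide can be illustrated with a minimal sketch. It assumes a plain, untyped GCN-style propagation step (mean aggregation over neighbors plus a self-loop, with no learned weights or nonlinearity); the helper names `adjacency_from_triples` and `propagate` and the toy feature vectors are hypothetical and not taken from the tutorial. Because the neighborhood structure is built from (head, tail) pairs only, swapping the relation label leaves the output unchanged.

```python
def adjacency_from_triples(triples, nodes):
    """Build an undirected neighbor map, discarding relation labels (as an untyped GCN does)."""
    neighbors = {n: {n} for n in nodes}  # every node keeps a self-loop
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return neighbors


def propagate(features, neighbors):
    """One mean-aggregation step: each node averages its own and its neighbors' features."""
    out = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        out[node] = [sum(features[m][d] for m in nbrs) / len(nbrs) for d in range(dim)]
    return out


nodes = ["shelter-in-place", "anxiety"]
features = {"shelter-in-place": [1.0, 0.0], "anxiety": [0.0, 1.0]}  # toy inputs (assumption)

kg_causes = [("shelter-in-place", "causes", "anxiety")]
kg_prevents = [("shelter-in-place", "prevents", "anxiety")]

h_causes = propagate(features, adjacency_from_triples(kg_causes, nodes))
h_prevents = propagate(features, adjacency_from_triples(kg_prevents, nodes))

# The relation label never entered the computation, so both graphs give the same result.
print(h_causes == h_prevents)
```

Relation-aware variants address this by keeping separate parameters per relation type, at the cost of more parameters to learn.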
19. Why Knowledge Graphs : NLP/NLU Challenges
19
● Natural Language Processing Challenges:
○ How do you learn quickly from a small amount of data?
○ How do you mine (varied) relationships from existing text?
○ How do you reliably classify entities into a known ontology?
○ Better contextualization of words
● Natural Language Understanding Challenges:
○ Query Interpretation or Understanding the user question
○ Answering the question with Trust and Transparency
○ How to measure “reasonability” and “meaningfulness” of the response to
a question?
○ How much context is needed to provide a precise response?
[Stanford Knowledge Graph Seminar 2020, Amit Prakash , Leilani Gilpin]
20. 20
[Image from Talukdar]
KG in Conversational AI
● Get same/similar
answers
based on trusted
knowledge
● Personalization
● Contextualization
21. Personalization: taking into account
the contextual factors such as
user’s health history, physical
characteristics, environmental
factors, activity, and lifestyle.
Chatbot with contextualized (e.g asthma) knowledge is
potentially more personalized and engaging.
Without
Contextualized Personalization
With
Contextualized Personalization
KG in Conversational AI
22. 22
How do we use Knowledge Graphs?
Health
Knowledge
Graph
[Shah and Sheth
US patent 2015]
24. Semantic
Proximity
GBV Index
GBV estimation
for 14 days
GBV Lexicon from Tweets on bullying, abuse, domestic violence, etc.
Mapping words to
categories for
expansion of lexicon
Generic Knowledge
Graph of Wikipedia
Aligning the lexicon
words and new
entities with respect
to DBpedia
Categories
Enriched Lexicon for
gathering abstract meaning
of GBV in tweets
Calculating cosine similarity
between two vectors (GBV
and Tweets) and setting
empirical threshold on
semantic proximity
Mental Health Tweets From
March 14-April 04
Analyzing Gender-based Violence (GBV) in Mental Health COVID-19 Twitter Conversation
How do we use Knowledge Graphs?
Maximum A Posteriori
Estimation (MAP)
Purohit, Hemant, Tanvi Banerjee, Andrew Hampton, Valerie L. Shalin, Nayanesh Bhandutia, and Amit P. Sheth.
"Gender-based violence in 140 characters or fewer: A# BigData case study of Twitter." arXiv preprint arXiv:1503.02086
(2015).
Psychidemic
https://www.youtube.com/watch?v=XzYrn0PEzNk
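The semantic proximity step in this pipeline (cosine similarity between a GBV vector and a tweet vector, followed by an empirical threshold) can be sketched as follows. This is only an illustration: the vectors, the threshold value of 0.7, and the names `cosine_similarity` and `is_gbv_related` are assumptions, not values or code from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_gbv_related(tweet_vec, lexicon_vec, threshold):
    """Flag a tweet when its semantic proximity to the lexicon vector reaches the threshold."""
    return cosine_similarity(tweet_vec, lexicon_vec) >= threshold


# Hypothetical 3-dimensional embeddings, chosen only to make the example readable.
gbv_lexicon_vec = [0.9, 0.1, 0.4]
tweets = {"tweet_a": [0.8, 0.2, 0.5], "tweet_b": [0.0, 1.0, 0.0]}
THRESHOLD = 0.7  # empirical threshold (assumed value)

flags = {name: is_gbv_related(vec, gbv_lexicon_vec, THRESHOLD) for name, vec in tweets.items()}
print(flags)
```

In practice the threshold would be tuned on annotated examples, since it trades off missed GBV content against false alarms.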
25. Assessing Mental Health Impact of COVID using News Articles
How do we use Knowledge Graphs?
https://theconversation.com/were-measuring-online-conversation-to-track-the-social-and-mental-health-issues-surfacing-during-the-coronavirus-pandemic-135417
Multilingual KG
http://conceptnet.io/
GDelt Database
https://www.gdeltproject.org/
26. 26
Understanding City Traffic Events: Role of KG in
analyzing multimodal data
Anantharam, Pramod, Payam Barnaghi, Krishnaprasad Thirunarayan, and Amit Sheth. "Extracting city traffic events from social
streams." ACM Transactions on Intelligent Systems and Technology (TIST) 6, no. 4 (2015): 1-27.
28. 28
PREDOSE
Cameron, Delroy, Gary A. Smith, Raminta Daniulaityte, Amit P. Sheth, Drashti Dave, Lu Chen, Gaurish Anand, Robert Carlson, Kera Z. Watkins, and Russel Falck.
"PREDOSE: a semantic web platform for drug abuse epidemiology using social media." Journal of biomedical informatics 46, no. 6 (2013): 985-997.
29. 29
Knowledge Graph in Education
Educational
Knowledge Base
Enable expert system to answer questions
like:
1. What to study when there is less
time?
2. How to set a good question paper?
3. How to cover-up learning gaps from
previous years?
4. How to connect 8th grade with 10th
grade science?
Mentor Intelligence
mimics teacher thinking
Intelligent Content Authoring
and Curation
1. Granularity
2. Personalization (student or institution)
3. Robustness
4. Interventional: diagnosis and remedy
30. 30
Named Entity Recognition Relationship Extraction Entity Linking Implicit information extraction
Implicit Entity Linking using KG and Conditional Random Fields
Perera, Sujan, Pablo N. Mendes, Adarsh Alex, Amit P. Sheth, and Krishnaprasad Thirunarayan. "Implicit entity linking in tweets." In
European Semantic Web Conference, pp. 118-132. Springer, Cham, 2016.
31. 31
Experiences or
Factual Knowledge
Abstract Knowledge
1. Continuum of Knowledge
2. relationship mapping:
NLU through and knowledge
transfer across domains
Analogical Generalization
Applicable to new situation via
analogy
[Forbus and Gentner 1997, Gentner
and Medina 1998]
Mapping between Enzyme Kinetics
and Musical Chairs [Ongoing]
Mapping between two Conceptual Frames in
similar domain (Physics)
32. ML and Knowledge Graphs: Pipeline
32
Knowledge
Extraction
Knowledge
Alignment
Knowledge
Cleaning
Knowledge Mining &
Knowledge-based QA
Data Extraction
(NLP, Web)
Wrapper Induction
(DB, DM-Data
Mining)
Web Tables (DB)
Text Mining (DM)
Entity and
Relationship Linking
[Perera 2016]
Schema Mapping
and Ontology
Mapping
[Jain 2010]
Universal Schema
[Sheth 1990]
Data Cleaning
[Jadhav 2016]
Anomaly Detection
[Anantharam 2012,
2016]
Knowledge Fusion
[Sheth 2020,
Kapanipathi 2020,
Gaur 2018,
Kursuncu 2020]
Graph Mining [Lalithsena
2016, 2017, 2018]
Knowledge Embedding
[Wickramarachchi 2020,
Gaur 2018]
Search [Sheth 2003,
Cheekula 2015, Kho
2019]
QA [Alambo 2019,
Shekarpour 2017]
[Stanford Knowledge Graph Seminar 2020, Luna Dong]
33. 33
More Applications and Domains that use KG+DL
Pharmacy
[Futia 2020,
Gentile 2019]
Personalized
mHealth
[ Sheth 2017, 2018a,
2018b, 2019]
Public Health
[Yazdavar 2017,
Gaur 2018,
Daniulaityte 2016]
Question Answering/
Dialog System
[Alambo 2019,
Shekarpour 2017, 2015 ]
Hypothesis Generation to find
association between Stress
and Colorectal cancer
[ Cameron 2015]
Chatbot with contextualized (e.g
asthma) knowledge is potentially more
personalized and engaging.
35. 35
What is Knowledge infusion? Why do we need it?
What is Knowledge-infused Learning?
What are the different types of Knowledge-infused Learning?
How can Knowledge-infused Learning provide solutions to
complex problems:
Unstructured Healthcare on Social Media
Radicalization on Social Media
Autonomous Driving Vehicles
Drug Trafficking in Cryptomarkets
Questions we address next
36. 36
Vision: KG as Glue in Developing Hybrid AI Systems
STATISTICAL AI
CONNECTIONIST
“Unreasonable effectiveness of big data”
in machine processing &
powering bottom up processing
“Unreasonable effectiveness of small data”
in human decision making - can this be
emulated to power top down processing?
SYMBOLIC AI
FORMAL
KG will play an increasing role in developing hybrid neuro-symbolic systems (that is bottom-up
deep learning with top-down symbolic computing) as well as in building explainable AI systems
for which KGs will provide scaffolding for punctuating neural computing.
Cognitive Science Analogy: Combining Top Brain - Bottom Brain Processes.
38. 38
● Online healthcare communications are ambiguous, and discriminative features are difficult to engineer.
● Domain-specific embedding models provide a shallow infusion of knowledge.
● Decrease the dependence on large datasets
● Reduce bias in the dataset (i.e., potentially avoid social discrimination and unfair treatment)
● Provide information provenance, allowing explainability of a model
● Improve information coverage specific to a domain that would be missed otherwise
● Reduce time and space complexity of the model's architecture
● Improve the model's sensitivity and specificity
● Explainability
Why Knowledge-infused Deep Learning ?
39. 39
Deep NLP Requires Background Knowledge
An excessive endogenous or exogenous
stimulations by estrogen induces adenomatous
hyperplasia of the endometrium
● adenomatous modifies hyperplasia
● An excessive endogenous or exogenous
stimulations modifies estrogen
● “adenomatous hyperplasia” and
“endometrium” occurs as “adenomatous
hyperplasia of the endometrium”
MeSH
Terms in
PubMed
Articles
[Ramakrishnan 2008]
[Gaur 2018, 2019, Limsopatham 2016 ]
40. 40
Gkotsis, George, Anika Oellrich, Tim Hubbard, Richard Dobson, Maria Liakata, Sumithra Velupillai, and Rina Dutta. "The language of mental health problems in social media." In
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology, 2016.
NLP Requires Background Knowledge
41. 41
How do you know that a training set has a good domain coverage?
How do you ensure consistency of labeling, especially when the label is not binary?
Do labels represent adequate semantics (e.g., number of alternatives)?
Do they have adequate domain knowledge?
How do you ensure consistency of labeling (interpretation)?
Questions
44. 44
SEMANTIC &
KNOWLEDGE GRAPH
Top-down symbolic
approach (concepts, rules)
in data reasoning,
inferencing, and deduction.
MACHINE/ DEEP
LEARNING
Bottom-up statistical
approach in searching,
analyzing and deriving
insights from Big Data.
In Abstract Sense
45. 45
Theoretically Why KiDL: Probably Approximately Correct Learning
Valiant, Leslie G. "Robust logics." Artificial Intelligence 117.2 (2000): 231-253.
46. 46
Theoretically Why KiDL: Probably Approximately Correct Learning
How do you know that a training set has a
good domain coverage?
Robust Classifier → Low Generalizability Error
Consistent Classifier → Low Training Error
Confidence: more certainty (lower δ) requires more samples.
Complexity: a more complicated hypothesis space (larger |H|) requires more samples.
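Both statements follow from the standard PAC sample-complexity bound for a finite hypothesis class and a learner that is consistent with the training data: m >= (1/ε) * (ln|H| + ln(1/δ)). The sketch below evaluates that bound; the function name `pac_sample_bound` is hypothetical, and the bound may not be the exact form shown on the original slide.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite hypothesis class.

    Standard bound: m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    epsilon is the tolerated generalization error, delta the failure probability.
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


baseline = pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.05)
more_confident = pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.01)
more_complex = pac_sample_bound(hypothesis_count=10**6, epsilon=0.1, delta=0.05)

print(baseline, more_confident, more_complex)
```

Because |H| and 1/δ enter only through logarithms, restricting the hypothesis space with prior knowledge reduces the bound, which is one way to read the motivation for knowledge infusion here.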
47. 47
K-IL: “The exploitation of domain knowledge and application semantics
to enhance existing deep learning methods by infusing relevant conceptual
information into a statistical, data-driven computational approach
(Neuro-Symbolic AI).”
A. Sheth, M. Gaur, U. Kursuncu and R. Wickramarachchi, "Shades of Knowledge-Infused Learning for Enhancing Deep Learning," in IEEE
Internet Computing, vol. 23, no. 6, pp. 54-63, 1 Nov.-Dec. 2019, doi: 10.1109/MIC.2019.2960071.
48. 48
SHALLOW Infusion
… of knowledge graphs to improve the semantic and conceptual processing of data.
SEMI-DEEP Infusion
Deeper and congruent incorporation or integration of the knowledge graphs in the learning techniques.
DEEP Infusion
(Part of Future KG Strategy) combines statistical AI (bottom-up) and symbolic AI learning techniques (top-down) for hybrid and integrated intelligent systems.
Taxonomy of Knowledge Infusion
49. 49
Shallow external knowledge is information extracted from text using heuristics, often designed for task-specific problems:
○ Bag of Words/Phrases from Corpus [Hagoort 2004, Zhang 2019, Sun 2019]
○ Bag of Words/Phrases from Semantic Lexicons [Faruqui 2014, Mrkšić 2016]
○ Count of Nouns, Pronouns, Verbs [Gkotsis 2017, 2016]
○ Sentiment and Emotions of the sentence [Gaur 2019, Vedula 2017, Kursuncu 2019]
○ Latent topics describing the documents [Jiang 2016, Li 2016, Meng 2020]
○ Label assignment to words or phrases in a sentence (Semantic Role Labeling):
Mary (Agent) sold (Predicate) the book (Theme) to John (Recipient)
Shallow Infusion
50. 50
Knowledge: Domain
specific large corpora
Knowledge: pre-trained
embeddings + semantic
lexicons
Knowledge: Domain
specific large corpora
Word2Vec Retrofitting BERT
“Context is represented by a
set of words for a given target
word”
“Learned embeddings are further
enriched by using semantic
lexicons”
“Uses language modeling
objective to learn the
contextual representations”
Examples of Shallow Infusion
52. 52
Shallow Infusion: Retrofitting Example
52
damage
Infrastructure
affected
population
damage
Infrastructure
affected
population
Vector representation of words in
Tweets before retrofitting
Vector representation of words in
Tweets after retrofitting
MOAC Ontology
Empathi ontology
Disaster Ontology
DBpedia
Gaur, Manas, et al. "empathi: An ontology for Emergency Managing and Planning about Hazard Crisis." 2019 IEEE 13th International Conference on Semantic
Computing (ICSC). IEEE, 2019.
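A minimal sketch of the retrofitting idea: each word vector is repeatedly pulled toward the vectors of its lexicon neighbors while staying close to its original embedding. The update below follows the commonly used form with weight alpha on the original vector and 1/degree on each neighbor. The toy vectors, the lexicon edges, and the function name `retrofit` are assumptions for illustration and are not taken from the empathi work.

```python
import math


def retrofit(embeddings, lexicon, iterations=10, alpha=1.0):
    """Pull each word toward its lexicon neighbors while anchoring it to its original vector."""
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbors in lexicon.items():
            nbrs = [n for n in neighbors if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # equal weight per neighbor
            dim = len(new[word])
            new[word] = [
                (alpha * embeddings[word][d] + beta * sum(new[n][d] for n in nbrs))
                / (alpha + beta * len(nbrs))
                for d in range(dim)
            ]
    return new


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy 2-d embeddings; the ontology-derived edges below are hypothetical.
embeddings = {
    "damage": [1.0, 0.0],
    "infrastructure": [0.0, 1.0],
    "population": [-1.0, 0.0],
}
lexicon = {"damage": ["infrastructure"], "infrastructure": ["damage"]}

retrofitted = retrofit(embeddings, lexicon)
before = distance(embeddings["damage"], embeddings["infrastructure"])
after = distance(retrofitted["damage"], retrofitted["infrastructure"])
print(before, after)  # related words end up closer together
```

Words without lexicon neighbors ("population" here) keep their original vectors, so the knowledge source only changes the part of the space it covers.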
53. 53
SuicideWatch Subreddit
(93K Users)
NYC CDRN EHR (123K patients) Data specific to
Mental Health
Medical Knowledge Bases
We identified self-harm, depressive feelings, and suicide ideations as
latent topics expressed in Reddit and EHR data.
Both sources did not provide evidence of mentions or expressions of
impulsivity, family violence, and drug abuse.
Shallow Infusion: Association between Social Media
and EHR in Suicide-related Communications
[Gaur, Psychiatry Under Review 2020]
54. 54
Semi-Deep Infusion
In semi-deep infusion, external knowledge is
involved through attention mechanism or
learnable knowledge constraints acting as a
sentinel to guide model learning.
➢ External Knowledge through Attention
➢ External Knowledge through Learnable Constraints
55. 55
Tacit
Knowledge
Self-aware or
External Knowledge
Similarity
based
verification
Semi-Deep Infusion
Dataset
Deep Learning
Model
Dataset
enrich
Deep Learning
Model
Tacit
Knowledge
Hypothesis testing
or similarity-based
verification
Shallow Infusion
Self-aware or
External Knowledge
Comparing Semi-Deep Infusion with Shallow Infusion
Sheth, Amit, Manas Gaur, Ugur Kursuncu, and Ruwan Wickramarachchi. "Shades of Knowledge-Infused Learning for
Enhancing Deep Learning." IEEE Internet Computing 23, no. 6 (2019): 54-63.
56. 56
A neural attention mechanism equips a neural network with the ability to focus on a subset of its inputs (or features):
○ Hard Attention or Position-specific Attention: the locations of important entities and relationships in the text are hard-coded in the model. This makes feature engineering more efficient, but the model suffers from exposure bias.
○ Soft Attention: the model learns to attend to specific parts of the text while generating the word describing that part (following distributional semantics).
○ Attention with Knowledge Base: background knowledge is integrated using an attention mechanism, which decides whether to attend to background knowledge and which information from KBs is useful.
External Knowledge through Attention
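The knowledge-base variant can be sketched as soft attention over candidate KB concept vectors plus one extra "sentinel" slot, so the model can also place weight on using no background knowledge. This is an illustrative sketch under simplifying assumptions: scores are plain dot products rather than a learned scoring function, and the vectors and the name `attend_with_sentinel` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attend_with_sentinel(hidden, kb_vectors, sentinel):
    """Return attention weights over KB vectors plus the sentinel, and the mixed context vector."""
    candidates = kb_vectors + [sentinel]
    weights = softmax([dot(hidden, c) for c in candidates])
    dim = len(hidden)
    context = [sum(w * c[d] for w, c in zip(weights, candidates)) for d in range(dim)]
    return weights, context


hidden_state = [1.0, 0.0]             # current text representation (toy values)
kb_concepts = [[2.0, 0.0], [0.0, 2.0]]  # candidate KB concept embeddings (toy values)
sentinel_vec = [0.0, 0.0]             # "ignore the KB" slot

weights, context = attend_with_sentinel(hidden_state, kb_concepts, sentinel_vec)
print(weights, context)
```

The last weight is the share assigned to the sentinel; when it dominates, the context vector carries little KB information and the model falls back on the text alone.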
57. 57
● Learnable constraints are empirical thresholds (probabilistic values) learnt by the model, which allow it to learn adaptively.
● This can be done in the following ways:
○ Learning based on pre-structured axiomatic rules - axiomatic knowledge
○ Learning based on difference in content similarity - KL Divergence,
Cross-entropy loss
○ Learning based on commonsense knowledge - ConceptNet
○ Learning over different permutations of text generated through synonyms,
antonyms, and homonyms.
External Knowledge through Learnable Constraints
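For the content-similarity option, a minimal sketch of KL divergence between two discrete distributions and of adding it to a task loss as a knowledge penalty. The distributions, the weight `lam`, and the function names are assumptions for illustration; the cited work learns the constraint jointly with the model rather than fixing it by hand.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def penalized_loss(task_loss, model_dist, knowledge_dist, lam=0.5):
    """Task loss plus a weighted penalty for diverging from the knowledge-derived distribution."""
    return task_loss + lam * kl_divergence(knowledge_dist, model_dist)


knowledge_dist = [0.5, 0.5]  # distribution suggested by the knowledge source (toy values)
model_dist = [0.9, 0.1]      # distribution currently produced by the model (toy values)

kl_value = kl_divergence(knowledge_dist, model_dist)
total = penalized_loss(task_loss=1.0, model_dist=model_dist, knowledge_dist=knowledge_dist)
print(kl_value, total)
```

KL divergence is asymmetric, so the direction (knowledge relative to model, or the reverse) is a design choice that changes which mismatches are penalized most.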
59. 59
_______ meant to ______ not to ______
Template: fill in the blanks
It was meant to dazzle not to make sense
Target:
Generative
Model
It was meant to dazzle not to make it
Infilling Content
Matching through
averaged KL
Divergence
Learnable knowledge
constraint module
Learnable Constraints
Hu, Zhiting, Zichao Yang, Russ R. Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric P. Xing. "Deep generative
models with learnable knowledge constraints." In Advances in Neural Information Processing Systems, pp. 10501-10512. 2018.
Replace the sentence
with KG or Resource
60. Semi-Deep Infusion : KG GANs
Generative Adversarial Network*
*Chang, Che-Han, Chun-Hsien Yu, Szu-Ying Chen, and Edward Y. Chang. "KG-GAN: Knowledge-Guided Generative Adversarial Networks."
arXiv preprint arXiv:1905.12261 (2019).
[Figure: KG-GAN architecture. Components: Seen Category Data, UnSeen Category Data, Generators G1 and G2 with inputs Z1 and Z2 and Parameter Sharing, Real Data, Fake Data (G1), Fake Data (G2), Discriminator (D) deciding Real or Fake, Embedding Regression Network, Semantic Embedding of Unseen Category, Prediction (G1), Prediction (G2), Loss (G1), Loss (G2), Objective Function.]
61. Variants:
1. Knowledge base at each LSTM cell [1].
2. K-IL layer [2]:
a. 1D Convolutional Neural Network for mixing
b. Graph Convolutional Neural Network -- When
hierarchical structure of KG is important and
need to be preserved in representation.
c. Simple Multi-layer Perceptron.
[1] Yang, Bishan, and Tom Mitchell. Leveraging knowledge bases in lstms for improving machine reading. arXiv preprint arXiv:1902.09091 (2019).
[2] Ugur Kursuncu, Manas Gaur, and Amit Sheth. Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning. Proceedings of the AAAI 2020 Spring Symposium on
Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020).
Semi-Deep Infusion : LSTMs
62. 62
Deep Infusion (Vision)
Ugur Kursuncu, Manas Gaur, and Amit Sheth. Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning. Proceedings of the AAAI 2020 Spring
Symposium on Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020).
63. K-IL : Objective Functions and Evaluation
Kullback Leibler Divergence
● Measures the Information loss during
the learning phase between
Latent/hidden states and KGs
● KG Embeddings: TransE, HolE etc.
● Models: Variational Autoencoders,
LSTMs, GANs, Siamese Neural
Networks
● Frameworks: Zero Shot Learning ,
One Shot Learning, Transfer Learning,
Parameter Sharing
● Other Variants: Jensen Divergence,
Regularization, Integer Linear
Programming
Kosheleva, Olga, and Vladik Kreinovich. "Why deep learning methods use KL divergence instead of least squares: a possible pedagogical explanation." Математические структуры и
моделирование 2 (46) (2018).
Evaluation: Before and After
Knowledge-infusion
Methods (Apart from Precision, Recall, F1-score):
● Frechet Inception Distance : measure of similarity
between two datasets (KG & Training Data)
● Statistical Significance Hypothesis Testing
● Word and Concept Features
● T-SNE Visualization of Clusters
● Area under perturbation curve:
Feature Ranking
● Human-centric evaluation: Crowdsourcing,
User Satisfaction, Mental Model, Trust
Assessment, Correctability
http://www-sop.inria.fr/members/Freddy.Lecue/presentation/ISWC2019-FreddyLecue-Thales-OnTheRoleOfKnowledgeGraphsInExplainableAI.pdf
67. 67
BERT
Abstractive Summarization
using Integer Linear
Programming (ILP)
Abstractive Summarization
using ILP and PHQ-9
Statistical Statistical + Constraints
Statistical + Constraints
+ Knowledge
Manas G, Aribandi V, Kursuncu U, Alambo A, Shalin VL, Thirunarayan K, Beich J, Narasimhan M, Sheth A Knowledge-infused Abstractive Summarization of Clinical Diagnostic
Interviews , JMIR Preprints. 30/05/2020:20865 DOI: 10.2196/preprints.20865 URL: https://preprints.jmir.org/preprint/20865
Knowledge Infusion: Abstractive Summarization
of Clinical Diagnostic Interviews
68. 6868
Really struggling with my bisexuality which is causing chaos in my relationship with a girl. Being
a fan of LGBTQ community, I am equal to worthless for her. I’m now starting to get drunk
because I can’t cope with the obsessive, intrusive thoughts, and need to get out of my head.
BPD
DICD PND SAD SBI OCD
Don’t want to live anymore. Sexually assault, ignorant family members and my never
ending loneliness brights up my path to death.
SCW
PND SBI SAD DPR DICD
DPR
I do have a potential to live a decent life but not with people who abandon me.
Hopelessness and feelings of betrayal have turned my nights to days. I am developing
insomnia because of my restlessness.
SBI DPR DICD
BPD
I just can’t take it anymore. Been abandoned yet again by someone I cared about. I've been
diagnosed with borderline for a while, and I’m just going to isolate myself and sleep forever.
SBI PND
Linking Reddit to DSM-5 : Web-based Intervention
Reddit DSM-5 [Gaur 2018]
69. 6969
Mapping to SNOMED Concept Illustration
Really struggling with my bisexuality which is
causing chaos in my relationship with a girl.
Being a fan of LGBTQ community, I am equal to
worthless for her. I’m now starting to get drunk
because I can’t cope with the obsessive,
intrusive thoughts, and need to get out of my
head.
288291000119102: High risk bisexual behavior
365949003: Health-related behavior finding
307077003: Feeling hopeless
365107007: level of mood
225445003: Intrusive thoughts
55956009: Disturbance in content of thought
26628009: Disturbance in thinking
1376001: Obsessive compulsive personality disorder
70. 7070
Mapping Reddit to DSM-5
Medical Knowledge Bases
N-grams
(n=1, 2, 3)
LDA
LDA over
Bi-grams
Normalized
Hit
Score
DSM-5
Lexicon
<Reddit Post>
<Subreddit Label>
Input
<Reddit Post>
<DSM-5 Label>
Output
DAO
Drug Abuse
Ontology
71. 71
Mapping Reddit to DSM-5
http://www.papersfromsidcup.com/graham-daveys-blog/changes-in-dsm-5
72. 7272
Reddit to DSM-5
Task
I know you want me to say no and that it is a
part of me blah blah blah. But I can't. Honestly,
not having bipolar disorder would be a huge
blessing. I would be so much happier and
could control my life better. I wouldn't have
frantic, scattered thoughts and depression. I
would be normal, happy, and less dramatic.
Bipolar Subreddit
DSM-5: Depressive Disorder
BiPolar
Depression
Disorder
Subreddits DSM-5
Chapter
BiPolarReddit
BiPolarSOS
Depression
Addiction
Substance use &
Addictive Disorder
Crippling Alcoholism
Opiates Recovery
Opiates
Self-Harm
Stop Self-Harm
73. 7373
Semantic Encoding and Decoding Optimization
12808
Words
300 dimension embedding 300 dimension embedding
20 DSM-5
Categories
R
D
Reddit Word
Embedding Model
DSM-5 -DAO
Lexicon
W
Solvable Sylvester Equation
74. 74
Semantic Encoding and Decoding Optimization
Encoding DSM-5 to Reddit embedding space
Decoding Reddit to DSM-5 embedding space
76. 76
Method (with HLF, VLF, and FGF) | Precision | Recall | F1-Score
BRF - Contextual Features (CF) | 0.60 | 0.54 | 0.57
BRF - CF (SEDO weights generated from DSM-5 Lexicon without DAO) | 0.87 | 0.77 | 0.82
BRF - CF (SEDO weights generated from DSM-5 Lexicon with DAO, without Slang Terms) | 0.87 | 0.80 | 0.83
BRF - CF (SEDO weights generated from DSM-5 Lexicon without DAO, with Slang Terms) | 0.85 | 0.82 | 0.83
BRF - CF (SEDO weights generated from DSM-5 Lexicon with DAO and Slang Terms) | 0.88 | 0.83 | 0.85
Outcome
Model and Annotator Agreement: 84%
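The F1-Score column in the results above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes it from the reported precision and recall values; the row labels are abbreviated here for readability, and the helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (abbreviated label, precision, recall, reported F1) taken from the results on this slide
rows = [
    ("CF only", 0.60, 0.54, 0.57),
    ("SEDO, DSM-5 without DAO", 0.87, 0.77, 0.82),
    ("SEDO, with DAO, without slang", 0.87, 0.80, 0.83),
    ("SEDO, without DAO, with slang", 0.85, 0.82, 0.83),
    ("SEDO, with DAO and slang", 0.88, 0.83, 0.85),
]

recomputed = [round(f1_score(p, r), 2) for _label, p, r, _reported in rows]
reported = [f1 for _label, _p, _r, f1 in rows]
print(recomputed == reported)
```

All five recomputed values match the reported ones to two decimals, so the table is internally consistent.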
77. Mapping Social Media to EHR using KG
77
TwADR
AskaPatient
Drug Abuse
Ontology
DSM-5 Lexicon
Suicide Risk
Severity Lexicon
Treatment Information
Observation and
Drug-related
Information
Mental Health Condition
Suicide Risk Levels
Ideation
Behavior
Attempt
79. Resources
TwADR and
AskaPatient
Lexicon
https://zenodo.org/record/55013#.XsYEH8YpBQI
Ref: Limsopatham, Nut, and Nigel Collier. "Normalising medical concepts in social media texts by learning
semantic representation." Association for Computational Linguistics, 2016.
Suicide-Risk
Severity
Lexicon
https://bit.ly/SRS_lexicon
Ref: Gaur, Manas, Amanuel Alambo, Joy Prakash Sain, Ugur Kurşuncu, Krishnaprasad Thirunarayan, Ramakanth
Kavuluru, Amit Sheth, Randy Welton, and Jyotishman Pathak. "Knowledge-aware assessment of severity of suicide
risk for early intervention." In The World Wide Web Conference, 2019.
DSM-5 and Drug
Abuse Ontology
Lexicon
https://bit.ly/DSM5_DAO
Ref: Gaur, Manas, Ugur Kurşuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan,
and Jyotishman Pathak. "'Let Me Tell You About Your Mental Health!' Contextualized Classification of Reddit Posts
to DSM-5 for Web-based Intervention." In Proceedings of the 27th ACM International Conference on Information and
Knowledge Management, 2018.
Suicide Risk
Severity Dataset
(Reddit)
https://zenodo.org/record/2667859#.XsYH7MYpBQI
Ref: Gaur, Manas, Amanuel Alambo, Joy Prakash Sain, Ugur Kurşuncu, Krishnaprasad Thirunarayan, Ramakanth
Kavuluru, Amit Sheth, Randy Welton, and Jyotishman Pathak. "Knowledge-aware assessment of severity of suicide
risk for early intervention." In The World Wide Web Conference, 2019.
80. Other Works: Not Covered
80
Manas Gaur, Amanuel Alambo, Joy Prakash Sain, Ugur Kursuncu, Krishnaprasad Thirunarayan, Ramakanth
Kavuluru, Amit Sheth, Randy Welton, and Jyotishman Pathak.
Knowledge-aware assessment of severity of suicide risk for early intervention. In WWW 2019
Manas Gaur, Vamsi Aribandi, Amanuel Alambo, Ugur Kursuncu, Krishnaprasad Thirunarayan, Jonathan Beich,
Jyotishman Pathak, and Amit Sheth
Characterization of Time-variant and Time-invariant Assessment of Suicidality on Reddit using C-SSRS
Under Review in Nature Scientific Reports
Manas Gaur, Aditya Sharma, Ugur Kursuncu, Valerie L. Shalin and Amit Sheth
Knowledge-Guided Convolutional Autoencoder Clustering for Associating Support Seeker and Support
Providers in Online Mental Health Communities
Amanuel Alambo and Krishnaprasad Thirunarayan
Depressive, Drug Abusive, or Informative: Knowledge-aware Study of News Exposure during COVID-19
Outbreak. In ACM KDD KiML Workshop 2020
81. References
● Manas Gaur, Ugur Kursuncu, Amanuel Alambo, Amit Sheth, Raminta Daniulaityte, Krishnaprasad Thirunarayan, and Jyotishman Pathak. "'Let Me Tell
You About Your Mental Health!' Contextualized Classification of Reddit Posts to DSM-5 for Web-based Intervention." CIKM, 2018.
● Manas Gaur*, Chidubem Arachie*, Sam Anzaroot, William Groves, Ke Zhang, and Alejandro Jaimes. "Unsupervised Detection of Sub-Events in Large
Scale Disasters." AAAI 2020.
● Manas Gaur, Amanuel Alambo, Joy Prakash Sain, Ugur Kursuncu, Krishnaprasad Thirunarayan, Ramakanth Kavuluru, Amit Sheth, Randy Welton, and
Jyotishman Pathak. "Knowledge-aware assessment of severity of suicide risk for early intervention." WWW 2019.
● Manas Gaur, Saeedeh Shekarpour, Amelie Gyrard, and Amit Sheth. "empathi: An ontology for emergency managing and planning about hazard crisis."
ICSC, 2019.
● Amelie Gyrard, Manas Gaur, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, and Amit Sheth. "Personalized health knowledge graph." ISWC 2018.
● Amit Sheth, Manas Gaur, Ugur Kursuncu, and Ruwan Wickramarachchi. "Shades of knowledge-infused learning for enhancing deep learning." IEEE
Internet Computing 2019.
● Shreyansh Bhatt, Manas Gaur, Beth Bullemer, Valerie Shalin, Amit Sheth, and Brandon Minnery. "Enhancing crowd wisdom using explainable diversity
inferred from social media." Web Intelligence 2018.
● Kursuncu, Ugur, Manas Gaur, and Amit Sheth. "Knowledge Infused Learning (K-IL): Towards Deep Incorporation of Knowledge in Deep Learning.", AAAI
Spring Symposium 2020.
● Williams, Ronald J., and David Zipser. "A learning algorithm for continually running fully recurrent neural networks." Neural computation, 1989.
● Lamb, Alex M., Anirudh Goyal Alias Parth Goyal, Ying Zhang, Saizheng Zhang, Aaron C. Courville, and Yoshua Bengio. "Professor forcing: A new
algorithm for training recurrent networks." NIPS 2016.
● Yang, Bishan, and Tom Mitchell. "Leveraging Knowledge Bases in LSTMs for Improving Machine Reading." ACL 2017.
● Hu, Zhiting, Zichao Yang, Russ R. Salakhutdinov, Lianhui Qin, Xiaodan Liang, Haoye Dong, and Eric P. Xing. "Deep generative models with
learnable knowledge constraints." NIPS 2018.
84. Critical Points on Cyber Social Threats
● Context in social media conversations is fluid and comes in shades of gray.
● False alarms in the models developed and deployed.
● Ethical considerations and consequences. Bias and transparency. Implications for the mass population.
● The role of knowledge in improving the model at these critical points.
Photo: @budhelisson Unsplash.com
85. Online Extremism - Ongoing Open Problem
● Efforts by online platforms are inadequate.
● Governments insist that the industry has a ‘social responsibility’ to do more to remove harmful content.
● If unsolved, social media platforms will continue to negatively impact society.
87. Radicalization: Challenges & Potential Solutions
● Modeling users (e.g., recruiter, follower) with respect to different stages of radicalization.
● Persuasive content and psychological process over time.
● Domain knowledge relevant to Islamist extremism.
● Multidimensionality of the context (“jihad” has different meanings in different contexts).
88. Radicalization Scale (Dilshod Achilov et al.)
● 0, None: Mainstream religious views and orientations.
Indicator: Islam; Allah; jihad (self struggle); halal; democracy, islam, salah, fatwa, hajj.
● 1, Low: Attitudinal support for politically moderate Islamism.
Indicator: Hadith; Caliphate (Khilafah) justified; Sharia better (than secular law); Hypocrisy west.
● 2, Elevated: Emergent support for exclusive rule of the Shari’a law.
Indicator: Shariah best; revenge (justified); jihad (against West); justify Daesh (ISIS).
● 3, High: Support for extremist networks and travel to “Darul Islam”.
Indicator: Kafir; infidel; hijrah to Darul-Islam; (supporting) fatwa Al-Awlaki; mushrikeen.
● 4, Severe: Call for action to join the fight and the use of violence.
Indicator: apostate; sahwat; taghut; kill; kafir; kuffar; murtadd; tawaghit; al_baghdadi; martyrdom khilafah.
89. Radicalization Process over time
Analysis of content in context can provide deeper understanding of the factors characterizing the radicalization process.
[Figure: progression from a non-extremist ordinary individual to a radicalized extremist individual across the scale levels 0 (None), 1 (Low), 2 (Elevated), 3 (High), 4 (Severe).]
90. Cautionary Note
● Local and global security implications: there is a need for reliable and fair prediction of online terrorist activities.
● False alarms might potentially impact millions of innocent people; specifically, the unfair classification of non-extremist individuals as extremist.
91. Dataset
● Verified and suspended by Twitter.
● Time frame: Oct 2010 – Aug 2017
● Includes 538 extremist users, from two resources (Fernandez, 2018; Ferrara, 2016):
○ Twitter users verified by the anti-abuse team.
○ Lucky Troll Club
● 538 non-extremist users were created from an annotated Muslim religious dataset that contains Muslim users (Chen, 2014).
- Miriam Fernandez, Moizzah Asif, and Harith Alani. 2018. Understanding the roots of radicalisation on twitter. In Proceedings of the 10th ACM Conference on Web Science.
- Emilio Ferrara, Wen-Qiang Wang, Onur Varol, Alessandro Flammini, and Aram Galstyan. 2016. Predicting online extremism, content adopters, and interaction reciprocity. In International conference on social informatics.
- Chen, L., Weber, I., & Okulicz-Kozaryn, A. (2014, November). US religious landscape on Twitter. In International Conference on Social Informatics (pp. 544-560). Springer, Cham.
93. Multidimensionality of Extremist Content
● Dimensions to define the context:
○ Based on literature and our empirical study of the data, three contextual dimensions are identified: Religion, Ideology, Hate.
● The distribution of prevalent terms (i.e., words, phrases, concepts) in each dimension is different.
● Different dimensions are needed to contextualize and disambiguate common ‘diagnostic’ terms (e.g., jihad).
94. Example Tweets with “Jihad”
“Jihad” can appear in tweets with different meanings in different dimensions of the context.
● “Reportedly, a number of apostates were killed in the process. Just because they like it I guess.. #SpringJihad #CountrysideCleanup”
● “Kindness is a language which the blind can see and the deaf can hear #MyJihad be kind always”
● “By the Lord of Muhammad (blessings and peace be upon him) The nation of Jihad and martyrdom can never be defeated”
[Figure labels: H, I, R; presumably the Hate, Ideology, and Religion dimensions.]
95. Ambiguity of Diagnostic terms/phrases
● The same term can have a different meaning in each dimension.
● Example: the meaning of “jihad” is different for extremists and non-extremists.
○ For extremists, the meaning is closer to “awlaki”, “islamic state”, “aqeedah”.
○ For non-extremists, it is closer to “muslims”, “quran”, “imams”.
[Figure: panels for Extremists and Non-Extremists.]
96. Contextual Dimension Modelling
● Different contextual dimensions incorporating:
○ Knowledge Graphs
○ Dimension Corpora
● Utilization of machine/deep learning models to generate knowledge-enhanced representations.
● Resources for dimensions:
○ Religion: Qur’an, Hadith
○ Ideology: Books, lectures of ideologues
○ Hate: Hate Speech Corpus (Davidson et al. 2017)
● Can be applied to many social problems.
[Figure: Dimension Modeling Process. Dimension 1, Dimension 2, and Dimension 3 are each modeled separately to produce a dimension-based, knowledge-enhanced representation.]
97. Modeling (Hate) Using a Knowledge Graph
“You shall know a word by the company it keeps” - J.R. Firth (1957:11)
Capturing similarity:
● Learning word similarities from a substantial knowledge graph.
● A solution via distance between concepts in the knowledge graph.
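The idea of scoring similarity by the distance between concepts in a knowledge graph can be sketched as follows. This is only a minimal illustration, not the tutorial's implementation: the toy graph, its edges, and the 1/(1+d) scoring function are assumptions introduced here.

```python
from collections import deque

# Hypothetical toy knowledge graph (undirected); the concepts and edges are
# illustrative assumptions, not taken from the tutorial's resources.
TOY_GRAPH = {
    "insult": {"abuse"},
    "abuse": {"insult", "harassment"},
    "harassment": {"abuse", "threat"},
    "threat": {"harassment"},
    "kindness": set(),
}


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def path_similarity(graph, a, b):
    """Map graph distance d to a score in (0, 1]; unreachable pairs score 0."""
    dist = shortest_path_length(graph, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)
```

Concepts that are directly linked score 0.5 under this assumed scoring, and the score decays as the path between them grows.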
98. Modeling (Hate) Using a Corpus
“You shall know a word by the company it keeps” - J.R. Firth (1957:11)
Capturing similarity (and resolving ambiguity):
● Learning word similarities from a large corpus.
● A solution via distributional similarity-based representations.
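A distributional representation can be illustrated with plain co-occurrence counts and cosine similarity. The sketch below is an assumption-laden toy (a three-sentence neutral corpus and a window of two tokens), standing in for the trained embeddings a real system would more likely use.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words appearing within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical three-sentence corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "stocks fell on the market",
]
vectors = cooccurrence_vectors(corpus)
```

Here "cat" and "dog" share the same contexts and so come out more similar to each other than "cat" is to "stocks", which is the sense in which a word is known by the company it keeps.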
99. User Similarity
● For religion: Extremist and non-extremist users are significantly similar to each other.
● For hate: Extremist and non-extremist users do not show much similarity.
[Figure: similarity between Extremists and Non-Extremists, with panels for Religion, Ideology, and Hate.]
100. User Similarity
● For religion and hate, among extremists: there seem to be a number of users that are significantly different from each other.
● Possibility of outliers.
[Figure: similarity of Extremists to Extremists, with panels for Religion, Ideology, and Hate.]
101. User Visualization for Dimensions
● A group of extremist users forms a cluster farther from other users for Religion and Hate.
● This suggests there might be outliers in the dataset.
102. User Visualization for Dimensions
● Randomly selected 10 users and visualized them for each dimension.
● Repeated this selection many times; every time, the same users formed a separate cluster. In the case shown, the users are D and A.
[Figure: Random 10 Users.]
103. Outlier Detection
Separation of users within the extremist dataset through clustering (Mann-Whitney U-test).
● Identified 99 (18%), 48 (9%), and 141 (26%) users in the extremist dataset, clustered as likely outliers for religion, ideology, and hate, respectively.
● Drew a random sample of 76 users (15%) from the extremist dataset to validate the identified likely outliers.
● Our domain expert annotated these users as likely extremist, likely non-extremist, or unclear. Kappa score = 82%.
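The slide names the Mann-Whitney U-test without further detail. As a rough sketch of what that test computes, the function below derives the U statistics for two samples from their pooled ranks. Which quantities were actually compared (for example, per-user similarity scores of two clusters) is an assumption here, and the p-value computation is omitted.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two samples, using average ranks for ties."""
    pooled = sorted(
        [(x, 0) for x in sample_a] + [(x, 1) for x in sample_b],
        key=lambda pair: pair[0],
    )
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Extend j over the run of values tied with pooled[i].
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b
```

A U value near zero for one group means its values almost all rank below the other group's, which is the kind of separation a clustering-based outlier analysis would look for.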
104. Outliers
Separation of users within the extremist dataset through clustering.
● Obtained the set of 49 outlier users in the extremist dataset. The rest are labeled as likely extremists.
● Content of the outlier users contains the following prevalent concepts: marriage, Allah, bonded, silence, Islam leaders, Berjaya hilarious, cake, miss mit, kemaren, Quran, Khuda, prophet, Muhammad, Ahmad.
105. Results
● The tri-dimension model performs best.
● Precision is used as the metric, to emphasize reducing the misclassification of non-extremist content.
● Implications in a large-scale application.
106. Key Insights
● Domain-specific knowledge plays a critical role, as does ground truth, for such complex problems.
● False alarms are significantly reduced via incorporation of three domain-specific dimensions. This further reduces the likelihood of unfair treatment of non-extremist individuals in a potential real-world deployment.
● Misclassification of non-extremist users can have significant implications in a large-scale application where non-extremists vastly outnumber extremists.
● Higher precision reduces potential social discrimination.
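The point about scale can be made concrete with a little arithmetic. The sketch below uses hypothetical numbers (population size, prevalence, recall, and false-positive rate are all assumptions, not results from the study) to show how the same classifier yields far lower precision when non-extremists vastly outnumber extremists.

```python
def precision(true_positives, false_positives):
    """Fraction of flagged users who actually belong to the positive class."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def expected_precision(n_users, prevalence, recall, false_positive_rate):
    """Expected precision when a classifier is applied to a population."""
    positives = n_users * prevalence
    negatives = n_users - positives
    return precision(recall * positives, false_positive_rate * negatives)


# Hypothetical numbers, chosen only to illustrate the effect of class imbalance.
balanced = expected_precision(1_000, 0.5, 0.9, 0.05)
imbalanced = expected_precision(1_000_000, 0.001, 0.9, 0.05)
```

With these assumed rates, roughly 95% of flagged users are true positives in the balanced setting, but fewer than 2% are in the imbalanced one, so even a small false-positive rate translates into many wrongly flagged people.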
107. Key Insights
● Extremist users employ religion along with hate, suggesting they employ different hate tactics for their targets.
● Each dimension plays a different role at different levels of radicalization, capturing nuances as well as linguistic and semantic cues better throughout the radicalization process.
108. Our Highly Multidisciplinary Approach
● The human brain processes information from extremist narratives on social media, which includes different contexts, emotions, sentiment, etc.
● Individuals change behavior and make choices in consuming/sharing content with an intent.
● Coordination, information flow, and diffusion on social networks.
● Outcomes/impact on society through events and collective actions (e.g., civil war or the result of an election).
[Figure: layers labeled Neural, Cognitive, Social Interactions, and Public/Society, with a Neuro-Cognitive Process label.]
109. References
● Ugur Kursuncu. “Modeling the Persona in Persuasive Discourse on Social Media Using Context-aware and
Knowledge-driven Learning.” University of Georgia. 2018.
● Ugur Kursuncu, Manas Gaur, Carlos Castillo, Amanuel Alambo, Krishnaprasad Thirunarayan, Valerie Shalin,
Dilshod Achilov, I. Budak Arpinar, and Amit Sheth. "Modeling Islamist extremist communications on social
media using contextual dimensions: Religion, ideology, and hate." CSCW 2019.
● Ugur Kursuncu, Manas Gaur, and Amit Sheth. "Knowledge infused learning (K-IL): Towards deep
incorporation of knowledge in deep learning." Proceedings of the AAAI 2020 Spring Symposium on
Combining Machine Learning and Knowledge Engineering in Practice (AAAI-MAKE 2020).
● Ugur Kursuncu, Manas Gaur, Usha Lokala, Krishnaprasad Thirunarayan, Amit Sheth, and I. Budak Arpinar.
"Predictive analysis on Twitter: Techniques and applications." Springer Nature 2019.
● Ugur Kursuncu, Manas Gaur, Usha Lokala, Anurag Illendula, Krishnaprasad Thirunarayan, Raminta
Daniulaityte, Amit Sheth, and I. Budak Arpinar. "What's ur Type? Contextualized Classification of User Types
in Marijuana-Related Communications Using Compositional Multiview Embedding." Web Intelligence 2018.
● Ugur Kursuncu, Manas Gaur, Krishnaprasad Thirunarayan, and Amit Sheth. "Explainability of medical AI through
domain knowledge." Ontology Summit Communications 2019.
110. Knowledge Graphs for Autonomous Driving
Artificial Intelligence Institute
ruwan@email.sc.edu
@ruwantw
125. References
● Ruwan Wickramarachchi, Cory Henson, and Amit Sheth. "An evaluation of knowledge graph embeddings for
autonomous driving data: Experience and practice." AAAI Spring Symposium 2020.
● Oltramari, Alessandro, Jonathan Francis, Cory Henson, Kaixin Ma, and Ruwan Wickramarachchi. "Neuro-symbolic
Architectures for Context Understanding." Knowledge Graphs for eXplainable AI, IOS Press 2020.
● Alessandro Oltramari, Cory Henson, Ruwan Wickramarachchi, Don Brutzman and Richard Markeloff. “Hybrid AI for
Context Understanding” 3rd U.S. Semantic Technologies Symposium, Raleigh, NC 2020
https://us2ts.org/2020/program-hybrid-ai
● Cory Henson, Stefan Schmid, Anh Tuan Tran, and Antonios Karatzoglou. "Using a Knowledge Graph of Scenes to
Enable Search of Autonomous Driving Data." In ISWC Satellites, pp. 313-314. 2019.
● Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan,
Giancarlo Baldan, and Oscar Beijbom. "nuscenes: A multimodal dataset for autonomous driving." In Proceedings of
the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11621-11631. 2020.
● Kesten, R., M. Usman, J. Houston, T. Pandya, K. Nadhamuni, A. Ferreira, M. Yuan et al. "Lyft level 5 av dataset 2019."
● Alshargi, Faisal, Saeedeh Shekarpour, Tommaso Soru, and Amit Sheth. “Metrics for evaluating quality of embeddings
for ontological concepts”. AAAI Spring Symposium 2019.
128. Research Question
Does semantically enriching the natural language processing algorithm with domain-specific knowledge increase the coverage in text understanding?
129. Darknet and Anonymity
Cryptomarkets provide anonymity to both buyers and sellers:
• Location on the Dark Web, which requires specific software to access (e.g., Tor, I2P);
• Use of untraceable cryptocurrencies (e.g., bitcoin);
• Privacy and anonymity.
Approximately two thirds of the goods sold on cryptomarkets are drugs (EMCDDA, 2018).
130. Transaction in Cryptomarket
Cryptocurrencies
• Based on decentralized blockchain technologies
• Identified by encrypted code
• Approximately 1800 cryptocurrencies
• Most commonly used on cryptomarkets: Bitcoin, Litecoin, Monero.
Image Source: 1. Zhang, Yiming, et al. "Your style your identity: Leveraging writing and photography styles for drug trafficker identification in darknet markets over attributed heterogeneous
information network." The World Wide Web Conference. 2019.
Image Source 2. https://www.investopedia.com/terms/b/blockchain.asp
131. Motivation
◉ Darknet markets have grown substantially even with government interventions from 2013-2016 [1]
Incremental growth of the Darknet Market [1]
Feature                               Growth
Total revenue                         2x
Total number of transactions          3x
Total number of listings              5.5x
Total number of listings per vendor   2x
[1] Kristy Kruithof. 2016. Internet-facilitated drugs trade: An analysis of the size, scope and the role of the Netherlands. RAND.
132. Motivation
◉ Drug traffickers may maintain multiple accounts across different markets or in the same market.
◉ Linking different accounts to the same individuals is essential to track their status and better understand the online drug trafficking ecosystem.
◉ Illegal trading of drugs in these markets has turned into a serious global concern because of its severe consequences on society (e.g., violent crimes) and public health at regional, national, and international levels.
134. Problem Statement
◉ The task involves the detection of similarity between two vendors on online forums, i.e., Darknet, Reddit, and Twitter (identification of sybil accounts).
◉ Formally, given any two vendors $v_a$ and $v_b$ associated with the respective sites $s_i$ and $s_j$, our goal is to develop a similarity measure $\mathrm{sim}(v_a^{s_i}, v_b^{s_j})$ between the two vendors using various characteristics/patterns.
135. Dataset Creation
◉ Data extracted using the eDarkTrends platform [5], with 1992 unique vendors collected over 3 different sites.
Dark Web Sites          Dream Market   Tochka   Wall street   All
Unique # Vendor names   1448           408      466           1992
Unique # Substance      852            313      290           1148
Unique # Location       356            44       29            389
Unique # Descriptions   16800          1829     1723          18472
[5] Usha Lokala, Francois R Lamy, Raminta Daniulaityte, Amit Sheth, Ramzi W Nahhas, Jason I Roden, Shweta Yadav, and Robert G Carlson. 2019. Global trends, local harms: availability of fentanyl-type drugs on the dark web and accidental overdoses in Ohio. Computational and Mathematical Organization Theory 25, 1 (2019), 48–59.
137. Methodology: Modelling Multi-view Learning
◉ Multi-view learning is an ideal learning mechanism for data where examples are characterized by distinct (often orthogonal) feature sets.
◉ It generalizes and improves performance by exploiting the diverse views from multiple rich sources, such as textual, stylometric, and location representations.
Image Source: 1. Zhang, Yiming, et al. "Your style your identity: Leveraging writing and photography styles for drug trafficker identification in darknet markets
over attributed heterogeneous information network." The World Wide Web Conference. 2019.
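To make the notion of separate views concrete, the sketch below builds three per-vendor feature sets from one listing. It is a hedged illustration only: the specific stylometric statistics, the function names, and the dictionary layout are assumptions introduced here, not the features used in the work described.

```python
import string


def stylometric_features(text):
    """Return a small, fixed-order vector of surface style statistics."""
    n_chars = len(text) or 1  # avoid division by zero on empty text
    words = text.split()
    uppercase_ratio = sum(c.isupper() for c in text) / n_chars
    digit_ratio = sum(c.isdigit() for c in text) / n_chars
    punctuation_ratio = sum(c in string.punctuation for c in text) / n_chars
    mean_word_length = sum(len(w) for w in words) / len(words) if words else 0.0
    return [uppercase_ratio, digit_ratio, punctuation_ratio, mean_word_length]


def build_views(description, locations):
    """Keep each view as its own feature set rather than concatenating them."""
    return {
        "textual": description.lower().split(),
        "stylometric": stylometric_features(description),
        "location": sorted(locations),
    }
```

Keeping the views apart at this stage leaves the choice of fusion method open for a later step.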
139. Knowledge Infusion: Drug Abuse Ontology
◉ The Drug Abuse Ontology (DAO) is a formal representation of concepts and relationships between them for the prescription drug abuse domain.
◉ The current DAO contains 241 classes and 37 properties.
◉ The DAO identifies all variants of a concept in data (e.g., generic names, slang terms, scientific names).
◉ The DAO contains names of psychoactive substances (e.g., heroin, fentanyl), including synthetic substances (e.g., U-47,700, MT-45), brand and generic names of pharmaceutical drugs (e.g., Duragesic, fentanyl transdermal system), and slang terms (e.g., roxy, fent).
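One way to picture how an ontology's variant names help text understanding is a simple normalization step that maps slang and brand names to a canonical concept. The sketch below is an illustration under stated assumptions: the lexicon is a hypothetical hand-written stand-in for variants that would be read from the DAO, and the specific mappings are illustrative.

```python
import re

# Hypothetical slang-to-concept lexicon; in practice these variants would be
# read from the DAO rather than hard-coded. The mappings are illustrative.
SLANG_TO_CONCEPT = {
    "fent": "fentanyl",
    "duragesic": "fentanyl",
    "roxy": "oxycodone",
    "horse": "heroin",
}


def normalize_drug_terms(text, lexicon=SLANG_TO_CONCEPT):
    """Replace known variants with their canonical concept name."""
    def replace(match):
        token = match.group(0)
        return lexicon.get(token.lower(), token)

    # Match alphabetic tokens (optionally with digits or hyphens after the
    # first letter) and leave everything else in the text untouched.
    return re.sub(r"[A-Za-z][A-Za-z0-9-]*", replace, text)
```

A plain lookup like this ignores word sense (for example, "horse" in a non-drug context), so a real pipeline would need context to decide when a mapping applies.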
140. Knowledge Infusion: Drug Abuse Ontology
Augmentation of drug slang terms enables understanding of drug abuse-related textual descriptions that had not previously been well explored.
141. Knowledge Infusion: Drug Abuse Ontology
◉ The DAO contains information regarding the route of administration (e.g., oral, IV), unit of dosage (e.g., gr, gram, pint, tablets), physiological effects (e.g., dysphoria, vomiting), and substance form (e.g., powder, liquid, hcl).
◉ The DAO is also enriched with links to concepts in external ontologies, through a very careful manually supervised process. Among the 43 DAO classes, 11 classes have been mapped to URIs in DrugBank, Freebase, DBpedia, and the Cyc ontologies, using the sameAs property.
144. Location and Substance View Encoding
◉ Utilize simple binary encoding to obtain the view representation:
USA   CAN   ESP   IND   CHN   BEL   NOR   NZL   SAU   UKR
1     1     0     0     0     0     0     0     0     0
◉ Add a self-information weight, or information content, for all features.
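The encoding and weighting described above can be sketched as follows. This is a minimal illustration under assumptions: self-information is taken in its standard form, the negative log of a feature's probability, with the probability estimated as the fraction of vendors showing that feature. The vendor data and function names are hypothetical.

```python
import math


def binary_encode(values, vocabulary):
    """1 if the vocabulary item is present for the vendor, else 0."""
    return [1 if item in values else 0 for item in vocabulary]


def self_information(vendor_values, vocabulary):
    """log2(N / count) per feature, i.e. -log2 of its observed frequency."""
    n = len(vendor_values)
    weights = []
    for item in vocabulary:
        count = sum(1 for values in vendor_values if item in values)
        # Features never observed get weight 0 (an assumption of this sketch).
        weights.append(math.log2(n / count) if count else 0.0)
    return weights


def weighted_encode(values, vocabulary, weights):
    """Scale each binary indicator by the feature's information content."""
    bits = binary_encode(values, vocabulary)
    return [bit * weight for bit, weight in zip(bits, weights)]


# Hypothetical vendors and shipping locations, for illustration only.
countries = ["USA", "CAN", "ESP", "IND"]
vendors = [{"USA", "CAN"}, {"USA"}, {"USA", "ESP"}, {"USA", "CAN"}]
weights = self_information(vendors, countries)
```

Under this weighting a location shared by every vendor contributes nothing, while rarer locations count for more when two vendors are compared.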
145. Multi-view Fusion: Canonical Correlation Analysis
◉ The views cannot simply be concatenated, since each vector may correspond to a different modality (image vs. text) or have very different distributional properties.
◉ These views are fused using CCA [9] to obtain a single representation, which we call the Vendor embedding.
◉ CCA allows us to infer information from cross-covariance matrices.
◉ We employ an extension called weighted generalized CCA.
[9] Harold Hotelling. 1992. Relations between two sets of variates. In Breakthroughs in statistics. Springer, 162–190.
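For reference, the standard two-view CCA objective can be written as below. The notation is introduced here as an assumption: $w_x$ and $w_y$ are projection vectors for the two views, $\Sigma_{xx}$ and $\Sigma_{yy}$ are the within-view covariance matrices, and $\Sigma_{xy}$ is the cross-covariance matrix. The weighted generalized variant mentioned above extends this idea to more than two views and is not reproduced here.

```latex
\rho = \max_{w_x,\, w_y}
\frac{w_x^{\top} \Sigma_{xy}\, w_y}
     {\sqrt{w_x^{\top} \Sigma_{xx}\, w_x}\;\sqrt{w_y^{\top} \Sigma_{yy}\, w_y}}
```

The maximizing projections place both views in a shared space where they are maximally correlated, which is why views with very different distributions can be fused without direct concatenation.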
148. Domain Specific Analysis
◉ Usage of multilingual and code-mixed text.
◉ Use of slang terms across listings, captured by our model (e.g., horse for heroin).
◉ Lack of uniform features across websites adds noise to our model (product description and rating data).
◉ Some vendors may operate from different locations or may even be selling different drugs.
◉ Branding (posting favorable reviews) is common in these markets.
149. Use Case Examples
Case study: Branding
@Vendor 1: 5//02/14 09:49 am,5/Thanks alles schick/11/10 01:46 pm, <END>Tilidin 50MG/4MG Original Apothekenware
@Vendor 2: 5//02/14 09:49 am,5/Thanks alles schick/11/10 01:46 pm, <END>Tilidin 50 MG/4MG Original Apothekenware <END> 5/Thanks alles schick/11/10 01:46 pm,
Case study: Comparing product description and rating, since the vendor did not enter a product description on the other site.
@Vendor 1: Percocet Oxycodone 5/325 mg 200 Tablets Finalize Early and get 20 Free bonus sent for a total of 220!US Made Mallinckrodt 5mg/325 (made in St. Louis, Miss. USA) ...
@Vendor 2: 5//02/07 01:03 pm,5/Thanks Again. A++/01/21 11:49 pm,5/Trustworthy/01/16 12:22 pm,4.33//01/07 08:50 am,5/Great communication, trustworthy, and over delivered./12/31 11:09 pm,5//11/29 03:25 pm,5/FAST A+++ Best Stealth I’ve seen yet.
Case study: Similar stylometric features captured by the use of special characters or emojis.
@Vendor 1: ————————————— ****** NEWS 25.12.2018 NEWS ****** ————————————— We ship all new ...
@Vendor 2: ————————— PRODUCTS ————————— AFGHAN HEROIN A+++COCAINE #3 ...
150. Conclusion
◉ 98% ACCURACY: Developed multi-view learning for Sybil account detection on a real-life Darknet market dataset, achieving an accuracy of 98%.
◉ UNSUPERVISED LEARNING: Utilizing unlabelled data to train the network.
◉ DOMAIN ADAPTATION: Performed cross-domain analysis to justify uniform results.
◉ KNOWLEDGE-INFUSED NLP: Demonstrated the effectiveness of utilizing a domain-specific knowledge graph of drugs (DAO) for textual content understanding on the DarkNet.
151. References
● Ramnath Kumar, Shweta Yadav, Raminta Daniulaityte, Francois Lamy, Krishnaprasad Thirunarayan, Usha Lokala, and Amit
Sheth. "eDarkFind: Unsupervised Multi-view Learning for Sybil Account Detection." The Web Conference. 2020.
● Zhang, Yiming, et al. "Your style your identity: Leveraging writing and photography styles for drug trafficker identification in
darknet markets over attributed heterogeneous information network." The World Wide Web Conference. 2019.
● Delroy Cameron, Gary A Smith, Raminta Daniulaityte, Amit P Sheth, Drashti Dave, Lu Chen, Gaurish Anand, Robert Carlson, Kera
Z Watkins, and Russel Falck. 2013. PREDOSE: a semantic web platform for drug abuse epidemiology using social media.
Journal of biomedical informatics 46, 6 (2013), 985–997
● Usha Lokala, Francois R Lamy, Raminta Daniulaityte, Amit Sheth, Ramzi W Nahhas, Jason I Roden, Shweta Yadav, and Robert G
Carlson. 2019. Global trends, local harms: availability of fentanyl-type drugs on the dark web and accidental overdoses in Ohio.
Computational and Mathematical Organization Theory 25, 1 (2019), 48–59.
● Xiangwen Wang, Peng Peng, Chun Wang, and Gang Wang. 2018. You are your photographs: Detecting multiple identities of
vendors in the darknet marketplaces. In Proceedings of the 2018 on Asia Conference on Computer and Communications
Security. ACM, 431–442.
152. Ongoing Research at #AIISC
● Detection of Early Onset of Colorectal Cancer using Digestive Inflammation Index
● Conversational Systems for Nutrition Monitoring of High School Children
● Cyber Social Threats
● Conversational Systems for pediatric patients with Neutropenia, asthma in children, and obesity and hypertension in adults.
● Development of an Instrumented, Intelligent Infant Interaction Laboratory for the Prediction of Autism Spectrum Disorder
Current Collaboration across UofSC:
● College of Medicine (>5)
● College of Nursing (2)
● College of Arts & Science (2)
● College of Pharmacy (2)
● College of Information & Communication
● College of Engineering & Computing
● College of Education
153. AIISC and Collaborators
5 faculty, >12 PhDs, few Masters, >5 undergrads, 2 Post-Docs, >10 Research Interns
Alumni in/as
Industry: IBM T.J. Watson, Almaden, Amazon, Samsung America, LinkedIn, Facebook, Bosch
Start-ups: AppZen, AnalyticsFox, Cognovi Labs
Faculty: George Mason, University of Kentucky, Case Western Reserve, North Carolina State University, University of Dayton
154. http://aiisc.ai/
We acknowledge partial support from the National Science Foundation (NSF) award
CNS-1513721: "Context-Aware Harassment Detection on Social Media", National
Institutes of Health (NIH) award MH105384-01A1: "Modeling Social Behavior for
Healthcare Utilization in Depression", and National Institute on Drug Abuse (NIDA)
Grant No. 5R01DA039454-02: "Trending: Social media analysis to monitor cannabis
and synthetic cannabinoid use". Any opinions, conclusions or recommendations
expressed in this material are those of the authors and do not necessarily reflect the
views of the NSF, NIH, or NIDA.