Linked Open Data Alignment and Enrichment Using Bootstrapping Based Techniques (Prateek Jain dissertation defense, Kno.e.sis, Wright State University)
The recent emergence of the “Linked Data” approach for publishing data represents a major step forward in realizing the original vision of a web that can “understand and satisfy the requests of people and machines to use the web content” – i.e., the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud – as we will illustrate – are too shallow to realize many of the benefits promised. If this limitation is left unaddressed, the LOD Cloud will merely be more data that suffers from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to the problem of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between the classes and properties of ontologies. Between classes we identify subsumption, equivalence, and part-of relationships; between instances, part-of relationships; and between properties, subsumption and equivalence relationships. By bootstrapping we mean utilizing the information already contained within the datasets to improve the data itself. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, provide evidence of the feasibility and applicability of the solution.
Structured data on the Web, frequently referred to as knowledge graphs, consists of a large number of datasets representing diverse domains. Widely used commercial applications such as entity recommendation, search, question answering, and knowledge discovery use these knowledge graphs as their knowledge source. The majority of these applications have a particular domain of interest and hence require only the segment of the Web of data representing that domain (e.g., movies, biomedicine, sports). In fact, leveraging the entire Web of data for a domain-specific application is not only computationally intensive; the irrelevant portion also negatively impacts the accuracy of the application. Hence, finding the relevant portion of the Web of data for domain-specific applications has become a paramount issue. Identifying the relevant portion of the Web of data consists of two sub-tasks: 1) finding the relevant datasets that contain knowledge on the domain of interest, and 2) extracting the subgraph representing the domain of interest from knowledge graphs that represent multiple domains (e.g., DBpedia, YAGO, Freebase). In this talk, I will discuss both data-driven and knowledge-driven approaches to solving these two sub-tasks. The domain-specific subgraphs extracted by our approach were 80% smaller, in terms of the number of paths, than the original KG and resulted in a more than tenfold reduction in the computational time required for domain-specific tasks, yet produced better accuracy on domain-specific applications. We believe this work can contribute significantly to utilizing knowledge graphs for domain-specific applications, especially given the explosive growth in the creation of knowledge graphs.
Video of the talk: https://www.youtube.com/watch?v=7k-u_TUew3o
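As a rough illustration of the subgraph-extraction sub-task, the sketch below does a bounded-hop traversal from domain seed entities, keeping only edges whose endpoints pass a domain-relevance test. The triples, seeds, and relevance check are invented for the example; the talk's actual approach is more sophisticated.

```python
from collections import deque

def extract_domain_subgraph(triples, seed_entities, is_relevant, max_hops=2):
    """Bounded-hop traversal from domain seed entities, keeping only
    edges whose endpoints pass a domain-relevance test."""
    # Build an adjacency index over (subject, predicate, object) triples.
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))  # treat the graph as undirected

    visited = set(seed_entities)
    frontier = deque((e, 0) for e in seed_entities)
    subgraph = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for pred, nbr in adj.get(node, []):
            if not is_relevant(nbr):
                continue  # prune nodes outside the target domain
            subgraph.append((node, pred, nbr))
            if nbr not in visited:
                visited.add(nbr)
                frontier.append((nbr, depth + 1))
    return subgraph

# Toy example: keep only the movie-domain neighborhood of "Gravity".
triples = [("Gravity", "directedBy", "Alfonso_Cuaron"),
           ("Gravity", "starring", "Sandra_Bullock"),
           ("Sandra_Bullock", "bornIn", "Arlington_Virginia")]
movie_nodes = {"Gravity", "Alfonso_Cuaron", "Sandra_Bullock"}
print(extract_domain_subgraph(triples, ["Gravity"], movie_nodes.__contains__))
```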
Abstract: Social media has experienced immense growth in recent times. These platforms are becoming increasingly common for information seeking and consumption, and as part of their growing popularity, information overload poses a significant challenge to users. For instance, Twitter alone generates around 500 million tweets per day, and it is impractical for users to parse through such an enormous stream to find information that interests them. This situation necessitates efficient personalized filtering mechanisms for users to consume relevant, interesting information from social media.
Building a personalized filtering system involves understanding users' interests and utilizing these interests to deliver relevant information to users. These tasks primarily involve analyzing and processing social media text, which is challenging due to its short length and the real-time nature of the medium. The challenges include: (1) Lack of semantic context: social media posts are on average short, which provides limited semantic context for textual analysis. This is particularly detrimental to topic identification, a necessary task for mining users' interests. (2) Dynamically changing vocabulary: most social media websites, such as Twitter and Facebook, generate posts that are of current (timely) interest to users. Due to this real-time nature, information relevant to dynamic topics of interest evolves to reflect changes in the real world. This in turn changes the vocabulary associated with these dynamic topics of interest, making it harder to filter relevant information. (3) Scalability: the number of users on social media platforms is so large that it is difficult for centralized systems to scale to deliver relevant information to every user. This dissertation is devoted to exploring semantic techniques and Semantic Web technologies to address the above challenges in building a personalized information filtering system for social media. In particular, the necessary semantics (knowledge) is derived from crowd-sourced knowledge bases such as Wikipedia to improve context for understanding short text and dynamic topics on social media.
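One common way to derive such context, sketched below with invented data, is to expand a short post with Wikipedia concepts whose anchor texts it contains; the anchor dictionary here is a hypothetical stand-in for one distilled from Wikipedia's link structure.

```python
# Hypothetical anchor-text dictionary; a real one would be distilled from
# Wikipedia's internal link structure (anchor text -> linked article).
ANCHORS = {
    "world cup": "FIFA_World_Cup",
    "fed": "Federal_Reserve_System",
    "rate hike": "Interest_rate",
}

def enrich(post, anchors=ANCHORS, max_ngram=3):
    """Expand a short post with Wikipedia concepts whose anchor text it
    contains, giving downstream topic models extra semantic context."""
    tokens = post.lower().split()
    concepts = set()
    for n in range(max_ngram, 0, -1):          # longest phrases first
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in anchors:
                concepts.add(anchors[phrase])
    return concepts

print(enrich("Fed announces another rate hike before the world cup"))
```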
Deep neural networks for matching online social networking profiles (Traian Rebedea)
• Proposed a large dataset for matching online social networking profiles
• This allowed us to train a deep neural network for profile matching using both domain-specific features and word embeddings generated from the textual descriptions in social profiles
• Experiments showed that the NN surpassed both unsupervised and supervised models, achieving high precision (P = 0.95) with a good recall rate (R = 0.85); the combined F1 is computed in the snippet below
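For reference, the reported precision and recall combine into an F1 score of roughly 0.90 (a quick check using the standard harmonic-mean definition):

```python
# Harmonic mean of the reported precision and recall gives the F1 score.
precision, recall = 0.95, 0.85
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.897
```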
Wholi: The right people find each other (at the right time)
Two key elements in this talk:
• PART 1: Machine learning for entity extraction
Natural language processing (NLP), information extraction
• PART 2: Matching profiles using a deep learning classifier
Deep learning, word embeddings
Kalpa Gunaratna's Ph.D. dissertation defense: April 19, 2017
The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the relationships and classes are defined. Today, there exist large knowledge graphs in the research community (e.g., encyclopedic datasets like DBpedia and YAGO) and the corporate world (e.g., the Google Knowledge Graph) that encapsulate a large amount of knowledge for human and machine consumption. Typically, they consist of millions of entities and billions of facts describing these entities. While it is good to have this much knowledge available on the Web for consumption, it leads to information overload, and hence proper summarization (and presentation) techniques need to be explored.
In this dissertation, we focus on creating both comprehensive and concise entity summaries at: (i) the single entity level and (ii) the multiple entity level. To summarize a single entity, we propose a novel approach called FACeted Entity Summarization (FACES) that considers both the importance of facts, computed by combining popularity and uniqueness, and the diversity of the facts selected for the summary. We first conceptually group facts using semantic expansion and hierarchical incremental clustering techniques and form facets (i.e., groupings) that go beyond syntactic similarity. Then we rank both the facts and the facets using Information Retrieval (IR) ranking techniques and pick the highest-ranked facts from these facets for the summary. The important and unique contribution of this approach is that, because it generates facets, it adds diversity to entity summaries, making them comprehensive. For creating multiple entity summaries, we simultaneously process facts belonging to the given entities using combinatorial optimization techniques. In this process, we maximize the diversity and importance of facts within each entity summary and the relatedness of facts between the entity summaries. The proposed approach uniquely combines semantic expansion, graph-based relatedness, and combinatorial optimization techniques to generate relatedness-based multi-entity summaries.
Complementing the entity summarization approaches, we introduce a novel approach using light Natural Language Processing (NLP) techniques to enrich knowledge graphs by adding type semantics to literals.
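To convey the facet-then-rank intuition behind FACES, here is a toy sketch (not the actual implementation, which uses semantic expansion, hierarchical clustering, and IR-style ranking): facts are grouped into facets, ranked within each facet by a supplied score, and picked round-robin so the summary stays both important and diverse. All names and scores are illustrative.

```python
from collections import defaultdict

def faceted_summary(facts, facet_of, score, k=3):
    """Group facts into facets, rank within each facet, and pick
    round-robin across facets so no single aspect dominates."""
    facets = defaultdict(list)
    for fact in facts:
        facets[facet_of(fact)].append(fact)
    for group in facets.values():
        group.sort(key=score, reverse=True)  # highest-ranked fact first

    summary, rank = [], 0
    while len(summary) < k:
        added = False
        for group in facets.values():
            if rank < len(group) and len(summary) < k:
                summary.append(group[rank])
                added = True
        if not added:
            break  # every facet exhausted
        rank += 1
    return summary

# Toy facts: (predicate, object, importance score).
facts = [("starring", "Gravity", 0.9), ("starring", "Speed", 0.8),
         ("birthPlace", "Arlington", 0.7), ("award", "Academy_Award", 0.6)]
print(faceted_summary(facts, facet_of=lambda f: f[0],
                      score=lambda f: f[2], k=3))
```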
Wimmics Research Team 2015 Activity Report (Fabien Gandon)
Extract of the activity report of the Wimmics joint research team between Inria Sophia Antipolis - Méditerranée and I3S (CNRS and Université Nice Sophia Antipolis). Wimmics stands for web-instrumented man-machine interactions, communities and semantics. The team focuses on bridging social semantics and formal semantics on the web.
Text REtrieval Conference (TREC) Dynamic Domain Track 2015 (Grace Hui Yang)
This is the introductory talk for the TREC Dynamic Domain Track. The Track ran from 2015 to 2017, aiming to evaluate and advance research in dynamic search and domain-specific search. This talk was prepared to introduce the ideas and setups in the upcoming Track to the research community.
DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University
Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists; sensitive to the concerns of scientists who are not always accustomed to sharing their data; and that are attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.
HyperMembrane Structures for Open Source Cognitive Computing (Jack Park)
Open source "cognitive computing" systems, specifically OpenSherlock; describes a HyperMembrane structure, a kind of information fabric, for machine reading, literature-based discovery, deep question answering. Platform is open source, uses ElasticSearch, topic maps, JSON, link-grammar parsing, and qualitative process models.
Description: Ajith defended his thesis on application and data portability in cloud computing. More details on Ajith's research and publications can be found at http://knoesis.wright.edu/researchers/ajith/
Video can be found at: http://www.youtube.com/watch?v=oDBeBIIFmHc&list=UUORqXk1ZV44MOwpCorAROyQ&index=1&feature=plpp_video
Cory Henson defended his thesis on "A Semantics-based Approach to Machine Perception".
Video can be found at: http://www.youtube.com/watch?v=L8M7eoGKtSE
Video: https://www.youtube.com/watch?v=ZCToaDgxnAs
Abstract:
People's emotions can be gleaned from their text using machine learning techniques to build models that exploit large self-labeled emotion data from social media. Further, the self-labeled emotion data can be effectively adapted to train emotion classifiers in different target domains where training data are sparse.
Emotions are both prevalent in and essential to most aspects of our lives. They influence our decision-making, affect our social relationships, and shape our daily behavior. With the rapid growth of emotion-rich textual content, such as microblog posts, blog posts, and forum discussions, there is a growing need to develop algorithms and techniques for identifying people's emotions expressed in text. This has valuable implications for studies of suicide prevention, employee productivity, people's well-being, customer relationship management, etc. However, emotion identification is quite challenging, partly for the following reasons: i) It is a multi-class classification problem that usually involves at least six basic emotions, and text describing an event or situation that causes an emotion can be devoid of explicit emotion-bearing words; the distinction between different emotions can therefore be very subtle, which makes it difficult to glean emotions purely from keywords. ii) Manual annotation of emotion data by human experts is very labor-intensive and error-prone. iii) Existing labeled emotion datasets are relatively small, and so fail to provide comprehensive coverage of emotion-triggering events and situations.
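As a small illustration of the self-labeling idea, the sketch below harvests (text, emotion) training pairs from tweets that end in an emotion-bearing hashtag; the hashtag-to-emotion map is illustrative, not the dissertation's actual lexicon.

```python
import re

# Emotion hashtags used as self-labels (distant supervision): a tweet ending
# in "#excited" is treated as a training example for "joy", and so on.
HASHTAG_TO_EMOTION = {"#excited": "joy", "#happy": "joy",
                      "#furious": "anger", "#scared": "fear"}

def self_label(tweets):
    """Turn raw tweets into (text, emotion) pairs by stripping the
    trailing emotion hashtag and using it as the label."""
    labeled = []
    for tweet in tweets:
        for tag, emotion in HASHTAG_TO_EMOTION.items():
            if tweet.lower().rstrip().endswith(tag):
                text = re.sub(re.escape(tag) + r"\s*$", "", tweet,
                              flags=re.I).strip()
                labeled.append((text, emotion))
                break
    return labeled

print(self_label(["Got the job!!! #excited",
                  "Stuck in traffic again #furious"]))
```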
Vahid Taslimitehrani's Dissertation Defense: Friday, February 19, 2015.
Ph.D. Committee: Drs. Guozhu Dong (Advisor), T.K. Prasad, Amit Sheth, Keke Chen, and Jyotishman Pathak (Division of Health Informatics, Weill Cornell Medical College, Cornell University).
ABSTRACT:
Regression and classification techniques play an essential role in many data mining tasks and have broad applications. However, most of the state-of-the-art regression and classification techniques are often unable to adequately model the interactions among predictor variables in highly heterogeneous datasets. New techniques that can effectively model such complex and heterogeneous structures are needed to significantly improve prediction accuracy.
In this dissertation, we propose a novel type of accurate and interpretable regression and classification models, named Pattern Aided Regression (PXR) and Pattern Aided Classification (PXC) respectively. Both PXR and PXC rely on identifying regions of the data space where a given baseline model has large modeling errors, characterizing such regions using patterns, and learning specialized models for those regions. Each PXR/PXC model contains several pairs of contrast patterns and local models, where a local model is applied only to data instances matching its associated pattern. We also propose a class of classification and regression techniques called Contrast Pattern Aided Regression (CPXR) and Contrast Pattern Aided Classification (CPXC) to build accurate and interpretable PXR and PXC models.
We have conducted a set of comprehensive performance studies to evaluate CPXR and CPXC. The results show that CPXR and CPXC outperform state-of-the-art regression and classification algorithms, often by significant margins. The results also show that CPXR and CPXC are especially effective for heterogeneous and high-dimensional datasets. Besides being new types of models, PXR and PXC models can also provide insights into data heterogeneity and diverse predictor-response relationships.
We have also adapted CPXC to classify imbalanced datasets, introducing a new algorithm called Contrast Pattern Aided Classification for Imbalanced Datasets (CPXCim). In CPXCim, we apply a weighting method to boost minority instances as well as a new filtering method to prune patterns with imbalanced matching datasets.
Finally, we applied our techniques to three real applications, two in the healthcare domain and one in the soil mechanics domain. PXR and PXC models are significantly more accurate than other learning algorithms in all three applications.
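The following toy sketch illustrates the core pattern-aided idea on synthetic data: fit a baseline model, notice where its errors concentrate, and pair a pattern with a local model for that region. Here the pattern is a hard-coded threshold, whereas CPXR actually mines contrast patterns; scikit-learn and NumPy are assumed available.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
# Heterogeneous data: a different linear regime kicks in for x > 5.
y = np.where(X[:, 0] > 5, 3 * X[:, 0] - 10, 0.5 * X[:, 0]) \
    + rng.normal(0, 0.2, 200)

baseline = LinearRegression().fit(X, y)

# "Pattern": the region where the baseline's errors concentrate. CPXR mines
# such regions as contrast patterns; here we hard-code a single threshold.
pattern = X[:, 0] > 5
local = LinearRegression().fit(X[pattern], y[pattern])

def predict(x):
    """Pattern-aided prediction: route matching instances to the local
    model, everything else to the baseline."""
    x = np.atleast_2d(x)
    out = baseline.predict(x)
    use_local = x[:, 0] > 5
    out[use_local] = local.predict(x[use_local])
    return out

print("baseline RMSE:     ", np.sqrt(np.mean((y - baseline.predict(X)) ** 2)))
print("pattern-aided RMSE:", np.sqrt(np.mean((y - predict(X)) ** 2)))
```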
Dissertation Defense:
" Mining and Analyzing Subjective Experiences in User Generated Content "
By Lu Chen
Tuesday, April 9, 2016
Dissertation Committee: Dr. Amit Sheth, Advisor, Dr. T. K. Prasad, Dr. Keke Chen, Dr. Ingmar Weber, and Dr. Justin Martineau
Pictures: https://www.facebook.com/Kno.e.sis/photos/?tab=album&album_id=1225911137443732
Video: https://youtu.be/tzLEUB-hggQ
Lu's Home page: http://knoesis.wright.edu/researchers/luchen/
ABSTRACT
Web 2.0 and social media enable people to create, share, and discover information instantly, anywhere, anytime. A great amount of this information is subjective information -- information about people's subjective experiences, ranging from feelings about what is happening in our daily lives to opinions on a wide variety of topics. Subjective information is useful to individuals, businesses, and government agencies to support decision making in areas such as product purchase, marketing strategy, and policy making. However, much useful subjective information is buried in the ever-growing user-generated data on social media platforms, and it is still difficult to extract high-quality subjective information and make full use of it with current technologies.
Current subjectivity and sentiment analysis research has largely focused on classifying text polarity -- whether the expressed opinion regarding a specific topic in a given text is positive, negative, or neutral. This narrow definition does not take into account other types of subjective information, such as emotion, intent, and preference, which may prevent their exploitation from reaching its full potential. This dissertation extends the definition and introduces a unified framework for mining and analyzing diverse types of subjective information. We have identified four components of a subjective experience: an individual who holds it, a target that elicits it (e.g., a movie, or an event), a set of expressions that describe it (e.g., "excellent", "exciting"), and a classification or assessment that characterizes it (e.g., positive vs. negative). Accordingly, this dissertation makes contributions in developing novel and general techniques for the tasks of identifying and extracting these components.
We first explore the task of extracting sentiment expressions from social media posts. We propose an optimization-based approach that extracts a diverse set of sentiment-bearing expressions, including formal and slang words/phrases, for a given target from an unlabeled corpus. Instead of associating the overall sentiment with a given text, this method assesses the more fine-grained target-dependent polarity of each sentiment expression. Unlike pattern-based approaches which often fail to capture the diversity of sentiment expressions due to the informal nature of language usage and writing style in social media posts, the proposed approach is capable of identifying sentiment phrase
Understanding users’ latent intents behind search queries is essential for satisfying their search needs. Search intent mining can help search engines enhance their ranking of search results, enabling new search features like instant answers, personalization, search result diversification, and the recommendation of more relevant ads. Consequently, there has been increasing attention on studying how to effectively mine search intents by analyzing search engine query logs. While state-of-the-art techniques can identify the domain of a query (e.g., sports, movies, health), identifying domain-specific intent is still an open problem. Among all the topics available on the Internet, health is one of the most important in terms of its impact on the user, and it is one of the most frequently searched areas. This dissertation presents a knowledge-driven approach to domain-specific search intent mining, with a focus on health-related search queries.
First, we identified 14 consumer-oriented health search intent classes based on inputs from focus group studies and on analyses of popular health websites, literature surveys, and an empirical study of search queries. We defined the problem of classifying millions of health search queries into zero or more intent classes as a multi-label classification problem. Popular machine learning approaches for multi-label classification tasks (namely, problem transformation and algorithm adaptation methods) were not feasible due to limitations in creating labeled data and health-domain constraints. Another challenge in solving the search intent identification problem was mapping terms used by laymen to medical terms. To address these challenges, we developed a semantics-driven, rule-based search intent mining approach leveraging the rich background knowledge encoded in the Unified Medical Language System (UMLS) and a crowd-sourced encyclopedia (Wikipedia). The approach can identify search intent in a disease-agnostic manner and has been evaluated on three major diseases.
While users often turn to search engines to learn about health conditions, a surprising amount of health information is also shared and consumed via social media, such as public social platforms like Twitter. Although Twitter is an excellent information source, identifying informative tweets within the deluge of tweets is the major challenge. We used a hybrid approach consisting of supervised machine learning, rule-based classifiers, and biomedical domain knowledge to facilitate the retrieval of relevant and reliable health information shared on Twitter in real time. Furthermore, we extended our search intent mining algorithm to classify health-related tweets into health categories. Finally, we performed a large-scale study comparing health search intents, and the features that contribute to the expression of search intent, across 100+ million search queries from smart devices (smartphones/tablets) and personal computers (desktops/laptops).
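A heavily simplified sketch of the semantics-driven, rule-based flavor of intent mining is shown below; the intent classes, trigger phrases, and layman-to-medical mappings are invented stand-ins for what the actual system derives from UMLS and Wikipedia.

```python
# Illustrative rules mapping layman phrasing to hypothetical intent classes.
# The real system grounds terms in UMLS/Wikipedia; this sketch uses a
# hand-written synonym table instead.
LAY_TO_MEDICAL = {"can't sleep": "insomnia", "sugar": "diabetes"}
INTENT_RULES = {
    "symptoms": ["symptom", "signs of", "feel"],
    "treatment": ["treat", "cure", "remedy", "medication for"],
    "cause": ["cause", "why do i"],
}

def classify_intents(query):
    """Multi-label, rule-based intent assignment: normalize lay terms,
    then fire every intent whose trigger phrase appears in the query."""
    q = query.lower()
    for lay, med in LAY_TO_MEDICAL.items():
        q = q.replace(lay, med)
    return {intent for intent, triggers in INTENT_RULES.items()
            if any(t in q for t in triggers)}

print(classify_intents("best medication for sugar"))  # {'treatment'}
print(classify_intents("why do I feel dizzy"))        # {'cause', 'symptoms'}
```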
Social media provides a natural platform for the dynamic emergence of citizen (as) sensor communities, where citizens share information, express opinions, and engage in discussions. Often such an Online Citizen Sensor Community (CSC) has stated or implied goals related to the workflows of organizational actors with defined roles and responsibilities; for example, a community of crisis response volunteers may inform the prioritization of responses to resource needs (e.g., medical) to assist the managers of crisis response organizations. However, in a CSC there are challenges related to information overload for organizational actors, including finding reliable information providers and finding actionable information from citizens. This threatens the awareness and articulation of workflows needed to enable cooperation between citizens and organizational actors. CSCs supported by Web 2.0 social media platforms offer new opportunities and pose new challenges. This work addresses issues of ambiguity in interpreting unconstrained natural language (e.g., ‘wanna help’ appearing in messages both asking for and offering help during crises), sparsity of user and group behaviors (e.g., expression of specific intent), and diversity of user demographics (e.g., medical or technical professionals) for interpreting the user-generated data of citizen sensors. Interdisciplinary research involving the social and computer sciences is essential to address these socio-technical issues in CSCs and to give organizational actors better access to user-generated data at a higher level of information abstraction. This study presents a novel web information processing framework focused on actors and actions in cooperation, called Identify-Match-Engage (IME), which fuses top-down and bottom-up computing approaches to design a cooperative web information system between citizens and organizational actors. It includes a) identification of action-related seeking and offering intent behaviors in short, unstructured text documents using a classification model based on both declarative and statistical knowledge, b) matching of seeking and offering intentions, and c) engagement models of users and groups in the CSC that prioritize whom to engage, by modeling context with social theories using features of users, their generated content, and their dynamic connections in user interaction networks. The results show a greater improvement in modeling efficiency from the fusion of top-down knowledge-driven and bottom-up data-driven approaches than from conventional bottom-up approaches alone for modeling intent and engagement. Applications of this work include use of the engagement interface tool during recent crises to enable efficient citizen engagement, spreading critical information about prioritized needs to ensure donation of only the required supplies. The engagement interface application also won the United Nations ICT agency ITU's Young Innovator 2014 award.
Sujan Perera's Dissertation Defense: Friday, August 12, 2016
Ph.D. Committee: Drs. Amit Sheth (Advisor), T.K. Prasad, Michael Raymer, and Pablo Mendes (IBM Research)
Video: https://youtu.be/pbjJ1zb8ayY
ABSTRACT:
Natural language is a powerful tool developed by humans over hundreds of thousands of years. The extensive usage and flexibility of the language, the creativity of human beings, and the social, cultural, and economic changes in daily life have added new constructs, styles, and features to the language. One such feature is its ability to express ideas, opinions, and facts in an implicit manner. This feature is used extensively in day-to-day communication in situations such as: 1) expressing sarcasm, 2) trying to recall forgotten things, 3) conveying descriptive information, 4) emphasizing the features of an entity, and 5) communicating a common understanding.
Consider the tweet 'New Sandra Bullock astronaut lost in space movie looks absolutely terrifying' and the text snippet extracted from a clinical narrative 'He is suffering from nausea and severe headaches. Dolasteron was prescribed.' The tweet has an implicit mention of the entity Gravity, and the clinical text snippet has an implicit mention of the relationship between the medication Dolasteron and the clinical condition nausea. Such implicit references to entities and relationships are common in daily communication, and they add unique value to conversations. However, extracting implicit constructs has not received enough attention. This dissertation focuses on extracting implicit entities and relationships from clinical narratives and extracting implicit entities from tweets.
This dissertation demonstrates manifestations of implicit constructs in text, studies their characteristics, and develops a solution capable of extracting implicit factual information from text. The developed solution starts by acquiring the knowledge relevant to the implicit information extraction problem, including domain knowledge, contextual knowledge, and linguistic knowledge. The acquired knowledge can take different syntactic forms, such as a text snippet, structured knowledge represented in a standard knowledge representation language like the Resource Description Framework (RDF), or custom formats. Hence, the acquired knowledge is processed to create models that can be understood by machines. Such models provide the infrastructure to perform the implicit information extraction of interest.
This dissertation focuses on three different use cases of implicit information and demonstrates the applicability of the developed solution in each (a toy sketch of the first use case follows the list). They are:
- implicit entity linking in clinical narratives,
- implicit entity linking in Twitter,
- implicit relationship extraction from clinical narratives.
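To make the entity-linking use case concrete, here is a toy sketch (not the dissertation's actual method, which acquires and models domain, contextual, and linguistic knowledge): each candidate entity is given a hand-built bag of context terms, and an implicit mention is resolved to the candidate whose context best overlaps the post. All names and context terms are illustrative.

```python
# Hypothetical context models: each candidate entity is described by a bag
# of terms harvested from background knowledge (e.g., cast, plot keywords).
ENTITY_CONTEXTS = {
    "Gravity_(film)": {"sandra", "bullock", "astronaut", "space", "movie"},
    "The_Blind_Side": {"sandra", "bullock", "football", "movie"},
}

def link_implicit_entity(text, contexts=ENTITY_CONTEXTS):
    """Rank candidates by Jaccard overlap between the post's tokens and each
    entity's context terms; an implicit mention never names the entity, so
    surface matching alone cannot resolve it."""
    tokens = set(text.lower().split())
    jaccard = lambda ctx: len(tokens & ctx) / len(tokens | ctx)
    return max(contexts, key=lambda e: jaccard(contexts[e]))

tweet = "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying"
print(link_implicit_entity(tweet))  # -> Gravity_(film)
```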
Literature-Based Discovery (LBD) refers to the process of uncovering hidden connections that are implicit in scientific literature. Numerous hypotheses have been generated from scientific literature in this way, influencing innovations in diagnosis, treatment, prevention, and overall public health. However, much of the existing research on discovering hidden connections among concepts has used distributional statistics and graph-theoretic measures to capture implicit associations. Such metrics do not explicitly capture the semantics of hidden connections. ...
While effective in some situations, the practice of relying on domain expertise, structured background knowledge, and heuristics to complement distributional and graph-theoretic approaches has serious limitations. ...
This dissertation proposes an innovative context-driven, automatic subgraph creation method for finding hidden and complex associations among concepts, along multiple thematic dimensions. It outlines definitions for context and shared context, based on implicit and explicit (or formal) semantics, which compensate for deficiencies in statistical and graph-based metrics. It also eliminates the need for heuristics a priori. An evidence-based evaluation of the proposed framework showed that 8 out of 9 existing scientific discoveries could be recovered using this approach. Additionally, insights into the meaning of associations could be obtained using provenance provided by the system. In a statistical evaluation to determine the interestingness of the generated subgraphs, it was observed that an arbitrary association is mentioned in only approximately 4 articles in MEDLINE, on average. These results suggest that leveraging implicit and explicit context, as defined in this dissertation, is an advancement of the state-of-the-art in LBD research.
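For intuition, the classic Swanson-style "open discovery" at the heart of LBD can be sketched in a few lines: find intermediate B-concepts that co-occur with both A and C when A and C never co-occur directly. The toy co-occurrence graph below is hand-made; the dissertation's approach goes much further, using implicit and explicit context to select and interpret such associations.

```python
# Toy co-occurrence graph over MEDLINE-like concepts (an edge means the two
# concepts were mentioned together in some article). Swanson's famous
# fish-oil/Raynaud's discovery followed exactly this A -> B -> C shape.
EDGES = {
    ("fish_oil", "blood_viscosity"), ("fish_oil", "platelet_aggregation"),
    ("blood_viscosity", "raynauds_disease"),
    ("platelet_aggregation", "raynauds_disease"),
    ("fish_oil", "omega_3"),
}

def open_discovery(a, c, edges=EDGES):
    """Return intermediate B-concepts linking A and C that never co-occur
    directly -- the hidden connections LBD tries to surface."""
    neighbors = lambda x: {u if v == x else v for u, v in edges if x in (u, v)}
    if c in neighbors(a):
        return set()  # the connection is already explicit, nothing hidden
    return neighbors(a) & neighbors(c)

print(open_discovery("fish_oil", "raynauds_disease"))
# -> {'blood_viscosity', 'platelet_aggregation'}
```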
Ph.D. Committee: Drs. Amit Sheth (Advisor), TK Prasad, Michael Raymer, Ramakanth Kavuluru (UKY), Thomas C. Rindflesch (NLM), and Varun Bhagwan (Yahoo! Labs)
Relevant Publications (more at: http://knoesis.wright.edu/students/delroy/)
D. Cameron, R. Kavuluru, T. C. Rindflesch, O. Bodenreider, A. P. Sheth, K. Thirunarayan. Leveraging Distributional Semantics for Domain Agnostic Literature-Based Discovery (under preparation)
D. Cameron, O. Bodenreider, H. Yalamanchili, T. Danh, S. Vallabhaneni, K. Thirunarayan, A. P. Sheth, T. C. Rindflesch. A Graph-based Recovery and Decomposition of Swanson’s Hypothesis using Semantic Predications. Journal of Biomedical Informatics (JBI13), 46(2): 238–251, 2013
D. Cameron, R. Kavuluru, O. Bodenreider, P. N. Mendes, A. P. Sheth, K. Thirunarayan. Semantic Predications for Complex Information Needs in Biomedical Literature. International Bioinformatics and Biomedical Conference (BIBM11), pp. 512–519, 2011 (acceptance rate = 19.4%)
D. Cameron, P. N. Mendes, A. P. Sheth, V. Chan. Semantics-empowered Text Exploration for Knowledge Discovery. ACM Southeast Conference (ACMSE10), 14, 2010
There is a rapid intertwining of sensors and mobile devices into the fabric of our lives. This has resulted in unprecedented growth in the number of observations from the physical and social worlds reported in the cyber world. A system of sensing and computational components embedded in the physical world is termed a Cyber-Physical System (CPS). The current science of CPS has yet to effectively integrate citizen observations into CPS analysis. We demonstrate the role of citizen observations in CPS and propose a novel approach to perform a holistic analysis of machine and citizen sensor observations. Specifically, we demonstrate the complementary, corroborative, and timely aspects of citizen sensor observations compared to machine sensor observations in Physical-Cyber-Social (PCS) Systems.
Physical processes are inherently complex and embody uncertainties. They manifest as machine and citizen sensor observations in PCS Systems. We propose a generic framework for moving from observations to decision-making and actions in PCS systems, consisting of: (a) PCS event extraction, (b) PCS event understanding, and (c) PCS action recommendation. We demonstrate the role of Probabilistic Graphical Models (PGMs) as a unified framework to deal with the uncertainty, complexity, and dynamism involved in translating observations into actions. Data-driven approaches alone are not guaranteed to synthesize PGMs that reflect real-world dependencies accurately. To overcome this limitation, we propose to empower PGMs using declarative domain knowledge. Specifically, we propose four techniques: (a) automatic creation of massive training data for Conditional Random Fields (CRFs) using domain knowledge of entities, used in PCS event extraction; (b) Bayesian Network structure refinement using causal knowledge from ConceptNet, used in PCS event understanding; (c) knowledge-driven piecewise linear approximation of nonlinear time series dynamics using Linear Dynamical Systems (LDS), used in PCS event understanding; and (d) transforming knowledge of goals and actions into a Markov Decision Process (MDP) model, used in PCS action recommendation.
We evaluate the benefits of the proposed techniques on real-world applications involving traffic analytics and Internet of Things (IoT).
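As an illustration of technique (a), here is a minimal sketch of generating CRF training data by projecting a gazetteer of known entities onto raw text (distant supervision with BIO tags). The gazetteer and sentence are invented; a real pipeline would feed the resulting pairs to a CRF toolkit.

```python
# Distant supervision for sequence labeling: project a gazetteer of known
# entities onto raw text to produce BIO-tagged training data automatically,
# instead of hand-annotating every token.
GAZETTEER = {"interstate 75": "ROAD", "needmore road": "ROAD"}

def auto_tag(sentence, gazetteer=GAZETTEER):
    """Return (token, BIO-tag) pairs for every gazetteer match."""
    tokens = sentence.lower().split()
    tags = ["O"] * len(tokens)
    for name, label in gazetteer.items():
        parts = name.split()
        for i in range(len(tokens) - len(parts) + 1):
            if tokens[i:i + len(parts)] == parts:
                tags[i] = "B-" + label
                for j in range(i + 1, i + len(parts)):
                    tags[j] = "I-" + label
    return list(zip(tokens, tags))

print(auto_tag("Accident on Interstate 75 near Needmore Road"))
```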
Krishnaprasad Thirunarayan, Trust Management: Multimodal Data Perspective, Invited Tutorial, The 2015 International Conference on Collaboration Technologies and Systems (CTS 2015), June 2015
Kno.e.sis Approach to Impactful Research & Training for Exceptional Careers (Amit Sheth)
Abstract
Kno.e.sis (http://knoesis.org) is a world-class research center that uses semantic, cognitive, and perceptual computing for gathering insights from physical/IoT, cyber/Web, and social and enterprise (e.g., clinical) big data. We innovate and employ semantic web, machine learning, NLP/IR, data mining, network science and highly scalable computing techniques. Our highly interdisciplinary research impacts health and clinical applications, biomedical and translational research, epidemiology, cognitive science, social good, policy, development, etc. A majority of our $12+ million in active funds come from the NSF and NIH. In this talk, I will provide an overview of some of our major research projects.
Kno.e.sis is highly successful in its primary mission of exceptional student outcomes: our students have exceptional publication and real-world impact and our PhDs compete with their counterparts from top 10 schools for initial jobs in research universities, top industry research labs, and highly competitive companies. A key reason for Kno.e.sis' success is its unique work culture involving teamwork to solve complex problems. Practically all our work involves real-world challenges, real-world data, interdisciplinary collaborators, path-breaking research to solve challenges, real-world deployments, real-world use, and measurable real-world impact.
In this talk, I will also seek to discuss our choice of research topics and our unique ecosystem that prepares our students for exceptional careers.
This tutorial presents tools and techniques for effectively utilizing the Internet of Things (IoT) for building advanced applications, including Physical-Cyber-Social (PCS) systems. The issues and challenges related to IoT, semantic data modelling, annotation, knowledge representation (e.g., modelling for constrained environments, complexity issues, and time/location dependency of data), integration, analysis, and reasoning will be discussed. The tutorial will describe recent developments on creating annotation models and semantic description frameworks for IoT data (such as the W3C Semantic Sensor Network ontology). A review of enabling technologies and common scenarios for IoT applications from the data and knowledge engineering point of view will be discussed. Information processing, reasoning, and knowledge extraction, along with existing solutions related to these topics, will be presented. The tutorial summarizes state-of-the-art research and developments on PCS systems, IoT-related ontology development, linked data, domain knowledge integration and management, querying large-scale IoT data, and AI applications for automated knowledge extraction from real-world data.
Related: Semantic Sensor Web: http://knoesis.org/projects/ssw
Physical-Cyber-Social Computing: http://wiki.knoesis.org/index.php/PCS
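As a concrete taste of the annotation models the tutorial covers, here is a minimal sketch of describing a single sensor observation with the SOSA core of the W3C SSN ontology using rdflib (assumed installed); the sensor, property, and observation IRIs are made up for the example.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# SOSA is the lightweight core of the W3C Semantic Sensor Network ontology.
SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/")  # hypothetical IRIs for this example

g = Graph()
g.bind("sosa", SOSA)

obs = EX["observation1"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["thermometer1"]))
g.add((obs, SOSA.observedProperty, EX["airTemperature"]))
g.add((obs, SOSA.hasSimpleResult, Literal(21.5, datatype=XSD.double)))
g.add((obs, SOSA.resultTime,
       Literal("2015-06-01T10:00:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))  # annotated observation, ready to publish
```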
Smart Data - How you and I will exploit Big Data for personalized digital hea... (Amit Sheth)
Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications in driving value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for, and avoiding, an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or the Internet of Things around, on, and inside humans), public health signals (e.g., information coming from the healthcare system, such as hospital admissions), and population health signals (such as tweets by people related to asthma occurrences and allergens, and Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data, which is realized by extracting value from Big Data to benefit not just large companies but each individual. If my child is an asthma patient, then for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques, similar to the close interworking of the top brain and the bottom brain in cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Student Achievement Review (initially presented during the Inauguration Function of the Ohio Center of Excellence in Knowledge-Enabled Computing (Kno.e.sis) at Wright State); updated since
Center overview: http://bit.ly/coe-k
Invitation: http://bit.ly/COE-invite
The current status of Linked Open Data (LOD) shows evidence of many datasets available on the Web in RDF. In the meantime, there are still many challenges for organizations to overcome in their journey of publishing five-star datasets on the Web. Those challenges are not only technical but also organizational. At this moment, when connectionist AI is gaining a wave of popularity with many applications, LOD needs to go beyond the guarantee of FAIR principles. One direction is to build a sustainable LOD ecosystem with FAIR-S principles. In parallel, LOD should serve as a catalyst for solving societal issues (LOD for Social Good) and for personal empowerment through data (Social Linked Data).
Talk at the 3rd Keystone Training School - Keyword Search in Big Linked Data - Institute for Software Technology and Interactive Systems, TU Wien, Austria, 2017
DDI-RDF Discovery Vocabulary: A Metadata Vocabulary for Documenting Research and Survey Data
Overview:
- What is DDI?
- Motivation
- Relationships to Vocabularies
- DDI-RDF Discovery Vocabulary
- Conceptual Model
6. Tim Berners-Lee 2006
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
8. Linked Open Data
• Massive collection of instance data
• Primarily connected via the owl:sameAs relationship
• Excellent source of information for background knowledge
• Labeled as mainstream Semantic Web
9. Is it really mainstream Semantic Web?
• What is the relationship between the models whose instances are being linked?
• How do we query LOD without knowing the individual datasets?
• How do we perform schema-level reasoning over the LOD cloud?
10. What can be done?
• Relationships are at the heart of semantics
• LOD primarily consists of owl:sameAs links
• LOD captures instance-level relationships, but lacks class-level relationships:
o Superclass
o Subclass
o Equivalence
• How to find these relationships?
o Perform a matching of the LOD ontologies using state-of-the-art ontology matching tools.
11. Linked Open Data Alignment and Querying
Dissertation Defense, July 27th, 2012
Prateek Jain
Kno.e.sis Center
Wright State University, Dayton, OH
12. Agenda
• Motivation and significance of this research
• Research questions and proposed solutions
• State of the current research and planned work
• Questions and comments
13. Linked Open Data
• A set of best practices for publishing and connecting structured data on the Web
• Practices have been adopted by an increasing number of data providers in the past 5 years
• Latest count is at 295 datasets with over 50 billion triples (approx.)
14. Linked Open Data 2007 (May)
Linking Open Data cloud diagram (this and subsequent pages) by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
18. Linked Open Data growth (from http://www4.wiwiss.fu-berlin.de/lodcloud/state/)
Date         Datasets   Triples
2011-09-19   295        31,634,213,770 (with 503,998,829 out-links)
2010-09-22   203
2009-07-14   95
2008-09-18   45
2007-10-08   25
2007-05-01   12
19. Six years of existence: how many applications come to your mind?
23. Reality…
• “We DID NOT use the entire DBpedia or LOD. The only component of LOD which helped us with Watson was the YAGO class hierarchy present in DBpedia. We had strict information gain requirements and other components honestly did not help much.”
– Researcher with the Watson team
25. A simple query…
“Identify congress members who have voted ‘No’ on pro-environmental legislation in the past four years, with high-pollution industry in their congressional districts.”
But even with LOD we cannot answer this query.
26. Example: GovTrack
[RDF graph diagram: vote 2009-887 has the option “Aye” (vote:hasOption, rdfs:label “Aye”), which was voted by people/P000197, Nancy Pelosi (vote:votedBy, dc:title); the vote is linked via vote:hasAction to Bills:h3962, dc:title “H.R. 3962: Affordable Health Care for America Act” / “On Passage: H R 3962 Affordable Health Care for America Act”. The fragment is reconstructed as triples below.]
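To make the diagram concrete, here is a minimal rdflib sketch of the triples recoverable from it. The example.org namespaces are placeholders, not GovTrack's actual URIs, and only the properties visible in the figure are modeled.

```python
# A minimal rdflib sketch of the GovTrack fragment above.
# The example.org namespaces are placeholders, not GovTrack's real URIs.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC, RDFS

VOTE = Namespace("http://example.org/govtrack/vote/")
PEOPLE = Namespace("http://example.org/govtrack/people/")
BILLS = Namespace("http://example.org/govtrack/bills/")

g = Graph()
vote = VOTE["2009-887"]          # the roll-call vote
option = VOTE["2009-887/+"]      # the "Aye" option on that vote

g.add((vote, VOTE.hasOption, option))
g.add((option, RDFS.label, Literal("Aye")))
g.add((option, VOTE.votedBy, PEOPLE["P000197"]))
g.add((PEOPLE["P000197"], DC.title, Literal("Nancy Pelosi")))
g.add((vote, VOTE.hasAction, BILLS["h3962"]))
g.add((BILLS["h3962"], DC.title,
       Literal("H.R. 3962: Affordable Health Care for America Act")))

print(g.serialize(format="turtle"))
```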
28. Our Approach
Use knowledge contributed by users to enhance existing approaches, to solve these issues:
• Ontology integration
• Detection of relationships within and across LOD Cloud datasets
• Querying multiple datasets
29. Circling Back
• LOD captures instance-level relationships, but lacks class-level relationships:
o Superclass
o Subclass
o Equivalence
31. • BLOOMS - Bootstrapping-based Linked Open Data Ontology Matching System
• Developed specifically for LOD ontologies
• Identifies schema-level links between different LOD datasets
• Aligns ontologies belonging to diverse domains using diverse data sources
32. Existing Approaches
Erhard Rahm, Philip A. Bernstein: “A survey of approaches to automatic schema matching”, The VLDB Journal 10: 334–350 (2001)
33. LOD Ontology Alignment
• Actual results from these techniques: Nation = Menstruation, Confidence = 0.9
• They perform extremely well on established benchmarks, but typically not in the wild.
• LOD ontologies are of a very different nature:
o Created by the community, for the community.
o Emphasis on number of instances, not number of meaningful relationships.
o Require solutions beyond syntactic and structural matching.
34. Rabbit out of a hat?
• Traditional auxiliary data sources (WordNet, upper-level ontologies) have limited coverage.
• Community-generated content is noisy, but:
o is rich in content and structure
o has a “self-healing” property
• Problems like ontology matching have a dimension of context associated with them.
35. Wikipedia
• The English version alone has more than 2.9 million articles
• Continually expanded by approx. 100,000 active volunteer editors
• Multiple points of view are mentioned with proper contexts
• Article creation/correction is an ongoing activity
36. Ontology Matching using Wikipedia
• On Wikipedia, categories are used to organize the entire project.
• Wikipedia's category system consists of overlapping trees.
• Simple rules for categorization (quoted at the end of this transcript)
37. BLOOMS Approach – Step 1
• Pre-process the input ontology
o Remove property restrictions
o Remove individuals, properties
• Tokenize the class names (see the sketch below)
o Remove underscores, hyphens and other delimiters
o Break down complex class names, e.g. SemanticWeb => Semantic Web
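A minimal sketch of this tokenization step in Python; the function name and the exact delimiter set are illustrative assumptions, not BLOOMS' actual code.

```python
import re

def tokenize_class_name(name: str) -> str:
    """Pre-process a class name as in step 1: strip delimiters and
    break down complex (CamelCase) names into words."""
    # Replace underscores, hyphens and similar delimiters with spaces.
    name = re.sub(r"[_\-.]+", " ", name)
    # Insert a space before an upper-case letter that follows a
    # lower-case letter or digit, e.g. SemanticWeb -> Semantic Web.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return " ".join(name.split())

assert tokenize_class_name("SemanticWeb") == "Semantic Web"
assert tokenize_class_name("semantic_web-page") == "semantic web page"
```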
38. BLOOMS Approach – Step 2
• Identify articles in Wikipedia corresponding to the concept.
o Each article related to the concept indicates one sense of the usage of the word.
• For each article found in the previous step:
o Identify the Wikipedia categories to which it belongs.
o For each category found, find its parent categories up to level 4.
• Once the “BLOOMS tree” for each sense of the source concept is created (Ts), utilize it for comparison with the “BLOOMS trees” of the target concepts (Tt); see the sketch below.
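The following sketch shows how such a tree could be built from live Wikipedia data. It uses the public MediaWiki API (action=query, prop=categories); the helper names, the breadth-first strategy, and the omission of hidden/administrative-category filtering are assumptions for illustration, not the BLOOMS implementation.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def get_parent_categories(title: str) -> list:
    """Parent categories of an article or category via the MediaWiki API.
    (No filtering of hidden/administrative categories, for brevity.)"""
    params = {"action": "query", "prop": "categories", "titles": title,
              "format": "json", "cllimit": "max"}
    data = requests.get(API, params=params).json()
    page = next(iter(data["query"]["pages"].values()))
    return [c["title"] for c in page.get("categories", [])]

def build_blooms_tree(article: str, max_level: int = 4) -> dict:
    """Expand an article's categories breadth-first up to max_level,
    recording the level at which each category was first reached."""
    tree, frontier = {article: 0}, [article]
    for level in range(1, max_level + 1):
        nxt = []
        for node in frontier:
            for parent in get_parent_categories(node):
                if parent not in tree:      # category trees overlap, so
                    tree[parent] = level    # skip already-visited nodes
                    nxt.append(parent)
        frontier = nxt
    return tree
```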
39. BLOOMS Approach – Step 3
• In the tree Ts, remove all nodes whose parent node occurs in Tt, to create Ts′.
o All leaves of Ts are at level 4 or occur in Tt.
o The pruned nodes do not contribute any additional new knowledge.
• Compute the overlap Os between the source and target tree:
o O(Ts, Tt) = n / (k − 1), where n = |Ts′ ∩ Tt| (the number of nodes of Ts′ that also occur in Tt) and k = |Ts′|.
• The decision of alignment is made as follows (sketched below):
o If Ts = Tt for some Ts ∈ TC and Tt ∈ TD, then C ≡ D.
o If min{O(Ts, Tt), O(Tt, Ts)} ≥ x, then set C rdfs:subClassOf D if O(Ts, Tt) ≤ O(Tt, Ts), and set D rdfs:subClassOf C if O(Ts, Tt) ≥ O(Tt, Ts).
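A minimal sketch of the overlap computation and decision rule, representing trees as sets of category names. The threshold value x = 0.85 is an assumed placeholder, not the one used in the BLOOMS evaluation.

```python
def overlap(ts_pruned: set, tt: set) -> float:
    """O(Ts, Tt) = n / (k - 1): n nodes of the pruned source tree Ts'
    that also occur in Tt, out of k = |Ts'| nodes."""
    n = len(ts_pruned & tt)
    k = len(ts_pruned)
    return n / (k - 1) if k > 1 else 0.0

def align(ts: set, tt: set, c: str, d: str, x: float = 0.85) -> list:
    """Decision rule of step 3; x = 0.85 is an assumed threshold."""
    if ts == tt:
        return [f"{c} owl:equivalentClass {d}"]
    o_st, o_ts = overlap(ts, tt), overlap(tt, ts)
    rels = []
    if min(o_st, o_ts) >= x:
        if o_st <= o_ts:
            rels.append(f"{c} rdfs:subClassOf {d}")
        if o_st >= o_ts:
            rels.append(f"{d} rdfs:subClassOf {c}")
    return rels  # both directions (i.e. equivalence) when overlaps tie
```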
41. Evaluation Objectives
• To examine BLOOMS as a tool for the purpose of LOD ontology matching.
• To examine the ability of BLOOMS to serve as a general-purpose ontology matching system.
46. Partonomy Identification
• Currently, entities across datasets are linked primarily using the owl:sameAs relationship
• Relationships such as partonomy (part-of) and causality can allow creating even more intelligent applications, such as Watson
• Approach: PLATO (Part-Of relation finder on Linked Open Data Tool)
47. PLATO Approach
• PLATO generates all possible partonomically linked pairs between the entities in the dataset.
o Utilizes “strongly” associated entities
• Identifies the type of each entity in the pair using WordNet (see the sketch below).
o Uses class names
o Gives the lexicographer files for the synsets corresponding to these entities
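The typing step can be approximated with NLTK's WordNet interface, whose Synset.lexname() returns the lexicographer file of a synset (e.g. 'noun.substance'). This is a sketch under that assumption; PLATO's actual WordNet access is not shown in the slides.

```python
# Requires: pip install nltk, then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def entity_types(class_name: str) -> set:
    """Lexicographer files of the noun synsets for a class name."""
    return {s.lexname() for s in wn.synsets(class_name, pos=wn.NOUN)}

print(entity_types("cellulose"))  # e.g. {'noun.substance'}
print(entity_types("cell_wall"))  # multi-word names use underscores
```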
49. PLATO Approach – Step 2
• PLATO generates linguistic patterns for each applicable property, based on linguistic cues suggested by Winston.
o Example: Cell Wall is made of Cellulose
• Tests the lexical patterns for each entity pair in a corpus-driven manner.
o Using the Web as a corpus
• PLATO counts the total number of web pages that contain the pattern.
o Parses each page and identifies the occurrence of the pattern.
50. PLATO Approach – Step 3
• Asserts the partonomy property with the strongest supporting evidence (sketched below):
o Cell Wall is made of Cellulose: 48
o Cellulose is made of Cell Wall: 10
• PLATO also enriches the schema by generalizing from the instance-level assertions.
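Steps 2 and 3 can be sketched together: instantiate Winston-style patterns in both directions, score each by web-page counts, and assert the best-supported candidate. The count_web_hits helper and the exact pattern set are assumptions for illustration, not PLATO's actual code.

```python
def count_web_hits(phrase: str) -> int:
    """Stand-in for the web-as-corpus lookup: the number of pages in
    which the exact pattern occurs (search backend assumed)."""
    raise NotImplementedError

# A few linguistic cues in the spirit of Winston's part-whole relations.
PATTERNS = ["{part} is part of {whole}",
            "{whole} is made of {part}",
            "{part} is a component of {whole}"]

def strongest_partonomy(a: str, b: str):
    """Instantiate each pattern in both directions (steps 2-3) and
    return the candidate with the strongest supporting evidence."""
    candidates = [p.format(part=x, whole=y)
                  for p in PATTERNS for x, y in ((a, b), (b, a))]
    scored = {c: count_web_hits(c) for c in candidates}
    best = max(scored, key=scored.get)
    return best, scored[best]  # e.g. ("Cell Wall is made of Cellulose", 48)
```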
51. Evaluation Objectives
• To examine PLATO as a tool for finding different kinds of part-of relations.
• To examine PLATO as a tool for finding part-of relations within a dataset.
• To examine PLATO as a tool for finding part-of relations across datasets.
54. Some other work
• Requirement document analysis
o Internship at Accenture
• Querying of partonomical relationships
• Operators for querying spatio-temporal-thematic data
• Plug-n-play system for BLOOMS
55. Publication timeline (BLOOMS, BLOOMS+, PLATO, and others)
2010: 1 paper at ISWC; 1 paper at the OM workshop; 1 paper at the AAAI Spring Symposium; 1 paper at GEOS
2011: 1 paper at ESWC; 1 workshop paper at ICBO; 1 patent
2012: 1 paper at ACM Hypertext
Total of 7 publications covering this research
56. Potential Applications
• Automatic domain identification of datasets
o Work currently being pursued by Sarasi
• Property alignment on the LOD cloud
o Work currently being pursued by Kalpa and Sanjaya
• Personalization of property and concept matches
o Machine learning and data mining based techniques
57. Publications
• Prateek Jain, Pascal Hitzler, Kunal Verma, Peter Z. Yeh and Amit P. Sheth, “Moving beyond sameAs with PLATO: Partonomy detection for Linked Data”. In Proceedings of the 23rd ACM Hypertext and Social Media conference (HT 2012), Milwaukee, WI, USA, June 25th–28th, 2012. (Acceptance rate 27.5%)
• Amit Krishna Joshi, Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, Amit Sheth, Mariana Damova, “Alignment-based Querying of Linked Open Data”. In Proceedings of the 11th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2012). (To appear)
• Prateek Jain, Peter Z. Yeh, Kunal Verma, Reymonrod Vasquez, Mariana Damova, Pascal Hitzler and Amit P. Sheth, “Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton”. In Grigoris Antoniou, Marko Grobelnik, Elena Simperl, Bijan Parsia, Dimitris Plexousakis, Jeff Pan and Pieter De Leenheer, editors, Proceedings of the 8th Extended Semantic Web Conference 2011, volume 6643 of Lecture Notes in Computer Science, Heidelberg, 2011. Springer Berlin. (Acceptance rate 23.5%)
• Prateek Jain, Pascal Hitzler, Amit P. Sheth, Kunal Verma and Peter Z. Yeh, “Ontology Alignment for Linked Open Data”. In P. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Pan, I. Horrocks, and B. Glimm, editors, Proceedings of the 9th International Semantic Web Conference 2010, Shanghai, China, November 7th–11th, 2010, volume 6496 of Lecture Notes in Computer Science, pages 402–417, Heidelberg, 2010. Springer Berlin. (Acceptance rate 20%)
58. Publications
• Prateek Jain, Pascal Hitzler and Amit P. Sheth, “Flexible Bootstrapping-Based Ontology Alignment”. In Proceedings of the Fifth International Workshop on Ontology Matching (Shanghai, China, November 7th–11th, 2010).
• Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, and Amit P. Sheth, “Linked Data Is Merely More Data”. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, and Deborah McGuinness, Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82–86. ISBN 978-1-57735-461-1
• Prateek Jain, Peter Z. Yeh, Kunal Verma, Cory Henson, and Amit Sheth, “SPARQL Query Re-writing Using Partonomy Based Transformation Rules”. In K. Janowicz, M. Raubal, and S. Levashkin, editors, Proceedings of the Third International Conference on GeoSpatial Semantics, December 3–4, 2009, Mexico City, Mexico, volume 5892/2009 of Lecture Notes in Computer Science, pages 140–158, Heidelberg, 2009. Springer Berlin.
• Prateek Jain, Kunal Verma, Alex Kass, Reymonrod G. Vasquez, “Automated Review of Natural Language Requirements Documents: Generating Useful Warnings with User-extensible Glossaries Driving a Simple State Machine”. In Kiran Deshpande, Pankaj Jalote and Sriram K. Rajamani, editors, Proceedings of the Second India Software Engineering Conference, February 23–26, 2009, Pune, India, ACM, New York, NY, 37–46. DOI= http://doi.acm.org/10.1145/1506216.1506224 (Acceptance rate 10%)
59. Publications
• Prateek Jain, Peter Z. Yeh, Kunal Verma, Alex Kass, and Amit P. Sheth, “Enhancing process-adaptation capabilities with web-based corporate radar technologies”. In Proceedings of the First International Workshop on Ontology-Supported Business Intelligence (Karlsruhe, Germany, October 27, 2008). OBI '08, vol. 308. ACM, New York, NY, 1–6. DOI= http://doi.acm.org/10.1145/1452567.1452569
• Matthew Perry, Amit P. Sheth, Farshad Hakimpour, Prateek Jain, “Supporting Complex Thematic, Spatial and Temporal Queries over Semantic Web Data”. In F. T. Fonseca, M. Andrea Rodriguez and S. Levashkin, editors, Proceedings of the Second International Conference on GeoSpatial Semantics, Mexico City, Mexico, volume 4853/2007 of Lecture Notes in Computer Science, pages 228–246, Heidelberg, 2007. Springer Berlin.
• Colin Puri, Karthik Gomadam, Prateek Jain, Peter Z. Yeh, Kunal Verma, “Multiple Ontologies in Healthcare Information Technology: Motivations and Recommendation for Ontology Mapping and Alignment”. In Proceedings of the Workshop on Working with Multiple Biomedical Ontologies (at ICBO), 26 July 2011, Buffalo, NY, USA.
• Cory Henson, Amit P. Sheth, Prateek Jain, Josh Pschorr and Terry Rapoch, “Video on the Semantic Sensor Web”. W3C Video on the Web Workshop, 12–13 December 2007, San Jose, California and Brussels, Belgium.
60. Patent
• Peter Z. Yeh, Prateek Jain, Kunal Verma, Reymonrod G. Vasquez. Title: Information Source Alignment. Filed 4th March 2011. Status: Pending.
62. Acknowledgement
• Cory Henson
o Coffee breaks, research, football, baseball, politics, life…
o First person I met while finding my way to the LSDIS lab
• Kno.e.sis lab members & support staff
• Folks at Accenture Technology Labs
o Amazing group of people to work with/for
63. Acknowledgement
• NSF Award IIS-0842129, titled “III-SGER: Spatio-Temporal-Thematic Queries of Semantic Web Data: a Study of Expressivity and Efficiency”
• NSF Award 1143717, titled “III: EAGER -- Expressive Scalable Querying over Linked Open Data”
Thanks to the members of the LOD mailing list, especially Dr. Hugh Glaser, both as a knowledge source and as a test bed.
Wikipedia’s simple rules for categorization (cf. slide 36):
• “If logical membership of one category implies logical membership of a second, then the first category should be made a subcategory.”
• “Pages are not placed directly into every possible category, only into the most specific one in any branch.”
• “Every Wikipedia article should belong to at least one category.”