The talk covers my PhD research work so far and was given as an introductory presentation at the beginning of my visiting period at the Web&Media Group of the Vrije Universiteit, Amsterdam. First, I introduce a system for automated knowledge management, I.M.P.A.K.T., which embeds a module for Core Competence extraction. The module is described as a use case for the application of non-standard inference services based on the Least Common Subsumer in Description Logics (DLs) to the problem of finding commonalities in knowledge bases modeled in DLs. Moreover, I present the Knowledge Compilation approach adopted for efficiently solving subsumption through standard SQL queries alone.
Then, I focus on my current investigation into the possibility of extending the Common Subsumer (CS) reasoning service to RDF datasets. Here, the formal definition of CS in RDF is given, together with a sketch of possible applications (e.g., clustering of RDF resources).
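To give a concrete flavour of this second part, the sketch below approximates the commonalities of two RDF resources by the set of (predicate, object) pairs they share, and derives from them a simple distance usable for clustering. This is only an intuition-level illustration in Python with rdflib, under assumed example data (the input file name is hypothetical); it is not the formal CS construction discussed in the talk.

from itertools import combinations
from rdflib import Graph, URIRef

g = Graph()
g.parse("dataset.ttl", format="turtle")  # hypothetical input dataset

def description(resource):
    # All (predicate, object) pairs asserted for a resource.
    return set(g.predicate_objects(resource))

def shared_description(r1, r2):
    # Naive commonality: the (predicate, object) pairs shared by r1 and r2.
    return description(r1) & description(r2)

def jaccard_distance(r1, r2):
    # A simple distance over shared descriptions, usable for clustering.
    d1, d2 = description(r1), description(r2)
    return 1.0 - len(d1 & d2) / len(d1 | d2) if (d1 or d2) else 0.0

resources = [s for s in set(g.subjects()) if isinstance(s, URIRef)]
for r1, r2 in combinations(resources, 2):
    print(r1, r2, jaccard_distance(r1, r2))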
This document proposes DearIdea.net, an open source collaboration platform that uses natural language processing and user tags to identify shared needs or resources between community projects. The platform would allow projects to be submitted with tags and analyzed to determine similarities through weighted scoring of user and machine-applied tags. Challenges include inexperience, determining feasibility, and maintaining interest, but presenting the idea at BarCampNYC could help address feasibility while using existing tools could reduce programming requirements. Currently, the domain exists without content and the author seeks advice and criticism to further develop the idea.
This document provides an overview of network refactoring and offloading trends, including fluid network planes. It discusses the evolution of SDN from 2009 to 2019 and concepts like network softwarization. Instances of fluid network planes are described, such as RouteFlow, NFV layers, and VNF offloading to hardware or multi-vendor P4 fabrics. The document also covers slicing for IoT analytics and references recent works on in-network computing, fast connectivity recovery, and scaling distributed machine learning with in-network aggregation.
This lecture was delivered at the Intelligent Systems and Data Mining workshop, held at the Faculty of Computers and Information, Kafer Elshikh University, on Wednesday, 6 December 2017.
Mithileysh Sathiyanarayanan and Mohammad Alsaffar presented Euler-time Diagrams, a novel visual method and software tool to represent set relationships over time. They combined the concepts of Euler diagrams, which show set relationships, and time series, which show the sequence of events over time. The tool was developed using D3 and Google's toolkit and allows users to visualize how disease rates change over multiple years by interacting with the diagrams. The authors aim to further develop the tool by incorporating principles of visual perception and cognition.
In this talk, I summarize the research conducted during my visiting period at the Web&Media Group of the Vrije Universiteit, Amsterdam.
Extracting relevant entities from TV-program descriptions is a challenging problem, due to the broad range of topics they cover and the variety of formats they use. None of the existing tools for automatic Named-Entity Recognition and Classification is trained on this kind of data.
I illustrate the workflow established for extracting relevant entities from a text in the entertainment domain, relying on the adoption of different annotators, as well as the issues arising in the integration of their outputs. In order to increase the coverage of the annotation task, metrics based on majority vote are combined with metrics established for crowd-truth evaluation for gold-standard creation. This approach should be able to capture cases typically cut off by majority-vote integration techniques (i.e., unique information and distributed agreement).
Several features are computed in order to capture as many characteristics as possible that are useful for assessing the relevance of an entity. The results of human annotators, gathered through a crowd-sourcing task, are used to collect positive and negative examples of relevance and, as an ultimate goal, to evaluate the precision and recall of the entire system.
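A toy illustration of the majority-vote integration mentioned above, assuming each annotator simply returns a set of entity surface forms (all names below are hypothetical). Entities proposed by too few annotators are cut off, which is exactly the unique information that the crowd-truth metrics aim to recover.

from collections import Counter

annotations = {  # hypothetical outputs of three annotators
    "annotator_1": {"Amsterdam", "Vrije Universiteit", "BBC One"},
    "annotator_2": {"Amsterdam", "Vrije Universiteit"},
    "annotator_3": {"Amsterdam"},
}

def majority_vote(annotations, threshold=0.5):
    # Keep only entities proposed by more than `threshold` of the annotators.
    votes = Counter(e for ents in annotations.values() for e in ents)
    n = len(annotations)
    return {e for e, c in votes.items() if c / n > threshold}

print(majority_vote(annotations))
# {'Amsterdam', 'Vrije Universiteit'} -- 'BBC One' is discarded even if relevant.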
This patent application describes methods for delivering digital content securely. It involves receiving documents from sources, tagging the documents with metadata about the type and intended recipient, encrypting the documents, publishing them so the intended recipients can access them by verifying their identity, and granting access to the documents based on the verified identity. The digital content delivery system uses storage systems, interfaces, content providers, and lockboxes to classify, distribute and control access to encrypted documents for authorized recipients.
The document provides an overview of Carter Brothers Security Services' (CBSS) proposed revised delivery model for their AT&T account. Some key points include:
- CBSS will implement a hybrid approach including cost containment, creating a KPI dashboard, hiring a QA manager, reducing headcount, adopting a regional management structure, and increasing technician utilization.
- The delivery model is intended to drive increased focus on service quality, align with AT&T's needs, and achieve operational excellence.
- CBSS will transition from dedicated trainers to flex trainers, and move to a regional operating manager model from a market manager model to address headcount reductions and decreased revenue projections.
- Converting technicians
This document discusses clustering of RDF data across the Semantic Web. It begins by describing the Linking Open Data project and the growing amount of RDF data available. It then discusses the motivations for clustering RDF data, such as improving data access and query response times over distributed machines. Current approaches to RDF clustering are also summarized, including extracting instance subgraphs and computing distances between instances. The document outlines different techniques for instance extraction and distance computation in RDF clustering.
Nasim, Trading & Investment in Indian Stock Market (nasimtom)
This document provides an overview of the Indian securities market, including the primary and secondary markets. It discusses the leading stock exchanges in India - NSE and BSE - and lists some of the regional stock exchanges. It describes the trading systems used, including NEAT, and explains concepts like circuits and the transaction cycle. It also summarizes trading in derivatives and futures markets, clearing and settlement processes, and references some important websites for market data and analysis. Towards the end, it presents the author's invented formula for intraday trading based on high, low and average prices and volume.
A Discrete Krill Herd Optimization Algorithm for Community Detection (Aboul Ella Hassanien)
The document proposes a discrete krill herd optimization algorithm for community detection in complex social networks. It introduces the motivation and challenges of community detection. The proposed approach adapts the krill herd algorithm's search domain to represent community structures, using modularity as the objective function. Experimental results on four benchmark networks show the algorithm achieves good accuracy and high modularity, particularly for small to medium networks. Future work aims to improve performance on large networks through hybrid approaches.
Data-centric AI and the convergence of data and model engineering: opportunit... (Paolo Missier)
A keynote talk given at the IDEAL 2023 conference (Evora, Portugal, Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has begun to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
This document provides an overview of using deep learning techniques for recommender systems. It begins with establishing the need for recommender systems due to increasing information overload. It then gives a basic introduction and agenda for the talk, covering motivation, basics, deep learning for vehicle recommendations, and scalability/production. The talk discusses using deep learning approaches like wide and deep learning as well as sequential models to improve recommendation relevance for applications like vehicle recommendations. It provides details on preprocessing, training a classifier, candidate generation and ranking for recommendations. The document concludes with discussing deploying such a system at scale and current trends in recommender system research.
Recommender systems support the decision making processes of customers with personalized suggestions. These widely used systems influence the daily life of almost everyone across domains like ecommerce, social media, and entertainment. However, the efficient generation of relevant recommendations in large-scale systems is a very complex task. In order to provide personalization, engines and algorithms need to capture users’ varying tastes and find mostly nonlinear dependencies between them and a multitude of items. Enormous data sparsity and ambitious real-time requirements further complicate this challenge. At the same time, deep learning has been proven to solve complex tasks like object or speech recognition where traditional machine learning failed or showed mediocre performance.
Join Marcel Kurovski to explore a use case for vehicle recommendations at mobile.de, Germany’s biggest online vehicle market. Marcel shares a novel regularization technique for the optimization criterion and evaluates it against various baselines. To achieve high scalability, he combines this method with strategies for efficient candidate generation based on user and item embeddings—providing a holistic solution for candidate generation and ranking.
The proposed approach outperforms collaborative filtering and hybrid collaborative-content-based filtering by 73% and 143%, respectively, for MAP@5. It also scales well to millions of items and users, returning recommendations in tens of milliseconds.
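For reference, a minimal sketch of how MAP@5, the figure quoted above, is commonly computed: per-user average precision over the top-5 recommendations, averaged over all users (the data below is made up).

def average_precision_at_k(recommended, relevant, k=5):
    # Precision is accumulated at each rank where a relevant item appears.
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def map_at_k(recs, rels, k=5):
    # Mean of the per-user average precisions.
    return sum(average_precision_at_k(recs[u], rels[u], k) for u in recs) / len(recs)

recs = {"u1": ["car_a", "car_b", "car_c"], "u2": ["car_d", "car_e"]}
rels = {"u1": {"car_a", "car_c"}, "u2": {"car_x"}}
print(map_at_k(recs, rels))  # 0.416...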
Event: O'Reilly Artificial Intelligence Conference, New York, 18.04.2019
Speaker: Marcel Kurovski, inovex GmbH
More tech talks: inovex.de/vortraege
More tech articles: inovex.de/blog
This document provides an overview of an introduction to machine learning course, including:
- A description of the course content which covers Python programming, data visualization, supervised learning algorithms, regression, and unsupervised learning.
- An example of predicting bike share usage at different stations and the importance of understanding the problem and data.
- Guidance on exploring and visualizing data in Python to gain insights before applying machine learning algorithms.
This document discusses profiling linked open data. It outlines the research background, plan, and preliminary results of profiling linked open data. The research aims to automatically generate new statistics and knowledge patterns to provide dataset summaries and inspect data quality. Preliminary results include profiling Italian public administration websites for compliance with open data policies and automatically classifying over 1,000 linked data sets into 8 topics with over 80% accuracy. Future work involves enriching the framework with additional statistics and applying it to unstructured microdata.
Knowledge Discovery in Remote Access Databases (Zakaria Zubi)
This document provides an overview of the thesis which investigates knowledge discovery in remote access databases. The thesis contains three parts:
Part 1 introduces knowledge discovery in databases (KDD) and data mining (DM), and defines the goal of the thesis work.
Part 2 discusses remote access KDD models, the logical foundation of data mining, mining discovered association rules, and data mining query languages.
Part 3 proposes the Knowledge Discovery Query Language (KDQL) for mining association rules from databases and visualizing results. It also discusses I-extended databases and the implementation of KDQL.
The thesis aims to develop methods for remote knowledge discovery using query languages and by extending databases to include generalized patterns discovered through
SURFconext: a next generation collaboration infrastructure across institution... (University of Amsterdam)
This document discusses SURFconext, a next generation collaboration infrastructure across Dutch academic institutions developed by SURF foundation. It provides an overview of the Dutch academic landscape including 14 research universities and 39 higher professional education institutions. It then discusses SURFconext from the perspectives of libraries and virtual research environments. It provides examples of SURFconext implementation at the University of Amsterdam and scenarios for collaboration across institutions. It discusses lessons learned and positions SURFconext as a cloud service broker to enable access to commercial and research services.
PhD Defense - A Context Management Framework based on Wisdom of Crowds for So... (Adrien Joly)
This document summarizes a PhD thesis presentation on developing a context management framework to filter social streams and recommend the most relevant updates. It proposes using contextual tag clouds generated from virtual and social sensors to represent users' contexts. An implementation was developed to test the approach. Evaluation results found that recommended social updates were 72% accurate and about half were deemed relevant to the posting context, depending on the type of social update. Future work is proposed to improve the quality of contextual tags and leverage additional sensors.
Jisc is a UK organization that aims to advance digital technology in education and research. Their Learning Analytics project has three core parts: a learning analytics service, toolkit, and community. The service provides dashboards and tools to analyze student data from various sources to identify at-risk students and enable interventions. It follows an open architecture approach. The toolkit includes guidance on best practices like privacy and consent. The community aspect involves events, blogs, and mailing lists to bring people together around learning analytics.
This document discusses interlinking in linked data and the challenges of link discovery. It defines interlinking as the degree to which entities representing the same concept are linked to each other. It describes two categories of link discovery frameworks: ontology matching and instance matching. The key challenges of link discovery are computational complexity and selecting an appropriate link specification. Current approaches include domain-specific and universal frameworks, and active learning techniques can help guide selection of optimal link specifications.
Early Analysis and Debugging of Linked Open Data Cubes (Enrico Daga)
The release of the Data Cube Vocabulary specification introduces a standardised method for publishing statistics following the linked data principles. However, a statistical dataset can be very complex, and so understanding how to get value out of it may be hard. Analysts need the ability to quickly grasp the content of the data to be able to make use of it appropriately. In addition, while remodelling the data, data cube publishers need support to detect bugs and issues in the structure or content of the dataset. There are, however, several aspects of RDF, the Data Cube vocabulary and linked data that can help with these issues, including the fact that they make the data "self-descriptive". Here, we attempt to answer the question "How feasible is it to use this feature to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data?" We present a tool that automatically builds interactive facets as diagrams out of a Data Cube representation, without prior knowledge of the data content, to be used for debugging and early analysis. We show how this tool can be used on a large, complex dataset and we discuss the potential of this approach.
Knowledge Sharing over social networking systems (tanguy)
1. The document discusses knowledge sharing over social networking systems and analyzes data from the social networking site Ecademy.
2. The Ecademy data showed a power law distribution structure typical of social networks and small world properties with short paths between users.
3. A survey of Ecademy users found that face-to-face relationships positively influenced relationship strength and knowledge sharing, though the site mainly facilitated weak relationships.
Adolfo Ruiz Calleja, "Using social and semantic tech" (ifi8106tlu)
This document outlines a social-semantic infrastructure called SEEK-AT-WD to help educators discover and select ICT tools. It proposes using semantic technologies to publish descriptions of educational ICT tools as linked open data. This would allow tool descriptions to be obtained from multiple sources on the web of data and enriched by both educators and the web community. An ontology called SEEK Ontology is developed to provide semantic structure to the tool descriptions. The SEEK-AT-WD infrastructure implements approaches like crawling other datasets, automatically mapping data to the ontology, and building a semantic knowledge base to store and publish the linked data.
Afternoon session data dictionary (April 2013) (OpenOrganize)
The document discusses the goal of creating a data dictionary to improve the comparability and automation of comparing greenhouse gas, black carbon, and co-emitted air pollutant inventories. The final products will include a paper and supplementary website containing diagrams and schemas relating core objects in inventories. These include inventory, pollutant, sector, source category, estimation methodology, and emission metric classes. The framework aims to standardize key elements while allowing inventories to maintain individual formats and estimation approaches. It could enable centralized data hosting and decentralized markup to make inventories machine-readable.
Tangible Contextual Tag Clouds towards Controlled and Relevant Social Inter... (Adrien Joly)
Presented by Adrien Joly at Bell Labs France during a "SKP" session, this slideshow includes a motivated introduction to his PhD thesis subject on contextual filtering of social interactions, its technical approach relying on "contextual tag clouds", and its current state of research.
Collaborative Knowledge Management in Organization from SECI model Framework (Natapone Charsombut)
A presentation for the TIIM 2010 conference, Pattaya, Thailand.
ABSTRACT
In the age of social collaboration and sharing enabled by Web 2.0 and Linked Data, many organizations are adapting to take advantage of interaction, sharing, reuse, interoperability and collaboration on the World Wide Web. Organizational learning, a subfield of knowledge management, also benefits greatly from this emerging collaboration culture. It provides the ability to share valuable insights, reduce redundant work, avoid reinventing the wheel, reduce training time for new employees, retain intellectual capital despite employee turnover, and adapt to changing environments and markets.
However, user-created content from Web 2.0, multiplied by the structured data published according to the Linked Data principles, amounts to a massive volume of data, and facing this data overload is inevitable. Traditional knowledge management is not designed to extract knowledge from social collaboration. We need a framework fit for knowledge transfer in a highly interactive environment.
The SECI model, a knowledge management model based on collaborative knowledge transfer in organizations, seems to be the best candidate for navigating knowledge creation in this case. This study attempts to address how to apply the SECI model to knowledge management systems in collaborative organizations.
[ADBIS 2021] - Optimizing Execution Plans in a Multistore (Chiara Forresi)
Multistores are data management systems that enable query processing across different database management systems (DBMSs); besides the distribution of data, complexity factors like schema heterogeneity and data replication must be resolved through integration and data fusion activities. In a recent work [2], we have proposed a multistore solution that relies on a dataspace to provide the user with an integrated view of the available data and enables the formulation and execution of GPSJ (generalized projection, selection and join) queries. In this paper, we propose a technique to optimize the execution of GPSJ queries by finding the most efficient execution plan on the multistore. In particular, we devise three different strategies to carry out joins and data fusion, and we build a cost model to enable the evaluation of different execution plans. Through the experimental evaluation, we are able to profile the suitability of each strategy to different multistore configurations, thus validating our multi-strategy approach and motivating further research on this topic.
A modified k-means algorithm for big data clustering (SK Ahammad Fahad)
The amount of data is getting bigger every moment, and this data comes from everywhere: social media, sensors, search engines, GPS signals, transaction records, satellites, financial markets, ecommerce sites, etc. This large volume of data may be semi-structured, unstructured or even structured, so it is important to derive meaningful information from this huge data set. Clustering is the process of categorizing data such that data are grouped in the same cluster when they are similar according to specific metrics. In this paper, we work on the k-means clustering technique to cluster big data. Several methods have been proposed for improving the performance of the k-means clustering algorithm. We propose a method for making the algorithm less time consuming and more effective and efficient, for better clustering with reduced complexity. According to our observation, the quality of the resulting clusters heavily depends on the selection of the initial centroids and on the changes of data points between clusters in the subsequent iterations. As we know, after a certain number of iterations, only a small part of the data points change their clusters. Therefore, our proposed method first finds the initial centroids and then separates those data elements which will not change their cluster from those which may change their cluster in the subsequent iterations, which reduces the workload significantly for very large data sets. We evaluate our method with different sets of data and compare it with other methods as well.
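As a rough illustration of the skipping idea described above (not the authors' exact procedure), the sketch below caches each point's distance to its assigned centroid; if that distance has not grown after the centroids move, the point is assumed to stay in its cluster and the full distance search is skipped.

import numpy as np

def kmeans_with_skipping(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    assign = np.full(len(X), -1)          # current cluster of each point
    cached = np.full(len(X), np.inf)      # last distance to own centroid
    for _ in range(iters):
        moved = False
        for i, x in enumerate(X):
            if assign[i] >= 0:
                d_own = np.linalg.norm(x - centroids[assign[i]])
                if d_own <= cached[i]:    # heuristic: point assumed stable
                    cached[i] = d_own
                    continue
            dists = np.linalg.norm(centroids - x, axis=1)  # full search
            j = int(dists.argmin())
            if j != assign[i]:
                moved = True
            assign[i], cached[i] = j, dists[j]
        for j in range(k):                # recompute centroids
            members = X[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        if not moved:
            break
    return assign, centroids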
Similar to Finding Commonalities: from Description Logics to the Web of Data (20)
Applications of artificial Intelligence in Mechanical Engineering.pdf (Atif Razi)
Historically, mechanical engineering has relied heavily on human expertise and empirical methods to solve complex problems. With the introduction of computer-aided design (CAD) and finite element analysis (FEA), the field took its first steps towards digitization. These tools allowed engineers to simulate and analyze mechanical systems with greater accuracy and efficiency. However, the sheer volume of data generated by modern engineering systems and the increasing complexity of these systems have necessitated more advanced analytical tools, paving the way for AI.
AI offers the capability to process vast amounts of data, identify patterns, and make predictions with a level of speed and accuracy unattainable by traditional methods. This has profound implications for mechanical engineering, enabling more efficient design processes, predictive maintenance strategies, and optimized manufacturing operations. AI-driven tools can learn from historical data, adapt to new information, and continuously improve their performance, making them invaluable in tackling the multifaceted challenges of modern mechanical engineering.
Use PyCharm for remote debugging of WSL on a Windows machine (shadow0702a)
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
Design and optimization of ion propulsion drone (bjmsejournal)
Electric propulsion technology has been widely used in many kinds of vehicles in recent years, and aircraft are no exception. Technically, UAVs are electrically propelled but tend to produce a significant amount of noise and vibration. Ion propulsion technology for drones is a potential solution to this problem, and it has been proven feasible in the Earth's atmosphere. The study presented in this article covers the design of EHD thrusters and the power supply for ion propulsion drones, along with performance optimization of the high-voltage power supply for endurance in the Earth's atmosphere.
Introduction – e-waste: definition – sources of e-waste – hazardous substances in e-waste – effects of e-waste on environment and human health – need for e-waste management – e-waste handling rules – waste minimization techniques for managing e-waste – recycling of e-waste – disposal treatment methods of e-waste – mechanism of extraction of precious metals from leaching solution – global scenario of e-waste – e-waste in India – case studies.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024 (Sinan KOZAK)
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Batteries: Introduction – types of batteries – discharging and charging of a battery – characteristics of a battery – battery rating – various tests on batteries – primary battery: silver button cell – secondary battery: Ni-Cd battery – modern battery: lithium-ion battery – maintenance of batteries – choice of batteries for electric vehicle applications.
Fuel Cells: Introduction – importance and classification of fuel cells – description, principle, components and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell and direct methanol fuel cells.
Comparative analysis between traditional aquaponics and reconstructed aquapon... (bijceesjournal)
The aquaponic system of planting is a method that does not require soil usage. It is a method that only needs water, fish, lava rocks (a substitute for soil), and plants. Aquaponic systems are sustainable and environmentally friendly. Its use not only helps to plant in small spaces but also helps reduce artificial chemical use and minimizes excess water use, as aquaponics consumes 90% less water than soil-based gardening. The study applied a descriptive and experimental design to assess and compare conventional and reconstructed aquaponic methods for reproducing tomatoes. The researchers created an observation checklist to determine the significant factors of the study. The study aims to determine the significant difference between traditional aquaponics and reconstructed aquaponics systems propagating tomatoes in terms of height, weight, girth, and number of fruits. The reconstructed aquaponics system’s higher growth yield results in a much more nourished crop than the traditional aquaponics system. It is superior in its number of fruits, height, weight, and girth measurement. Moreover, the reconstructed aquaponics system is proven to eliminate all the hindrances present in the traditional aquaponics system, which are overcrowding of fish, algae growth, pest problems, contaminated water, and dead fish.
An improved modulation technique suitable for a three level flying capacitor ... (IJECEIAES)
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed simplified modulation technique paves the way for more straightforward and efficient control of multilevel inverters, enabling their widespread adoption and integration into modern power electronic systems. Through the amalgamation of sinusoidal pulse width modulation (SPWM) with a high-frequency square wave pulse, this controlling technique attains energy equilibrium across the coupling capacitor. The modulation scheme incorporates a simplified switching pattern and a decreased count of voltage references, thereby simplifying the control algorithm.
Data Control Language.pptx
Finding Commonalities: from Description Logics to the Web of Data
1. Finding Commonalities in Linked Open Data
Silvia Giannini
PhD Student
(Supervisor: Prof. Eugenio Di Sciascio)
Dipartimento di Ingegneria Elettrica e dell'Informazione (DEI),
Politecnico di Bari, Bari, Italy
in collaboration with
Prof. Francesco M. Donini, Ph.D. Simona Colucci
Web&Media Group Meeting | 31 March, 2014
2. Outline
1 Finding Commonalities: A DLs use case
The I.M.P.A.K.T. system
The Core Competence module
2 Finding Commonalities: the Web of Data
3 Conclusion
3. The I.M.P.A.K.T. system
What is I.M.P.A.K.T.
Information Management and Processing with the Aid of Knowledge-based Technologies
An integrated system managing three enterprise business services based on knowledge management:
1 Skill Matching [1]
2 Team Composition [2]
3 Core Competence Extraction [3]
[1] E. Tinelli, S. Colucci, S. Giannini, E. Di Sciascio, and F.M. Donini, Large scale skill matching through knowledge compilation. In: Proc. of ISMIS 2012, Springer-Verlag (2012) 192–201.
[2] E. Tinelli, S. Colucci, E. Di Sciascio, and F.M. Donini, Knowledge compilation for automated team composition exploiting standard SQL. In: Proc. of SAC 2012, ACM (2012) 1680–1685.
[3] S. Colucci, E. Tinelli, S. Giannini, E. Di Sciascio, and F.M. Donini, Knowledge Compilation for Core Competence Extraction in Organizations. In: Proc. of Business Information Systems 2013, Springer (2013) 163–174.
5. The I.M.P.A.K.T. system
What is I.M.P.A.K.T.
Skill Matching GUI (screenshot)
6. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
An ontology for the HR domain (nearly 5000 concepts)
T-Box modules: Employee Profile (M0), Industry (M1), Complementary Skill (M2), Level (M3), Language (M5), Job Title (M6), Knowledge (M4)
Main module M0: it models the properties (entry points) needed to import all the sections describing an employee CV.
7. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
An ontology for the HR domain (nearly 5000 concepts)
T-Box modules: Employee Profile (M0), Industry (M1), Complementary Skill (M2), Level (M3), Language (M5), Job Title (M6), Knowledge (M4)
Possible employee skills and the ability to use technical tools, specified through:
type - experience role (e.g., developer, administrator)
year - experience level
lastdate - last temporal update of the work experience
8. The I.M.P.A.K.T. system
Behind I.M.P.A.K.T.
A Curriculum Vitae representation
A-Box
A profile P = ⊓j (∃R0j.C) is a concept in ALE(D), where R0j, 1 ≤ j ≤ 6, is an entry point, and C is a concept in FL0(D) modeled in Mj.
9. The Core Competence module
What is a Core Competence
Core Competence: a Knowledge Management process
Core competencies are a company's collective knowledge about how to coordinate diverse production skills and integrate multiple streams of technologies. Identifying core competencies helps support competitive advantage, articulate a strategic intent, and allocate resources to build cross-unit technological and production links.
(G. Hamel and C.K. Prahalad, The core competence of the corporation, Harvard Business Review, May-June (1990) 79–90)
Examples:
Apple - design
Netflix - content delivery
Google - expertise in algorithms
...
10. The Core Competence module
The reasoning service
Objective: automatically extract the Core Competence, by identifying a common know-how in a significant portion of the personnel (k employees, with k set as a threshold value by the people in charge of the strategic analysis).
Tool:
Logic-based approach
Non-standard inference services (LCS, k-CS, BICS)
Method:
Knowledge-compilation process
It solves subsumption only via SQL queries against a proper R-DB schema, without any exponential-time inference engine
11. The Core Competence module
A logic-based approach
Least Common Subsumer (LCS): let C1, ..., Cn be a collection of n concepts in a DL L. The Least Common Subsumer (LCS) of C1, ..., Cn is a concept D in L such that D is the most specific concept subsuming all the elements of the collection.
k-Common Subsumer (k-CS): let C1, ..., Cn be a collection of n concepts in a DL L and let k < n. A k-Common Subsumer (k-CS) of C1, ..., Cn is a concept D in L such that D is an LCS of k concepts among C1, ..., Cn.
Informative k-Common Subsumer (IkCS): given k < n, an Informative k-Common Subsumer (IkCS) of the concepts C1, ..., Cn in a DL L is a concept D such that D is a k-CS strictly subsumed by LCS(C1, ..., Cn), thus adding informative content to it.
Best Informative Common Subsumer (BICS): given k < n, a Best Informative Common Subsumer (BICS) of the concepts C1, ..., Cn in a DL L is a concept B such that B is an IkCS for C1, ..., Cn and, for every j with k < j ≤ n, every j-CS is not informative.
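In the simplified setting used later in the deck, where each profile is reduced to the set of atomic concept components subsuming it, these definitions become set operations: the LCS collects the components shared by all profiles, and a k-CS the components shared by some k of them. A minimal sketch of that correspondence (the profile data below is hypothetical):

from itertools import combinations

profiles = {  # hypothetical profiles -> atomic subsumer components
    "P1": {"D1", "D2", "D3"},
    "P2": {"D1", "D2"},
    "P3": {"D1", "D6"},
}

def lcs(profiles):
    # Components shared by every profile: the LCS in this setting.
    return set.intersection(*profiles.values())

def k_common_subsumers(profiles, k):
    # For every k-subset of profiles, the components they all share.
    return {group: set.intersection(*(profiles[p] for p in group))
            for group in combinations(sorted(profiles), k)}

def informative_k_cs(profiles, k):
    # k-CSs strictly more specific than the LCS: the IkCSs.
    top = lcs(profiles)
    return {g: cs for g, cs in k_common_subsumers(profiles, k).items()
            if cs > top}  # strict superset = strictly more informative

print(lcs(profiles))                 # {'D1'}
print(informative_k_cs(profiles, 2)) # {('P1', 'P2'): {'D1', 'D2'}}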
12. The Core Competence module
The Knowledge Compilation process
Issues:
Computational difficulties of deduction in knowledge bases expressed through a logical formalism;
Combining the representation power of a logical language with the scalability and efficiency of information processing in a DBMS.
Knowledge Compilation:
1 OFF-LINE REASONING: pre-processing of a company's intellectual capital, described in a Description Logics (DLs) Knowledge Base (KB), into an appropriate relational database schema.
2 ON-LINE REASONING: querying of the data structure resulting from the first phase through standard SQL queries for efficient Core Competence Extraction.
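A toy rendering of this off-line/on-line split, under a drastically simplified schema (a single table of atomic subsumers per profile, rather than the CONCEPT/Rj/PROFILE design described in the next slides; all rows are hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OFF-LINE: compile the KB once -- store, for every profile, every atomic
# concept component that subsumes it (in I.M.P.A.K.T. this is derived
# from the DL KB by a reasoner; here it is given directly).
cur.execute("CREATE TABLE subsumer (profile_id TEXT, component TEXT)")
cur.executemany("INSERT INTO subsumer VALUES (?, ?)", [
    ("P1", "D1"), ("P1", "D2"), ("P1", "D3"),
    ("P2", "D1"), ("P2", "D2"),
    ("P3", "D1"), ("P3", "D6"),
])

# ON-LINE: commonality extraction reduces to plain SQL -- here, the
# components shared by at least k profiles (candidate Core Competences).
k = 2
cur.execute("""
    SELECT component, COUNT(DISTINCT profile_id) AS n
    FROM subsumer
    GROUP BY component
    HAVING n >= ?
""", (k,))
print(cur.fetchall())  # e.g. [('D1', 3), ('D2', 2)]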
13. The Core Competence module
CV translation (figure)
14. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
T-Box informative content
Table CONCEPT: it stores the CCNF of all the FL0(D) concepts (part (a))
15. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
T-Box informative content
A table is created for each entry point R0j, j > 0 (part (b))
16. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
A-Box informative content
Each atom of the CCNF(C) of a conjunct ∃R0j.C is stored in a different tuple of table Rj with the same groupID (part (b))
17. The Core Competence module
OFF-LINE REASONING: Relational schema design rules
A-Box informative content
Table PROFILE includes the profileID and extra-ontological structured information (e.g., personal data, work-related information) (part (b))
18. The Core Competence module
ON-LINE REASONING: The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
Example
Mario Rossi: Cplusplus (5 years), Java (5 years), Visual Basic (5 years)
Daniela Bianchi: Cplusplus (2 years), Java (6 years), Visual Basic (1 year)
Elena Pomarico: CplusPlus, Java, Visual Basic
Carmelo Piccolo: VBScript, Process Performance Monitoring
Lucio Battista: DBMS (2 years)
Mariangela Porro: DBMS (2 years), Internet Technologies (2 years)
Nicola Marco: DBMS (5 years), Internet Technologies (5 years)
Domenico De Palo: OOprogramming (6 years), Artificial Intelligence (4 years), Internet Technologies (4 years)
19. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
      D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 ...
P1     1  1  1  1  1  0  1  0  1   1   1 ...
P2     1  1  1  1  1  0  1  0  1   1   1 ...
P3     1  1  0  0  0  1  0  0  0   0   0 ...
P4     1  1  0  0  0  1  0  1  0   0   0 ...
P5     1  1  0  0  1  1  0  1  0   0   0 ...
P6     1  0  1  0  0  0  0  0  0   0   0 ...
P7     1  0  1  1  0  0  0  0  1   1   1 ...
P8     1  1  1  1  1  0  1  1  0   0   0 ...
Table: portion of the Profile Subsumers Matrix for the previous example
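The Profile Subsumers Matrix is simply a boolean matrix with one row per profile and one column per concept component, so shared know-how can be read off its column sums: any column whose sum reaches the threshold k identifies a component shared by at least k employees. A minimal sketch over the rows shown above:

import numpy as np

# Rows: profiles P1..P8; columns: D1..D11 (values from the table above).
psm = np.array([
    [1,1,1,1,1,0,1,0,1,1,1],
    [1,1,1,1,1,0,1,0,1,1,1],
    [1,1,0,0,0,1,0,0,0,0,0],
    [1,1,0,0,0,1,0,1,0,0,0],
    [1,1,0,0,1,1,0,1,0,0,0],
    [1,0,1,0,0,0,0,0,0,0,0],
    [1,0,1,1,0,0,0,0,1,1,1],
    [1,1,1,1,1,0,1,1,0,0,0],
], dtype=bool)

k = 3
support = psm.sum(axis=0)  # number of employees sharing each component
shared = [f"D{j + 1}" for j in np.flatnonzero(support >= k)]
print(shared)              # components shared by at least k employees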
20. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation
Idea: extract the common know-how, expressed in the form of atomic information, shared by the same group of employees, with cardinality greater than or equal to k.
D1: ∃hasKnowledge.ComputerScienceSkill
D2: ∃hasKnowledge.(ComputerScienceSkill ≥2 years)
D3: ∃hasKnowledge.ProgrammingLanguage
D4: ∃hasKnowledge.OOP
D5: ∃hasKnowledge.(ComputerScienceSkill ≥5 years)
D6: ∃hasKnowledge.(DBMS ≥2 years)
D7: ∃hasKnowledge.(OOP ≥5 years)
D8: ∃hasKnowledge.(InternetTechnologies ≥2 years)
D9: ∃hasKnowledge.C++
D10: ∃hasKnowledge.VisualBasic
D11: ∃hasKnowledge.Java
...
Table: description of the components D1, ..., D11 reported in the previous table
21. The Core Competence module
The Core Competence Extraction Algorithm
1 Profiles Subsumers Matrix computation (figure)
22. Finding Commonalities: A DLs use case Finding Commonalities: the Web of Data Conclusion
The Core Competence module
The Core Competence Extraction Algorithm
2 Common Subsumers enumeration
Referring to the PSM of the set P = {P(a1), . . . , P(an)}, and to a concept
component Dk ∈ {D1, . . . , Dm} deriving from P, a Core Competence is the
union of the most specic features (i.e., prole concept components Dj) shared
by the same group of k employees, where k is a predened threshold.
From the example PSM above, the enumeration step extracts, among others:
LCS = ∃hasKnowledge.ComputerScienceSkill (shared by all eight profiles)
BICS = ∃hasKnowledge.(ComputerScienceSkill ⊓ (≥ 5 years)) (shared by P1, P2, P5, P8)
ICS3 = ∃hasKnowledge.(DBMS ⊓ (≥ 2 years)) (shared by P3, P4, P5)
ICS3 = ∃hasKnowledge.(OOP ⊓ (≥ 5 years)) (shared by P1, P2, P8)
ICS3 = ∃hasKnowledge.(InternetTechnologies ⊓ (≥ 2 years)) (shared by P4, P5, P8)
ICS3 = ∃hasKnowledge.C++ ⊓ ∃hasKnowledge.VisualBasic ⊓ ∃hasKnowledge.Java (shared by P1, P2, P7)
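To make the enumeration idea concrete, here is a minimal sketch (hypothetical, not the actual algorithm): reading each PSM column as the set of profiles exhibiting that component, candidate Core Competences for a threshold k are the component sets shared by the same group of at least k profiles, much like closed itemsets in frequent-pattern mining:

# Sketch only: toy PSM columns, component -> profiles exhibiting it.
psm_cols = {
    "D6": {"P3", "P4", "P5"},
    "D8": {"P4", "P5", "P8"},
    "D9": {"P1", "P2", "P7"},
    "D10": {"P1", "P2", "P7"},
    "D11": {"P1", "P2", "P7"},
}

def shared_components(k):
    """Group components by the exact set of profiles sharing them,
    keeping only groups of at least k profiles."""
    groups = {}
    for comp, profs in psm_cols.items():
        if len(profs) >= k:
            groups.setdefault(frozenset(profs), set()).add(comp)
    return groups

# Each entry is a candidate ICS_k: the conjunction of all components
# shared by that group of >= k employees.
for group, comps in shared_components(3).items():
    print(sorted(group), "->", " ⊓ ".join(sorted(comps)))
# ['P1', 'P2', 'P7'] -> D10 ⊓ D11 ⊓ D9  (i.e., C++, Visual Basic and Java)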
The Core Competence module
Core Competence module GUI
[Screenshots of the Core Competence module GUI]
Lessons learned
Proposal: a Knowledge Compilation approach for Core Competence extraction.
+ It improves performance, in terms of execution time, w.r.t. the classical
logic-based approach.
+ It adopts standard SQL queries to compute the same informative content
as advanced inference services (see the sketch below).
+ It makes the computational cost of the process affordable also for large
organizations, while retaining the full expressiveness of the logic-based
approaches.
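As a flavor of such queries (a sketch over a hypothetical compiled schema, not the actual I.M.P.A.K.T. one), counting how many employees share a stored concept component reduces to a plain GROUP BY:

import sqlite3

# Hypothetical compiled schema: one row per (profile, subsuming component).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE psm (profile_id TEXT, component_id TEXT)")
conn.executemany("INSERT INTO psm VALUES (?, ?)",
                 [("P3", "D6"), ("P4", "D6"), ("P5", "D6"), ("P4", "D8")])

# Components shared by at least k employees, via standard SQL only.
k = 3
rows = conn.execute("""
    SELECT component_id, COUNT(*) AS n
    FROM psm
    GROUP BY component_id
    HAVING n >= ?
""", (k,)).fetchall()
print(rows)  # [('D6', 3)]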
Notes on performance:
The number of profiles strongly affects the common subsumers
enumeration process.
The most computationally expensive step is the creation of the profile
subsumers matrix, under a threshold on the number of profile concept components.
Outline
1 Finding Commonalities: A DLs use case
2 Finding Commonalities: the Web of Data
Common Subsumer in RDF
RDF Clustering
3 Conclusion
Common Subsumer in RDF
Motivation
Learning from the Web of Data:
huge amount of interconnected and machine-understandable data
data modeled as RDF resources
datasets referred to as Linked (Open) Data (LOD).
Facts to learn:
identification of subsets of resources related to a common informative
content
- Cluster search (approximate matching)
- Disambiguation
- Personalization
Problem Definition
In analogy with the LCS service, proposed in DLs for learning from examples.
Adaptation to the Web of Data:
giving up the subsumption-minimality requirement: even rough
Common Subsumers are useful for learning in the Web of Data
definition of a Common Subsumer of pairs of RDF resources
Definition (Rooted Graph (r-graph))
Let TWr be the set of all triples with subject r in the Web. A Rooted Graph
(r-graph) is a pair ⟨r, Tr⟩, where
1 r is either the URI of an RDF resource, or a blank node
2 Tr = {t | t = (r p c)} is a subset of relevant triples in TWr
Example: A Possible Representation for resources a and b [figure]
Example: A(nother) Possible Representation for resources a and b [figure]
Common Subsumer
Definition (Common Subsumer)
Let ⟨a, Ta⟩, ⟨b, Tb⟩ be two r-graphs and x, w, y be blank nodes.
If ⟨a, Ta⟩ = ⟨b, Tb⟩, then ⟨a, Ta⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩.
If Ta = ∅ or Tb = ∅, the pair ⟨x, ∅⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩.
Otherwise, a pair ⟨x, T⟩ is a Common Subsumer of ⟨a, Ta⟩, ⟨b, Tb⟩ iff:

  ∃t = (x w y) such that (T entails t)
  ⇒ ∃t1 = (a p c), t2 = (b q d) such that (T entails t1) ∧ (T entails t2)   (1)

where Ta ⊆ T, Tb ⊆ T, ⟨w, T⟩ is a Common Subsumer of ⟨p, Tp⟩ and ⟨q, Tq⟩,
and ⟨y, T⟩ is a Common Subsumer of ⟨c, Tc⟩ and ⟨d, Td⟩.
Note: we consider only simple entailment.
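To fix ideas, a highly simplified sketch of how a Common Subsumer could be computed recursively over r-graphs (hypothetical data structures and a crude depth bound in place of the actual triple-selection and entailment machinery):

import itertools

# Hypothetical r-graphs: resource -> set of (predicate, object) pairs.
graphs = {
    "a": {("ocd:rif_leg", "leg:10"), ("foaf:gender", "female")},
    "b": {("ocd:rif_leg", "leg:10"), ("foaf:gender", "female")},
    "ocd:rif_leg": set(), "foaf:gender": set(),
    "leg:10": set(), "female": set(),
}

fresh = itertools.count()

def common_subsumer(a, b, depth=2):
    """Return (root, triples) generalizing the r-graphs of a and b.
    Blank nodes are plain strings starting with '_:'."""
    if a == b:                      # identical resources: keep as-is
        return a, {(a, p, o) for p, o in graphs.get(a, set())}
    x = f"_:cs{next(fresh)}"
    if depth == 0:
        return x, set()             # the pair (x, empty set) is always a CS
    triples = set()
    for (p, c) in graphs.get(a, set()):
        for (q, d) in graphs.get(b, set()):
            w, tw = common_subsumer(p, q, depth - 1)
            y, ty = common_subsumer(c, d, depth - 1)
            # discard triples with blank nodes in both the predicate
            # and object positions (they carry no information)
            if not (w.startswith("_:") and y.startswith("_:")):
                triples |= {(x, w, y)} | tw | ty
    return x, triples

root, t = common_subsumer("a", "b")
for triple in sorted(t):
    print(triple)    # e.g. ('_:cs0', 'foaf:gender', 'female')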
Example: a Common Subsumer of a and b [figures: step-by-step construction]
Note: triples with a blank node in both the predicate and object positions are discarded.
Example: a(nother) Common Subsumer of a and b [figures: an alternative construction]
Solving Algorithm
Main Features:
anytime: if interrupted, it always returns a Common Subsumer of the
input pair of RDF resources
modular: it takes as input a function computing the sets of triples relevant
for the input RDF resources
Our current criteria for triple selection (see the sketch below):
triples within a given graph distance from the input resource
triples having properties within a selected set of significant properties
for the dataset/application of interest
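One possible realization of such a selection function (a sketch on our part, using rdflib and a hypothetical whitelist of significant properties):

from rdflib import Graph, URIRef

# Hypothetical whitelist of significant properties for the application.
SIGNIFICANT = {URIRef("http://xmlns.com/foaf/0.1/gender"),
               URIRef("http://dati.camera.it/ocd/rif_mandatoCamera")}

def relevant_triples(g: Graph, root, max_depth=2):
    """Select the triples reachable from `root` within `max_depth` hops,
    keeping only those whose property is in the whitelist."""
    selected, frontier = set(), {root}
    for _ in range(max_depth):
        next_frontier = set()
        for s in frontier:
            for p, o in g.predicate_objects(subject=s):
                if p in SIGNIFICANT:
                    selected.add((s, p, o))
                    next_frontier.add(o)
        frontier = next_frontier
    return selected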
Output: a Common Subsumer of two r-graphs ⟨a, Ta⟩ and ⟨b, Tb⟩, i.e.,
a pair made up of a resource (anonymous or not) and a set of triples
stating facts about that resource which are true for both a and b.
Alternative cases:
⟨_:cs, T⟩: a blank node _:cs together with a set of triples T related to _:cs
⟨a, Ta⟩, iff ⟨a, Ta⟩ = ⟨b, Tb⟩
⟨_:cs, ∅⟩, if either Ta = ∅ or Tb = ∅
RDF Clustering
Target Semantic Web Task
Clustering of Web resources with a CS:
retrieving resources conveying the same information
in their different RDF descriptions
CS description → SPARQL query (as sketched below):
WHERE { Tcs [blank nodes → variables] }
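A minimal sketch of this rewriting (assuming the CS triples are plain string tuples, as in the earlier sketches):

def cs_to_sparql(triples):
    """Rewrite a Common Subsumer's triple set into a SPARQL SELECT query:
    every blank node _:x becomes a variable ?x."""
    def term(t):
        if t.startswith("_:"):
            return "?" + t[2:]      # blank node -> variable
        if t.startswith("http"):
            return f"<{t}>"         # URI
        return f'"{t}"'             # plain literal
    patterns = " .\n  ".join(f"{term(s)} {term(p)} {term(o)}"
                             for s, p, o in sorted(triples))
    return f"SELECT DISTINCT ?cs0 WHERE {{\n  {patterns} .\n}}"

cs = {("_:cs0", "http://xmlns.com/foaf/0.1/gender", "female")}
print(cs_to_sparql(cs))
# SELECT DISTINCT ?cs0 WHERE {
#   ?cs0 <http://xmlns.com/foaf/0.1/gender> "female" .
# }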
Clustering with a CS: A use case
The Italian Chamber of Deputies LOD
Public SPARQL endpoint (http://dati.camera.it/sparql)
Running example: Find the commonalities between deputies Nilde Iotti
and Tina Anselmi in the 10th Legislature
A CS of the two deputies' r-graphs, rewritten as a SPARQL query (excerpt):

SELECT DISTINCT ?x0
WHERE {
  ?x0 a <http://dati.camera.it/ocd/deputato> .
  ?x0 <http://xmlns.com/foaf/0.1/gender> "female" .
  ?x0 <http://dati.camera.it/ocd/rif_mandatoCamera> ?x1 .
  . . .
}
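For completeness, a sketch of running such a query against the public endpoint (assuming the SPARQLWrapper package; the query is abbreviated here):

from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://dati.camera.it/sparql")
endpoint.setQuery("""
SELECT DISTINCT ?x0 WHERE {
  ?x0 a <http://dati.camera.it/ocd/deputato> .
  ?x0 <http://xmlns.com/foaf/0.1/gender> "female" .
}
""")
endpoint.setReturnFormat(JSON)

# Each binding is a resource matching the Common Subsumer,
# i.e., a member of the induced cluster.
results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["x0"]["value"])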
1st Legislature clusters [figure]
Outline
1 Finding Commonalities: A DLs use case
2 Finding Commonalities: the Web of Data
3 Conclusion
Conclusion
Motivation: learning shared informative content in collections of RDF
resources
Problem Definition: searching for Common Subsumers that are not
subsumption-minimal, in order to ensure computability in the Web of Data,
which is too large to be fully explored
Results:
An anytime algorithm computing Common Subsumers of pairs of RDF
resources:
allowing the partially learned informative content to be used for further
processing whenever the search for Common Subsumers is interrupted
possibly supporting the clustering of collections of RDF resources, by
exploiting the associativity of Common Subsumers.
Future work:
Extension of the CS definition to other entailment regimes
Investigation of methods for the selection of relevant triples
Automated link-traversal techniques for broader dataset exploration
Application to data-quality problems (e.g., missing values)