by Irene Celino, Simone Contessa, Marta Corubolo, Daniele Dell’Aglio, Emanuele Della Valle, Stefano Fumeo and Thorsten Krüger
CEFRIEL – Politecnico di Milano – SIEMENS
The document discusses the REST (Representational State Transfer) architectural style, describing it as a set of constraints that were derived through an incremental process to optimize for distributed hypermedia systems. The REST style aims to balance trade-offs through its constraints of being client-server, stateless, cached, having a uniform interface, and following a layered system approach. Implementations of REST may be imperfect due to real-world constraints, though the abstract idea of REST remains idealized.
Connor Martin used ReaFir to control the volume of different parts of a sound, adding a more dramatic effect. He also used Shifter to modify the sound to make it more futuristic and different than normal sounds. Overall, Connor experimented with audio editing tools to creatively alter and enhance sounds.
Four Steps to Creating an Effective Open Source Policy (iasaglobal)
A policy is a set of rules and guidelines for using and managing OSS in your organization. To be effective, it must cover all the essential aspects of managing OSS, yet it must be succinct and easily understood; otherwise nobody will read it, much less follow it.
The tutorial will be presented on May 27 2012 at the 9th Extended Semantic Web Conference (ESWC 2012).
Short description of the tutorial:
The tutorial describes the traditional optimize-then-execute paradigm implemented in existing RDF engines and its main drawbacks when a large volume of data needs to be accessed remotely. As a solution to overcome the limitations of current query processing approaches, we will present existing adaptive query processing techniques defined in the context of database management systems, and their applicability to the Semantic Web. We will also describe current solutions that have been proposed in the context of the Semantic Web to access remote data. The target audience includes researchers and practitioners who develop or use query engines to consume Linked and Big Data through SPARQL endpoints. Participants will learn the limitations of existing RDF query engines and how current techniques can be extended to access remote data from Linked Data sets and to hide delays caused by unpredictable data transfers and dataset availability. A hands-on session will allow attendees to evaluate the performance and robustness of existing approaches.
by G. Larkou, J. Metochi, G. Chatzimilioudis and D. Zeinalipour-Yazti
Presented at: 1st IEEE International Workshop on Mobile Data Management Mining and Computing on Social Networks, collocated with IEEE MDM'13
Linking Smart Cities Datasets with Human Computation - the case of UrbanMatch (Irene Celino)
This document summarizes a study that links datasets from smart cities using human computation through a game called UrbanMatch. UrbanMatch links points of interest (POIs) from OpenStreetMap to representative photos from Wikimedia Commons and social media by having online players group photos with POIs. The game achieves high accuracy (>99%) and throughput (485 links/hour) in linking photos to POIs, showing the effectiveness of citizen computation games. However, engagement needs improvement as games averaged only 3 minutes. Overall, the study shows how human capabilities can be leveraged through games to link geospatial datasets.
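The essential mechanism behind the reported accuracy is the aggregation of many lightweight player judgments into trusted links. The sketch below illustrates one plausible majority-vote aggregation together with the throughput computation; the data model, identifiers and thresholds are invented for illustration and do not come from the UrbanMatch implementation.

    from collections import Counter, defaultdict

    # Hypothetical player judgments: (poi_id, photo_id, selected), where
    # `selected` is True if the player grouped the photo with the POI.
    judgments = [
        ("poi:duomo", "photo:123", True),
        ("poi:duomo", "photo:123", True),
        ("poi:duomo", "photo:123", False),
        ("poi:duomo", "photo:456", False),
        ("poi:duomo", "photo:456", False),
    ]

    def aggregate_links(judgments, min_votes=3, min_agreement=0.66):
        """Accept a (POI, photo) link only if enough players judged it and a
        clear majority of them selected it."""
        votes = defaultdict(Counter)
        for poi, photo, selected in judgments:
            votes[(poi, photo)][selected] += 1
        accepted = []
        for pair, counter in votes.items():
            total = sum(counter.values())
            if total >= min_votes and counter[True] / total >= min_agreement:
                accepted.append(pair)
        return accepted

    links = aggregate_links(judgments)
    # Throughput, as reported in the paper, is simply accepted links per hour of play.
    play_time_hours = 5 / 60  # e.g. five minutes of total play time
    print(links, len(links) / play_time_hours, "links/hour")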
This document discusses image matching techniques for object recognition and alignment. It begins with an overview of applications such as image retrieval, industrial inspection, target search, and automatic pick and place. It then covers 2D and 3D image matching methods for object recognition, including template matching, similarity measures like SAD, SSD and NCC, and shape-based matching. Optimization techniques like coarse-to-fine matching using image pyramids and SIMD instructions are also presented to improve efficiency. The document provides examples of image matching and object tracking in video samples.
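As a quick reference for the similarity measures named above, the following sketch computes SAD, SSD and NCC between a template and an image patch; it is a generic NumPy illustration, not code from the summarized slides.

    import numpy as np

    def sad(template, patch):
        """Sum of Absolute Differences: lower means more similar."""
        return np.abs(template - patch).sum()

    def ssd(template, patch):
        """Sum of Squared Differences: lower means more similar."""
        return ((template - patch) ** 2).sum()

    def ncc(template, patch):
        """Normalized Cross-Correlation: close to 1.0 for patches that match
        up to brightness/contrast changes, which is why NCC is often preferred
        under varying illumination."""
        t = template - template.mean()
        p = patch - patch.mean()
        return (t * p).sum() / np.sqrt((t ** 2).sum() * (p ** 2).sum())

    rng = np.random.default_rng(0)
    template = rng.random((8, 8))
    patch = 0.5 * template + 0.2  # same pattern, different illumination
    print(sad(template, patch), ssd(template, patch), ncc(template, patch))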
1. The document discusses recommender systems for processing data streams in real time. It introduces different types of recommender systems including unpersonalized, collaborative filtering, and content-based filtering approaches.
2. It then discusses challenges specific to recommending news, such as the large volume of new articles published daily and changing relevance of content over time.
3. Finally, it addresses big data issues related to news recommendation and introduces a system that provides researchers access to large news datasets to test different recommendation approaches.
Recommender Systems with Implicit Feedback: Challenges, Techniques, and Applic... (NAVER Engineering)
Presenter: Yeon-Chang Lee (PhD student, Hanyang University)
Presentation date: February 2018
We investigate how to address the shortcomings of the popular One-Class Collaborative Filtering (OCCF) methods in handling challenging “sparse” dataset in one-class setting (e.g., clicked or bookmarked), and propose a novel graph-theoretic OCCF approach, named as gOCCF, by exploiting both positive preferences (derived from rated items) as well as negative preferences (derived from unrated items). In capturing both positive and negative preferences as a bipartite graph, further, we apply the graph shattering theory to determine the right amount of negative preferences to use. Then, we develop a suite of novel graph-based OCCF methods based on the random walk with restart and belief propagation methods. Through extensive experiments using 3 real-life datasets, we show that our gOCCF effectively addresses the sparsity challenge and significantly outperforms all of 8 competing methods in accuracy on very sparse datasets while providing comparable accuracy to the best performing OCCF methods on less sparse datasets.
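gOCCF itself is not reproduced here, but the random walk with restart it builds on can be illustrated on a toy bipartite user-item graph; the data and parameters below are made up purely for illustration.

    import numpy as np

    # Toy user-item interaction matrix (1 = positive preference), standing in
    # for the bipartite preference graph described in the abstract.
    R = np.array([[1, 1, 0, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 1]], dtype=float)

    def rwr_scores(R, user, restart=0.15, iters=100):
        """Random walk with restart on the bipartite user-item graph; the
        stationary visiting probabilities of item nodes act as scores."""
        n_users, n_items = R.shape
        # Adjacency matrix over (users + items) nodes, column-normalized so
        # that each column gives the transition probabilities out of a node.
        A = np.zeros((n_users + n_items, n_users + n_items))
        A[:n_users, n_users:] = R
        A[n_users:, :n_users] = R.T
        col_sums = A.sum(axis=0)
        col_sums[col_sums == 0] = 1.0
        P = A / col_sums
        e = np.zeros(n_users + n_items)
        e[user] = 1.0  # restart vector concentrated on the target user
        r = e.copy()
        for _ in range(iters):
            r = (1 - restart) * P @ r + restart * e
        return r[n_users:]  # scores for the item nodes

    print(rwr_scores(R, user=0))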
EXPERIMENTAL STUDY OF MINUTIAE BASED ALGORITHM FOR FINGERPRINT MATCHING (cscpconf)
In this paper, a minutiae-based algorithm for fingerprint pattern recognition and matching is proposed. The algorithm uses the distance between the minutiae and core points to determine
the pattern matching scores for fingerprint images. Experiments were conducted using FVC2002 fingerprint database comprising four datasets of images of different sources and qualities. False Match Rate (FMR), False Non-Match Rate (FNMR) and the Average Matching Time (AMT) were the statistics generated for testing and measuring the performance of the proposed algorithm. The comparative analysis of the proposed algorithm and some existing minutiae based algorithms was carried out as well. The findings from the experimental study were presented, interpreted and some conclusions were drawn.
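FMR and FNMR are standard biometric error rates; the short sketch below shows how they can be computed from genuine and impostor comparison scores at a chosen decision threshold. The scores are invented, and the sketch assumes that higher scores indicate better matches.

    import numpy as np

    def fmr_fnmr(genuine_scores, impostor_scores, threshold):
        """False Match Rate: fraction of impostor comparisons accepted;
        False Non-Match Rate: fraction of genuine comparisons rejected."""
        genuine = np.asarray(genuine_scores)
        impostor = np.asarray(impostor_scores)
        fmr = float((impostor >= threshold).mean())
        fnmr = float((genuine < threshold).mean())
        return fmr, fnmr

    # Made-up matching scores, only to show the computation.
    genuine = [0.91, 0.85, 0.78, 0.60, 0.95]
    impostor = [0.20, 0.35, 0.55, 0.10, 0.42]
    print(fmr_fnmr(genuine, impostor, threshold=0.5))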
An analytical framework for formulating metrics for evaluating multi-dimensio... (Rei Takami)
Presented at the 25th International Conference on Intelligent User Interfaces (canceled due to COVID-19).
https://dl.acm.org/doi/abs/10.1145/3377325.3377529
Abstract: This paper proposes a visual analytics framework for formulating metrics for evaluating multi-dimensional time-series data. Multidimensional time-series data has been collected and utilized in different domains. We believe evaluation metrics play an important role in utilizing those data, such as decision making and labeling training data used in machine learning. However, it is a difficult task for even domain experts to formulate metrics. To support the process of formulating metrics, the proposed framework represents metrics as a linear combination of data attributes, and provides a means for formulating it through interactive data exploration. A prototype interface that visualizes target data as an animated scatter plot was implemented. Through this interface, several visualized objects can be directly manipulated: a node and a trajectory of an instance, and a convex hull as the group of nodes and trajectories. Linear combinations of attributes are adjusted in accordance with the manipulation of different objects' types by the user. The effectiveness of the proposed framework was demonstrated through two application examples with real-world data.
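To make the central representational choice concrete, the toy sketch below evaluates a metric defined as a linear combination of data attributes over a short multi-dimensional time series; the attributes and weights are made up, and in the proposed framework the weights would be adjusted interactively rather than hard-coded.

    import numpy as np

    # Toy multi-dimensional time series: rows are time steps, columns are
    # data attributes of one instance (e.g. speed, error rate, distance).
    attributes = np.array([[1.0, 0.2, 3.0],
                           [1.5, 0.1, 2.5],
                           [2.0, 0.4, 2.0]])

    # The evaluation metric is a linear combination of the attributes.
    weights = np.array([0.5, -1.0, 0.25])

    metric_over_time = attributes @ weights
    print(metric_over_time)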
"Claudia Soares is an Invited Assistant Professor at Instituto Superior Tecnico in Portugal.
She will be presenting ""How can we use Data Science to measure people's perception of safety"". Feeling safe is a human basic need, and urban environments - where most people live today - can induce a wide range of perceptions. In order to quantify perceived safety and help decision makers to act with precision in creating a better environment for people, we developed an accurate measure of human perceived safety."
This document describes a Contextualized Knowledge Repository (CKR) framework that allows for representing and reasoning with contextual knowledge on the Semantic Web. The CKR extends the description logic SROIQ-RL to include defeasible axioms in the global context. Defeasible axioms can be overridden by local contexts, allowing exceptions. The CKR is composed of two layers - a global context containing metadata and defeasible axioms, and local contexts containing object knowledge with references. An interpretation of a CKR maps local contexts to description logic interpretations over the object vocabulary, respecting references between contexts.
The document describes a Contextualized Knowledge Repository (CKR) framework for representing and reasoning with contextual knowledge on the Semantic Web. It discusses the need to make context explicit in the Semantic Web in order to represent knowledge that holds in specific contextual spaces like time, location, or topic. The CKR is presented as a formalism based on description logics that defines contexts as first-class objects and allows associating knowledge with contexts. It describes a prototype CKR implementation and outlines how a CKR could be used to represent open data about the Trentino region with contextual metadata.
This document discusses leveraging crowdsourcing techniques and consistency constraints to optimize the reconciliation of schema matching networks. It proposes:
1) Defining consistency constraints within schema matching networks and designing validation questions for crowdsourced workers.
2) Using consistency constraints to reduce reconciliation error rates and the monetary cost of asking additional validation questions.
3) Modeling a crowdsourcing process for schema matching networks that aims to minimize cost while maximizing accuracy through the application of consistency constraints.
This document discusses privacy-preserving schema reuse. It introduces the challenges of defining privacy constraints, generating an anonymized schema from multiple schemas while satisfying privacy constraints, defining a utility function for anonymized schemas, and solving the optimization problem of finding the anonymized schema with the highest utility that satisfies all privacy constraints. Experimental results demonstrate the trade-off between privacy enforcement and utility loss. The solution presents an approach for generating anonymized schemas from multiple schemas in a privacy-preserving manner.
Authors: Nguyen Quoc Viet Hung (1), Nguyen Thanh Tam (1), Zoltán Miklós (2), Karl Aberer (1),
Avigdor Gal (3), and Matthias Weidlich (4)
1 École Polytechnique Fédérale de Lausanne
2 Université de Rennes 1
3 Technion – Israel Institute of Technology
4 Imperial College London
This document summarizes a demo of using SPARQLstream and Morphstreams to visualize transport data from Madrid's public transport company (EMT) in a tablet application. Static EMT data like bus stop locations are extracted and mapped to RDF, while live bus waiting time data streams are transformed and queried in real-time. This allows a Map4RDF iOS app to retrieve bus stop information and lookup estimated arrival times using SPARQL and SPARQLstream queries. The demo illustrates how standards like SSN and R2RML can integrate static and streaming sensor data for web-based applications.
The document discusses the need for a W3C community group on RDF stream processing. It notes there is currently heterogeneity in RDF stream models, query languages, implementations, and operational semantics. The speaker proposes creating a W3C community group to better understand these differences, requirements, and potentially develop recommendations. The group's mission would be to define common models for producing, transmitting, and continuously querying RDF streams. The presentation provides examples of use cases and outlines a template for describing them to collect more cases to understand requirements.
This document describes SciQL, a language that bridges the gap between science and relational database management systems (DBMS). SciQL allows for the seamless integration of relational and array paradigms within DBMSs. It defines arrays and tables as first-class citizens and supports named dimensions, flexible structure-based grouping, and the distinction between arrays and tables. SciQL aims to lower the barrier for scientists to use DBMSs for array-based data while revealing new optimization opportunities for databases.
This document summarizes research on implementing defeasible logic, a non-monotonic reasoning method, in a distributed manner using the MapReduce framework. Defeasible logic allows commonsense reasoning over low-quality data and has low computational complexity. However, existing implementations did not scale to huge datasets. The researchers developed a multi-argument MapReduce implementation of defeasible logic that distributes the reasoning process. Experimental evaluation on large datasets showed this approach provides scalable defeasible reasoning over distributed data. Future work will address challenges with non-stratified rulesets and test the approach on additional real-world applications and knowledge representation methods.
This document discusses data and knowledge evolution on the semantic web. It begins by explaining the limitations of the current web in representing semantic content and introduces the semantic web as a way to give data well-defined meaning. It then discusses how ontologies and datasets are used to describe semantic data and how datasets are dynamic and change over time. It also introduces linked open data as a way to interconnect datasets and the challenges this presents. Finally, it outlines the scope of the talk, which is to survey research areas related to managing dynamic linked datasets, including remote change management, repair, and data/knowledge evolution.
This document discusses evolving workflow provenance information in the presence of custom inference rules. It presents three inference rules for provenance data: actors associated with an activity are also associated with all of its subactivities, objects and their parts are used together, and information objects are present wherever the physical objects carrying them are. It examines handling updates to provenance knowledge bases under these rules either by deleting all inferred facts or by deleting only those affected, and considers the complexity of the different approaches.
This document discusses access control for RDF graphs using abstract models. It presents an abstract access control model defined using abstract tokens and operators to model the computation of access labels for inferred RDF triples. The model supports dynamic datasets and policies. Experiments show that annotation time increases with the number of implied triples, while evaluation time increases linearly with the total number of triples. The abstract model approach allows different concrete access control policies to be applied to the same dataset.
Here are a few ways SciQL could help with this seismology use case:
1. The mseed array allows storing and querying the large seismic data in an efficient columnar format.
2. Window-based aggregation with dimensional grouping enables filtering signals by STA/LTA ratios over time windows (a generic version of this trigger is sketched after this list).
3. Views and queries on dimensional groups facilitate removing false positives by comparing signals across nearby stations over time.
4. Further window-based grouping and UDFs can extract signal windows for additional heuristic analysis.
By integrating the array and relational models, SciQL provides a declarative way to analyze large multidimensional scientific datasets like seismic signals interactively.
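Point 2 above refers to the STA/LTA (short-term average over long-term average) ratio commonly used to flag candidate seismic events. Outside SciQL, the same computation can be sketched in a few lines of NumPy; this is a generic illustration with made-up parameters, not SciQL code.

    import numpy as np

    def sta_lta(signal, sta_len, lta_len):
        """Ratio of short-term to long-term average signal energy; the ratio
        spikes when recent energy rises well above the background level."""
        energy = signal ** 2
        sta = np.convolve(energy, np.ones(sta_len) / sta_len, mode="same")
        lta = np.convolve(energy, np.ones(lta_len) / lta_len, mode="same")
        lta[lta == 0] = np.finfo(float).eps
        return sta / lta

    rng = np.random.default_rng(1)
    trace = rng.normal(0.0, 0.1, 2000)            # background noise
    trace[1200:1260] += rng.normal(0.0, 1.0, 60)  # injected "event"
    ratio = sta_lta(trace, sta_len=20, lta_len=400)
    print(ratio.argmax(), ratio.max())            # the peak falls inside the event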
This talk was given by FORTH, Greece, at the European Data Forum (EDF) 2012, which took place on June 6-7, 2012 in Copenhagen (Denmark) at the Copenhagen Business School (CBS).
Abstract:
Given the increasing amount of sensitive RDF data available on the Web, it becomes increasingly critical to guarantee secure access to this content. Access control is complicated when RDFS inference rules and other dependencies between access permissions of triples need to be considered; this is necessary, e.g., when we want to associate the access permissions of an inferred triple with those of the triples that implied it. In this paper we advocate the use of abstract provenance models that are defined by means of abstract tokens and operators to support fine-grained access control for RDF graphs. The access label of a triple is a complex expression that encodes how said label was produced (i.e., the triples that contributed to its computation). This feature allows us to know exactly the effects of any possible change, thereby avoiding a complete recomputation of the labels when a change occurs. In addition, the same application can choose to enforce different access control policies, or different applications can enforce different policies on the same data, avoiding the recomputation of the label of a triple. Preliminary experiments have shown the applicability and benefits of our approach.
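As an illustration of labels that encode how they were produced, the following toy sketch keeps the label of an inferred triple as an expression over the labels of its premises, so that switching the concrete policy only re-evaluates the expression instead of redoing the inference. The tokens, the AND operator and the policies are invented simplifications, not the paper's actual model.

    # Explicit triples carry concrete-looking tokens; the inferred triple t3
    # carries an abstract expression recording which triples implied it.
    explicit_labels = {
        "t1": "public",
        "t2": "restricted",
    }
    abstract_labels = {
        "t3": ("AND", "t1", "t2"),  # t3 was inferred from t1 and t2
    }

    def evaluate(label, policy):
        """Map an (abstract) label to a concrete access decision under a policy."""
        if isinstance(label, str):             # reference to an explicit triple
            return policy[explicit_labels[label]]
        op, left, right = label
        if op == "AND":                        # accessible only if all premises are
            return evaluate(left, policy) and evaluate(right, policy)
        raise ValueError(op)

    open_policy = {"public": True, "restricted": True}
    strict_policy = {"public": True, "restricted": False}
    for name, policy in [("open", open_policy), ("strict", strict_policy)]:
        print(name, evaluate(abstract_labels["t3"], policy))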
This talk was given at the 13th International Conference on Principles of Knowledge Representation and Reasoning (KR 2012), held in Rome, Italy, June 10-14, 2012, by Ilias Tahmazidis (FORTH).
Abstract:
We are witnessing an explosion of available data from the Web, government authorities, scientific databases, sensors and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling the vast amounts of data for these applications. In this paper, we consider nonmonotonic reasoning, which has traditionally focused on rich knowledge structures. In particular, we consider defeasible logic, and analyze how parallelization, using the MapReduce framework, can be used to reason with defeasible rules over huge data sets. Our experimental results demonstrate that defeasible reasoning with billions of data is performant, and has the potential to scale to trillions of facts.
More Related Content
Similar to Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
The presentation was delivered during the 1st International Conference on Health Information Science (HIS 2012) on April 9th, 2012 in Beijing, China.
Abstract:
In cytomics bookkeeping of the data generated during lab experiments is crucial. The current approach in cytomics is to conduct High-Throughput Screening (HTS) experiments so that cells can be tested under many different experimental conditions. Given the large amount of different conditions and the readout of the conditions through images, it is clear that the HTS approach requires a proper data management system to reduce the time needed for experiments and the chance of man-made errors. As different types of data exist, the experimental conditions need to be linked to the images produced by the HTS experiments with their metadata and the results of further analysis. Moreover, HTS experiments never stand by themselves, as more experiments are lined up, the amount of data and computations needed to analyze these increases rapidly. To that end cytomic experiments call for automated and systematic solutions that provide convenient and robust features for scientists to manage and analyze their data. In this paper, we propose a platform for managing and analyzing HTS images resulting from cytomics screens taking the automated HTS workflow as a starting point. This platform seamlessly integrates the whole HTS workflow into a single system. The platform relies on a modern relational database system to store user data and process user requests, while providing a convenient web interface to end-users. By implementing this platform, the overall workload of HTS experiments, from experiment design to data analysis, is reduced significantly. Additionally, the platform provides the potential for data integration to accomplish genotype-to-phenotype modeling studies.
The talk was given at the 15th International Conference on Extending Database Technology (EDBT 2012) on March 29, 2012 in Berlin, Germany.
Abstract:
Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join-order search space. In such cases, cost-based query optimization often is not possible. One practical reason is that statistics are typically missing in web-scale settings such as the Linked Open Datasets (LOD). The more profound reason is that, due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics, and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even when they are partially combined with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and the structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need of any cost model. For this, we define the variable graph and we show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open source column-store and evaluated its effectiveness against the state-of-the-art RDF-3X engine, as well as comparing the plan quality with a relational (SQL) equivalent of the benchmarks.
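The reduction mentioned above can be made more concrete with a generic greedy heuristic for maximum weight independent set; this is only an illustration of the target problem, under assumed node weights, and not HSP's actual variable-graph construction or planning algorithm.

    def greedy_mwis(weights, edges):
        """weights: {node: weight}; edges: set of frozenset node pairs.
        Repeatedly pick the heaviest node not adjacent to anything chosen."""
        chosen = set()
        for node in sorted(weights, key=weights.get, reverse=True):
            if all(frozenset((node, other)) not in edges for other in chosen):
                chosen.add(node)
        return chosen

    # Toy "variable graph": nodes stand for shared query variables, weighted
    # here by how selective the triple patterns touching them are assumed to be.
    weights = {"?x": 3.0, "?y": 2.5, "?z": 1.0}
    edges = {frozenset(("?x", "?y"))}
    print(greedy_mwis(weights, edges))  # e.g. {'?x', '?z'}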
The talk was delivered by Ian Rolewicz at the International Workshop on Cloud for High Performance Computing 2011 (C4HPC'11), co-located with the 2011 International Conference on Computational Science and its Applications (ICCSA 2011).
Publication: http://bit.ly/GRBkC2
Abstract:
This document introduces the TimeCloud Front End, a web-based interface for the TimeCloud platform that manages large-scale time series in the cloud. While the Back End is built upon scalable, fault-tolerant distributed systems such as Hadoop and HBase and takes novel approaches to facilitating data analysis over massive time series, the Front End was built as a simple and intuitive interface for viewing the data present in the cloud, both with a simple tabular display and with the help of various visualizations. In addition, the Front End implements model-based views and on-demand data fetching to reduce the amount of work performed at the Back End.
The paper was presented at the 2011 IEEE 7th International Conference on Intelligent Computer Communication and Processing (ICCP 2011) on August 26th, 2011 in Cluj-Napoca, Romania, and appears in the conference proceedings.
Publication: http://bit.ly/GDIsRY
Abstract:
OntoGen is a semi-automatic and data-driven ontology editor focused on editing topic ontologies. It utilizes text mining tools to make ontology-related tasks simpler for the user. This exclusive focus on building ontologies from textual data is the gap we are trying to bridge. We have successfully extended OntoGen to work with image data and to allow ontology construction and editing on collections of labeled or unlabeled images. Browsing large heterogeneous image collections efficiently is certainly a challenging task, and we feel that semi-automatic ontology construction, as described in this paper, makes this task easier.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers (akankshawande)
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Reducing HCL Notes and Domino license costs in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help you do it!
We explain how to resolve common configuration problems that can lead to more users being counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can cause unnecessary spending, for example using a person document instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring you closer to this new world. It gives you the tools and the know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to make the best use of it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
CAKE: Sharing Slices of Confidential Data on Blockchain (Claudio Di Ciccio)
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack (shyamraj55)
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, as a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
OpenID AuthZEN Interop Read Out - Authorization (David Brossard)
During Identiverse 2024 and EIC 2024, members of the OpenID AuthZEN WG got together and demoed their authorization endpoints conforming to the AuthZEN API
Generating privacy-protected synthetic data using Secludy and Milvus (Zilliz)
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
AI-Powered Food Delivery Transforming App Development in Saudi Arabia (Techgropse Pvt. Ltd.)
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
What do a Lego brick and the XZ backdoor have in common? (Speck&Tech)
ABSTRACT: At first glance, what a Lego brick and the XZ backdoor have in common might be that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT stylesheets and schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating, explaining, or refactoring code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the knowledge needed to make informed decisions. Whether you are at the early stages of adopting AI or considering integrating it into advanced XML development, the presentation covers all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Linking Smart Cities Datasets with Human Computation: the case of UrbanMatch
1. Linking Smart Cities Datasets
with Human Computation:
the case of UrbanMatch
Irene Celino, Simone Contessa, Marta Corubolo,
Daniele Dell’Aglio, Emanuele Della Valle,
Stefano Fumeo and Thorsten Krüger
CEFRIEL – Politecnico di Milano – SIEMENS
UrbanMatch - ISWC 2012, Boston, 13th November 2012
2. Citizen Computation
Human Computation: exploiting human capabilities to solve computational tasks difficult for machines
Citizen Science: exploiting volunteers to collect scientific data or to conduct experiments "in the world"
Citizen Computation: exploiting human capabilities to contribute to a mixed computational system by living "in the world"
3. UrbanMatch: Citizen Computation GWAP
Photos come from social media and Wikimedia Commons; places and POIs (e.g. the Duomo, an equestrian statue) come from OpenStreetMap.
Which photos actually represent the urban POIs?
Can we link POIs to their most representative photos?
4. The Record Linkage Problem
Given two datasets A and B, the comparison space is Γ = A × B.
The purpose is to split Γ into M (set of matches) and U (set of non-matches) [Fellegi & Sunter, 1969].
Each link γ has a score s(γ) = P(γ ∈ Γ | M) / P(γ ∈ Γ | U).
Partitioning is based on lower/upper thresholds:
s(γ) < LOWER ⇒ γ ∈ U
s(γ) > UPPER ⇒ γ ∈ M
LOWER ≤ s(γ) ≤ UPPER ⇒ γ ∈ C (candidate links to be assessed by experts)
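To make the partitioning rule above concrete, here is a minimal Python sketch (not the authors' implementation; the threshold values and the example scores are purely illustrative, with the thresholds taken from the settings reported later in the talk):

def partition_links(scores, lower=0.20, upper=0.70):
    """Split links into matches (M), non-matches (U) and candidates (C) by score s(γ)."""
    M, U, C = set(), set(), set()
    for link, s in scores.items():
        if s > upper:
            M.add(link)      # trustable match
        elif s < lower:
            U.add(link)      # non-match
        else:
            C.add(link)      # candidate link, to be assessed (by experts or, here, by players)
    return M, U, C

# Hypothetical POI-photo links with made-up scores
M, U, C = partition_links({("Duomo", "photo_1"): 0.92,
                           ("Duomo", "photo_2"): 0.05,
                           ("Scala", "photo_3"): 0.45})
print(len(M), len(U), len(C))  # 1 1 1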
5. UrbanMatch Linkage Problem
Input data in UrbanMatch:
A contains POIs, B contains photos, Γ contains POI-photo links.
"Playable places" (e.g. Piazza del Duomo, Piazza della Scala) group POIs, partitioning Γ (problem space reduction).
34 POIs in 14 playable places, 11,089 photos.
UrbanMatch links: 196 links in M, 382 links in U, 37,413 links in C.
6. Showing links to UrbanMatch players
The POI-photo links to be evaluated could be shown directly to players... but the player might not know the name of the POI (e.g. Santa Maria delle Grazie). On the other hand, the player can easily "see" whether two photos show the same place (e.g. recognizing that a photo labelled Santa Maria delle Grazie actually shows Sant'Ambrogio). Thus we use a "sure" photo as a proxy for the POI, and the UrbanMatch game is designed as a photo-pairing challenge.
7. UrbanMatch level definition
A game level combines photos from set B with POIs from set A. Example level with two POIs, POI 1 (the Duomo) and POI 2 (an equestrian statue), and photos a-h:
Photos d and g depict POI 1 (and do not depict POI 2)
Photos c and h depict POI 2 (and do not depict POI 1)
Photos b and f do not depict POI 1 nor POI 2 (distractors)
Photos a and e maybe depict POI 1 and/or POI 2
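A hypothetical sketch of how such a level could be assembled from the link sets (my own illustration, not the authors' code): sure photos come from links already in M, distractors are photos known not to depict either POI (links in U), and the uncertain photos come from candidate links in C.

import random

def build_level(sure_poi1, sure_poi2, distractors, candidates, n_candidates=2):
    """Assemble one UrbanMatch level (hypothetical structure, for illustration only).

    sure_poi1 / sure_poi2: photos linked to each POI by sure links in M
    distractors: photos known not to depict either POI (links in U)
    candidates: photos whose links to the POIs are still in C
    Each input list is assumed to contain at least two photos.
    """
    return {
        "sure_poi1": random.sample(sure_poi1, 2),                # e.g. photos d and g
        "sure_poi2": random.sample(sure_poi2, 2),                # e.g. photos c and h
        "distractors": random.sample(distractors, 2),            # e.g. photos b and f
        "candidates": random.sample(candidates, n_candidates),   # e.g. photos a and e
    }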
8. Feedback to UrbanMatch players
If the selected pair is correct (two "sure" photos) or plausible, UrbanMatch gives positive feedback.
If the selected pair is wrong (at least one selected photo is a "distractor"), UrbanMatch gives negative feedback (and counts the player's mistake).
10. Link score in UrbanMatch
Effect of players' evidence on the score s(γ):
If there is positive evidence (an UrbanMatch player says that the link γ is correct), s(γ) is increased by at most K_pos.
If there is negative evidence (an UrbanMatch player says that the link γ is incorrect), s(γ) is decreased by at most K_neg.
Each piece of evidence is weighted by a factor representing the UrbanMatch player's reliability:
r_p = e^(−ε_p / 2)
where ε_p is the number of errors made by the player, so r_p ∈ [0..1] (reliability is 1 if the player makes no mistakes, about 0.6 after 1 error, and almost zero after 6 or more mistakes).
For each link γ in C, its score varies as follows:
Positive evidence: s'(γ) = s(γ) + K_pos · r_p
Negative evidence: s'(γ) = s(γ) − K_neg · r_p
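As a rough illustration (again, not the authors' code), the reliability weighting and the score update could be sketched in Python as follows; K_POS and K_NEG are tunable constants whose actual values are not stated on the slides, so placeholders are used here:

import math

K_POS, K_NEG = 0.05, 0.05  # placeholder constants, not the values used in UrbanMatch

def reliability(errors: int) -> float:
    """Player reliability r_p = e^(-errors/2): 1.0 with no mistakes, ~0.61 after one error."""
    return math.exp(-errors / 2)

def update_score(s: float, player_errors: int, positive: bool) -> float:
    """Apply one piece of player evidence to the score s(γ) of a candidate link."""
    r_p = reliability(player_errors)
    return s + K_POS * r_p if positive else s - K_NEG * r_p

# A flawless player (0 errors) confirming a link raises its score by the full K_POS
print(update_score(0.45, player_errors=0, positive=True))  # 0.5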
11. Evaluating UrbanMatch precision
Accuracy definition (metric specific to Data Quality): the game's ability to correctly assess candidate links.
Accuracy = ((CM − FP) + (CU − FN)) / (CM + CU)
The numerator counts candidate links correctly assessed as either trustable or incorrect; the denominator counts all candidate links assessed as either trustable or incorrect, where:
CM = right links, i.e. candidate links moved from C to M (because s(γ) > UPPER)
CU = wrong links, i.e. candidate links moved from C to U (because s(γ) < LOWER)
FP = false positives (candidate links assessed as trustable but actually incorrect)
FN = false negatives (candidate links assessed as incorrect but actually trustable)
Note: accuracy requires a ground truth, computed only on a manually analyzed set of links for evaluation's sake.
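A minimal sketch of this metric in Python; the counts in the example call are taken from the threshold tables and results reported later in the talk (UPPER = 0.70, LOWER = 0.20) and are used purely to illustrate the formula:

def accuracy(cm: int, cu: int, fp: int, fn: int) -> float:
    """Share of assessed candidate links (moved from C to M or U) that were assessed correctly."""
    return ((cm - fp) + (cu - fn)) / (cm + cu)

print(accuracy(cm=68, cu=1216, fp=4, fn=8))  # 0.99065..., i.e. the ~99.06% reported later in the talk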
12. Evaluating UrbanMatch as a GWAP
Throughput and ALP definition (metrics specific to Games with a Purpose):
Throughput = SolvedProblems / PlayedTime (the game's ability to assess candidate links)
ALP = PlayedTime / ActivePlayers (average lifetime play: the game's ability to engage players)
where SolvedProblems = CM + CU, in which:
CM = right links, i.e. candidate links moved from C to M (because s(γ) > UPPER)
CU = wrong links, i.e. candidate links moved from C to U (because s(γ) < LOWER)
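For completeness, a small Python sketch of the two GWAP metrics. The solved-problems and player counts in the example calls come from the results reported later in the talk; the played times are not stated on these slides, so the values used here are assumptions chosen only to make the illustration run:

def throughput(solved_problems: int, played_time_hours: float) -> float:
    """Game's ability to assess candidate links: solved problems per hour of play."""
    return solved_problems / played_time_hours

def alp(played_time_minutes: float, active_players: int) -> float:
    """Average lifetime play: minutes of total play time per active player."""
    return played_time_minutes / active_players

print(throughput(solved_problems=1284, played_time_hours=2.65))  # ≈ 484.5 links/h
print(alp(played_time_minutes=177.0, active_players=54))         # ≈ 3.3 minutes per player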
13. Setting the thresholds for link score 𝒔 𝜸
Accuracy and Throughput as a function of LOWER:

LOWER                  0.05      0.10      0.15      0.20
CU                      321       348      1152      1216
FN                        4         5         7         8
Accuracy            98.75 %   98.56 %   99.39 %   99.34 %
Throughput (links/h) 108.08    117.17    387.87    409.42

Accuracy and Throughput as a function of UPPER:

UPPER                  0.60      0.65      0.70      0.75      0.80      0.85      0.90
CM                      227       225        68        65        61        60        49
FP                       48        47         4         4         3         3         1
Accuracy            78.85 %   79.11 %   94.11 %   95.38 %   95.08 %   95.00 %   97.95 %
Throughput (links/h)  76.43     75.75     22.89     21.88     20.53     20.20     16.49
14. UrbanMatch evaluation results
Accuracy, Throughput and ALP results.
Evaluation data: 54 unique players, 290 games (781 game levels); 2,006 input links, 1,284 assessed (trustable/incorrect); upper threshold 0.70, lower threshold 0.20.
Accuracy (on the manually-checked subset): 99.06% (a very good result)
Throughput and ALP (on all processed links):
Throughput: 485 links/h (a very good result)
ALP: 3m 17s (one-time game, engagement needs improvement)
15. Evaluating UrbanMatch as a whole
User-based evaluation of actual "engagement", based on the game evaluation literature and integrated with our research-specific questions.
Questionnaire at http://bit.ly/um-survey
Findings:
16. Conclusions
Citizen Computation games are effective in leveraging the linking capabilities of "humans".
The game must be designed with the users' capabilities and usage context in mind: (lack of) background knowledge, small mobile-phone screens.
Different tasks to be solved need different game "missions":
Linking task: pairing mission
Verifying task: rating/selecting mission
Collecting task: selecting/editing mission
Advertisement: try out the Urbanopoly game here in Boston, to collect and assess geo-spatial linked data; it is a participant in the Semantic Web Challenge. http://bit.ly/urbanopoly
17. Thanks for your attention!
Any questions?
Irene Celino – CEFRIEL, ICT Institute Politecnico di Milano
email: Irene.Celino@cefriel.it – web: http://swa.cefriel.it