This document discusses problems and opportunities in context-based personalization. It describes how standard database management systems are limiting and how context-based personalization can help manage information overload by shaping query answers according to a user's preferences and context. It presents the key aspects of context-based personalization including modeling user characteristics, collecting implicit data, and tailoring data based on non-functional aspects like quality. The document also outlines challenges in context representation, preference derivation, data access management with heterogeneous sources, and the potential for applications in emergency management and pervasive advertising.
1. Problems and Opportunities in Context-Based Personalization
C. Bolchini, E. Quintarelli, F. A. Schreiber and L. Tanca
Dipartimento di Elettronica e Informazione
Politecnico di Milano
G. Orsi
Department of Computer Science
University of Oxford
PersDB@VLDB 2011
3. Data Management:
What does it mean today?
Standard DBMS technology is limiting for many applications
What do users want from us?
data integration/exchange
heterogeneity
mobility
incompleteness/uncertainty
interaction with the physical world
personalization
manage the information overload
4. Information Overload and Noise:
Personalization and context-awareness
Context-based personalization: shaping answers (to queries)
according to the user’s preferences and situation (i.e., context).
model and collect characteristics of the users (or of groups of users)
mostly implicit (behavioral analysis, sensing, …)
non-functional (e.g., data quality)
Evolution of personalization:
user profile
• static
• user-based
user situation
• static
• context-based
user processes
• dynamic
• context and preference based
• involve sensing and sociality
6. Information Personalization:
Context-aware data tailoring
Context Modelling
• observables
• context schema
Context Sensing
• instantiation
• validation
• reasoning
Context-Aware Behaviour
• context-aware data
• context-aware operations
design-time run-time
is that really so?
… will see later
9. Information Personalization:
Context-ADDICT
Context-aware data design, integration, contextualization and tailoring.
C. Bolchini, C. Curino, G. Orsi, E. Quintarelli, R. Rossato, F.A. Schreiber, L. Tanca.
And what can context do for data? (Commun. ACM) – 2009
[Figure: data, context, personalization; Context-aware data management]
11. Context Representation and Management:
Model
Context Model
generality
multiple abstraction levels
expressivity
tractability of context querying and reasoning
C. Bolchini, C. Curino, E. Quintarelli, F.A. Schreiber, L. Tanca.
Context information for knowledge reshaping. (Int. J. Web Eng. Technol.) – 2009
13. Context Representation and Management:
Data tailoring
C. Bolchini, E. Quintarelli, R. Rossato.
Relational Data Tailoring Through View Composition (ER) - 2007.
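The tailoring idea can be illustrated with a minimal sketch. It assumes that a context is a set of dimension=value pairs, that each pair is associated with a partial view over the data, and that the view for the whole context is obtained by composing the partial views. Here the partial views are plain Python predicates and composition is a simple conjunction, which only stands in for the composition operators of the cited paper. The table `ROOMS`, the mapping `PARTIAL_VIEWS` and the function `tailor` are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
View = Callable[[Row], bool]

# Hypothetical mapping from context elements (dimension, value) to partial views.
PARTIAL_VIEWS: Dict[Tuple[str, str], View] = {
    ("role", "student"): lambda r: r["kind"] in {"classroom", "library"},
    ("situation", "alone"): lambda r: r["capacity"] <= 30,
    ("interest-topic", "classroom"): lambda r: r["kind"] == "classroom",
}


def tailor(rows: List[Row], context: Dict[str, str]) -> List[Row]:
    """Keep the rows accepted by every partial view of the active context.

    Context elements without an associated view are ignored (an assumption).
    """
    views = [
        PARTIAL_VIEWS[(dim, value)]
        for dim, value in context.items()
        if (dim, value) in PARTIAL_VIEWS
    ]
    return [row for row in rows if all(view(row) for view in views)]


# Hypothetical data, invented for this example.
ROOMS: List[Row] = [
    {"name": "A1", "kind": "classroom", "capacity": 25},
    {"name": "B2", "kind": "classroom", "capacity": 120},
    {"name": "L1", "kind": "library", "capacity": 20},
]

tailored = tailor(ROOMS, {"situation": "alone", "interest-topic": "classroom"})
print([row["name"] for row in tailored])
```

With an empty context no view applies and the whole table is returned, which matches the intuition that tailoring only ever narrows the data.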
14. Context Representation and Management:
Evolution
Operations:
insert
delete
replace
Guarantee consistency between the context schema and the context instances
E. Quintarelli, E. Rabosio, L. Tanca.
Context schema evolution in context-aware data management. (ER) – 2011.
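A rough sketch of what the three operations and the consistency requirement could look like, assuming a deliberately flat context schema (each dimension maps to its admissible values) and context instances that pick one value per dimension. The cited paper works on a richer context model; the representation, the function names and the repair policy below (dropping or renaming affected instance entries) are assumptions made for illustration.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]   # dimension -> admissible values
Instance = Dict[str, str]      # dimension -> chosen value


def is_consistent(schema: Schema, inst: Instance) -> bool:
    """An instance is consistent if every chosen value is admitted by the schema."""
    return all(dim in schema and val in schema[dim] for dim, val in inst.items())


def insert(schema: Schema, dim: str, value: str) -> None:
    """Add a value (and the dimension, if new); existing instances stay valid."""
    schema.setdefault(dim, set()).add(value)


def delete(schema: Schema, instances: List[Instance], dim: str, value: str) -> None:
    """Remove a value and drop it from the instances that were using it."""
    values = schema.get(dim, set())
    values.discard(value)
    if dim in schema and not values:
        del schema[dim]
    for inst in instances:
        if inst.get(dim) == value:
            del inst[dim]


def replace(schema: Schema, instances: List[Instance],
            dim: str, old: str, new: str) -> None:
    """Rename a value in the schema and propagate the change to the instances."""
    if old not in schema.get(dim, set()):
        raise KeyError(f"{dim}={old} is not in the context schema")
    schema[dim].remove(old)
    schema[dim].add(new)
    for inst in instances:
        if inst.get(dim) == old:
            inst[dim] = new


# Hypothetical schema and instances.
schema: Schema = {
    "situation": {"alone", "with-friends"},
    "interest-topic": {"classroom"},
}
instances: List[Instance] = [
    {"situation": "alone", "interest-topic": "classroom"},
    {"situation": "with-friends"},
]

insert(schema, "role", "student")
replace(schema, instances, "situation", "alone", "solo")
delete(schema, instances, "interest-topic", "classroom")
print(all(is_consistent(schema, inst) for inst in instances))
```

The point of the sketch is that every operation touches the schema and the instances together, so consistency holds after each step rather than being checked only at the end.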
15. Personalization Management:
Context vs preferences
Context:
coarse grained
targets classes of users
Preferences:
fine grained
targets individual users
Deriving preferences:
explicit input
mining
s-rules: <C → cond, conf>
A. Miele, E. Quintarelli, L. Tanca.
A methodology for preference-based personalization of contextual data. (EDBT) – 2009.
17. Personalization Management:
s-rules
We are interested in s-rules, which correlate contexts and data.
An s-rule on a relation R(X) is a tuple: <C → cond, conf>
C: a context
cond: a conjunction of conditions of the form A=value, where A is an attribute belonging to R(X) or to a relation reachable from R(X) through foreign keys
conf: the confidence of the association rule C → cond
Example:
< situation=alone, interest-topic=classroom → classroom.type=‘computerized’, 0.73 >
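As an illustration, an s-rule (a context, a conjunction of conditions and a confidence) can be represented as a small record and used to rank tuples in the current context. This is a sketch under assumptions: the class `SRule`, the helpers `applies` and `score`, the sample rows, and the scoring policy (highest confidence among matching rules) are all hypothetical and are not the ranking method of the cited paper. Only the rule itself is the example from the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SRule:
    context: Dict[str, str]   # C: dimension -> value
    cond: Dict[str, str]      # conjunction of A=value conditions
    conf: float               # confidence of the association rule


def applies(rule: SRule, current_context: Dict[str, str]) -> bool:
    """A rule applies when the current context contains every element of C."""
    return all(current_context.get(dim) == val for dim, val in rule.context.items())


def score(row: Dict[str, str], rules: List[SRule],
          current_context: Dict[str, str]) -> float:
    """Highest confidence among applicable rules whose conditions the row meets.

    Returns 0.0 when no rule matches (an assumed default).
    """
    confs = [
        rule.conf
        for rule in rules
        if applies(rule, current_context)
        and all(row.get(attr) == val for attr, val in rule.cond.items())
    ]
    return max(confs, default=0.0)


# The example s-rule from the slide.
RULES = [
    SRule(
        context={"situation": "alone", "interest-topic": "classroom"},
        cond={"classroom.type": "computerized"},
        conf=0.73,
    )
]

# Hypothetical rows, invented for this example.
rows = [
    {"classroom.name": "B2", "classroom.type": "traditional"},
    {"classroom.name": "A1", "classroom.type": "computerized"},
]
ctx = {"situation": "alone", "interest-topic": "classroom"}

ranked = sorted(rows, key=lambda r: score(r, RULES, ctx), reverse=True)
print([r["classroom.name"] for r in ranked])
```

In a different context, for example `{"situation": "with-friends"}`, the rule does not apply and every row scores 0.0, so the original order is preserved.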
18. Data Access Management:
Heterogeneity and semantics
[Figure: ontologies (data, context, views); heterogeneity (schema reverse engineering, inference, transiency); query answering (query rewriting, reasoning)]
G. Orsi,
Context Based Querying of Dynamic and Heterogeneous Information Sources. (PhD Thesis)
19. Data Access Management:
Sensing and actuation
[Figure: Plain PerLa: querying (sensors as tuple providers, PerLa language), context sensing (numerical sensing observables); Context PerLa: context switch (contextual actions, real-time behaviour)]
F.A. Schreiber, R. Camplani, M. Fortunato, M. Marelli, G. Rota.
PerLa: A Language and Middleware Architecture for Data Management and Integration in
Pervasive Information Systems. (IEEE TSE) – 2011.
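The notions of sensors as tuple providers and of a context switch triggering contextual actions can be sketched generically. The code below is plain Python and is not PerLa syntax or the PerLa API. The reading format, the temperature threshold and the names `select`, `detect_context` and `run` are hypothetical, chosen only to show a stream of sensor tuples being queried and a numerical observable driving a change of the active context.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Reading = Dict[str, float]


def select(stream: Iterable[Reading],
           predicate: Callable[[Reading], bool]) -> Iterator[Reading]:
    """Query a sensor stream as if it were a relation: keep matching tuples."""
    return (reading for reading in stream if predicate(reading))


def detect_context(reading: Reading) -> str:
    """Derive the active context from a numerical observable (assumed threshold)."""
    return "alarm" if reading["temperature"] > 60.0 else "normal"


def run(stream: Iterable[Reading],
        on_switch: Callable[[Optional[str], str], None]) -> List[str]:
    """Track the active context and call on_switch whenever it changes."""
    history: List[str] = []
    active: Optional[str] = None
    for reading in stream:
        ctx = detect_context(reading)
        if ctx != active:
            on_switch(active, ctx)   # contextual action
            active = ctx
            history.append(ctx)
    return history


# Hypothetical readings, invented for this example.
readings: List[Reading] = [
    {"temperature": 21.5},
    {"temperature": 22.0},
    {"temperature": 75.0},
    {"temperature": 80.0},
    {"temperature": 30.0},
]

switches: List[Tuple[Optional[str], str]] = []
history = run(readings, lambda old, new: switches.append((old, new)))
hot = list(select(readings, lambda r: r["temperature"] > 60.0))
print(history, len(hot))
```

The callback is invoked only on changes of context, not on every reading, which is the behaviour a context switch is meant to capture.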
20. Context-Awareness and Personalization
What’s next?
Mature enough for a serious personalization theory
serious as in “let’s prove that!”
Process-centric, dynamic and social context management
static context models are limiting
Context as a bridge between software and physical world
sensors and actuators
Effective vs private personalization
21. Applications
Make it useful
Emergency Management
G. Orsi, L. Tanca, E. Zimeo.
Keyword-based, context-aware selection of natural language query patterns. (EDBT) - 2011.
Pervasive Advertisement
L. Carrara, G. Orsi.
A new perspective in pervasive advertisement. (Preliminary tech-report) 2011.
22. This is the end
Thank you
Giorgio Orsi
Fabio A. Schreiber
Letizia Tanca
Cristiana Bolchini
Elisa Quintarelli