The document proposes methods to address the cold start problem in recommendation systems using semantic enhancements. It describes acquiring implicit semantics from item attributes using vector space models and exploiting these semantics in content-based filtering. Experimental results on a MovieLens dataset show that exploiting user-based semantic similarities through best-pair matching outperforms traditional collaborative filtering for cold start scenarios. Future work involves exploring other domains and enhancing context-aware recommendations.
This document discusses how pragmatic metadata, or data about how data is used, can support the generation of semantic metadata for user models. It presents an experiment using different topic modeling algorithms, including LDA and Dirichlet Multinomial Regression, to learn topics from user posts and annotations. Models incorporated pragmatic metadata like authorship and reply relationships. Evaluation showed models using pragmatic user metadata like replies had better predictive performance on future user posts than baselines without metadata. The results indicate pragmatic metadata can help generate semantic topic annotations for users and posts.
This document discusses a method for identifying thematic objects in videos. The method first extracts local feature points from each video frame using Harris corner detection. Descriptors are extracted around each corner point. The descriptors of a reference image are compared to the descriptors of video frames to find similarity and identify frames containing the same object. Identifying frequently appearing objects in this way helps with tasks like object search, tagging, and video summarization. The approach is able to identify objects even when there is partial occlusion or viewpoint variation between frames.
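As a rough illustration of the corner-detection step described above, here is a minimal Harris corner response computed with NumPy. This is a generic sketch, not the paper's implementation: the window size and the constant k are conventional defaults, and the descriptor extraction and matching stages are omitted.

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    """Harris corner response map for a grayscale image (2-D float array).

    High positive values indicate corner-like points; these are the local
    feature points a matching step would later describe and compare.
    """
    iy, ix = np.gradient(img.astype(float))        # image gradients
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box_sum(a):  # sum over a win x win neighborhood, same-size output
        pad = win // 2
        p = np.pad(a, pad)
        out = np.zeros_like(a)
        for dy in range(win):
            for dx in range(win):
                out += p[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    sxx, syy, sxy = box_sum(ixx), box_sum(iyy), box_sum(ixy)
    det = sxx * syy - sxy * sxy                    # det of structure tensor
    trace = sxx + syy
    return det - k * trace * trace

# A white square on black: a corner scores higher than an edge midpoint.
img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0
r = harris_response(img)
print(r[5, 5] > r[10, 5])  # → True
```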
The document proposes a recommendation system that incorporates semantics to address limitations of traditional recommenders. It uses ontologies to represent user interests and item annotations, and employs semantic inference and similarity methods. An evaluation on movie ratings shows the semantic approach improves accuracy, especially for cold-start users with small profiles. Further experimentation analyzes how different taxonomy structures affect the performance of the semantic methods.
The document presents a semantic approach called SETS (Semantic eTendering System) for evaluating offers in electronic tendering (eTendering) systems. It discusses problems with current non-semantic eTendering evaluation and proposes using semantic web technologies like ontologies, RDF, and SBVR (Semantics of Business Vocabulary and Business Rules) to formally represent tender offers and requirements to enable automated semantic evaluation and ranking of offers. The SETS model, architecture, operational model, and mathematical model are described for performing this semantic evaluation of tender offers in eTendering systems.
CS231n Convolutional Neural Networks for Visual Recognition - vidhya DS
The document introduces image classification and the nearest neighbor classifier approach. It discusses how image classification involves assigning labels to images from a fixed set of categories. The nearest neighbor classifier is a simple approach that compares a test image to all training images and labels the test image with the label of its nearest neighbor. On a sample dataset, the nearest neighbor approach only correctly labeled about 3 out of 10 test images.
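The nearest-neighbor procedure described above can be sketched in a few lines. This is a generic illustration, not the course's code; the distance metric (L1, sum of absolute pixel differences) is one common choice, and the toy data is invented.

```python
import numpy as np

def nearest_neighbor_predict(train_images, train_labels, test_image):
    """Label a test image with the label of its closest training image.

    Images are flattened pixel arrays; "closest" is measured with the L1
    (sum of absolute differences) distance.
    """
    distances = np.abs(train_images - test_image).sum(axis=1)
    return train_labels[int(np.argmin(distances))]

# Toy example: three 4-pixel "images" with labels 0 and 1.
train = np.array([[0, 0, 0, 0], [255, 255, 255, 255], [10, 0, 5, 0]])
labels = np.array([0, 1, 0])
print(nearest_neighbor_predict(train, labels, np.array([8, 2, 4, 1])))  # → 0
```

Comparing every test image against every training image is what makes the classifier simple but slow at test time, one reason its accuracy and cost are both discussed in the notes.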
This document summarizes an academic paper that proposes a method for incrementally training object detection models to classify unseen object classes in real-time. It begins by providing background on object detection techniques like YOLO and SSD that can perform detection in a single pass. The paper aims to improve these single-shot detectors through incremental learning to classify new object classes without retraining the entire model from scratch. It conducted experiments on YOLO and VGG16 to investigate how well they can classify objects from unseen classes and whether their performance is affected by factors like background, bounding box size, or network architecture. The goal is to develop a more robust object detection method that can easily adapt to new classes of objects in real-time applications.
This document discusses background elimination techniques which involve three main steps: object detection to select the target, segmentation to isolate the target from the background, and refinement to improve the quality of the segmented mask. It provides an overview of approaches that have been used for each step, including early methods based on SVM and more recent deep learning-based techniques like Mask R-CNN that integrate detection and segmentation. The document also notes that segmentation is challenging without object detection cues and discusses types of segmentation as well as refinement methods that use transformations, dimension reduction, and graph-based modeling.
This document provides an overview of a PowerPoint presentation on cells and systems for an 8th grade science class. The presentation covers topics like characteristics of living things, cell structures and organization, transport systems in plants and animals, and human organ systems. It includes concept maps, diagrams, and descriptions to support textbook content and enrich student learning about biology. Slides cover characteristics of life, microscope use, the cell theory, plant and animal cells and tissues, and how cells, tissues, organs and systems work together.
Extending Recommendation Systems With Semantics And Context Awareness - Victor Codina
This document proposes extending recommendation systems with semantics and context-awareness. It discusses limitations of traditional recommendation models and how semantics and context could help overcome those limitations. The authors propose a model that uses domain concepts with implicit semantics relationships and contextual concepts without semantics. An offline experiment on a pruned MovieLens dataset compares the proposed model to baselines. Results show the proposed contextual-semantic model improves prediction accuracy overall and for cold-start users compared to static and non-semantic models.
Netflix uses a variety of techniques to provide personalized recommendations to users. Some key aspects include:
1. Netflix recommendations are generated using both offline and online techniques. Offline techniques allow for more complex computations but results may become stale, while online techniques can respond quickly but have stricter time constraints.
2. Recommendations are generated using a variety of data sources and machine learning models, including SVD, RBMs, gradient boosted trees, and other techniques. Both the data and models are important for generating high quality recommendations.
3. Netflix tests recommendations using both offline and online A/B testing techniques. Offline testing is used to evaluate new models and ideas before launching online tests involving real users.
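As a hedged sketch of the "SVD" family of models mentioned in point 2 (latent-factor matrix factorization trained by stochastic gradient descent), the following is illustrative only: the ratings, hyperparameters, and variable names are invented, not Netflix's actual pipeline.

```python
import numpy as np

# Minimal latent-factor ("SVD-style") rating predictor trained by SGD.
rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 2
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 3, 1.0), (3, 4, 2.0)]

P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

lr, reg = 0.05, 0.02
for _ in range(200):                           # SGD over observed ratings
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                  # prediction error
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(round(float(P[0] @ Q[0]), 1))            # close to the observed 5.0
```

The offline/online split described in point 1 maps naturally onto this kind of model: the factor matrices are expensive to train (offline), while scoring a user-item pair is a cheap dot product (online).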
"LiquidPub: Services at Service of Science". Invited talks by Fabio Casati at the European Conference on Web Services 2009 and at the Politecnico di Milano.
The document provides an overview of machine learning use cases. It begins with an agenda that will discuss the basic framework for ML projects, model deployment options, and various ML use cases like text classification, image classification, object detection, etc. It then covers the basic 5 step framework for ML projects - defining the problem, planning the solution, acquiring and preparing data, designing and training a model, and deploying the solution. Next, it discusses popular methods for various tasks like image classification, object detection, pose estimation. Finally, it shares several use cases for each task to demonstrate real-world applications.
This document provides an introduction to machine learning. It defines machine learning as developing algorithms that allow computers to learn from experience to improve their performance on tasks. The document outlines supervised learning and other learning frameworks. It discusses applications of machine learning such as autonomous vehicles, recommendation systems, and credit risk analysis. The document also provides examples of machine learning applications at the University of Liege including medical diagnosis, gene expression analysis, and patient classification.
Reward constrained interactive recommendation with natural language feedback ... - Jeong-Gwan Lee
The document summarizes a proposed method called Reward-Constrained Interactive Recommendation that uses natural language feedback from users to iteratively improve recommendations. It models the recommendation process as a constrained Markov decision process and introduces a discriminator to detect violations of user preferences from feedback and constrain the recommender from such violations. The recommender, discriminator, and feature extractors are trained alternately using policy gradient to find the saddle point that maximizes reward while satisfying the constraints.
Integrating digital traces into a semantic enriched data - Dhaval Thakker
The document discusses integrating digital traces from social media into a semantic-enriched data cloud for informal learning. It outlines a processing pipeline that collects digital traces, semantically augments them using ontologies, and allows browsing and interaction through a semantic query service. An exploratory study on job interviews found that authentic examples from digital traces were useful learning stimuli but could be mistaken as norms without context. Semantic technologies provide opportunities to organize digital traces for informal learning but further work is needed to fully realize this potential.
The document discusses a tool called MARS that provides software refactoring recommendations based on module coupling and cohesion metrics. It aims to improve software architectures that have drifted from their original design. MARS analyzes external dependencies between modules and internal dependencies within modules to identify refactoring candidates where a module's coupling is greater than its cohesion. The goal is to recommend moves that reduce coupling and improve cohesion, thereby improving the modularity of the software architecture.
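The coupling-versus-cohesion test can be illustrated with a small sketch. The data structures, function name, and module layout below are hypothetical, not MARS's actual interface: a module whose external dependency count exceeds its internal one is flagged as a refactoring candidate.

```python
def refactoring_candidates(modules, deps):
    """modules: {name: set_of_classes}; deps: set of (class_a, class_b) pairs."""
    candidates = []
    for m, classes in modules.items():
        # cohesion proxy: dependencies with both endpoints inside the module
        internal = sum(1 for a, b in deps if a in classes and b in classes)
        # coupling proxy: dependencies with exactly one endpoint inside
        external = sum(1 for a, b in deps
                       if (a in classes) != (b in classes))
        if external > internal:          # coupling exceeds cohesion
            candidates.append(m)
    return candidates

modules = {"ui": {"Form", "View"}, "core": {"Db", "Cache"}}
deps = {("Form", "View"), ("Form", "Db"), ("View", "Db"), ("View", "Cache")}
print(refactoring_candidates(modules, deps))  # → ['ui', 'core']
```

Here both modules are flagged because cross-module dependencies (3) outnumber each module's internal ones, the kind of situation where moving a class between modules could rebalance the metrics.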
In this presentation we cover the high-level theory behind our idea to democratize assessment as presented at the Digital Media and Learning 2012 conference.
Session presented at the 2nd IndicThreads.com Conference on Cloud Computing held in Pune, India on 3-4 June 2011.
http://CloudComputing.IndicThreads.com
Abstract: Cloud computing is no longer a buzzword but a reality. With great opportunities for financial savings and demand for Software-as-a-Service products, developing products for the cloud is something that cannot be ignored. In this talk, I would like to touch upon three key aspects of cloud engineering – scalability, security, and flexibility – and their impact on application architecture, data processing needs, and deployment.
* By Manjusha Madabushi, Co-Founder and CTO of Talentica Software Pvt. Ltd.
Speaker: Manjusha is a Co-Founder and CTO of Talentica Software Pvt. Ltd. She has a Bachelor's degree from IIT Mumbai and a Master's degree from Northwestern University, Chicago. She has over 23 years' experience in the IT industry. She started her career at the Amoco Research Centre, USA, until 1989, before returning to India and joining TCS. During her nine-year career at TCS, Manjusha worked in technology areas such as Artificial Intelligence, Application Modeling, and Compilers, and was the engineering head of TCS's product E.X. NGN. After TCS, she founded Nitman Software, which was acquired by eGain Communications, a US-based CRM company, in 2000. In 2003 she co-founded Talentica Software, a company that helps technology companies transform their ideas into successful products. Talentica specializes in building highly scalable products using cutting-edge technologies in the areas of Social Analytics, CRM, Natural Language Processing, and Advertising.
The document discusses MoDisco, a model-driven platform for modernizing legacy software systems. It can discover models from various legacy technologies like Java source code and databases. These models can then be understood, transformed, and used to generate documentation, metrics, and code for a new system. MoDisco uses a metamodeling approach and supports technologies like Java through customizable discovery, modeling, and transformation tools.
DIAM: Towards a Model for Describing Appropriation Processes Through the Evo... - Yannick Prié
1. The document proposes the Digital Instrument Appropriation Model (DIAM) to describe the appropriation process of digital artifacts over time through their evolution.
2. DIAM extends instrumental theory by conceptualizing digital artifacts as dynamic instruments that change with user particularization and circulations between users.
3. A case study applying DIAM to analyze researchers' appropriation of a video annotation tool found it helped explain how users particularized information structures, though it failed to fully capture collaborative dimensions and tool combinations.
This document summarizes tag-based recommenders and social tagging systems. It discusses:
1) Social tagging systems allow users to collaboratively tag and categorize content. Popular social tagging sites include Delicious, Flickr, YouTube, etc. Tagging systems have features like tag sharing and selection.
2) Tag recommenders aim to encourage tagging and reuse of common tags. Recommender techniques discussed include most popular, collaborative filtering, tensor factorization, and graph-based methods.
3) The document presents the speaker's work on tag-based collaborative filtering which improves neighbor selection by considering tag semantic similarity between users. Their IUI 2008 paper shows their tag-based approach improves recommendation performance over traditional collaborative filtering.
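The neighbor-selection idea in point 3 can be sketched as cosine similarity over user tag-usage vectors: users who apply similar tags are treated as better collaborative-filtering neighbors. The tag vocabulary, counts, and function names below are invented for illustration, not the paper's implementation.

```python
import numpy as np

tags = ["comedy", "drama", "scifi", "indie"]   # illustrative tag vocabulary
user_tag_counts = {                            # how often each user used each tag
    "alice": np.array([4, 1, 0, 2]),
    "bob":   np.array([3, 2, 0, 1]),           # tags much like alice
    "carol": np.array([0, 0, 5, 0]),           # different taste
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_neighbors(target, users, n=1):
    """Rank other users by tag-vector similarity to the target user."""
    scores = [(cosine(users[target], v), u)
              for u, v in users.items() if u != target]
    return [u for _, u in sorted(scores, reverse=True)[:n]]

print(top_neighbors("alice", user_tag_counts))  # → ['bob']
```

A rating predictor would then weight the selected neighbors' ratings by these similarity scores instead of (or in addition to) rating-overlap similarity.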
GeniUS is a topic and user modeling library that produces semantically meaningful user profiles from social web data to enhance interoperability between applications. It aggregates relevant user information from sources like Twitter, enriches it with semantic data, and generates customized profiles according to application needs. Evaluation shows domain-specific profiles generated by GeniUS improve recommendation performance compared to generic profiles, with performance varying slightly between domains.
This document summarizes a research paper that proposes a social ranking technique using tag-based recommender systems to uncover relevant content from large datasets. It outlines the problem of content overload and sparse tagging in social bookmarking sites. The researchers analyzed a dataset from CiteULike, identifying properties of users' tagging behavior. They developed a social ranking query model that expands queries based on tag similarity to improve accuracy and coverage compared to standard information retrieval systems. The model is evaluated and compared to related work.
The document discusses the past, present, and future of software architecture. It describes how software architecture has traditionally been defined as consisting of elements, form, and rationale. It notes that currently, software architecture is an area of active research in academia and practice in industry. However, it questions whether software architecture may become less important going forward as existing solutions are reused. It raises challenges for the future, such as handling dynamic and continuous changes to architecture, integrating software and hardware, and designing resilient systems with undefined boundaries.
Daniel Tunkelang argues that knowledge representation is overrated for AI systems and computation is underrated. He discusses past attempts at knowledge representation like Cyc and Freebase, and how today's data-driven approaches using large datasets have proven more effective than rule-based systems for tasks like machine translation and question answering. Tunkelang advocates for semi-structured data and data-driven recommendations and queries to empower users and fill gaps in systems' knowledge. He concludes that communication is both the problem and solution, and systems should leverage users as intelligent partners rather than relying solely on perfect schemas or vocabularies.
A Role for Provenance in Quality Assessment - Chris Baillie
This document discusses using provenance information to help assess data quality. It proposes representing sensor observations and their provenance as linked data and using this information to evaluate quality metrics like accuracy, timeliness, and relevance. The work done so far involves representing observations and quality requirements as linked data and generating initial quality scores. Future work will focus on implementing quality rules that examine provenance information and enabling quality scores to be reused.
Digital twins of the environment: opportunities and barriers for citizen science - Luigi Ceccaroni
This document summarizes a presentation on digital twins and citizen science given at the ECSA conference in Vienna on April 3, 2024. It discusses how citizen science could contribute to digital twins by increasing public engagement, awareness, participation, and the societal relevance of digital twins. It also provides an overview of relevant citizen science pilots in the Iliad project, including using citizen data to predict jellyfish swarms in Israel and map potential oil pollution. Finally, it discusses interviews exploring the opportunities and barriers to citizen engagement in digital twins of the ocean.
Harnessing the power of citizen science for environmental stewardship and wat... - Luigi Ceccaroni
Environmental degradation poses a significant challenge to Africa's sustainable development, demanding transformative approaches to conservation efforts.
The MoRe4nature project emerges as an opportunity, integrating citizen-science initiatives as key activities in environmental compliance assurance (ECA). This innovative approach empowers citizens to contribute meaningfully to sustainable natural-resource management, fostering a collaborative data and knowledge production platform, particularly in the realm of water monitoring and water literacy. MoRe4nature's socio-technical approach addresses the barriers to the uptake and utilisation of citizen-generated data in ECA, ensuring the long-term sustainability and impact of citizen science initiatives in Africa. Specifically, MoRe4nature will work with 40 cases across Europe, Latin America, Asia and Africa, including two FreshWater Watch cases in Sierra Leone and Zambia.
FreshWater Watch in Africa (FWW), an exemplary citizen science initiative, empowers communities in Africa to monitor the health of their precious freshwater resources, providing valuable data for water quality assessments and environmental management. By harnessing the power of citizen science, FWW directly contributes to the achievement of the UN's Sustainable Development Goals, promoting access to safe water and sanitation for all. FWW is currently working with partners in Zambia, Sierra Leone, South Africa, Tanzania and Kenya and is looking to support work in other African countries in the future.
The ProBleu project complements MoRe4nature's and FWW’s efforts by fostering ocean and water literacy among students and teachers across and beyond Europe, including Africa. Through a comprehensive set of activities, the ProBleu project promotes ocean and water literacy, engages students in real-world ocean and water research, and enhances the sense of stewardship towards the value and challenges of oceans and waters. This initiative empowers individuals and schools to become active advocates for environmental protection and water literacy, influencing policy decisions and driving sustainable practices at local and national levels.
By strengthening existing citizen science, fostering collaboration and partnerships, synergising citizen science with living labs and fab labs, and developing data validation tools, MoRe4nature, ProBleu and FWW empower citizens to become active partners in environmental protection and water literacy, safeguarding our planet for generations to come.
This document summarizes a research paper that proposes a social ranking technique using tag-based recommender systems to uncover relevant content from large datasets. It outlines the problem of content overload and sparse tagging in social bookmarking sites. The researchers analyzed a dataset from CiteULike, identifying properties of users' tagging behavior. They developed a social ranking query model that expands queries based on tag similarity to improve accuracy and coverage compared to standard information retrieval systems. The model is evaluated and compared to related work.
The document discusses the past, present, and future of software architecture. It describes how software architecture has traditionally been defined as consisting of elements, form, and rationale. It notes that currently, software architecture is an area of active research in academia and practice in industry. However, it questions whether software architecture may become less important going forward as existing solutions are reused. It raises challenges for the future, such as handling dynamic and continuous changes to architecture, integrating software and hardware, and designing resilient systems with undefined boundaries.
Daniel Tunkelang argues that knowledge representation is overrated for AI systems and computation is underrated. He discusses past attempts at knowledge representation like Cyc and Freebase, and how today's data-driven approaches using large datasets have proven more effective than rule-based systems for tasks like machine translation and question answering. Tunkelang advocates for semi-structured data and data-driven recommendations and queries to empower users and fill gaps in systems' knowledge. He concludes that communication is both the problem and solution, and systems should leverage users as intelligent partners rather than relying solely on perfect schemas or vocabularies.
A Role for Provenance in Quality AssessmentChris Baillie
This document discusses using provenance information to help assess data quality. It proposes representing sensor observations and their provenance as linked data and using this information to evaluate quality metrics like accuracy, timeliness, and relevance. The work done so far involves representing observations and quality requirements as linked data and generating initial quality scores. Future work will focus on implementing quality rules that examine provenance information and enabling quality scores to be reused.
Similar to Semantically-Enhanced Recommendation Algorithms (20)
Digital twins of the environment: opportunities and barriers for citizen scienceLuigi Ceccaroni
This document summarizes a presentation on digital twins and citizen science given at the ECSA conference in Vienna on April 3, 2024. It discusses how citizen science could contribute to digital twins by increasing public engagement, awareness, participation, and the societal relevance of digital twins. It also provides an overview of relevant citizen science pilots in the Iliad project, including using citizen data to predict jellyfish swarms in Israel and map potential oil pollution. Finally, it discusses interviews exploring the opportunities and barriers to citizen engagement in digital twins of the ocean.
Harnessing the power of citizen science for environmental stewardship and wat...Luigi Ceccaroni
Environmental degradation poses a significant challenge to Africa's sustainable development, demanding transformative approaches to conservation efforts.
The MoRe4nature project emerges as an opportunity, integrating citizen-science initiatives as key activities in environmental compliance assurance (ECA). This innovative approach empowers citizens to contribute meaningfully to sustainable natural-resource management, fostering a collaborative data and knowledge production platform, particularly in the realm of water monitoring and water literacy. MoRe4nature's socio-technical approach addresses the barriers to the uptake and utilisation of citizen-generated data in ECA, ensuring the long-term sustainability and impact of citizen science initiatives in Africa. Specifically, MoRe4nature will work with 40 cases across Europe, Latin America, Asia and Africa, including two FreshWater Watch cases in Sierra Leone and Zambia.
FreshWater Watch in Africa (FWW), an exemplary citizen science initiative, empowers communities in Africa to monitor the health of their precious freshwater resources, providing valuable data for water quality assessments and environmental management. By harnessing the power of citizen science, FWW directly contributes to the achievement of the UN's Sustainable Development Goals, promoting access to safe water and sanitation for all. FWW is currently working with partners in Zambia, Sierra Leone, South Africa, Tanzania and Kenya and is looking to support work in other African countries in the future.
The ProBleu project complements MoRe4nature's and FWW’s efforts by fostering ocean and water literacy among students and teachers across and beyond Europe, including Africa. Through a comprehensive set of activities, the ProBleu project promotes ocean and water literacy, engages students in real-world ocean and water research, and enhances the sense of stewardship towards the value and challenges of oceans and waters. This initiative empowers individuals and schools to become active advocates for environmental protection and water literacy, influencing policy decisions and driving sustainable practices at local and national levels.
By strengthening existing citizen science, fostering collaboration and partnerships, synergising citizen science with living labs and fab labs, and developing data validation tools, MoRe4nature, ProBleu and FWW empower citizens to become active partners in environmental protection and water literacy, safeguarding our planet for generations to come.
Citizen science, training, data quality and interoperabilityLuigi Ceccaroni
Citizen science, training, data quality and interoperability
More and more people are interested in participating in citizen-science projects, and the technology is becoming more accessible.
Data quality is essential for citizen-science projects. Without quantified-quality data, the results of citizen science projects cannot be trusted.
There are several challenges to ensuring data quality in citizen-science projects, such as participant motivation and training, data-entry errors, and environmental factors.
These challenges can be addressed by using innovative technologies, such as artificial intelligence, and by developing better training methods.
Mobile devices are becoming increasingly powerful and sophisticated, and they are making it easier for participants to collect data anywhere and anytime.
Artificial intelligence is being used to develop new tools that can automatically analyse data and identify patterns. This makes it easier to identify and correct data errors.
Online communities are providing a space for citizen scientists to connect with each other, share data, and learn from each other. This is helping to improve the quality of data collected by citizen-science projects.
Citizen-science projects are increasingly aware of the importance of data ethics. This is leading to the development of new standards and guidelines for collecting and using citizen science data.
This document discusses methods for measuring the impact of citizen science projects online. It describes the development of a framework called MICS (Measuring Impact of Citizen Science) for assessing citizen science impact. MICS includes indicators for different domains like society, science, economy, environment and governance. The framework provides characteristics for each indicator such as its name, description, data type, and how data should be collected and analyzed. Case studies are being used to help implement and refine the MICS framework.
The company develops technology and R&D projects and has agreements with three universities. It has divisions for software engineering, multimedia design, and research in artificial intelligence and accessibility. It presents the ABRAZO project, a communication platform for people with mental disabilities, and the IntegraTV-4all accessible interactive television project.
Abrazo @ congreso e learning e inclusión social 2004Luigi Ceccaroni
The document describes Abrazo, a communication and integration platform for people with mental disabilities. The platform aims to facilitate the integration of these people into society through the Internet and new technologies, and to provide tools for their training, autonomy and communication. The platform is currently in its pilot phase and will in the future offer advanced services such as e-learning and specialized forums.
The document describes an "Evening Organizer" agent that plans evenings for users by selecting restaurants, movies, and transportation based on user preferences. It composes these options into an itinerary using web services. The agent negotiates reservations by communicating with other agents using ontologies and semantic protocols. The case study highlights challenges in semantic interoperability and dynamic service composition.
The document describes the IntegraTV-4all project, whose goal is to develop accessible interactive television for people with disabilities using artificial intelligence. The project consists of three phases to implement voice-controlled interactive television services, adapted content and an accessible interface. A consortium of companies and universities is formed to develop the necessary speech recognition, speech synthesis, and language understanding and generation technology.
InOutTV G2 is a PC application for entertainment management and the generation of personalized electronic TV programming guides. It uses a search and recommendation engine based on fuzzy logic and a multi-agent system. It generates personalized recommendations based on the user's explicit preferences, stereotypical information and observation of the user's habits.
Modeling utility ontologies in agentcities with a collaborative approach 2002...Luigi Ceccaroni
Modeling Utility Ontologies in Agentcities with a Collaborative Approach describes modeling and implementing domain, service, and utility ontologies used within the Agentcities initiative. The utility ontologies include domain-independent concepts like address, contact details, and price that most services developed in the project use. The ontologies are implemented using DAML+OIL and DAML-S and the utility ontologies form the basis for the terminology used in DAML-S service ontologies.
The document describes an evening organizer agent that plans a user's evening by composing an itinerary that includes selecting a pizza restaurant and sci-fi movie. The agent communicates with other agents to gather information and make reservations. The organizer is presented as a case study for advanced web services that are semantically grounded, published and discovered through registries, and support dynamic composition.
The april agent platform 2002 agentcities, lausanneLuigi Ceccaroni
The April Agent Platform (AAP) is a FIPA-compliant multi-agent platform that provides services to facilitate the development and deployment of agents over the Internet. It is written in the April programming language and uses the InterAgent Communications Model. The AAP features include support for FIPA specifications, parsers, XML, ontologies, and platform services. It has a modular organization and supports various communication protocols. The AAP is being used in several research projects and can integrate different technologies and standards in a flexible way.
The document discusses the ILIAD project, which aims to create a "digital twin of the ocean" between 2022-2025 with a budget of 19 million euros. The digital twin would integrate data on ocean physics, biogeochemistry, geology, and human activities to understand and forecast ocean behavior, protect marine ecosystems, and support industries like tourism and renewable energy. It would utilize AI and digital technologies to create an accurate, multi-dimensional simulation of the ocean based on data from sensors, platforms, citizen science, and historical records. The document also discusses the Capturing Our Coast project, which engaged 3000 citizen scientists in the UK from 2015-2018 to collect over 200,000 records of marine species across 1800 locations.
Metrics and instruments to evaluate the impacts of citizen scienceLuigi Ceccaroni
MICS project: Developing metrics and instruments to evaluate the impacts of citizen science on society, governance, the economy, the environment, and science
COST Action 15212 WG5 - Standardisation and interoperabilityLuigi Ceccaroni
This document summarizes the work of Working Group 5 from COST Action CA15212, which developed a metadata ontology for citizen science projects. The group had 47 members from different countries. It completed two main tasks: 1) Developing an ontology that was published on the COST website and in a journal article, and 2) Coordinating with other standardization groups to have the ontology adopted as a sensor web enablement profile. The work included four workshops over four years and will be continued under the ECSA Working Group on Data, Tools & Technology.
The role of interoperability in encouraging participation in citizen science ...Luigi Ceccaroni
The document discusses the role of interoperability in encouraging participation in citizen science. It describes how interoperability allows different citizen science data repositories and communities to be analyzed and served together, improving knowledge transfer, recruitment of observers, and understanding of global phenomena. The document outlines challenges like different platforms using various data standards, and opportunities for interoperability to increase automation, sharing of data and participants between projects, and scaling of projects. Citizen science is defined as work by civic educators and communities to advance science and encourage democratic engagement to help society rationally address complex problems.
Citclops/EyeOnWater @ Barcelona - Citizen science day 2016Luigi Ceccaroni
Citclops/EyeOnWater @ Vendée Globe is a cooperation among organizations involved in the European project Citclops, organizations involved in the Vendée Globe sailing race, skippers, scientists, and citizens, to observe the color of the ocean on the path of the Vendée Globe. It uses tools and sensors developed by the European-Commission–funded project Citclops (Citizens’ Observatory for Coast and Ocean Optical Monitoring), which introduced an innovative concept for water-quality monitoring, to help oceanographers and limnologists in monitoring natural waters, with a strong focus on long-term data series related to environmental sciences.
The document summarizes a two-hour workshop on data collection and management that took place on July 23rd, 2015 in Canberra. The first hour comprised five talks illustrating different components of data collection and management, followed by a question-and-answer panel. The second hour was an interactive session to identify challenges and opportunities in data collection and management, followed by a collation of ideas and a short question-and-answer period. Key topics discussed included reasons for collecting data, standards for data collection, collecting data over time, and tracking data usage.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Zilliz
Join us to introduce Milvus Lite, a vector database that can run on notebooks and laptops, share the same API with Milvus, and integrate with every popular GenAI framework. This webinar is perfect for developers seeking easy-to-use, well-integrated vector databases for their GenAI apps.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms, and is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more than that in common.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role contributors play in a sustainable open-source community.
BIO: An advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability at the expense of security. This best-practices guide outlines steps users can take to better protect personal devices and information.
1. Semantically-Enhanced Recommendation Algorithms
CCIA 2012
Victor Codina & Luigi Ceccaroni
vcodina@lsi.upc.edu | lceccaroni@BDigital.org
Victor Codina: Departament de Llenguatges i Sistemes Informàtics, Knowledge Engineering and Machine Learning Group
Luigi Ceccaroni: Health Informatics, Personalized Computational Medicine
3. The value of recommendations
Netflix: 2/3 of the movies rented are recommended
Google News: 38% more clickthrough
Amazon: 35% of sales come from recommendations
All these systems employ a Collaborative Filtering (CF) approach as a main component
Semantically-Enhanced Recommendation Algorithms - Victor Codina & Luigi Ceccaroni 3
4. But in most online services the CF approach does not work so well
Why? Usually: lack of data
Other reasons: lack of context-awareness, domain-specific particularities
5. Outline
Cold-start problem and existing solutions
Proposed solution to overcome cold start
Evaluation and results
6. Outline
Cold-start problem and existing solutions
o Cold-start problem
o Hybrid recommenders
Proposed solution to overcome cold start
Evaluation and results
7. What is the cold-start problem?
Narrow view
o No ratings at all associated with items or users
Wider view
o Few ratings associated
Cold-start scenarios (items × users):

                      Users: many ratings   Users: few ratings
Items: many ratings   Normal                New user
Items: few ratings    New item              New user & item
8. Typical solution: hybrid recommender combining CF with content-based filtering
Past solution: Collaborative Filtering + traditional content-based filtering
o Limitation: lack of understanding and exploitation of domain semantics
More recent solution: Collaborative Filtering + semantically-enhanced content-based filtering
o Limitation: the need for domain ontologies describing explicit metadata relations
Both hybrids address the new-item and new-user scenarios.
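As a rough illustration of the hybrid idea, here is a minimal sketch; the linear weighting scheme and all names are assumptions for illustration, not the method from these slides. The fewer ratings a user has, the more the blend leans on the content-based score.

```python
def hybrid_score(cf_score, cb_score, n_ratings, full_profile=20):
    """Blend CF and content-based scores; lean on content for cold-start users.

    Illustrative assumption: trust in collaborative filtering grows
    linearly with the number of ratings, saturating at `full_profile`.
    """
    alpha = min(n_ratings / full_profile, 1.0)
    return alpha * cf_score + (1 - alpha) * cb_score

# A cold-start user (2 ratings) relies mostly on the content-based score:
print(hybrid_score(cf_score=4.0, cb_score=3.0, n_ratings=2))  # 3.1
```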
9. Outline
Cold-start problem and existing solutions
Proposed solution to overcome cold start
o Acquisition of implicit semantics
o Methods for semantics exploitation
Evaluation and results
10. Acquisition of implicit domain semantics
Implicit semantics = semantic similarities among item
attributes extracted from Vector Space Models (VSMs)
Distributional hypothesis: “words that share similar
contexts share similar meaning”
[Pipeline diagram: an attribute-context matrix (contexts are either items or users; cell weights w_a,c) is optionally transformed (SVD, conditional probabilities), and the attribute vectors are then compared with a similarity measure (cosine, Jaccard) to obtain the attribute semantic similarities.]
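The acquisition step above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy tag-context matrix is invented, and no matrix transformation (SVD, conditional probabilities) is applied before the cosine comparison.

```python
import math

def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def attribute_similarities(context_matrix):
    """Pairwise semantic similarities among attributes.

    context_matrix maps each attribute to its vector of weights w_a,c,
    one per context (items for item-based, users for user-based).
    """
    attrs = list(context_matrix)
    return {(a, b): cosine(context_matrix[a], context_matrix[b])
            for i, a in enumerate(attrs) for b in attrs[i + 1:]}

# Toy corpus (invented): rows are tag attributes, columns are contexts.
matrix = {
    "sci-fi": [3, 0, 2, 1],
    "aliens": [2, 0, 1, 1],
    "romance": [0, 4, 0, 1],
}
sims = attribute_similarities(matrix)
# "sci-fi" and "aliens" share contexts, so their similarity is much
# higher than that of "sci-fi" and "romance".
```

Per the distributional hypothesis, tags that co-occur across the same contexts end up with high cosine similarity even though the strings themselves share nothing.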
11. Semantic similarities are context-dependent
Item-based
o Similarity is measured in terms of how many items are similarly
described by both attributes
User-based
o Similarity is measured in terms of how many users are similarly
interested in both attributes
Example: top-5 tags similar to “Sci-Fi”, calculated using cosine similarity without matrix transformation:
User-based              Item-based
Scifi   0.79598457      Scifi    0.48631117
future  0.6889696       aliens   0.42508063
space   0.65459067      dystopia 0.34769687
aliens  0.6110453       space    0.32580933
robots  0.59465224      future   0.27470198
12. Exploitation of implicit semantics in content-based filtering
[Diagram: USER MODELING: the user ratings r_u,i and the item attributes w_i,a (attribute relevance in [0,1]) feed a user modeling technique that produces the user interests w_u,a (degree of interest in [-1,1]). PREDICTION GENERATION: vector-based matching between the user interests and the item attributes produces the item score. Two semantic extensions: 1. Profile expansion, which produces expanded user interests before matching; 2. Semantic matching, which incorporates attribute similarities into the matching itself.]
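The two components can be sketched as follows. The weighted-average user-modeling step is an assumption for illustration (the actual hybrid user modeling technique is not detailed on the slide); the prediction step is the plain vector-based matching (dot product). All movie IDs, attributes and ratings are hypothetical.

```python
def build_user_profile(ratings, item_attrs):
    """USER MODELING: aggregate the attributes of rated items, weighted
    by the (centred) ratings, into degrees of interest in [-1, 1]."""
    interests, weights = {}, {}
    for item, rating in ratings.items():
        for attr, relevance in item_attrs[item].items():
            interests[attr] = interests.get(attr, 0.0) + rating * relevance
            weights[attr] = weights.get(attr, 0.0) + relevance
    return {a: interests[a] / weights[a] for a in interests}

def predict(user_profile, item_profile):
    """PREDICTION GENERATION: vector-based matching, i.e. the dot
    product of user interests and item attribute relevances."""
    return sum(w * item_profile.get(a, 0.0) for a, w in user_profile.items())

# Hypothetical data: ratings centred to [-1, 1], relevances in [0, 1].
ratings = {"m1": 1.0, "m2": -0.5}
item_attrs = {"m1": {"sci-fi": 1.0, "space": 0.5},
              "m2": {"romance": 1.0}}
profile = build_user_profile(ratings, item_attrs)
score = predict(profile, {"sci-fi": 0.8, "romance": 0.2})
```

Both semantic methods plug into this skeleton: profile expansion rewrites `profile` before `predict` is called, while semantic matching replaces `predict` itself.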
13. Method 1: User profile expansion by constrained spreading activation
Attribute semantic similarities:
     a1   a2   a3   a4   a5
a1   1    0.5  0.2  0    0.3
a2   0.5  1    0.3  0    0.1
a3   0.2  0.3  1    0.7  0.8
a4   0    0    0.7  1    0
a5   0.3  0.1  0.8  0    1
User interests [-1,1]:          (a1..a5) = (0, 0.5, -0.1, 0, 0)
Expanded user interests [-1,1]: (a1..a5) = (0.25, 0.5, 0.05, 0, 0)
o a1 is a new interest, activated from a2 (0.5 x 0.5 = 0.25)
o a3's weight is updated (-0.1 + 0.5 x 0.3 = 0.05)
Similarities can be symmetric or not depending on the similarity measure used.
Method hyper-parameters:
- activation threshold = 0.25
- fan-out threshold = 0.25
- max. expansion levels = 1
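The expansion step can be sketched as follows; it reproduces the single-level worked example on this slide (only attribute a2 activates, propagating to a1 and a3). Treating the activation threshold as a bound on the absolute interest, and adding weight x similarity to the existing degree of interest, are assumptions consistent with the example's numbers but not spelled out on the slide.

```python
def expand_profile(interests, sims, activation_th=0.25, fanout_th=0.25):
    """One-level constrained spreading activation (max. expansion
    levels = 1): every sufficiently strong interest propagates
    weight * similarity to its sufficiently similar neighbours."""
    expanded = dict(interests)
    for src, w in interests.items():
        if abs(w) < activation_th:          # activation threshold
            continue
        for dst, s in sims[src].items():
            if dst == src or s < fanout_th:  # fan-out threshold
                continue
            expanded[dst] = expanded.get(dst, 0.0) + w * s
    return expanded

# The slide's similarity matrix and user profile.
sims = {
    "a1": {"a1": 1.0, "a2": 0.5, "a3": 0.2, "a4": 0.0, "a5": 0.3},
    "a2": {"a1": 0.5, "a2": 1.0, "a3": 0.3, "a4": 0.0, "a5": 0.1},
    "a3": {"a1": 0.2, "a2": 0.3, "a3": 1.0, "a4": 0.7, "a5": 0.8},
    "a4": {"a1": 0.0, "a2": 0.0, "a3": 0.7, "a4": 1.0, "a5": 0.0},
    "a5": {"a1": 0.3, "a2": 0.1, "a3": 0.8, "a4": 0.0, "a5": 1.0},
}
interests = {"a1": 0.0, "a2": 0.5, "a3": -0.1, "a4": 0.0, "a5": 0.0}
expanded = expand_profile(interests, sims)
# Only a2 activates; a1 becomes a new interest (0.5 * 0.5 = 0.25) and
# a3 is updated (-0.1 + 0.5 * 0.3 = 0.05).
```

A multi-level variant would repeat the loop from the newly reached nodes until `max. expansion levels` is exhausted; clamping the results back into [-1, 1] may also be needed in general.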
14. Method 2: Prediction generation by pair-wise semantic matching strategies
Strategies: vector-based (direct) matching, all-pairs matching, best-pairs matching.
Attribute semantic similarities (same matrix as in Method 1):
     a1   a2   a3   a4   a5
a1   1    0.5  0.2  0    0.3
a2   0.5  1    0.3  0    0.1
a3   0.2  0.3  1    0.7  0.8
a4   0    0    0.7  1    0
a5   0.3  0.1  0.8  0    1
Item attributes [0,1]: (a1..a5) = (0, 0.3, 0, 0, 0.7)
User interests [-1,1]: (a1..a5) = (0, 0.5, -0.1, 0, 0)
Results (using the product as aggregation function):
o Direct matching: 0.3 x 0.5 = 0.15
o Best-pairs matching: 0.15 + 0.7 x (-0.1) x 0.8 = 0.15 - 0.056 = 0.094
o All-pairs matching: 0.15 - 0.009 + 0.035 - 0.056 = 0.12
Similarities can be symmetric or not depending on the similarity measure used.
Method hyper-parameter:
- similarity threshold = 0.05
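The three strategies can be sketched as follows, reproducing the slide's worked example: direct matching gives 0.15, best-pairs 0.094, all-pairs 0.12. Weighting each matched pair by the attribute similarity and summing the products follows the example; edge-case handling (e.g. an empty user profile) is omitted, and only the two needed rows of the similarity matrix are included.

```python
def direct_matching(item, user):
    """Traditional vector-based matching (dot product)."""
    return sum(wi * user.get(a, 0.0) for a, wi in item.items())

def best_pairs(item, user, sims, sim_th=0.05):
    """For each non-zero item attribute, match only the most similar
    non-zero user attribute, weighting the product by the similarity."""
    score = 0.0
    for a, wi in item.items():
        if wi == 0.0:
            continue
        b, s = max(((b, sims[a][b]) for b, wu in user.items() if wu != 0.0),
                   key=lambda pair: pair[1])
        if s >= sim_th:                      # similarity threshold
            score += wi * user[b] * s
    return score

def all_pairs(item, user, sims, sim_th=0.05):
    """Consider every pair whose similarity reaches the threshold."""
    return sum(wi * wu * sims[a][b]
               for a, wi in item.items() if wi != 0.0
               for b, wu in user.items() if wu != 0.0
               if sims[a][b] >= sim_th)

# Rows a2 and a5 of the slide's similarity matrix.
sims = {
    "a2": {"a1": 0.5, "a2": 1.0, "a3": 0.3, "a4": 0.0, "a5": 0.1},
    "a5": {"a1": 0.3, "a2": 0.1, "a3": 0.8, "a4": 0.0, "a5": 1.0},
}
item = {"a2": 0.3, "a5": 0.7}   # item attributes in [0, 1]
user = {"a2": 0.5, "a3": -0.1}  # user interests in [-1, 1]
dm = direct_matching(item, user)   # 0.3 * 0.5 = 0.15
bp = best_pairs(item, user, sims)  # 0.15 - 0.056 = 0.094
ap = all_pairs(item, user, sims)   # 0.15 - 0.009 + 0.035 - 0.056 = 0.12
```

Note how best-pairs picks up the negative interest in a3 through its high similarity to the item's a5, penalizing the item in a way direct matching cannot.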
15. Outline
Cold-start problem and existing solutions
Proposed solution to overcome cold start
Evaluation and results
o MovieLens data set
o Experimental results
16. Offline experimentation with a MovieLens data set extended with movie metadata
Data set statistics after pruning unusual attribute values and movies with few attributes:
Users: 2113
Movies: 1646
Attributes: 4 (genres, directors, actors and tags)
Attribute values: 2886
Ratings per user (avg.): 239
Rating density: 14%
17. Evaluation of methods for semantics exploitation
[Bar chart: percentage improvement in ranking precision over the baseline, overall and for the 10% of users/items with the fewest ratings]
Baseline = Traditional CB using hybrid user modeling technique
Expansion-CB = CSA-same + User-based + raw frequencies
Matching-CB = Best-pairs-same + User-based + Forbes-Zhu method
BPR-MF = CF based on matrix factorization optimized for ranking
18. Conclusions
The cold-start problem can be very critical
o Above all in systems with small databases
Existing solutions have some limitations
o Traditional CB cannot solve the new-user scenario
o Semantically-enhanced CB requires domain ontologies to work
Exploitation of implicit semantics can be a good alternative to overcome the cold-start problem
o User-based semantics are more effective than item-based
o The best-pairs semantic matching method is more effective than profile expansion based on spreading activation
19. Future work
Experimenting with data sets from different domains
o Million Song data set
Extending the study of Vector Space Models
o Probabilistic similarity measures (e.g. Kullback-Leibler)
Applying the same approach to enhance the cold-start performance of context-aware recommenders
o Implicit semantics of contextual conditions can also be acquired from user data
o Similarly, pair-wise semantic strategies can be employed to enhance contextual user modeling
Editor's Notes
I am a PhD student in the KEMLG group at UPC, and my thesis advisor is Luigi Ceccaroni. Broadly speaking, my research consists of studying new methods to improve the performance of existing recommendation techniques by exploiting the implicit semantics of the domain.
Since the arrival of the internet we have had at our disposal an excess of information that often makes it difficult to find the products and services that best fit our preferences. Information-filtering, or personalized-recommendation, systems appeared to cover this need, and they have increasingly become an essential component of many online services, mainly in the entertainment industry.
Offering good recommendations to users usually improves their satisfaction and increases sales or system usage. Clear success stories can be found in companies with large databases such as Netflix, Google and Amazon. The technique that currently predominates is collaborative recommendation, or CF, since under optimal conditions it produces the most accurate recommendations. Its main idea is to recommend items that other users with interests similar to ours have liked.
The problem is that this good performance is usually not repeated in most online services. Why? The main reason is the lack of user data. One of the main limitations of CF-based methods is that their performance is tightly bound to the amount of data available for generating predictions, that is, to the number of users and ratings available. Lack of sensitivity to the context and to the particularities of the domain where the recommender is applied can also cause poor performance.
Our work focuses on the lack-of-information problem, usually known as the cold-start problem. I will start by discussing this problem and the existing solutions in more detail, then I will present the solution we propose, and finally I will show the main results of our evaluation.
Next I will explain the cold-start problem and the main solutions currently applied.
In the literature, the cold-start problem can be defined from two different points of view: some consider it cold start only when users or items are completely new, that is, when no implicit or explicit rating is yet associated with them; others also count, besides the completely new ones, those with few associated ratings. We adopt this wider view of the problem. We can face three cold-start scenarios when predicting how useful an item will be for a particular user: the new-item scenario, when we have only a few ratings of the item; the new-user scenario, when we have only a few ratings from the user; and the most extreme scenario, when there are few ratings of both the item and the user.
The most common solution for avoiding poor performance in cold-start scenarios is a hybrid system combining collaborative recommendation with content-based recommendation. This other family of techniques uses the textual descriptions, or metadata, of the items to generate recommendations. In this way the new-item scenario is solved, since it no longer depends on other users having rated the item. In contrast, the new-user scenario remains a problem, because building an accurate user profile requires the user to provide a certain number of ratings. Moreover, the traditional method has the limitation that the semantics of the domain are not taken into account during prediction. To overcome this limitation, the family of semantic recommenders appeared more recently, characterized by exploiting the explicit semantics of the domain, usually represented in the form of ontologies. Several studies have shown that, thanks to semantics, performance in the new-user scenario can also be improved, since it allows user profiles to be completed. Even so, current semantic recommenders depend completely on the existence of domain ontologies, and that is not always possible.
With the aim of overcoming this limitation of semantic recommenders, in this work we have developed and evaluated methods for the acquisition and exploitation of the implicit semantics of the domain.
By implicit domain semantics we mean the semantic similarities between the attributes describing the items, calculated from distributional models, also known as vector space models. These models are based on the distributional hypothesis, which assumes that terms or words that frequently appear in similar contexts are semantically related. We have generalized this hypothesis to calculate semantic relations between attributes, be they tags or movie actors. In particular, we use as corpus the normalized profiles of the items or of the users, which, as you will see next, lead to quite different results. Once the corpus is selected, a transformation can be applied to the corresponding matrix (such as a dimensionality reduction), and finally the similarity between attributes is calculated by comparing the co-occurrence vectors of each attribute. In the experiments we used two dimensionality-reduction techniques and the cosine measure.
As I said before, the resulting semantic similarities differ depending on the context used as corpus. When items are used as the co-occurrence context, the similarity between two attributes is measured in terms of how many items contain both attributes. When users are used, it is measured in terms of how many users are interested in both attributes. As you can see in the example, the calculated similarities vary with the context both in value and in order.
This diagram shows the main components of content-based recommendation: on the one hand, the user-modeling component, which creates the user profile over the domain attributes from the ratings given to domain items and from their descriptions; on the other hand, the prediction component, which generates the score for a particular item by computing the match between the user profile and the item profile. In this work we have implemented two methods to exploit the implicit semantics: the user-profile-expansion method, which modifies the original interest vector with new information that is later used in the matching computation, and the semantic-matching method, which incorporates the semantic relations between attributes into the computation itself.
This slide shows a simple example of how the user-profile-expansion algorithm we developed, based on a constrained spreading activation (CSA) technique, works. On the left you can see the matrix of semantic similarities between the domain attributes; in this example there are 5 attributes. On the right you have a user profile over those 5 attributes: a positive value means the user is interested in the attribute, a negative one the opposite. The expansion method has 3 hyper-parameters that regulate the degree of propagation: the activation threshold, which sets the degree of interest required for propagation to start from a node; the fan-out threshold, which sets the minimum similarity between attributes for propagation to reach a node; and the maximum number of expansion levels from the initial node. Given the indicated hyper-parameter values, in this example propagation is only activated from attribute 2, since it is the only one exceeding the activation threshold. From this node the value is propagated to attributes 1 and 3, since their similarity values exceed the fan-out threshold. Since the maximum number of expansion levels is 1, the profile expansion ends there. As a result, the user profile is completed with one new positive interest and a recalculated degree of interest in attribute 3.
I will now explain how the semantic-matching method works, reusing the same example, so the similarity matrix and the user profile are the same. In this case we want to incorporate the semantic relations between attributes into the computation of the prediction. I start by showing how the traditional method based on the vector product works: the only attribute present in both profiles is attribute 2, so the prediction is computed as the product of the corresponding weights. If instead we use the best-pairs semantic-matching strategy, besides attribute 2 the match between attribute 5 of the item and attribute 3 of the user is also considered, since this strategy considers, for each non-zero attribute of the item profile, the most similar attribute of the user profile. The other semantic strategy we studied is all-pairs, in which all semantic matches are considered. In both cases the contribution of each match is weighted by the similarity value between the attributes. To avoid matches that are too weak, the strategies use a similarity threshold that sets the minimum similarity value to be considered in the computation.
Next I will show the main results of the evaluation of the proposed methods.
For the evaluation we used one of the data sets available from the MovieLens system that includes metadata about the movies. These are the main statistics of the data set after filtering out movies with little metadata. In particular, we used 4 different attributes for the experiment, with a total of 2886 distinct attribute values.
This bar chart shows the main results of the proposed semantic-exploitation methods. What is shown is the percentage of improvement over the baseline in terms of ranking precision. Here the baseline is a traditional content-based method, that is, one that makes no use of the domain semantics. The black bars correspond to the overall results, taking into account all users and items; the red ones to the results for new users only, and the green ones to those for new items. To simulate the cold-start scenarios we selected the 10% of users and items with the fewest ratings. Regarding the evaluated algorithms, Expansion-CB is the user-profile-expansion method, Matching-CB is the best-pairs semantic-matching method, and BPR-MF is a state-of-the-art CF algorithm optimized for generating rankings. For each algorithm we selected the configuration with the best overall performance. The results show that the semantic-matching method is more effective than the profile-expansion method. Comparing it with the collaborative-filtering algorithm, we can verify that Matching-CB performs better both for new users and for new items. In fact, the performance of the collaborative recommender in the new-item scenario is worse than the baseline, which is quite normal considering that the baseline is a content-based algorithm. Finally, in terms of overall performance the two methods are fairly even, with collaborative filtering being slightly better.