The document presents a framework for recommending sequences of activities based on a user's past activity patterns and context. The framework uses machine learning to determine an optimal subsequence length for matching current and past activity timelines. It then ranks candidate timelines using an edit distance similarity measure and recommends the next sequence of activities. The framework has been applied to lifelogging, transportation, and tourism recommendation domains.
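The edit-distance ranking described above can be sketched as follows; the activity labels and the normalisation into a [0, 1] similarity are illustrative assumptions, not the paper's exact formulation:

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic programme over two activity sequences.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def similarity(a, b):
    # Normalise distance into [0, 1]; 1.0 means identical timelines.
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

current = ["home", "walk", "cafe", "work"]
candidate = ["home", "bus", "cafe", "work"]
print(similarity(current, candidate))  # 0.75
```

Candidate timelines would be ranked by this similarity, and the activities following the best-matching past subsequence recommended next.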
This document presents a sequence-based approach for recommending modes of transport to users based on their past activity patterns. It extends the authors' previous framework to extract and match subsequences from user timelines. A machine learning approach learns an optimal subsequence length for matching current and past user activity patterns. The framework is evaluated on a real-world GPS trajectory dataset containing transport mode labels for 18 users. Results show the proposed sequence-based recommender outperforms baseline methods that recommend frequent or long-duration transport modes.
This document provides an overview of evaluation measures for information retrieval systems. It discusses why evaluation is important for improving systems and measuring user satisfaction. Key points include:
- Common set-based measures include recall, precision, and F-measure. Ranked retrieval measures include average precision (AP), normalized discounted cumulative gain (nDCG), expected reciprocal rank (ERR), and Q-measure for graded relevance.
- Measures for diversified search aim to balance relevance and diversity across different user intents. Examples given include α-nDCG, ERR-IA, D#-nDCG, and U-IA.
- Statistical significance testing allows determining whether differences between systems are likely real or due to chance. The t-test is one commonly used method.
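As an illustration of one of the ranked measures listed above, a minimal nDCG computation might look like this; the graded gain values are made up, and real evaluations derive them from relevance judgments:

```python
import math

def dcg(gains):
    # Discounted cumulative gain over graded gains in ranked order.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains, k):
    # Normalise by the DCG of the ideal (descending-gain) ranking.
    ideal = sorted(gains, reverse=True)
    return dcg(gains[:k]) / dcg(ideal[:k])

# Hypothetical graded gains of a ranked result list (3 = highly relevant).
print(ndcg([3, 2, 3, 0, 1, 2], 6))  # ≈ 0.96
```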
DRSP: Dimension Reduction for Similarity Matching and Pruning of Time Series ...IJDKP
The document summarizes a research paper that proposes a framework called DRSP (Dimension Reduction for Similarity Matching and Pruning) for time series data streams. DRSP addresses the challenges of large streaming data size by:
1) Performing dimension reduction using a Multi-level Segment Mean technique to compactly represent the data while retaining crucial information.
2) Incorporating a similarity matching technique to analyze if new data objects match existing streams.
3) Applying a pruning technique to filter out non-relevant data object pairs and join only relevant pairs.
The framework aims to reduce storage and computation costs for similarity matching on large time series data streams.
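The paper's Multi-level Segment Mean technique is not reproduced here, but a piecewise segment-mean reduction in the same spirit can be sketched as follows; the series values and segment counts are illustrative:

```python
def segment_means(series, n_segments):
    # Split the series into equal-width segments and keep each segment's
    # mean: a compact representation that retains the coarse shape.
    seg_len = len(series) / n_segments
    out = []
    for i in range(n_segments):
        lo, hi = round(i * seg_len), round((i + 1) * seg_len)
        chunk = series[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out

def multi_level(series, levels=(8, 4, 2)):
    # Coarser levels let similarity matching prune candidate pairs cheaply
    # before comparing at finer resolutions.
    return {n: segment_means(series, n) for n in levels}

ts = [1, 2, 3, 4, 10, 12, 11, 9]
print(segment_means(ts, 2))  # [2.5, 10.5]
```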
Surveying, Planning and Scheduling For A Hill Road Work at Kalrayan Hills by ...IJSRD
To date, few construction-scheduling methods have helped project managers decide on near-optimum distributions of manpower, material, space, and tools according to their job objectives and limitations. This thesis presents an intelligent scheduling system (ISS) that assists project managers in finding a near-optimum schedule under those objectives and limitations. ISS uses simulation techniques to distribute resources and assign different priority levels to different activities in each simulation cycle. It considers and combines most of the important construction factors (task schedule, cost, manpower, space, tools, and material) simultaneously in an integrated environment, producing a schedule closer to the optimum. Moreover, ISS allows what-if analysis of possible scenarios and schedule adjustments in response to unexpected conditions (change orders, delayed material delivery, etc.). Finally, two model applications and one real-world construction project are used to illustrate and evaluate ISS against two commonly used software packages, Primavera Project Planner and Microsoft Project.
This document presents a framework for recommending personalized sequences of activities to users in the tourism domain. It discusses using sequential user activity data with associated contextual features like location and time. The framework extends previous work on single-activity recommendations by recommending a sequence of activities. It models user activity timelines and candidate timelines, calculates similarity between them, ranks candidates, and iteratively recommends the next activity sequences. It applies this approach to a location check-in dataset, finding the sequence recommendation algorithm outperforms baselines at predicting user activity sequences.
Empirical Model of Supervised Learning Approach for Opinion MiningIRJET Journal
This summarizes an empirical model for opinion mining using supervised learning with an integrated alignment model and a naive Bayesian classification model. The proposed model aims to automatically classify user reviews of products as positive or negative and to provide an aggregated product rating based on review sentiment analysis and rankings. An alignment model matches keywords between source and target reviews to determine sentiment polarity. If no match is found, the review is passed to a naive Bayesian classifier for sentiment analysis and rating. A rank aggregation model then considers data parameters such as user ID, time, and rank to generate a ranked list of products based on ratings and sentiment analysis, while excluding short-duration sessions and redundant comments. The proposed hybrid model aims to provide more accurate results for product sentiment analysis.
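The naive Bayesian fallback classifier in such a pipeline could be sketched as follows; the training reviews, whitespace tokenisation, and Laplace smoothing are illustrative assumptions, not the paper's exact configuration:

```python
from collections import Counter
import math

class NaiveBayesSentiment:
    # Minimal multinomial naive Bayes with Laplace smoothing; the
    # keyword-alignment stage is omitted here.
    def fit(self, docs, labels):
        self.labels = set(labels)
        self.word_counts = {l: Counter() for l in self.labels}
        self.label_counts = Counter(labels)
        for doc, l in zip(docs, labels):
            self.word_counts[l].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        scores = {}
        for l in self.labels:
            total = sum(self.word_counts[l].values())
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.label_counts[l] / sum(self.label_counts.values()))
            for w in doc.lower().split():
                score += math.log((self.word_counts[l][w] + 1)
                                  / (total + len(self.vocab)))
            scores[l] = score
        return max(scores, key=scores.get)

clf = NaiveBayesSentiment().fit(
    ["great product love it", "terrible battery bad value", "love the screen great"],
    ["pos", "neg", "pos"])
print(clf.predict("great battery love"))  # pos
```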
PhD defense presentation of Dominik Kowald: Modeling Activation Processes in Human Memory to Improve Tag Recommendations. Presented at Know-Center / Graz University of Technology (Austria)
Continuous Evaluation of Collaborative Recommender Systems in Data Stream Man...Dr. Cornelius Ludmann
Talk at the Data Streams and Event Processing Workshop at the 16. Fachtagung »Datenbanksysteme für Business, Technologie und Web« (BTW) of the Gesellschaft für Informatik (GI) in Hamburg, Germany. March 3, 2015
The document summarizes presentations from three perspectives on progress towards open and interoperable research data service workflows:
1) Angus Whyte of the Digital Curation Centre discussed new DCC guidance and design principles for integrating research data service workflows.
2) Rory Macneil of Research Space discussed integrating their ELN with University of Edinburgh's DataShare and Harvard's Dataverse repositories using open standards.
3) Stuart Lewis of University of Edinburgh discussed their DataVault prototype for packaging data to be archived from a Jisc Research Data Spring project. The case studies illustrate challenges and opportunities for improving integration between active data management and long-term preservation services.
This document presents research on predicting user engagement with direct displays like knowledge panels using mouse cursor data. The researchers conducted a crowdsourcing study tracking users' mouse cursors during search tasks. They developed predictive models to determine when users notice, find useful, and perceive faster task completion from direct displays. Their models outperformed baselines in accuracy and other metrics, showing mouse cursor data can predict user engagement without explicit feedback. The researchers conclude this approach offers an efficient way to analyze interactions and optimize direct display placement and content.
Software size estimation at early stages of project development holds great significance in meeting the competitive demands of the software industry. Software size is one of the most interesting internal attributes and has been used in several effort/cost models as a predictor of the effort and cost needed to design and implement software. With the widespread shift towards the object-oriented paradigm, it is essential to use an accurate methodology for measuring the size of object-oriented projects. The class point approach quantifies classes, the logical building blocks of the object-oriented paradigm. In this paper, we propose a class point based approach for software size estimation of On-Line Analytical Processing (OLAP) systems. OLAP is an approach to swiftly answer decision support queries based on a multidimensional view of data, and materialized views can significantly reduce the execution time of such queries. We perform a case study based on the TPC-H benchmark, which is representative of OLAP systems, using a greedy approach to determine a good set of views to materialize. After finding the number of views, the class point approach is used to estimate the size of an OLAP system, and the results of our approach are validated.
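The greedy view-selection step can be illustrated with a toy sketch in the style of the classic greedy algorithm for view materialization; the view names, savings figures, and the crude overlap handling are all hypothetical:

```python
def greedy_views(views, benefit, k):
    # Pick k views to materialise, each round taking the view with the
    # largest marginal benefit given the views already chosen.
    chosen = []
    for _ in range(k):
        best = max((v for v in views if v not in chosen),
                   key=lambda v: benefit(v, chosen))
        chosen.append(best)
    return chosen

# Hypothetical per-view query-time savings if materialised alone; overlap
# between views sharing a dimension is handled crudely by halving benefit.
savings = {"v_region": 90, "v_month": 70, "v_region_month": 120, "v_product": 40}
shared = {("v_region", "v_region_month"), ("v_month", "v_region_month")}

def benefit(v, chosen):
    b = savings[v]
    if any((v, c) in shared or (c, v) in shared for c in chosen):
        b //= 2
    return b

print(greedy_views(list(savings), benefit, 2))  # ['v_region_month', 'v_region']
```

A real lattice-based implementation would compute benefit from view sizes and query frequencies rather than fixed savings.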
This document analyzes a cloud workload dataset from Google to characterize usage patterns. The key steps are:
1) The data is preprocessed and important attributes like CPU/memory usage are analyzed.
2) Clustering algorithms are used to classify users based on resource estimation ratios and tasks based on attributes.
3) Time series analysis via DTW is performed on tasks to identify patterns, and tasks are clustered.
4) For target high estimation ratio users, resource usage is predicted based on matching task patterns and allocated dynamically with a threshold to allow for spikes. This approach aims to reallocate unused resources to other users.
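The DTW comparison in step 3 can be sketched as a standard dynamic-programming distance; the series values are illustrative, and long traces would call for a banded or pruned variant:

```python
def dtw(a, b):
    # Dynamic time warping distance between two usage series; the alignment
    # may stretch, so similarly shaped tasks match despite phase shifts.
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # a advances
                                 d[i][j - 1],      # b advances
                                 d[i - 1][j - 1])  # both advance
    return d[n][m]

# The second series is the first shifted by one step; DTW stays small
# where Euclidean point-by-point distance would not.
print(dtw([0, 1, 2, 1, 0], [0, 0, 1, 2, 1]))  # 1.0
```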
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...IJCSEA Journal
In this paper we propose an approach, which is based on Markov Chain, to cluster and recommend candidate views for the selection algorithm of materialized views. Our idea is to intervene at regular period of time in order to filter the candidate views which will be used by an algorithm for the selection of materialized views in real-time data warehouse. The aim is to reduce the complexity and the execution cost of the online selection of materialized views. Our experiment results have shown that our solution is very efficient to specify the more profitable views and to improve the query response time.
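A first-order transition-probability estimate of the kind a Markov Chain approach relies on can be sketched from a query log; the view names and the log itself are hypothetical:

```python
from collections import defaultdict

def transition_probs(view_sequence):
    # Estimate P(next view | current view) from an observed sequence of
    # candidate views; high-probability successors are promising views to
    # keep for the materialised-view selection algorithm.
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(view_sequence, view_sequence[1:]):
        counts[cur][nxt] += 1
    return {v: {n: c / sum(succ.values()) for n, c in succ.items()}
            for v, succ in counts.items()}

log = ["v1", "v2", "v1", "v2", "v3", "v1", "v2"]
p = transition_probs(log)
print(p["v1"])  # {'v2': 1.0}
```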
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Data Con LA
At IRIS.TV, our business builds algorithmic solutions for video recommendation with the end goal of delivering a great user experience, as evidenced by users viewing more video content. This talk outlines our reasons for expanding from a descriptive/predictive approach to data analytics toward a philosophy that features more prescriptive analytics, driven by our data science team.
Review of Existing Methods in K-means Clustering AlgorithmIRJET Journal
This document reviews existing methods for improving the K-means clustering algorithm. K-means is widely used but has limitations such as sensitivity to outliers and initial centroid selection. The document summarizes several proposed approaches, including using MapReduce to select initial centroids and form clusters for large datasets, reducing execution time by cutting off iterations, improving cluster quality by selecting centroids systematically, and using sampling techniques to reduce I/O and network costs. It concludes that improved algorithms address K-means limitations better than the traditional approach.
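One widely used systematic centroid-selection method, k-means++ seeding, serves here as a representative sketch of the initialization improvements reviewed (not any single proposal from the document):

```python
import random

def kmeans_pp_init(points, k, seed=0):
    # k-means++ seeding: after a random first centre, each further centroid
    # is drawn with probability proportional to its squared distance from
    # the nearest centre chosen so far, spreading the initial centres out.
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min((p - c) ** 2 for c in centroids) for p in points]
        r, acc = rng.uniform(0, sum(d2)), 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(p)
                break
    return centroids

# Two well-separated 1-D clusters; seeding should pick one point from each.
print(kmeans_pp_init([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], 2))
```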
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
This document analyzes different machine learning algorithms that can be used to build a music recommendation system. It first discusses how machine learning and data mining are used to extract patterns from large music datasets. It then analyzes different classification, clustering, and association algorithms that are suitable for a music recommendation system. Specifically, it applies two algorithms (Random Forest and XGBClassifier) to a music dataset and compares their performance at different training/test data splits. It finds that Random Forest achieved the highest accuracy of 75% when the split was 75% training and 25% testing data. In conclusion, ensemble techniques like Random Forest can improve the accuracy of music recommendation over single algorithms.
A Comparison of Different Strategies for Automated Semantic Document AnnotationAnsgar Scherp
We introduce a framework for automated semantic document annotation that is composed of four processes, namely concept extraction, concept activation, annotation selection, and evaluation. The framework is used to implement and compare different annotation strategies motivated by the literature. For concept extraction, we apply entity detection with semantic hierarchical knowledge bases, Tri-gram, RAKE, and LDA. For concept activation, we compare a set of statistical, hierarchy-based, and graph-based methods. For selecting annotations, we compare top-k as well as kNN. In total, we define 43 different strategies including novel combinations like using graph-based activation with kNN. We have evaluated the strategies using three different datasets of varying size from three scientific disciplines (economics, politics, and computer science) that contain 100,000 manually labeled documents in total. We obtain the best results on all three datasets by our novel combination of entity detection with graph-based activation (e.g., HITS and Degree) and kNN. For the economic and political science datasets, the best F-measure is .39 and .28, respectively. For the computer science dataset, the maximum F-measure of .33 can be reached. The experiments are by far the largest on scholarly content annotation, which typically uses up to a few hundred documents per dataset only.
Gregor Große-Bölting, Chifumi Nishioka, and Ansgar Scherp. 2015. A Comparison of Different Strategies for Automated Semantic Document Annotation. In Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015). ACM, New York, NY, USA, Article 8, 8 pages. DOI=http://dx.doi.org/10.1145/2815833.2815838
This document summarizes the analysis of a cloud workload using Google trace data. The key steps included:
1) Preprocessing and analyzing the Google trace data to identify important attributes like CPU and memory usage.
2) Calculating resource usage statistics and classifying users into clusters based on estimation ratios using clustering algorithms. Target users who overestimated resources were identified.
3) Performing time series analysis using DTW on tasks of target users to identify patterns and cluster tasks with similar workloads.
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...IRJET Journal
This document compares the Critical Path Method (CPM) and Monte Carlo simulation for project scheduling. CPM uses deterministic activity durations to calculate the critical path and project duration. Monte Carlo simulation incorporates uncertainty by using three time estimates per activity - optimistic, pessimistic, and most likely - represented as probability distributions. It runs thousands of simulations to determine the likely project duration based on random sampling from these distributions. The document reviews literature on applying Monte Carlo simulation in construction projects. It then describes a study that uses both CPM and Monte Carlo simulation on a real construction project to compare the results and evaluate Monte Carlo simulation's usefulness for the construction industry.
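The three-point Monte Carlo procedure described above can be sketched for a hypothetical two-path project network; the activity names, estimates, and the triangular distribution choice are illustrative:

```python
import random

# Hypothetical three-point estimates (optimistic, most likely, pessimistic)
# for a toy network with two paths: A->B and A->C->D.
activities = {"A": (2, 4, 8), "B": (3, 5, 9), "C": (1, 2, 4), "D": (2, 3, 6)}

def simulate(n=10000, seed=42):
    rng = random.Random(seed)
    durations = []
    for _ in range(n):
        d = {k: rng.triangular(lo, hi, mode)
             for k, (lo, mode, hi) in activities.items()}
        # Project duration is the longest path through the network.
        durations.append(d["A"] + max(d["B"], d["C"] + d["D"]))
    return durations

runs = simulate()
# Deterministic CPM with most-likely durations gives 4 + max(5, 2 + 3) = 9;
# the simulated mean is higher because the estimates are right-skewed.
print(sum(runs) / len(runs))
```

Percentiles of `runs` (e.g. the 90th) would then give a schedule target with a stated confidence rather than a single deterministic date.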
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET Journal
This document reviews different recommendation techniques for group recommender systems (GRS) in online social networks. It discusses traditional recommender approaches like content-based filtering and collaborative filtering. It also reviews related work applying opinion dynamics models and weight matrices to GRS. The document concludes that using a smart weights matrix to consider relationships between group members' preferences in a recommendation process improves aggregation and ensures consensus, providing the best way to recommend items to a complete group.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
to determine a good set of views to be materialized. After finding the number of views, the class point approach is used to estimate the size of an OLAP System The results of our approach are validated.
This document analyzes a cloud workload dataset from Google to characterize usage patterns. The key steps are:
1) The data is preprocessed and important attributes like CPU/memory usage are analyzed.
2) Clustering algorithms are used to classify users based on resource estimation ratios and tasks based on attributes.
3) Time series analysis via DTW is performed on tasks to identify patterns, and tasks are clustered.
4) For target high estimation ratio users, resource usage is predicted based on matching task patterns and allocated dynamically with a threshold to allow for spikes. This approach aims to reallocate unused resources to other users.
MARKOV CHAIN FOR THE RECOMMENDATION OF MATERIALIZED VIEWS IN REAL-TIME DATA W...IJCSEA Journal
In this paper we propose an approach, which is based on Markov Chain, to cluster and recommend candidate views for the selection algorithm of materialized views. Our idea is to intervene at regular period of time in order to filter the candidate views which will be used by an algorithm for the selection of materialized views in real-time data warehouse. The aim is to reduce the complexity and the execution cost of the online selection of materialized views. Our experiment results have shown that our solution is very efficient to specify the more profitable views and to improve the query response time.
Markov Chain for the Recommendation of Materialized Views in Real-Time Data W...IJCSEA Journal
In this paper we propose an approach, which is based on Markov Chain, to cluster and recommend candidate views for the selection algorithm of materialized views. Our idea is to intervene at regular period of time in order to filter the candidate views which will be used by an algorithm for the selection of materialized views in real-time data warehouse. The aim is to reduce the complexity and the execution cost of the online selection of materialized views. Our experiment results have shown that our solution is very efficient to specify the more profitable views and to improve the query response time.
In this paper we propose an approach, which is based on Markov Chain, to cluster and recommend
candidate views for the selection algorithm of materialized views. Our idea is to intervene at regular period
of time in order to filter the candidate views which will be used by an algorithm for the selection of
materialized views in real-time data warehouse. The aim is to reduce the complexity and the execution cost
of the online selection of materialized views. Our experiment results have shown that our solution is very
efficient to specify the more profitable views and to improve the query response time.
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Data Con LA
At IRIS.TV, our business builds algorithmic solutions for video recommendation with the end goal to deliver a great user experience as evidenced by users viewing more video content. This talk outlines our reasons for expanding from a descriptive/predictive approach to data analytics toward a philosophy that features more prescriptive analytics, driven by our data science team.
Review of Existing Methods in K-means Clustering AlgorithmIRJET Journal
This document reviews existing methods for improving the K-means clustering algorithm. K-means is widely used but has limitations such as sensitivity to outliers and initial centroid selection. The document summarizes several proposed approaches, including using MapReduce to select initial centroids and form clusters for large datasets, reducing execution time by cutting off iterations, improving cluster quality by selecting centroids systematically, and using sampling techniques to reduce I/O and network costs. It concludes that improved algorithms address K-means limitations better than the traditional approach.
IRJET- Analysis of Music Recommendation System using Machine Learning Alg...IRJET Journal
This document analyzes different machine learning algorithms that can be used to build a music recommendation system. It first discusses how machine learning and data mining are used to extract patterns from large music datasets. It then analyzes different classification, clustering, and association algorithms that are suitable for a music recommendation system. Specifically, it applies two algorithms (Random Forest and XGBClassifier) to a music dataset and compares their performance at different training/test data splits. It finds that Random Forest achieved the highest accuracy of 75% when the split was 75% training and 25% testing data. In conclusion, ensemble techniques like Random Forest can improve the accuracy of music recommendation over single algorithms.
A Comparison of Different Strategies for Automated Semantic Document AnnotationAnsgar Scherp
We introduce a framework for automated semantic document annotation that is composed of four processes, namely concept extraction, concept activation, annotation selection, and evaluation. The framework is used to implement and compare different annotation strategies motivated by the literature. For concept extraction, we apply entity detection with semantic hierarchical knowledge bases, Tri-gram, RAKE, and LDA. For concept activation, we compare a set of statistical, hierarchy-based, and graph-based methods. For selecting annotations, we compare top-k as well as kNN. In total, we define 43 different strategies including novel combinations like using graph-based activation with kNN. We have evaluated the strategies using three different datasets of varying size from three scientific disciplines (economics, politics, and computer science) that contain 100, 000 manually labeled documents in total. We obtain the best results on all three datasets by our novel combination of entity detection with graph-based activation (e.g., HITS and Degree) and kNN. For the economic and political science datasets, the best F-measure is .39 and .28, respectively. For the computer science dataset, the maximum F-measure of .33 can be reached. The experiments are the by far largest on scholarly content annotation, which typically are up to a few hundred documents per dataset only.
Gregor Große-Bölting, Chifumi Nishioka, and Ansgar Scherp. 2015. A Comparison of Different Strategies for Automated Semantic Document Annotation. In Proceedings of the 8th International Conference on Knowledge Capture (K-CAP 2015). ACM, New York, NY, USA, , Article 8 , 8 pages. DOI=http://dx.doi.org/10.1145/2815833.2815838
This document summarizes the analysis of a cloud workload using Google trace data. The key steps included:
1) Preprocessing and analyzing the Google trace data to identify important attributes like CPU and memory usage.
2) Calculating resource usage statistics and classification of users into clusters based on estimation ratios using clustering algorithms. Target users who overestimated resources were identified.
3) Performing time series analysis using DTW on tasks of target users to identify patterns and cluster tasks with similar workloads.
IRJET- Comparative Analysis between Critical Path Method and Monte Carlo S...IRJET Journal
This document compares the Critical Path Method (CPM) and Monte Carlo simulation for project scheduling. CPM uses deterministic activity durations to calculate the critical path and project duration. Monte Carlo simulation incorporates uncertainty by using three time estimates per activity - optimistic, pessimistic, and most likely - represented as probability distributions. It runs thousands of simulations to determine the likely project duration based on random sampling from these distributions. The document reviews literature on applying Monte Carlo simulation in construction projects. It then describes a study that uses both CPM and Monte Carlo simulation on a real construction project to compare the results and evaluate Monte Carlo simulation's usefulness for the construction industry.
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET Journal
This document reviews different recommendation techniques for group recommender systems (GRS) in online social networks. It discusses traditional recommender approaches like content-based filtering and collaborative filtering. It also reviews related work applying opinion dynamics models and weight matrices to GRS. The document concludes that using a smart weights matrix to consider relationships between group members' preferences in a recommendation process improves aggregation and ensures consensus, providing the best way to recommend items to a complete group.
1. A Sequence-based and Context Modelling Framework for Recommendation
Gunjan Kumar, Houssem Jerbi, and Michael P. O’Mahony
Insight Centre for Data Analytics
University College Dublin
Insight Student Conference 2017, Cork
Sep 8, 2017
4. Rich User Activity Data
• Sequential nature of user activities
• Activities have associated features/context, e.g.
location, time, weather, etc.
5. Rich User Activity Data
For Recommender Systems
Facilitates real-time recommendations for a given user and context
(e.g. time, location, weather, etc.)
Our Work
A framework for sequence- and context-based recommendation:
• Recommending the next activity to perform.
- Lifelogging
- Transportation
• Recommending the next sequence of activities to perform.
- Tourism
Insight Centre for Data Analytics RecTour 2017 Slide 4
7. Related Work
Approaches compared along three dimensions: capturing sequence, capturing context, and recommending sequences.
• Hierarchical-graph-based models [Li et al., 2008; Zheng et al., 2009; Yoon et al., 2010]
• All-kth-order Markov models [Bohnenberger and Jameson, 2001; Deshpande and Karypis, 2004; Shani et al., 2005]
• Tensor and matrix factorization models [Zheng et al., 2010, 2012; Wang et al., 2010; Symeonidis et al., 2011; Adomavicius et al., 2011; Braunhofer et al., 2013]
• Music playlists [Baccigalupo and Plaza, 2006; Chen et al., 2012]
• POI/itinerary recommendation [Tai et al., 2008; Yoon et al., 2012]
• Stochastic modelling [Sun et al., 2016]
• Activity recommendation frameworks [Kumar et al., 2014, 2016, 2017]
8. Our Contribution
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context.
• An ML approach to learn the optimal subsequence length for matching current and past subsequences of user activity patterns.
• Application of the proposed approach in lifelogging [iiWAS 2014],
transportation [KDD UrbComp 2016] and tourism [RecSys RecTour
2017] domains.
9. Framework Overview
[Pipeline diagram] User Data → Data Modelling → Timelines → Timeline Matching → Similarity Assessment → Ranking → Recommended Sequence (RS); the top-ranked activity is appended to the timeline/RS.
10. Data Model
Activity Object
A single occurrence of an activity, consisting of a set of features describing the activity or its context.
Activity Timeline
A chronological sequence of n activity objects performed by the
user during a time interval δ:
T = ⟨ao1, ao2, ..., aon⟩
ao1: Museum, 10:15, (53.343N, -6.317W), 11015
ao2: Italian-Food, 12:31, (53.339N, -6.277W), 128
ao3: Theatre, 16:50, (53.340N, -6.263W), 2008
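The activity object and timeline definitions above can be sketched as plain Python dataclasses. This is a hypothetical illustration: the field set (category, time, coordinates, and a check-in count) mirrors the example objects on this slide, not a schema given in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActivityObject:
    """A single occurrence of an activity with its context features."""
    category: str             # e.g. "Museum"
    time: str                 # e.g. "10:15"
    geo: Tuple[float, float]  # (latitude, longitude)
    checkins: int             # popularity feature from the slide's example

@dataclass
class ActivityTimeline:
    """Chronological sequence of activity objects, T = <ao1, ..., aon>."""
    objects: List[ActivityObject] = field(default_factory=list)

    def append(self, ao: ActivityObject) -> None:
        self.objects.append(ao)

timeline = ActivityTimeline()
timeline.append(ActivityObject("Museum", "10:15", (53.343, -6.317), 11015))
timeline.append(ActivityObject("Italian-Food", "12:31", (53.339, -6.277), 128))
timeline.append(ActivityObject("Theatre", "16:50", (53.340, -6.263), 2008))
```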
11. Recommendation Algorithm
• Extracting candidate timelines from user timelines.
• ML: learning the optimal matching unit based on regularity
(sample entropy) and repetition (k-grams) in user activity
patterns in timelines.
• Timeline matching approaches:
• N-count matching
• N-hours matching
• Daywise matching
• Similarity Assessment: Two-level Edit Distance
• Ranking
• Predicting the next sequence of activities
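The slides cite sample entropy and k-grams as signals of regularity and repetition in activity patterns. Below is a minimal, hypothetical k-gram repetition measure; the exact statistics used by the framework may differ.

```python
from collections import Counter

def kgram_repetition(sequence, k):
    """Fraction of k-grams that occur more than once in the sequence --
    a simple proxy for how repetitive a user's activity pattern is."""
    grams = [tuple(sequence[i:i + k]) for i in range(len(sequence) - k + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)

daily = ["Food", "Work", "Gym", "Food", "Work", "Gym", "Food", "Work"]
print(kgram_repetition(daily, 2))  # 1.0 -- every bigram recurs
```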
12. Recommendation Algorithm: Overview
Current timeline: the current activity (aoc) followed by the target activity sequence ⟨aot1, aot2, aot3⟩ to be predicted. Candidate timelines #1, #2, ..., #j are extracted from past timelines, matched against the current timeline, and ranked.

[Diagram: current timeline matched against j candidate timelines, which are then ranked 1..j]

Score(ao_rec,j) = 1 - (d(Tj, Tc) - min_{Tp ∈ T} d(Tp, Tc)) / (max_{Tp ∈ T} d(Tp, Tc) - min_{Tp ∈ T} d(Tp, Tc))
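The score above min-max normalises each candidate's distance to the current timeline, so the closest candidate scores 1 and the farthest scores 0. A minimal sketch (function name assumed):

```python
def recommendation_score(d_candidate, all_distances):
    """Score = 1 - (d - min_d) / (max_d - min_d): the min-max
    normalised similarity of a candidate timeline to the current one."""
    d_min, d_max = min(all_distances), max(all_distances)
    if d_max == d_min:  # all candidates equally distant
        return 1.0
    return 1.0 - (d_candidate - d_min) / (d_max - d_min)

distances = [2.0, 5.0, 8.0]  # d(Tj, Tc) for three candidate timelines
scores = [recommendation_score(d, distances) for d in distances]
print(scores)  # [1.0, 0.5, 0.0]
```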
13. Datasets
1. Lifelogging: 5 users, 41.4k images
2. Geolife Modes of transport: 18 users, 334 trips
3. Gowalla Checkins: 9.2k users, 6 million checkins
• median # checkins per user per day: 10 - 270
• median # of distinct categories per user per day : 2 - 76
[Histograms: number of users vs. median #checkins per user per day (0-100), and number of users vs. median #distinct categories per user per day (0-80)]
14. Methodology
[Diagram: a user timeline split at the recommendation time (RT); the current activity (aoc) precedes the target activity sequence ⟨aot1, aot2, aot3⟩ to be predicted]
• Leave-one-out evaluation: Each user’s complete timeline is
split into training and test timelines by time.
• Agreement @ k (k = 1, 2, 3): % of RTs for a user where the
first k categories in the recommended sequence and the actual
sequence are an exact match.
• Recommendation algorithms:
• N-count recommendation algorithm (SeqNCSeqRec)
• Bi-gram-based sequence recommender (BiGramSeqRec)
• Popularity-based sequence recommender (PopSeqRec)
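The Agreement@k metric described above can be sketched as follows; the helper names and toy sequences are hypothetical:

```python
def agreement_at_k(recommended, actual, k):
    """1 if the first k categories of the recommended and actual
    sequences match exactly (in order), else 0."""
    return int(recommended[:k] == actual[:k])

def mean_agreement(pairs, k):
    """Percentage of recommendation times (RTs) where Agreement@k holds."""
    return 100.0 * sum(agreement_at_k(r, a, k) for r, a in pairs) / len(pairs)

pairs = [(["Food", "Shopping", "Nightlife"], ["Food", "Shopping", "Travel"]),
         (["Food", "Outdoors", "Travel"], ["Food", "Shopping", "Travel"])]
print(mean_agreement(pairs, 1))  # 100.0 -- both start with "Food"
print(mean_agreement(pairs, 2))  # 50.0  -- only the first pair matches
```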
15. Recommendation Performance (Level 2)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 2 in the hierarchy.
16. Recommendation Performance
• Some level 2 activities are semantically closer than others.
• The ‘true’ performance likely lies between that measured at level 2 and at level 1.
Category hierarchy:
• Level 1 (7 categories): Food, Outdoors, Shopping, Nightlife, Entertainment, Community, Travel
• Level 2 (134 categories), e.g. under Food: Mexican, Asian, Italian, Coffee Shop, South American/Latin, Fish & Chips
• Level 3 (151 categories), e.g.: Starbucks, Dunkin Donuts
17. Recommendation Performance
• Some level 2 activities are semantically closer than others.
• The ‘true’ performance likely lies between that measured at level 2 and at level 1.
Category hierarchy:
• Level 1 (7 categories): Food, Outdoors, Shopping, Nightlife, Entertainment, Community, Travel
• Level 2 (134 categories), e.g.: Live Music, Casino, Art Museum, Ice Skating, Theatre, Zoo
• Level 3 (151 categories)
18. Recommendation Performance (Level 1)
Figure: Mean percentage agreements for recommended sequences for
SeqNCSeqRec (top 10% neighbours) and baseline algorithms using
timelines constructed from categories at level 1 in the hierarchy.
19. Conclusion
• A generic activity recommendation framework to recommend
the next sequence of activities to users based on past activity
patterns and context [Kumar et al., 2014, 2016, 2017].
• Experiments demonstrate the efficacy of our approach in
recommending sequences given a diverse variety of activities
and user activity patterns.
20. Future Work
• Consider alternative approaches to suggest sequences of
activities (for example, using RNNs).
• Introduce new evaluation metrics for evaluating sequence
recommendation.
• Investigate the recommendation of context (for example, where, when, with whom) associated with each suggested sequence of activities.
• Consider socio-economic characteristics, user demographics,
and travel variables.
22. References I
G. Dong and J. Pei. Sequence Data Mining (Advances in Database Systems). Springer-Verlag, New York, NY, USA, 2007. ISBN 0387699368.
G. Kumar, H. Jerbi, C. Gurrin, and M. P. O’Mahony. Towards activity recommendation from lifelogs. In Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services (iiWAS ’14), pages 87–96, New York, NY, USA, 2014. ACM. doi:10.1145/2684200.2684298.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Personalised recommendations for modes of transport: A sequence-based approach. In The 5th ACM SIGKDD International Workshop on Urban Computing (UrbComp 2016), 2016.
G. Kumar, H. Jerbi, and M. P. O’Mahony. Towards the recommendation of personalised activity sequences in the tourism domain. In The 2nd ACM RecSys Workshop on Recommenders in Tourism (RecTour 2017), 2017.
Z. Xing, J. Pei, and E. Keogh. A brief survey on sequence classification. SIGKDD Explorations Newsletter, 12(1):40–48, Nov. 2010. doi:10.1145/1882471.1882478.
36. Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
[Figure: two example timelines (activities between 09:00 and 11:32) aligned at the activity level. Step 1: align activities.]

d_{activity}(T_1, T_2) = \sum_{i=1}^{r} w_{obj} \cdot c_{ins} + \sum_{j=1}^{s} w_{obj} \cdot c_{del} + \sum_{k=1}^{t} w_{obj} \cdot c_{sub}
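The activity-level cost is a weighted edit distance and can be computed with the standard dynamic programme. A minimal sketch, assuming unit default costs and the illustrative name `activity_distance` (not from the slides):

```python
def activity_distance(t1, t2, w_obj=1.0, c_ins=1.0, c_del=1.0, c_sub=1.0):
    """Weighted edit distance between two activity sequences.

    Each insertion, deletion, and substitution is charged w_obj times
    its cost, matching the d_activity formulation on the slide. The
    default weights/costs here are illustrative.
    """
    n, m = len(t1), len(t2)
    # dp[i][j] = distance between the prefixes t1[:i] and t2[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * w_obj * c_del
    for j in range(1, m + 1):
        dp[0][j] = j * w_obj * c_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if t1[i - 1] == t2[j - 1] else w_obj * c_sub
            dp[i][j] = min(dp[i - 1][j] + w_obj * c_del,   # delete from t1
                           dp[i][j - 1] + w_obj * c_ins,   # insert into t1
                           dp[i - 1][j - 1] + sub)         # match/substitute
    return dp[n][m]
```

The same table also yields the activity alignment (by backtracking), which Step 2 below relies on.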
37. Two-level Distance
• Inspired by edit distance
• Adapted for sequence of objects
[Figure: the same two timelines, now aligned at both levels. Step 1: align activities; Step 2: align features.]

d(T_1, T_2) = d_{activity}(T_1, T_2) + \sum_{i=1}^{n} d_{feature}(a_i^{o_1}, a_i^{o_2})
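The second level can be sketched as follows, assuming a toy per-feature distance that counts mismatching feature values and a positional pairing of activities (in the framework the pairing comes from the edit-distance alignment; `feature_distance` and `two_level_distance` are hypothetical names):

```python
def feature_distance(a1, a2):
    """Toy d_feature between two aligned activity objects, represented
    as dicts mapping feature name -> value: count mismatching values."""
    keys = set(a1) | set(a2)
    return sum(1.0 for k in keys if a1.get(k) != a2.get(k))

def two_level_distance(t1, t2, d_activity):
    """d(T1, T2) = d_activity(T1, T2) + sum_i d_feature(a_i^o1, a_i^o2).

    `d_activity` is the precomputed activity-level edit distance; the
    sum runs over the pairs of aligned activities (paired positionally
    here purely for illustration).
    """
    return d_activity + sum(feature_distance(a, b) for a, b in zip(t1, t2))
```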
38. Methodology
Learning the Optimal Matching-Unit Range
• Wrapper attribute selection: C4.5 algorithm, greedy
backward search, and area under the ROC curve as the
evaluation measure.
• Classification: the pruned attribute vector for each user is fed
into a C4.5 induction algorithm to predict the optimal
matching-unit range.
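The greedy backward search can be sketched generically. Here `score` stands in for the evaluation measure (e.g. the area under the ROC curve of a C4.5 tree on the candidate attribute subset); the function name and the stopping policy (drop while no worse) are illustrative assumptions:

```python
def backward_search(attributes, score):
    """Greedy backward attribute selection (a wrapper method).

    `score(attrs)` returns an evaluation measure for a classifier
    trained on the given attribute subset. One attribute is dropped at
    a time as long as doing so does not hurt the score.
    """
    current = list(attributes)
    best = score(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for attr in list(current):
            candidate = [a for a in current if a != attr]
            s = score(candidate)
            if s >= best:  # keep the pruned subset if it is no worse
                current, best = candidate, s
                improved = True
                break
    return current, best
```

A toy `score` that rewards one informative attribute and penalises subset size will, for example, prune everything else away.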
39. Attribute Extraction: Timeline Decomposition
• Each user is represented by an attribute vector.
• For attribute extraction, timelines are decomposed into feature sequences:
[Figure: a user timeline over time (days delimited at 00 hrs), decomposed into one sequence per feature: dist-travel, start-geo, start-time.]
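The decomposition step can be sketched as follows; representing activities as dicts and the name `decompose_timeline` are assumptions for illustration:

```python
def decompose_timeline(timeline,
                       features=("dist-travel", "start-geo", "start-time")):
    """Decompose a user timeline (a list of activity records) into one
    sequence per feature, as in the attribute-extraction step.

    Feature names follow the slide; an activity missing a feature
    contributes None to that feature's sequence.
    """
    return {f: [activity.get(f) for activity in timeline] for f in features}
```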
43. Timeline Attributes
Regularity Attributes: Sample Entropy
1. SampEn^p_z: sample entropy of a feature sequence S_z for epoch length p,
2. μSampEn^p_T: mean sample entropy over all feature sequences S_z, z = 1, 2, ..., m, of timeline T for epoch length p,
3. σSampEn^p_T: standard deviation of the sample entropy over all feature sequences S_z, z = 1, 2, ..., m, of timeline T for epoch length p.
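A minimal sample-entropy sketch for symbolic feature sequences, assuming exact template matching and that the epoch length p has already been applied when the sequence was built:

```python
import math

def sample_entropy(seq, m=2):
    """Sample entropy of a discrete feature sequence.

    SampEn = -ln(A / B), where B counts matching template pairs of
    length m and A counts matching pairs of length m + 1 (exact
    matches, since the features here are symbolic). Lower values mean
    a more regular sequence.
    """
    n = len(seq)
    def matches(length):
        # n - m start points so both counts use the same templates
        templates = [tuple(seq[i:i + length]) for i in range(n - m)]
        return sum(1 for i in range(len(templates))
                     for j in range(i + 1, len(templates))
                     if templates[i] == templates[j])
    B = matches(m)
    A = matches(m + 1)
    if A == 0 or B == 0:
        return float("inf")  # no repetition detected at this scale
    return -math.log(A / B)
```

A perfectly regular sequence scores 0; a sequence with no repeated (m+1)-templates has unbounded entropy under this convention.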
44. Timeline Attributes
Repetition Attributes: k-gram attributes
Previously used for sequence classification, biological sequence
analysis and text classification [Xing et al., 2010; Dong and Pei, 2007].
1. η^k_z: number of distinct k-grams in feature sequence S_z, normalised by the total number of k-grams occurring in S_z,
2. μf^k_z: mean frequency of occurrence of the distinct k-grams in feature sequence S_z, normalised by the total number of k-grams occurring in S_z,
3. σf^k_z: standard deviation of the frequency of occurrence of the distinct k-grams in feature sequence S_z, normalised by the length of S_z.
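The three attributes can be computed directly from k-gram counts. A sketch, assuming the population standard deviation (the slides do not say which variant is used) and the hypothetical name `kgram_attributes`:

```python
from collections import Counter
from statistics import mean, pstdev

def kgram_attributes(seq, k=2):
    """Compute the three k-gram attributes (eta, mu_f, sigma_f) for one
    feature sequence:

    eta     : #distinct k-grams / #k-grams in the sequence,
    mu_f    : mean frequency of the distinct k-grams / #k-grams,
    sigma_f : population std. dev. of those frequencies / len(seq).
    """
    grams = [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(grams)
    total = len(grams)
    eta = len(counts) / total
    mu_f = mean(counts.values()) / total
    sigma_f = pstdev(counts.values()) / len(seq)
    return eta, mu_f, sigma_f
```

For example, the sequence a, b, a, b with k = 2 has k-grams ab, ba, ab: two distinct out of three.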
45. Trend across matching-unit N
Figure: MRR versus matching unit for three user groups (Group1 [0], Group2 [1,4], Group3 [5+)) for the first sequence index. (Axes: matching-unit count N vs. mean % agreement.)