This document describes a new technique called "Latent Cross" for incorporating contextual data into recurrent neural network (RNN) recommender systems more effectively. The authors first demonstrate that modeling context as direct features in feed-forward neural networks is inefficient at capturing common feature interactions. They then apply this insight to design an improved RNN recommender system that uses Latent Cross. Latent Cross embeds the context and performs an element-wise product of the context embedding with the RNN's hidden states, allowing the model to better understand how context affects recommendations. The authors evaluate their approach on a large-scale RNN recommender system at YouTube and show that Latent Cross improves recommendation performance over conventional techniques.
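The gating operation at the heart of Latent Cross is compact enough to sketch. Below is a minimal NumPy illustration of multiplying the RNN hidden state by (1 + context embedding); the dimensions, the hour-of-day context, and all values are toy assumptions for illustration, not the production YouTube model.

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_dim = 8
h_t = rng.standard_normal(hidden_dim)               # RNN hidden state at step t
context_vocab = 24                                   # e.g. hour-of-day buckets (assumed)
context_emb = rng.standard_normal((context_vocab, hidden_dim)) * 0.1

def latent_cross(h, context_id):
    """Gate the hidden state element-wise by (1 + w_ctx)."""
    w = context_emb[context_id]
    return (1.0 + w) * h    # reduces to the identity when w == 0

h_gated = latent_cross(h_t, context_id=13)
print(h_gated.shape)
```

Because the gate is (1 + w) rather than w alone, a zero context embedding leaves the hidden state unchanged, so the model can learn to ignore uninformative context.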
A recommender system using novel deep network collaborative filtering (IAES IJAI)
The recommendation model aims to predict a user's preferred items among millions by analyzing user-item relations. Collaborative filtering has been one of the most successful recommendation approaches in recent years; however, it suffers from sparsity. This research work develops deep network collaborative filtering (DeepNCF), which incorporates a graph neural network (GNN) and novel network collaborative filtering (NCF) for performance enhancement. First, a user-item dual network is constructed; thereafter a custom weighted dual-mode modularity is developed for edge clustering, and the GNN is utilized to capture the complex relations between users and items. DeepNCF is evaluated on two distinct datasets: experimental analysis is carried out on the Amazon and MovieLens datasets for Recall@20 and Recall@50, and the normalized discounted cumulative gain (NDCG) metric is evaluated on the Amazon dataset at NDCG@20 and NDCG@50. The proposed method outperforms the most relevant prior research and is accurate enough to give personalized and diverse recommendations.
Big data cloud-based recommendation system using NLP techniques with machine ... (TELKOMNIKA JOURNAL)
Recommendation systems (RS) are crucial for social networking sites; without them, finding the right products is harder. However, existing systems lack adequate efficiency, especially with big data. This paper presents a prototype cloud-based recommendation system for processing big data. The proposed work is implemented using the matrix factorization method with three approaches. In the first approach, singular value decomposition (SVD), a traditional recommendation technique, is used. The second approach is fine-tuned using the alternating least squares (ALS) algorithm with Apache Spark. Finally, a deep neural network (DNN) is utilized with TensorFlow. This study addresses the challenge of handling large-scale datasets in the collaborative filtering (CF) technique by tuning the algorithms' parameters in the second approach, which uses machine learning, as well as in the third approach, which uses deep learning. The results of these two approaches outperformed conventional techniques and achieved acceptable computational time. The dataset is about 1.5 GB and was collected from the Goodreads website API. Moreover, the Hadoop distributed file system (HDFS) is used as cloud storage instead of the computer's local disk, to handle larger dataset sizes in the future.
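The SVD approach in the first stage can be sketched with a few lines of NumPy: factorize the rating matrix, keep the top-k singular values, and read predicted scores off the low-rank reconstruction. The matrix, its size, and k are toy assumptions, not the paper's Goodreads data.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); sizes are illustrative only.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                          # keep the top-k latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # low-rank reconstruction

# Predicted score for user 0 on the unrated item 2:
print(round(R_hat[0, 2], 2))
```

In practice the zeros should be treated as missing rather than observed ratings, which is exactly the gap that ALS-style weighted factorization (the paper's second approach) addresses.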
On Using Network Science in Mining Developers Collaboration in Software Engin... (IJDKP)
Background: Network science is the set of mathematical frameworks, models, and measures used to understand a complex system modeled as a network composed of nodes and edges. The nodes of a network represent entities and the edges represent relationships between these entities. Network science has been used in many research works for mining human interaction during different phases of software engineering (SE). Objective: The goal of this study is to identify, review, and analyze published research works that used network analysis as a tool for understanding human collaboration at different levels of software development. This study and its findings are expected to benefit software engineering practitioners and researchers who mine software repositories using tools from the network science field. Method: We conducted a systematic literature review, in which we analyzed a set of papers selected from different digital libraries based on inclusion and exclusion criteria. Results: We identified 35 primary studies (PSs) from four digital libraries, then extracted data from each PS according to a predefined data extraction sheet. Our analysis showed that not all of the constructed networks used in the PSs were valid, as their edges did not reflect a real relationship between the entities of the network. Additionally, the measures used in the PSs were in many cases not suitable for the networks in question, and the reported analysis results were, in most cases, not validated using any statistical model. Finally, many of the PSs did not provide lessons or guidelines for software practitioners that could improve software engineering practice. Conclusion: Although employing network analysis in mining developers' collaboration showed satisfactory results in some of the PSs, its application needs to be conducted more carefully. That said, the constructed network should be representative and meaningful, the chosen measure needs to suit the context, and validation of the results should be considered. Moreover, we identify research gaps in which network science can be applied, with pointers to recent advances that can be used to mine collaboration networks.
An efficient information retrieval ontology system based indexing for context (eSAT Journals)
Abstract: Many research and development projects are undertaken, and a vast range of artifacts are released, such as articles, patents, research reports, conference papers, journal papers, and experimental data. Searching a repository for a particular context through keywords is not an easy task, because earlier systems suffer from huge recall with low precision. This paper sets out to construct an ontology-based search algorithm to retrieve the relevant contexts. Ontologies are a rich source of knowledge for retrieving context; in this paper, we utilize the WordNet ontology to retrieve the relevant contexts from the document repository. Since it is very difficult to retrieve the relevant context in its original format, we use a pre-processing step, which helps retrieval. The pre-processing step includes two major stages: stop-word removal and stemming. Its outcome is an index consisting of important keywords and their corresponding contexts. When the user enters a keyword, the ontology performs several steps to refine it; finally, the refined keywords are matched against the index and the relevant contexts are retrieved. Experiments are carried out on different sets of contexts, and the performance of the proposed approach is estimated with evaluation metrics such as precision, recall, and F-measure. Keywords: Ontologies; WordNet; contexts; stemming; indexing.
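The pre-processing and indexing pipeline described above can be sketched in plain Python. The tiny stop-word list and the crude suffix stripper below are stand-ins for the WordNet-backed processing the paper describes; the documents and all names are invented for illustration.

```python
# Assumed: a tiny stop-word list and a naive suffix stemmer, not WordNet.
STOP = {"the", "is", "a", "of", "and", "to", "in"}

def stem(word):
    """Strip a common suffix if enough of the word remains (very naive)."""
    for suf in ("ing", "ed", "es", "s"):
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)]
    return word

def preprocess(text):
    """Stop-word removal followed by stemming, as in the paper's pipeline."""
    return [stem(w) for w in text.lower().split() if w not in STOP]

def build_index(docs):
    """Inverted index: stemmed keyword -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index.setdefault(term, set()).add(doc_id)
    return index

docs = {1: "indexing of research papers", 2: "retrieving relevant contexts"}
index = build_index(docs)
print(index["paper"])   # "papers" was stemmed to "paper"
```

A query would be refined the same way (stop-word removal plus stemming) before being matched against the index, so "papers" and "paper" retrieve the same documents.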
Multi-objective NSGA-II based community detection using dynamical evolution s... (IJECE IAES)
Community detection is becoming a highly demanded topic in social-networking applications. It involves finding sub-graphs with maximum intra-connections and minimum inter-connections in a given social network. Many approaches have been developed for community detection, but few of them focus on the dynamical aspect of the social network. The community decision has to consider the pattern of changes in the social network and be smooth enough to enable smooth operation of applications that depend on community detection. Unlike existing dynamical community detection algorithms, this article presents a non-domination-aware searching algorithm designated non-dominated sorting based community detection with dynamical awareness (NDS-CD-DA). The algorithm uses the non-dominated sorting genetic algorithm NSGA-II with two objectives: modularity and normalized mutual information (NMI). Experimental results on synthetic networks and real-world social network datasets are compared with a classical genetic algorithm with a single objective, and show superiority in terms of both domination and convergence. NDS-CD-DA accomplishes a domination percentage of 100% over dynamic evolutionary community searching (DECS) for almost all iterations.
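Modularity, one of the two objectives above, has a short closed form: the fraction of edges inside communities minus the fraction expected under a random rewiring that preserves degrees. A minimal sketch on a toy graph (two triangles joined by one edge; the graph and partition are invented for illustration):

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected graph."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    comm_of = {n: i for i, c in enumerate(communities) for n in c}
    # edges whose endpoints share a community
    q = sum(1.0 for u, v in edges if comm_of[u] == comm_of[v])
    # minus the expected within-community edges under the configuration model
    for c in communities:
        q -= sum(deg.get(n, 0) for n in c) ** 2 / (4.0 * m)
    return q / m

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 3))   # -> 0.357
```

NSGA-II would score each candidate partition with this function alongside NMI against the previous time step's partition, then keep the non-dominated front.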
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA... (ijcseit)
This paper presents technical discussions on the identification and analysis of LAN user-sessions. Identifying a user-session is non-trivial. Classical approaches rely on threshold-based mechanisms, which are very sensitive to the chosen threshold value, and that value may be difficult to set correctly. We use clustering techniques to define a novel methodology that identifies LAN user-sessions without requiring an a priori definition of threshold values. We define the clustering-based approach in detail, discuss its strengths and weaknesses, and apply it to real traffic traces. The proposed methodology is also applied to artificially generated traces to evaluate its benefits against traditional threshold-based approaches. Finally, we analyze the characteristics of the user-sessions extracted from real traces by the clustering methodology and study their statistical properties.
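One way to realize the threshold-free idea, sketched here under assumptions of our own (the paper does not specify this exact algorithm), is to run 2-means on inter-arrival gaps: the cluster of large gaps marks session boundaries, with the split point learned from the data instead of hand-picked.

```python
def two_means_1d(xs, iters=50):
    """Plain 2-means on scalars; returns the two cluster centroids."""
    c = [min(xs), max(xs)]                     # initial centroids
    for _ in range(iters):
        groups = ([], [])
        for x in xs:
            groups[abs(x - c[0]) > abs(x - c[1])].append(x)
        c = [sum(g) / len(g) if g else c[i] for i, g in enumerate(groups)]
    return c

# Toy packet timestamps (seconds): three bursts of activity.
timestamps = [0, 2, 3, 5, 300, 302, 304, 650, 651, 655]
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
c = two_means_1d(gaps)
boundary = sum(c) / 2          # split point between the two centroids
sessions = 1 + sum(g > boundary for g in gaps)
print(sessions)                # -> 3
```

The boundary here falls out of the gap distribution itself, which is the property the paper's clustering methodology exploits against fixed-timeout baselines.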
SENTIMENT ANALYSIS FOR MOVIES REVIEWS DATASET USING DEEP LEARNING MODELS (IJDKP)
Due to the enormous amount of data and opinions being produced, shared, and transferred every day across the internet and other media, sentiment analysis has become vital for developing opinion-mining systems. This paper introduces a classification approach to sentiment analysis using deep learning networks and presents comparative results for different deep learning networks. A multilayer perceptron (MLP) was developed as a baseline for the other networks' results. A long short-term memory (LSTM) recurrent neural network and a convolutional neural network (CNN), in addition to a hybrid model of LSTM and CNN, were developed and applied to the IMDB dataset, which consists of 50K movie-review files, split into 50% positive and 50% negative reviews. The data was initially pre-processed using Word2Vec, and word embedding was applied accordingly. The results show that the hybrid CNN_LSTM model outperformed the MLP and the individual CNN and LSTM networks: CNN_LSTM reported an accuracy of 89.2%, CNN 87.7%, and MLP and LSTM 86.74% and 86.64% respectively. Moreover, the proposed deep learning models also outperformed the SVM, Naïve Bayes, and RNTN results published in other works using English datasets.
A Novel Approach for Travel Package Recommendation Using Probabilistic Matrix... (IJSRD)
Recent years have witnessed increased interest in recommendation systems. Classification techniques are supervised methods that assign data items to predefined classes. In an existing system, unsupervised constraints are automatically derived from a hidden tourist-area-season topic (TAST) model for tourists in a travel group. This is extended to an alternative TRAST model, which captures the unique characteristics of travel data together with a cocktail approach.
Proactive Intelligent Home System Using Contextual Information and Neural Net... (IJERA)
Nowadays, cities around the world intend to use information technology to improve the lives of their citizens. Future smart cities will incorporate digital data and technology to interact differently with their human inhabitants. Among the key components of a smart city is the smart home: an autonomic environment that can provide various smart services by considering the user's context information. Several methods are used in context-aware systems to provide such services. In this paper, we propose an approach that offers the most relevant services to the user upon any significant change in his context environment. The proposed approach is based on the use of context-history information together with user profiling and machine learning techniques. Experiments show that the proposed solution can efficiently provide the most useful services to the user in an intelligent home environment.
Following the user's interests in mobile context aware recommender systems (Bouneffouf Djallel)
The wide development of mobile applications provides a considerable amount of data of all types (images, texts, sounds, videos, etc.). In this context, mobile context-aware recommender systems (MCRS) suggest suitable information to the user depending on her/his situation and interests. Two key questions have to be considered: 1) how to recommend information that follows the evolution of the user's interests, and 2) how to model the user's situation and its related interests. To the best of our knowledge, no existing MCRS tries to answer both questions as we do. This paper describes ongoing work on the implementation of an MCRS based on the hybrid-ε-greedy algorithm we propose, which combines the standard ε-greedy algorithm with both content-based filtering and case-based reasoning techniques.
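The ε-greedy core that the hybrid algorithm builds on fits in a few lines: with probability ε explore a random item, otherwise exploit the best-scoring one. The scores, ε, and arm count below are toy assumptions; the paper's hybrid variant would additionally bias exploration using content-based and case-based signals.

```python
import random

def epsilon_greedy(scores, epsilon, rng):
    """Pick an arm: explore uniformly with prob. epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))                       # explore
    return max(range(len(scores)), key=scores.__getitem__)      # exploit

rng = random.Random(42)
scores = [0.2, 0.9, 0.5]   # e.g. content-based relevance in the current situation
picks = [epsilon_greedy(scores, 0.1, rng) for _ in range(1000)]
print(picks.count(1))      # the best arm dominates; ~10% of picks are random
```

In a real recommender the scores would be updated from click feedback, so exploration is what lets the system track the interest drift raised in question 1 above.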
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... (IJAIEM)
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats migrate from traditional PC clients to mobile devices. Compared with the traditional Windows-Intel alliance on the PC, the Android alliance dominates the mobile internet, and apps have replaced PC client software as the foremost target of malicious usage. In this paper, to improve the security status of recent mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based ones, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable items communicating via standard protocols. Accordingly, to prepare us for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. We propose an efficient multidimensional fusion algorithm for IoT data based on partitioning; attribute reduction and rule extraction methods are then used to obtain the synthesis results. By proving a few theorems and by simulation, the correctness and effectiveness of this algorithm are illustrated. Finally, this paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very hefty but quite easy to generate and use; they can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta-classifiers. The results demonstrate that LIME classifiers can significantly increase classification accuracy, performing better than both the base classifiers and the standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble Meta classifiers, Internet of Things, Big data
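The tier structure behind ensemble meta-classifiers reduces, at its simplest, to a vote over base-classifier outputs. The sketch below shows only that voting step, with invented labels; a LIME classifier stacks many such tiers, each voting over the tier beneath it.

```python
from collections import Counter

def majority_vote(predictions):
    """Tier-2 combiner: the most common label among base-classifier outputs."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base classifiers disagree on one app sample:
base_outputs = ["malicious", "benign", "malicious"]
print(majority_vote(base_outputs))   # -> malicious
```

The accuracy gain reported above comes from this aggregation averaging out the independent errors of the base classifiers, which is why the approach pays off most when the ensemble is very large.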
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS (IJNSA Journal)
The emerging need for qualitative approaches in context-aware information processing calls for proper modelling of context information and efficient handling of its inherent uncertainty, resulting from human interpretation and usage. Many current approaches to context-awareness either lack a solid theoretical basis for modelling or ignore important requirements such as modularity, high-order uncertainty management, and group-based context-awareness; therefore, their real-world application and extendibility remain limited. In this paper, we present f-Context, a service-based context-awareness framework grounded in language-action perspective (LAP) theory for modelling. We then identify some of the complex, informational parts of context that contain high-order uncertainties due to differences between members of the group in defining them. An agent-based perceptual computer architecture is proposed for implementing f-Context that uses computing with words (CWW) for handling uncertainty. The feasibility of f-Context is analyzed using a realistic scenario involving a group of mobile users. We believe the proposed approach can open the door to future research on context-awareness by offering a theoretical foundation based on human communication, and a service-based layered architecture that exploits CWW for context-aware, group-based, and platform-independent access to information systems.
Parallel multivariate deep learning models for time-series prediction: A comp... (IAES IJAI)
This study investigates deep learning models for financial data prediction and examines whether the architecture of a deep learning model and the properties of time-series data affect prediction accuracy. The main methodology is to compare the performance of a convolutional neural network (CNN), long short-term memory (LSTM), Stacked-LSTM, CNN-LSTM, and convolutional LSTM (ConvLSTM) as prediction approaches on a collection of financial time-series data. Only deep learning architectures that can predict multivariate time-series datasets in parallel are considered. This research uses the daily movements of four Asian stock market indices from 1 January 2020 to 31 December 2020; using data from the early phase of the Covid-19 pandemic, which created worldwide economic turmoil, is intended to stress-test the performance of the analyzed models. The experimental results and analysis indicate that no single deep learning model consistently makes the most accurate predictions across all the financial datasets. In addition, a single deep learning model tends to provide more accurate predictions for more stable time-series data, while a hybrid model is preferred for more chaotic time-series data.
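All five architectures compared above consume the same supervised framing: a lookback window of multivariate observations mapped to the next step's values. A minimal sketch of that windowing, with a toy series of invented prices standing in for the four stock indices:

```python
def make_windows(series, lookback):
    """Turn a multivariate series into (window, next-step) training pairs."""
    X, y = [], []
    for t in range(len(series) - lookback):
        X.append(series[t : t + lookback])   # lookback steps, all indices
        y.append(series[t + lookback])       # next day's values to predict
    return X, y

# Toy data: 6 days x 4 indices (invented closing prices).
series = [[i + j / 10 for j in range(4)] for i in range(6)]
X, y = make_windows(series, lookback=3)
print(len(X), len(X[0]), len(y[0]))   # -> 3 3 4  (samples, window, targets)
```

A CNN would convolve over the window axis of X, an LSTM would iterate over it, and predicting all four columns of y at once is what makes the setup "parallel multivariate".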
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically convolutional neural networks (CNNs). These tasks have numerous practical applications, such as medical imaging, autonomous driving, and surveillance. CNNs are capable of learning complex features directly from images and achieve outstanding performance across several datasets. In this work, we utilize three different datasets to investigate the efficacy of various pre-processing and classification techniques in accurately segmenting and classifying different structures within MRI and natural images. We use both sample gradient and Canny edge detection methods for pre-processing, and K-means clustering is applied to segment the images. Image augmentation improves the size and diversity of the datasets used to train the classification models. This work highlights the effectiveness of transfer learning in image classification using CNNs and VGG16, and provides insights into the selection of pre-trained models and hyperparameters for optimal performance. We propose a comprehensive approach for image segmentation and classification, incorporating pre-processing techniques, the K-means algorithm for segmentation, and deep learning models such as CNN and VGG16 for classification.
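The K-means segmentation step reduces, in its simplest form, to clustering pixel intensities. The sketch below runs K-means on a toy 1-D list of grey levels with deterministic initialization; the pixel values and k are invented for illustration, not taken from the paper's MRI data.

```python
def kmeans_1d(pixels, k=3, iters=20):
    """K-means on grey-level intensities with spread-out initial centers."""
    xs = sorted(pixels)
    centers = [xs[i * (len(xs) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

pixels = [10, 12, 11, 200, 198, 205, 90, 95, 92]   # dark, mid, bright regions
centers = kmeans_1d(pixels, k=3)
print([round(c) for c in centers])   # -> [11, 92, 201]
```

Each pixel is then labeled with its nearest center, which is what turns the intensity clusters into segmented regions; on real images one would typically cluster (intensity, position) or color vectors instead of bare grey levels.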
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ...Editor IJAIEM
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
In unpredictable increase in mobile apps, more and more threats migrate from outmoded PC client to mobile device. Compared
with traditional windows Intel alliance in PC, Android alliance dominates in Mobile Internet, the apps replace the PC client
software as the foremost target of hateful usage. In this paper, to improve the confidence status of recent mobile apps, we
propose a methodology to estimate mobile apps based on cloud computing platform and data mining. Compared with
traditional method, such as permission pattern based method, combines the dynamic and static analysis methods to
comprehensively evaluate an Android applications The Internet of Things (IoT) indicates a worldwide network of
interconnected items uniquely addressable, via standard communication protocols. Accordingly, preparing us for the
forthcoming invasion of things, a tool called data fusion can be used to manipulate and manage such data in order to improve
progression efficiency and provide advanced intelligence. In this paper, we propose an efficient multidimensional fusion
algorithm for IoT data based on partitioning. Finally, the attribute reduction and rule extraction methods are used to obtain the
synthesis results. By means of proving a few theorems and simulation, the correctness and effectiveness of this algorithm is
illustrated. This paper introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for
big data. These classifiers are very hefty, but are quite easy to generate and use. They can be so large that it makes sense to use
them only for big data. Our experiments compare LIME classifiers with various vile classifiers and standard ordinary ensemble
Meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of
classifications. LIME classifiers made better than the base classifiers and standard ensemble Meta classifiers.
Keywords: LIME classifiers, ensemble Meta classifiers, Internet of Things, Big data
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESSIJNSA Journal
The emerging need for qualitative approaches in context-aware information processing calls for proper modelling of context information and efficient handling of its inherent uncertainty resulted from human interpretation and usage. Many of the current approaches to context-awareness either lack a solid theoretical basis for modelling or ignore important requirements such as modularity, high-order uncertainty management and group-based context-awareness. Therefore, their real-world application and extendibility remains limited. In this paper, we present f-Context as a service-based contextawareness framework, based on language-action perspective (LAP) theory for modelling. Then we identify some of the complex, informational parts of context which contain high-order uncertainties due to differences between members of the group in defining them. An agent-based perceptual computer architecture is proposed for implementing f-Context that uses computing with words (CWW) for handling uncertainty. The feasibility of f-Context is analyzed using a realistic scenario involving a group of mobile users. We believe that the proposed approach can open the door to future research on context-awareness by offering a theoretical foundation based on human communication, and a service-based layered architecture which exploits CWW for context-aware, group-based and platform-independent access to information systems.
Parallel multivariate deep learning models for time-series prediction: A comp...IAESIJAI
This study investigates deep learning models for financial data prediction and examines whether the architecture of a deep learning model and time-series data properties affect prediction accuracy. Comparing the performance of convolutional neural network (CNN), long short-term memory (LSTM), Stacked-LSTM, CNN-LSTM, and convolutional LSTM (ConvLSTM) when used as a prediction approach to a collection of financial time-series data is the main methodology of this study. In this instance, only those deep learning architectures that can predict multivariate time-series data sets in parallel are considered. This research uses the daily movements of 4 (four) Asian stock market indices from 1 January 2020 to 31 December 2020. Using data from the early phase of the spread of the Covid-19 pandemic that has created worldwide economic turmoil is intended to validate the performance of the analyzed deep learning models. Experiment results and analytical findings indicate that there is no superior deep learning model that consistently makes the most accurate predictions for all states' financial data. In addition, a single deep learning model tends to provide more accurate predictions for more stable time-series data, but the hybrid model is preferred for more chaotic time-series data.
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically Convolutional Neural Networks (CNNs). These tasks have numerous
practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable
of learning complex features directly from images and achieving outstanding performance across several
datasets. In this work, we have utilized three different datasets to investigate the efficacy of various preprocessing and classification techniques in accurssedately segmenting and classifying different structures
within the MRI and natural images. We have utilized both sample gradient and Canny Edge Detection
methods for pre-processing, and K-means clustering have been applied to segment the images. Image
augmentation improves the size and diversity of datasets for training the models for image classification
Image segmentation and classification tasks in computer vision have proven to be highly effective using neural networks, specifically Convolutional Neural Networks (CNNs). These tasks have numerous
practical applications, such as in medical imaging, autonomous driving, and surveillance. CNNs are capable
of learning complex features directly from images and achieving outstanding performance across several
datasets. In this work, we have utilized three different datasets to investigate the efficacy of various preprocessing and classification techniques in accurssedately segmenting and classifying different structures
within the MRI and natural images. We have utilized both sample gradient and Canny Edge Detection
methods for pre-processing, and K-means clustering have been applied to segment the images. Image
augmentation improves the size and diversity of datasets for training the models for image classification.
This work highlights transfer learning’s effectiveness in image classification using CNNs and VGG 16 that
provides insights into the selection of pre-trained models and hyper parameters for optimal performance.
We have proposed a comprehensive approach for image segmentation and classification, incorporating preprocessing techniques, the K-means algorithm for segmentation, and employing deep learning models such
as CNN and VGG 16 for classification.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects to see demand and the changing evolution of supply, facilitated through institutional investment rotation out of offices and into work from home (“WFH”), while the ever-expanding need for data storage as global internet usage expands, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...2023240532
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Opendatabay - Open Data Marketplace.pptxOpendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
First ever open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay AI-driven features streamline the data workflow. Finding the data you need shouldn't be a complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with a dedicated, AI-generated, synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
We offer a theoretical interpretation of the limitations of modeling
context as direct features, particularly using feed-forward neural
networks as the example baseline DNN approach. We then offer
an easy-to-use technique to incorporate these features that results
in improved prediction accuracy, even within a more complicated
RNN model.
Our contributions are:
• First-Order Challenges: We demonstrate the challenges of
first-order neural networks to model low-rank relationships.
• Production Model: We describe how we have constructed
a large-scale RNN recommender system for YouTube.
• Latent Cross: We offer a simple technique, called the "Latent
Cross", to include contextual features more expressively in
our model. Specifically, latent cross performs an element-
wise product between the context embedding and the neural
network hidden states.
• Empirical Results: We offer empirical results verifying that
our approach improves recommendation accuracy.
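The element-wise product described in the "Latent Cross" bullet can be sketched in a few lines. The (1 + w) parameterization and the vector sizes below are illustrative assumptions, not the production model's:

```python
import numpy as np

# Sketch of the latent cross: a context embedding w_t modulates the
# hidden state h element-wise. The (1 + w) form is an assumption for
# illustration; it means a zero context embedding leaves the hidden
# state unchanged.
def latent_cross(hidden, context_embedding):
    return (1.0 + context_embedding) * hidden

h = np.array([0.2, -0.5, 1.0])       # hidden state (illustrative)
w_t = np.array([0.1, 0.0, -0.3])     # context embedding (illustrative)
modulated = latent_cross(h, w_t)     # context scales each hidden unit
```

Because the product is element-wise, the context acts as a per-dimension gate on the hidden state rather than as just another additive input feature.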
2 RELATED WORK
We begin with a survey of the various related research. An overview
can be seen in Table 1.
Contextual Recommendation. A significant amount of research
has focused on using contextual data during recommendation. In
particular, certain types of contextual data have been explored in
depth, whereas others have been treated abstractly. For example, temporal dynamics in recommendation have been explored
widely [6]. During the Netflix Prize [4], Koren [29] discovered the
significant long-ranging temporal dynamics in the Netflix data set
and added temporal features to his Collaborative Filtering (CF)
model to account for these effects. Researchers have also explored how preferences evolve in shorter time-scales, e.g., sessions [39].
More general abstractions have been used to model preference evo-
lution for recommendation such as point processes [15] and recur-
rent neural networks [43]. Similarly, modeling user actions along
with geographical data has been widely explored with probabilistic
models [2, 8], matrix factorization [32], and tensor factorization [17].
A variety of methods have built on matrix and tensor factorization
for cross domain learning [45, 46]. Methods like factorization ma-
chines [34] and other contextual recommenders [22, 37, 48] have
provided generalizations of these collaborative filtering approaches.
Neural Recommender Systems. As neural networks have grown
in popularity for computer vision and natural language processing
(NLP) tasks, recommender systems researchers have begun apply-
ing DNNs to recommendation. Early iterations focused on directly
applying the collaborative filtering intuition to neural networks,
such as through an auto-encoder [36] or joint deep and CF models
[20]. More complex networks have been devised to incorporate
a wider variety of input features [11]. Cheng et al. [7] address
this problem with a linear model that captures interactions among
contextual features, outside the DNN portion of the model.
There has been a recent growth in using recurrent neural net-
works for recommendation [21, 25, 39, 43]. [25, 43] include temporal
information as features and supervision in their models, and [41]
includes general context features. However, in both cases, these
features are concatenated with the input, which we show provides limited benefit. Concurrent and independent research [49] improved
LSTMs by multiplicatively incorporating temporal information, but
did not generalize this to other context data.
Second-order Neural Networks. A major thrust of this paper is
the importance of multiplicative relations in neural recommenders.
These second-order units show up in a few places in neural net-
works. Recurrent units, e.g., LSTMs [23] and GRUs [10], are com-
mon second-order units with gating mechanisms that use an element-
wise multiplication. A more complete tutorial on recurrent net-
works can be found in [18].
Additionally, softmax layers at the top of networks for classifica-
tion are explicitly bi-linear layers between the embedding produced
by the DNN and embeddings of the label classes. This technique
has been extended in multiple papers to include user-item bi-linear
layers on top of DNNs [20, 41, 43, 47].
Similar to the technique described in this paper is a body of work
on multiplicative models [27, 44]. These multiplicative structures
have most commonly been used in natural language processing as
in [14, 27]. The NLP approach was applied to personalized modeling
of reviews [40] (with a slightly different mathematical structure).
Recently, [25] uses the multiplicative technique not over contex-
tual data but directly over users, similar to a tensor factorization.
PNN [33] and NFM [19] push this idea to an extreme, multiplying all pairs of features at the input and either concatenating
or averaging the results before passing through a feed-forward
network. The intuition of these models is similar to ours, but they differ
in that we focus on the relationship between contextual data and
user actions, our latent crossing mechanism can be and is applied
throughout the model, and we demonstrate the importance of these
interactions even within an RNN recommender system.
More complex model structures like attention models [3], mem-
ory networks [38], and meta-learning [42] also rely on second-order
relations and are increasingly popular. Attention models, for in-
stance, use attention vectors that modulate the hidden states with
a multiplication. However, these methods are significantly more
complex structurally and are generally found to be much more
difficult to train. In contrast, we found the latent cross technique proposed in this paper to be easy to train and effective in practice.
3 MODELING PRELIMINARIES
We consider a recommender system in which we have a database
E of events e which are k-way tuples. We consider eℓ to refer to
the ℓth value in the tuple and eℓ̄ to refer to the other k − 1 values
in the tuple.
For example, the Netflix Prize setting would be described by tu-
ples e ≡ (i, j,R) where user i gave movie j a rating of R. We may also
have context such as time and device such that e ≡ (i, j,t,d) where
user i watched video j at time t on device type d. Note, each value
can be either discrete categorical variables, as in there are N users
where i ∈ I, or continuous, e.g., t is a unix timestamp. Continuous
variables are not uncommonly discretized as a preprocessing step,
such as to convert t to only the day on which the event took place.
With this data, we can frame recommender systems as trying to
predict one value of the event given the others. For example, the
Netflix Prize task was: for a tuple e = (i, j,R), use (i, j) to predict R. From a
                          Latent     FM        CF          TF          RRN      YT    NCF   PNN      Context-  Time-LSTM
                          Cross RNN  [22, 34]  [29,32,48]  [17,37,46]  [21,43]  [11]  [20]  [19,33]  RNN [41]  [49]
Neural Network            ✓          ×         ×           ×           ✓        ✓     ✓     ✓        ✓         ✓
Models Sequences          ✓          ×         ×           ×           ✓        ×     ×     ×        ✓         ✓
Uses Context              ✓          ✓         ✓           ✓           time     ✓     ×     ✓        ✓         time
Multiplicative User/Item  ✓          ✓         ✓           ✓           ✓        ×     ✓     ✓        ✓         ✓
Multiplicative Context    ✓          ✓         ✓           ✓           ×        ×     ×     ✓        ×         time
Table 1: Relationship with related recommenders: We bridge the intuition and insights from contextual collaborative filtering
with the power of recurrent recommender systems.
Symbol Description
e Tuple of k values describing an observed event
eℓ Element ℓ in the tuple
E Set of all observed events
ui , vj Trainable embeddings of user i and item j
Xi All events for user i
Xi,t All events for user i before time t
e(τ ) Event at step τ in a particular sequence
⟨·⟩ k-way inner product
∗ Element-wise product
f (·) An arbitrary neural network
Table 2: Notation
machine learning perspective, we can split our tuple e into features
x and label y such that x = (i, j) and y = R.
We could further re-frame the recommendation problem as pre-
dicting which video a user will watch at a given time by defining
x = (i,t) and y = j. Note, again, depending on whether the label is a
categorical value, e.g., video ID, or a real value, e.g., rating, the
machine learning problem is either a classification or regression
problem, respectively.
In factorization models, all input values are considered to be
discrete, and are embedded and multiplied. When we “embed” a
discrete value, we learn a dense latent representation, e.g., user
i is described by dense latent vector ui and item j is described
by dense latent vector vj. In matrix factorization models, predictions
are generally based on ui · vj. In tensor factorization models,
predictions are based on ∑r ui,r vj,r wt,r, where wt would be
a dense vector embedding of time or some other contextual feature.
See factorization machines [34] for a clean and clear abstraction
for these types of models. For notational simplicity we will
consider ⟨·⟩ to represent a multi-dimensional inner-product, i.e.,
⟨ui, vj, wt⟩ = ∑r ui,r vj,r wt,r.
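As a concrete reading of the ⟨·⟩ notation, the multi-way inner product is just an element-wise product of the factor vectors summed over the latent dimension r (the vectors below are illustrative):

```python
import numpy as np

# k-way inner product <u_i, v_j, w_t> = sum_r u_{i,r} * v_{j,r} * w_{t,r}
def kway_inner(*vectors):
    out = np.ones_like(vectors[0])
    for v in vectors:
        out = out * v        # element-wise product across all modes
    return float(out.sum())  # sum over the latent dimension r

u_i = np.array([1.0, 2.0])
v_j = np.array([0.5, 1.0])
w_t = np.array([2.0, 3.0])
score = kway_inner(u_i, v_j, w_t)  # 1*0.5*2 + 2*1*3 = 7.0
```

With only two arguments this reduces to the matrix-factorization score ui · vj.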
Neural networks typically also embed discrete inputs. That is,
given an input (i, j) the input to the network will be x = [ui ;vj ]
where ui and vj are concatenated and as before are parameters
that are trainable (in the case of neural networks, through back-
propagation). Thus, we consider neural networks of the form eℓ =
f (eℓ̄), where the network takes all but one value of the tuple as
the input and we train f to predict the last value of the tuple. We
will later expand this definition to allow the model to take rele-
vant previous events also as inputs to the network, as in sequence
models.
4 MOTIVATION: CHALLENGES IN
FIRST-ORDER DNN
In order to understand how neural recommenders make use of
concatenated features, we begin by inspecting the typical building
blocks of these networks. As alluded to above, neural networks,
especially feed-forward DNNs, are often built on first-order opera-
tions. More precisely stated, neural networks often rely on matrix-
vector products of the formWh whereW is a learned weight matrix
and h is an input (either an input to the network, or the output of a
previous layer). In feed-forward networks, fully-connected layers
are generally of the form:
hτ = g(Wτ hτ−1 + bτ ) (1)
where g is an element-wise operation such as a sigmoid or ReLU,
hτ−1 is the output of the last layer, and Wτ and bτ are learned
parameters. We consider this to be a first-order cell in that hτ −1,
which is a k-dimensional vector, only has its different values added
together, with weights based on W , but never multiplied together.
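A minimal sketch of this first-order cell, with ReLU standing in for g and arbitrary illustrative sizes:

```python
import numpy as np

# First-order cell: h_tau = g(W h_{tau-1} + b). Entries of the input
# are only ever *added* together (weighted by W), never multiplied
# with each other -- this is what "first-order" means here.
def first_order_layer(h_prev, W, b):
    return np.maximum(0.0, W @ h_prev + b)  # g = ReLU, element-wise

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # learned weights (random here for illustration)
b = np.zeros(3)
h1 = first_order_layer(rng.normal(size=4), W, b)
```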
Although neural networks with layers like these have been
shown to be able to approximate any function, their core compu-
tation is structurally significantly different from the past intuition
on collaborative filtering. As described above, matrix factorization
models take the general form ui · vj , resulting in the model learn-
ing low-rank relationships between the different types of inputs,
i.e., between users, items, time, etc. Given that low-rank models
have been successful in recommender systems, we ask the follow-
ing question: How well can neural networks with first-order cells
model low-rank relations?
4.1 Modeling Low-Rank Relations
To test whether first-order neural networks can model low-rank
relations, we generate synthetic low-rank data and study how well
different size neural networks can fit that data. To be more precise,
we consider an m-mode tensor where each dimension is of size
N. For the mN discrete features we generate random vectors ui of
length r using a simple equation:
ui ∼ N(0, (1 / r^(1/(2m))) I) (2)
The result is that our data is a rank r matrix or tensor with approx-
imately the same scale (with mean of 0, and an empirical variance
close to 1). As an example, with m = 3, we can use these embeddings
to represent events of the form (i, j,t, ⟨ui,uj,ut ⟩).
We try to fit models of different sizes using this data. In particu-
lar, we consider a model with the discrete features embedded and
concatenated as inputs. The model has one hidden layer with ReLU
Hidden Layer   r = 1, m = 2   r = 1, m = 3   r = 2, m = 3
1              0.42601        0.27952        0.287817
2              0.601657       0.57222        0.472421
5              0.997436       0.854734       0.717233
10             0.999805       0.973214       0.805508
20             0.999938       0.996618       0.980821
30             0.999983       0.99931        0.975782
50             0.999993       0.999738       0.997821
100            0.999997       0.999928       0.99943
Table 3: Pearson correlation for different width models
when fitting low-rank data.
activation, as is common in neural recommender systems, followed
by a final linear layer. The model is programmed in TensorFlow
[1], trained with mean squared error loss (MSE) using the Adagrad
optimizer [16], and trained to convergence. We measure and report
the model’s accuracy by the Pearson correlation (R) between the
training data and the model’s predictions. We use the Pearson cor-
relation so that it is invariant to slight differences in variance of the
data. We report the accuracy against the training data because we
are testing how well these model structures fit low-rank patterns
(i.e., not even whether they can generalize from it).
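The fit metric can be computed directly with NumPy, and the insensitivity to scale and shift mentioned above is easy to check (the variable names are illustrative):

```python
import numpy as np

# Pearson correlation between labels and predictions.
def pearson_r(y_true, y_pred):
    return float(np.corrcoef(y_true, y_pred)[0, 1])

y = np.array([1.0, 2.0, 3.0, 4.0])
# Rescaling and shifting predictions leaves R unchanged, which is why
# the metric is invariant to slight variance differences in the data.
r = pearson_r(y, 2.0 * y + 1.0)  # perfectly correlated predictions
```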
To model low-rank relations, we want to see how well the model
can approximate individual multiplications, representing interac-
tion between variables. All data is generated with N = 100. With
m = 2, we examine how large the hidden layer must be to multiply
two scalars, and with m = 3 we examine how large the hidden layer
must be to multiply three scalars. We use r ∈ {1, 2} to see how size
of the model grows as more multiplications are needed. We embed
each discrete feature as a 20-dimensional vector, much larger than r
(but we found the model’s accuracy to be independent of this size).
We test with hidden layer widths ∈ {1, 2, 5, 10, 20, 30, 50, 100}.
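The synthetic data of Eq. (2) can be generated as below; this is a NumPy sketch under the stated assumptions (the function name, parameter defaults, and event count are our own, and the training of the ReLU network itself is omitted):

```python
import numpy as np

# Generate rank-r, m-mode synthetic data whose labels are the m-way
# inner product of per-feature embeddings drawn from
# N(0, (1/r^(1/(2m))) I), so labels have mean near 0 and variance
# near 1, as in Section 4.1.
def make_low_rank_data(N=100, m=2, r=1, n_events=10_000, seed=0):
    rng = np.random.default_rng(seed)
    scale = 1.0 / r ** (1.0 / (2 * m))            # std from Eq. (2)
    tables = [rng.normal(0.0, scale, size=(N, r)) for _ in range(m)]
    idx = rng.integers(0, N, size=(n_events, m))  # random discrete tuples
    factors = np.stack([tables[k][idx[:, k]] for k in range(m)])
    y = factors.prod(axis=0).sum(axis=1)          # <u_i, u_j, ...>
    return idx, y

idx, y = make_low_rank_data(N=100, m=3, r=2)
```

A one-hidden-layer ReLU network would then be fit to (idx, y), embedding each discrete feature, concatenating, and regressing with MSE, with its predictions scored by Pearson correlation as in Table 3.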
Empirical Findings. As can be seen in Table 3 and Figure 2, we
find that the model continually approximates the data better as
hidden layer size grows. Given the intuition that the network is
approximating the multiplication, a wider network should give
a better approximation. Second, we observe that, as we increase
the rank r of the data from 1 to 2, we see the hidden layer size
approximately doubles to get the same accuracy. This too matches
our intuition, as increasing r means there are more interactions to
add—something the network can easily do exactly.
More interestingly, we find that even for r = 1 and m = 2, it
takes a hidden layer of size 5 to get a “high” accuracy estimate.
Considering collaborative filtering models will often discover rank
200 relations [28], this intuitively suggests that real world models
would require very wide layers for a single two-way relation to be
learned.
Additionally, we find that modeling more than 2-way relations
increases the difficulty of approximating the relation. That is, when
we go from m = 2 to m = 3 we find that the model goes from
needing a width 5 hidden layer to a width 20 hidden layer to get an
MSE of approximately 0.005 or a Pearson correlation of 0.99.
In summary, we observe that ReLU layers can approximate mul-
tiplicative interactions (crosses) but are quite inefficient in doing so.
[Figure 2 plot omitted: 1 − Pearson correlation (log scale) vs. hidden layer width, for r=1, m=2; r=1, m=3; and r=2, m=3.]
Figure 2: ReLU layers can learn to approximate low-rank relations, but are inefficient in doing so.
This motivates the need for models that can more easily express,
and deal with, multiplicative relations. We now turn our attention to
using an RNN as a baseline; this is a stronger baseline in that it can
better express multiplicative relations compared to feed-forward
DNNs.
5 YOUTUBE’S RECURRENT RECOMMENDER
With the above analysis as motivation, we now describe the im-
provements to YouTube’s RNN recommender system. RNNs are
notable as a baseline model because they are already second-order
neural networks, significantly more complex than the first-order
models explored above, and are at the cutting edge of dynamic
recommender systems.
We begin with an overview of the RNN recommender we built
for YouTube and then in Section 6 describe how we improve it to
better make use of contextual data.
5.1 Formal Description
In our setting, we observe events of the form user i has watched
video j (uploaded by user ψ(j)) at time t. (We will later introduce
additional contextual features.) In order to model the evolution of
user preferences and behavior, we use a recurrent neural network
(RNN) model, where the input to the model is the set of events for
user Xi = {e = (i, j,ψ(j),t) ∈ E|e0 = i}. We will use Xi,t to denote
all watches before t for user Xi :
Xi,t = {e = (i, j,ψ(j),t) ∈ E | e0 = i ∧ e3 < t} ⊂ Xi . (3)
The model is trained to produce sequential prediction Pr(j|i,t, Xi,t ),
i.e., the video j that user i will watch at a given time t based on all
watches before t. For the sake of simplicity, we will use e(τ ) to de-
note the τth event in the sequence, x(τ ) to denote the transformed
input for e(τ ), and y(τ ) to denote the label trying to be predicted
for the τth event. In the example above, if e(τ ) = (i, j,ψ(j),t) and
e(τ +1) = (i, j′,ψ(j′),t′) then the input x(τ ) = [vj ;uψ (j);wt ], which
is used to predict y(τ +1) = j′, where vj is the video embedding,
uψ (j) is the uploader embedding, and wt is the context embedding.
When predicting y(τ ), we of course cannot use the label of the
corresponding event e(τ ) as an input, but we can use context from
e(τ ), which we will denote by c(τ ), e.g., c(τ ) = [wt ].
5.2 Structure of the Baseline RNN Model
A diagram of our RNN model can be seen in Figure 1 and is described
below. Recurrent neural networks model a sequence of actions.
With each event e(τ), the model takes a step forward, processing
x(τ) and updating a hidden state vector z(τ−1). To be more precise,
each event is first processed by a neural network h0(τ) = fi(x(τ)).
In our setting, this will either be an identity function or
fully-connected ReLU layers.
The recurrent part of the network is a function h1(τ), z(τ) =
fr(h0(τ), z(τ−1)). That is, we use a recurrent cell, such as an
LSTM [23] or GRU [10], that takes as an input the state from the
previous step and the transformed input fi(x(τ)).
To predict y(τ), we use fo(h1(τ−1), c(τ)), which is another trainable
neural network that produces a probability distribution over
possible values of y(τ ). In our setting, this network takes the output
of the RNN as its input and context of the upcoming prediction,
and finally ends with a softmax layer over all videos. This network
can include multiple fully-connected layers.
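The three components fi, fr, and fo can be sketched as follows. This is a toy NumPy rendering with illustrative dimensions, random weights, and a hand-rolled GRU cell; it shows the composition of the three networks, not the production implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_videos = 8, 50   # hidden size and video vocabulary (toy values)

def relu(x): return np.maximum(x, 0.0)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# f_i: the input network (here a single fully-connected ReLU layer).
Wi = rng.normal(scale=0.2, size=(d, d))
def f_i(x): return relu(Wi @ x)

# f_r: a minimal GRU cell (untrained weights; the production system
# uses a tuned LSTM or GRU).
Wz, Uz, Wr, Ur, Wh, Uh = (rng.normal(scale=0.2, size=(d, d)) for _ in range(6))
def f_r(h0, z_prev):
    u = sigmoid(Wz @ h0 + Uz @ z_prev)            # update gate
    r = sigmoid(Wr @ h0 + Ur @ z_prev)            # reset gate
    cand = np.tanh(Wh @ h0 + Uh @ (r * z_prev))   # candidate state
    z = (1.0 - u) * z_prev + u * cand
    return z, z                                   # (output h_1, new state)

# f_o: the output network, fed the RNN output plus the context of the
# upcoming prediction, ending in a softmax over all videos.
Wo = rng.normal(scale=0.2, size=(n_videos, 2 * d))
def f_o(h1, c): return softmax(Wo @ np.concatenate([h1, c]))

# One step: in the real model x = [v_j; u_psi(j); w_t]; random here.
x, c, z_prev = rng.normal(size=d), rng.normal(size=d), np.zeros(d)
h1, z = f_r(f_i(x), z_prev)
probs = f_o(h1, c)
```

The output `probs` is a valid distribution over the (toy) video vocabulary, standing in for the softmax over millions of candidates in production.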
5.3 Context Features
Core to the success of this model is the incorporation of contextual
data beyond just the sequence of videos watched. We discuss below
how we utilize these features.
TimeDelta. In our system, incorporating time effectively was
highly valuable to the accuracy of our RNN. Historically, time
context has been incorporated into collaborative filtering models
in a variety of ways. Here we use an approach we call timedelta:
∆t(τ) = log(t(τ+1) − t(τ)) (4)
That is, when considering event e(τ ) we consider how long until the
next event or prediction to be made. This is essentially equivalent
to time representation described in [25] and [49].
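The timedelta transform is a one-liner over consecutive timestamps. In the sketch below the timestamps are illustrative, and the epsilon guard against zero gaps is our assumption (the paper does not specify how simultaneous events are handled):

```python
import numpy as np

# Timestamps of consecutive watch events for one user (seconds; toy values).
t = np.array([0.0, 30.0, 600.0, 86400.0])

# Timedelta feature for event tau: log of the gap until the next event.
# eps guards against a zero gap between simultaneous events (assumption).
eps = 1.0
delta = np.log(np.diff(t) + eps)
```

Each watch thus carries how long until the next request, compressing the heavy-tailed gap distribution with the logarithm.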
Software Client. YouTube videos can be watched on a variety of
devices: in browser, iOS, Android, Roku, Chromecast, etc. Treating
these contexts as equivalent misses relevant correlations. For ex-
ample, users are probably less likely to watch a full-length feature
film on their phone than through a Roku device. Similarly, short
videos like trailers may be relatively more likely to be watched
on a phone. Modeling the software client, and in particular how it
interacts with the watch decisions, is important.
Page. We also record where a watch initiated from in our system.
For example, we distinguish between watches that start from the
homepage (i.e. Home Page Watches) and watches that initiate as
recommended follow-up watches once a user is already watching a
video (i.e. Watch Next Watches). This is important as watches from
the homepage may be more open to new content, whereas watches
following a previous watch may be due to users wanting to dig into
a topic more deeply.
Pre- and Post-Fusion. We can use these context features, which
we collectively refer to as c(τ ), as direct inputs in two ways. As
can be seen in Figure 1, we can include context as an input at the
bottom of the network, or concatenated with the output of the
RNN cell. We refer to the inclusion of context features before the
RNN as pre-fusion, and the inclusion of context features after the
RNN cell as post-fusion [12]. Although possibly a subtle point, this
decision can have a significant effect on the RNN. In particular, by
including a feature through pre-fusion, that feature will affect the
prediction through how it modifies the state of the RNN. However,
by including a feature through post-fusion, that feature can more
directly have an effect on the prediction at that step.
To manage this, when predicting y(τ ), we generally use c(τ ) as
a post-fusion feature, and use c(τ −1) as a pre-fusion feature. This
means that c(τ −1) will affect the RNN state but c(τ ) will be used
for predicting y(τ ). Subsequently, at the next step when predicting
y(τ +1), c(τ ) will now be a pre-fusion feature affecting the state of
the RNN from that time forward.
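The wiring just described can be sketched as a single prediction step. Here `rnn_cell` and `predict` are toy, untrained stand-ins for fr and fo; only the placement of c(τ−1) and c(τ) is the point:

```python
import numpy as np

rng = np.random.default_rng(2)
d, dc, n_videos = 4, 2, 10   # toy hidden, context, and vocab sizes

W = rng.normal(scale=0.2, size=(d, d + dc + d))
def rnn_cell(h0, z_prev):            # toy tanh recurrent cell
    z = np.tanh(W @ np.concatenate([h0, z_prev]))
    return z, z

V = rng.normal(scale=0.2, size=(n_videos, d + dc))
def predict(v):                      # toy softmax output network
    e = np.exp(V @ v - (V @ v).max())
    return e / e.sum()

def step(z_prev, x_prev, c_prev, c_curr):
    # c^(tau-1) is a PRE-fusion feature: concatenated with the input,
    # it affects the prediction only through the updated RNN state.
    h1, z = rnn_cell(np.concatenate([x_prev, c_prev]), z_prev)
    # c^(tau) is a POST-fusion feature: concatenated with the RNN
    # output, it acts directly on the prediction of y^(tau).
    return predict(np.concatenate([h1, c_curr])), z

probs, z = step(np.zeros(d), rng.normal(size=d),
                rng.normal(size=dc), rng.normal(size=dc))
```

At the next step the same context vector that was just used post-fusion re-enters as the pre-fusion feature, exactly as the paragraph above describes.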
5.4 Implementation & Training
Our model is implemented in TensorFlow [1] and trained over many
distributed workers and parameter servers. The training uses one
of the available back-propagated mini-batch stochastic gradient
descent algorithms, either Adagrad [16] or ADAM [26]. During
training we use as supervision the 100 most recent watches during
the period (t0 − 7 days,t0], where t0 is the time of training. This
generally prioritizes recent watches because the behavior is more
similar to the prediction task when the learned model will be applied
to live traffic.
Due to the large number of videos available, we limit our set of
possible videos to predict as well as the number of uploaders of
those videos that we model. In the experiments below, these sets
range in size from 500,000 to 2,000,000. The softmax layer, which
covers this set of candidate videos, is trained using sampled softmax
with 20,000 negative samples per batch. We use the predictions of
this sampled softmax in the cross entropy loss against all labels.
6 CONTEXT MODELING WITH THE LATENT
CROSS
As should be clear in the above description of our baseline model,
the use of contextual features is typically done as concatenated
inputs to simple fully-connected layers. However, as we explained
in Section 4, neural networks are inefficient in modeling the inter-
actions between concatenated input features. Here we propose a
simple alternative.
6.1 Single Feature
We begin with the case where we have a single context feature
that we want to include. For sake of clarity, we will use time as an
example context feature. Rather than incorporating the feature as
another input concatenated with the other relevant features, we
perform an element-wise product in the middle of the network.
That is, we perform:
h0(τ) = (1 + wt) ∗ h0(τ) (5)
where we initialize wt by a 0-mean Gaussian (note, wt = 0 is an
identity). This can be interpreted as the context providing a mask
or attention mechanism over the hidden state. However, it also
enables low-rank relations between the input previous watch and
the time. Note, we can also apply this operation after the RNN:
h1(τ) = (1 + wt) ∗ h1(τ). (6)
The technique offered in [27] can be viewed as a special case
where multiplicative relations are included at the very top of the
network along with the softmax function to improve NLP tasks. In
that case, this operation can be perceived as a tensor factorization
where the embedding for one modality is produced by a neural
network.
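Equation (5) is a one-line operation. The sketch below (toy dimensions; the embeddings are untrained) also checks the identity property noted above, that a zero context embedding leaves the hidden state unchanged:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8                                   # hidden dimension (illustrative)

h0 = rng.normal(size=d)                 # hidden state h_0^(tau)
w_t = rng.normal(scale=0.01, size=d)    # context (time) embedding, 0-mean init

# Eq. (5): element-wise product with (1 + w_t). At w_t = 0 the cross is
# the identity, so an untrained cross initially leaves the state alone.
h0_crossed = (1.0 + w_t) * h0

# The same operation can equally be applied to h_1^(tau) after the
# RNN, as in Eq. (6).
```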
6.2 Using Multiple Features
In many cases we have more than one contextual feature that we
want to include. When including multiple contextual features, say
time t and device d, we perform:
h(τ) = (1 + wt + wd) ∗ h(τ) (7)
We use this form for a few different reasons: (1) By initializing both
wt and wd by 0-mean Gaussians, the multiplicative term has a mean
of 1 and thus can similarly act as a mask/attention mechanism over
the hidden state. (2) By adding these terms together we can capture
2-way relations between the hidden state and each context feature.
This follows the perspective taken in the design of factorization
machines [34]. (3) Using a simple additive function is easy to train.
A more complex function like wt ∗ wd ∗ h(τ ) will increase the
non-convexity significantly with each additional contextual feature.
Similarly we found learning a function f ([wt ;wd ]) to be more
difficult to train and to give worse results. An overview of including
these features in a model can be seen in Figure 1.
Efficiency. We note that one significant benefit of using latent
crosses is their simplicity and computational efficiency. With N
context features and d-dimensional embeddings, the latent cross
can be computed in O(Nd) and does not increase the width of the
subsequent layers.
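A sketch of the multi-feature cross of Equation (7), with toy dimensions and untrained embeddings, showing that the cost is one sum and one element-wise product (O(Nd)) and that the output keeps the dimensionality of the hidden state:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
h = rng.normal(size=d)                           # hidden state h^(tau)

# N = 2 context embeddings (e.g. time and device), 0-mean Gaussian init
# so the mask below has mean 1 and starts out near the identity.
context = [rng.normal(scale=0.01, size=d) for _ in range(2)]

# Eq. (7): the mask is 1 plus the SUM of the context embeddings --
# O(N d) work, and no widening of subsequent layers.
mask = 1.0 + np.sum(context, axis=0)
h_crossed = mask * h
```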
7 EXPERIMENTS
We perform two sets of experiments. The first is on a restricted
dataset where time is the only contextual feature, and we compare
several model families. In the second set of experiments we use our
production model and explore the relative improvements based on
how we incorporate context features.
7.1 Comparative Analysis
7.1.1 Setup. We begin with an explanation of our experimental
setup.
Dataset and Metrics. We use a dataset with sequences of watches
for hundreds of millions of users. The users are split into training,
validation and test sets, with both validation and test sets having
tens of millions of users. Watches are restricted to a set of 500,000
popular videos, and all users have at least 50 watches in their se-
quence. The sequence is given by a list of watched videos and the
timestamp of each watch.
Method                      Precision@1   MAP@20
RNN with ∆t Latent Cross    0.1621        0.0828
RRN (Concatenated ∆t)       0.1465        0.0753
RNN (Plain, no time)        0.1345        0.0724
Bag of Words                0.1250        0.0707
Bag of Words with time      0.1550        0.0794
Paragraph Vectors           0.1123        0.0642
Cowatch                     0.1204        0.0621
Table 4: Results for Comparative Study: RNN with a latent
cross performs the best.
The task is to predict the last 5 watches in the user’s sequence.
To measure this, we use Mean-Average-Precision-at-k (MAP@k)
for k = 1 and k = 20 on the test set.
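For concreteness, a standard definition of MAP@k is sketched below. The paper does not spell out its exact normalization, so the min(|relevant|, k) denominator is an assumption (a common convention for truncated average precision):

```python
def average_precision_at_k(ranked, relevant, k):
    """AP@k for one user: `ranked` is the model's ordered predictions,
    `relevant` is the set of held-out watched videos."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, vid in enumerate(ranked[:k]):
        if vid in relevant:
            hits += 1
            score += hits / (i + 1)          # precision at each hit rank
    return score / min(len(relevant), k)     # normalization: an assumption

def map_at_k(all_ranked, all_relevant, k):
    """Mean of AP@k across users."""
    aps = [average_precision_at_k(r, rel, k)
           for r, rel in zip(all_ranked, all_relevant)]
    return sum(aps) / len(aps)
```

A perfect ranking scores 1.0; a single relevant item retrieved at rank 2 out of k = 2 scores 0.5.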
Model. For this set of experiments we use an RNN with an LSTM
recurrent unit. We have no ReLU cells before or after the recur-
rent unit, and use a pre-determined hierarchical softmax (HSM)
to predict the videos. Here, we use all but the first watch in the
sequence as supervision during training. The model is trained using
back-propagation with ADAM [26].
Since time is the only contextual feature in this data set, we
use the video embedding vj as the input and perform the latent
cross with the timedelta value w∆t such that the LSTM is given
vj ∗ w∆t . This is an example of a pre-fusion cross. We will call this
RNNLatentCross.
Baselines. We compare the RNNLatentCross model described
above to highly tuned models of alternative forms:
• RRN: Use [vj ;w∆t ] as the input to the RNN; similar to [43]
and [41].
• RNN: RNN directly over vj (without time); similar to [21].
• BOW: Bag of words model over the set of videos in the user’s
history and user demographics.
• BOW+Time: 3-layer feed-forward model taking as input a
concatenation of a bag-of-watches, each of the last three
videos watched, ∆t, and the time of the week of the request.
The model is trained with a softmax over the 50 videos most
co-watched with the last watch.
• Paragraph Vector (PV): Use [13] to learn unsupervised em-
beddings of each user (based on user demographics and
previous watches). Use the learned embeddings as well as
an embedding of the last watch as input to a 1-layer feed
forward classifier trained with sampled softmax.
• Cowatch: Predict the most common co-watched videos based
on the last watch in the sequence.
Unless otherwise specified, all models have a hierarchical softmax.
All models and their hyperparameters are tuned over the course of
a modeling competition. Note, only the Bag of Words and Paragraph
Vector models make use of user demographic data.
7.1.2 Results. We report our results for this experiment in Table
4. As can be seen there, our model, using the RNN with ∆t having
a latent cross with the watch, gives the best result for both Precision@1
and MAP@20. Perhaps even more interesting is the relative
performance of the models. We observe in both the bag-of-words
models and the RNN models the critical importance of modeling
time. Further, observe that the improvement from performing
a latent cross instead of just concatenating ∆t is greater than the
improvement from including ∆t as an input feature at all.
7.2 YouTube’s Model
Second, we study multiple variants of our production model against
a larger, more unrestricted dataset.
7.2.1 Setup. Here, we use a production dataset of user watches,
which is less restrictive than the above setting. Our sequences are
composed of the video that was watched and who created the video
(uploader). We use a larger vocabulary on the order of millions of
recently popular uploaded videos and uploaders.
We split the dataset into a training and test set based jointly on
users and time. First, we split users into two sets: 90% of our users
are in our training set and 10% in our test set. Second, to split by
time, we select a time cut-off t0 and during training only consider
watches from before t0. During testing, we consider watches from
after t0 + 4 hours. Similarly, the vocabulary of videos is based on
data from before t0.
Our model consists of embedding and concatenating all of the
features defined above as inputs, followed by a 256-dimensional
ReLU layer, a 256-dimensional GRU cell, and then another 256-
dimensional ReLU layer, before being fed into the softmax layer. As
described previously, we use the 100 most recent watches during
the period (t0 − 7 days,t0] as supervision. Here, we train using the
Adagrad optimizer [16] over many workers and parameter servers.
To test our model, we again measure the mean-average-precision-
at-k. For watches that are not in our vocabulary, we always mark
the prediction as incorrect. The evaluation MAP@k scores reported
here are measured using approximately 45,000 watches.
7.2.2 The Value of Page as Context. We begin analyzing the
accuracy improvements by incorporating Page in different ways. In
particular, we compare not using Page, using Page as an input con-
catenated with the other inputs, and performing a post-fusion latent
cross with Page. (Note, when we include page as a concatenated
feature, it is concatenated during both pre-fusion and post-fusion.)
As can be seen in Figure 3, using Page with a latent cross offers
the best accuracy. Additionally, we see that using both the latent
cross and the concatenated input offers no additional improvement
in accuracy, suggesting that the latent cross is sufficient to capture
the relevant information that would be obtained through using the
feature as a direct input.
7.2.3 Total Improvement. Last, we test how adding latent crosses
on top of the full production model affects the accuracy. In this case,
with each watch the model knows the page, the device type, the
time, how long the video was watched for (watch time), how old the
watch is (watch age), and the uploader. In particular, our baseline
YouTube model uses the page, device, watch time, and timedelta
values as pre-fusion concatenated features, and also uses the page,
device, and watch age as post-fusion concatenated features.
We test including timedelta and page as pre-fusion latent crosses,
as well as device type and page as post-fusion latent crosses. As
can be seen in Figure 4, although all of these features were already
included through concatenation, including them as latent crosses
provides an improvement in accuracy over the baseline model. This
also demonstrates the ability for pre-fusion and post-fusion with
multiple features to work together and provide a strong accuracy
improvement.
8 DISCUSSION
We explore below a number of questions raised by this work and
implications for future work.
8.1 Discrete Relations in DNNs
While much of this paper has focused on enabling multiplicative
interactions between features, we found that neural networks can
also approximate discrete interactions, an area where factorization
models have more difficulty. As an example, in [46] the authors find
that modeling when user i performs action a on item j, ⟨u(i,a),vj ⟩
has better accuracy than ⟨ui,vj,wa⟩. However, discovering that
indexing users and actions together performs better is difficult,
requiring data insights.
Similar to the experiments in Section 4, we generate synthetic
data following the pattern Xi,j,a = ⟨u(i,a),vj ⟩ and test how well
different network architectures predict Xi,j,a given i, j and a are
only concatenated as independent inputs. We initialize u ∈ R^10000
and v ∈ R^100 as vectors, such that X is a rank-1 matrix. We follow
the same general experimental procedure as in Section 4, measuring
the Pearson correlation (R) for networks with varying number
of hidden layers and varying width to those hidden layers. (We
train these networks with a learning rate of 0.01, ten-times smaller
than the learning rate used above.) As a baseline, we also measure
the Pearson correlation for tensor factorization (⟨ui,vj,wa⟩) for
different ranks.
As can be seen in Figure 5, deep models, in some cases, attain
a reasonably high Pearson correlation, suggesting that they are
in fact able to approximate discrete crosses. Also interestingly,
learning these crosses requires deep networks with wide hidden
layers, particularly large for the size of the data. Additionally, we
find that these networks are difficult to train.
These numbers are interesting relative to the baseline tensor fac-
torization performance. We observe that the factorization models
can approximate the data reasonably well, but require relatively
high rank. (Note, even if the underlying tensor is full rank, a fac-
torization of rank 100 would suffice to describe it.) However, even
at this high rank, the tensor factorization models require fewer
parameters than the DNNs and are easier to train. Therefore, as
with our results in Section 5, DNNs can approximate these patterns,
but doing so can be difficult, and including low-rank interactions
can help in providing easy-to-train approximations.
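The synthetic data for this experiment can be generated as follows. The dimensions match u ∈ R^10000 and v ∈ R^100 from above, though the 100 × 100 split of the 10,000 indices into users and actions is our assumption:

```python
import numpy as np

rng = np.random.default_rng(5)
# 10,000 (user, action) pairs and 100 items; the 100 x 100 user/action
# split is an assumption for illustration.
n_users, n_actions, n_items = 100, 100, 100

u = rng.normal(size=(n_users, n_actions))   # u indexed JOINTLY by (i, a)
v = rng.normal(size=n_items)                # v indexed by item j

def X(i, j, a):
    # The discrete cross: a model fed i, j, a as independent inputs
    # must discover that users and actions index u together.
    return u[i, a] * v[j]

# Flattened over ((i, a), j), X is an exact rank-1 matrix -- which is
# why a sufficiently high-rank tensor factorization over (i, j, a)
# can still describe it.
M = np.outer(u.ravel(), v)
```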
8.2 Second-Order DNNs
A natural question to ask when reading this paper is why not
try much wider layers, make the model deeper, or use more second-
order units, like GRUs and LSTMs? All of these are reasonable
modeling decisions, but in our experience make training of the
model significantly more difficult. One of the strengths of this
approach is that it is easy to implement and train, while still offering
clear performance improvements, even when applied in conjunction
with other second-order units like LSTMs and GRUs.
[Figure 3: Accuracy from using page as either input or a latent cross
feature (reported over training steps). Two panels, MAP@1 and MAP@20
versus training step, comparing Baseline, Page Concatenated, Page
Latent Cross, and Page Latent Cross Concat.]
[Figure 4: Accuracy from using all latent cross features against a
baseline of using all possible concatenated input features. Two
panels, MAP@1 and MAP@20 versus training step, comparing the baseline
with all features concatenated against including all latent crosses.]
[Figure 5: Sufficiently large DNNs can learn to approximate discrete
interactions. Pearson correlation (R) versus network depth for DNN
widths 10 and 50 and tensor factorization ranks 10 and 50.]
The growing trend throughout deep learning appears to be using
more second-order interactions. For example, this is common in
attention models and memory networks, as listed above. While these
are even more difficult to train, we believe this work shows the
promise in that direction for neural recommender systems.
9 CONCLUSION
In this paper we have explored how to incorporate contextual data
in a production recurrent recommender system at YouTube. In
particular, this paper makes the following contributions:
• Challenges of First-Order DNNs: We found feed-forward
neural networks to be inefficient in modeling multiplicative
relations (crosses) between features.
• Production Model: We offer a detailed description of our
RNN-based recommender system used at YouTube.
• Latent Cross: We offer a simple technique for learning mul-
tiplicative relations in DNNs, including RNNs.
• Empirical Results: We demonstrate in multiple settings
and with different context features that latent crosses im-
prove recommendation accuracy, even on top of complex,
state-of-the-art RNN recommenders.
REFERENCES
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey
Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, and oth-
ers. 2016. TensorFlow: A system for large-scale machine learning. In Proceedings
of the 12th USENIX Symposium on Operating Systems Design and Implementation
(OSDI). Savannah, Georgia, USA.
[2] Amr Ahmed, Liangjie Hong, and Alexander J Smola. 2013. Hierarchical geo-
graphical modeling of user locations from social media posts. In Proceedings of
the 22nd international conference on World Wide Web (WWW). ACM, 25–36.
[3] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural ma-
chine translation by jointly learning to align and translate. arXiv preprint
arXiv:1409.0473 (2014).
[4] James Bennett, Stan Lanning, and others. 2007. The netflix prize. In Proceedings
of KDD cup and workshop, Vol. 2007. New York, NY, USA, 35.
[5] Alex Beutel, Ed H Chi, Zhiyuan Cheng, Hubert Pham, and John Anderson. 2017.
Beyond Globally Optimal: Focused Learning for Improved Recommendations.
In Proceedings of the 26th International Conference on World Wide Web (WWW).
ACM.
[6] Pedro G Campos, Fernando Díez, and Iván Cantador. 2014. Time-aware recom-
mender systems: a comprehensive survey and analysis of existing evaluation
protocols. User Modeling and User-Adapted Interaction 24, 1-2 (2014), 67–119.
[7] Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra,
Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, and
others. 2016. Wide & deep learning for recommender systems. In Proceedings of
the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7–10.
[8] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. 2010. You are where you tweet:
a content-based approach to geo-locating twitter users. In Proceedings of the 19th
ACM international conference on Information and knowledge management. ACM,
759–768.
[9] Evangelia Christakopoulou and George Karypis. 2016. Local Item-Item Mod-
els For Top-N Recommendation. In Proceedings of the 10th ACM Conference on
Recommender Systems (RecSys). ACM, 67–74.
[10] Junyoung Chung, Caglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. 2015.
Gated Feedback Recurrent Neural Networks.. In ICML. 2067–2075.
[11] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep Neural Networks
for YouTube Recommendations. In Proceedings of the 10th ACM Conference on
Recommender Systems (RecSys). ACM, 191–198.
[12] Bin Cui, Anthony KH Tung, Ce Zhang, and Zhe Zhao. 2010. Multiple feature
fusion for social media applications. In Proceedings of the 2010 ACM SIGMOD
International Conference on Management of data. ACM, 435–446.
[13] Andrew M Dai, Christopher Olah, and Quoc V Le. 2015. Document embedding
with paragraph vectors. arXiv preprint arXiv:1507.07998 (2015).
[14] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. 2016. Language
modeling with gated convolutional networks. arXiv preprint arXiv:1612.08083
(2016).
[15] Nan Du, Yichen Wang, Niao He, Jimeng Sun, and Le Song. 2015. Time-sensitive
recommendation from recurrent user activities. In Advances in Neural Information
Processing Systems. 3492–3500.
[16] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods
for online learning and stochastic optimization. Journal of Machine Learning
Research 12, Jul (2011), 2121–2159.
[17] Hancheng Ge, James Caverlee, and Haokai Lu. 2016. TAPER: A contextual
tensor-based approach for personalized expert recommendation. (2016).
[18] Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv
preprint arXiv:1308.0850 (2013).
[19] Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse
Predictive Analytics. In Proceedings of the 40th International ACM SIGIR Conference
on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York,
NY, USA, 355–364. DOI:https://doi.org/10.1145/3077136.3080777
[20] Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng
Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th International
Conference on World Wide Web. International World Wide Web Conferences
Steering Committee, 173–182.
[21] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk.
2015. Session-based recommendations with recurrent neural networks. arXiv
preprint arXiv:1511.06939 (2015).
[22] Balázs Hidasi and Domonkos Tikk. 2016. General factorization framework for
context-aware recommendations. Data Mining and Knowledge Discovery 30, 2
(2016), 342–371.
[23] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural
computation 9, 8 (1997), 1735–1780.
[24] Yifan Hu, Yehuda Koren, and Chris Volinsky. 2008. Collaborative filtering for
implicit feedback datasets. In ICDM.
[25] How Jing and Alexander J. Smola. 2017. Neural Survival Recommender. In
Proceedings of the Tenth ACM International Conference on Web Search and Data
Mining (WSDM). 515–524.
[26] Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimiza-
tion. arXiv preprint arXiv:1412.6980 (2014).
[27] Ryan Kiros, Richard Zemel, and Ruslan R Salakhutdinov. 2014. A multiplicative
model for learning distributed text-based attribute representations. In Advances
in neural information processing systems. 2348–2356.
[28] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted
collaborative filtering model. In KDD. ACM, 426–434.
[29] Yehuda Koren. 2010. Collaborative filtering with temporal dynamics. Commun.
ACM 53, 4 (2010), 89–97.
[30] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix Factorization Tech-
niques for Recommender Systems. Computer 42, 8 (Aug. 2009), 30–37. DOI:
https://doi.org/10.1109/MC.2009.263
[31] Joonseok Lee, Seungyeon Kim, Guy Lebanon, and Yoram Singer. 2013. Local Low-
Rank Matrix Approximation. In Proceedings of the 30th International Conference
on Machine Learning (ICML). 82–90. http://jmlr.org/proceedings/papers/v28/
lee13.html
[32] Haokai Lu and James Caverlee. 2015. Exploiting geo-spatial preference for
personalized expert recommendation. In Proceedings of the 9th ACM Conference
on Recommender Systems (RecSys). ACM, 67–74.
[33] Yanru Qu, Han Cai, Kan Ren, Weinan Zhang, Yong Yu, Ying Wen, and Jun Wang.
2016. Product-based neural networks for user response prediction. In Data Mining
(ICDM), 2016 IEEE 16th International Conference on. IEEE, 1149–1154.
[34] Steffen Rendle. 2012. Factorization Machines with libFM. ACM TIST 3, 3, Article
57 (May 2012), 22 pages.
[35] Ruslan Salakhutdinov and Andriy Mnih. 2008. Bayesian probabilistic matrix
factorization using Markov chain Monte Carlo. In ICML. ACM, 880–887.
[36] Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. 2015.
Autorec: Autoencoders meet collaborative filtering. In Proceedings of the 24th
International Conference on World Wide Web (WWW). ACM, 111–112.
[37] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic,
and Nuria Oliver. 2012. TFMAP: optimizing MAP for top-n context-aware rec-
ommendation. In Proceedings of the 35th international ACM SIGIR conference on
Research and development in information retrieval. ACM, 155–164.
[38] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, and others. 2015. End-to-
end memory networks. In Advances in neural information processing systems.
2440–2448.
[39] Yong Kiam Tan, Xinxing Xu, and Yong Liu. 2016. Improved recurrent neural
networks for session-based recommendations. In Proceedings of the 1st Workshop
on Deep Learning for Recommender Systems. ACM, 17–22.
[40] Duyu Tang, Bing Qin, Ting Liu, and Yuekui Yang. 2015. User Modeling with
Neural Network for Review Rating Prediction.. In IJCAI. 1340–1346.
[41] Bartlomiej Twardowski. 2016. Modelling Contextual Information in Session-
Aware Recommender Systems with Neural Networks.. In RecSys. 273–276.
[42] Manasi Vartak, Hugo Larochelle, and Arvind Thiagarajan. 2017. A Meta-Learning
Perspective on Cold-Start Recommendations for Items. In Advances in Neural
Information Processing Systems. 6888–6898.
[43] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J. Smola, and How Jing. 2017.
Recurrent Recommender Networks. In Proceedings of the Tenth ACM International
Conference on Web Search and Data Mining (WSDM). 495–503.
[44] Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, and Ruslan R Salakhut-
dinov. 2016. On multiplicative integration with recurrent neural networks. In
Advances in Neural Information Processing Systems. 2856–2864.
[45] Chunfeng Yang, Huan Yan, Donghan Yu, Yong Li, and Dah Ming Chiu. 2017.
Multi-site User Behavior Modeling and Its Application in Video Recommendation.
In Proceedings of the 40th International ACM SIGIR Conference on Research and
Development in Information Retrieval. ACM, 175–184.
[46] Zhe Zhao, Zhiyuan Cheng, Lichan Hong, and Ed H Chi. 2015. Improving User
Topic Interest Profiles by Behavior Factorization. In Proceedings of the 24th Inter-
national Conference on World Wide Web (WWW). 1406–1416.
[47] Lei Zheng, Vahid Noroozi, and Philip S Yu. 2017. Joint deep modeling of users
and items using reviews for recommendation. In Proceedings of the Tenth ACM
International Conference on Web Search and Data Mining (WSDM). ACM, 425–434.
[48] Yong Zheng, Bamshad Mobasher, and Robin Burke. 2014. CSLIM: Contextual
SLIM recommendation algorithms. In Proceedings of the 8th ACM Conference on
Recommender Systems. ACM, 301–304.
[49] Yu Zhu, Hao Li, Yikang Liao, Beidou Wang, Ziyu Guan, Haifeng Liu, and Deng Cai.
2017. What to Do Next: Modeling User Behaviors by Time-LSTM. In Proceedings of
the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17.
3602–3608. DOI:https://doi.org/10.24963/ijcai.2017/504