We study the data mining problem of modeling adoptions and the stages of the diffusion of an innovation. To this end, we propose a stochastic model which decomposes a diffusion trace (a sequence of adoptions) into an ordered sequence of stages, where each stage is intuitively built around two dimensions: the users involved and the relative speed at which adoptions happen. Each stage is characterized by a specific rate of adoption and involves different users to different extents, while sequentiality in the diffusion is guaranteed by constraining the transition probabilities among stages.
An empirical evaluation on synthetic and real-world adoption logs shows the effectiveness of the proposed framework in summarizing the adoption process, enabling several analysis tasks such as the identification of adopter categories, clustering and characterization of diffusion traces, and prediction of which users will adopt an item in the near future.
Community search is the problem of finding a good community for a given set of query vertices.
In this work we propose a novel method that is in general more efficient and effective than the state of the art, can handle multiple query vertices, yields optimal communities, and is parameter-free.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentations for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer-developed notes that break down lecture and study material in a way that they can understand.
# Students can earn better grades, save time and study effectively.
Our Vision & Mission – Simplifying Students' Lives
Our Belief – “The great breakthrough in your life comes when you realize that you can learn anything you need to learn to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
Abstract: Tracking and detecting people in crowds is a central problem in video scene analysis. Detection of many individual objects has improved over recent years, but detection and tracking remain challenging due to occlusions and variation in people's appearance. Facing these challenges, we propose to leverage information on the global structure of the scene and to solve detection and tracking jointly. We formulate crowd density estimation and detection as the optimization of a joint energy function, and show how optimizing this energy improves tracking and detection in floating crowds. We validate our approach on a challenging video dataset of crowded scenes. Features relevant to tracking people, such as movement, size and height, are added to the observation models of the particle filters, followed by a clustering method. This minimizes the communication cost and makes data retrieval easy.
Alex Smola, Director of Machine Learning, AWS/Amazon, at MLconf SF 2016 (MLconf)
Alex Smola is the Manager of the Cloud Machine Learning Platform at Amazon. Prior to his role at Amazon, Smola was a Professor in the Machine Learning Department of Carnegie Mellon University and cofounder and CEO of Marianas Labs. Prior to that he worked at Google Strategic Technologies, Yahoo Research, and National ICT Australia. Prior to joining CMU, he was a professor at UC Berkeley and the Australian National University. Alex obtained his PhD at TU Berlin in 1998. He has published over 200 papers and written or coauthored 5 books.
Abstract summary
Personalization and Scalable Deep Learning with MXNet: User return times and movie preferences are inherently time dependent. In this talk I will show how this can be modeled efficiently using deep learning by employing an LSTM (Long Short-Term Memory) network. Moreover, I will show how to train large-scale distributed parallel models efficiently using MXNet. This includes a brief overview of the key components of defining networks and of optimization, and a walkthrough of the steps required to allocate machines and to train a model.
Title PDF: MATRIOSKA: A Multi-level Approach to Fast Tracking by Learning | ICIAP 2013
In this paper we propose a novel framework for the real-time detection and tracking of an unknown object in a video stream. We decompose the problem into two separate modules: detection and learning. The detection module can use multiple keypoint-based methods (ORB, FREAK, BRISK, SIFT, SURF and more) inside a fallback model, to correctly localize the object frame by frame, exploiting the strengths of each method. The learning module updates the object model, with a growing and pruning approach, to account for changes in its appearance, and extracts negative samples to further improve the detector's performance. To show the effectiveness of the proposed tracking-by-detection algorithm, we present numerous quantitative results on a number of challenging sequences in which the target object goes through changes of pose, scale and illumination.
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
A Survey on: Hyper Spectral Image Segmentation and Classification Using FODPSO (rahulmonikasharma)
The spatial analysis of an image sensed and captured from a satellite provides less accurate information about a remote location, hence analyzing the spectral dimension becomes essential. Hyperspectral images are one kind of remotely sensed image; they are superior to multispectral images in providing spectral information. Target detection is a significant requirement in many areas such as military and agriculture. This paper gives an analysis of hyperspectral image segmentation using the fuzzy C-means (FCM) clustering technique with the FODPSO classifier algorithm. A 2D adaptive log filter is proposed to denoise the sensed and captured hyperspectral image in order to remove speckle noise.
Using AI Planning to Automate the Performance Analysis of Simulators (Roland Ewald)
Analyzing simulation algorithm performance is cumbersome: execute some runs, observe a performance metric, and analyze the results. Often, the results motivate follow-up experiments, which in turn may lead to additional experiments, and so on. This time-consuming and error-prone process can be automated with planning approaches from artificial intelligence, making simulator performance analysis more convenient and rigorous. This paper introduces ALeSiA, a prototypical system for automatic simulator performance analysis. It is independent of any specific simulation system and realizes a hypothesis-driven approach to evaluate performance.
Sequential Action Patterns in Collaborative Ontology Engineering Projects: A ... (Philipp Singer)
Simon Walk's talk at CIKM '14 about our paper titled "Sequential Action Patterns in Collaborative Ontology Engineering Projects: A Case-study in the Biomedical Domain"
MediaEval 2015 - UNED-UV @ Retrieving Diverse Social Images Task - Poster (multimediaeval)
This paper details the participation of the UNED-UV group in the 2015 Retrieving Diverse Social Images Task. This year, our proposal is based on a multi-modal approach that first applies a textual algorithm based on Formal Concept Analysis (FCA) and Hierarchical Agglomerative Clustering (HAC) to detect the latent topics addressed by the images and diversify them according to these topics. Second, a local logistic regression model, which uses the low-level features and some relevant and non-relevant samples, is fitted to estimate the relevance probability of all the images in the database.
http://ceur-ws.org/Vol-1436/
http://www.multimediaeval.org
May 2015 talk to SW Data Meetup by Professor Hendrik Blockeel from KU Leuven & Leiden University.
With increasing amounts of ever more complex forms of digital data becoming available, the methods for analyzing these data have also become more diverse and sophisticated. With this comes an increased risk of incorrect use of these methods, and a greater burden on the user to be knowledgeable about their assumptions. In addition, the user needs to know about a wide variety of methods to be able to apply the most suitable one to a particular problem. This combination of broad and deep knowledge is not sustainable.
The idea behind declarative data analysis is that the burden of choosing the right statistical methodology for answering a research question should no longer lie with the user, but with the system. The user should be able to simply describe the problem, formulate a question, and let the system take it from there. To achieve this, we need to find answers to questions such as: what languages are suitable for formulating these questions, and what execution mechanisms can we develop for them? In this talk, I will discuss recent and ongoing research in this direction. The talk will touch upon query languages for data mining and for statistical inference, declarative modeling for data mining, meta-learning, and constraint-based data mining. What connects these research threads is that they all strive to put intelligence about data analysis into the system, instead of assuming it resides in the user.
Hendrik Blockeel is a professor of computer science at KU Leuven, Belgium, and part-time associate professor at Leiden University, The Netherlands. His research interests lie mostly in machine learning and data mining. He has made a variety of research contributions in these fields, including work on decision tree learning, inductive logic programming, predictive clustering, probabilistic-logical models, inductive databases, constraint-based data mining, and declarative data analysis. He is an action editor for Machine Learning and serves on the editorial board of several other journals. He has chaired or organized multiple conferences, workshops, and summer schools, including ILP, ECMLPKDD, IDA and ACAI, and he has been vice-chair, area chair, or senior PC member for ECAI, IJCAI, ICML, KDD, ICDM. He was a member of the board of the European Coordinating Committee for Artificial Intelligence from 2004 to 2010, and currently serves as publications chair for the ECMLPKDD steering committee.
Adjusting primitives for graph : SHORT REPORT / NOTES (Subhajit Sahu)
Graph algorithms, like PageRank, typically operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
More Related Content
Similar to Modeling adoptions and the stages of the diffusion of innovations
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to try reducing the work per iteration, and the other is to try reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, with the same in-links, helps reduce duplicate computations and thus could help reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could help reduce the iteration time, the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Modeling adoptions and the stages of the diffusion of innovations
1. MODELING ADOPTIONS AND THE STAGES OF THE DIFFUSION OF INNOVATIONS
Nicola Barbieri, Francesco Bonchi
Yahoo Labs, Barcelona, Spain
{barbieri,bonchi}@yahoo-inc.com
Yasir Mehmood
Pompeu Fabra University
yasir@yahoo-inc.com
2. Background
• The spread of new ideas in a society is a complex process that, starting from a small fraction of the population, propagates over time through a diverse set of communication channels, potentially reaching a critical mass.
• Rogers’ seminal work provides a unified tool for modeling
diffusion processes.
• His theory identifies five categories of people by considering
the adoption time of each person with respect to the rest of the
population.
3. Our contribution
• Understanding the dynamics of such a complex process has potential implications in sociology, economics and marketing.
• We study the data mining problem of modeling adoptions and
the stages of the diffusion of an innovation:
① Real-world items exhibit consistent differences in the way
they diffuse;
② The diffusion of different items may interest different segments of the market, and the diffusion of an item can achieve different levels of success;
③ Different items may exhibit different temporal patterns of
diffusion.
4. MASD: a stochastic framework for modeling
adoptions and the stages of diffusions
• The process of diffusion is decomposed into a finite, ordered sequence of adoption stages;
• Early stages correspond to the introduction of an item into the market, while later ones correspond to the maturity phase of its life cycle.
§ In continuity with Rogers’ theory, users have different
likelihood of being involved in each stage.
§ Each stage is characterized by a rate which describes the
relative speed of adoptions.
5. MASD: modeling procedure (1)
• The density function for observing a specific adoption, given the previous ones and the current stage, factorizes into two terms.
• The density function for the n-th adoption, given the current stage, is the product of:
• the probability of observing the activation of the adopter u_{i,n};
• the probability of this adoption occurring at time t_{i,n}.
• We assume that, within each stage s_j, the temporal gaps between consecutive adoptions are explained by an exponential density function with rate λ_j.
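A plausible form of this per-adoption density, assuming each stage s_j has a categorical user-involvement distribution π_j over the users (a sketch with illustrative notation; the exact formulas appear as images on the original slide):

p(u_{i,n}, t_{i,n} \mid s_j, t_{i,n-1}) \;=\; \pi_j(u_{i,n}) \cdot \lambda_j \, e^{-\lambda_j (t_{i,n} - t_{i,n-1})}

The first factor models who adopts while stage s_j is active, and the second models when the adoption happens relative to the previous one.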
6. MASD: modeling procedure (2)
• Key idea: we consider a 1-to-1 association between the
stages of adoptions, and the states of a Markov model.
• To enforce a clear sequentiality in the evolution of the
stages of adoption, we introduce the following constraints:
• These structural constraints can be accommodated in a
left-to-right hidden Markov model.
A 4-state left-to-right HMM and its transition matrix
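In standard HMM notation, the left-to-right constraint amounts to zeroing every transition except self-loops and moves to the next stage; a sketch consistent with the 4-state example above (the paper's exact constraints are those shown on the slide):

a_{jk} = P(s_{n+1} = k \mid s_n = j) = 0 \quad \text{for } k \notin \{j, j+1\},
\qquad
A = \begin{pmatrix} a_{11} & a_{12} & 0 & 0 \\ 0 & a_{22} & a_{23} & 0 \\ 0 & 0 & a_{33} & a_{34} \\ 0 & 0 & 0 & 1 \end{pmatrix}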
7. MASD: Generative model & learning
To better explain diffusion
mechanisms (e.g. localized trends),
we devise a learning framework that
alternates two phases:
① Cluster the diffusion traces in different
groups;
② For each group, fit the parameters of the MASD model by applying the Expectation-Maximization (EM) algorithm.
• The number of states for each cluster is automatically
detected by relying on the Bayesian Information Criterion.
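A minimal Python sketch of this alternating procedure, assuming a hypothetical helper fit_masd(traces, n_states) that runs EM and returns a model exposing log_likelihood(trace) and bic(traces); names and signatures are illustrative, not the authors' code:

import random

def cluster_and_fit(traces, n_clusters, state_grid=(2, 3, 4, 5), max_iter=20, seed=0):
    # Alternate between (1) refitting one MASD model per cluster with EM,
    # choosing the number of states by BIC, and (2) reassigning each trace
    # to the best-fitting model.  fit_masd() is a hypothetical EM routine.
    rng = random.Random(seed)
    assign = [rng.randrange(n_clusters) for _ in traces]   # random initial clustering
    models = [None] * n_clusters
    for _ in range(max_iter):
        for h in range(n_clusters):
            cluster = [t for t, a in zip(traces, assign) if a == h]
            if not cluster:
                continue
            candidates = [fit_masd(cluster, n_states=k) for k in state_grid]
            models[h] = max(candidates, key=lambda m: m.bic(cluster))   # BIC model selection
        new_assign = [max(range(n_clusters),
                          key=lambda h: models[h].log_likelihood(t) if models[h] else float("-inf"))
                      for t in traces]
        if new_assign == assign:   # converged: no trace changed cluster
            break
        assign = new_assign
    return models, assign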
8. MASD: Learning
• We employ a simple instance of iterative prototype-based
clustering.
• Given a cluster C_h (a set of traces) and a set of candidate models with different degrees of complexity, we select Θ*_h as the candidate that best trades off fit and complexity.
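A standard BIC-style selection rule of this kind (a sketch; the exact criterion is the one shown on the slide):

Θ*_h = \arg\max_{Θ} \Bigl[ \log P(C_h \mid Θ) - \tfrac{d_Θ}{2} \log N_h \Bigr]

where d_Θ is the number of free parameters of the candidate model and N_h is the number of observed adoptions in cluster C_h.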
9. Evaluation
• Assess the accuracy, convergence and stability of the learning
framework on synthetic data.
• We generate synthetic data with planted clustering structure in two
steps:
① Construct a set of MASD models to generate clusters of data;
② Sample lengths of adoption traces, pick a MASD model at random, and
generate adoptions using the selected model.
• Detecting and characterizing different patterns of adoption on
real data (Movielens & Yahoo Meme).
• Predictive tasks. Given a considered time window:
• Which users are more likely to adopt the item?
• How many users will adopt the item?
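A minimal sketch of the two-step synthetic-data generation above, under the assumptions stated earlier (left-to-right stages, per-stage user distribution, exponential gaps); all parameter names are illustrative:

import random

def generate_trace(length, advance_prob, user_probs, rates, seed=1):
    # Sample one adoption trace (a list of (user, time) pairs) from a MASD-style
    # model: a left-to-right chain of stages, where stage j draws adopters from
    # the categorical distribution user_probs[j] and inter-adoption gaps from an
    # exponential distribution with rate rates[j]; advance_prob[j] is the
    # probability of moving from stage j to stage j+1 after each adoption.
    rng = random.Random(seed)
    n_stages = len(rates)
    users = list(range(len(user_probs[0])))
    stage, t, trace = 0, 0.0, []
    for _ in range(length):
        u = rng.choices(users, weights=user_probs[stage])[0]   # who adopts in this stage
        t += rng.expovariate(rates[stage])                     # when, relative to the last adoption
        trace.append((u, t))
        if stage < n_stages - 1 and rng.random() < advance_prob[stage]:
            stage += 1                                         # advance to the next stage
    return trace

# Example: three stages over four users; later stages involve different users
# and adopt at different rates.
trace = generate_trace(
    length=20,
    advance_prob=[0.3, 0.3, 0.0],
    user_probs=[[0.7, 0.2, 0.05, 0.05], [0.1, 0.5, 0.3, 0.1], [0.05, 0.1, 0.35, 0.5]],
    rates=[0.5, 2.0, 1.0],
)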
10. Evaluation on synthetic data.
(top) Accuracy in the “clustering reconstruction” task on synthetic data with planted clusters, measured in terms of Rand index
(bottom) Convergence rate of the clustering/learning process, measured as the percentage of swaps observed at each iteration
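For reference, the Rand index reported in the top panel can be computed from pairwise agreements between the planted and the recovered clustering; a small self-contained sketch (not the authors' code):

from itertools import combinations

def rand_index(labels_true, labels_pred):
    # Fraction of pairs of traces on which the two clusterings agree:
    # both put the pair in the same cluster, or both put it in different ones.
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum((labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
                for i, j in pairs)
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical partitions up to relabeling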
11. Evaluation on real-world data (1)
Adoption patterns, in the 5 clusters, MovieLens (top) and Yahoo Meme (bottom)
MASD model on the Yahoo Meme dataset. The thickness of each arc indicates the strength of the transition probability between stages. The numbers inside each state represent: (i) the index of the adoption stage; (ii) the average percentage of adoptions observed in the considered stage; (iii) the percentage of adoption traces that involve the stage.
12. Evaluation on real-world data (2)
Each user is generally tied to a few stages.
Stages exhibit different adoption-rate patterns.
Different diffusion rates (log-scale) in the five clusters in MovieLens (top) and Yahoo Meme (bottom)
13. Experiments: real-world datasets (3)
Top-5 traces that minimize the perplexity for the considered model:
Adoption patterns, in the 5 clusters, MovieLens (top) and Yahoo Meme (bottom)
14. Predictive tasks: Prediction protocol
Fitting:
• Learn parameters over the diffusion traces in the training set.
• For each trace in the test set we evaluate prediction accuracy by varying the length of the observed data used for fitting.
• Each partially observed trace in the test set is associated with the model M that maximizes its log-likelihood.
Evaluation:
• We find the current diffusion state by applying the Viterbi algorithm.
• From the current state, we generate multiple samples according to the generative process.
• The variable of interest (adoption of a user / overall size) is estimated on the simulated traces (1,000 samples).
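A sketch of this evaluation step, reusing the generative sampling from the earlier snippet; model.viterbi(prefix) and the model attributes (rates, users, user_probs, advance_prob, n_stages) are hypothetical names, not the authors' API:

import random

def predict_adopters(model, prefix, window, n_samples=1000, seed=7):
    # Estimate, for each user, the probability of adopting within `window`
    # time units after the last observed adoption, by simulating continuations
    # of the trace from the current (Viterbi-decoded) diffusion stage.
    rng = random.Random(seed)
    stage = model.viterbi(prefix)[-1]      # hypothetical: most likely current stage
    t_last = prefix[-1][1]                 # prefix is a list of (user, time) pairs
    counts = {}
    for _ in range(n_samples):
        s, t, adopted = stage, t_last, set()
        while True:
            t += rng.expovariate(model.rates[s])
            if t > t_last + window:
                break
            u = rng.choices(model.users, weights=model.user_probs[s])[0]
            adopted.add(u)
            if s < model.n_stages - 1 and rng.random() < model.advance_prob[s]:
                s += 1
        for u in adopted:
            counts[u] = counts.get(u, 0) + 1
    return {u: c / n_samples for u, c in counts.items()}

The per-user probabilities can then be ranked to compute the AUC on single-user activations, and the average number of simulated adoptions gives an estimate of the final trace size.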
15. Predictive tasks: Results
Area under the curve (AUC) for predicting single user activations:
              60% partial observation        50% partial observation        40% partial observation
Time Window   MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)
30 days       0.70  0.54 / 0.55 / 0.55       0.69  0.55 / 0.55 / 0.55       0.69  0.54 / 0.54 / 0.55
21 days       0.69  0.55 / 0.55 / 0.55       0.69  0.54 / 0.54 / 0.55       0.68  0.54 / 0.54 / 0.54
14 days       0.69  0.54 / 0.54 / 0.54       0.69  0.54 / 0.54 / 0.55       0.69  0.53 / 0.54 / 0.54
(a) Movielens
              60% partial observation        50% partial observation        40% partial observation
Time Window   MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)
60 min.       0.83  0.73 / 0.74 / 0.75       0.82  0.72 / 0.73 / 0.74       0.82  0.72 / 0.74 / 0.74
30 min.       0.82  0.72 / 0.73 / 0.74       0.81  0.71 / 0.72 / 0.72       0.81  0.71 / 0.72 / 0.73
15 min.       0.81  0.68 / 0.70 / 0.70       0.80  0.69 / 0.70 / 0.71       0.81  0.66 / 0.68 / 0.69
(b) Yahoo Meme
TABLE VI: Area under the curve (AUC) for predicting single user activations in different time windows. The baseline procedure is evaluated
for three selections of k and three different splits of propagations.
Mean absolute error (MAE) for predicting final size of the diffusion trace:
              60% partial observation        50% partial observation        40% partial observation
Time Window   MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)
30 days       3.42  3.90 / 3.90 / 3.92       3.71  3.90 / 3.91 / 3.92       4.61  5.14 / 5.17 / 5.17
21 days       2.61  3.02 / 3.02 / 3.03       2.88  3.02 / 3.03 / 3.03       3.61  3.96 / 3.97 / 3.98
14 days       1.93  2.22 / 2.23 / 2.23       2.16  2.23 / 2.23 / 2.23       2.69  2.90 / 2.91 / 2.91
(a) Movielens
              60% partial observation        50% partial observation        40% partial observation
Time Window   MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)         MASD  k-NN (60/80/100)
60 min.       3.57  5.56 / 5.61 / 5.63       5.32  7.20 / 7.24 / 7.25       7.46  9.01 / 9.08 / 9.09
30 min.       3.01  4.65 / 4.69 / 4.71       4.66  6.13 / 6.17 / 6.15       6.69  7.82 / 7.89 / 7.93
15 min.       2.49  3.23 / 3.25 / 3.26       4.53  5.89 / 5.92 / 5.93       5.50  5.63 / 5.66 / 5.68
(b) Yahoo Meme
16. Conclusion and future works
• We introduce MASD, a stochastic framework for modeling
users’ adoptions and the different stages of the diffusion of innovations.
• Our model focuses on the two main dimensions, users and rate
of adoption. Learning is accomplished by fitting a left-to-right
hidden Markov model.
• The experimental evaluation over real-world data confirms the
accuracy in detecting interesting patterns of adoption and in
prediction scenarios.
• Future work: account for social influence dynamics and stages of virality.