This document summarizes research on mining user lifecycles from online community platforms and applying the findings to churn prediction. Key points:
- User development was analyzed using properties like in-degree, out-degree, and language over equal time periods to derive lifecycle stages.
- Period entropy and cross-entropy with previous periods/community were computed to quantify variation and convergence of properties over time.
- Linear regression models were fit to lifecycle property trajectories for individual users, showing most had decreasing period entropy over time.
- The goal is to understand user development and forecast churners based on early signals, enabling recommendations and neighborhood-based systems.
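As a rough illustration of the period entropy and cross-entropy measures mentioned in the key points, the sketch below computes both over the terms a user posted in one period. The function names, the additive smoothing constant `eps`, and the toy term lists are assumptions made for illustration; the summary does not say how the original work estimated its distributions.

```python
import math
from collections import Counter


def distribution(items):
    """Turn a list of observed items (e.g. terms) into relative frequencies."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def entropy(p):
    """Shannon entropy (in bits) of one period's distribution."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def cross_entropy(p, q, eps=1e-3):
    """Cross-entropy H(p, q) in bits, with additive smoothing on q.

    Smoothing (an assumption of this sketch) keeps the value finite when
    the current period uses items that the reference period never used.
    """
    vocab = set(p) | set(q)
    norm = 1.0 + eps * len(vocab)
    return -sum(
        pv * math.log2((q.get(item, 0.0) + eps) / norm)
        for item, pv in p.items()
    )


# Hypothetical terms used by one user in two consecutive periods.
previous_period = distribution(["hello", "help", "install", "error"])
current_period = distribution(["error", "error", "fix", "install"])

period_entropy = entropy(current_period)
drift_from_previous = cross_entropy(current_period, previous_period)
```

A low period entropy suggests the user concentrates on few items in that period, while a falling cross-entropy against the previous period or the community suggests the user's behaviour is converging towards it.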
CHANGES IN LABVIEW PROGRAMS POSTED TO AN ONLINE FORUM AS USERS GAIN EXPERIENCE - ijseajournal
Engineers and others can learn to use programming environments such as LabVIEW via online resources, including the LabVIEW forum. However, an interesting challenge in such a diffuse and distributed learning environment is assessing to what extent engineers are increasing in programming skill. This paper presents an analysis exploring the extent to which users’ uploaded programs changed in frequency and complexity over time. This study revealed a high rate of drop-out, a drop in the complexity of programs uploaded to the forum during the first two years after users’ first (respective) uploads of programs to the forum, and a slow long-term upward trend in complexity. The results highlight the need for further research aimed at assessing and promoting online learning of programming.
Iterative knowledge extraction from social networks. The Web Conference 2018 - Marco Brambilla
Knowledge in the world continuously evolves, and ontologies are largely incomplete, especially regarding data belonging to the so-called long tail. We propose a method for discovering emerging knowledge by extracting it from social content. Once initialized by domain experts, the method is capable of finding relevant entities by means of a mixed syntactic-semantic method. The method uses seeds, i.e. prototypes of emerging entities provided by experts, for generating candidates; then, it associates candidates with feature vectors built from the terms occurring in their social content and ranks the candidates by their distance from the centroid of the seeds, returning the top candidates. Our method can run iteratively, using the results as new seeds.
In this paper we address the following research questions: (1) How does the reconstructed domain knowledge evolve if the candidates of one extraction are recursively used as seeds? (2) How does the reconstructed domain knowledge spread geographically? (3) Can the method be used to inspect the past, present, and future of knowledge? (4) Can the method be used to find emerging knowledge?
This work was presented at The Web Conference 2018, MSM workshop.
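The seed-and-centroid ranking described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions: bag-of-words term counts stand in for the feature vectors, cosine distance stands in for the unspecified distance measure, and all names and texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, simplistic feature vector)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Component-wise mean of the seed vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return {term: count / len(vectors) for term, count in total.items()}


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def rank_candidates(seed_texts, candidate_texts, top_k):
    """Return the names of the top_k candidates closest to the seed centroid."""
    center = centroid([vectorize(text) for text in seed_texts])
    ranked = sorted(
        candidate_texts.items(),
        key=lambda item: cosine_distance(vectorize(item[1]), center),
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical social content for expert-provided seeds and for candidates.
seeds = [
    "new collection runway show milan fashion week",
    "emerging designer fashion collection preview",
]
candidates = {
    "cand_a": "designer presents fashion collection in milan",
    "cand_b": "football match result and league table",
}
top = rank_candidates(seeds, candidates, top_k=1)
```

To run the method iteratively, the content of the returned candidates would be appended to `seeds` before the next call.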
Data Cleaning for social media knowledge extraction - Marco Brambilla
Social media platforms let users share their opinions through textual or multimedia content. In many settings, this becomes a valuable source of knowledge that can be exploited for specific business objectives. Brands and companies often ask to monitor social media as sources for understanding the stance, opinion, and sentiment of their customers, audience and potential audience. This is crucial for them because it lets them understand trends and future commercial and marketing opportunities.
However, all this relies on a solid and reliable data collection phase, which ensures that all the analyses, extractions and predictions are applied to clean, solid and focused data. Indeed, the typical topic-based collection of social media content performed through keyword-based search tends to produce very noisy results.
We recently conducted a simple study aimed at cleaning the data collected from social content, within specific domains or related to given topics of interest. We propose a basic method for data cleaning and removal of off-topic content based on supervised machine learning techniques, i.e. classification, over data collected from social media platforms based on keywords regarding a specific topic. We define a general method for this and then we validate it through an experiment of data extraction from Twitter, with respect to a set of famous cultural institutions in Italy, including theaters, museums, and other venues.
For this case, we collaborated with domain experts to label the dataset, and then we evaluated and compared the performance of classifiers that are trained with different feature extraction strategies.
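The idea of filtering keyword-collected posts with a supervised classifier, and of comparing feature extraction strategies, can be sketched as below. The abstract does not name the classifier or the features used; the multinomial naive Bayes model, the two feature extractors, and the toy labelled posts are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter, defaultdict


def unigrams(text):
    """Feature strategy 1: lowercase word tokens."""
    return text.lower().split()


def uni_and_bigrams(text):
    """Feature strategy 2: word tokens plus adjacent word pairs."""
    tokens = text.lower().split()
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(self.featurize(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, text):
        features = self.featurize(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(count / n_docs)
            for f in features:
                smoothed = self.feature_counts[label][f] + 1
                score += math.log(smoothed / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical posts matched by the keywords "scala" and "museum".
train_texts = [
    "opera tonight at la scala theater",
    "new exhibition opens at uffizi museum",
    "ballet season tickets la scala",
    "scala programming language tutorial",
    "learn scala and java code",
    "museum of bad code jokes",
]
train_labels = ["on", "on", "on", "off", "off", "off"]

models = {
    name: NaiveBayes(featurize).fit(train_texts, train_labels)
    for name, featurize in [("unigrams", unigrams), ("uni+bi", uni_and_bigrams)]
}
predictions = {name: m.predict("scala code tutorial") for name, m in models.items()}
```

In a real comparison, each feature strategy would be scored on a held-out portion of the expert-labelled dataset rather than on a single example.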
Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using ... - Silicon Studio Corporation
Reducing user attrition, i.e. churn, is a broad challenge faced by several industries. In mobile social games, decreasing churn is decisive for increasing player retention and raising revenues. Churn prediction models make it possible to understand player loyalty and to anticipate when players will stop playing a game. Thanks to these predictions, several initiatives can be taken to retain those players who are more likely to churn. Survival analysis focuses on predicting the time of occurrence of a certain event, churn in our case. Classical methods, like regressions, can be applied only once all players have left the game. The challenge arises for datasets with incomplete churn information, as most players still connect to the game. This is called a censored data problem and is in the nature of churn. Censoring is commonly dealt with using survival analysis techniques, but due to the inflexibility of the survival statistical algorithms, the accuracy achieved is often poor. In contrast, novel ensemble learning techniques, increasingly popular in a variety of scientific fields, provide high-quality prediction results. In this work, we develop, for the first time in the social games domain, a survival ensemble model which provides a comprehensive analysis together with an accurate prediction of churn. For each player, we predict the probability of churning as a function of time, which makes it possible to distinguish various levels of loyalty profiles. Additionally, we assess the risk factors that explain the predicted player survival times. Our results show that churn prediction by survival ensembles significantly improves the accuracy and robustness of traditional analyses, like Cox regression.
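To make the censoring problem concrete, the sketch below implements the standard Kaplan-Meier estimator, which uses still-active players as "at risk" without treating them as churned. It is not the survival ensemble model of the paper; the function name and the toy player data are assumptions for illustration only.

```python
def kaplan_meier(durations, churned):
    """Kaplan-Meier survival curve.

    durations: days observed per player.
    churned:   True if the player churned at that time, False if the
               player was still active (censored) when observation ended.
    Returns a list of (time, survival probability) at each churn time.
    """
    events = sorted(zip(durations, churned))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        time = events[i][0]
        churn_count = 0
        leaving = 0
        # Group all players whose observation ends at this time.
        while i < len(events) and events[i][0] == time:
            churn_count += 1 if events[i][1] else 0
            leaving += 1
            i += 1
        if churn_count:
            survival *= 1 - churn_count / at_risk
            curve.append((time, survival))
        # Censored players leave the risk set without lowering survival.
        at_risk -= leaving
    return curve


# Hypothetical players: two are still playing when the data was extracted.
days = [5, 10, 10, 20, 30]
has_churned = [True, True, False, True, False]
curve = kaplan_meier(days, has_churned)
```

Dropping the two censored players, or counting them as churned, would bias the curve; keeping them in the risk set until their last observed day is what distinguishes survival methods from a plain regression on churn times.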
AWS re:Invent 2016: Predicting Customer Churn with Amazon Machine Learning (M... - Amazon Web Services
In this session, we take a specific business problem—predicting Telco customer churn—and explore the practical aspects of building and evaluating an Amazon Machine Learning model. We explore considerations ranging from assigning a dollar value to applying the model using the relative cost of false positive and false negative errors. We discuss all aspects of putting Amazon ML to practical use, including how to build multiple models to choose from, put models into production, and update them. We also discuss using Amazon Redshift and Amazon S3 with Amazon ML.
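One way to apply a churn model using the relative cost of false positives and false negatives is to pick the score threshold that minimizes total cost on labelled data. The sketch below is a generic illustration in plain Python and does not use or represent the Amazon ML API; the scores, labels, and cost figures are hypothetical.

```python
def total_cost(scores, labels, threshold, fp_cost, fn_cost):
    """Cost of flagging every customer whose score is >= threshold."""
    cost = 0.0
    for score, is_churner in zip(scores, labels):
        flagged = score >= threshold
        if flagged and not is_churner:
            cost += fp_cost  # retention offer wasted on a loyal customer
        elif not flagged and is_churner:
            cost += fn_cost  # churner missed, customer lost
    return cost


def best_threshold(scores, labels, fp_cost, fn_cost):
    """Try each observed score (and 'flag nobody') as the threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(scores, labels, t, fp_cost, fn_cost),
    )


# Hypothetical model scores and true outcomes (1 = churned).
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

# Missing a churner is assumed to be ten times as costly as a wasted offer.
cautious = best_threshold(scores, labels, fp_cost=1, fn_cost=10)
# The reverse assumption pushes the threshold up.
strict = best_threshold(scores, labels, fp_cost=10, fn_cost=1)
```

The same model therefore yields different operating points depending on the dollar values attached to each type of error.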
Beyond Churn Prediction : An Introduction to uplift modeling - Pierre Gutierrez
These slides are from a talk I gave at the PAPIs conference in Boston in 2016. The main subject is uplift modelling. Starting from a churn model approach for an e-gaming company, we introduce when to apply uplift methods, how to mathematically model them, and finally, how to evaluate them.
I tried to bridge the gap between causal inference theory and uplift theory, especially concerning how to properly cross-validate the results. The notation used is the one from uplift modelling.
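As a minimal sketch of what uplift measures, the code below estimates, per customer segment, the difference between the retention rate of treated customers and that of a control group. This is the basic definition of uplift rather than the specific models covered in the talk; the segment names and records are hypothetical.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """records: iterable of (segment, treated, retained) tuples.

    Returns {segment: retention rate if treated - retention rate if not}
    for every segment that has both treated and control customers.
    """
    # For each segment, keep [retained count, total count] per group.
    stats = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, retained in records:
        bucket = stats[segment][treated]
        bucket[0] += int(retained)
        bucket[1] += 1
    result = {}
    for segment, groups in stats.items():
        (t_pos, t_n), (c_pos, c_n) = groups[True], groups[False]
        if t_n and c_n:
            result[segment] = t_pos / t_n - c_pos / c_n
    return result


# Hypothetical campaign data: (segment, received offer, still active).
records = (
    [("new", True, True)] * 3 + [("new", True, False)] * 1
    + [("new", False, True)] * 1 + [("new", False, False)] * 3
    + [("loyal", True, True)] * 2 + [("loyal", False, True)] * 2
)
uplift = uplift_by_segment(records)
```

In this toy data a churn model alone would not separate the two segments by the effect of the offer, whereas the uplift estimate shows the offer only changes behaviour for the "new" segment.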
Testing Vitality Ranking and Prediction in Social Networking Services With Dy... - reshma reshu
Social networking services are prevalent in many online communities such as Twitter.com and Weibo.com, where millions of users interact with each other every day. One interesting and important problem in social networking services is to rank users based on their vitality in a timely fashion. An accurate ranking list of user vitality could benefit many parties in social network services, such as ad providers and site operators. Although obtaining a vitality-based ranking list of users is very promising, there are many technical challenges due to the large scale and dynamics of social networking data.
Studying user footprints in different online social networks - IIIT Hyderabad
With the growing popularity and usage of online social media services, people now have accounts (sometimes several) on multiple and diverse services like Facebook, LinkedIn, Twitter and YouTube. Publicly available information can be used to create a digital footprint of any user of these social media services. Generating such digital footprints can be very useful for personalization, profile management, and detecting malicious behavior of users. A very important application of analyzing users’ online digital footprints is to protect users from potential privacy and security risks arising from the huge amount of publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context-specific techniques (e.g. Jaro-Winkler similarity, WordNet-based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for disambiguating profiles belonging to the same user from different social networks. UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we achieved accuracy, precision and recall of 98%, 99%, and 96%, respectively.
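Jaro-Winkler similarity, one of the techniques named in this abstract, can be sketched as follows for comparing usernames across networks. This is a plain implementation of the standard definition with the usual prefix scale of 0.1 and a prefix cap of four characters; the example usernames are hypothetical, and the paper's own implementation and thresholds are not known from the abstract.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3


def jaro_winkler(s1, s2, prefix_scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


# Hypothetical usernames of one person on two networks.
score = jaro_winkler("john_smith", "johnsmith")
```

A profile-matching classifier would typically use such a score as one feature among several, alongside similarities computed on display name, location, and description.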
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti... - Markus Luczak-Rösch
Invited talk given at the QUEST (Qualitative Expertise at Southampton, http://www.quest.soton.ac.uk/) group event (http://www.quest.soton.ac.uk/training/) on Qualitative Methods and Big Data.
Inferring Peer Centrality in Socially-Informed Peer-to-Peer Systems - Nicolas Kourtellis
Social applications implemented on a peer-to-peer (P2P) architecture mine the social graph of their users for improved performance in search, recommendations, resource sharing and others. In such applications, the social graph that connects their users is distributed on the peer-to-peer system: the traversal of the social graph translates to a socially-informed routing in the peer-to-peer layer.
In this work we introduce the model of a projection graph that is the result of mapping a social graph onto a peer-to-peer network. We analytically formulate the relation between metrics in the social graph and in the projection graph. We focus on three such graph metrics: degree centrality, node betweenness centrality, and edge betweenness centrality. We evaluate experimentally the feasibility of estimating these metrics in the projection graph from the metrics of the social graph. Our experiments on real networks show that when mapping communities of 50-150 users on a peer, there is an optimal organization of the projection graph with respect to degree and node betweenness centrality. In this range, the association between the properties of the social graph and the projection graph is the highest, and thus the properties of the (dynamic) projection graph can be inferred from the properties of the (slower changing) social graph. We discuss the applicability of our findings to aspects of peer-to-peer systems such as data dissemination, social search, peer vulnerability, and data placement and caching.
Inferring Peer Centrality in Socially-Informed Peer-to-Peer Systems. Nicolas Kourtellis and Adriana Iamnitchi. In Proceedings of 11th IEEE International Conference on Peer-to-Peer Computing (P2P'11), Kyoto, Japan, Aug 2011
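The mapping of a social graph onto peers can be sketched as below. The sketch assumes that two peers are linked in the projection graph whenever at least one social edge connects users hosted on different peers, with the number of such edges kept as a weight; this reading, the function names, and the toy graph are assumptions, and the paper's formal definition may differ.

```python
from collections import defaultdict


def project(social_edges, user_to_peer):
    """Map social edges onto peers.

    Returns {frozenset({peer_a, peer_b}): number of social edges between
    users hosted on those two peers}. Edges inside one peer are dropped.
    """
    weights = defaultdict(int)
    for u, v in social_edges:
        peer_u, peer_v = user_to_peer[u], user_to_peer[v]
        if peer_u != peer_v:
            weights[frozenset((peer_u, peer_v))] += 1
    return dict(weights)


def degree_centrality(edges, nodes):
    """Fraction of the other nodes each node is directly connected to."""
    degree = {node: 0 for node in nodes}
    for edge in edges:
        for node in edge:
            degree[node] += 1
    denominator = max(len(nodes) - 1, 1)
    return {node: d / denominator for node, d in degree.items()}


# Hypothetical social graph with five users hosted on three peers.
social_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")]
user_to_peer = {"a": "P1", "b": "P1", "c": "P2", "d": "P2", "e": "P3"}

projection = project(social_edges, user_to_peer)
peer_centrality = degree_centrality(projection, set(user_to_peer.values()))
```

Comparing `peer_centrality` with the degree centrality of the users each peer hosts is the kind of social-to-projection relation the abstract describes estimating.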
What network simulator questions do users ask? a large-scale study of stack o... - nooriasukmaningtyas
The use of network simulators as a modern tool for analyzing and predicting the behaviour of computer networks has grown to reduce the complexity of its accuracy measurement. This attracts researchers and practitioners to share problems and discuss them to improve the features. To communicate the related issues, users move to online question-answering platforms. Although recent studies have shown the popularity and benefits of adopting network simulation tools, the challenges users face in using network simulators remain unknown. In this research paper, we examine 2,322 network-simulator-related Stack Overflow question posts to gain insights into the topics and challenges that users have discussed. We adopt the latent Dirichlet allocation model to understand the topics discussed on Stack Overflow. We then investigate the popularity and difficulty of each topic. The results show that users use Stack Overflow as an implementation guideline for the network simulation model. We determine 8 discussion topics that are merged into 5 major categories. Simulation model configuration is the most useful topic for users. We also observe that target network protocol modification and network simulator installation are the most popular topics. Network simulator installation and target network protocol modification issues have been challenging for most users. The findings also highlight future research that suggests ways to help the network simulator community in the early stages to overcome the popular and difficult topics faced when using network simulation tools.
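For readers unfamiliar with latent Dirichlet allocation, the sketch below fits a tiny topic model with collapsed Gibbs sampling in plain Python. It only illustrates the technique: the study most likely used an established LDA library, and the hyperparameters, the fixed random seed, and the toy tokenized posts here are all assumptions.

```python
import random


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for word in doc:
            z = rng.randrange(num_topics)
            topics.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical tokenized question posts.
posts = [
    ["install", "error", "build", "install"],
    ["build", "error", "compile", "install"],
    ["protocol", "routing", "packet", "modify"],
    ["packet", "protocol", "modify", "routing"],
]
doc_topic, topic_word = lda_gibbs(posts, num_topics=2)
```

Each row of `doc_topic` gives the number of tokens per topic in a post, and the highest counts in each `topic_word` table give the words a human would use to label that topic.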
Network effects. It’s one of the most important concepts for business in general and especially for tech businesses, as it’s the key dynamic behind many successful software-based companies. Understanding network effects not only helps build better products, but it helps build moats and protect software companies against competitors’ eating away at their margins.
Yet what IS a network effect? How do we untangle the nuances of 'network effects' with 'marketplaces' and 'platforms'? What’s the difference between network effects, virality, supply-side economies of scale? And how do we know a company has network effects?
Most importantly, what questions can entrepreneurs and product managers ask to counter the wishful thinking and sometimes faulty assumption behind the belief that “if we build it, they will come” … and instead go about more deterministically creating network effects in their business? Because it's not a winner-take-all market by accident.
Sampling of User Behavior Using Online Social Network - Editor IJCATR
The popularity of online networks provides an opportunity to study the characteristics of online social network graphs, which is important both to improve current systems and to design new applications of online social networks. Although personalized search has been proposed for many years and many personalization strategies have been investigated, it is still unclear whether personalization is consistently effective on different queries for different users, and under different search contexts. In this paper, we study the performance of information collection in a dynamic social network. By analyzing the results, we reveal that personalized search shows a significant improvement over common web search.
The mixing time of the sampling process strongly depends on the characteristics of the graph.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Is software engineering research addressing software engineering problems? - Gail Murphy
Keynote from Automated Software Engineering 2020. (See https://www.cs.ubc.ca/~murphy for video)
Brian Randell described software engineering as “the multi-person development of multi-version programs”. David Parnas has expressed that this “pithy phrase implies everything that differentiates software engineering from other programming”. How does current software engineering research compare against this definition? Is there currently too much focus on research into problems and techniques more associated with programming than software engineering? Are there opportunities to use Randell’s description of software engineering to guide the community to new research directions? In this talk, I will explore these questions and discuss how a consideration of the development streams used by multiple individuals to produce multiple versions of software opens up new avenues for impactful software engineering research.
Testing Vitality Ranking and Prediction in Social Networking Services With Dy...reshma reshu
Social networking services have been prevalent at many online communities such as Twitter.com and Weibo.com, where millions of users keep interacting with each other every day. One interesting and important problem in the social networking services is to rank users based on their vitality in a timely fashion. An accurate ranking list of user vitality could benefit many parties in social network services such as the ads providers and site operators. Although it is very promising to obtain a vitality-based ranking list of users, there are many technical challenges due to the large scale and dynamics of social networking data .
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
To Get any Project for CSE, IT ECE, EEE Contact Me @ 09666155510, 09849539085 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org
Studying user footprints in different online social networksIIIT Hyderabad
With the growing popularity and usage of online social media services, people now have accounts (some times several) on multiple and diverse services like Facebook, LinkedIn, Twitter and YouTube. Publicly available information can be used to create a digital footprint of any user using these social media services. Generating such digital footprints can be very useful for personalization, profile management, detecting malicious behavior of users. A very important application of analyzing users’ online digital footprints is to protect users from potential privacy and security risks arising from the huge publicly available user information. We extracted information about user identities on different social networks through Social Graph API, FriendFeed, and Profilactic; we collated our own dataset to create the digital footprints of the users. We used username, display name, description, location, profile image, and number of connections to generate the digital footprints of the user. We applied context specific techniques (e.g. Jaro Winkler
similarity, Wordnet based ontologies) to measure the similarity of the user profiles on different social networks. We specifically focused on Twitter and LinkedIn. In this paper, we present the analysis and results from applying automated classifiers for
disambiguating profiles belonging to the same user from different social networks UserID and Name were found to be the most discriminative features for disambiguating user profiles. Using the most promising set of features and similarity metrics, we
achieved accuracy, precision and recall of 98%, 99%, and 96%, respectively.
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...Markus Luczak-Rösch
Invited talk given at the QUEST (Qualitative Experise at Southampton, http://www.quest.soton.ac.uk/) group event (http://www.quest.soton.ac.uk/training/) on Qualitative Methods and Big Data.
Inferring Peer Centrality in Socially-Informed Peer-to-Peer SystemsNicolas Kourtellis
Social applications implemented on a peer-to-peer (P2P) architecture mine the social graph of their users for improved performance in search, recommendations, resource
sharing and others. In such applications, the social graph that connects their users is distributed on the peer-to-peer system: the traversal of the social graph translates to a socially-informed routing in the peer-to-peer layer.
In this work we introduce the model of a projection graph that is the result of mapping a social graph onto a peer-to-peer network. We analytically formulate the relation between metrics in the social graph and in the projection graph. We focus on three such graph metrics: degree centrality, node betweenness centrality, and edge betweenness centrality. We evaluate experimentally the feasibility of estimating these metrics in the projection graph from the metrics of the social graph. Our experiments on real networks show that when mapping communities of 50-150 users on a peer, there is an optimal organization of the projection graph with respect to degree and node betweenness centrality. In this range, the association between the properties of the social graph and the projection graph is the highest, and thus the properties of the (dynamic) projection graph can be inferred from
the properties of the (slower changing) social graph. We discuss the applicability of our findings to aspects of peer-to-peer systems such as data dissemination, social search, peer vulnerability, and data placement and caching.
Inferring Peer Centrality in Socially-Informed Peer-to-Peer Systems. Nicolas Kourtellis and Adriana Iamnitchi. In Proceedings of 11th IEEE International Conference on Peer-to-Peer Computing (P2P'11), Kyoto, Japan, Aug 2011
What network simulator questions do users ask? a large-scale study of stack o...nooriasukmaningtyas
The use of network simulator as a modern tool in analyzing and predicting the behaviour of computer networks has grown to reduce the complexity of its accuracy measurement. This attracts researchers and practitioners to share problems and discuss them to improve the features. To communicate the related issues, users move to online questionanswering platforms. Although recent studies have shown the popularity and benefits of adopting network simulation tools, the challenges users face in using the network simulator remain unknown. In this research paper, we examine 2,322 network simulator related stack overflow question posts to gain insights into the topics and challenges that users have discussed. We adopt the latent dirichlet allocation model to understand the topics discussed in stack overflow. We then investigate the popularity and difficulty of each topic. The results show that users use stack overflow as an implementation guideline for the network simulation model. We determine 8 discussion topics that are merged into 5 major categories. Simulation model configuration is the most useful topic for users. We also observe that target network protocol modification and network simulator installation are the most popular topics. Network simulator installation and target network protocol modification issues have been challenging for most users. The findings also highlight future research that suggests ways to help the network simulator community in the early stages to overcome the popular and difficult topics faced when using network simulation tools.
Network effects. It’s one of the most important concepts for business in general and especially for tech businesses, as it’s the key dynamic behind many successful software-based companies. Understanding network effects not only helps build better products, but it helps build moats and protect software companies against competitors’ eating away at their margins.
Yet what IS a network effect? How do we untangle the nuances of 'network effects' with 'marketplaces' and 'platforms'? What’s the difference between network effects, virality, supply-side economies of scale? And how do we know a company has network effects?
Most importantly, what questions can entrepreneurs and product managers ask to counter the wishful thinking and sometimes faulty assumption behind the belief that “if we build it, they will come” … and instead go about more deterministically creating network effects in their business? Because it's not a winner-take-all market by accident.
Sampling of User Behavior Using Online Social NetworkEditor IJCATR
The popularity of online networks provides an opportunity to study the characteristics of online social network graphs, which is important both to improve current systems and to design new applications of online social networks. Although personalized search has been proposed for many years and many personalization strategies have been investigated, it is still unclear whether personalization is consistently effective on different queries for different users, and under different search contexts. In this paper, we study the performance of information collection in a dynamic social network. By analyzing the results, we reveal that personalized search offers a significant improvement over common web search.
The mixing time of the sampling process strongly depends on the characteristics of the graph.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Is software engineering research addressing software engineering problems?Gail Murphy
Keynote from Automated Software Engineering 2020. (See https://www.cs.ubc.ca/~murphy for video)
Brian Randell described software engineering as “the multi-person development of multi-version programs”. David Parnas has expressed that this “pithy phrase implies everything that differentiates software engineering from other programming”. How does current software engineering research compare against this definition? Is there currently too much focus on research into problems and techniques more associated with programming than software engineering? Are there opportunities to use Randell’s description of software engineering to guide the community to new research directions? In this talk, I will explore these questions and discuss how a consideration of the development streams used by multiple individuals to produce multiple versions of software opens up new avenues for impactful software engineering research.
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...Matthew Rowe
Invited keynote talk at the 1st Workshop of Quality, Motivation and Coordination of Open Collaboration @ the International Conference on Social Informatics 2013
Attention Economics in Social Web SystemsMatthew Rowe
Slides from a Highwire Digital Futures Seminar that I gave at Lancaster University on 25th October 2012 covering Attention Economics in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsMatthew Rowe
Presented at:
-Aston Business School, Birmingham, UK. 2011
-Keynote presentation at Detecting and Exploiting Cultural Diversity on the Social Web Workshop, 20th Annual Conference on Information and Knowledge Management 2011
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by Rik Marselis and me from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction
1. MINING USER LIFECYCLES FROM ONLINE COMMUNITY PLATFORMS AND THEIR APPLICATION TO CHURN PREDICTION
DR. MATTHEW ROWE
SCHOOL OF COMPUTING AND COMMUNICATIONS
@MROWEBOT | M.ROWE@LANCASTER.AC.UK
International Conference on Data Mining 2013
Dallas, USA
3. User Development: ‘Online’
- Recently studied in isolated dimensions:
  - Socially (Telecoms Networks: Miritello et al. 2013)
    - Communication networks tend to a capacity
  - Lexically (Online Communities: Danescu-Niculescu-Mizil et al. 2013)
    - Language adapts to the community, before diverging
- Without analysing development:
  a) Relative to earlier signals
  b) Relative to the community of interaction
4. Understanding User Development enables…
1. Recommender Systems
  - Developmental stage-based user neighbourhoods (e.g. user-kNN)
  - Modelling taste evolution (e.g. biases in MF)
    - Current/future work (more later)
2. Churn Prediction
  - Forecast churners from development signals
    - Focus of this talk
[Figures: average rating over time for film categories (e.g. Directorial Debut Films, 1990s Comedy Films); ROC curves (True Positive Rate vs. False Positive Rate) with legend entries Entropy, Period Entropy, Community Entropy, In-degree, Out-degree, Lexical, All]
5. Outline
- Datasets: Online Community Platforms
- Defining User Lifecycles and Properties
- Mining Lifecycle Trajectories
- Predicting Churners
- Findings and Conclusions
- Future Work
6. Datasets: Online Community Platforms
1. Facebook ‘Open University’ Groups
  - Containing discussions about courses and degrees
2. SAP Community Network
  - Question-answering system for SAP technologies
3. Server Fault
  - Stack Overflow subsidiary site for server-related issues
Paper excerpt: … examination of user lifecycles we used data collected from Facebook, the SAP Community Network (SAP) and Server Fault. Table 1 provides summary statistics of the datasets where we only considered users who had posted more than 40 times within their lifetime on the platform. The Facebook dataset was collected from groups discussing Open University courses, where users talked about their issues with the courses and guidance on studying. The SAP Community Network is a community question answering system related to SAP technologies where users post questions and provide answers related to technical issues. Similarly, Server Fault is a platform that is part of the Stack Overflow question answering site collection where users post questions related to server-related issues. We divided each platform’s users up into 80%/20% splits for training (and analysis) and testing, using the former in this section to examine user development and the latter split for our later detection experiments.
Table 1. Statistics of the online community platform datasets.
Platform      Time Span                 Post Count  User Count
Facebook      [18-08-2007, 24-01-2013]  118,432     4,745
SAP           [15-12-2003, 20-07-2011]  427,221     32,926
Server Fault  [01-08-2008, 31-03-2011]  234,790     33,285
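The filtering and splitting steps described for the datasets (users with more than 40 posts, then an 80%/20% split of users) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy data and the fixed random seed are assumptions made for the example.

```python
import random
from collections import Counter

def select_active_users(post_authors, min_posts=40):
    """Return users who posted more than `min_posts` times (sorted for determinism)."""
    counts = Counter(post_authors)
    return sorted(user for user, count in counts.items() if count > min_posts)

def split_users(users, train_fraction=0.8, seed=0):
    """Shuffle users and split them into training (analysis) and test sets."""
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)  # hypothetical seed, for reproducibility only
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Toy data: ten users with 41 posts each and one user with exactly 40 posts.
authors = [f"u{i}" for i in range(10) for _ in range(41)] + ["rare"] * 40
active = select_active_users(authors)
train, test = split_users(active)
```

Splitting by user rather than by post keeps every post of a given user on one side of the split, which matches the description of dividing "each platform's users".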
3.1 Defining Lifecycle Periods
In order to examine how users develop over time we needed some means to …
7. User Lifecycles: Derivation
- Offline lifecycle periods (over time): Primary School, High School, University, Postgrad, Postdoc, Lecturing
- Lifecycle periods of a potential question-answering system user (conjecture!), from first post to last post: Novice Users, Asking Questions, Asking & Answering Questions, Answering Questions
- In reality: do not know the labels, however we can split by equal time intervals: 1, 2, 3, …, n
- Yet, users non-uniformly distribute their activity across lifecycles
8. User Lifecycles: Properties
- Divide lifetime into equal activity periods 1, 2, 3, …, n, so that each period contains the same number of posts (#posts = #posts)
- We set n=20
- Capture period-specific user properties (in period s):
  - In-degree distribution
    - Relative frequency distribution of senders to user u in period s
  - Out-degree distribution
    - Relative frequency distribution of recipients from user u in s
  - Term distribution
    - Relative frequency distribution of terms used by u in s
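A minimal sketch of this slide's two steps: splitting a user's time-ordered posts into n equal-activity periods, and computing the three relative frequency distributions for one period. The post record layout (`text`, `reply_to`, `replies_from`) and the whitespace tokenisation are assumptions made for illustration; they are not taken from the talk.

```python
from collections import Counter

def split_into_periods(posts, n=20):
    """Split time-ordered posts into n chunks holding (near) equal numbers of posts."""
    size, extra = divmod(len(posts), n)
    periods, start = [], 0
    for s in range(n):
        end = start + size + (1 if s < extra else 0)
        periods.append(posts[start:end])
        start = end
    return periods

def relative_frequencies(items):
    """Map each distinct item to its share of all items."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}

def period_properties(period_posts):
    """In-degree, out-degree and term distributions for one lifecycle period."""
    senders = [s for p in period_posts for s in p["replies_from"]]
    recipients = [p["reply_to"] for p in period_posts if p["reply_to"] is not None]
    terms = [t for p in period_posts for t in p["text"].lower().split()]
    return {
        "in_degree": relative_frequencies(senders),
        "out_degree": relative_frequencies(recipients),
        "terms": relative_frequencies(terms),
    }

# Toy user with 40 posts (hypothetical record layout).
posts = [
    {
        "text": "server down again",
        "reply_to": "bob" if i % 2 else None,
        "replies_from": ["alice"] if i % 4 else ["alice", "carol"],
    }
    for i in range(40)
]
periods = split_into_periods(posts, n=20)
props = period_properties(periods[0])
```

Slicing by post count rather than by calendar time addresses the point made on the previous slide: users spread their activity unevenly, so equal time intervals would leave some periods nearly empty.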
9. Analysing Development: Period Entropy
- Variation in users’ properties across periods
- Computed period entropy for each property
- Generally stable trends: consistent variance in communication and terms
Paper excerpt: … they develop in the community (for SAP and Facebook), however Server Fault users remain relatively stable. This could be due to the relatively minor interaction effects that take place on Server Fault: users largely lurk on the platform to seek answers to questions, and thus do not contribute unless it is necessary (i.e. they feel that their expertise is sufficient to answer a question or that a new question is required); as a result, it is likely that users have an implicit understanding of how one should formulate a post and thus the language that should be used.
[Figure 1. Entropies of lifetime-stage distributions formed from users’ in-degrees, out-degrees and lexical terms. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages; y-axis: Distribution Entropy; series: Facebook, SAP, Server Fault]
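The period entropy of a property can be sketched as the Shannon entropy of that period's relative frequency distribution. The logarithm base is not recoverable from the slide, so base 2 is an assumption here, as are the function name and the toy distributions.

```python
import math

def entropy(distribution):
    """Shannon entropy (base 2, an assumption) of a {value: probability} mapping."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

# A user contacted equally by four people has higher in-degree entropy
# than a user contacted by a single person.
spread_out = {"alice": 0.25, "bob": 0.25, "carol": 0.25, "dave": 0.25}
concentrated = {"alice": 1.0}
h_spread = entropy(spread_out)
h_concentrated = entropy(concentrated)
```

Higher values indicate more variation within the period, for example many distinct senders with similar shares; a value of zero means the user's in-degree, out-degree or vocabulary in that period is concentrated on a single item.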
10. Analysing Development: Period Cross-Entropy
- Changes in properties relative to earlier
- Computed the minimised cross-entropy for each property
- Convergence on prior properties
- Convergence: lack of communication with new people, or use of new terms
Paper excerpt: … that consistently across the platforms, users are contacted by people who have contacted them before and that fewer novel users appear. The same is also true for the out-degree distributions: users contact fewer new people than they did before. This is symptomatic of community platforms where despite new users arriving within the platform, users form sub-communities in which they interact and communicate with the same individuals. Figure 2(c) also demonstrates that users tend to reuse language over time and thus produce a gradually decaying cross-entropy curve.
[Figure 2. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with previous lifecycle periods. We see a consistent reduction in the cross-entropies over time. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault]
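One way to sketch the minimised period cross-entropy: compare the distribution in period s against each earlier period and keep the smallest value. Treating the current period as the first argument, using base-2 logarithms, and flooring unseen items at a small `eps` are all assumptions of this sketch; the slide does not state them.

```python
import math

def cross_entropy(p, q, eps=1e-6):
    """H(p, q) = -sum_x p(x) * log2 q(x); items unseen in q are floored at eps (assumption)."""
    return -sum(px * math.log2(max(q.get(x, 0.0), eps)) for x, px in p.items() if px > 0)

def period_cross_entropy(distributions, s):
    """Minimum cross-entropy between period s and every earlier period."""
    if s == 0:
        raise ValueError("the first period has no earlier period to compare with")
    return min(cross_entropy(distributions[s], distributions[k]) for k in range(s))

# Toy in-degree distributions for three consecutive periods.
dists = [
    {"alice": 1.0},
    {"alice": 0.5, "bob": 0.5},
    {"alice": 0.5, "bob": 0.5},
]
pce_1 = period_cross_entropy(dists, 1)  # "bob" is new, so the value is high
pce_2 = period_cross_entropy(dists, 2)  # identical to period 1, so the value is low
```

A falling value over the lifecycle corresponds to the convergence noted on the slide: the user contacts, or is contacted by, people already seen and reuses terms already used.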
11. Analysing Development: Community Cross-Entropy
- Difference in properties relative to the community
- Computed cross-entropy for each property between user @ [t,t’] and community @ [t,t’]
- Convergence on community properties
- Divergence from the community
- Convergence-divergence: first, adapt to community; second, separate
[Figure 3. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault]
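A sketch of the community comparison, under the assumption that the community distribution for an interval is obtained by pooling the raw counts of every user active in that interval. The pooling rule, the `eps` floor for unseen items, and all names are illustrative choices, not details given in the talk.

```python
import math
from collections import Counter

def normalise(counts):
    """Turn raw counts into a relative frequency distribution."""
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}

def cross_entropy(p, q, eps=1e-6):
    """H(p, q) in bits; items unseen in q are floored at eps (assumption)."""
    return -sum(px * math.log2(max(q.get(x, 0.0), eps)) for x, px in p.items() if px > 0)

def community_cross_entropy(user_counts, all_user_counts):
    """Cross-entropy between one user's distribution and the pooled community distribution."""
    community = Counter()
    for counts in all_user_counts:
        community.update(counts)  # pool counts over the same time interval
    return cross_entropy(normalise(user_counts), normalise(community))

# Toy term counts for one interval [t, t'].
typical = Counter(server=2, error=2)
niche = Counter(kernel=4)
everyone = [typical, typical, typical, niche]
h_typical = community_cross_entropy(typical, everyone)
h_niche = community_cross_entropy(niche, everyone)
```

In this toy interval the user whose vocabulary matches the majority scores lower than the user with niche vocabulary, which is the sense in which a rising curve signals divergence from the community.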
12. How can we model the evolution of individual users?
Solution: Mine Lifecycle Trajectories
i.e. fit a curve for each user’s development measure (property and indicator)
Properties: in-degree, out-degree, terms
Indicators: period entropy, period cross-entropy, community cross-entropy
Measures: property and indicator (e.g. in-degree period entropy)
13. Lifecycle Trajectories: Period Entropy
- Fitted per-user linear regression models
  - Independent variable: lifecycle period of the user. Dependent variable: entropy
  - >80% of users R² > 0.4
Paper excerpt: … development of user properties, setting the explanatory variable to be the lifecycle period of the user and the response variable to be the user property’s entropy. In modelling entropy development we can characterise each user using the slope (β) of the model, thus indicating the rate of change of entropy throughout the lifecycle periods. We induced user-specific entropy models for each platform’s users and then examined the cumulative frequency distribution of the β-values for the different user properties and platforms; these are shown in Figure 4.
[Figure 4. Cumulative frequency distributions of linear regression models’ … (caption truncated). Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: β; y-axis: F(x); series: Facebook, SAP, Server Fault]
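The per-user model can be sketched as an ordinary least-squares line of entropy against lifecycle period, keeping the slope β and R² for each user. The function name and the toy trajectories below are assumptions; integer periods 1…n stand in for the lifecycle stages.

```python
def fit_linear_trajectory(values):
    """Least-squares fit of values against periods 1..n; returns (beta, r_squared)."""
    n = len(values)
    periods = list(range(1, n + 1))
    mean_s = sum(periods) / n
    mean_y = sum(values) / n
    sxx = sum((s - mean_s) ** 2 for s in periods)
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in zip(periods, values))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_s
    ss_res = sum((y - (intercept + beta * s)) ** 2 for s, y in zip(periods, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return beta, r_squared

# Toy entropy trajectories for two hypothetical users over 20 periods.
trajectories = {
    "declining": [3.0 - 0.05 * s for s in range(1, 21)],
    "rising": [1.0 + 0.02 * s for s in range(1, 21)],
}
fits = {user: fit_linear_trajectory(values) for user, values in trajectories.items()}
share_decreasing = sum(beta < 0 for beta, _ in fits.values()) / len(fits)
```

Collecting β over all users gives the values whose cumulative frequency distribution is plotted in Figure 4; `share_decreasing` is the fraction of that distribution lying below zero.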
14. user throughout their lifecycles, letting
By
se the in-degree, out- derivingbythe distribution cross-entropy when proportional f (ui , [t, t ])
earlier,
deriving the minimum of average commparing users’ represent changes in user
model to
munity platform over the same values user properties with past properties, platforms and user
change paring ( ) then
across the users converge indicated clear
different on their past
devel-the end ofdevelopment, beforetrends. Thatfunction that returns the period cross-entropy of an
towards
community lifecycles. decreasing
is, property (e.g. in-degree) for a given user
propertiesbehaviour over time. the proportion of users for
we examined user
riable users. We begin this section This suggests that an exponential decay whom
individual
interval:
model process. suitable for than 0, Cross-Entropy
would be
Lifecyclechange was greater describing such reductions
Trajectories: Period and thus indicating
yatforms differtheterms mining
trajectories in average
and the
ponse
throughout user’s lifecycles. Applying such a model requires f (ui , [t, t ]) − f (ui , [
1
e development of users overall. We found that = cross-entropy values over
decay
for all tested measures, all
elling 13
period
y Trajectoriestrajecto- that users reduce in their δui
|T | − 1
f (u , [t, t ])
ng the lifecycle
time. average proportional changethe case]∈T, greater i
To examine whether this was indeed ],[t ,t we
[t,t value
users
an
ngof the entropy haduser properties
the
s (in-degree, out-degree,of defined the converged on past behaviour of
t<t <t
¨ Earlier: users measure δu that returns the average propor(entropy,
nge of period-cross- thus suggesting the period cross-entropy for a given growth
exhibited athan 0, tional change value in suitability of a decaying
generally stable entropy
Mining is performed by
By deriving [t, t one
model. Thewe chose their terms, letting requires denote a
exponential
ycle periods. in ¤ I.e. previously seendecay model f (ui , the ])distribution of average pro
user-changes Thereforeuser throughoutthe lifecycles,relationships, etc. parameter
resent
user
period cross-entropy of
the
del as a suitableFirst, examined the the change decay exponential decay platforms
modelfunction that returns potential for rateacross given value
for the that defines the values (δ) an arbitrary different
develof a time
d then beforebe provided
evelopment, to then
¨
user property (e.g. in-degree) for a given user and
properties we examined the proportion
erties, begin this the explanatory variable
rs. We setting section in-degree period cross-entropy) over time, where of users f
x (e.g. interval:
he the model:
nd
the average change was greater than 0, and thus i
periodmining process. and the response
of the user
1
f (u [t, t ]) − (ui [t , t ])
= 1/¯. We defined the lifecyclei ,period ffor, the exponential tested mea
x δu =
these
decay overall. ,We ])
found that for all
ser property’s entropy. In modelling
|T | − 1
f (ui [t, t
using an integer ,tusers s an average . , 20}, hence
[t,t ],[t
]∈T,
Exponential Decay Model
- Feature: a property and development indicator (e.g. in-degree period cross-entropy); the feature value is taken per interval [t, t'].
- Average proportional change values (δ) were computed across the different platforms and user properties. All users had an average proportional change value < 0, suggesting the suitability of a decaying exponential model.
- The exponential decay model requires one parameter to be provided: λ, which defines the decay rate of a given value. It is induced per user as $\lambda = 1/\bar{x}$, where $\bar{x}$ is the average of the user's feature values.
- The lifecycle period for the model is an integer value s = {1, 2, ..., 20}. Letting $f(u_i, s)$ be a function that returns the feature value for a given user and lifecycle period, the model is defined as:
  $g(u_i, s) = f(u_i, s_1)\, e^{-\lambda s}$   (7)
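The exponential decay model on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the decay rate is assumed to be the reciprocal of the user's mean feature value, which is one reading of the slide.

```python
import math

def decay_rate(values):
    """Per-user decay rate, assumed here to be 1 / mean(feature values)."""
    mean_value = sum(values) / len(values)
    return 1.0 / mean_value

def decay_curve(first_value, lam, stages):
    """Modelled feature value first_value * exp(-lam * s) for each stage s."""
    return [first_value * math.exp(-lam * s) for s in stages]

# Hypothetical period cross-entropy values for one user over four stages.
observed = [0.8, 0.5, 0.4, 0.3]
lam = decay_rate(observed)
modelled = decay_curve(observed[0], lam, range(1, 21))
```

A larger `lam` gives a steeper curve, which is what makes the single induced parameter usable as a per-user feature.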
15. Lifecycle Trajectories: Community Cross-Entropy
- In-degree: convergence-divergence, quadratic regression
- Out-degree: divergence, linear regression
- Lexical:
  - Facebook, SAP: quadratic regression
  - Server Fault: linear regression
- >73% of users have R² > 0.4
- Identified differences between platforms and properties' trajectory models
[Figure 3. Cross-entropies derived from comparing users' in-degree, out-degree and lexical term distributions with the community platform over the same time periods. We see an increased divergence towards the end of lifecycles. Panels: (a) In-degree, (b) Out-degree, (c) Lexical. x-axis: Lifecycle Stages; y-axis: Cross Entropy; series: Facebook, SAP, Server Fault.]
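The community cross-entropy trajectories and their regression fits can be sketched as follows. This is a hedged illustration: the helper names are hypothetical, the logarithm base (2) and the smoothing constant are assumptions, and the least-squares fit via normal equations stands in for whatever regression routine the authors used.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log2 q(x); p is the user's distribution, q the community's."""
    return -sum(px * math.log2(max(q.get(x, 0.0), eps))
                for x, px in p.items() if px > 0)

def polyfit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., a_degree] via the normal equations."""
    m = degree + 1
    # Augmented matrix for (X^T X) a = X^T y.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting (assumes a non-singular system).
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]

def r_squared(xs, ys, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot
```

Fitting with `degree=2` gives the quadratic coefficients used for convergence-divergence curves, while `degree=1` gives the linear fit used for divergence curves. Comparing `r_squared` across users is one way to arrive at a statement such as ">73% of users have R² > 0.4".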
16. Mining lifecycle trajectories enables users to be
categorised by their behaviour…
Facilitating Churn Prediction
Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction
17. Predicting Churners
- Binary classification task: is user u a churner? The previously examined indicators of lifecycle trajectories are used to predict whether a user is a churner or not.
- Dataset churners: users who last posted before the final 10% of the dataset's time window. The cutoff points are 2012-07-09 for Facebook, 2010-05-11 for SAP, and 2010-12-23 for Server Fault.
- Dataset attributes from trajectory model features. The dataset has the form D = {(x_i, y_i)}, where y_i ∈ {0, 1} denotes the class label of the user and x_i denotes an R-valued feature vector:
  - Facebook, SAP: 11 features
  - Server Fault: 10 features, given that a linear regression model is used for each user's lexical community cross-entropy development
- Induced logistic regression model for the probability of user u_i churning:
  $\Pr(Y = 1 \mid x_i) = \dfrac{1}{1 + e^{-\beta^{\top} x_i}}$   (9)
  - The model's coefficients (β) define the weight attached to each trajectory feature within the linear model.
  - Coefficients are induced via maximum likelihood estimation.

Table II. Features used for the churn prediction experiments. The indicators of lifecycle trajectories are used to characterise user evolution along the different user properties.

Property     Indicator               Model                 Feature(s)      Platform
In-degree    Period Entropy          Linear Regression     1 coefficient   All
In-degree    Period Cross-Entropy    Exponential Decay     λ               All
In-degree    Comm. Cross-Entropy     Quadratic Regression  a1, a2          All
Out-degree   Period Entropy          Linear Regression     1 coefficient   All
Out-degree   Period Cross-Entropy    Exponential Decay     λ               All
Out-degree   Comm. Cross-Entropy     Linear Regression     1 coefficient   All
Lexical      Period Entropy          Linear Regression     1 coefficient   All
Lexical      Period Cross-Entropy    Exponential Decay     λ               All
Lexical      Comm. Cross-Entropy     Quadratic Regression  a1, a2          Facebook, SAP
Lexical      Comm. Cross-Entropy     Linear Regression     1 coefficient   Server Fault
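The logistic regression model on this slide can be sketched in plain Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, batch gradient ascent is a swapped-in way of maximising the likelihood (the source only says maximum likelihood estimation is used), and an intercept is modelled by a constant 1 in each feature vector.

```python
import math

def churn_probability(beta, x):
    """Pr(Y = 1 | x) = 1 / (1 + exp(-beta . x))."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Approximate the maximum likelihood coefficients by gradient ascent."""
    beta = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = yi - churn_probability(beta, xi)
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Toy data: [constant, one trajectory feature]; labels 1 = churner.
X = [[1, -2.0], [1, -1.0], [1, -0.5], [1, 0.5], [1, 1.0], [1, 2.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logistic(X, y)
```

In the setting described here each row of `X` would hold the 11 (Facebook, SAP) or 10 (Server Fault) trajectory features of one user.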
18. Evaluation: Setup
- User-wise dataset split: 80% training, 20% testing
- Experiments: isolated user properties, isolated development indicator features, all features together
- Evaluation measures:
  1. Precision@k (P̄): average over k = {1, 5, 10, 20, 50, 100}
  2. Area Under the Receiver Operator Curve (AUC)
- Baseline: success probability in a single Bernoulli trial, i.e. randomly selecting a churner
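The evaluation measures listed on this slide can be sketched as below. The function names are hypothetical, and the AUC is computed with the pairwise-ranking definition, which is an assumption about how it was calculated in the experiments.

```python
def precision_at_k(scores, labels, k):
    """Fraction of churners among the k highest-scored users (fewer if the list is shorter)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / len(ranked)

def mean_precision(scores, labels, ks=(1, 5, 10, 20, 50, 100)):
    """Average Precision@k over the given cutoffs."""
    return sum(precision_at_k(scores, labels, k) for k in ks) / len(ks)

def auc(scores, labels):
    """Probability that a random churner is scored above a random non-churner (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bernoulli_baseline(labels):
    """Success probability of randomly selecting a churner."""
    return sum(labels) / len(labels)

# Hypothetical churn probabilities and true labels for four test users.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
```

Precision@k rewards models that place churners at the very top of the ranking, whereas AUC summarises ranking quality over the whole test set, which is why the two measures can prefer different feature sets.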
19. Evaluation: Results
- Variance in features depending on:
  - Accuracy preference, i.e. precision > recall
  - Platform: different detection signals for different communities
- Full model is never the best

Table III. Precision@k (P̄) and Area Under the Receiver Operator Characteristic Curve (AUC) values for Facebook, SAP and Server Fault when testing different: (i) user properties, (ii) development indicators, (iii) all features together.

Platform       Feature                    P̄       AUC
Facebook       Entropy                    0.761   0.500
Facebook       Period Cross Entropy       0.624   0.485
Facebook       Community Cross Entropy    0.791   0.617
Facebook       In-degree                  0.648   0.511
Facebook       Out-degree                 0.781   0.570
Facebook       Lexical                    0.681   0.557
Facebook       Full                       0.730   0.573
Facebook       Baseline                   0.629   0.500
SAP            Entropy                    0.434   0.549
SAP            Period Cross Entropy       0.321   0.568
SAP            Community Cross Entropy    0.334   0.549
SAP            In-degree                  0.351   0.592
SAP            Out-degree                 0.250   0.503
SAP            Lexical                    0.438   0.539
SAP            Full                       0.363   0.539
SAP            Baseline                   0.342   0.500
Server Fault   Entropy                    0.392   0.526
Server Fault   Period Cross Entropy       0.300   0.555
Server Fault   Community Cross Entropy    0.352   0.538
Server Fault   In-degree                  0.232   0.475
Server Fault   Out-degree                 0.293   0.512
Server Fault   Lexical                    0.459   0.546
Server Fault   Full                       0.421   0.554
Server Fault   Baseline                   0.319   0.500

The baseline for AUC is 0.5. Combining all features does not yield the best performance on any of the tested platforms. For Facebook, the prediction model using community cross-entropy indicators performed best in terms of both P̄ and AUC; the difference between this model and the Full model was tested using a Mann-Whitney test and found to be significant (at the 5% level). For SAP, the best performing model differs according to the evaluation measure used: in-degree features for AUC and lexical features for P̄.
20. Evaluation: Churner Patterns
- Reduced quadratic coefficients: churners exhibit steep cross-community curves towards the end of their lifecycles
- Variance in decay coefficient: degree of communication decays a lot faster for SAP than Server Fault. The extent to which users are contacted during one time period, relative to their past communications, reduces at a much faster rate than on Server Fault.

Table IV. Best performing prediction model coefficients for Facebook (community cross-entropy), SAP (in-degree) and Server Fault (period cross-entropy). All features are significant within their respective models (α < 0.05).

Feature                               Facebook   SAP       Server Fault
In-degree Entropy                     -          0.0532    -
In-degree Period Cross-Entropy        -          0.0139    -0.1826
In-degree Comm. Cross-Entropy a1      -0.1057    -0.1878   -
In-degree Comm. Cross-Entropy a2      -0.0510    -1.5104   -
Out-degree Comm. Cross-Entropy        0.3173     -         -
Out-degree Period Cross-Entropy       -          -         0.0210
Lexical Period Cross-Entropy          -          -         0.0557
Lexical Comm. Cross-Entropy a1        0.3253     -         -
Lexical Comm. Cross-Entropy a2        -0.0541    -         -

VII. Discussion and Future Work
Prior work on social network evolution by Panzarasa et al. [6] and Miritello et al. [1] found that users' social networks tend to a limit in terms of their communication capacity.
21. Conclusions
1. Users communicate with a fixed set of users
  - Similar to findings from (Miritello et al. 2013)
2. Convergence-divergence effect: users converge on community 'norms' before diverging
  - (Erikson, 1959) theorised that younger people are susceptible to social norms
  - (Danescu-Niculescu-Mizil et al. 2013) found users to converge on lexical norms, before diverging
3. Variance in churner signals
  - No common best model was found across platforms
22. Current & Future Work
1. Regularised Linear Models
  - Achieved ~30% AUC boost with growth and magnitude features
2. Evolving-Taste Recommender System
  - Used lifecycle model (n=5) to form category-ratings profiles
  - Identified variance in taste evolution across platforms
  - ... users tend to converge in their reviewing behaviour, and previous profiles allow one to gauge how the user will rate items in the future given their category information. Conversely, for MovieLens and Movie Tweetings we see an opposite effect: users' taste profiles become less predictable as they develop; users rate items in a way that renders uncertainty from previous information.
[Figure: Dissimilarity in taste profile from previous profile. Panels: (a) Lens, (b) Tweetings, (c) Amazon. x-axis: Lifecycle Stages (1 to 5); y-axis: Conditional Entropy.]