Overview and practical applications
with Markov Chains
Data Science and Engineering Club
Dublin, May 2018
● Clickstream introduction
● Markov Chains overview
● 3 Practical applications
My journey so far
● Traditional companies undergoing digital transformation
● Increasing number of IRL startups now purely digital
● Clickstream becoming an ideal way to listen to the voices of customers
Warm-up: Wikipedia Clickstream and Network analysis
● Perform advanced types of analysis
● Go beyond standard segmentation analysis
● Get closer to the individual voices of customers
What exactly is the clickstream?
Accessing the Clickstream via Google Analytics
1. Implement Customer ID dimension
2. Implement timestamp dimension
Then for every pageview we can see the customer ID and the timestamp.
How-to guide: https://www.simoahava.com/analytics/improve-data-collection-with-
A tidy clickstream example
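As a minimal sketch of what "tidy" could mean here: one row per pageview carrying the customer ID, the timestamp and the page, which can then be turned into one ordered page sequence per customer. The rows, page names and the `to_sequences` helper below are made up for illustration; they are not taken from a real Google Analytics export.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical pageview rows: (customer_id, ISO timestamp, page).
pageviews = [
    ("c1", "2018-05-01T10:00:05", "home"),
    ("c2", "2018-05-01T10:00:07", "home"),
    ("c1", "2018-05-01T10:01:10", "product"),
    ("c2", "2018-05-01T10:02:45", "product"),
    ("c1", "2018-05-01T10:03:30", "basket"),
    ("c2", "2018-05-01T10:02:00", "search"),
]

def to_sequences(rows):
    """Group pageviews by customer and order each group by timestamp."""
    by_customer = defaultdict(list)
    for customer_id, ts, page in rows:
        by_customer[customer_id].append((datetime.fromisoformat(ts), page))
    return {
        customer_id: [page for _, page in sorted(visits)]
        for customer_id, visits in by_customer.items()
    }

sequences = to_sequences(pageviews)
for customer_id, pages in sequences.items():
    print(customer_id, " > ".join(pages))
```

Each resulting sequence is the kind of input the models on the following slides work with.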
Multiple models for clickstream analysis
● Network Analysis to visualise flow of
● Clustering of customers
● Clustering of sessions
● Markov Chains for future click prediction
● Frequent path analysis
● Hidden Markov Models to identify user’s stage in the buying cycle
● Association Rules to identify bottlenecks to conversion
● Bot analysis for SEO optimisation
3 useful applications
● Frequent Path analysis
● Future Click prediction w/ Markov Chains
● Transition Probabilities w/ Markov Chains
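For the first of these applications, a rough sketch of frequent path analysis: count how many sessions contain each contiguous run of pages of a given length. The sessions and the `frequent_paths` helper are hypothetical, and counting each path once per session is one possible convention, not the only one.

```python
from collections import Counter

# Hypothetical sessions: each is an ordered list of page names.
sessions = [
    ["home", "search", "product", "basket"],
    ["home", "product", "basket", "checkout"],
    ["home", "search", "product"],
    ["search", "product", "basket"],
]

def frequent_paths(sessions, length=2, top=3):
    """Count contiguous subpaths of a given length, once per session containing them."""
    counts = Counter()
    for pages in sessions:
        seen = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        counts.update(seen)
    return counts.most_common(top)

for path, n_sessions in frequent_paths(sessions, length=2):
    print(" > ".join(path), n_sessions)
```

Raising `length` looks for longer paths; sessions shorter than `length` simply contribute nothing.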
● It’s a 100+ year old theory.
● Studies the evolution of dynamic systems
● Used widely in science from physics to finance, information science
● Hidden Markov Models, Markov Chain Monte Carlo, higher order
Markov Chains vocabulary
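The core terms can be written down compactly. This is the standard textbook definition of a first-order, time-homogeneous chain, with the symbols $S$, $X_t$ and $p_{ij}$ chosen here for illustration; in the clickstream setting the states would be pages (or page types) and $X_t$ the page viewed at click $t$.

```latex
% State space: the set of possible states (e.g. pages)
S = \{s_1, s_2, \dots, s_n\}

% Markov property: the next state depends only on the current state
P(X_{t+1} = s_j \mid X_t = s_i, X_{t-1}, \dots, X_0)
  = P(X_{t+1} = s_j \mid X_t = s_i) = p_{ij}

% Transition matrix: each row is a probability distribution
P = (p_{ij})_{i,j = 1}^{n}, \qquad p_{ij} \ge 0, \qquad \sum_{j=1}^{n} p_{ij} = 1
```

A higher order chain relaxes the Markov property so that the next state depends on the last $k$ states rather than only the current one.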
Media Exposure through the Funnel: A Model of Multi-Stage Attribution
The clickstream R package.
Package Author: Michael Scholz
- Cluster your clickstream
- Model the clickstream clusters as a Markov chain
- Visualise and calculate transition probabilities
- Predict the next click given a submitted click sequence
- Convert the clickstream to an object that is ready for association rules
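To make the transition-probability and next-click items concrete, here is a small plain-Python sketch of a first-order Markov chain fitted on page sequences. It does not use the clickstream R package or reproduce its API; the sessions and the `fit_transitions` and `predict_next` helpers are hypothetical and only illustrate the idea.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: each is an ordered list of page names.
sessions = [
    ["home", "search", "product", "basket"],
    ["home", "product", "basket", "checkout"],
    ["home", "search", "product"],
    ["search", "product", "basket"],
]

def fit_transitions(sessions):
    """Estimate first-order transition probabilities from consecutive page pairs."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            counts[current_page][next_page] += 1
    return {
        page: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for page, followers in counts.items()
    }

def predict_next(transitions, sequence):
    """Return the most probable page after the last page of a non-empty sequence, or None."""
    followers = transitions.get(sequence[-1])
    if not followers:
        return None
    return max(followers, key=followers.get)

transitions = fit_transitions(sessions)
print(transitions["home"])
print(predict_next(transitions, ["home", "search"]))
```

Because the chain is first order, only the last page of the submitted sequence influences the prediction; a page never seen with a follower (here "checkout") yields no prediction.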
Markov Chains intro – when to use them, how they work
Clickstream package article on the Journal of Statistical Software
Supercharging websites with a real-time R API
Notebook on Github
There is a lot of talk about digital transformation. Many companies, especially new ones, are completely digital, and more traditional ones are moving in that direction fast. Clickstream is becoming a key data resource: it is critical to understand it and work with it, so as not to leave potential value on the table, and to use it for competitive advantage by better understanding customer journeys.
I will talk about clickstream from the perspective of a startup company; that is in line with my experience and with how the vast majority of businesses can benefit.
If you work for a company with data engineers and data science teams, this is something that you might take for granted.
Of course we record everything: we structure the web log files, we put the data in databases so analysts can access it, and we build real-time streaming applications on top of that data. But this is probably 1% of companies. And even if you work at one of them, if you are in Marketing or a customer department there is a lot you can do without necessarily asking for dedicated engineering resources.
An out-of-context warm-up from a recent blog post. What you see here is the result of clickstream data combined with network analysis: network analysis is used to visualise the associations between Wikipedia pages in a particular thematic area, in this case data science, and the traffic that goes back and forth between them. This is just one application of clickstream combined with network analysis; we'll see a few more, and we'll get there step by step.