1. Clickstream Analytics
Overview and practical applications
with Markov Chains
Data Science and Engineering Club
Dublin, May 2018
Alexandros Papageorgiou
4. Digital transformation
● Traditional companies undergoing digital transformation
● Increasing number of IRL startups now purely digital
● Clickstream becoming an ideal way to listen to the voices of customers
10. Accessing the Clickstream via Google Analytics
1. Implement Customer ID dimension
2. Implement timestamp dimension
Then for every pageview we can see the customer ID and the timestamp
How-to guide: https://www.simoahava.com/analytics/improve-data-collection-with-four-custom-dimensions/
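Once each pageview carries a customer ID and a timestamp, the hits can be regrouped into one ordered click sequence per customer. The following is a minimal Python sketch of that step, not the Google Analytics export itself: the row layout, the sample values and the function name `build_sequences` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical exported rows: (customer_id, ISO 8601 timestamp, page path).
# The values are invented; a real export would come from the two custom dimensions.
pageviews = [
    ("c2", "2018-05-01T10:00:05", "/home"),
    ("c1", "2018-05-01T09:00:00", "/home"),
    ("c1", "2018-05-01T09:02:10", "/checkout"),
    ("c1", "2018-05-01T09:01:00", "/product"),
    ("c2", "2018-05-01T10:01:00", "/pricing"),
]


def build_sequences(rows):
    """Group pageviews by customer and order each group by timestamp."""
    by_customer = defaultdict(list)
    for customer_id, timestamp, page in rows:
        by_customer[customer_id].append((timestamp, page))
    # ISO 8601 timestamps in the same format sort correctly as plain strings.
    return {
        customer_id: [page for _, page in sorted(hits)]
        for customer_id, hits in by_customer.items()
    }


sequences = build_sequences(pageviews)
for customer_id, pages in sorted(sequences.items()):
    print(customer_id, " > ".join(pages))
```

Each resulting list is one clickstream, which is the input shape that the models on the following slides work with.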
12. Multiple models for clickstream analysis
● Network Analysis to visualise flow of web traffic
● Clustering of customers
● Clustering of sessions
● Markov Chains for future click prediction
● Frequent path analysis
● Hidden Markov Models to identify user’s stage in the buying cycle
● Association Rules to identify bottlenecks to conversion
● Bot analysis for SEO optimisation
14. Markov Chains
● It’s a 100+ year old theory.
● Studies the evolution of dynamic systems
● Used widely across the sciences, from physics to finance and information science
● Hidden Markov Models, Markov Chain Monte Carlo, higher-order Markov Chains
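As a brief formal sketch in standard textbook notation (not taken from the slides): a first-order, time-homogeneous Markov chain assumes that the next state depends only on the current one. With $X_t$ the page visited at step $t$ and $n_{ij}$ the number of observed transitions from page $i$ to page $j$, the defining property and the usual count-based estimate of the transition probabilities can be written as:

```latex
P(X_{t+1} = j \mid X_t = i, X_{t-1}, \dots, X_0)
  = P(X_{t+1} = j \mid X_t = i) = p_{ij},
\qquad \sum_{j} p_{ij} = 1,
\qquad \hat{p}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}
```

Higher-order chains relax the first equality by conditioning on the last few states rather than only the most recent one.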
15. Markov Chains vocabulary
Media Exposure through the Funnel: A Model of Multi-Stage Attribution
repository.cmu.edu/cgi/viewcontent.cgi?article=1399&context=heinzworks
16. The clickstream R package.
Package Author: Michael Scholz
- Cluster your clickstream
- Model the clickstream clusters as a Markov chain
- Visualise and calculate transition probabilities
- Predict next click given a submitted click sequence
- Convert the clickstream to an object that is ready for association rules
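To make the "transition probabilities" and "predict next click" steps concrete, here is a minimal first-order Markov chain sketch in plain Python. It is not the API of the clickstream R package; the session data and the function names `fit_transitions` and `predict_next` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sessions, each an ordered list of page names.
sessions = [
    ["home", "product", "cart", "checkout"],
    ["home", "product", "home"],
    ["home", "search", "product", "cart"],
]


def fit_transitions(seqs):
    """Estimate first-order transition probabilities from observed sequences."""
    counts = defaultdict(Counter)
    for seq in seqs:
        # Count every consecutive (current page, next page) pair.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


def predict_next(probs, sequence):
    """Return the most likely next page given the last click, or None if unknown."""
    if not sequence or sequence[-1] not in probs:
        return None
    candidates = probs[sequence[-1]]
    return max(candidates, key=candidates.get)


transitions = fit_transitions(sessions)
print(transitions["home"])
print(predict_next(transitions, ["home", "search", "product"]))
```

Only the last click is used for the prediction, which is exactly the first-order assumption; a higher-order chain would condition on the last few clicks instead.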
17. Useful References
Markov Chains intro – when to use them, how they work
https://towardsdatascience.com/introduction-to-markov-chains-50da3645a50d
Clickstream package article on the Journal of Statistical Software
www.jstatsoft.org/article/view/v074i04
Supercharging websites with a real-time R API
http://code.markedmondson.me/predictClickOpenCPU/supercharge
Notebook on Github
https://github.com/papageorgiou/clickstream-talk/blob/master/data-sci-eng-meetup.md
There is a lot of talk about digital transformation. Many companies, especially newer ones, are completely digital, and more traditional ones are moving in that direction fast. Clickstream is becoming a key data resource: it is critical to understand it and work with it, so as not to leave potential value on the table and to use it for competitive advantage by better understanding customer journeys.
I will talk about clickstream from the perspective of a startup company, which is in line with my experience and with how the vast majority of businesses can benefit.
If you work for a company with data engineering and data science teams, this is something that you might take for granted.
Of course we record everything, we structure the web log files, we put the data in databases so that analysts can access it, and we build real-time streaming applications on top of that data... but this is probably 1% of companies. And even if you work there, if you are in Marketing or a customer department, there is a lot you can do without necessarily asking for dedicated engineering resources.
An out-of-context warm-up from a recent blog post. What you see here is the result of clickstream data combined with network analysis: network analysis is used to visualise the associations between Wikipedia pages in a particular thematic area, in this case data science, and the traffic that goes back and forth between them. This is just one of the applications of clickstream combined with network analysis; we'll see a few more. We'll go there step by step.