Markov Chains for the Web - SEO, Usability, Search Engine Scoring, and More

Markov chains can take predictive theory to a new level, with large-scale applications for digital marketing. From social media network modeling to user pathing, site scoring and recommended pages, Markov chains can quantify, rank, and return likely outcomes on the web. In other words, they can demystify demographics. Here's how.

Slide notes
  • Stochastic definition: Stochastic processes are random processes that describe the evolution of a random value over time, as opposed to deterministic processes, which are described by ordinary differential equations.
  • In deterministic modeling (for the web), the user keyphrase is the focal point, and all subsequent stages are based on that focal point. In stochastic modeling, the Markov theorem makes every stage except the focal point and the preceding stage irrelevant to the current stage; you can also define the preceding stage as the focal point. This means that (not provided) becomes irrelevant when the focal point changes from the keyword to a SERP, landing page, or behavior (see relational Markov models/user behavior). For other users' paths, see multichannel attribution.
  • The Markov chain formula is generative, so modeling is easily automated. Monitoring and prediction are defined by the Bayesian theorem: e.g., the probability of the hypothesis given evidence from the initial source depends on the probability of the hypothesis given evidence from a different source.
  • For example, you can calculate the probability of a user picking a landing page and then picking an object on that landing page, as opposed to the probability of picking a different object, a different landing page, and a different path entirely. This is modeled spatially, not temporally, and can be combined with probabilities as well.
  • Possible only via a spatial model, because the nature of the co-domain means you would otherwise be modeling backwards.
  • The keyphrase cluster is post-Hummingbird
Transcript: Markov Chains for the Web - SEO, Usability, Search Engine Scoring, and More

    1. Using Markov Chains to Predict User Behavior (Rivka Fogel)
    2. Markov Chains: Probability without History [image: Andrey Markov]
    3. What Are Probability Spaces? [Diagram: Focal Object / Function → Co-Domain containing Function/Possibility 1 and Function/Possibility 2] • Also known as stochastic processes
    4. Type 1: Time Series [Diagram: First Event → Function/Possibility 1 or Function/Possibility 2 over Time; the possibilities are also called "states"]
    5. Application: Personalization - Identifying user-specific authorities [Diagram: User → A, B, C, D → E] • To return more accurate SERPs (E) for that user
    6. Type 2: Spatial Field [Diagram: variables arranged around a Shared Event] • Variable interactions are often statistically correlated
    7. Addition of the Markov Property - The next state depends only on the current state [Diagram: A → B, C, D → E; E occurs because of B or D, not because of A] • The probability of B causing E, as opposed to D causing E, is calculated by Bayes' theorem
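A minimal sketch of that Bayesian step in Python. The priors and likelihoods for the predecessor states B and D are illustrative assumptions, not measured data:

```python
# Which predecessor state (B or D) most likely produced state E?
# Bayes' theorem: P(cause | E) = P(E | cause) * P(cause) / P(E)

priors = {"B": 0.6, "D": 0.4}        # assumed: how often each predecessor state is reached
likelihoods = {"B": 0.2, "D": 0.5}   # assumed: P(E | predecessor)

evidence = sum(priors[s] * likelihoods[s] for s in priors)               # P(E)
posteriors = {s: priors[s] * likelihoods[s] / evidence for s in priors}

print(posteriors)  # {'B': 0.375, 'D': 0.625} -> E more likely arrived via D
```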
    8. Application: (not provided) Model [Diagram: Keyphrase? → Landing Page → Homepage, Inventory Gallery Page, Video View, Bounce; Homepage → Video View] • The Markov property enables the marketer to model paths without knowing every state. While some keyphrase data is known, the model can also identify the keyphrase from other users' paths where the keyphrase is known.
    9. Application: Multichannel Attribution - Monitoring and prediction can be based on the probability of a user's path given other users' paths [Diagram: Known Path 1 (A → B → C) and Known Path 2 (B → C → D), with the probabilities of B and C shared between them] • Identify A (or predict D) via multiple probability states within a Markov chain.
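One way to read that slide in code, as a sketch under assumptions: estimate transition probabilities from other users' fully known paths, then score each candidate first touch A by how well it explains the observed continuation. The channel names and paths below are hypothetical.

```python
from collections import defaultdict

# Other users' fully observed paths (illustrative data)
known_paths = [
    ["email", "search", "site", "purchase"],
    ["display", "search", "site", "purchase"],
    ["email", "social", "site", "bounce"],
]

# Count first-order transitions: C(s, s')
counts = defaultdict(lambda: defaultdict(int))
for path in known_paths:
    for a, b in zip(path, path[1:]):
        counts[a][b] += 1

def p(a, b):
    """Estimated transition probability P(b | a) from the pooled counts."""
    total = sum(counts[a].values())
    return counts[a][b] / total if total else 0.0

# A partially observed path: first touch unknown, then search -> site -> purchase
observed_tail = ["search", "site", "purchase"]

# Score each candidate first touch A by P(A -> first observed state);
# the rest of the tail is shared by all candidates, so it cancels out in the comparison
scores = {cand: p(cand, observed_tail[0]) for cand in ["email", "display", "social"]}
print(scores)  # {'email': 0.5, 'display': 1.0, 'social': 0.0}
```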
    10. Application: Audience Segmentation [Diagram: Referral Paths (Known Path 1: A → B; Known Path 2: B) → Landing Page → On-Site Paths (C → D), with the probabilities of B and C]
    11. Relational Markov Properties - Relational Markov Models allow states to be of different types [Diagram: States A through D grouped into Type 1 and Type 2; E occurs because of B or D's type, not because of A or C's type] • Relational Markov Models group multiple types of objects (relations) and calculate the probability of the relation's appearance in a state. • They work off of Dynamic Bayesian Networks.
    12. Application: Audience Segmentation 2 [Diagram: Paid and Organic paths leading to known states B and C]
    13. Application: User Experience Model [Diagram: Landing Page → Homepage, Inventory Gallery Page, Video View, Bounce; Homepage → Video View. State types: Page Visit, Video View, Bounce]
    14. Application: Social Network Modeling [Diagram: Brand Social Profile, News Feed, Rich Media Play, Rich Media Host Page, User Share, and Influencer interactions leading to the Site Landing Page] • This function will answer: if the user ended up converting/visiting the landing page, which type(s) of social interaction came into play?
    15. Application: HTTP Service Request Prediction [Diagram: Keyphrase 1 and Keyphrase 2 → Keyphrase Cluster → Page A, with probabilities drawn from known paths] • Prefetch Page A given the probability that the user will want to see it. • The keyphrase cluster is predicted by the function with co-domain B and is then used to predict the incidence of B where the first state isn't known.
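A small sketch of the prefetch decision under stated assumptions: the probabilities stand in for a learned transition-matrix row for the current page, and the threshold is an arbitrary illustrative cutoff. Page paths are hypothetical.

```python
# Estimated probability of each next page given the current page
# (assumed values, e.g. learned from known paths as on the earlier slides)
next_page_probs = {
    "/inventory-gallery": 0.55,
    "/video": 0.25,
    "/contact": 0.12,
    "/careers": 0.08,
}

PREFETCH_THRESHOLD = 0.4  # only prefetch when the user is likely enough to request the page

candidate, prob = max(next_page_probs.items(), key=lambda kv: kv[1])
if prob >= PREFETCH_THRESHOLD:
    print(f"prefetch {candidate} (p={prob:.2f})")
else:
    print("no page is likely enough to prefetch")
```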
    16. Application: Agent Suggestion [Diagram: first words of a query → Keyphrase Cluster or Authority → URLs A through E and Searches A through C] • Auto-suggests searches (Search C) and links (URL E) that the user is likely to want to access, based on the user's history and other users' history
    17. Application: Search Engine Scoring - Identifying Authority 2 [Diagram: Keyphrase Cluster → Authority 1 → Pages A, B, C → Links 1 and 2 → Authority 2] • The function identifies hubs of authority that are probable next steps in many systems (each with individual focus objects).
    18. Appendix: Formal Definitions
    19. Where, Probability Spaces:
        • The measurable space (S, Σ) and an object X on the measurable space
        • The probability space is defined by the function P, the assignment of probabilities to events, where Ω is the set of possible outcomes and F is the set of events, each event having 0 or more outcomes: P(X) = Σ_{t1…tk} P(t1) for all X on Ω
        • The finite-dimensional distribution of X: X_{t1…tk}: Ω → S^k
        • That arrow (the push-forward measure, the random distribution of events, or the matrix of transition probabilities P): P_{t1…tk}(·) = P(X_{t1…tk}⁻¹(·)) on S^k
          – Where the Bayesian theorem allows for: P(H | E_new) = P(E_new | H) · P(H) / P(E_new), with the prior P(H) taken as the posterior already conditioned on the earlier evidence E_old
    20. Then, Markov Property:
        • P(X_{l+1} = s | X_l = s_l, X_{l-1} = s_{l-1}, …, X_0 = s_0) = P(X_{l+1} = s | X_l = s_l)
          – The random distribution of events is defined because the system is finite.
        • So, in the matrix of transition probabilities (defined as P^{l,l+1}_{ij} = P(X_{l+1} = j | X_l = i)), P is independent of l.
        • That is, s(t) = s(t-1)A
          – s is the state space, A is the matrix of transition probabilities, and π is the initial probability distribution over the states in s; s(t) is the probability vector over the states at time t.
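A minimal numeric reading of s(t) = s(t-1)A, using a made-up three-state transition matrix:

```python
import numpy as np

# Transition matrix A over three states; each row sums to 1 (illustrative values)
A = np.array([
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
    [0.5, 0.0, 0.5],
])

s = np.array([1.0, 0.0, 0.0])  # initial distribution pi: the user starts in state 0

for t in range(1, 4):
    s = s @ A                  # s(t) = s(t-1) A
    print(f"s({t}) =", s.round(3))
```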
    21. Markov Restatement 1: When a User's History Is Available
        • A(s, s') = C(s, s') / Σ_{s''} C(s, s'') and π(s) = C(s) / Σ_{s'} C(s')
          – C(s, s') counts the instances where s' follows s
          – This can be applied to HTTP prediction and agent suggestion
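A direct sketch of Restatement 1: count how often s' follows s in a single user's history, then normalize. The click sequence is made up.

```python
from collections import Counter, defaultdict

history = ["home", "gallery", "video", "home", "gallery", "gallery", "video"]

pair_counts = defaultdict(Counter)  # C(s, s')
state_counts = Counter(history)     # C(s)

for s, s_next in zip(history, history[1:]):
    pair_counts[s][s_next] += 1

# A(s, s') = C(s, s') / sum over s'' of C(s, s'')
A = {
    s: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
    for s, nexts in pair_counts.items()
}

# pi(s) = C(s) / sum over s' of C(s')
total = sum(state_counts.values())
pi = {s: c / total for s, c in state_counts.items()}

print(A)   # e.g. A['gallery'] is roughly {'video': 0.67, 'gallery': 0.33}
print(pi)  # e.g. roughly {'home': 0.29, 'gallery': 0.43, 'video': 0.29}
```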
    22. Markov Restatement 2: When the Evidence Comes from a User Pool
        • The Markov function becomes a generative chain-link system that can store counts and probabilities
        • s(t) = a_0·s(t-1)·A + a_1·s(t-2)·A² + a_2·s(t-3)·A³ + …, and the prediction is Max(a_0·s(t-1)·A + a_1·s(t-2)·A² + a_2·s(t-3)·A³ + …)
          – s(t) is normalized to select a list of probable states.
          – Where probabilities are used: this can be applied to authority hubs as well, where collected user path traversal patterns are represented in a traversal connectivity matrix.
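A sketch of that weighted-history estimate in the spirit of the link-prediction formulation cited in Further Reading (Sarukkai): recent state vectors are pushed through increasing powers of A and blended with decaying weights, then normalized. The weights and matrix here are assumptions.

```python
import numpy as np

A = np.array([            # transition matrix (illustrative; rows sum to 1)
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
    [0.5, 0.0, 0.5],
])

# Recent state vectors s(t-1), s(t-2), s(t-3): one-hot for the pages actually visited
recent = [
    np.array([0.0, 1.0, 0.0]),  # s(t-1)
    np.array([1.0, 0.0, 0.0]),  # s(t-2)
    np.array([0.0, 0.0, 1.0]),  # s(t-3)
]
weights = [0.6, 0.3, 0.1]       # a0, a1, a2: assumed decay over history

# s(t) ~ a0*s(t-1)*A + a1*s(t-2)*A^2 + a2*s(t-3)*A^3 + ...
s_t = sum(w * (v @ np.linalg.matrix_power(A, k + 1))
          for k, (w, v) in enumerate(zip(weights, recent)))
s_t = s_t / s_t.sum()           # normalize, then pick the most probable next states

print(s_t.round(3), "-> most probable next state:", int(s_t.argmax()))
```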
    23. Markov Restatement 3: When Groupings of States Are Estimated
        • These are Relational Markov Models
        • These groupings are also seen as abstractions, and A(Q) forms a lattice of abstractions
          – An RMM is {D, R, Q, A, π}, where D ∈ D is a domain tree and hierarchy of values, R is a set of relations (each relation is defined by nodes on the leaves of D), Q is the set of states, A is the transition probability matrix, and π is the initial probability, that is, the initial state in the chain. States are defined as abstractions on Q.
          – The rank of an abstraction a = R(d_1, …, d_k) in the lattice is defined as 1 + Σ_{i=1}^{k} depth(d_i). Depth is a node's depth in the tree and increases with the abstraction's rank. The rank of Q (the most general abstraction) is 0.
        • States that have nodes on common leaves will more frequently appear in abstractions together.
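A tiny sketch of the rank computation as the slide states it: an abstraction a = R(d_1, …, d_k) is scored by 1 plus the sum of its nodes' tree depths, with Q (the most general abstraction) fixed at rank 0. The domain-tree nodes and depths below are hypothetical.

```python
# Depth of each domain-tree node (root = depth 0); illustrative values only
node_depths = {
    "Page": 0,
    "Page.Product": 1,
    "Page.Product.Camera": 2,
    "Visitor": 0,
    "Visitor.Returning": 1,
}

def abstraction_rank(nodes):
    """rank(a) = 1 + sum of depth(d_k) over the nodes defining the relation;
    the fully general abstraction Q is rank 0 by definition."""
    if not nodes:  # treat the empty abstraction as Q
        return 0
    return 1 + sum(node_depths[n] for n in nodes)

print(abstraction_rank(["Page.Product.Camera", "Visitor.Returning"]))  # 1 + 2 + 1 = 4
print(abstraction_rank([]))                                            # Q -> 0
```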
    24. Further Reading
        • Anderson, Corin R., Domingos, Pedro, and Weld, Daniel S. "Relational Markov Models and their Application to Adaptive Web Navigation." Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002): 143-152. Electronic. http://homes.cs.washington.edu/~pedrod/papers/kdd02a.pdf
        • Downey, Allen. "Bayesian statistics made (as) simple (as possible)." PyCon US. 7 March 2012. http://pyvideo.org/video/608/bayesianstatistics-made-as-simple-as-possible
        • Flesch, Ildiko and Lucas, Peter. "Markov Equivalence in Bayesian Networks." Electronic. http://www.cs.ru.nl/P.Lucas/markoveq.pdf
        • Sarukkai, Ramesh R. "Link prediction and path analysis using Markov chains." Computer Networks 33 (June 2000): 377-386. Electronic. http://www.sciencedirect.com/science/article/pii/S138912860000044X
    25. Questions?
