Wed 1315 lucas_jason_color

  1. Stig Workshop
     Architect of Scalable Infrastructure - Jason Lucas
  2. A graph of user session analytics
  3. Session Analytics Graph
     A sample of session flows stored in Stig:
       <Source, 'generated', Clickthrough>
       <Clickthrough, 'clicked_by', User>
       <Clickthrough, 'served', Page>
  4. Inferred Edges
     We can define rules for edges that are not part of the graph, but that are inferred knowledge from the edges that are in the graph.
     For example, we can infer that a person requested a page:
       <person, 'requested', page> =
         walk clickthrough: [
           <clickthrough, 'clicked_by', person>;
           <clickthrough, 'served', page> ];
     Or that a page was visited from another page:
       <page1, 'clicked_to', page2> =
         walk clickthrough: [
           <page1, 'generated', clickthrough>;
           <clickthrough, 'served', page2> ];
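Outside Stig, the two rules can be modelled as joins on the shared clickthrough node. The sketch below is a minimal Python illustration, not Stig's evaluator: it assumes the graph is an in-memory list of (subject, predicate, object) triples, and the clickthrough ids `ct1`/`ct2` and the sample edges are hypothetical.

```python
# A minimal sketch, assuming the graph is an in-memory list of
# (subject, predicate, object) triples; "ct1"/"ct2" are hypothetical clickthrough ids.
Triple = tuple[str, str, str]

EDGES: list[Triple] = [
    ("Home Page", "generated", "ct1"),
    ("ct1", "clicked_by", "Alice"),
    ("ct1", "served", "Games Page"),
    ("Home Page", "generated", "ct2"),
    ("ct2", "clicked_by", "Bob"),
    ("ct2", "served", "Newsfeed"),
]


def infer_requested(edges: list[Triple]) -> list[Triple]:
    """<person, 'requested', page>: a clicked_by edge and a served edge on the same clickthrough."""
    return [
        (person, "requested", page)
        for ct_a, pred_a, person in edges
        if pred_a == "clicked_by"
        for ct_b, pred_b, page in edges
        if pred_b == "served" and ct_b == ct_a
    ]


def infer_clicked_to(edges: list[Triple]) -> list[Triple]:
    """<page1, 'clicked_to', page2>: a generated edge and a served edge on the same clickthrough."""
    return [
        (page1, "clicked_to", page2)
        for page1, pred_a, ct_a in edges
        if pred_a == "generated"
        for ct_b, pred_b, page2 in edges
        if pred_b == "served" and ct_b == ct_a
    ]


print(infer_requested(EDGES))   # one inferred edge per matching clickthrough
print(infer_clicked_to(EDGES))
```

In this sketch an inferred edge is produced once per clickthrough that witnesses it, so counting the results counts visits.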
  5. Queries of Interest
     •  How often does a person visit the same page?
     •  How often is such a visit an inconvenient clickthrough or redirect?
     •  Do people visit a given page more often at a certain time of day?
     •  Does their IP address (home or work PC) influence their usage patterns?
     •  Do friends’ usage patterns on the site influence this specific user? (e.g., a friend started playing a new game)
     •  General usage statistics.
     •  Have links on our site been moved or removed in such a way that users can’t find the features they want to use?
     •  The biggest question is… How can we improve our users’ experience?
  6. User Experience Optimization
     •  Find common usage patterns in a specific user’s session flow
     •  Look for inconveniences within this flow and experiment with removing them; acquire user feedback
     •  For example, a user who comes to our site to play a specific game and has to click 3-4 times to get to this game’s main page
     •  How can we identify users whom we want to put in an experimental flow?
     •  A simple identifier: 90% of their flows lead to a particular feature.
  7. Simple Stig Functions
     How many times did Page A lead to Page B?
       numClicksFromTo source dest =
         count (solve : [<source, 'clicked_to', dest>]);
     How many times was Page A served?
       numServes dest =
         count (solve (clickthrough) : [<clickthrough, 'served', dest>]);
     How many times did a particular user access Page A?
       numUserRequestsPage person page =
         count (solve : [<person, 'requested', page>]);
     How many times did this particular user go from Page A to Page B?
       numUserRequestsPageFrom person source page =
         count (solve (clickthrough) : [
           <source, 'generated', clickthrough>;
           <clickthrough, 'clicked_by', person>;
           <clickthrough, 'served', page> ]);
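The four counting functions can be mimicked in Python to make their behaviour concrete. This is a sketch under assumptions: the rows below are hypothetical sample data, one per clickthrough, and the snake_case names are Python stand-ins for the Stig functions. It also shows why a per-user, per-source count has to tie source, user and page to the same clickthrough: two patterns that share only the page can overcount.

```python
# Hypothetical sample data: one row per clickthrough, standing for the three edges
# <source, 'generated', ct>, <ct, 'clicked_by', user> and <ct, 'served', page>.
CLICKTHROUGHS: list[tuple[str, str, str]] = [
    ("Home Page", "Alice", "Games Page"),
    ("Home Page", "Alice", "Games Page"),
    ("Home Page", "Alice", "Newsfeed"),
    ("Home Page", "Bob", "Newsfeed"),
    ("E-mail", "Dave", "Home Page"),
]


def num_clicks_from_to(source: str, dest: str) -> int:
    return sum(1 for s, _, d in CLICKTHROUGHS if s == source and d == dest)


def num_serves(dest: str) -> int:
    return sum(1 for _, _, d in CLICKTHROUGHS if d == dest)


def num_user_requests_page(person: str, page: str) -> int:
    return sum(1 for _, u, d in CLICKTHROUGHS if u == person and d == page)


def num_user_requests_page_from(person: str, source: str, page: str) -> int:
    # Correlated: source, user and page must all come from the SAME clickthrough.
    return sum(1 for row in CLICKTHROUGHS if row == (source, person, page))


def joined_only_on_page(person: str, source: str, page: str) -> int:
    # Two independent patterns that share only `page` multiply their counts.
    return num_clicks_from_to(source, page) * num_user_requests_page(person, page)


print(num_clicks_from_to("Home Page", "Games Page"))                # 2
print(num_user_requests_page_from("Bob", "Home Page", "Newsfeed"))  # 1
print(joined_only_on_page("Bob", "Home Page", "Newsfeed"))          # 2, not 1
```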
  8. Let’s take a look at some sample user session flows…
  9. Alice loves to play Pets!
     “Alice comes to our site to play Pets. She always hits the home page and has to click through to her favorite game. Plenty of room for improvement here.”
  10. Bob interacts a lot with his newsfeed
      “Bob likes to go straight to newsfeed and expand people’s comments or like them. Can we improve his experience by expanding his favorite comments or displaying newsfeed on his home page?”
  11. Carol receives an e-mail about a specific newsfeed post
      “Carol received an e-mail notification that her friend mentioned her in a newsfeed comment. She had to press expand on the comments to be able to see it. We should have known better!”
  12. Dave checks his Cafe thanks to an e-mail notification
      “Dave received an e-mail notification telling him his cookers were ready in Cafe. When he clicked on the link in the e-mail it took him to the home page! Now he has to click on his alert, get redirected to the Cafe game, and then start playing.”
  13. “What about Alice?”
      •  Alice is identified as a candidate for our user experience optimization study.
      •  Let’s look at Alice’s patterns…
      •  Alice goes from the home page straight to Games 90% of the time, so we want to just give her the Games page when she visits our site.
      •  So we’ll be adding her to our experimental user base.
      •  Also, we want to let her give us a “Thumbs Up!” or “Thumbs Down!” on the new flow.
      •  How can we make the database do this?
  14. Modifying the User Flow In Real Time
      We update the database to set Alice’s status as participating in the user-flow experiment:
        when ((numUserRequestsPageFrom 'Alice' 'Home Page' 'Games Page'
               / numUserRequestsPage 'Alice' 'Home Page') > 0.9) do {
          FlowExperiment@'Alice' := {
            running = True;
            startDate = Now;
            HasVoted = False;
          };
          <'Alice', 'participates_in_experiment', 'Games Page'> := True;
        };
      We also add an edge to the graph showing that Alice is part of the Games Page experiment; we can find all the users that are part of the experiment by getting all the 'participates_in_experiment' edges that point to that node.
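In Python, the enrolment rule might look like the following one-shot check. Stig's `when` presumably keeps watching its condition, whereas this hypothetical `maybe_enroll` only runs when called; the visit counts passed in stand in for numUserRequestsPage and numUserRequestsPageFrom, and the zero-visit guard is an addition of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

THRESHOLD = 0.9  # from the slide: more than 90% of home-page visits go straight to Games


@dataclass
class FlowExperiment:
    running: bool
    start_date: datetime
    has_voted: bool = False


def maybe_enroll(user, home_visits, home_to_games, experiments, edges):
    """One-shot version of the `when` rule; returns True if `user` was enrolled."""
    if home_visits == 0:
        return False  # no history yet; also avoids dividing by zero
    if home_to_games / home_visits > THRESHOLD:
        experiments[user] = FlowExperiment(running=True, start_date=datetime.now(timezone.utc))
        edges.add((user, "participates_in_experiment", "Games Page"))
        return True
    return False


def participants(page, edges):
    """Everyone with a participates_in_experiment edge pointing at `page`."""
    return {s for s, p, o in edges if p == "participates_in_experiment" and o == page}


experiments: dict[str, FlowExperiment] = {}
edges: set[tuple[str, str, str]] = set()
maybe_enroll("Alice", home_visits=20, home_to_games=19, experiments=experiments, edges=edges)
maybe_enroll("Bob", home_visits=10, home_to_games=5, experiments=experiments, edges=edges)
print(participants("Games Page", edges))  # {'Alice'}
```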
  15. Analyzing the New Flow
      •  Alice gets a friendly pop-up asking how she likes the new flow, and whether to keep it.
          –  Thumbs Up! -> Keep the new structure! I love it!
          –  Thumbs Down! -> Give me back my old flow!
      •  We can update the database with simple Stig functions:
           thumbsUpdate user = do {
             FlowExperiment@user := { HasVoted = True; };
             ThumbsUp@'Flow Experiment' += 1;
           };
           thumbsDowndate user = do {
             FlowExperiment@user := { HasVoted = True; running = False; };
             ThumbsDown@'Flow Experiment' += 1;
           };
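A rough Python analogue of the two vote handlers, assuming a dictionary of per-user experiment records and a pair of shared tallies (all names here are hypothetical). Each record is looked up by the `user` argument, and only a thumbs-down stops the experiment for that user.

```python
from dataclasses import dataclass


@dataclass
class FlowExperiment:
    running: bool = True
    has_voted: bool = False


# Hypothetical stand-ins for FlowExperiment@user and the two vote totals.
experiments: dict[str, FlowExperiment] = {"Alice": FlowExperiment(), "Bob": FlowExperiment()}
tallies = {"ThumbsUp": 0, "ThumbsDown": 0}


def thumbs_up(user: str) -> None:
    experiments[user].has_voted = True  # keep the new flow running
    tallies["ThumbsUp"] += 1


def thumbs_down(user: str) -> None:
    record = experiments[user]
    record.has_voted = True
    record.running = False  # give the user back the old flow
    tallies["ThumbsDown"] += 1


thumbs_up("Alice")
thumbs_down("Bob")
print(experiments, tallies)
```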
  16. We Snuck Something in There: Improving Update Concurrency
      •  x += 1 is better than x = x + 1: the increment never reads x, so concurrent increments commute and need not conflict
      •  We can take in many thumb updates, and only need to evaluate the ThumbsUp or ThumbsDown total when we need it
      •  Common terminology:
          –  Database theorists would call this a Field Call
          –  Escrowing
          –  Write without read
          –  Commutative operations
      Don’t ask for things you don’t need!
      “If you don’t care about the result, don’t make Stig compute it”
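The write-without-read idea can be sketched in a single Python process: writers only append deltas, and the total is computed on demand. This `EscrowCounter` is a hypothetical illustration of the principle, not Stig's field-call or escrow machinery; here a lock still serializes the appends, so it shows the shape of the technique rather than its distributed payoff.

```python
import threading


class EscrowCounter:
    """Hypothetical sketch of write-without-read updates: writers record deltas,
    and the total is only computed when someone asks for it."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._deltas: list[int] = []

    def add(self, delta: int = 1) -> None:
        # Blind write: append the change without reading the current total.
        with self._lock:
            self._deltas.append(delta)

    def value(self) -> int:
        # Deltas commute, so their arrival order does not affect the sum.
        with self._lock:
            return sum(self._deltas)


def cast_votes(counter: EscrowCounter, n: int) -> None:
    for _ in range(n):
        counter.add(1)


thumbs_up = EscrowCounter()
voters = [threading.Thread(target=cast_votes, args=(thumbs_up, 1000)) for _ in range(8)]
for t in voters:
    t.start()
for t in voters:
    t.join()
print(thumbs_up.value())  # 8000
```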
  17. Calling Stig
      •  Our client API is available from:
          –  Python
          –  PHP
          –  Perl
          –  Java
          –  C / C++
          –  and we can serve HTTP directly
      •  Our focus is the web:
          –  Almost all calls are asynchronous and return futures
          –  Sessions are durable and progress while you’re not connected
          –  Most interface objects have a Time To Live
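To illustrate the future-based calling style only, here is a Python sketch using the standard `concurrent.futures` module. `run_query` is a hypothetical placeholder that sleeps and echoes; it is not part of any Stig client library, whose actual API is not described here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(name: str, delay: float) -> str:
    """Hypothetical stand-in for a remote Stig call; it only sleeps and echoes."""
    time.sleep(delay)  # simulate network and evaluation latency
    return f"result of {name}"


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future at once, so both calls are in flight together.
    pending = [
        pool.submit(run_query, "numServes", 0.05),
        pool.submit(run_query, "numClicksFromTo", 0.01),
    ]
    # ... the caller can do other work here ...
    results = [f.result(timeout=5) for f in pending]  # block only when values are needed

print(results)
```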
  18. Sessions
      •  Sessions are durable.
          –  You can close a session and re-open it later.
          –  This is to facilitate HTTP.
          –  Access is controlled by a security token.
          –  Sessions eventually die of old age if left alone.
      •  Sessions have synthetic nodes in the graph.
          –  Use the session’s node to store session-specific data.
      •  Sessions are replicated.
          –  If a session server goes down, one of its backups will take over.
          –  This might require your client to restart its cursors.
      •  Progress happens when you’re not looking.
          –  Queries and updates continue to make progress, even when a session is closed.
          –  Notifications accumulate in your in-box and are waiting for you when you re-open.
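The durability rules (close and re-open with a token, an in-box that fills while closed, expiry after being left alone) can be modelled with a small in-memory table. This is a hypothetical sketch: `SessionStore`, its methods and the TTL check are illustrative names, and replication and failover are not modelled.

```python
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class Session:
    ttl_seconds: float
    last_touched: float = field(default_factory=time.monotonic)
    is_open: bool = True
    inbox: list[str] = field(default_factory=list)

    def expired(self) -> bool:
        return time.monotonic() - self.last_touched > self.ttl_seconds


class SessionStore:
    """Hypothetical server-side session table (replication is not modelled)."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def open(self, ttl_seconds: float) -> str:
        token = secrets.token_hex(16)  # the security token that controls access
        self._sessions[token] = Session(ttl_seconds)
        return token

    def close(self, token: str) -> None:
        session = self._sessions[token]
        session.is_open = False
        session.last_touched = time.monotonic()

    def notify(self, token: str, message: str) -> None:
        # Work continues while the session is closed; results wait in the in-box.
        self._sessions[token].inbox.append(message)

    def reopen(self, token: str) -> list[str]:
        session = self._sessions.get(token)
        if session is None or session.expired():
            self._sessions.pop(token, None)  # left alone too long: the session dies
            raise KeyError("unknown or expired session")
        session.is_open = True
        session.last_touched = time.monotonic()
        waiting, session.inbox = session.inbox, []
        return waiting


store = SessionStore()
token = store.open(ttl_seconds=60)
store.close(token)
store.notify(token, "numServes finished")
print(store.reopen(token))  # ['numServes finished']
```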
  19. Writing in Stig
      •  Stig is a compiled language
          –  What we can do through analysis, most databases have to do through data-definition languages and DBA tweaking
          –  Feedback from the compiler is not just about program correctness but also about expected performance
          –  Application programmers become aware of scaling problems before they happen, but are not required to be scalability engineers in order to fix them
      •  Stig is a programming language, not a query language
          –  Stig harnesses the computation power of the cluster, not just its storage capacity
          –  The more of your program is written in Stig, the more you can take advantage of distributed evaluation
          –  Stig programs are stored in the database and can call each other, enabling a strategy of library development
      •  Stig marries logical pattern matching and functional evaluation.
          –  That is: search + computation = Stig.
  20. Logical Pattern Matching
      •  Writing a pattern:
           walk: person, friend, hobby, site [
             < person, 'is friend of', friend >,
             < friend, 'has interest in', hobby >,
             < hobby, 'is advertised on', site > ];
      •  Yields a sequence:
           [ { person=/users/alice, friend=/users/bob,
               hobby=/subjects/gardening, site="http://www.plantastic.com" },
             { person=/users/bob, friend=/users/carol,
               hobby=/subjects/yoga, site="http://lowfatyoga.com" } ]
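A naive Python matcher shows what the walk computes: every consistent assignment of the named variables across all three patterns. The sketch assumes a list of triples whose contents are reconstructed from the sample output on the slide; the `walk` helper is hypothetical and uses a nested-loop search rather than Stig's distributed evaluation.

```python
from collections.abc import Iterator

Triple = tuple[str, str, str]


def walk(variables: set[str], patterns: list[Triple], edges: list[Triple]) -> Iterator[dict[str, str]]:
    """Yield one binding per way of matching every pattern against `edges`.

    Terms named in `variables` are unknowns; every other term must match literally.
    """

    def solve(i: int, binding: dict[str, str]) -> Iterator[dict[str, str]]:
        if i == len(patterns):
            yield binding
            return
        for edge in edges:
            candidate = dict(binding)
            for term, value in zip(patterns[i], edge):
                if term in variables:
                    if candidate.setdefault(term, value) != value:
                        break  # variable already bound to something else
                elif term != value:
                    break  # constant does not match
            else:
                yield from solve(i + 1, candidate)

    yield from solve(0, {})


EDGES = [
    ("/users/alice", "is friend of", "/users/bob"),
    ("/users/bob", "has interest in", "/subjects/gardening"),
    ("/subjects/gardening", "is advertised on", "http://www.plantastic.com"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/carol", "has interest in", "/subjects/yoga"),
    ("/subjects/yoga", "is advertised on", "http://lowfatyoga.com"),
]
results = list(walk(
    {"person", "friend", "hobby", "site"},
    [("person", "is friend of", "friend"),
     ("friend", "has interest in", "hobby"),
     ("hobby", "is advertised on", "site")],
    EDGES,
))
print(results)
```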
  21. Composing Sequences for Distributed Evaluation
      •  chain() // concatenate sequences
      •  collect() // collect a sequence into a list
      •  filter() // convert a sequence into a sequence with some elements filtered out
      •  map() // apply an arity-1 function to each element in a sequence, yielding a sequence
      •  reduce() // apply an arity-2 function to each element in a sequence, yielding a scalar
      •  zip() // convert a tuple of sequences into a sequence of tuples
      •  group() // collect a sequence into a sequence of lists
      •  enumerate() // convert a sequence into a sequence of lists with ordinals
      •  sort() // convert a sequence into a sorted list
      •  reverse() // collect a sequence into a list in reverse order
      •  count() // count the number of elements in a sequence, yielding an integer
      •  product() // yield as a sequence the Cartesian product of a tuple of sequences
      •  range() // yield as a sequence a range of integers
      •  slice() // slice a subsequence from a sequence
      •  cycle() // repeat a sequence over and over
      •  select() // filter elements from a sequence using a second sequence of true/false
      •  group_by() // collect a sequence into a set of subsequences by key
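Most of the listed combinators have close counterparts in Python's built-ins, `itertools` and `functools`. The mapping below is approximate and says nothing about Stig's own semantics (for example, `itertools.groupby` only groups adjacent items, so the input is sorted first), and the page list is hypothetical sample data.

```python
import itertools
from functools import reduce

pages = ["Home", "Games", "Home", "Newsfeed", "Games", "Home"]  # hypothetical page visits

chained = list(itertools.chain(pages[:2], pages[4:]))  # chain()
ordered = sorted(pages)  # sort()
backwards = list(reversed(pages))  # reverse()
not_home = [p for p in pages if p != "Home"]  # filter()
lengths = list(map(len, pages))  # map()
total_length = reduce(lambda acc, n: acc + n, lengths, 0)  # reduce()
how_many = sum(1 for _ in pages)  # count()
indexed = list(zip(range(3), pages))  # range() + zip()
numbered = list(enumerate(pages))  # enumerate()
grid = list(itertools.product("AB", (1, 2)))  # product()
middle = list(itertools.islice(pages, 1, 4))  # slice()
repeated = list(itertools.islice(itertools.cycle("ab"), 5))  # cycle(), cut off at 5
picked = list(itertools.compress(pages, [True, False, True, False, False, False]))  # select()
visits_by_page = {page: len(list(group)) for page, group in itertools.groupby(ordered)}  # group_by()

print(visits_by_page)  # {'Games': 2, 'Home': 3, 'Newsfeed': 1}
```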
  22. Pure Graph vs. Fat Graph
      Pure Graph
      •  Stores exactly one kind of data in each node.
      •  Doesn’t usually support aggregate types (structures).
      •  Tends to have lots of ‘has attribute’ edges, making its edges fuzzy and inefficient.
      Fat Graph
      •  Stores multiple kinds of data in each node.
      •  Allows structures and can see into them.
      •  Coalesces aggregated data into a single component, suitable for reconstitution as a program object.
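The difference can be made concrete with the FlowExperiment record from the earlier slides. In this hypothetical Python sketch the pure graph keeps each field behind its own attribute edge, while the fat graph stores the structure as one component; both are rebuilt into the same program object, but the pure version has to find and stitch together one edge per field.

```python
from dataclasses import dataclass


@dataclass
class FlowExperiment:
    running: bool
    has_voted: bool


# Pure graph: every scalar sits behind its own attribute edge.
pure_edges = [
    ("Alice", "FlowExperiment.running", True),
    ("Alice", "FlowExperiment.has_voted", False),
]

# Fat graph: the node stores the whole structure as a single component.
fat_nodes = {"Alice": {"FlowExperiment": {"running": True, "has_voted": False}}}


def from_pure(node: str) -> FlowExperiment:
    # Re-assembling the object means collecting one attribute edge per field.
    prefix = "FlowExperiment."
    fields = {
        pred[len(prefix):]: value
        for subj, pred, value in pure_edges
        if subj == node and pred.startswith(prefix)
    }
    return FlowExperiment(**fields)


def from_fat(node: str) -> FlowExperiment:
    # One lookup yields the whole component, ready to become a program object.
    return FlowExperiment(**fat_nodes[node]["FlowExperiment"])


print(from_pure("Alice") == from_fat("Alice"))  # True
```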
  23. Evolving into Stig
      •  Treat Stig like a filesystem
          –  Store unstructured information at locations.
      •  Treat Stig like a kv store
          –  Nodes are basically like key-value stores, except with type.
          –  Ignore edges and just access nodes by their ids.
      •  Treat Stig like SQL
          –  Simple walks are like table scans.
          –  Keep your data in separate, scalar fields and the results will look tabular.
      •  Over time…
          –  Begin with edges as ad-hoc indices.
          –  Add more complex edge-driven behavior as you grow more comfortable.
          –  Evolve your schema freely by adding facets.
      •  Even if you ignore everything, you still get:
          –  ACID-like guarantees at scale.
          –  Single path to data.
          –  Stand-alone development.
  24. Moving Forward
      •  Open Source
          –  We plan to offer Stig under an extremely liberal license (Apache).
          –  We are seeking engagement with the open source community and with other companies in this space.
          –  Introducing Yaakov, Stig’s voice in open source.
      •  At our Booth
          –  Copies of our decks
      •  Find us at stigdb.org
          –  Sign up there for updates.
          –  Open source to drop Q4 2011.
      •  Workshops
          –  At our office in SF
          –  See website for schedule
      •  Ideas?
          –  If you have the killer Stig app, we want to hear from you.