Wed 1315 lucas_jason_color

Stig Workshop

Architect of Scalable Infrastructure - Jason Lucas

A graph of user session analytics

Session Analytics Graph

A sample of session flows stored in Stig

<Source, ‘generated’, Clickthrough>
<Clickthrough, ‘clicked_by’, User>
<Clickthrough, ‘served’, Page>

Inferred Edges
We can define rules for edges that are not part of the graph, but that are
inferred knowledge from the edges that are in the graph.

For example, we can infer that a person requested a page:

<person, ‘requested’, page> = walk clickthrough:
[ <clickthrough, ‘clicked_by’, person>;
<clickthrough, ‘served’, page> ];

Or that a page was visited from another page:

<page1, ‘clicked_to’, page2> = walk clickthrough:
[ <page1, ‘generated’, clickthrough>;
<clickthrough, ‘served’, page2> ];
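The two rules above can be sketched outside of Stig. This minimal Python version stores the session graph as a plain list of (subject, predicate, object) triples and derives each inferred edge by walking through a shared clickthrough node. The triple layout, the node names (ct1, ct2) and the helper names are assumptions made for illustration. They are not Stig's storage format or API.

```python
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample graph: Alice clicks from the home page to Games, then to Pets.
EDGES: List[Triple] = [
    ("Home Page", "generated", "ct1"),
    ("ct1", "clicked_by", "Alice"),
    ("ct1", "served", "Games Page"),
    ("Games Page", "generated", "ct2"),
    ("ct2", "clicked_by", "Alice"),
    ("ct2", "served", "Pets Page"),
]


def requested(edges: List[Triple]) -> Iterator[Tuple[str, str]]:
    """<person, 'requested', page>: one clickthrough was clicked_by person and served page."""
    for ct, pred, person in edges:
        if pred != "clicked_by":
            continue
        for ct2, pred2, page in edges:
            if ct2 == ct and pred2 == "served":
                yield (person, page)


def clicked_to(edges: List[Triple]) -> Iterator[Tuple[str, str]]:
    """<page1, 'clicked_to', page2>: page1 generated a clickthrough that served page2."""
    for page1, pred, ct in edges:
        if pred != "generated":
            continue
        for ct2, pred2, page2 in edges:
            if ct2 == ct and pred2 == "served":
                yield (page1, page2)


print(list(requested(EDGES)))   # [('Alice', 'Games Page'), ('Alice', 'Pets Page')]
print(list(clicked_to(EDGES)))  # [('Home Page', 'Games Page'), ('Games Page', 'Pets Page')]
```

The nested loops are a naive join over the whole edge list. A real engine would use indices, but the binding rule is the same one the walk expresses: both edges must share the same clickthrough.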

Queries of Interest
•  How often does a person visit the same page?
•  How often is such a visit an inconvenient clickthrough or redirect?
•  Do people visit a given page more often at a certain time of day?
•  Does their IP address (home or work PC) influence their usage patterns?
•  Do friends’ usage patterns on the site influence this specific user? (For example, a friend started playing a new game.)
•  General usage statistics.
•  Have links on our site been moved/removed in such a way that users can’t find the features they want to use?

•  The biggest question is…

How can we improve our users’ experience?

User Experience Optimization

•  Find common usage patterns in a specific user’s session flow
•  Look for inconveniences within this flow and experiment with removing them; acquire user feedback
•  For example, a user who comes to our site to play a specific game and has to click 3-4 times to get to this game’s main page
•  How can we identify users whom we want to put in an experimental flow?
•  A simple identifier: 90% of their flows lead to a particular feature.

Simple Stig Functions

How many times did Page A lead to Page B?
numClicksFromTo source dest =
count (solve : [<source, ‘clicked_to’, dest>]);

How many times was Page A served?
numServes dest =
count (solve (clickthrough) : [<clickthrough, ‘served’, dest>]);

How many times did a particular user access Page A?
numUserRequestsPage person page =
count (solve : [<person, ‘requested’, page>]);

How many times did this particular user go from Page A to Page B?
numUserRequestsPageFrom person source page =
count (solve (clickthrough) : [ <source, ‘generated’, clickthrough>;
                                <clickthrough, ‘clicked_by’, person>;
                                <clickthrough, ‘served’, page> ]);
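A Python sketch of these four counting functions follows. It assumes the same triple layout as the session sample, collapses each clickthrough node into one record, and counts records. The sample data and every helper name are hypothetical. In this sketch the per-user "from A to B" count constrains a single clickthrough, so the person, the source page and the served page must all belong to the same click.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample: Alice clicks Home -> Games twice; Bob clicks Home -> News once.
EDGES: List[Triple] = [
    ("Home Page", "generated", "ct1"), ("ct1", "clicked_by", "Alice"),
    ("ct1", "served", "Games Page"),
    ("Home Page", "generated", "ct2"), ("ct2", "clicked_by", "Alice"),
    ("ct2", "served", "Games Page"),
    ("Home Page", "generated", "ct3"), ("ct3", "clicked_by", "Bob"),
    ("ct3", "served", "News Page"),
]


def clickthroughs(edges: List[Triple]) -> List[Dict[str, str]]:
    """Collapse each clickthrough node into {'source', 'person', 'page'}."""
    records: Dict[str, Dict[str, str]] = {}
    for s, p, o in edges:
        if p == "generated":
            records.setdefault(o, {})["source"] = s
        elif p == "clicked_by":
            records.setdefault(s, {})["person"] = o
        elif p == "served":
            records.setdefault(s, {})["page"] = o
    return list(records.values())


def count_matching(edges: List[Triple], **wanted: str) -> int:
    """Count clickthroughs whose fields equal every keyword argument."""
    return sum(1 for r in clickthroughs(edges)
               if all(r.get(k) == v for k, v in wanted.items()))


def num_clicks_from_to(edges: List[Triple], source: str, dest: str) -> int:
    return count_matching(edges, source=source, page=dest)


def num_serves(edges: List[Triple], dest: str) -> int:
    return count_matching(edges, page=dest)


def num_user_requests_page(edges: List[Triple], person: str, page: str) -> int:
    return count_matching(edges, person=person, page=page)


def num_user_requests_page_from(edges: List[Triple], person: str,
                                source: str, page: str) -> int:
    # All three constraints must hold for the *same* clickthrough node.
    return count_matching(edges, person=person, source=source, page=page)


print(num_user_requests_page_from(EDGES, "Alice", "Home Page", "Games Page"))  # 2
print(num_user_requests_page_from(EDGES, "Bob", "Home Page", "Games Page"))    # 0
```

If the two inferred edges were joined independently, every Home to Games click by anyone would be paired with every Games request by the person. In this sample that gives 2 x 2 = 4 for Alice instead of 2.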

Let’s take a look at some sample user session flows…

Alice loves to play Pets!

“Alice comes to our site to play Pets. She always hits the home page and has to click through to her favorite game. Plenty of room for improvement here.”

Bob interacts a lot with his newsfeed

“Bob likes to go straight to the newsfeed and expand people’s comments or like them. Can we improve his experience by expanding his favorite comments or displaying the newsfeed on his home page?”

Carol receives an e-mail about a specific newsfeed post

“Carol received an e-mail notification that her friend mentioned her in a newsfeed comment. She had to press expand on the comments to be able to see it. We should have known better!”

Dave checks his Cafe thanks to an e-mail notification

“Dave received an e-mail notification telling him his cookers were ready in Cafe. When he clicked on the link in the e-mail it took him to the home page! Now he has to click on his alert, get redirected to the Cafe game, and then start playing.”

“What about Alice?”

•  Alice is identified as a candidate for our user experience optimization study.
•  Let’s look at Alice’s patterns…
•  Alice goes from the home page straight to Games 90% of the time, so we want to just give her the Games page when she visits our site.
•  So we’ll be adding her to our experimental user base.
•  Also, we want to let her give us a “Thumbs Up!” or “Thumbs Down!” on the new flow.

•  How can we make the database do this?

Modifying the User Flow In Real Time

We update the database to set Alice’s status as participating in the user
flow experiment:

when ((numUserRequestsPageFrom ‘Alice’ ‘Home Page’ ‘Games Page’ /
       numUserRequestsPage ‘Alice’ ‘Home Page’) > 0.9)
do {
    FlowExperiment@‘Alice’ := { running = True;
                                startDate = Now;
                                HasVoted = False;
                              };
    <‘Alice’, ‘participates_in_experiment’, ‘Games Page’> := True;
};

We also add an edge to the graph showing that Alice is part of the Games Page experiment, so we can find all the users in the experiment by collecting the ‘participates_in_experiment’ edges that point to the ‘Games Page’ node.
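The enrollment rule and the participant lookup can be sketched in Python as follows. The 0.9 threshold, the experiment record's fields and the participates_in_experiment edge come from the slide. The class, its method names, the reverse index keyed by (predicate, object), and the request counts passed in are hypothetical. The index turns "who points at this node?" into a single lookup instead of a scan over all edges.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import DefaultDict, Dict, List, Set, Tuple

THRESHOLD = 0.9  # from the slide: more than 90% of home-page visits lead to the feature


@dataclass
class FlowExperiment:
    running: bool
    start_date: datetime
    has_voted: bool = False


class ExperimentStore:
    """Hypothetical in-memory stand-in for the Stig update shown above."""

    def __init__(self) -> None:
        self.experiments: Dict[str, FlowExperiment] = {}
        # Reverse index: (predicate, object) -> subjects pointing at that object.
        self._incoming: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def maybe_enroll(self, user: str, from_home_to_feature: int,
                     home_requests: int, feature: str) -> bool:
        if home_requests == 0:
            return False  # no history yet; avoid dividing by zero
        if from_home_to_feature / home_requests > THRESHOLD:
            self.experiments[user] = FlowExperiment(
                running=True, start_date=datetime.now(timezone.utc))
            self._incoming[("participates_in_experiment", feature)].add(user)
            return True
        return False

    def participants(self, feature: str) -> List[str]:
        key = ("participates_in_experiment", feature)
        return sorted(self._incoming.get(key, set()))


store = ExperimentStore()
store.maybe_enroll("Alice", 19, 20, "Games Page")  # 0.95 > 0.9, so Alice is enrolled
store.maybe_enroll("Bob", 5, 10, "Games Page")     # 0.5, so Bob is not
print(store.participants("Games Page"))            # ['Alice']
```

The comparison is strict, matching the slide's `> 0.9`, so a user at exactly 90% is not enrolled.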

Analyzing the New Flow

•  Alice gets a friendly pop-up asking how she likes the new flow, and whether to keep it.
–  Thumbs Up! -> Keep the new structure! I love it!
–  Thumbs Down! -> Give me back my old flow!

•  We can update the database with two simple Stig functions:
thumbsUpdate user =
    do {
        FlowExperiment@user := { HasVoted = True; };
        ThumbsUp@‘Flow Experiment’ += 1;
    };

thumbsDowndate user =
    do {
        FlowExperiment@user := { HasVoted = True;
                                 running = False; };
        ThumbsDown@‘Flow Experiment’ += 1;
    };
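For readers who want to trace the two vote handlers step by step, here is a hypothetical Python mirror. A thumbs-up marks the user as having voted and keeps the experiment running. A thumbs-down also stops the experiment, so the old flow comes back. The class and method names are invented for this sketch, and the in-memory dictionaries stand in for the database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict


@dataclass
class FlowExperiment:
    running: bool = True
    has_voted: bool = False


class VoteBook:
    """Hypothetical Python mirror of thumbsUpdate / thumbsDowndate."""

    def __init__(self) -> None:
        self.experiments: Dict[str, FlowExperiment] = {}
        self.totals: Counter = Counter()

    def thumbs_up(self, user: str) -> None:
        self.experiments[user].has_voted = True
        self.totals["ThumbsUp"] += 1      # the new flow stays in place

    def thumbs_down(self, user: str) -> None:
        exp = self.experiments[user]
        exp.has_voted = True
        exp.running = False               # stop the experiment: old flow comes back
        self.totals["ThumbsDown"] += 1


votes = VoteBook()
votes.experiments["Alice"] = FlowExperiment()
votes.experiments["Bob"] = FlowExperiment()
votes.thumbs_up("Alice")
votes.thumbs_down("Bob")
print(dict(votes.totals))  # {'ThumbsUp': 1, 'ThumbsDown': 1}
```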

We Snuck Something in There:
Improving Update Concurrency

•  x += 1 is better than x = x + 1: the increment does not need to read the current value of x first
•  We can take in many thumb updates, and only need to evaluate the ThumbsUp or ThumbsDown total when we need it
•  Common terminology:
–  Database theorists would call this a Field Call
–  Escrowing
–  Write without read
–  Commutative operations

Don’t ask for things you don’t need!
“If you don’t care about the result, don’t make Stig compute it”
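The "write without read" idea can be illustrated with a small, single-process Python class. Each increment is recorded without looking at the current total. Because addition is commutative, the increments can arrive in any order and still produce the same sum. The total is computed only when someone asks for it. The class name is hypothetical, and the sketch shows only the principle. It is not Stig's concurrency control and makes no claims about thread safety.

```python
import random
from typing import List


class EscrowCounter:
    """Accumulates commutative increments without reading the current total."""

    def __init__(self, initial: int = 0) -> None:
        self._base = initial
        self._pending: List[int] = []

    def add(self, delta: int) -> None:
        # Blind write: record the delta; the current total is never read here.
        self._pending.append(delta)

    def value(self) -> int:
        # The total is only materialized when somebody asks for it.
        self._base += sum(self._pending)
        self._pending.clear()
        return self._base


deltas = [1, 3, 1, 2, 5]                # hypothetical batches of "Thumbs Up!" clicks
a, b = EscrowCounter(), EscrowCounter()
for d in deltas:
    a.add(d)
shuffled = deltas[:]
random.shuffle(shuffled)                # a different arrival order
for d in shuffled:
    b.add(d)
print(a.value(), b.value())             # 12 12
```

By contrast, `x = x + 1` makes every writer read x first, and that read is what forces concurrent writers to coordinate.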

Calling Stig

•  Our client API is available from:
–  Python
–  PHP
–  Perl
–  Java
–  C / C++
–  and we can serve HTTP directly
•  Our focus is the web:
–  Almost all calls are asynchronous and return futures
–  Sessions are durable and progress while you’re not connected
–  Most interface objects have a Time To Live
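The futures-based style can be pictured with Python's standard `concurrent.futures`. The client class and its `num_serves` method below are hypothetical stand-ins, not Stig's actual client API. The point is the calling pattern: each call returns a Future immediately, so a web handler can issue several requests, do other work, and collect the results only when it needs them.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


class HypotheticalStigClient:
    """Illustrative stand-in: every call returns a Future immediately."""

    def __init__(self, served_pages: List[str]) -> None:
        self._served = served_pages
        self._pool = ThreadPoolExecutor(max_workers=4)

    def num_serves(self, page: str) -> Future:
        # Submit the work and hand back a Future instead of blocking the caller.
        return self._pool.submit(self._count, page)

    def _count(self, page: str) -> int:
        return sum(1 for p in self._served if p == page)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


client = HypotheticalStigClient(["Home Page", "Games Page", "Games Page", "News Page"])
games = client.num_serves("Games Page")   # both requests are now in flight
news = client.num_serves("News Page")
# ...a web handler could render other parts of the page here...
print(games.result(), news.result())      # 2 1
client.close()
```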

Sessions

•  Sessions are durable.
•  Sessions are replicated.
•  You can close a session and re-open it later.
•  This is to facilitate HTTP.
•  Access is controlled by a security token.
•  Sessions eventually die of old age if left alone.
•  Sessions have synthetic nodes in the graph.
•  Use the session’s node to store session-specific data.
•  If a session server goes down, one of its backups will take over.
•  This might require your client to restart its cursors.
•  Progress happens when you’re not looking.
•  Queries and updates continue to make progress, even when a session is closed.
•  Notifications accumulate in your in-box and are waiting for you when you re-open.
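Several of these session properties fit in a small Python model: a session can be closed and re-opened, access is checked against a security token, notifications pile up in an in-box while the client is away, and an idle session expires after a time to live. The class names, the TTL value and the injectable clock are hypothetical, and replication and failover are not modeled.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    token: str
    last_touched: float
    is_open: bool = True
    inbox: List[str] = field(default_factory=list)


class SessionManager:
    """Hypothetical model of durable, token-protected sessions with a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._sessions: Dict[str, Session] = {}

    def open(self) -> Tuple[str, str]:
        session_id = secrets.token_hex(8)
        token = secrets.token_urlsafe(16)
        self._sessions[session_id] = Session(token=token, last_touched=self._clock())
        return session_id, token

    def close(self, session_id: str) -> None:
        # Closing only detaches the client; the session itself lives on.
        self._sessions[session_id].is_open = False

    def notify(self, session_id: str, message: str) -> None:
        # Work keeps finishing while the session is closed; results queue up.
        self._sessions[session_id].inbox.append(message)

    def reopen(self, session_id: str, token: str) -> List[str]:
        self._expire()
        session = self._sessions.get(session_id)
        if session is None:
            raise KeyError(f"session {session_id} expired or unknown")
        if not secrets.compare_digest(session.token, token):
            raise PermissionError("bad security token")
        session.is_open = True
        session.last_touched = self._clock()
        pending, session.inbox = session.inbox, []
        return pending  # notifications that accumulated while away

    def _expire(self) -> None:
        # Sessions left alone longer than the TTL "die of old age".
        now = self._clock()
        stale = [sid for sid, s in self._sessions.items()
                 if now - s.last_touched > self._ttl]
        for sid in stale:
            del self._sessions[sid]


now = [0.0]                                     # fake clock for the demo
mgr = SessionManager(ttl_seconds=60, clock=lambda: now[0])
sid, token = mgr.open()
mgr.close(sid)                                  # client goes away
mgr.notify(sid, "query finished")               # progress while not looking
now[0] = 30.0
print(mgr.reopen(sid, token))                   # ['query finished']
```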

Writing in Stig

•  Stig is a compiled language
–  What Stig can do through analysis, most databases have to do through data-definition languages and DBA tweaking
–  Feedback from the compiler is not just about program correctness but also about expected performance
–  Application programmers become aware of scaling problems before they happen, but are not required to be scalability engineers in order to fix them
•  Stig is a programming language, not a query language
–  Stig harnesses the computation power of the cluster, not just its storage capacity
–  The more of your program is written in Stig, the more you can take advantage of distributed evaluation
–  Stig programs are stored in the database and can call each other, enabling a strategy of library development
•  Stig marries logical pattern matching and functional evaluation.
–  That is: search + computation = Stig.

Logical Pattern Matching

•  Writing a pattern:
–  walk: person, friend, hobby, site [
< person, ‘is friend of’, friend >,
< friend, ‘has interest in’, hobby >,
< hobby, ‘is advertised on’, site > ];

•  Yields a sequence:
–  [ { person=/users/alice, friend=/users/bob,
hobby=/subjects/gardening,
site=“http://www.plantastic.com” },
{ person=/users/bob, friend=/users/carol,
hobby=/subjects/yoga,
site=“http://lowfatyoga.com” } ]
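The way a walk turns a pattern into a sequence of bindings can be sketched as a naive backtracking join in Python. The slide shows only the result, so the edge list below is hypothetical and was chosen to reproduce that result. The `walk` function is our illustration of the matching semantics. It is not Stig's matcher, which evaluates in a distributed way.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical edges chosen so that the walk reproduces the slide's result.
EDGES: List[Triple] = [
    ("/users/alice", "is friend of", "/users/bob"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/bob", "has interest in", "/subjects/gardening"),
    ("/users/carol", "has interest in", "/subjects/yoga"),
    ("/subjects/gardening", "is advertised on", "http://www.plantastic.com"),
    ("/subjects/yoga", "is advertised on", "http://lowfatyoga.com"),
]

PATTERN: List[Triple] = [
    ("person", "is friend of", "friend"),
    ("friend", "has interest in", "hobby"),
    ("hobby", "is advertised on", "site"),
]


def walk(edges: List[Triple], pattern: List[Triple]) -> Iterator[Binding]:
    """Yield each variable binding that satisfies every triple in `pattern`.

    Subject and object positions in `pattern` are variable names; predicates
    are matched literally. This is a naive nested-loop join with backtracking.
    """
    def extend(i: int, binding: Binding) -> Iterator[Binding]:
        if i == len(pattern):
            yield binding
            return
        s_var, pred, o_var = pattern[i]
        for s, p, o in edges:
            if p != pred:
                continue
            new = dict(binding)
            # setdefault binds a fresh variable or returns its existing value.
            if new.setdefault(s_var, s) == s and new.setdefault(o_var, o) == o:
                yield from extend(i + 1, new)
    yield from extend(0, {})


for row in walk(EDGES, PATTERN):
    print(row)
```

Variables shared between triples (friend, hobby) are what tie the three edges together, just as in the Stig pattern.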

Composing Sequences for Distributed Evaluation
•  chain() // concatenate sequences
•  sort() // convert a sequence into a sorted list
•  reverse() // collect a sequence into a list in reverse order
•  map() // apply an arity-1 function to each element in a sequence, yielding a sequence
•  reduce() // apply an arity-2 function cumulatively over the elements of a sequence, yielding a scalar
•  zip() // convert a tuple of sequences into a sequence of tuples
•  group() // collect a sequence into a sequence of lists
•  enumerate() // convert a sequence into a sequence of lists with ordinals
•  group_by() // collect a sequence into a set of subsequences by key
•  collect() // collect a sequence into a list
•  filter() // convert a sequence into a sequence with some elements filtered out
•  count() // count the number of elements in a sequence, yielding an integer
•  product() // yield as a sequence the Cartesian product of a tuple of sequences
•  range() // yield as a sequence a range of integers
•  slice() // slice a subsequence from a sequence
•  cycle() // repeat a sequence over and over
•  select() // filter elements from a sequence using a second sequence of true/false values
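Most of these operations have close single-machine counterparts in Python's built-ins, `functools` and `itertools`, and those counterparts help pin down what each operation does. The mapping is our own analogy, not Stig's implementation. Python evaluates in one process, whereas the point of the Stig versions is distributed evaluation. Python's `enumerate()` also yields tuples rather than lists. `group()` is left out because the slide does not say how it chunks a sequence.

```python
from functools import reduce
from itertools import chain, compress, cycle, groupby, islice, product
from operator import add
from typing import Any, Dict


def parity(x: int) -> int:
    return x % 2


def python_analogues() -> Dict[str, Any]:
    """Python stand-ins for the sequence operations listed above."""
    seq = [3, 1, 4, 1, 5]
    return {
        "chain": list(chain([1, 2], [3])),
        "sort": sorted(seq),
        "collect": list(iter(seq)),
        "reverse": list(reversed(seq)),
        "map": list(map(lambda x: x * 10, seq)),
        "reduce": reduce(add, seq),                      # ((((3+1)+4)+1)+5)
        "zip": list(zip([1, 2], ["a", "b"])),
        "enumerate": list(enumerate(["a", "b"])),
        # groupby only merges adjacent keys, so sort by the key first.
        "group_by": {k: list(g) for k, g in groupby(sorted(seq, key=parity), key=parity)},
        "filter": list(filter(lambda x: x > 2, seq)),
        "count": sum(1 for _ in seq),
        "product": list(product([1, 2], ["a", "b"])),
        "range": list(range(3)),
        "slice": list(islice(seq, 1, 3)),
        "cycle": list(islice(cycle([1, 2]), 5)),         # cycle() is infinite; take 5
        "select": list(compress(seq, [True, False, True, False, True])),
    }


for name, value in python_analogues().items():
    print(f"{name:10} {value}")
```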

Pure Graph vs. Fat Graph

Pure Graph
•  Stores exactly one kind of data in each node.
•  Doesn’t usually support aggregate types (structures).
•  Tends to have lots of ‘has attribute’ edges, making its edges fuzzy and inefficient.

Fat Graph
•  Stores multiple kinds of data in each node.
•  Allows structures and can see into them.
•  Coalesces aggregated data into a single component, suitable for reconstitution as a program object.
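The difference shows up when a program object has to be rebuilt. In the hypothetical Python sketch below, the pure-graph version needs one edge match and one node read per field. The fat-graph version returns the whole structure from a single node. The node ids, the `User` type and the 'has name' / 'has email' edge labels are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class User:
    name: str
    email: str


# Pure graph: every node holds one scalar, so each field is its own node
# hanging off a 'has ...' attribute edge.
PURE_NODES: Dict[str, str] = {
    "/attrs/1": "Alice",
    "/attrs/2": "alice@example.com",
}
PURE_EDGES: List[Tuple[str, str, str]] = [
    ("/users/alice", "has name", "/attrs/1"),
    ("/users/alice", "has email", "/attrs/2"),
]

# Fat graph: the node stores the whole structure as a single component.
FAT_NODES: Dict[str, User] = {
    "/users/alice": User(name="Alice", email="alice@example.com"),
}


def load_user_pure(node_id: str) -> User:
    # One edge match plus one node read per field to rebuild the object.
    fields = {
        pred[len("has "):]: PURE_NODES[obj]
        for subj, pred, obj in PURE_EDGES
        if subj == node_id and pred.startswith("has ")
    }
    return User(**fields)


def load_user_fat(node_id: str) -> User:
    # A single read hands back the program object directly.
    return FAT_NODES[node_id]


print(load_user_pure("/users/alice") == load_user_fat("/users/alice"))  # True
```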

Evolving into Stig

•  Treat Stig like a filesystem
–  Store unstructured information at locations
•  Treat Stig like a kv store
–  Nodes are basically like key-value stores, except with type.
–  Ignore edges and just access nodes by their ids.
•  Treat Stig like SQL
–  Simple walks are like table scans.
–  Keep your data in separate, scalar fields and the results will look tabular.
•  Over time…
–  Begin with edges as ad-hoc indices.
–  Add more complex edge-driven behavior as you grow more comfortable.
–  Evolve your schema freely by adding facets.
•  Even if you ignore everything, you still get:
–  ACID-like guarantees at scale.
–  Single path to data.
–  Stand-alone development.

Moving Forward

•  Open Source
–  We plan to offer Stig under an extremely liberal license (Apache)
–  We are seeking engagement with the open source community and with other companies in this space
–  Introducing Yaakov, Stig’s voice in open source
•  At our Booth
–  Copies of our decks
•  Find us at stigdb.org
–  Sign up there for updates
–  Open source to drop Q4 2011
•  Workshops
–  At our office in SF
–  See website for schedule
•  Ideas?
–  If you have the killer Stig app, we want to hear from you.
