Stig Workshop 

                            

Architect of Scalable Infrastructure - Jason Lucas
A graph of user session analytics
Session Analytics Graph

 A sample of session flows stored in Stig

 <Source, ‘generated’, Clickthrough>
 <Clickthrough, ‘clicked_by’, User>
 <Clickthrough, ‘served’, Page>
Inferred Edges
We can define rules for edges that are not part of the graph, but that are
inferred knowledge from the edges that are in the graph.

For example, we can infer that a person requested a page:

<person, ‘requested’, page> = walk clickthrough:
         [ <clickthrough, ‘clicked_by’, person>;
            <clickthrough, ‘served’, page>     ];


Or that a page was visited from another page:

<page1, ‘clicked_to’, page2> = walk clickthrough:
         [ <page1, ‘generated’, clickthrough>;
            <clickthrough, ‘served’, page2> ];
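The two inference rules above can be sketched outside Stig as a join on the shared clickthrough. This is a minimal Python illustration, not Stig's evaluation strategy: the in-memory list of triples, the sample node names, and the helper functions `infer_requested` and `infer_clicked_to` are all assumptions made for the example.

```python
# Hypothetical in-memory stand-in for the graph: (subject, label, object) triples.
EDGES = [
    ("home", "generated", "c1"),
    ("c1", "clicked_by", "alice"),
    ("c1", "served", "games"),
    ("games", "generated", "c2"),
    ("c2", "clicked_by", "alice"),
    ("c2", "served", "pets"),
]


def infer_requested(edges):
    """Yield <person, 'requested', page> for each clickthrough that has
    both a 'clicked_by' edge and a 'served' edge."""
    clicked_by = {s: o for s, label, o in edges if label == "clicked_by"}
    for s, label, o in edges:
        if label == "served" and s in clicked_by:
            yield (clicked_by[s], "requested", o)


def infer_clicked_to(edges):
    """Yield <page1, 'clicked_to', page2> for each clickthrough that was
    generated by page1 and served page2."""
    generated_by = {o: s for s, label, o in edges if label == "generated"}
    for s, label, o in edges:
        if label == "served" and s in generated_by:
            yield (generated_by[s], "clicked_to", o)


requested = list(infer_requested(EDGES))
clicked_to = list(infer_clicked_to(EDGES))
```

The inferred edges are never stored; they are recomputed from the stored edges each time, which mirrors the idea that they are knowledge derived from the graph rather than part of it.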
Queries of Interest
•  How often does a person visit the same page?
•  How often is such a visit an inconvenient clickthrough or redirect?
•  Do people visit a given page more often at a certain time of day?
•  Does their ip address (home or work PC) influence their usage
   patterns?
•  Do friends’ usage patterns on the site influence this specific user? (A
   friend started playing a new game)
•  General usage statistics.
•  Have links on our site been moved/removed in such a way that users
   can’t find the features they want to use?

•  The biggest question is…


     How can we improve our users’ experience?
User Experience Optimization



•  Find common usage patterns in a specific user’s session flow
•  Look for inconveniences within this flow and experiment with removing
   them; acquire user feedback
•  For example, a user that comes to our site to play a specific game and
   has to click 3-4 times to get to this game’s main page
•  How can we identify users whom we want to put in an experimental
   flow?
•  A simple identifier: 90% of their flows lead to a particular feature.
Simple Stig Functions

How many times did Page A lead to Page B?
   numClicksFromTo source dest =
         count (solve : [<source, ‘clicked_to’, dest>]);



How many times was Page A served?
   numServes dest =
         count (solve (clickthrough) : [<clickthrough, ‘served’, dest>]);



How many times did a particular user access Page A?
   numUserRequestsPage person page =
         count (solve : [<person, ‘requested’, page>]);



How many times did this particular user go from Page A to Page B?
   numUserRequestsPageFrom person source page =
         count (solve : [   <source, ‘clicked_to’, page>;
                            <person, ‘requested’, page>   ]);
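For readers who want to check the intent of these four functions without a Stig installation, here is a rough Python equivalent. It assumes each clickthrough is flattened into a hypothetical `(person, source, page)` tuple, and it counts per clickthrough, which is one reading of the patterns above rather than a statement about how Stig's `solve` binds variables.

```python
# Hypothetical sample data: one tuple per clickthrough,
# recording who clicked, which page generated it, and which page was served.
CLICKS = [
    ("alice", "home", "games"),
    ("alice", "home", "games"),
    ("alice", "games", "pets"),
    ("bob", "home", "newsfeed"),
]


def num_clicks_from_to(clicks, source, dest):
    """How many times did `source` lead to `dest`?"""
    return sum(1 for _, src, dst in clicks if src == source and dst == dest)


def num_serves(clicks, dest):
    """How many times was `dest` served?"""
    return sum(1 for _, _, dst in clicks if dst == dest)


def num_user_requests_page(clicks, person, page):
    """How many times did `person` request `page`?"""
    return sum(1 for who, _, dst in clicks if who == person and dst == page)


def num_user_requests_page_from(clicks, person, source, page):
    """How many times did `person` go from `source` to `page`?"""
    return sum(1 for click in clicks if click == (person, source, page))
```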
Let’s take a look at some sample user session flows…
Alice loves to play Pets!

“Alice comes to our site to play Pets. She always hits the home page and
has to click through to her favorite game. Plenty of room for
improvement here.”
Bob interacts a lot with his newsfeed

“Bob likes to go straight to newsfeed and expand people’s comments or
like them. Can we better his experience by expanding his favorite
comments or displaying newsfeed on his home page?”
Carol receives an e-mail about a specific newsfeed post

“Carol received an e-mail notification that her friend mentioned her in a
newsfeed comment. She had to press expand on the comments to be able
to see it. We should have known better!”
Dave checks his Cafe thanks to an e-mail notification

“Dave received an e-mail notification telling him his cookers were ready
in Cafe. When he clicked on the link in the e-mail it took him to the
home page! Now he has to click on his alert, get redirected to the Cafe
game, and then start playing.”
“What about Alice?”



•  Alice is identified as a candidate for our user experience optimization
   study.
•  Let’s look at Alice’s patterns…
•  Alice goes from the home page straight to Games 90% of the time, so
   we want to just give her the Games page when she visits our site.
•  So we’ll be adding her to our experimental user base.
•  Also, we want to let her give us a “Thumbs Up!” or “Thumbs Down!”
   on the new flow.

•  How can we make the database do this?
Modifying the User Flow In Real Time

We update the database to set Alice’s status as participating in the user
flow experiment:

when ((   numUserRequestsPageFrom ‘Alice’ ‘Home Page’ ‘Games Page’ /
          numUserRequestsPage ‘Alice’ ‘Home Page’) > 0.9)
          do {
               FlowExperiment@’Alice’ := { running = True;
                                           startDate = Now;
                                           HasVoted = False;
                                         };
               <‘Alice’, ‘participates_in_experiment’, ‘Games Page’> := True;
          };


We also add an edge to the graph showing that Alice is part of the
Games Page experiment – we can find all the users that are part of the
experiment by getting all the edges from that node.
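The enrollment rule can be sketched in Python as a threshold check followed by two writes. The dictionary of experiment records, the set of edges, and the function names `should_enroll`, `enroll`, and `participants` are assumptions made for this example; Stig's `when ... do` runs inside the database, whereas this sketch only models the logic.

```python
from datetime import datetime, timezone


def should_enroll(from_home_to_target, home_requests, threshold=0.9):
    """True when the share of home-page requests that led to the target
    page is strictly greater than the threshold."""
    if home_requests == 0:
        return False  # no requests yet, so there is no ratio to evaluate
    return from_home_to_target / home_requests > threshold


def enroll(experiments, edges, user, page):
    """Record the experiment state and add the participation edge."""
    experiments[user] = {
        "running": True,
        "start_date": datetime.now(timezone.utc),
        "has_voted": False,
    }
    edges.add((user, "participates_in_experiment", page))


def participants(edges, page):
    """Find every user in the experiment by following edges into `page`."""
    return sorted(user for user, label, target in edges
                  if label == "participates_in_experiment" and target == page)


experiments, edges = {}, set()
# Hypothetical counts: 19 of Alice's 20 home-page requests led to the Games page.
if should_enroll(19, 20):
    enroll(experiments, edges, "Alice", "Games Page")
```

Calling `participants(edges, "Games Page")` then plays the role of "getting all the edges from that node".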
Analyzing the New Flow

•  Alice gets a friendly pop-up asking how she likes the new flow, and
   whether to keep it.
    –  Thumbs Up! -> Keep the new structure! I love it!
    –  Thumbs Down! -> Give me back my old flow!

•  We can update the database with a simple Stig function:
thumbsUpdate user =
   do {
         FlowExperiment@user := { HasVoted = True; };
         ThumbsUp@’Flow Experiment’ += 1;
   };

thumbsDowndate user =
   do {
         FlowExperiment@user := { HasVoted = True;
                                  running = False; };
         ThumbsDown@’Flow Experiment’ += 1;
   };
We Snuck Something in There:
Improving Update Concurrency

•  x += 1 is better than x = x + 1: the increment can be recorded without
   reading x first
•  We can take in many thumb updates, and only need to evaluate the
   ThumbsUp or ThumbsDown total when we need it
•  Common terminology:
   –    Database theorists would call this a Field Call
   –    Escrowing
   –    Write without read
   –    Commutative operations




                Don’t ask for things you don’t need!
             “If you don’t care about the result, don’t make Stig compute it”
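The write-without-read idea can be illustrated with a small Python sketch. It is not how Stig implements `+=`; the `EscrowCounter` class and its method names are hypothetical. Each vote is appended as a delta, and the total is folded only when somebody asks for it. Because addition is commutative, the order in which the deltas arrive does not matter.

```python
import random


class EscrowCounter:
    """Accumulates blind increments; a total is computed only on read."""

    def __init__(self):
        self._pending = {}

    def add(self, key, delta=1):
        # Write without read: record the delta, never inspect the current value.
        self._pending.setdefault(key, []).append(delta)

    def total(self, key):
        # Fold the pending deltas only when the total is actually needed.
        return sum(self._pending.get(key, []))


votes = ["ThumbsUp", "ThumbsDown", "ThumbsUp", "ThumbsUp"]
random.shuffle(votes)  # arrival order is arbitrary

counter = EscrowCounter()
for vote in votes:
    counter.add(vote)
```

Whatever order the shuffle produces, `counter.total("ThumbsUp")` and `counter.total("ThumbsDown")` come out the same, which is why concurrent writers do not need to coordinate.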
Calling Stig

•  Our client API is available from:
    –    Python
    –    PHP
    –    Perl
    –    Java
    –    C / C++
    –    and we can serve HTTP directly
•  Our focus is the web:
    –  Almost all calls are asynchronous and return futures
    –  Sessions are durable and progress while you’re not connected
    –  Most interface objects have a Time To Live
Sessions



•  Sessions are durable.
    –  You can close a session and re-open it later.
    –  This is to facilitate HTTP.
    –  Access is controlled by a security token.
    –  Sessions eventually die of old age if left alone.
•  Sessions have synthetic nodes in the graph.
    –  Use the session’s node to store session-specific data.
•  Sessions are replicated.
    –  If a session server goes down, one of its backups will take over.
    –  This might require your client to restart its cursors.
•  Progress happens when you’re not looking.
    –  Queries and updates continue to make progress, even when a session
       is closed.
    –  Notifications accumulate in your in-box and are waiting for you
       when you re-open.
Writing in Stig



•  Stig is a compiled language
    –  What we can do through analysis most databases have to do through data-
       definition languages and DBA tweaking
    –  Feedback from the compiler is not just about program correctness but expected
       performance
    –  Application programmers become aware of scaling problems before they happen
       but are not required to be scalability engineers in order to fix them
•  Stig is a programming language not a query language
    –  Stig harnesses the computation power of the cluster, not just its storage capacity
    –  The more of your program is written in Stig, the more you can take advantage of
       distributed evaluation
    –  Stig programs are stored in the database and can call each other, enabling a
       strategy of library development
•  Stig marries logical pattern matching and functional evaluation.
    –  That is: search + computation = Stig.
Logical Pattern Matching



•  Writing a pattern:
    –  walk:   person, friend, hobby, site [
           <   person, ‘is friend of’, friend >,
           <   friend, ‘has interest in’, hobby >,
           <   hobby, ‘is advertised on’, site > ];


•  Yields a sequence:
    –  [ { person=/users/alice, friend=/users/bob,
           hobby=/subjects/gardening,
           site=“http://www.plantastic.com” },
         { person=/users/bob, friend=/users/carol,
           hobby=/subjects/yoga,
           site=“http://lowfatyoga.com” } ]
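To make the binding behaviour concrete, here is a naive Python matcher that produces the same sequence of bindings from a small list of triples. It is a sketch under stated assumptions: the `walk` and `_bind` helpers and the nested-loop strategy are inventions for this example, and Stig's own evaluation is distributed and presumably far more selective.

```python
# Hypothetical edge data chosen to reproduce the bindings shown on the slide.
EDGES = [
    ("/users/alice", "is friend of", "/users/bob"),
    ("/users/bob", "has interest in", "/subjects/gardening"),
    ("/subjects/gardening", "is advertised on", "http://www.plantastic.com"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/carol", "has interest in", "/subjects/yoga"),
    ("/subjects/yoga", "is advertised on", "http://lowfatyoga.com"),
]


def _bind(binding, term, value, variables):
    """Unify one pattern term with one edge value, updating `binding`."""
    if term not in variables:
        return term == value            # constant: must match exactly
    if term in binding:
        return binding[term] == value   # already bound: must agree
    binding[term] = value               # first sighting: bind it
    return True


def walk(edges, variables, patterns):
    """Return every assignment of `variables` that satisfies all patterns."""
    bindings = [{}]
    for subj, label, obj in patterns:
        extended = []
        for binding in bindings:
            for s, edge_label, o in edges:
                if edge_label != label:
                    continue
                candidate = dict(binding)
                if (_bind(candidate, subj, s, variables)
                        and _bind(candidate, obj, o, variables)):
                    extended.append(candidate)
        bindings = extended
    return bindings


results = walk(
    EDGES,
    {"person", "friend", "hobby", "site"},
    [
        ("person", "is friend of", "friend"),
        ("friend", "has interest in", "hobby"),
        ("hobby", "is advertised on", "site"),
    ],
)
```

Each pattern narrows or extends the bindings left by the previous one, so a variable such as `friend` ties the first edge to the second.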
Composing Sequences for Distributed Evaluation
•  chain() // concatenate sequences
•  collect() // collect a sequence into a list
•  reverse() // collect a sequence into a list in reverse order
•  map() // apply an arity-1 function to each element in a sequence,
   yielding a sequence
•  reduce() // apply an arity-2 function to each element in a sequence,
   yielding a scalar
•  zip() // convert a tuple of sequences into a sequence of tuples
•  group() // collect a sequence into a sequence of lists
•  enumerate() // convert a sequence into a sequence of lists with ordinals
•  sort() // convert a sequence into a sorted list
•  filter() // convert a sequence into a sequence with some elements
   filtered out
•  count() // count the number of elements in a sequence, yielding an integer
•  product() // yield as a sequence the Cartesian product of a tuple of
   sequences
•  range() // yield as a sequence a range of integers
•  slice() // slice a subsequence from a sequence
•  cycle() // repeat a sequence over and over
•  select() // filter elements from a sequence using a second sequence of
   true/false
•  group_by() // collect a sequence into a set of subsequences by key
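Many of these operators have close analogues in Python's standard library, which may help when reading Stig code for the first time. The mapping below is an assumption based on the one-line descriptions above, not a statement about Stig's exact semantics; the sample page names are made up.

```python
from functools import reduce
from itertools import chain, compress, cycle, groupby, islice, product

visits = ["home", "games", "pets"]
more = ["home", "newsfeed"]

all_pages = list(chain(visits, more))                  # chain() then collect()
lengths = list(map(len, all_pages))                    # map(): arity-1 function
total_chars = reduce(lambda a, b: a + b, lengths)      # reduce(): arity-2 function
non_home = list(filter(lambda p: p != "home", all_pages))  # filter()
numbered = list(enumerate(visits))                     # enumerate(): ordinals
pairs = list(zip(visits, more))                        # zip(): sequence of tuples
grid = list(product(range(2), ["a", "b"]))             # product() over range()
first_two = list(islice(all_pages, 0, 2))              # slice()
repeated = list(islice(cycle(visits), 5))              # cycle(), cut off after 5
selected = list(compress(visits, [True, False, True])) # select() by true/false
# group_by(): itertools.groupby needs sorted input to group equal keys together
page_counts = {key: len(list(group)) for key, group in groupby(sorted(all_pages))}
```

Like the Stig operators, most of these are lazy until something such as `list()` collects them, which is what makes them composable.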
Pure Graph vs. Fat Graph


Pure Graph
•  Stores exactly one kind of data in each node.
•  Doesn’t usually support aggregate types (structures).
•  Tends to have lots of ‘has attribute’ edges, making its edges fuzzy
   and inefficient.

Fat Graph
•  Stores multiple kinds of data in each node.
•  Allows structures and can see into them.
•  Coalesces aggregated data into a single component, suitable for
   re-constitution as a program object.
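The contrast can be sketched in Python. Both representations below are illustrative assumptions (the node id, the attribute names, and the `UserNode` class are made up); the point is only that the pure graph needs one edge per attribute and a scan to rebuild the record, while the fat graph hands back a ready-made object.

```python
from dataclasses import dataclass

# Pure graph: every attribute hangs off the node through its own edge.
pure_edges = [
    ("/users/alice", "has name", "Alice"),
    ("/users/alice", "has favorite game", "Pets"),
    ("/users/alice", "has visit count", 42),
]


def reconstitute(edges, node_id):
    """Rebuild a record by scanning every 'has ...' edge of one node."""
    prefix = "has "
    return {label[len(prefix):]: value
            for subject, label, value in edges
            if subject == node_id and label.startswith(prefix)}


# Fat graph: the node itself carries a structure with several fields.
@dataclass
class UserNode:
    name: str
    favorite_game: str
    visit_count: int


fat_nodes = {"/users/alice": UserNode("Alice", "Pets", 42)}

pure_record = reconstitute(pure_edges, "/users/alice")
fat_record = fat_nodes["/users/alice"]
```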
Evolving into Stig



•  Treat Stig like a filesystem
    –  Store unstructured information at locations.
•  Treat Stig like a kv store
    –  Nodes are basically like key-value stores, except with type.
    –  Ignore edges and just access nodes by their ids.
•  Treat Stig like SQL
    –  Simple walks are like table scans.
    –  Keep your data in separate, scalar fields and the results will
       look tabular.
•  Over time…
    –  Begin with edges as ad-hoc indices.
    –  Add more complex edge-driven behavior as you grow more comfortable.
    –  Evolve your schema freely by adding facets.
•  Even if you ignore everything, you still get:
    –  ACID-like guarantees at scale.
    –  Single path to data.
    –  Stand-alone development.
Moving Forward


•  Open Source
    –  We plan to offer Stig under an extremely liberal license (Apache)
    –  We are seeking engagement with the open source community and with
       other companies in this space
    –  Introducing Yaakov, Stig’s voice in open source
•  At our Booth
    –  Copies of our decks
•  Find us at stigdb.org
    –  Sign up there for updates
    –  Open source to drop Q4 2011
•  Workshops
    –  At our office in SF
    –  See website for schedule
•  Ideas?
    –  If you have the killer Stig app, we want to hear from you.

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 

Wed 1315 lucas_jason_color

  • 1. Stig Workshop Architect of Scalable Infrastructure - Jason Lucas
  • 2. A graph of user session analytics
  • 3. Session Analytics Graph A sample of session flows stored in Stig <Source, ‘generated’, Clickthrough> <Clickthrough, ‘clicked_by’, User> <Clickthrough, ‘served’, Page>
  • 4. Inferred Edges We can define rules for edges that are not part of the graph, but that are inferred knowledge from the edges that are in the graph. For example, we can infer that a person requested a page: <person, ‘requested’, page> = walk clickthrough: [ <clickthrough, ‘clicked_by’, person>; <clickthrough, ‘served’, page> ]; Or that a page was visited from another page: <page1, ‘clicked_to’, page2> = walk clickthrough: [ <page1, ‘generated’, clickthrough>; <clickthrough, ‘served’, page2> ];
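The two inference rules above can be approximated outside Stig as a join on the shared clickthrough. The following is a minimal Python sketch, not Stig's implementation: the in-memory `EDGES` set, the helper `pairs`, and the sample data are all assumptions made for illustration.

```python
# Hypothetical in-memory stand-in for the graph: (subject, predicate, object) triples.
EDGES = {
    ("home", "generated", "c1"),
    ("c1", "clicked_by", "alice"),
    ("c1", "served", "games"),
    ("games", "generated", "c2"),
    ("c2", "clicked_by", "alice"),
    ("c2", "served", "pets"),
}


def pairs(edges, predicate):
    """Return (subject, object) pairs for every edge with the given predicate."""
    return [(s, o) for (s, p, o) in edges if p == predicate]


def requested(edges):
    """Infer <person, 'requested', page> by joining on the clickthrough."""
    served = dict(pairs(edges, "served"))  # clickthrough -> page
    return {
        (person, "requested", served[click])
        for click, person in pairs(edges, "clicked_by")
        if click in served
    }


def clicked_to(edges):
    """Infer <page1, 'clicked_to', page2> by joining on the clickthrough."""
    served = dict(pairs(edges, "served"))  # clickthrough -> page
    return {
        (page1, "clicked_to", served[click])
        for page1, click in pairs(edges, "generated")
        if click in served
    }
```

Because the results are sets, repeated visits collapse into one inferred edge; a counting query would keep one result per clickthrough instead.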
  • 5. Queries of Interest •  How often does a person visit the same page? •  How often is such a visit an inconvenient clickthrough or redirect? •  Do people visit a given page more often at a certain time of day? •  Does their IP address (home or work PC) influence their usage patterns? •  Do friends’ usage patterns on the site influence this specific user? (A friend started playing a new game) •  General usage statistics. •  Have links on our site been moved/removed in such a way that users can’t find the features they want to use? •  The biggest question is… How can we improve our users’ experience?
  • 6. User Experience Optimization •  Find common usage patterns in a specific user’s session flow •  Look for inconveniences within this flow and experiment with removing them; acquire user feedback •  For example, a user that comes to our site to play a specific game and has to click 3-4 times to get to this game’s main page •  How can we identify users whom we want to put in an experimental flow? •  A simple identifier: 90% of their flows lead to a particular feature.
  • 7. Simple Stig Functions How many times did Page A lead to Page B? numClicksFromTo source dest = count (solve : [<source, ‘clicked_to’, dest>]); How many times was Page A served? numServes dest = count (solve (clickthrough) : [<clickthrough, ‘served’, dest>]); How many times did a particular user access Page A? numUserRequestsPage person page = count (solve : [<person, ‘requested’, page>]); How many times did this particular user go from Page A to Page B? numUserRequestsPageFrom person source page = count (solve : [ <source, ‘clicked_to’, page>; <person, ‘requested’, page> ]);
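For readers without a Stig instance, the four counting functions can be mirrored in Python. This is only a sketch under assumptions: each clickthrough is flattened into a hypothetical `(source_page, person, served_page)` record, and the last function is read as "clickthroughs by this person from source to page", which is one interpretation of the two-pattern query on the slide.

```python
# Hypothetical flattened clickthrough log: (source_page, person, served_page).
CLICKS = [
    ("Home Page", "Alice", "Games Page"),
    ("Home Page", "Alice", "Games Page"),
    ("Home Page", "Bob", "Newsfeed"),
    ("Games Page", "Alice", "Pets"),
]


def num_clicks_from_to(clicks, source, dest):
    """How many times did `source` lead to `dest`?"""
    return sum(1 for s, _, d in clicks if s == source and d == dest)


def num_serves(clicks, dest):
    """How many times was `dest` served?"""
    return sum(1 for _, _, d in clicks if d == dest)


def num_user_requests_page(clicks, person, page):
    """How many times did `person` request `page`?"""
    return sum(1 for _, p, d in clicks if p == person and d == page)


def num_user_requests_page_from(clicks, person, source, page):
    """How many times did `person` go from `source` to `page`?"""
    return sum(1 for s, p, d in clicks if (s, p, d) == (source, person, page))
```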
  • 8. Let’s take a look at some sample user session flows…
  • 9. Alice loves to play Pets! “Alice comes to our site to play Pets. She always hits the home page and has to click through to her favorite game. Plenty of room for improvement here”
  • 10. Bob interacts a lot with his newsfeed “Bob likes to go straight to newsfeed and expand people’s comments or like them. Can we better his experience by expanding his favorite comments or displaying newsfeed on his home page?”
  • 11. Carol receives an e-mail about a specific newsfeed post “Carol received an e-mail notification that her friend mentioned her in a newsfeed comment. She had to press expand on the comments to be able to see it. We should have known better!”
  • 12. Dave checks his Cafe thanks to an e-mail notification “Dave received an e-mail notification telling him his cookers were ready in Cafe. When he clicked on the link in the e-mail it took him to the home page! Now he has to click on his alert, get redirected to the Cafe game, and then start playing.”
  • 13. “What about Alice?” •  Alice is identified as a candidate for our user experience optimization study. •  Let’s look at Alice’s patterns… •  Alice goes from the home page straight to Games 90% of the time, so we want to just give her the Games page when she visits our site. •  So we’ll be adding her to our experimental user base. •  Also, we want to let her give us a “Thumbs Up!” or “Thumbs Down!” on the new flow. •  How can we make the database do this?
  • 14. Modifying the User Flow In Real Time We update the database to set Alice’s status as participating in the user flow experiment: when (( numUserRequestsPageFrom ‘Alice’ ‘Home Page’ ‘Games Page’ / numUserRequestsPage ‘Alice’ ‘Home Page’) > 0.9) do { FlowExperiment@‘Alice’ := { running = True; startDate = Now; HasVoted = False; }; <‘Alice’, ‘participates_in_experiment’, ‘Games Page’> := True; }; We also add an edge to the graph showing that Alice is part of the Games Page experiment – we can find all the users that are part of the experiment by getting all the edges from that node.
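The enrollment rule can be sketched in Python to make the threshold logic concrete. The function names, the dictionary used for the experiment record, and the guard against a zero denominator are assumptions added for illustration; the slide itself only shows the Stig `when … do` form.

```python
from datetime import datetime, timezone


def maybe_enroll(user, home_to_games, home_requests, experiments, edges, threshold=0.9):
    """Enroll `user` when more than `threshold` of their Home Page visits lead to Games."""
    if home_requests == 0:
        return False  # no data yet, so the ratio is undefined
    if home_to_games / home_requests <= threshold:
        return False
    # Mirror of the FlowExperiment record written on the slide.
    experiments[user] = {
        "running": True,
        "startDate": datetime.now(timezone.utc),
        "HasVoted": False,
    }
    # The extra edge makes participants discoverable from the experiment node.
    edges.add((user, "participates_in_experiment", "Games Page"))
    return True


def participants(edges, experiment_page):
    """List every user with an edge into the experiment node."""
    return sorted(
        user
        for (user, predicate, page) in edges
        if predicate == "participates_in_experiment" and page == experiment_page
    )
```

Note that the comparison is strict, as on the slide: a user at exactly 90% is not enrolled.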
  • 15. Analyzing the New Flow •  Alice gets a friendly pop-up asking how she likes the new flow, and whether to keep it. –  Thumbs Up! -> Keep the new structure! I love it! –  Thumbs Down! -> Give me back my old flow! •  We can update the database with a simple Stig function: thumbsUpdate user = do { FlowExperiment@user := { HasVoted = True; }; ThumbsUp@‘Flow Experiment’ += 1; }; thumbsDowndate user = do { FlowExperiment@user := { HasVoted = True; running = False; }; ThumbsDown@‘Flow Experiment’ += 1; };
  • 16. We Snuck Something in There: Improving Update Concurrency •  x += 1 is better than x = x + 1 •  We can take in many thumb updates, and only need to evaluate the ThumbsUp or ThumbsDown total when we need it •  Common terminology: –  Database theorists would call this a Field Call –  Escrowing –  Write without read –  Commutative operations Don’t ask for things you don’t need! “If you don’t care about the result, don’t make Stig compute it”
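The "write without read" idea can be illustrated with a small Python sketch. This is not how Stig implements it; the `EscrowCounter` class is a hypothetical stand-in that records commutative increments and only folds them into a total when a reader asks.

```python
import threading


class EscrowCounter:
    """Accumulates commutative increments; the total is computed lazily."""

    def __init__(self):
        self._pending = []  # deltas recorded but not yet applied
        self._total = 0
        self._lock = threading.Lock()

    def add(self, delta=1):
        """Record an increment without reading the current total."""
        with self._lock:
            self._pending.append(delta)

    def value(self):
        """Fold the pending deltas into the total only when it is needed."""
        with self._lock:
            self._total += sum(self._pending)
            self._pending.clear()
            return self._total
```

Because addition is commutative, the order in which writers record their deltas does not matter, so writers never need to observe each other's results.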
  • 17. Calling Stig •  Our client API is available from: –  Python –  PHP –  Perl –  Java –  C / C++ –  and we can serve HTTP directly •  Our focus is the web: –  Almost all calls are asynchronous and return futures –  Sessions are durable and progress while you’re not connected –  Most interface objects have a Time To Live
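The slide does not show the client API itself, so the following Python sketch only illustrates the general shape of "asynchronous calls that return futures". `FakeStigClient`, its `call` method, and the canned results are entirely hypothetical and are not the real Stig client.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeStigClient:
    """Hypothetical stand-in for a client whose calls return futures."""

    def __init__(self, canned_results):
        self._results = canned_results  # (function_name, args) -> result
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, function_name, *args):
        """Submit the call and return a Future immediately."""
        return self._pool.submit(self._evaluate, function_name, args)

    def _evaluate(self, function_name, args):
        # A real client would talk to the server here.
        return self._results[(function_name, args)]

    def close(self):
        self._pool.shutdown(wait=True)
```

A caller would issue several calls up front, continue with other work, and block on `future.result()` only when a value is actually needed.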
  • 18. Sessions •  Sessions are durable. •  You can close a session and re-open it later. •  This is to facilitate HTTP. •  Access is controlled by a security token. •  Sessions eventually die of old age if left alone. •  Sessions have synthetic nodes in the graph. •  Use the sessions node to store session-specific data. •  Sessions are replicated. •  If a session server goes down, one of its backups will take over. •  This might require your client to restart its cursors. •  Progress happens when you’re not looking. •  Queries and updates continue to make progress, even when a session is closed. •  Notifications accumulate in your in-box and are waiting for you when you re-open.
  • 19. Writing in Stig •  Stig is a compiled language –  What we can do through analysis most databases have to do through data-definition languages and DBA tweaking –  Feedback from the compiler is not just about program correctness but expected performance –  Application programmers become aware of scaling problems before they happen but are not required to be scalability engineers in order to fix them •  Stig is a programming language not a query language –  Stig harnesses the computation power of the cluster, not just its storage capacity –  The more of your program is written in Stig, the more you can take advantage of distributed evaluation –  Stig programs are stored in the database and can call each other, enabling a strategy of library development •  Stig marries logical pattern matching and functional evaluation. –  That is: search + computation = Stig.
  • 20. Logical Pattern Matching •  Writing a pattern: –  walk: person, friend, hobby, site [ < person, ‘is friend of’, friend >, < friend, ‘has interest in’, hobby >, < hobby, ‘is advertised on’, site > ]; •  Yields a sequence: –  [ { person=/users/alice, friend=/users/bob, hobby=/subjects/gardening, site=“http://www.plantastic.com” }, { person=/users/bob, friend=/users/carol, hobby=/subjects/yoga, site=“http://lowfatyoga.com” } ]
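To show how a multi-pattern walk can yield a sequence of variable bindings, here is a naive Python sketch. It is an assumption-laden illustration (a nested-loop matcher over an in-memory edge list), not Stig's evaluation strategy; `walk`, `unify`, and the sample `EDGES` are hypothetical names and data.

```python
EDGES = [
    ("/users/alice", "is friend of", "/users/bob"),
    ("/users/bob", "is friend of", "/users/carol"),
    ("/users/bob", "has interest in", "/subjects/gardening"),
    ("/users/carol", "has interest in", "/subjects/yoga"),
    ("/subjects/gardening", "is advertised on", "http://www.plantastic.com"),
    ("/subjects/yoga", "is advertised on", "http://lowfatyoga.com"),
]


def walk(edges, variables, patterns):
    """Yield one binding dict per way of matching every pattern against the edges."""

    def unify(term, value, binding):
        # A variable binds on first use and must agree afterwards;
        # anything else is a constant that must equal the edge value.
        if term in variables:
            if term in binding:
                return binding if binding[term] == value else None
            return {**binding, term: value}
        return binding if term == value else None

    def solve(remaining, binding):
        if not remaining:
            yield binding
            return
        first, rest = remaining[0], remaining[1:]
        for edge in edges:
            candidate = binding
            for term, value in zip(first, edge):
                candidate = unify(term, value, candidate)
                if candidate is None:
                    break
            else:
                yield from solve(rest, candidate)

    yield from solve(list(patterns), {})


VARIABLES = {"person", "friend", "hobby", "site"}
PATTERNS = [
    ("person", "is friend of", "friend"),
    ("friend", "has interest in", "hobby"),
    ("hobby", "is advertised on", "site"),
]
```

Calling `list(walk(EDGES, VARIABLES, PATTERNS))` on this sample data produces two bindings, one for Alice via Bob and one for Bob via Carol, in the same shape as the sequence on the slide.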
  • 21. Composing Sequences for Distributed Evaluation •  chain() // concatenate sequences •  sort() // convert a sequence into a sorted list •  collect() // collect a sequence into a list •  reverse() // collect a sequence into a list in reverse order •  filter() // convert a sequence into a sequence with some elements filtered out •  map() // apply an arity-1 function to each element in a sequence, yielding a sequence •  count() // count the number of elements in a sequence, yielding an integer •  product() // yield as a sequence the Cartesian product of a tuple of sequences •  reduce() // apply an arity-2 function to each element in a sequence, yielding a scalar •  range() // yield as a sequence a range of integers •  zip() // convert a tuple of sequences into a sequence of tuples •  slice() // slice a subsequence from a sequence •  group() // collect a sequence into a sequence of lists •  cycle() // repeat a sequence over and over •  select() // filter elements from a sequence using a second sequence of true/false •  enumerate() // convert a sequence into a sequence of lists with ordinals •  group_by() // collect a sequence into a set of subsequences by key
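Many of these combinators have close analogues in Python's standard library, which may help readers build intuition. The mapping below is an informal assumption based on the one-line descriptions on the slide (for example, `select()` is matched to `itertools.compress`); it does not describe Stig's actual semantics or its distributed evaluation.

```python
from functools import reduce
from itertools import chain, compress, cycle, groupby, islice, product

clicks = [("alice", "games"), ("bob", "newsfeed"), ("alice", "pets")]
more = [("carol", "newsfeed")]

all_clicks = list(chain(clicks, more))                             # chain()
pages = list(map(lambda c: c[1], all_clicks))                      # map(): arity-1 function
newsfeed = list(filter(lambda c: c[1] == "newsfeed", all_clicks))  # filter()
total = reduce(lambda acc, _: acc + 1, all_clicks, 0)              # reduce(): arity-2 function
first_two = list(islice(all_clicks, 0, 2))                         # slice()
picked = list(compress(all_clicks, [True, False, True, False]))    # select()
pairs = list(product(["home"], ["games", "pets"]))                 # product()
numbered = list(enumerate(pages))                                  # enumerate()
repeated = list(islice(cycle(["a", "b"]), 5))                      # cycle(), truncated

# group_by(): itertools.groupby needs its input sorted by the grouping key.
by_user = {
    user: [page for _, page in group]
    for user, group in groupby(sorted(all_clicks), key=lambda c: c[0])
}
```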
  • 22. Pure Graph vs. Fat Graph Pure Graph: •  Stores exactly one kind of data in each node. •  Doesn’t usually support aggregate types (structures). •  Tends to have lots of ‘has attribute’ edges, making its edges fuzzy and inefficient. Fat Graph: •  Stores multiple kinds of data in each node. •  Allows structures and can see into them. •  Coalesces aggregated data into a single component, suitable for re-constitution as a program object.
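The contrast can be made concrete with a toy Python sketch. The data layout and the `has <attribute>` naming convention are assumptions chosen for illustration; they are not taken from Stig or from any particular graph database.

```python
# Pure graph: every attribute is its own edge to a scalar value.
pure_edges = [
    ("alice", "has name", "Alice"),
    ("alice", "has age", 30),
    ("alice", "is friend of", "bob"),
]

# Fat graph: the node carries a structure; edges are left for relationships.
fat_nodes = {"alice": {"name": "Alice", "age": 30}}
fat_edges = [("alice", "is friend of", "bob")]


def reconstitute_pure(edges, node):
    """Rebuild an object by scanning the node's 'has <attribute>' edges."""
    prefix = "has "
    return {
        predicate[len(prefix):]: value
        for subject, predicate, value in edges
        if subject == node and predicate.startswith(prefix)
    }


def reconstitute_fat(nodes, node):
    """The structure is already stored as one component on the node."""
    return dict(nodes[node])
```

In the pure layout the attribute edges are mixed in with the relationship edges and must be filtered out; in the fat layout the object comes back with a single lookup.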
  • 23. Evolving into Stig •  Treat Stig like a filesystem •  Store unstructured information at locations •  Treat Stig like a kv store •  Nodes are basically like key-value stores, except with type. •  Ignore edges and just access nodes by their ids. •  Treat Stig like SQL •  Simple walks are like table scans. •  Keep your data in separate, scalar fields and the results will look tabular. •  Over time… •  Begin with edges as ad-hoc indices. •  Add more complex edge-driven behavior as you grow more comfortable. •  Evolve your schema freely by adding facets. •  Even if you ignore everything, you still get: •  ACID-like guarantees at scale. •  Single path to data. •  Stand-alone development.
  • 24. Moving Forward •  Open Source –  We plan to offer Stig under an extremely liberal license (Apache) –  We are seeking engagement with the open source community and with other companies in this space –  Introducing Yaakov, Stig’s voice in open source •  At our Booth –  Copies of our decks •  Find us at stigdb.org –  Sign up there for updates –  Open source to drop Q4 2011 •  Workshops –  At our office in SF –  See website for schedule •  Ideas? –  If you have the killer Stig app, we want to hear from you.