Social Event Streams
        Background
Social event feeds
• Major feature: 70% of page views on Tumblr
Generating event streams
[Figure: social networking system; user, front-end (application logic), social graph, data store clients, data stores]
• Two types of user actions
   • Share an event
   • Generate a new event stream
Optimizations
• Materialized views, one per user
   • Contain the user’s own events
   • Can also contain events of the other users that the user follows
• We abstract away application-specific “relevance” filters
   • All views contain all events stored in them
   • All queries to a view return all events in the view

Example VIEW (one EVENT per row: user, time, text)
   Plato   12.00   “Having the shadow of an ideal sandwich”
   Hume    12.01   “I just feel a good taste in my mouth”
   Kant    12.02   “Dudes, just eat it and stop blabbering”
GOAL: Optimizing throughput
• Throughput of event stream
   • Proportional to the amount of data being transferred
   • Partitioning social graphs is impossible (or at least very, very hard)
• Existing approaches to optimize throughput
   • Push-all
   • Pull-all
   • Hybrid
Pull-all
• Write to your own view only
• Read from all your friends’ views
• Simpler, good with frequent writes
[Figure: graph with nodes A, B, C; two panels, "WRITE from Alice" and "READ from Charlie", each showing a client and the data-store views of Alice, Bob, and Charlie]
Push-all
• Write to all your friends’ views
• Read from your own view only
• Good with frequent reads
[Figure: graph with nodes A, B, C; two panels, "WRITEs from Alice and Bob" and "READ from Charlie", each showing the clients and the data-store views of Alice, Bob, and Charlie]
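The pull-all and push-all strategies above can be contrasted with a toy in-memory model. This is only a sketch: the `EventStore` class, its method names, and the request counter (used here as a rough stand-in for throughput cost) are hypothetical and do not come from the slides.

```python
from collections import defaultdict


class EventStore:
    """Toy store with one materialized view per user (hypothetical model)."""

    def __init__(self, followers):
        # followers[u] = set of users who consume u's events
        self.followers = followers
        # following[c] = set of users whose events c consumes
        self.following = defaultdict(set)
        for producer, consumers in followers.items():
            for consumer in consumers:
                self.following[consumer].add(producer)
        self.views = defaultdict(list)
        self.requests = 0  # number of view accesses (assumed cost proxy)

    def _write(self, view_owner, event):
        self.views[view_owner].append(event)
        self.requests += 1

    def _read(self, view_owner):
        self.requests += 1
        return list(self.views[view_owner])

    # Pull-all: write to your own view, read from all your friends' views.
    def share_pull_all(self, user, ts, text):
        self._write(user, (ts, user, text))

    def stream_pull_all(self, user):
        events = []
        for producer in self.following[user]:
            events.extend(self._read(producer))
        return sorted(events)

    # Push-all: write to all your friends' views, read from your own view.
    def share_push_all(self, user, ts, text):
        for consumer in self.followers.get(user, ()):
            self._write(consumer, (ts, user, text))

    def stream_push_all(self, user):
        return sorted(self._read(user))


followers = {"Alice": {"Charlie"}, "Bob": {"Charlie"}}

pull = EventStore(followers)
pull.share_pull_all("Alice", 1, "a")
pull.share_pull_all("Bob", 2, "b")
pull_stream = pull.stream_pull_all("Charlie")

push = EventStore(followers)
push.share_push_all("Alice", 1, "a")
push.share_push_all("Bob", 2, "b")
push_stream = push.stream_push_all("Charlie")
```

Both strategies return the same stream for Charlie; they differ only in how many view accesses land on the write path versus the read path.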
Hybrid [Silberstein et al., SIGMOD 2010]
• Per-edge choice between pull and push
• Uses Production Rate (PR) and Consumption Rate (CR)
• Minimum per-edge throughput cost, for an edge from A to B:
   • If PR(A) < CR(B): PUSH, A writes onto B’s view. Cost: PR(A)
   • If PR(A) ≥ CR(B): PULL, B reads from A’s view. Cost: CR(B)
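The per-edge rule above can be written as a short function. This is a minimal sketch, assuming rates are given as plain dictionaries; the names `hybrid_schedule` and `hybrid_cost` are illustrative and not taken from the cited paper.

```python
def hybrid_schedule(edges, pr, cr):
    """Pick push or pull per edge (producer, consumer) using the rule above."""
    schedule = {}
    for a, b in edges:
        # Push when the producer writes less often than the consumer reads.
        schedule[(a, b)] = "push" if pr[a] < cr[b] else "pull"
    return schedule


def hybrid_cost(schedule, pr, cr):
    """Total cost: PR(producer) per push edge, CR(consumer) per pull edge."""
    return sum(
        pr[a] if mode == "push" else cr[b]
        for (a, b), mode in schedule.items()
    )


# Illustrative rates (made up for the example).
pr = {"A": 1, "B": 4}
cr = {"A": 2, "B": 5}
schedule = hybrid_schedule([("A", "B"), ("B", "A")], pr, cr)
total = hybrid_cost(schedule, pr, cr)
```

Because each edge is decided independently, the total is simply the sum of min(PR(A), CR(B)) over all edges.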
Request schedule
[Figure: social networking system; user, front-end (application logic), social graph, data store clients, data stores]
• The social graph contains the request schedule
   • Per-edge push or pull
   • Easy to integrate in existing systems
Social Piggybacking
       Contribution
Idea: Social Piggybacking
• Two friends are likely to share many common friends
• Their views can be used as HUBS to prune edges
[Figure: social piggybacking on nodes A, B, C, with B as the HUB]
   • PUSH (A to B): A writes new events onto B’s view
   • PULL (B to C): C reads events by B and A from B’s view
   • FREE EDGE (A to C): neither pull nor push
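A small worked example makes the saving concrete. The sketch below compares the per-edge hybrid cost of the triangle A, B, C with the piggybacked schedule (push A to B, pull B to C, edge A to C free). The rates are made up for illustration and the function names are hypothetical.

```python
def hybrid_edge_cost(producer, consumer, pr, cr):
    """Cheapest direct service of one edge: min of push and pull cost."""
    return min(pr[producer], cr[consumer])


def triangle_costs(a, b, c, pr, cr):
    """Return (hybrid cost, piggybacking cost) for edges a->b, b->c, a->c."""
    hybrid = (
        hybrid_edge_cost(a, b, pr, cr)
        + hybrid_edge_cost(b, c, pr, cr)
        + hybrid_edge_cost(a, c, pr, cr)
    )
    # Piggybacking: a pushes to hub b, c pulls from hub b, a->c costs nothing.
    piggyback = pr[a] + cr[c]
    return hybrid, piggyback


# Illustrative rates (assumed, not from the slides).
pr = {"A": 1, "B": 3, "C": 1}
cr = {"A": 1, "B": 5, "C": 2}
hybrid, piggyback = triangle_costs("A", "B", "C", pr, cr)
```

With these rates the hybrid schedule costs 4 and the piggybacked one costs 3. Piggybacking is not always cheaper, since it forces a push on the first edge and a pull on the second, so the comparison has to be made per candidate hub.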
Social Dissemination Problem
• Inputs
   • Social graph
   • Per-node production and consumption rates
• Output: request schedule that minimizes cost
   • Each edge needs to be covered
   • Coverage can be through a hub, a push, or a pull
• Requirements
   • Bounded staleness
   • Non-triviality
Analysis
• A request schedule is admissible if and only if, for each edge, either
   • the edge is served directly, using a push or a pull, or
   • the edge is served through a hub
• The Social Dissemination problem is NP-hard
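The admissibility condition above is easy to check for a given schedule, even though finding the cheapest admissible schedule is NP-hard. The following is a sketch under the assumption that a schedule is represented as two sets of edges, `push` and `pull`; that representation and the function name are illustrative.

```python
def is_admissible(edges, push, pull):
    """Check that every edge is served directly or through a hub.

    edges, push, pull: collections of (producer, consumer) pairs.
    An edge (u, v) is served through hub w when u pushes to w and v pulls from w.
    """
    edges, push, pull = set(edges), set(push), set(pull)
    # Pushes and pulls may only use edges that exist in the social graph.
    if not (push <= edges and pull <= edges):
        return False
    for u, v in edges:
        if (u, v) in push or (u, v) in pull:
            continue  # served directly
        hubs = {w for (x, w) in push if x == u}
        if not any((w, v) in pull for w in hubs):
            return False  # neither direct nor via a hub
    return True


triangle = {("A", "B"), ("B", "C"), ("A", "C")}
ok = is_admissible(triangle, push={("A", "B")}, pull={("B", "C")})
missing_pull = is_admissible(triangle, push={("A", "B")}, pull=set())
```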
Nosy: A Simple Heuristic
• Nosy looks for hubgraphs
• Cost with piggybacking: PR(X) + CR(Y), cross edges are free
Nosy Phase 1
[Figure: hubgraph with producer set X, hub w, and consumer y]
• Add elements to X sets
• For each edge (w, y)
   • Build the largest hubgraph (X, w, y)
   • Piggybacking cost: PR(X) + CR(y)
   • Cross edges X -> y are free
   • Piggyback if cheaper than hybrid
Nosy Phase 2
[Figure: hubgraph with sets X and Y, hub w, and consumer y]
• Add elements to Y sets
• For each edge (w, y)
   • Let X_y be the producers of y that already push to w
   • Piggybacking cost: CR(y)
   • Cross edges X_y -> y are free
   • Piggyback if cheaper than hybrid
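The two phases can be sketched as a per-edge cost comparison. This is an illustrative reading of the slides, not the authors' implementation: it assumes that X is the set of all common producers of w and y, that PR(X) means the sum of PR over X, and that "hybrid" means the sum of min(PR, CR) over the edges the hubgraph would replace. All function names are hypothetical.

```python
def hybrid_edge_cost(producer, consumer, pr, cr):
    """Per-edge hybrid cost: the cheaper of push and pull."""
    return min(pr[producer], cr[consumer])


def phase1_candidate(edges, w, y, pr, cr):
    """Phase 1 (assumed reading): evaluate hub edge (w, y) with X pushing to w."""
    edges = set(edges)
    # X: producers that have an edge to both the hub w and the consumer y.
    x_set = {x for (x, t) in edges if t == w and (x, y) in edges}
    # Pushes X -> w plus one pull w -> y; cross edges X -> y are free.
    piggyback = sum(pr[x] for x in x_set) + cr[y]
    hybrid = hybrid_edge_cost(w, y, pr, cr) + sum(
        hybrid_edge_cost(x, w, pr, cr) + hybrid_edge_cost(x, y, pr, cr)
        for x in x_set
    )
    return x_set, piggyback, hybrid, piggyback < hybrid


def phase2_candidate(edges, push, w, y, pr, cr):
    """Phase 2 (assumed reading): reuse producers that already push to w."""
    edges, push = set(edges), set(push)
    # X_y: producers of y whose push to w is already part of the schedule.
    x_y = {x for (x, t) in push if t == w and (x, y) in edges}
    # Only the pull w -> y is paid; cross edges X_y -> y are free.
    piggyback = cr[y]
    hybrid = hybrid_edge_cost(w, y, pr, cr) + sum(
        hybrid_edge_cost(x, y, pr, cr) for x in x_y
    )
    return x_y, piggyback, hybrid, piggyback < hybrid


# Illustrative graph and rates (made up for the example).
edges = {("x1", "w"), ("x1", "y"), ("x2", "w"), ("x2", "y"), ("w", "y")}
pr = {"x1": 1, "x2": 1, "w": 3, "y": 1}
cr = {"x1": 1, "x2": 1, "w": 5, "y": 2}
p1 = phase1_candidate(edges, "w", "y", pr, cr)
p2 = phase2_candidate(edges, {("x1", "w"), ("x2", "w")}, "w", "y", pr, cr)
```

In this example Phase 1 finds X = {x1, x2} with piggybacking cost 4 against a hybrid cost of 6, so the hubgraph would be adopted. A full heuristic would also have to track which edges are already covered and in what order candidates are applied, which this sketch leaves out.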
Experiments
        Flickr and Twitter graphs
Experiments
• Twitter (Aug 2009) and Flickr (Apr 2008) social graphs
• Samples taken using random walks, which preserve graph properties
• Average sizes
   • Flickr: 4k nodes, 112k edges
   • Twitter: 25k nodes, 158k edges
• Production and consumption rates are generated
   • write:read ratio is 1:5
   • PR (resp. CR) increases logarithmically with out-degree (resp. in-degree)
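One way such rates could be generated is sketched below. The slides only state the logarithmic dependence on degree and the 1:5 write:read ratio, so the exact formula here, log(1 + degree) followed by rescaling the consumption rates so that total reads are five times total writes, is an assumption made for illustration, as is the function name.

```python
import math


def generate_rates(edges, reads_per_write=5.0):
    """Assumed rate model: PR ~ log(1 + out-degree), CR ~ log(1 + in-degree)."""
    out_deg, in_deg, nodes = {}, {}, set()
    for u, v in edges:
        nodes.update((u, v))
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    pr = {n: math.log(1 + out_deg.get(n, 0)) for n in nodes}
    cr = {n: math.log(1 + in_deg.get(n, 0)) for n in nodes}
    total_pr, total_cr = sum(pr.values()), sum(cr.values())
    if total_pr > 0 and total_cr > 0:
        # Rescale reads so that the overall write:read ratio is 1:reads_per_write.
        scale = reads_per_write * total_pr / total_cr
        cr = {n: rate * scale for n, rate in cr.items()}
    return pr, cr


sample_edges = [("A", "B"), ("A", "C"), ("B", "C")]
pr, cr = generate_rates(sample_edges)
```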
Metrics and Results
• Metric
   • Improvement over hybrid optimization (baseline)
   • Gain(A) = Cost(BASE) / Cost(A) - 1
• Results
   1. Nosy exploits the community structure
   2. It works well under a variety of parameters
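The gain metric above is a one-line computation; the sketch below simply restates it, with a guard against a non-positive cost added as an assumption. A gain of 0 means no improvement over the baseline, and a baseline that costs 2.4 times as much as algorithm A corresponds to a gain of 1.4.

```python
def gain(cost_base, cost_a):
    """Gain(A) = Cost(BASE) / Cost(A) - 1, as defined above."""
    if cost_a <= 0:
        raise ValueError("cost of the evaluated schedule must be positive")
    return cost_base / cost_a - 1


no_improvement = gain(4, 4)
half_again = gain(6, 4)
best_case = gain(2.4, 1.0)
```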
Clustering Coefficient
• After sampling, we keep only a fraction s of edges
• B+ is a trivial extension of Baseline
   • Lock push edges
   • Pull edges that can be served using hubs are free
• More clustering means more gain for Nosy but not for B+
Varied Workload
• Significant gains
• Asymptotically, i.e. with all reads, the per-edge push-based solution is optimal, so the gain tends to zero
Effect of Colocation
• As the system size grows, the gains reach their maximum
• For very small systems there is little communication, so there is little room for improvement
Conclusions
• Social Piggybacking is a very promising approach
   • Baseline has up to 2.4 times higher throughput cost
   • Easy to integrate in existing systems
• Next steps
   • Run on full social graphs
   • Evaluate throughput gain on an actual social networking system

Social Piggybacking: Leveraging Common Friends to Generate Event Streams
