Activity Feeds Architecture, January 2011 (slides dated Friday, January 14, 2011)
  1. Activity Feeds Architecture. January, 2011. (Slide footer: Friday, January 14, 2011.)
  2. To Be Covered:
       • Data model
       • Where feeds come from
       • How feeds are displayed
       • Optimizations
  3. Fundamental Entities: Activities and Connections
     There are two fundamental building blocks for feeds: connections and activities. Activities form a log of what some entity on the site has done, or had done to it. Connections express relationships between entities. I will explain the data model for connections first.
  4. Connections: Favorites, Circles, Orders, etc.
     Connections are a superset of Circles, Favorites, Orders, and other relationships between entities on the site.
  5. Connections [Diagram: a graph over nodes A through J]
     Connections are implemented as a directed graph. Currently, the nodes can be people or shops. (In principle they can be other objects.)
  6. Connections [Diagram: the graph over nodes A through J, with the table names connection_edges_forward and connection_edges_reverse]
     The edges of the graph are stored in two tables. For any node, connection_edges_forward lists the outgoing edges and connection_edges_reverse lists the incoming edges. In other words, we store each edge twice.
  7. Connections [Diagram: the graph over nodes A through J, with edge labels aCB, aBA, aBC, aCD, aDB, aEB, aEA, aFA, aEF, aFE, aEI, aDI, aIJ, aIG, aJG, aGJ, aHJ, aHG, aHE]
     We also assign each edge a weight, known as affinity.
  8. Connections: on H’s shard

     connection_edges_forward
       from  to  affinity
       H     E   0.3
       H     G   0.7

     connection_edges_reverse
       from  to  affinity
       J     H   0.75

     Here we see the data for Anda’s connections on her shard (Anda is node H in the diagram). She has two entries in the forward connections table for the people in her circle. She has one entry in the reverse connections table so that she can see everyone following her.
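
The two-table layout described in slides 6 to 8 can be sketched in memory as follows. This is only an illustration, not the production schema: the `EdgeStore` class and its method names are hypothetical, and plain dictionaries stand in for the sharded `connection_edges_forward` and `connection_edges_reverse` tables.

```python
from collections import defaultdict


class EdgeStore:
    """Hypothetical in-memory stand-in for the two edge tables."""

    def __init__(self):
        # source -> {target: affinity}; mirrors connection_edges_forward
        self.forward = defaultdict(dict)
        # target -> {source: affinity}; mirrors connection_edges_reverse
        self.reverse = defaultdict(dict)

    def connect(self, src, dst, affinity):
        # Each edge is written twice: once for the source, once for the target.
        self.forward[src][dst] = affinity
        self.reverse[dst][src] = affinity

    def following(self, node):
        """Outgoing edges of `node` (the people in its circle)."""
        return dict(self.forward[node])

    def followers(self, node):
        """Incoming edges of `node` (everyone following it)."""
        return dict(self.reverse[node])


store = EdgeStore()
store.connect("H", "E", 0.3)
store.connect("H", "G", 0.7)
store.connect("J", "H", 0.75)
```

Storing each edge twice means both "who do I follow" and "who follows me" can be answered from the node's own rows, at the cost of a second write per connection.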
  9. Activities
     Activities are the other database entity important to activity feeds.
  10. activity := (subject, verb, object)
      As you can see in Rob’s magnetic poetry diagram, activities are a description of an event on Etsy boiled down to a subject (“who did it”), a verb (“what they did”), and an object (“what they did it to”).
  11. activity := (subject, verb, object)
        (Kyle, favorited, brief jerky)
        (Steve, connected, Kyle)
      Here are some examples of activities. The first one describes Kyle favoriting an item. The second one describes Steve adding Kyle to his circle. In each of these cases, note that there are probably several parties interested in these events [examples]. The main problem we’re trying to solve with activity feeds is how to notify all of them. To achieve that goal, as usual, we copy the data all over the place.
  12. activity := (subject, verb, object)
      activity := [owner, (subject, verb, object)]
      So what we do is duplicate the S,V,O combinations with different owners. Steve will have his record that he connected to Kyle, and Kyle will be given his own record that Steve connected to him.
  13. activity := [owner, (subject, verb, object)]
        (Steve, connected, Kyle) is stored as:
          Steve’s shard: [Steve, (Steve, connected, Kyle)]
          Kyle’s shard:  [Kyle, (Steve, connected, Kyle)]
      This is what that looks like.
  14. activity := [owner, (subject, verb, object)]
        (Kyle, favorited, brief jerky) is stored on Kyle’s shard and MixedSpecies’ shard as:
          [Kyle, (Kyle, favorited, brief jerky)]
          [MixedSpecies, (Kyle, favorited, brief jerky)]
          [brief jerky, (Kyle, favorited, brief jerky)]
      In more complicated examples there could be more than two owners. You could envision people being interested in Kyle, people being interested in MixedSpecies, or people being interested in brief jerky. In cases where there are this many writes, we will generally perform them with Gearman. Again, in order for interested parties to find the activities, we copy the activities all over the place.
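
The owner-prefixed records from slides 12 to 14 amount to a fan-out on write. The sketch below shows the idea only; the `Activity` tuple and the `fan_out` helper are hypothetical names, and in the system described these writes would go to each owner’s shard (possibly through Gearman) rather than into a list.

```python
from typing import NamedTuple


class Activity(NamedTuple):
    owner: str    # whose copy of the event this row is
    subject: str  # who did it
    verb: str     # what they did
    obj: str      # what they did it to


def fan_out(subject, verb, obj, interested_parties):
    """Copy one (subject, verb, object) event to every interested owner."""
    return [Activity(owner, subject, verb, obj) for owner in interested_parties]


# The owner list mirrors the slide's example; how owners are chosen in the
# real system is not described in the talk.
rows = fan_out("Kyle", "favorited", "brief jerky",
               ["Kyle", "MixedSpecies", "brief jerky"])
```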
  15. Building a Feed [Diagram: the activities (Kyle, favorited, brief jerky) and (Steve, connected, Kyle), two boxes labeled “Magic, cheating”, and a Newsfeed]
      Now that we know about connections and activities, we can talk about how activities are turned into Newsfeeds and how those wind up being displayed to end users.
  16. Building a Feed [Diagram: the same picture, with the two “Magic, cheating” boxes now labeled Aggregation and Display]
      Getting to the end result (the activity feed page) has two distinct phases: aggregation and display.
  17. Aggregation [Diagram: Shards 1 through 4 holding activities such as (Steve, connected, Kyle), (Steve, favorited, foo), (Kyle, favorited, brief jerky), (Wil, bought, mittens), and (theblackapple, listed, widget), feeding a Newsfeed]
      I am going to talk about aggregation first. Aggregation turns activities (in the database) into a Newsfeed (in memcache). Aggregation typically occurs offline, with Gearman.
  18. Aggregation, Step 1: Choosing Connections. Potentially way too many.
      We already allow people to have more connections than would make sense on a single feed, or could be practically aggregated all at once. The first step in aggregation is to turn the list of people you are connected to into the list of people we’re actually going to go seek out activities for.
  19. Aggregation, Step 1: Choosing Connections, “In Theory” [Chart: affinity (0 to 1) per connection]
        $choose_connection = (mt_rand() / mt_getrandmax()) < $affinity;
      In theory, the way we would do this is rank the connections by affinity and then treat the affinity as the probability that we’ll pick it.
  20. Aggregation, Step 1: Choosing Connections, “In Theory” [Chart: affinity (0 to 1) per connection]
        $choose_connection = (mt_rand() / mt_getrandmax()) < $affinity;
      So then we’d be more likely to pick the close connections, while leaving open the possibility that we will pick the distant ones.
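
The “in theory” selection rule can be written out as a short sketch: each connection is kept with probability equal to its affinity. The function name and the sample affinities are illustrative, and (as the following slides note) this is not what production did at the time.

```python
import random


def choose_connections(affinities, rng=random):
    """Keep each connection with probability equal to its affinity (0 to 1)."""
    return [node for node, affinity in affinities.items()
            if rng.random() < affinity]


rng = random.Random(2011)  # fixed seed so the sketch is repeatable
picked = choose_connections(
    {"E": 0.3, "G": 0.7, "always": 1.0, "never": 0.0}, rng)
```

Because `random()` returns a value in [0, 1), an affinity of 1.0 is always chosen and an affinity of 0.0 never is; everything in between is chosen in proportion to its weight.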
  21. Aggregation, Step 1: Choosing Connections, “In Practice” [Chart: affinity (0 to 1) per connection]
      In practice we don’t really handle affinity yet.
  22. Aggregation, Step 1: Choosing Connections, “Even More In Practice” [Chart: affinity (0 to 1) per connection]
      And most people don’t currently have enough connections for this to matter at all. (The mean number of connections is around a dozen.)
  23. Aggregation, Step 2: Making Activity Sets [Diagram: 13 activities grouped into sets, each with a score field (0.0) and a flags field (0x0)]
      Once the connections are chosen, we then select historical activity for them and convert them into in-memory structures called activity sets. These are just the activities grouped by connection, with a score and flags field for each. The next few phases of aggregation operate on these.
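
An activity set, as described here, might look like the following. This is a guess at the shape, not the actual structure: the `ScoredActivity` class, the `timestamp` field, and `make_activity_sets` are all assumed names.

```python
from dataclasses import dataclass


@dataclass
class ScoredActivity:
    subject: str
    verb: str
    obj: str
    timestamp: float    # seconds since the epoch (assumed field)
    score: float = 0.0  # filled in later by the scoring step
    flags: int = 0x0    # filled in later by the classification step


def make_activity_sets(history_by_connection):
    """Group raw (subject, verb, object, timestamp) rows by connection."""
    return {
        connection: [ScoredActivity(*row) for row in rows]
        for connection, rows in history_by_connection.items()
    }


activity_sets = make_activity_sets({
    "Steve": [("Steve", "connected", "Kyle", 1294990000.0)],
    "Kyle": [("Kyle", "favorited", "brief jerky", 1294993600.0)],
})
```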
  24. Aggregation, Step 3: Classification [Diagram: the same activity sets, scores still 0.0, flags now set to 0x11, 0x3, 0x20c1, 0x10004, 0x20c1, 0x4, 0x1001, 0x2003, 0x11, 0x11, 0x401, 0x5, 0x10004]
      The next thing that happens is that we iterate through all of the activities in all of the sets and classify them.
  25. Aggregation, Step 3: Classification [Diagram: the same activity sets; one activity is annotated “about_owner_shop | user_created_treasury: Rob created a treasury featuring the feed owner’s shop.”]
      The flags are a bit field. They are all from the point of view of the feed owner. So the same activities on another person’s feed would be assigned different flags.
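
A bit-field classifier relative to the feed owner could be sketched like this. Only the two flag names come from the slide; the bit values, the `classify` function, and the shape of the activity tuple are assumptions made for illustration.

```python
from enum import IntFlag


class Flag(IntFlag):
    # Bit values are illustrative; only the two names appear on the slide.
    ABOUT_OWNER_SHOP = 0x1
    USER_CREATED_TREASURY = 0x10


def classify(activity, feed_owner_shop):
    """Return flags for one activity from the feed owner's point of view."""
    subject, verb, obj, featured_shops = activity
    flags = Flag(0)
    if verb == "created_treasury":
        flags |= Flag.USER_CREATED_TREASURY
        if feed_owner_shop in featured_shops:
            flags |= Flag.ABOUT_OWNER_SHOP
    return flags


treasury = ("Rob", "created_treasury", "treasury-1", {"MixedSpecies"})
```

The same activity classified for a different feed owner yields different flags, which is the point made in the notes above.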
  26. Aggregation, Step 4: Scoring [Diagram: the same activity sets with scores filled in, as score/flags: 0.8/0x11, 0.77/0x3, 0.9/0x20c1, 0.1/0x10004, 0.6/0x20c1, 0.47/0x4, 0.8/0x1001, 0.55/0x2003, 0.8/0x11, 0.3/0x11, 0.25/0x401, 0.74/0x5, 0.9/0x10004]
      Next we fill in the score fields. At this point the score is just a simple time decay function (older activities always score lower than new ones).
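
The talk does not say which decay function is used. As one possible example, an exponential decay with a half-life gives scores in (0, 1] that fall strictly with age; the function name and the one-day half-life are assumptions.

```python
import math


def time_decay_score(activity_time, now, half_life_seconds=86400.0):
    """Exponential decay: 1.0 for a brand-new activity, halving every half-life.

    The half-life of one day is an arbitrary choice for this sketch.
    """
    age = max(0.0, now - activity_time)
    return math.pow(0.5, age / half_life_seconds)


now = 1_000_000.0
fresh = time_decay_score(now, now)
day_old = time_decay_score(now - 86400.0, now)
week_old = time_decay_score(now - 7 * 86400.0, now)
```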
  27. Aggregation, Step 5: Pruning [Diagram: the scored activity sets, highlighting the pair [Rob, (Rob, connected, Jared)] and [Jared, (Rob, connected, Jared)]]
      As we noted before, it’s possible to wind up seeing the same event as two or more activities. The next stage of aggregation detects these situations.
  28. Aggregation, Step 5: Pruning [Diagram: the scored activity sets, highlighting the pair [Rob, (Rob, connected, Jared)] and [Jared, (Rob, connected, Jared)]]
      We iterate through the activity sets and remove the duplicates. Right now we can just cross off the second instance of the same SVO triple; once we have comments this will be more complicated.
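
Crossing off the second instance of an event can be sketched as a single pass with a set of already-seen triples. The data layout (a dict of connection to `(owner, svo)` pairs) and the function name are assumptions for this example.

```python
def prune_duplicates(activity_sets):
    """Keep only the first occurrence of each (subject, verb, object) triple."""
    seen = set()
    pruned = {}
    for connection, activities in activity_sets.items():
        kept = []
        for owner, svo in activities:
            if svo in seen:
                continue  # same event already reached us through another owner
            seen.add(svo)
            kept.append((owner, svo))
        pruned[connection] = kept
    return pruned


event = ("Rob", "connected", "Jared")
sets = {
    "Rob": [("Rob", event)],
    "Jared": [("Jared", event)],
}
result = prune_duplicates(sets)
```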
  29. Aggregation, Step 6: Sort & Merge [Diagram: the pruned activity sets flattened into a single list of 12 activities]
      Once everything is scored, classified, and de-duped we can flatten the whole thing and sort it by score.
  30. Aggregation, Step 6: Sort & Merge [Diagram: the flattened list of 12 activities joined with the existing Newsfeed entries, as score/flags: 0.3/0x4001, 0.291/0x4001, 0.6/0xc001, 0.71/0x2c01, 0.716/0x4c01, 0.097/0x4, 0.02/0x1001]
      Then we take the final set of activities and merge it on to the owner’s existing newsfeed. (Or we create a new newsfeed if they don’t have one.)
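
The flatten, sort, and merge step might look like the sketch below. How the real merge orders new entries against existing ones is not specified in the talk; this version assumes a single list ordered from highest score to lowest, and the names are illustrative.

```python
import heapq
from itertools import chain


def sort_and_merge(activity_sets, existing_feed=()):
    """Flatten the sets, sort by score, and merge onto an already-sorted feed.

    Entries are (score, label) pairs. Both the existing feed and the result
    are ordered from highest score to lowest.
    """
    fresh = sorted(chain.from_iterable(activity_sets.values()), reverse=True)
    return list(heapq.merge(fresh, existing_feed, reverse=True))


sets = {"Steve": [(0.8, "a"), (0.1, "b")], "Kyle": [(0.9, "c")]}
old_feed = [(0.6, "x"), (0.02, "y")]
feed = sort_and_merge(sets, old_feed)
new_feed = sort_and_merge(sets)  # owner had no newsfeed yet
```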
  31. Aggregation: Cleaning Up [Diagram: the merged Newsfeed, with the head marked “Just fine” and the tail marked “Too many”]
      We trim off the end of the newsfeed, so that newsfeeds don’t become arbitrarily large.
  32. Aggregation: Cleaning Up [Diagram: the trimmed Newsfeed being written to memcached]
      And then finally we stuff the feed into memcached.
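
Trimming and storing can be sketched as below. A plain dict stands in for memcached here, and the length cap, key format, and JSON encoding are all assumptions; none of them are given in the talk.

```python
import json

MAX_FEED_LENGTH = 3  # illustrative cap; the real limit is not given in the talk


def trim_and_store(cache, owner, feed, limit=MAX_FEED_LENGTH):
    """Drop the tail of the feed and write the result to the cache."""
    trimmed = feed[:limit]
    cache["newsfeed:" + owner] = json.dumps(trimmed)  # hypothetical key format
    return trimmed


cache = {}  # a plain dict standing in for memcached
feed = [[0.9, "c"], [0.8, "a"], [0.6, "x"], [0.1, "b"]]
stored = trim_and_store(cache, "Kyle", feed)
```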
  33. Aggregation
      At peak, we currently run aggregation about 25 times per second.
  34. Aggregation
      We do it for a lot of different reasons:
        - The feed owner has done something, or logged in.
        - On a schedule, with cron.
        - We also aggregate for your connections when you do something (purple). This is a hack and won’t scale.
  35. Display [Diagram: memcached]
      Next I’m going to talk about how we get from the memcached newsfeed to the final product.
  36. Display: done naively, sucks
      If we just showed the activities in the order that they’re in on the newsfeed, it would look like this.
  37. Display: Enter Rollups
      To solve this problem we combine similar stories into rollups.
  38. Display: Computing Rollups
      Here’s an attempt at depicting how rollups are created. The feed is divided up into sections, so that you don’t wind up seeing all of the reds, greens, etc. on the entire feed in just a few very large rollups. Then the similar stories are grouped together within the sections.
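
The two-stage rollup computation (split into sections, then group similar stories inside each section) can be sketched as follows. The fixed section size and the use of a story type as the similarity key are assumptions; the talk does not define either.

```python
def compute_rollups(feed, section_size):
    """Split the feed into fixed-size sections, then group by story type.

    `feed` is a list of (story_type, label) pairs in display order. Grouping
    never crosses a section boundary, so no rollup can swallow the whole feed.
    """
    rollups = []
    for start in range(0, len(feed), section_size):
        section = feed[start:start + section_size]
        groups = {}  # story_type -> labels, in order of first appearance
        for story_type, label in section:
            groups.setdefault(story_type, []).append(label)
        rollups.extend(groups.items())
    return rollups


feed = [("favorite", "f1"), ("connect", "c1"), ("favorite", "f2"),
        ("favorite", "f3"), ("connect", "c2"), ("connect", "c3")]
rollups = compute_rollups(feed, section_size=3)
```

With a section size of 3, the favorites `f1` and `f2` roll up together, but `f3` starts a new rollup because it falls in the next section.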
  39. Display: Filling in Stories [Diagram: an activity (score 0.9, flags 0x10004), StoryHydrator, Story (global details), StoryTeller, Story (feed-owner-specific details), Smarty, html, with memcached alongside]
      Once that’s done, we can go through the rest of the display pipeline for the root story in each rollup. There are multiple layers of caching here. Things that are global (like the shop associated with a favorited listing) are cached separately from things that are unique to the person looking at the feed (like the exact way the story is phrased).
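
The split between global and viewer-specific caching can be illustrated with two caches keyed differently. This is not the StoryHydrator or StoryTeller code; every name below is hypothetical, and dicts stand in for memcached.

```python
def fill_in_story(activity_id, viewer, global_cache, viewer_cache,
                  load_global, phrase_for_viewer):
    """Build a story using one shared cache and one per-viewer cache."""
    if activity_id not in global_cache:
        global_cache[activity_id] = load_global(activity_id)
    details = global_cache[activity_id]

    key = (viewer, activity_id)  # viewer-specific entries include the viewer
    if key not in viewer_cache:
        viewer_cache[key] = phrase_for_viewer(viewer, details)
    return viewer_cache[key]


loads = []


def load_global(activity_id):
    loads.append(activity_id)  # record how often the expensive load runs
    return {"actor": "Kyle", "shop": "MixedSpecies"}


def phrase_for_viewer(viewer, details):
    who = "You" if viewer == details["actor"] else details["actor"]
    return who + " favorited an item from " + details["shop"]


global_cache, viewer_cache = {}, {}
first = fill_in_story(7, "Kyle", global_cache, viewer_cache,
                      load_global, phrase_for_viewer)
second = fill_in_story(7, "Steve", global_cache, viewer_cache,
                       load_global, phrase_for_viewer)
```

The global details are loaded once and shared by both viewers, while each viewer gets their own phrasing.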
  40. Making it Fast [Chart: response time in ms (axis 0 to 1200) for the Homepage, Shop, Listing, Search, and Activity pages; caption “Boom”]
      Finally I’m going to go through a few ways that we’ve sped up activity, to the point where it’s one of the faster pages on the site (despite being pretty complicated).
  41. Hack #1: Cache Warming [Diagram: the Building a Feed picture again: activities, two “Magic, cheating” boxes, Newsfeed]
      The first thing we do to speed things up is run almost the entire pipeline proactively using Gearman. So after aggregation we trigger a display run, even though nobody is there to look at the html. The end result is that almost every pageview is against a hot cache.
  42. Hack #2: TTL Caching [Screenshot annotation: “May be his avatar from 5 minutes ago. Big f’ing deal.”]
      The second thing we do is add bits of TTL caching where few people will notice. Straightforward, but not done in many places on the site. Note that his avatar here is tied to the story. If he generates new activity he’ll see his new avatar.
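
TTL caching in its simplest form looks like the sketch below: a value is served from the cache until its time-to-live runs out, so it may be a few minutes stale. The `TTLCache` class is a toy written for this example (the site itself uses memcached), and the 300-second TTL just echoes the “5 minutes” on the slide.

```python
import time


class TTLCache:
    """Tiny time-to-live cache; entries expire `ttl` seconds after being set."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # possibly stale, but still within the TTL
        value = compute()
        self._entries[key] = (now + self.ttl, value)
        return value


fake_now = [0.0]  # mutable fake clock so the example is deterministic
avatars = TTLCache(ttl=300, clock=lambda: fake_now[0])
a = avatars.get("kyle", lambda: "avatar_v1.png")
fake_now[0] = 120.0
b = avatars.get("kyle", lambda: "avatar_v2.png")  # still cached
fake_now[0] = 301.0
c = avatars.get("kyle", lambda: "avatar_v2.png")  # expired, recomputed
```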
  43. Hack #3: Judicious Associations
        getFinder("UserProfile")->find(...)
      not
        getFinder("User")->find(...)->Profile
      We also profiled the pages and meticulously simplified ORM usage. Again, this sounds obvious, but it’s really easy to lose track of what you’re doing as you hand the user off to the template. Lots of ORM calls were originally being made by the template.
  44. Hack #4: Lazy Below the Fold
      We don’t load much at the outset. You get more as you scroll down (finite scrolling).
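
Loading below the fold on demand can be sketched as offset-based paging over the rollups, where the scroll handler asks for the next slice until none remain. The function name, page size, and cursor convention are assumptions for this example.

```python
def feed_page(rollups, offset, page_size=10):
    """Return one slice of rollups plus the offset for the next request.

    The second value is None once the feed is exhausted, which is what makes
    the scrolling finite.
    """
    page = rollups[offset:offset + page_size]
    next_offset = offset + len(page)
    has_more = next_offset < len(rollups)
    return page, (next_offset if has_more else None)


stories = list(range(25))
first, cursor = feed_page(stories, 0)
second, cursor2 = feed_page(stories, cursor)
third, cursor3 = feed_page(stories, cursor2)
```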
  45. The End
