SlideShare a Scribd company logo
1 of 45
Download to read offline
Activity Feeds Architecture
                                   January, 2011




Friday, January 14, 2011
Activity Feeds Architecture




                                       To Be Covered:
                 ā€¢Data model
                 ā€¢Where feeds come from
                 ā€¢How feeds are displayed
                 ā€¢Optimizations




Friday, January 14, 2011
Activity Feeds Architecture


                                  Fundamental Entities
                            Connections                        Activities




Friday, January 14, 2011

There are two fundamental building blocks for feeds: connections and activities.
Activities form a log of what some entity on the site has done, or had done to it.
Connections express relationships between entities.
I will explain the data model for connections ļ¬rst.
Activity Feeds Architecture


                                           Connections


                                               Favorites
                                                                         Circles




                                                           Connections


                                        Etc.



                                                              Orders




Friday, January 14, 2011

Connections are a superset of Circles, Favorites, Orders, and other relationships between
entities on the site.
Activity Feeds Architecture

                                             Connections
                                     A                 B




                                                               C




                                         F




                                                   E           D




                                               G                   I




                                         H

                                                           J




Friday, January 14, 2011

Connections are implemented as a directed graph.
Currently, the nodes can be people or shops. (In principle they can be other objects.)
Activity Feeds Architecture


                                                      Connections
                                              A                 B




                                                                        C




                                                  F
                   connection_edges_forward

                   connection_edges_reverse                 E           D




                                                        G                   I




                                                  H

                                                                    J




Friday, January 14, 2011

The edges of the graph are stored in two tables.
For any node, connection_edges_forward lists outgoing edges and connection_edges_reverse
lists the incoming edges.
In other words, we store each edge twice.
Activity Feeds Architecture


                                               Connections
                                                             aBA
                                                                                                          aCB


                                     A                                     B

                                                                                                    aBC

                                         aFA               aEA                                                   C
                                                                           aEB
                                                                                              aDB
                                                                                                                           aCD

                                                       aEF
                                         F


                                                 aFE
                                                                       E                                         D


                                                                                        aEI
                                                                                                                     aDI




                                                                                  aIG
                                               aHE



                                                                 G                       aJG                         I

                                                     aHG                    aGJ
                                                                                                           aIJ




                                         H                           aHJ

                                                                                                J




Friday, January 14, 2011

We also assign each edge a weight, known as affinity.
Activity Feeds Architecture


                                           Connections
         On Hā€™s shard

               connection_edges_forward


                               from                  to         afļ¬nity
                                  H                  E            0.3
                                  H                  G            0.7
                connection_edges_reverse


                               from                  to         afļ¬nity
                                   J                 H            0.75




Friday, January 14, 2011

Here we see the data for Andaā€™s connections on her shard.
She has two entries in the forward connections table for the people in her circle.
She has one entry in the reverse connections so that she can see everyone following her.
Activity Feeds Architecture




                                            Activities




Friday, January 14, 2011

Activities are the other database entity important to activity feeds.
Activity Feeds Architecture




                           activity := (subject, verb, object)




Friday, January 14, 2011

As you can see in Robā€™s magnetic poetry diagram, activities are a description of an event on
Etsy boiled down to a subject (ā€œwho did itā€), a verb (ā€œwhat they didā€), and an object (ā€œwhat
they did it toā€).
Activity Feeds Architecture




                           activity := (subject, verb, object)


                               (Steve, connected, Kyle)

                             (Kyle, favorited, brief jerky)




Friday, January 14, 2011

Here are some examples of activities.
The ļ¬rst one describes Steve adding Kyle to his circle.
The second one describes Kyle favoriting an item.
In each of these cases note that there are probably several parties interested in these events
[examples]. The problem (the main one weā€™re trying to solve with activity feeds) is how to
notify all of them about it. In order to achieve that goal, as usual we copy the data all over the
place.
Activity Feeds Architecture




                           activity := (subject, verb, object)



        activity := [owner,(subject, verb, object)]




Friday, January 14, 2011

So what we do is duplicate the S,V,O combinations with different owners.
Steve will have his record that he connected to Kyle, and Kyle will be given his own record
that Steve connected to him.
Activity Feeds Architecture


        activity := [owner,(subject, verb, object)]



                                                      [Steve, (Steve, connected, Kyle)]   Steve's shard




                           (Steve, connected, Kyle)




                                                      [Kyle, (Steve, connected, Kyle)]    Kyle's shard




Friday, January 14, 2011

This is what that looks like.
Activity Feeds Architecture


        activity := [owner,(subject, verb, object)]


                                                                [Kyle, (Kyle, favorited, brief jerky)]       Kyle's shard




                            (Kyle, favorited, brief jerky)




                                                         [MixedSpecies, (Kyle, favorited, brief jerky)]
                                                                                                          Mixedspecies' shard
                                                         [brief jerky, (Kyle, favorited, brief jerky)]




Friday, January 14, 2011

In more complicated examples there could be more than two owners.
You could envision people being interested in Kyle, people being interested in MixedSpecies,
or people being interested in brief jerky.
In cases where there are this many writes, we will generally perform them with Gearman.
Again, in order for interested parties to ļ¬nd the activities, we copy the activities all over the
place.
Activity Feeds Architecture




                                                   Building a Feed

                                (Kyle, favorited, brief jerky)
                                                                                       ā€«Ö¼×˜_ּטā€¬
                                 (Steve, connected, Kyle)



                                                                                         Magic,
                                                                                        cheating




                                                                  Magic,
                                                                            Newsfeed
                                                                 cheating




Friday, January 14, 2011

Now that we know about connections and activities, we can talk about how activities are
turned into Newsfeeds and how those wind up being displayed to end users.
Activity Feeds Architecture




                                                    Building a Feed

                                (Kyle, favorited, brief jerky)
                                                                                       ā€«Ö¼×˜_ּטā€¬
                                 (Steve, connected, Kyle)
                                                                                               Display

                                                                                         Magic,
                                                                                        cheating




                                                                  Magic,
                                                                            Newsfeed
                                                                 cheating



                                                     Aggregation




Friday, January 14, 2011

Getting to the end result (the activity feed page) has two distinct phases: aggregation and
display.
Activity Feeds Architecture



                                                          Aggregation

                                          (Steve, connected, Kyle)
                               Shard 1      (Steve, favorited, foo)




                                         (theblackapple, listed, widget)
                                             (Wil, bought, mittens)
                               Shard 2

                                                                           Newsfeed



                                         (Wil, bought, mittens)
                               Shard 3




                                          (Steve, connected, Kyle)
                               Shard 4   (Kyle, favorited, brief jerky)




Friday, January 14, 2011

I am going to talk about aggregation ļ¬rst.
Aggregation turns activities (in the database) into a Newsfeed (in memcache).
Aggregation typically occurs offline, with Gearman.
Activity Feeds Architecture


                           Aggregation, Step 1: Choosing Connections




                                                         Potentially way too many




Friday, January 14, 2011

We already allow people to have more connections than would make sense on a single feed,
or could be practically aggregated all at once.
The ļ¬rst step in aggregation is to turn the list of people you are connected to into the list of
people weā€™re actually going to go seek out activities for.
Activity Feeds Architecture


                           Aggregation, Step 1: Choosing Connections

                                                    ā€œIn Theoryā€


                      1


                 0.75
   Affinity
                   0.5


                 0.25


                      0
                                                    Connection



                                      $choose_connection = mt_rand() < $affinity;


Friday, January 14, 2011

In theory, the way we would do this is rank the connections by affinity and then treat the
affinity as the probability that weā€™ll pick it.
Activity Feeds Architecture


                           Aggregation, Step 1: Choosing Connections

                                                    ā€œIn Theoryā€


                      1


                 0.75
   Affinity
                   0.5


                 0.25


                      0
                                                    Connection



                                      $choose_connection = mt_rand() < $affinity;


Friday, January 14, 2011

So then weā€™d be more likely to pick the close connections, but leaving the possibility that we
will pick the distant ones.
Activity Feeds Architecture


                           Aggregation, Step 1: Choosing Connections

                                               ā€œIn Practiceā€


                      1


                 0.75
   Affinity
                   0.5


                 0.25


                      0
                                                Connection




Friday, January 14, 2011

In practice we donā€™t really handle affinity yet.
Activity Feeds Architecture


                           Aggregation, Step 1: Choosing Connections

                                            ā€œEven More In Practiceā€


                      1


                 0.75
   Affinity
                   0.5


                 0.25


                      0

                                                  Connection




Friday, January 14, 2011

And most people donā€™t currently have enough connections for this to matter at all. (Mean
connections is around a dozen.)
Activity Feeds Architecture


                           Aggregation, Step 2: Making Activity Sets

                                          score      activity   activity   activity   activity

                                                       0.0        0.0        0.0        0.0
                                          ļ¬‚ags
                                                      0x0        0x0        0x0        0x0




                                                     activity   activity

                                                       0.0        0.0
                                                      0x0        0x0




                                                     activity   activity   activity

                                                       0.0        0.0        0.0
                                                      0x0        0x0        0x0




                                                     activity   activity   activity   activity

                                                       0.0        0.0        0.0        0.0
                                                       0x0        0x0        0x0        0x0




Friday, January 14, 2011

Once the connections are chosen, we then select historical activity for them and convert them
into in-memory structures called activity sets.
These are just the activities grouped by connection, with a score and ļ¬‚ags ļ¬eld for each.
The next few phases of aggregation operate on these.
Activity Feeds Architecture


                              Aggregation, Step 3: Classification

                                            activity   activity   activity   activity

                                              0.0        0.0        0.0        0.0
                                             0x11       0x3       0x20c1     0x10004




                                            activity   activity

                                              0.0        0.0
                                            0x20c1      0x4




                                            activity   activity   activity

                                              0.0        0.0        0.0
                                            0x1001     0x2003      0x11




                                            activity   activity   activity   activity

                                              0.0        0.0        0.0        0.0
                                             0x11      0x401        0x5      0x10004




Friday, January 14, 2011

The next thing that happens is that we iterate through all of the activities in all of the sets and
classify them.
Activity Feeds Architecture


                              Aggregation, Step 3: Classification

                               activity   activity   activity   activity

                                 0.0        0.0        0.0        0.0
                                0x11       0x3       0x20c1     0x10004    about_owner_shop | user_created_treasury

                                                                           Rob created a treasury featuring
                                                                           the feed owner's shop.

                               activity   activity

                                 0.0        0.0
                               0x20c1      0x4




                               activity   activity   activity

                                 0.0        0.0        0.0
                               0x1001     0x2003      0x11




                               activity   activity   activity   activity

                                 0.0        0.0        0.0        0.0
                                0x11      0x401        0x5      0x10004




Friday, January 14, 2011

The ļ¬‚ags are a bit ļ¬eld.
They are all from the point of view of the feed owner.
So the same activities on another personā€™s feed would be assigned different ļ¬‚ags.
Activity Feeds Architecture


                                  Aggregation, Step 4: Scoring

                                            activity   activity   activity   activity

                                              0.8       0.77        0.9        0.1
                                             0x11       0x3       0x20c1     0x10004




                                            activity   activity

                                              0.6       0.47
                                            0x20c1      0x4




                                            activity   activity   activity

                                              0.8       0.55        0.8
                                            0x1001     0x2003      0x11




                                            activity   activity   activity   activity

                                              0.3       0.25       0.74        0.9
                                             0x11      0x401        0x5      0x10004




Friday, January 14, 2011

Next we ļ¬ll in the score ļ¬elds.
At this point the score is just a simple time decay function (older activities always score lower
than new ones).
Activity Feeds Architecture


                                  Aggregation, Step 5: Pruning
                                                                               [Rob, (Rob, connected, Jared)]



                                   activity   activity   activity   activity

                                     0.8       0.77        0.9        0.1
                                    0x11       0x3       0x20c1     0x10004




                                   activity   activity

                                     0.6       0.47
                                   0x20c1      0x4




                                   activity   activity   activity              [Jared, (Rob, connected, Jared)]

                                     0.8       0.55        0.8
                                   0x1001     0x2003      0x11




                                   activity   activity   activity   activity

                                     0.3       0.25       0.74        0.9
                                    0x11      0x401        0x5      0x10004




Friday, January 14, 2011

As we noted before itā€™s possible to wind up seeing the same event as two or more activities.
The next stage of aggregation detects these situations.
Activity Feeds Architecture


                                  Aggregation, Step 5: Pruning
                                                                               [Rob, (Rob, connected, Jared)]



                                   activity   activity   activity   activity

                                     0.8       0.77        0.9        0.1
                                    0x11       0x3       0x20c1     0x10004




                                   activity   activity

                                     0.6       0.47
                                   0x20c1      0x4




                                   activity   activity   activity              [Jared, (Rob, connected, Jared)]

                                     0.8       0.55        0.8
                                   0x1001     0x2003      0x11




                                   activity   activity   activity   activity

                                     0.3       0.25       0.74        0.9
                                    0x11      0x401        0x5      0x10004




Friday, January 14, 2011

We iterate through the activity sets and remove the duplicates.
Right now we can just cross off the second instance of the SVO pair; once we have comments
this will be more complicated.
Activity Feeds Architecture


                                                  Aggregation, Step 6: Sort & Merge

                           activity   activity   activity   activity

                             0.8       0.77        0.9        0.1
                            0x11       0x3       0x20c1     0x10004




                           activity   activity

                             0.6       0.47
                           0x20c1      0x4
                                                                       activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity

                                                                         0.9        0.9        0.8        0.8       0.77       0.74        0.6       0.55       0.47        0.3       0.25        0.1
                                                                       0x20c1     0x10004    0x1001      0x11       0x3        0x5       0x20c1     0x2003      0x4        0x11      0x401      0x10004
                           activity   activity

                             0.8       0.55
                           0x1001     0x2003




                           activity   activity   activity   activity

                             0.3       0.25       0.74        0.9
                            0x11      0x401        0x5      0x10004




Friday, January 14, 2011

Once everything is scored, classiļ¬ed, and de-duped we can ļ¬‚atten the whole thing and sort it
by score.
Activity Feeds Architecture


                                                       Aggregation, Step 6: Sort & Merge

                                           activity    activity    activity   activity   activity   activity   activity   activity   activity   activity   activity   activity

                                             0.9         0.9         0.8        0.8       0.77       0.74        0.6       0.55       0.47        0.3       0.25         0.1
                                           0x20c1      0x10004     0x1001      0x11       0x3        0x5       0x20c1     0x2003      0x4        0x11      0x401      0x10004




         activity   activity    activity    activity    activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity

           0.9        0.9         0.8         0.8        0.77       0.74      0.716       0.71        0.6        0.6       0.55       0.47        0.3        0.3      0.291       0.25        0.1      0.097       0.02
         0x20c1     0x10004     0x1001       0x11        0x3        0x5       0x4c01     0x2c01     0x20c1     0xc001     0x2003      0x4        0x11      0x4001     0x4001     0x401      0x10004     0x4       0x1001



        Newsfeed




Friday, January 14, 2011

Then we take the ļ¬nal set of activities and merge it on to the ownerā€™s existing newsfeed.
(Or we create a new newsfeed if they donā€™t have one.)
Activity Feeds Architecture


                                                                  Aggregation: Cleaning Up



                                                                                                                                                   Just ļ¬ne            Too many


          activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity

            0.9        0.9        0.8        0.8       0.77       0.74      0.716       0.71        0.6        0.6       0.55       0.47        0.3        0.3      0.291       0.25        0.1      0.097       0.02
          0x20c1     0x10004    0x1001      0x11       0x3        0x5       0x4c01     0x2c01     0x20c1     0xc001     0x2003      0x4        0x11      0x4001     0x4001     0x401      0x10004     0x4       0x1001



         Newsfeed




Friday, January 14, 2011

We trim off the end of the newsfeed, so that they donā€™t become arbitrarily large.
Activity Feeds Architecture


                                                            Aggregation: Cleaning Up



                           activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity   activity

                             0.9        0.9        0.8        0.8       0.77       0.74      0.716       0.71        0.6        0.6       0.55       0.47        0.3        0.3
                           0x20c1     0x10004    0x1001      0x11       0x3        0x5       0x4c01     0x2c01     0x20c1     0xc001     0x2003      0x4        0x11      0x4001



                           Newsfeed




                                                                                             memcached




Friday, January 14, 2011

And then ļ¬nally we stuff the feed into memcached.
Activity Feeds Architecture


                                            Aggregation




Friday, January 14, 2011

Currently we peak at doing this about 25 times per second.
Activity Feeds Architecture


                                            Aggregation




Friday, January 14, 2011

We do it for a lot of different reasons:
- The feed owner has done something, or logged in.
- On a schedule, with cron.
- We also aggregate for your connections when you do something (purple). This is a hack and
wonā€™t scale.
Activity Feeds Architecture



                                              Display




                 memcached




Friday, January 14, 2011

Next Iā€™m going to talk about how we get from the memcached newsfeed to the ļ¬nal product.
Activity Feeds Architecture



                            Display: done naively, sucks




Friday, January 14, 2011

If we just showed the activities in the order that theyā€™re in on the newsfeed, it would look like
this.
Activity Feeds Architecture



                                 Display: Enter Rollups




Friday, January 14, 2011

To solve this problem we combine similar stories into rollups.
Activity Feeds Architecture



                             Display: Computing Rollups




Friday, January 14, 2011

Hereā€™s an attempt at depicting how rollups are created.
The feed is divided up into sections, so that you donā€™t wind up seeing all of the reds, greens,
etc. on the entire feed in just a few very large rollups.
Then the similar stories are grouped together within the sections.
Activity Feeds Architecture



                                           Display: Filling in Stories


                           activity                                                            Story
                                                           Story
                             0.9
                                                                                                                      html
                                                                                           (Feed-owner-
                                      StoryHydrator   (Global details)      StoryTeller                      Smarty
                           0x10004                                                        speciļ¬c details)




                                                                         memcached




Friday, January 14, 2011

Once thatā€™s done, we can go through the rest of the display pipeline for the root story in each
rollup.
There are multiple layers of caching here. Things that are global (like the shop associated
with a favorited listing) are cached separately from things that are unique to the person
looking at the feed (like the exact way the story is phrased).
Activity Feeds Architecture



                                         Making it Fast
                                                Response Time (ms)

             1200


                900

                                                                              Boom
                600


                300


                    0
                           Homepage      Shop          Listing       Search     Activity



Friday, January 14, 2011

Finally Iā€™m going to go through a few ways that weā€™ve sped up activity, to the point where itā€™s
one of the faster pages on the site (despite being pretty complicated).
Activity Feeds Architecture



                               Hack #1: Cache Warming
                                                                                       !_!

                                (Kyle, favorited, brief jerky)
                                 (Steve, connected, Kyle)


                                                                                        Magic,
                                                                                       cheating




                                                                  Magic,
                                                                            Newsfeed
                                                                 cheating




Friday, January 14, 2011

The ļ¬rst thing we do to speed things up is run almost the entire pipeline proactively using
gearman.
So after aggregation we trigger a display run, even though nobody is there to look at the
html.
The end result is that almost every pageview is against a hot cache.
Activity Feeds Architecture



                                 Hack #2: TTL Caching

     May be his avatar
      from 5 minutes
           ago.

         Big fā€™ing deal.




Friday, January 14, 2011

The second thing we do is add bits of TTL caching where few people will notice.
Straightforward but not done in many places on the site.
Note that his avatar here is tied to the story. If he generates new activity heā€™ll see his new
avatar.
Activity Feeds Architecture



                           Hack #3: Judicious Associations

      getFinder(ā€œUserProļ¬leā€)->ļ¬nd
                   (...)

                       not
       getFinder(ā€œUserā€)->ļ¬nd(...)-
                 >Proļ¬le




Friday, January 14, 2011

We also proļ¬led the pages and meticulously simpliļ¬ed ORM usage.
Again this sounds obvious but itā€™s really easy to lose track of what youā€™re doing as you hand
the user off to the template. Lots of ORM calls were originally actually being made by the
template.
Activity Feeds Architecture



                           Hack #4: Lazy Below the Fold




                            We donā€™t load much at the outset.
                            You get more as you scroll down
                            (ļ¬nite scrolling).


Friday, January 14, 2011
The End




Friday, January 14, 2011

More Related Content

What's hot

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBMike Dirolf
Ā 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsJonas BonƩr
Ā 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark ArchitectureAlexey Grishchenko
Ā 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservicespflueras
Ā 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBasealexbaranau
Ā 
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)InnoDB MVCC Architecture (by ź¶Œź±“ģš°)
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)I Goo Lee.
Ā 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSASrikrupa Srivatsan
Ā 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling TwitterBlaine
Ā 
Redis cluster
Redis clusterRedis cluster
Redis clusteriammutex
Ā 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryTsz-Wo (Nicholas) Sze
Ā 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
Ā 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
Ā 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Jay Patel
Ā 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
Ā 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowPyData
Ā 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
Ā 
Clean Architecture Applications in Python
Clean Architecture Applications in PythonClean Architecture Applications in Python
Clean Architecture Applications in PythonSubhash Bhushan
Ā 
Nodejs presentation
Nodejs presentationNodejs presentation
Nodejs presentationArvind Devaraj
Ā 
Nginx Internals
Nginx InternalsNginx Internals
Nginx InternalsJoshua Zhu
Ā 

What's hot (20)

Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Ā 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
Ā 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
Ā 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
Ā 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
Ā 
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)InnoDB MVCC Architecture (by ź¶Œź±“ģš°)
InnoDB MVCC Architecture (by ź¶Œź±“ģš°)
Ā 
DNS Security Presentation ISSA
DNS Security Presentation ISSADNS Security Presentation ISSA
DNS Security Presentation ISSA
Ā 
Scaling Twitter
Scaling TwitterScaling Twitter
Scaling Twitter
Ā 
Redis cluster
Redis clusterRedis cluster
Redis cluster
Ā 
Apache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft LibraryApache Ratis - In Search of a Usable Raft Library
Apache Ratis - In Search of a Usable Raft Library
Ā 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
Ā 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
Ā 
Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012Cassandra at eBay - Cassandra Summit 2012
Cassandra at eBay - Cassandra Summit 2012
Ā 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
Ā 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
Ā 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
Ā 
Clean Architecture Applications in Python
Clean Architecture Applications in PythonClean Architecture Applications in Python
Clean Architecture Applications in Python
Ā 
Apache ZooKeeper
Apache ZooKeeperApache ZooKeeper
Apache ZooKeeper
Ā 
Nodejs presentation
Nodejs presentationNodejs presentation
Nodejs presentation
Ā 
Nginx Internals
Nginx InternalsNginx Internals
Nginx Internals
Ā 

Viewers also liked

Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...
Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...
Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...Claudio Vaccaro
Ā 
Copy Strategy Brand Amiacque
Copy Strategy Brand AmiacqueCopy Strategy Brand Amiacque
Copy Strategy Brand AmiacqueNopeidea Ā®
Ā 
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...Zalo_app
Ā 
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Me
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing MeSĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Me
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Mezingopen
Ā 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed SystemChau Thanh
Ā 
Design a scalable social network: Problems and solutions
Design a scalable social network: Problems and solutionsDesign a scalable social network: Problems and solutions
Design a scalable social network: Problems and solutionsChau Thanh
Ā 
Choose Boring Technology
Choose Boring TechnologyChoose Boring Technology
Choose Boring TechnologyDan McKinley
Ā 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
Ā 
Design for Continuous Experimentation
Design for Continuous ExperimentationDesign for Continuous Experimentation
Design for Continuous ExperimentationDan McKinley
Ā 
The Brand Strategy Canvas: a One-Page Strategy for Startups
The Brand Strategy Canvas: a One-Page Strategy for StartupsThe Brand Strategy Canvas: a One-Page Strategy for Startups
The Brand Strategy Canvas: a One-Page Strategy for Startupspatrickjwoods
Ā 

Viewers also liked (11)

14 La Copy Strategy
14 La Copy Strategy14 La Copy Strategy
14 La Copy Strategy
Ā 
Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...
Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...
Marketing 2.0 - L'utente come canale di vendita grazie all'approccio conversa...
Ā 
Copy Strategy Brand Amiacque
Copy Strategy Brand AmiacqueCopy Strategy Brand Amiacque
Copy Strategy Brand Amiacque
Ā 
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Inside Zalo: Developing a mobile messenger for the audience of millions - VN ...
Ā 
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Me
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing MeSĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Me
SĘ” lĘ°į»£c kiįŗæn trĆŗc hį»‡ thį»‘ng Zing Me
Ā 
Building ZingMe News Feed System
Building ZingMe News Feed SystemBuilding ZingMe News Feed System
Building ZingMe News Feed System
Ā 
Design a scalable social network: Problems and solutions
Design a scalable social network: Problems and solutionsDesign a scalable social network: Problems and solutions
Design a scalable social network: Problems and solutions
Ā 
Choose Boring Technology
Choose Boring TechnologyChoose Boring Technology
Choose Boring Technology
Ā 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
Ā 
Design for Continuous Experimentation
Design for Continuous ExperimentationDesign for Continuous Experimentation
Design for Continuous Experimentation
Ā 
The Brand Strategy Canvas: a One-Page Strategy for Startups
The Brand Strategy Canvas: a One-Page Strategy for StartupsThe Brand Strategy Canvas: a One-Page Strategy for Startups
The Brand Strategy Canvas: a One-Page Strategy for Startups
Ā 

Recently uploaded

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
Ā 
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelDeepika Singh
Ā 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
Ā 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
Ā 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
Ā 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
Ā 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
Ā 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
Ā 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
Ā 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
Ā 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
Ā 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
Ā 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vƔzquez
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
Ā 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
Ā 

Recently uploaded (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Ā 
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls šŸ„° 8617370543 Service Offer VIP Hot Model
Ā 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Ā 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
Ā 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Ā 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Ā 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
Ā 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
Ā 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Ā 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Ā 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Ā 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Ā 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
Ā 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Ā 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
Ā 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Ā 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Ā 

Etsy Activity Feeds Architecture

  • 1. Activity Feeds Architecture January, 2011 Friday, January 14, 2011
  • 2. Activity Feeds Architecture To Be Covered: ā€¢Data model ā€¢Where feeds come from ā€¢How feeds are displayed ā€¢Optimizations Friday, January 14, 2011
  • 3. Activity Feeds Architecture Fundamental Entities Connections Activities Friday, January 14, 2011 There are two fundamental building blocks for feeds: connections and activities. Activities form a log of what some entity on the site has done, or had done to it. Connections express relationships between entities. I will explain the data model for connections ļ¬rst.
  • 4. Activity Feeds Architecture Connections Favorites Circles Connections Etc. Orders Friday, January 14, 2011 Connections are a superset of Circles, Favorites, Orders, and other relationships between entities on the site.
  • 5. Activity Feeds Architecture Connections A B C F E D G I H J Friday, January 14, 2011 Connections are implemented as a directed graph. Currently, the nodes can be people or shops. (In principle they can be other objects.)
  • 6. Activity Feeds Architecture Connections A B C F connection_edges_forward connection_edges_reverse E D G I H J Friday, January 14, 2011 The edges of the graph are stored in two tables. For any node, connection_edges_forward lists outgoing edges and connection_edges_reverse lists the incoming edges. In other words, we store each edge twice.
  • 7. Activity Feeds Architecture Connections aBA aCB A B aBC aFA aEA C aEB aDB aCD aEF F aFE E D aEI aDI aIG aHE G aJG I aHG aGJ aIJ H aHJ J Friday, January 14, 2011 We also assign each edge a weight, known as affinity.
  • 8. Activity Feeds Architecture Connections On Hā€™s shard connection_edges_forward from to afļ¬nity H E 0.3 H G 0.7 connection_edges_reverse from to afļ¬nity J H 0.75 Friday, January 14, 2011 Here we see the data for Andaā€™s connections on her shard. She has two entries in the forward connections table for the people in her circle. She has one entry in the reverse connections so that she can see everyone following her.
  • 9. Activity Feeds Architecture Activities Friday, January 14, 2011 Activities are the other database entity important to activity feeds.
  • 10. Activity Feeds Architecture activity := (subject, verb, object) Friday, January 14, 2011 As you can see in Robā€™s magnetic poetry diagram, activities are a description of an event on Etsy boiled down to a subject (ā€œwho did itā€), a verb (ā€œwhat they didā€), and an object (ā€œwhat they did it toā€).
  • 11. Activity Feeds Architecture activity := (subject, verb, object) (Steve, connected, Kyle) (Kyle, favorited, brief jerky) Friday, January 14, 2011 Here are some examples of activities. The ļ¬rst one describes Steve adding Kyle to his circle. The second one describes Kyle favoriting an item. In each of these cases note that there are probably several parties interested in these events [examples]. The problem (the main one weā€™re trying to solve with activity feeds) is how to notify all of them about it. In order to achieve that goal, as usual we copy the data all over the place.
  • 12. Activity Feeds Architecture activity := (subject, verb, object) activity := [owner,(subject, verb, object)] Friday, January 14, 2011 So what we do is duplicate the S,V,O combinations with different owners. Steve will have his record that he connected to Kyle, and Kyle will be given his own record that Steve connected to him.
  • 13. Activity Feeds Architecture activity := [owner,(subject, verb, object)] [Steve, (Steve, connected, Kyle)] Steve's shard (Steve, connected, Kyle) [Kyle, (Steve, connected, Kyle)] Kyle's shard Friday, January 14, 2011 This is what that looks like.
  • 14. Activity Feeds Architecture activity := [owner,(subject, verb, object)] [Kyle, (Kyle, favorited, brief jerky)] Kyle's shard (Kyle, favorited, brief jerky) [MixedSpecies, (Kyle, favorited, brief jerky)] Mixedspecies' shard [brief jerky, (Kyle, favorited, brief jerky)] Friday, January 14, 2011 In more complicated examples there could be more than two owners. You could envision people being interested in Kyle, people being interested in MixedSpecies, or people being interested in brief jerky. In cases where there are this many writes, we will generally perform them with Gearman. Again, in order for interested parties to ļ¬nd the activities, we copy the activities all over the place.
  • 15. Activity Feeds Architecture Building a Feed (Kyle, favorited, brief jerky) ā€«Ö¼×˜_ּטā€¬ (Steve, connected, Kyle) Magic, cheating Magic, Newsfeed cheating Friday, January 14, 2011 Now that we know about connections and activities, we can talk about how activities are turned into Newsfeeds and how those wind up being displayed to end users.
  • 16. Activity Feeds Architecture Building a Feed (Kyle, favorited, brief jerky) ā€«Ö¼×˜_ּטā€¬ (Steve, connected, Kyle) Display Magic, cheating Magic, Newsfeed cheating Aggregation Friday, January 14, 2011 Getting to the end result (the activity feed page) has two distinct phases: aggregation and display.
  • 17. Activity Feeds Architecture Aggregation (Steve, connected, Kyle) Shard 1 (Steve, favorited, foo) (theblackapple, listed, widget) (Wil, bought, mittens) Shard 2 Newsfeed (Wil, bought, mittens) Shard 3 (Steve, connected, Kyle) Shard 4 (Kyle, favorited, brief jerky) Friday, January 14, 2011 I am going to talk about aggregation ļ¬rst. Aggregation turns activities (in the database) into a Newsfeed (in memcache). Aggregation typically occurs offline, with Gearman.
  • 18. Activity Feeds Architecture Aggregation, Step 1: Choosing Connections Potentially way too many Friday, January 14, 2011 We already allow people to have more connections than would make sense on a single feed, or could be practically aggregated all at once. The ļ¬rst step in aggregation is to turn the list of people you are connected to into the list of people weā€™re actually going to go seek out activities for.
  • 19. Activity Feeds Architecture Aggregation, Step 1: Choosing Connections ā€œIn Theoryā€ 1 0.75 Affinity 0.5 0.25 0 Connection $choose_connection = mt_rand() < $affinity; Friday, January 14, 2011 In theory, the way we would do this is rank the connections by affinity and then treat the affinity as the probability that weā€™ll pick it.
  • 20. Activity Feeds Architecture Aggregation, Step 1: Choosing Connections ā€œIn Theoryā€ 1 0.75 Affinity 0.5 0.25 0 Connection $choose_connection = mt_rand() < $affinity; Friday, January 14, 2011 So then weā€™d be more likely to pick the close connections, but leaving the possibility that we will pick the distant ones.
  • 21. Activity Feeds Architecture Aggregation, Step 1: Choosing Connections ā€œIn Practiceā€ 1 0.75 Affinity 0.5 0.25 0 Connection Friday, January 14, 2011 In practice we donā€™t really handle affinity yet.
  • 22. Activity Feeds Architecture Aggregation, Step 1: Choosing Connections ā€œEven More In Practiceā€ 1 0.75 Affinity 0.5 0.25 0 Connection Friday, January 14, 2011 And most people donā€™t currently have enough connections for this to matter at all. (Mean connections is around a dozen.)
  • 23. Activity Feeds Architecture Aggregation, Step 2: Making Activity Sets score activity activity activity activity 0.0 0.0 0.0 0.0 ļ¬‚ags 0x0 0x0 0x0 0x0 activity activity 0.0 0.0 0x0 0x0 activity activity activity 0.0 0.0 0.0 0x0 0x0 0x0 activity activity activity activity 0.0 0.0 0.0 0.0 0x0 0x0 0x0 0x0 Friday, January 14, 2011 Once the connections are chosen, we then select historical activity for them and convert them into in-memory structures called activity sets. These are just the activities grouped by connection, with a score and ļ¬‚ags ļ¬eld for each. The next few phases of aggregation operate on these.
  • 24. Activity Feeds Architecture Aggregation, Step 3: Classification activity activity activity activity 0.0 0.0 0.0 0.0 0x11 0x3 0x20c1 0x10004 activity activity 0.0 0.0 0x20c1 0x4 activity activity activity 0.0 0.0 0.0 0x1001 0x2003 0x11 activity activity activity activity 0.0 0.0 0.0 0.0 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 The next thing that happens is that we iterate through all of the activities in all of the sets and classify them.
  • 25. Activity Feeds Architecture Aggregation, Step 3: Classification activity activity activity activity 0.0 0.0 0.0 0.0 0x11 0x3 0x20c1 0x10004 about_owner_shop | user_created_treasury Rob created a treasury featuring the feed owner's shop. activity activity 0.0 0.0 0x20c1 0x4 activity activity activity 0.0 0.0 0.0 0x1001 0x2003 0x11 activity activity activity activity 0.0 0.0 0.0 0.0 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 The ļ¬‚ags are a bit ļ¬eld. They are all from the point of view of the feed owner. So the same activities on another personā€™s feed would be assigned different ļ¬‚ags.
  • 26. Activity Feeds Architecture Aggregation, Step 4: Scoring activity activity activity activity 0.8 0.77 0.9 0.1 0x11 0x3 0x20c1 0x10004 activity activity 0.6 0.47 0x20c1 0x4 activity activity activity 0.8 0.55 0.8 0x1001 0x2003 0x11 activity activity activity activity 0.3 0.25 0.74 0.9 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 Next we ļ¬ll in the score ļ¬elds. At this point the score is just a simple time decay function (older activities always score lower than new ones).
  • 27. Activity Feeds Architecture Aggregation, Step 5: Pruning [Rob, (Rob, connected, Jared)] activity activity activity activity 0.8 0.77 0.9 0.1 0x11 0x3 0x20c1 0x10004 activity activity 0.6 0.47 0x20c1 0x4 activity activity activity [Jared, (Rob, connected, Jared)] 0.8 0.55 0.8 0x1001 0x2003 0x11 activity activity activity activity 0.3 0.25 0.74 0.9 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 As we noted before itā€™s possible to wind up seeing the same event as two or more activities. The next stage of aggregation detects these situations.
  • 28. Activity Feeds Architecture Aggregation, Step 5: Pruning [Rob, (Rob, connected, Jared)] activity activity activity activity 0.8 0.77 0.9 0.1 0x11 0x3 0x20c1 0x10004 activity activity 0.6 0.47 0x20c1 0x4 activity activity activity [Jared, (Rob, connected, Jared)] 0.8 0.55 0.8 0x1001 0x2003 0x11 activity activity activity activity 0.3 0.25 0.74 0.9 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 We iterate through the activity sets and remove the duplicates. Right now we can just cross off the second instance of the SVO pair; once we have comments this will be more complicated.
  • 29. Activity Feeds Architecture Aggregation, Step 6: Sort & Merge activity activity activity activity 0.8 0.77 0.9 0.1 0x11 0x3 0x20c1 0x10004 activity activity 0.6 0.47 0x20c1 0x4 activity activity activity activity activity activity activity activity activity activity activity activity 0.9 0.9 0.8 0.8 0.77 0.74 0.6 0.55 0.47 0.3 0.25 0.1 0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x20c1 0x2003 0x4 0x11 0x401 0x10004 activity activity 0.8 0.55 0x1001 0x2003 activity activity activity activity 0.3 0.25 0.74 0.9 0x11 0x401 0x5 0x10004 Friday, January 14, 2011 Once everything is scored, classiļ¬ed, and de-duped we can ļ¬‚atten the whole thing and sort it by score.
  • 30. Activity Feeds Architecture Aggregation, Step 6: Sort & Merge activity activity activity activity activity activity activity activity activity activity activity activity 0.9 0.9 0.8 0.8 0.77 0.74 0.6 0.55 0.47 0.3 0.25 0.1 0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x20c1 0x2003 0x4 0x11 0x401 0x10004 activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity 0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3 0.291 0.25 0.1 0.097 0.02 0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001 0x4001 0x401 0x10004 0x4 0x1001 Newsfeed Friday, January 14, 2011 Then we take the ļ¬nal set of activities and merge it on to the ownerā€™s existing newsfeed. (Or we create a new newsfeed if they donā€™t have one.)
  • 31. Activity Feeds Architecture Aggregation: Cleaning Up Just ļ¬ne Too many activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity 0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3 0.291 0.25 0.1 0.097 0.02 0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001 0x4001 0x401 0x10004 0x4 0x1001 Newsfeed Friday, January 14, 2011 We trim off the end of the newsfeed, so that they donā€™t become arbitrarily large.
  • 32. Activity Feeds Architecture Aggregation: Cleaning Up activity activity activity activity activity activity activity activity activity activity activity activity activity activity 0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3 0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001 Newsfeed memcached Friday, January 14, 2011 And then ļ¬nally we stuff the feed into memcached.
  • 33. Activity Feeds Architecture Aggregation Friday, January 14, 2011 Currently we peak at doing this about 25 times per second.
  • 34. Activity Feeds Architecture Aggregation Friday, January 14, 2011 We do it for a lot of different reasons: - The feed owner has done something, or logged in. - On a schedule, with cron. - We also aggregate for your connections when you do something (purple). This is a hack and wonā€™t scale.
  • 35. Activity Feeds Architecture Display memcached Friday, January 14, 2011 Next Iā€™m going to talk about how we get from the memcached newsfeed to the ļ¬nal product.
  • 36. Activity Feeds Architecture Display: done naively, sucks Friday, January 14, 2011 If we just showed the activities in the order that theyā€™re in on the newsfeed, it would look like this.
  • 37. Activity Feeds Architecture Display: Enter Rollups Friday, January 14, 2011 To solve this problem we combine similar stories into rollups.
  • 38. Activity Feeds Architecture Display: Computing Rollups Friday, January 14, 2011 Hereā€™s an attempt at depicting how rollups are created. The feed is divided up into sections, so that you donā€™t wind up seeing all of the reds, greens, etc. on the entire feed in just a few very large rollups. Then the similar stories are grouped together within the sections.
  • 39. Activity Feeds Architecture Display: Filling in Stories activity Story Story 0.9 html (Feed-owner- StoryHydrator (Global details) StoryTeller Smarty 0x10004 speciļ¬c details) memcached Friday, January 14, 2011 Once thatā€™s done, we can go through the rest of the display pipeline for the root story in each rollup. There are multiple layers of caching here. Things that are global (like the shop associated with a favorited listing) are cached separately from things that are unique to the person looking at the feed (like the exact way the story is phrased).
  • 40. Activity Feeds Architecture Making it Fast Response Time (ms) 1200 900 Boom 600 300 0 Homepage Shop Listing Search Activity Friday, January 14, 2011 Finally Iā€™m going to go through a few ways that weā€™ve sped up activity, to the point where itā€™s one of the faster pages on the site (despite being pretty complicated).
  • 41. Activity Feeds Architecture Hack #1: Cache Warming !_! (Kyle, favorited, brief jerky) (Steve, connected, Kyle) Magic, cheating Magic, Newsfeed cheating Friday, January 14, 2011 The ļ¬rst thing we do to speed things up is run almost the entire pipeline proactively using gearman. So after aggregation we trigger a display run, even though nobody is there to look at the html. The end result is that almost every pageview is against a hot cache.
  • 42. Activity Feeds Architecture Hack #2: TTL Caching May be his avatar from 5 minutes ago. Big fā€™ing deal. Friday, January 14, 2011 The second thing we do is add bits of TTL caching where few people will notice. Straightforward but not done in many places on the site. Note that his avatar here is tied to the story. If he generates new activity heā€™ll see his new avatar.
  • 43. Activity Feeds Architecture Hack #3: Judicious Associations getFinder(ā€œUserProļ¬leā€)->ļ¬nd (...) not getFinder(ā€œUserā€)->ļ¬nd(...)- >Proļ¬le Friday, January 14, 2011 We also proļ¬led the pages and meticulously simpliļ¬ed ORM usage. Again this sounds obvious but itā€™s really easy to lose track of what youā€™re doing as you hand the user off to the template. Lots of ORM calls were originally actually being made by the template.
  • 44. Activity Feeds Architecture Hack #4: Lazy Below the Fold We donā€™t load much at the outset. You get more as you scroll down (ļ¬nite scrolling). Friday, January 14, 2011