The document provides an overview of the activity feeds architecture. It discusses the fundamental entities of connections and activities. Connections express relationships between entities and are implemented as a directed graph. Activities form a log of actions by entities. To populate feeds, activities are copied and distributed to relevant entities and then aggregated. The aggregation process involves selecting connections, classifying activities, scoring them, pruning duplicates, and sorting the results into a merged newsfeed.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
This presentation shortly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* not describes usage of NoSQL solutions for caching
* is not intended for products comparison or for promotion of Hazelcast as the best solution
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from perspective of MySQL/PHP users. Given for 2nd year students of professional bachelor in ICT at Kaho St. Lieven, Gent.
This presentation shortly describes key features of Apache Cassandra. It was held at the Apache Cassandra Meetup in Vienna in January 2014. You can access the meetup here: http://www.meetup.com/Vienna-Cassandra-Users/
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
Apache Kafka is a new breed of messaging system built for the "big data" world. Coming out of LinkedIn (and donated to Apache), it is a distributed pub/sub system built in Scala. It has been an Apache TLP now for several months with the first Apache release imminent. Built for speed, scalability, and robustness, Kafka should definitely be one of the data tools you consider when designing distributed data-oriented applications.
The talk will cover a general overview of the project and technology, with some use cases, and a demo.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
This presentation:
* covers basics of caching and popular cache types
* explains evolution from simple cache to distributed, and from distributed to IMDG
* not describes usage of NoSQL solutions for caching
* is not intended for products comparison or for promotion of Hazelcast as the best solution
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
DNS is critical network infrastructure and securing it against attacks like DDoS, NXDOMAIN, hijacking and Malware/APT is very important to protecting any business.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
When does InnoDB lock a row? Multiple rows? Why would it lock a gap? How do transactions affect these scenarios? Locking is one of the more opaque features of MySQL, but it’s very important for both developers and DBA’s to understand if they want their applications to work with high performance and concurrency. This is a creative presentation to illustrate the scenarios for locking in InnoDB and make these scenarios easier to visualize. I'll cover: key locks, table locks, gap locks, shared locks, exclusive locks, intention locks, insert locks, auto-inc locks, and also conditions for deadlocks.
Slides for presentation on ZooKeeper I gave at Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
This is the presentation I made on JavaDay Kiev 2015 regarding the architecture of Apache Spark. It covers the memory model, the shuffle implementations, data frames and some other high-level staff and can be used as an introduction to Apache Spark
Rainbird: Realtime Analytics at Twitter (Strata 2011)Kevin Weil
Introducing Rainbird, Twitter's high volume distributed counting service for realtime analytics, built on Cassandra. This presentation looks at the motivation, design, and uses of Rainbird across Twitter.
DNS is critical network infrastructure and securing it against attacks like DDoS, NXDOMAIN, hijacking and Malware/APT is very important to protecting any business.
Communication between Microservices is inherently unreliable. These integration points may produce cascading failures, slow responses, service outages. We will walk through stability patterns like timeouts, circuit breaker, bulkheads and discuss how they improve stability of Microservices.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Anoop Sam John and Ramkrishna Vasudevan (Intel)
HBase provides an LRU based on heap cache but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.
Amazon DynamoDB Under the Hood: How We Built a Hyper-Scale Database (DAT321) ...Amazon Web Services
Come to this session to learn how Amazon DynamoDB was built as the hyper-scale database for internet-scale applications. In January 2012, Amazon launched DynamoDB, a cloud-based NoSQL database service designed from the ground up to support extreme scale, with the security, availability, performance, and manageability needed to run mission-critical workloads. This session discloses for the first time the underpinnings of DynamoDB, and how we run a fully managed nonrelational database used by more than 100,000 customers. We cover the underlying technical aspects of how an application works with DynamoDB for authentication, metadata, storage nodes, streams, backup, and global replication.
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
Most developers are familiar with the topic of “database design”. In the relational world, normalization is the name of the game. How do things change when you’re working with a scalable, distributed, non-SQL database like HBase? This talk will cover the basics of HBase schema design at a high level and give several common patterns and examples of real-world schemas to solve interesting problems. The storage and data access architecture of HBase (row keys, column families, etc.) will be explained, along with the pros and cons of different schema decisions.
When does InnoDB lock a row? Multiple rows? Why would it lock a gap? How do transactions affect these scenarios? Locking is one of the more opaque features of MySQL, but it’s very important for both developers and DBA’s to understand if they want their applications to work with high performance and concurrency. This is a creative presentation to illustrate the scenarios for locking in InnoDB and make these scenarios easier to visualize. I'll cover: key locks, table locks, gap locks, shared locks, exclusive locks, intention locks, insert locks, auto-inc locks, and also conditions for deadlocks.
Slides for presentation on ZooKeeper I gave at Near Infinity (www.nearinfinity.com) 2012 spring conference.
The associated sample code is on GitHub at https://github.com/sleberknight/zookeeper-samples
Copy Strategy studiata per la campagna Adv per il
"New Brand" Amiacque.
La campagna advertising studiata per il “nuovo brand Amiacque” si basa sui diktat fissati in fase di sviluppo del “marketing plan”. Gli “obbiettivi di marketing” in questo senso sono molto precisi:
- Sviluppare nei confronti del Brand Amiacque una maggiore “Brand Loyalty”.
- Penetrazione del Brand nei vari “Target Obiettivi”.
Dagli “obbiettivi di marketing” derivano gli obbiettivi derivano gli obbiettivi di comunicazione:
- Avvicinare i residenti della Provincia di Milano a questo Ente Pubblico.
- Generare e infondere fiducia nel pubblico/consumatore.
- Amiacque qualcosa di più di un’impre-sa pubblica, i cittadini devono sentirsela propria.
Andare oltre quello che
l’immaginario collettivo proietta quando si parla di impresa pubblica. I cittadini devono sentirsi partecipi dell’impresa, sentirsela
propria.
Tale atteggiamento sviluppa fiducia e partecipazione da parte del pubblico/consumatore alla vita dell’impresa e alle sue iniziative (eventi, adv, comunicazione istituzionale, customer service).
When you're starting or running a company, how do you choose technology? The prevailing advice du jour is something along the lines of "use the best tool for the job." This is obviously right, but it is also devoid of meaning in an unfortunate way that lets people define "best" and "job" as myopically as they like.
The Brand Strategy Canvas: a One-Page Strategy for Startupspatrickjwoods
(First, grab a copy of the Brand Strategy Canvas at brandstrategycanvas.com)
Branding is hard for startups. News flash: It’s hard for everyone. So we built a tool to make it easier.
A clear-cut path for startups toward a rock-solid foundation for their brand. It’s the Brand Strategy Canvas—a single-page, step-by-step formula that serves as a starting point for a stronger brand.
Who are we? Advertising-slash-startup vets at one of the largest independent agencies around. We’ve spun years of experience into the Canvas, turning brand strategy into a process, not a philosophy.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
2. Activity Feeds Architecture
To Be Covered:
•Data model
•Where feeds come from
•How feeds are displayed
•Optimizations
Friday, January 14, 2011
3. Activity Feeds Architecture
Fundamental Entities
Connections Activities
Friday, January 14, 2011
There are two fundamental building blocks for feeds: connections and activities.
Activities form a log of what some entity on the site has done, or had done to it.
Connections express relationships between entities.
I will explain the data model for connections first.
4. Activity Feeds Architecture
Connections
Favorites
Circles
Connections
Etc.
Orders
Friday, January 14, 2011
Connections are a superset of Circles, Favorites, Orders, and other relationships between
entities on the site.
5. Activity Feeds Architecture
Connections
A B
C
F
E D
G I
H
J
Friday, January 14, 2011
Connections are implemented as a directed graph.
Currently, the nodes can be people or shops. (In principle they can be other objects.)
6. Activity Feeds Architecture
Connections
A B
C
F
connection_edges_forward
connection_edges_reverse E D
G I
H
J
Friday, January 14, 2011
The edges of the graph are stored in two tables.
For any node, connection_edges_forward lists outgoing edges and connection_edges_reverse
lists the incoming edges.
In other words, we store each edge twice.
7. Activity Feeds Architecture
Connections
aBA
aCB
A B
aBC
aFA aEA C
aEB
aDB
aCD
aEF
F
aFE
E D
aEI
aDI
aIG
aHE
G aJG I
aHG aGJ
aIJ
H aHJ
J
Friday, January 14, 2011
We also assign each edge a weight, known as affinity.
8. Activity Feeds Architecture
Connections
On H’s shard
connection_edges_forward
from to affinity
H E 0.3
H G 0.7
connection_edges_reverse
from to affinity
J H 0.75
Friday, January 14, 2011
Here we see the data for Anda’s connections on her shard.
She has two entries in the forward connections table for the people in her circle.
She has one entry in the reverse connections so that she can see everyone following her.
9. Activity Feeds Architecture
Activities
Friday, January 14, 2011
Activities are the other database entity important to activity feeds.
10. Activity Feeds Architecture
activity := (subject, verb, object)
Friday, January 14, 2011
As you can see in Rob’s magnetic poetry diagram, activities are a description of an event on
Etsy boiled down to a subject (“who did it”), a verb (“what they did”), and an object (“what
they did it to”).
11. Activity Feeds Architecture
activity := (subject, verb, object)
(Steve, connected, Kyle)
(Kyle, favorited, brief jerky)
Friday, January 14, 2011
Here are some examples of activities.
The first one describes Steve adding Kyle to his circle.
The second one describes Kyle favoriting an item.
In each of these cases note that there are probably several parties interested in these events
[examples]. The problem (the main one we’re trying to solve with activity feeds) is how to
notify all of them about it. In order to achieve that goal, as usual we copy the data all over the
place.
12. Activity Feeds Architecture
activity := (subject, verb, object)
activity := [owner,(subject, verb, object)]
Friday, January 14, 2011
So what we do is duplicate the S,V,O combinations with different owners.
Steve will have his record that he connected to Kyle, and Kyle will be given his own record
that Steve connected to him.
13. Activity Feeds Architecture
activity := [owner,(subject, verb, object)]
[Steve, (Steve, connected, Kyle)] Steve's shard
(Steve, connected, Kyle)
[Kyle, (Steve, connected, Kyle)] Kyle's shard
Friday, January 14, 2011
This is what that looks like.
14. Activity Feeds Architecture
activity := [owner,(subject, verb, object)]
[Kyle, (Kyle, favorited, brief jerky)] Kyle's shard
(Kyle, favorited, brief jerky)
[MixedSpecies, (Kyle, favorited, brief jerky)]
Mixedspecies' shard
[brief jerky, (Kyle, favorited, brief jerky)]
Friday, January 14, 2011
In more complicated examples there could be more than two owners.
You could envision people being interested in Kyle, people being interested in MixedSpecies,
or people being interested in brief jerky.
In cases where there are this many writes, we will generally perform them with Gearman.
Again, in order for interested parties to find the activities, we copy the activities all over the
place.
15. Activity Feeds Architecture
Building a Feed
(Kyle, favorited, brief jerky)
ּט_ּט
(Steve, connected, Kyle)
Magic,
cheating
Magic,
Newsfeed
cheating
Friday, January 14, 2011
Now that we know about connections and activities, we can talk about how activities are
turned into Newsfeeds and how those wind up being displayed to end users.
16. Activity Feeds Architecture
Building a Feed
(Kyle, favorited, brief jerky)
ּט_ּט
(Steve, connected, Kyle)
Display
Magic,
cheating
Magic,
Newsfeed
cheating
Aggregation
Friday, January 14, 2011
Getting to the end result (the activity feed page) has two distinct phases: aggregation and
display.
17. Activity Feeds Architecture
Aggregation
(Steve, connected, Kyle)
Shard 1 (Steve, favorited, foo)
(theblackapple, listed, widget)
(Wil, bought, mittens)
Shard 2
Newsfeed
(Wil, bought, mittens)
Shard 3
(Steve, connected, Kyle)
Shard 4 (Kyle, favorited, brief jerky)
Friday, January 14, 2011
I am going to talk about aggregation first.
Aggregation turns activities (in the database) into a Newsfeed (in memcache).
Aggregation typically occurs offline, with Gearman.
18. Activity Feeds Architecture
Aggregation, Step 1: Choosing Connections
Potentially way too many
Friday, January 14, 2011
We already allow people to have more connections than would make sense on a single feed,
or could be practically aggregated all at once.
The first step in aggregation is to turn the list of people you are connected to into the list of
people we’re actually going to go seek out activities for.
19. Activity Feeds Architecture
Aggregation, Step 1: Choosing Connections
“In Theory”
1
0.75
Affinity
0.5
0.25
0
Connection
$choose_connection = mt_rand() < $affinity;
Friday, January 14, 2011
In theory, the way we would do this is rank the connections by affinity and then treat the
affinity as the probability that we’ll pick it.
20. Activity Feeds Architecture
Aggregation, Step 1: Choosing Connections
“In Theory”
1
0.75
Affinity
0.5
0.25
0
Connection
$choose_connection = mt_rand() < $affinity;
Friday, January 14, 2011
So then we’d be more likely to pick the close connections, but leaving the possibility that we
will pick the distant ones.
21. Activity Feeds Architecture
Aggregation, Step 1: Choosing Connections
“In Practice”
1
0.75
Affinity
0.5
0.25
0
Connection
Friday, January 14, 2011
In practice we don’t really handle affinity yet.
22. Activity Feeds Architecture
Aggregation, Step 1: Choosing Connections
“Even More In Practice”
1
0.75
Affinity
0.5
0.25
0
Connection
Friday, January 14, 2011
And most people don’t currently have enough connections for this to matter at all. (Mean
connections is around a dozen.)
23. Activity Feeds Architecture
Aggregation, Step 2: Making Activity Sets
score activity activity activity activity
0.0 0.0 0.0 0.0
flags
0x0 0x0 0x0 0x0
activity activity
0.0 0.0
0x0 0x0
activity activity activity
0.0 0.0 0.0
0x0 0x0 0x0
activity activity activity activity
0.0 0.0 0.0 0.0
0x0 0x0 0x0 0x0
Friday, January 14, 2011
Once the connections are chosen, we then select historical activity for them and convert them
into in-memory structures called activity sets.
These are just the activities grouped by connection, with a score and flags field for each.
The next few phases of aggregation operate on these.
24. Activity Feeds Architecture
Aggregation, Step 3: Classification
activity activity activity activity
0.0 0.0 0.0 0.0
0x11 0x3 0x20c1 0x10004
activity activity
0.0 0.0
0x20c1 0x4
activity activity activity
0.0 0.0 0.0
0x1001 0x2003 0x11
activity activity activity activity
0.0 0.0 0.0 0.0
0x11 0x401 0x5 0x10004
Friday, January 14, 2011
The next thing that happens is that we iterate through all of the activities in all of the sets and
classify them.
25. Activity Feeds Architecture
Aggregation, Step 3: Classification
activity activity activity activity
0.0 0.0 0.0 0.0
0x11 0x3 0x20c1 0x10004 about_owner_shop | user_created_treasury
Rob created a treasury featuring
the feed owner's shop.
activity activity
0.0 0.0
0x20c1 0x4
activity activity activity
0.0 0.0 0.0
0x1001 0x2003 0x11
activity activity activity activity
0.0 0.0 0.0 0.0
0x11 0x401 0x5 0x10004
Friday, January 14, 2011
The flags are a bit field.
They are all from the point of view of the feed owner.
So the same activities on another person’s feed would be assigned different flags.
26. Activity Feeds Architecture
Aggregation, Step 4: Scoring
activity activity activity activity
0.8 0.77 0.9 0.1
0x11 0x3 0x20c1 0x10004
activity activity
0.6 0.47
0x20c1 0x4
activity activity activity
0.8 0.55 0.8
0x1001 0x2003 0x11
activity activity activity activity
0.3 0.25 0.74 0.9
0x11 0x401 0x5 0x10004
Friday, January 14, 2011
Next we fill in the score fields.
At this point the score is just a simple time decay function (older activities always score lower
than new ones).
27. Activity Feeds Architecture
Aggregation, Step 5: Pruning
[Rob, (Rob, connected, Jared)]
activity activity activity activity
0.8 0.77 0.9 0.1
0x11 0x3 0x20c1 0x10004
activity activity
0.6 0.47
0x20c1 0x4
activity activity activity [Jared, (Rob, connected, Jared)]
0.8 0.55 0.8
0x1001 0x2003 0x11
activity activity activity activity
0.3 0.25 0.74 0.9
0x11 0x401 0x5 0x10004
Friday, January 14, 2011
As we noted before it’s possible to wind up seeing the same event as two or more activities.
The next stage of aggregation detects these situations.
28. Activity Feeds Architecture
Aggregation, Step 5: Pruning
[Rob, (Rob, connected, Jared)]
activity activity activity activity
0.8 0.77 0.9 0.1
0x11 0x3 0x20c1 0x10004
activity activity
0.6 0.47
0x20c1 0x4
activity activity activity [Jared, (Rob, connected, Jared)]
0.8 0.55 0.8
0x1001 0x2003 0x11
activity activity activity activity
0.3 0.25 0.74 0.9
0x11 0x401 0x5 0x10004
Friday, January 14, 2011
We iterate through the activity sets and remove the duplicates.
Right now we can just cross off the second instance of the SVO pair; once we have comments
this will be more complicated.
30. Activity Feeds Architecture
Aggregation, Step 6: Sort & Merge
activity activity activity activity activity activity activity activity activity activity activity activity
0.9 0.9 0.8 0.8 0.77 0.74 0.6 0.55 0.47 0.3 0.25 0.1
0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x20c1 0x2003 0x4 0x11 0x401 0x10004
activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity
0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3 0.291 0.25 0.1 0.097 0.02
0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001 0x4001 0x401 0x10004 0x4 0x1001
Newsfeed
Friday, January 14, 2011
Then we take the final set of activities and merge it on to the owner’s existing newsfeed.
(Or we create a new newsfeed if they don’t have one.)
31. Activity Feeds Architecture
Aggregation: Cleaning Up
Just fine Too many
activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity activity
0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3 0.291 0.25 0.1 0.097 0.02
0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001 0x4001 0x401 0x10004 0x4 0x1001
Newsfeed
Friday, January 14, 2011
We trim off the end of the newsfeed, so that they don’t become arbitrarily large.
32. Activity Feeds Architecture
Aggregation: Cleaning Up
activity activity activity activity activity activity activity activity activity activity activity activity activity activity
0.9 0.9 0.8 0.8 0.77 0.74 0.716 0.71 0.6 0.6 0.55 0.47 0.3 0.3
0x20c1 0x10004 0x1001 0x11 0x3 0x5 0x4c01 0x2c01 0x20c1 0xc001 0x2003 0x4 0x11 0x4001
Newsfeed
memcached
Friday, January 14, 2011
And then finally we stuff the feed into memcached.
33. Activity Feeds Architecture
Aggregation
Friday, January 14, 2011
Currently we peak at doing this about 25 times per second.
34. Activity Feeds Architecture
Aggregation
Friday, January 14, 2011
We do it for a lot of different reasons:
- The feed owner has done something, or logged in.
- On a schedule, with cron.
- We also aggregate for your connections when you do something (purple). This is a hack and
won’t scale.
35. Activity Feeds Architecture
Display
memcached
Friday, January 14, 2011
Next I’m going to talk about how we get from the memcached newsfeed to the final product.
36. Activity Feeds Architecture
Display: done naively, sucks
Friday, January 14, 2011
If we just showed the activities in the order that they’re in on the newsfeed, it would look like
this.
37. Activity Feeds Architecture
Display: Enter Rollups
Friday, January 14, 2011
To solve this problem we combine similar stories into rollups.
38. Activity Feeds Architecture
Display: Computing Rollups
Friday, January 14, 2011
Here’s an attempt at depicting how rollups are created.
The feed is divided up into sections, so that you don’t wind up seeing all of the reds, greens,
etc. on the entire feed in just a few very large rollups.
Then the similar stories are grouped together within the sections.
39. Activity Feeds Architecture
Display: Filling in Stories
activity Story
Story
0.9
html
(Feed-owner-
StoryHydrator (Global details) StoryTeller Smarty
0x10004 specific details)
memcached
Friday, January 14, 2011
Once that’s done, we can go through the rest of the display pipeline for the root story in each
rollup.
There are multiple layers of caching here. Things that are global (like the shop associated
with a favorited listing) are cached separately from things that are unique to the person
looking at the feed (like the exact way the story is phrased).
40. Activity Feeds Architecture
Making it Fast
Response Time (ms)
1200
900
Boom
600
300
0
Homepage Shop Listing Search Activity
Friday, January 14, 2011
Finally I’m going to go through a few ways that we’ve sped up activity, to the point where it’s
one of the faster pages on the site (despite being pretty complicated).
41. Activity Feeds Architecture
Hack #1: Cache Warming
!_!
(Kyle, favorited, brief jerky)
(Steve, connected, Kyle)
Magic,
cheating
Magic,
Newsfeed
cheating
Friday, January 14, 2011
The first thing we do to speed things up is run almost the entire pipeline proactively using
gearman.
So after aggregation we trigger a display run, even though nobody is there to look at the
html.
The end result is that almost every pageview is against a hot cache.
42. Activity Feeds Architecture
Hack #2: TTL Caching
May be his avatar
from 5 minutes
ago.
Big f’ing deal.
Friday, January 14, 2011
The second thing we do is add bits of TTL caching where few people will notice.
Straightforward but not done in many places on the site.
Note that his avatar here is tied to the story. If he generates new activity he’ll see his new
avatar.
43. Activity Feeds Architecture
Hack #3: Judicious Associations
getFinder(“UserProfile”)->find
(...)
not
getFinder(“User”)->find(...)-
>Profile
Friday, January 14, 2011
We also profiled the pages and meticulously simplified ORM usage.
Again this sounds obvious but it’s really easy to lose track of what you’re doing as you hand
the user off to the template. Lots of ORM calls were originally actually being made by the
template.
44. Activity Feeds Architecture
Hack #4: Lazy Below the Fold
We don’t load much at the outset.
You get more as you scroll down
(finite scrolling).
Friday, January 14, 2011