Many user-facing applications present some kind of news feed or inbox system. You can think of Facebook, Twitter, or Gmail as different types of inboxes where the user can see data of interest, sorted by time, popularity, or another parameter. A scalable inbox is a difficult problem to solve: for millions of users, varied data from many sources must be sorted and presented within milliseconds. Different strategies can be used: scatter-gather, fan-out writes, and so on. This session presents an actual application developed by 10gen in Java, using MongoDB. The application is open source and is intended to serve as a reference implementation of several strategies for tackling this common challenge. The presentation also introduces many MongoDB concepts.
Building a Scalable Inbox System with MongoDB and Java
1. Building a scalable inbox system with MongoDB and Java
Antoine Girbal (@antoinegirbal)
Technical Account Manager Lead, MongoDB Inc
JavaOne 2013
2. Agenda
• Problem Overview
• Schema and queries
• Java Development
• Design Options
– Fan out on Read
– Fan out on Write
– Bucketed Fan out on Write
– Cached Inbox
• Discussion
10. The User Collection
The collection statistics:
> db.users.stats()
{
"ns": "edges.users",
"count": 1000000, // number of documents
"size": 637864480, // size of all documents
"avgObjSize": 637.86448,
"storageSize": 845197312,
"numExtents": 16,
"nindexes": 2,
"lastExtentSize": 227786752,
"paddingFactor": 1.0000000000260925, // padding after documents
"systemFlags": 1,
"userFlags": 0,
"totalIndexSize": 66070256,
"indexSizes": { "_id_": 29212848, "uid_1": 36857408 },
"ok": 1
}
11. Queries on Users
Finding a user by email address…
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).pretty()
{ "_id": ObjectId("519c12d53004030e5a6316d2"),
…
By default, the query uses a slow full collection scan…
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" } ).explain()
{ "cursor": "BasicCursor",
"nscannedObjects": 1000000, // 1m objects scanned
"nscanned": 1000000,
…
Use an index for fast performance…
> db.users.ensureIndex({ "email": 1 } ) // does not do anything if index is there
> db.users.find({ "email": "AnthonyJDacosta@pookmail.com" }).explain()
{ "cursor": "BtreeCursor email_1", // Btree, sweet!
"nscannedObjects": 1, // document is found almost right away
"nscanned": 1,
…
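To see why the index changes the cost so sharply, the difference can be mimicked in plain Java. This is only an in-memory analogy, not MongoDB or driver code: the class and method names are hypothetical, a HashMap stands in for the B-tree index, and the `scanned` counter loosely mirrors the `nscanned` figure in the explain() output above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogy of a collection scan versus an index lookup.
class EmailLookupSketch {
    private final List<String> emails = new ArrayList<>();                 // the "collection"
    private final Map<String, List<Integer>> emailIndex = new HashMap<>(); // the "index"
    int scanned; // entries examined by the most recent lookup

    void insert(String email) {
        emailIndex.computeIfAbsent(email, k -> new ArrayList<>()).add(emails.size());
        emails.add(email);
    }

    // Without an index every document must be examined, because any of them could match.
    List<Integer> findByScan(String email) {
        scanned = 0;
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < emails.size(); i++) {
            scanned++;
            if (emails.get(i).equals(email)) positions.add(i);
        }
        return positions;
    }

    // With an index the lookup goes straight to the matching positions.
    List<Integer> findByIndex(String email) {
        scanned = 1;
        return emailIndex.getOrDefault(email, Collections.emptyList());
    }

    public static void main(String[] args) {
        EmailLookupSketch users = new EmailLookupSketch();
        for (int i = 0; i < 1000; i++) users.insert("user" + i + "@example.com");
        users.findByScan("user999@example.com");
        System.out.println("scan examined " + users.scanned + " entries");
        users.findByIndex("user999@example.com");
        System.out.println("index examined " + users.scanned + " entry");
    }
}
```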
12. Users Relationships
• Here the follower / followee relationships are "many-to-many". They can be stored as:
1. a list of followers in user
2. a list of followees in user
3. a relationship collection: "followees"
4. two relationship collections: "followees" and "followers".
• Ideal solutions:
– a few million users and a 1000 followee limit: Solution #2
– no boundaries and relative scaling: Solution #3
– no boundaries and max scaling: Solution #4
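As a rough illustration of Solution #2, the sketch below embeds the followee list in the user object and enforces the 1000-followee limit mentioned above. It is plain in-memory Java rather than driver code; the class name, the field names, and the choice to reject new followees once the limit is reached are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical model of a user document that embeds its followees (Solution #2).
class UserWithFollowees {
    static final int MAX_FOLLOWEES = 1000; // the limit quoted on the slide

    final String uid;
    private final Set<String> followees = new LinkedHashSet<>();

    UserWithFollowees(String uid) {
        this.uid = uid;
    }

    // Returns true if the followee was added, false if it was already
    // present or the limit has been reached.
    boolean follow(String followeeId) {
        if (followees.size() >= MAX_FOLLOWEES) return false;
        return followees.add(followeeId);
    }

    // Returns true if the followee was present and has been removed.
    boolean unfollow(String followeeId) {
        return followees.remove(followeeId);
    }

    // "Who does this user follow?" is answered from this one object alone.
    Set<String> followees() {
        return Collections.unmodifiableSet(followees);
    }

    public static void main(String[] args) {
        UserWithFollowees user = new UserWithFollowees("17052001");
        user.follow("31554261");
        System.out.println(user.uid + " follows " + user.followees());
    }
}
```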
13. Relationship Data
Let's look at a sample document:
> use edges
switched to db edges
> db.followees.findOne()
{ "_id": ObjectId(),
"user": "17052001",
"followee": "31554261"
}
And the statistics:
> db.followees.stats()
{
"ns": "edges.followees",
"count": 1000000,
"size": 64000048,
"avgObjSize": 64.000048,
"storageSize": 86310912,
"numExtents": 10,
"nindexes": 2,
"lastExtentSize": 27869184,
"paddingFactor": 1,
"systemFlags": 1,
"userFlags": 0,
"totalIndexSize": 85561840,
"indexSizes": {
"_id_": 32458720,
"user_1_followee_1": 53103120 },
"ok": 1
}
14. Relationship Queries
To find all the users that a user follows:
> db.followees.ensureIndex({ user: 1, followee: 1 }) // why not just index on user? We shall see
> db.followees.find({user: "11622712"})
{ "_id" : ObjectId("51641c02e4b0ef6827a34569"), "user" : "11622712", "followee" : "30432718" }
…
> db.followees.find({user: "11622712"}).explain()
{
"cursor" : "BtreeCursor user_1_followee_1",
"n" : 66,
"indexOnly" : false,
"millis" : 0, // this is fast
Even faster if using a “covered” index:
> db.followees.find({user: "11622712"}, {followee: 1, _id: 0}).explain()
{
"cursor" : "BtreeCursor user_1_followee_1",
"n" : 66,
"nscannedObjects" : 0,
"nscanned" : 66,
"indexOnly" : true, // this means covered
To find all the followers of a user, we just need the opposite index:
> db.followees.ensureIndex({followee: 1, user: 1})
> db.followees.find({followee: "30313973"}, {user: 1, _id: 0})
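Why the compound index can "cover" the query can be pictured with a sorted set of (user, followee) pairs: all entries for one user sit next to each other, and each entry already contains the followee, so no document needs to be fetched. The following is an in-memory analogy only; the class name and the use of a TreeSet in place of MongoDB's B-tree are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical analogy of the { user: 1, followee: 1 } index.
class FolloweeIndexSketch {
    // One index entry holds the two indexed fields and nothing else.
    static final class Entry {
        final String user;
        final String followee;

        Entry(String user, String followee) {
            this.user = user;
            this.followee = followee;
        }
    }

    // Entries are ordered by user first, then by followee.
    private final TreeSet<Entry> index = new TreeSet<Entry>((a, b) -> {
        int byUser = a.user.compareTo(b.user);
        return byUser != 0 ? byUser : a.followee.compareTo(b.followee);
    });

    void add(String user, String followee) {
        index.add(new Entry(user, followee));
    }

    // Range scan over the contiguous entries of one user. The followee is
    // read from the index entry itself, which is what "covered" means.
    List<String> followeesOf(String user) {
        List<String> result = new ArrayList<>();
        // The empty string sorts before every other followee value,
        // so this starts at the first entry for the user.
        for (Entry e : index.tailSet(new Entry(user, ""), true)) {
            if (!e.user.equals(user)) break;
            result.add(e.followee);
        }
        return result;
    }

    public static void main(String[] args) {
        FolloweeIndexSketch idx = new FolloweeIndexSketch();
        idx.add("11622712", "30432718");
        idx.add("11622712", "10000001");
        System.out.println(idx.followeesOf("11622712"));
    }
}
```

An index on user alone would locate the same entries, but the followee values would then have to be read from the documents themselves.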
18. Java support
• The Java driver is open source, available on GitHub and Maven.
• mongo.jar is the driver; bson.jar is a subset containing only the BSON library.
• The Java driver is probably the most used MongoDB driver.
• It is actively developed by MongoDB Inc and the community.
19. Driver Features
• CRUD
• Support for replica sets
• Connection pooling
• Distributed reads to slave servers
• BSON serializer/deserializer (lazy option)
• JSON serializer/deserializer
• GridFS
20. Message Store
public class MessageStoreDAO implements MessageStore {
private Morphia morphia;
private Datastore ds;
public MessageStoreDAO( MongoClient mongo ) {
this.morphia = new Morphia();
this.morphia.map(DBMessage.class);
this.ds = morphia.createDatastore(mongo, "messages");
this.ds.getCollection(DBMessage.class).
ensureIndex(new BasicDBObject("sender",1).append("sentAt",1) );
}
// get a message
public Message get(String user_id, String msg_id) {
return (Message) this.ds.find(DBMessage.class)
.filter("sender", user_id)
.filter("_id", new ObjectId(msg_id))
.get();
}
21. Message Store
// save a message
public Message save(String user_id, String message, Date date) {
Message msg = new DBMessage( user_id, message, date );
ds.save( msg );
return msg;
}
// find message by author sorted by descending time
public List<Message> sentBy(String user_id) {
return (List) this.ds.find(DBMessage.class)
.filter("sender",user_id).order("-sentAt").limit(50).asList();
}
// find message by several authors sorted by descending time
public List<Message> sentBy(List<String> user_ids) {
return (List) this.ds.find(DBMessage.class)
.field("sender").in(user_ids).order("-sentAt").limit(50).asList();
}
22. Graph Store
The code below uses Solution #4: both a followers and a followees collection
public class GraphStoreDAO implements GraphStore {
private DBCollection friends;
private DBCollection followers;
public GraphStoreDAO(MongoClient mongo) {
this.followers = mongo.getDB("edges").getCollection("followers");
this.friends = mongo.getDB("edges").getCollection("friends");
followers.ensureIndex( new BasicDBObject("u",1).append("o",1), new BasicDBObject("unique", true));
friends.ensureIndex( new BasicDBObject("u",1).append("o",1), new BasicDBObject("unique",true));
}
// find users that are followed
public List<String> friendsOf(String user_id) {
List<String> theFriends = new ArrayList<String>();
DBCursor cursor = friends.find( new BasicDBObject("u",user_id),
new BasicDBObject("_id",0).append("o",1));
while(cursor.hasNext())
theFriends.add( (String) cursor.next().get("o"));
return theFriends;
}
23. Graph Store
// find followers of a user
public List<String> followersOf(String user_id) {
List<String> theFollowers = new ArrayList<String>();
DBCursor cursor = followers.find( new BasicDBObject("u",user_id),
new BasicDBObject("_id",0).append("o",1));
while(cursor.hasNext())
theFollowers.add( (String) cursor.next().get("o"));
return theFollowers;
}
public void follow(String user_id, String toFollow) {
friends.save( new BasicDBObject("u",user_id).append("o",toFollow));
followers.save( new BasicDBObject("u",toFollow).append("o",user_id));
}
public void unfollow(String user_id, String toUnFollow) {
friends.remove(new BasicDBObject("u", user_id).append("o", toUnFollow));
followers.remove(new BasicDBObject("u", toUnFollow).append("o", user_id));
}
25. 4 Approaches (there are more)
• Fan out on Read
• Fan out on Write
• Bucketed Fan out on Write
• Inbox Caches
26. Fan out on read
• Generally, not the right approach
• 1 document per message sent
• Reading an inbox means finding all messages sent by the people the user follows
• Requires scatter-gather on a sharded cluster
• Then a lot of random IO on each shard to find everything
27. Fan out on Read
Put the followees ids in a list:
> var fees = []
> db.followees.find({user: "11622712"})
.forEach( function(doc) { fees.push( doc.followee ) } )
Use $in and sort() and limit() to gather the inbox:
> db.messages.find({ uid: { $in: fees } }).sort({ created: -1 }).limit(100)
{ "_id": ObjectId("519d627ce4b07916312f0a09"), "uid": "34660390", "username": "Dingdowas"
{ "_id": ObjectId("519d627ce4b07916312f0a10"), "uid": "34661390", "username": "John" } …
{ "_id": ObjectId("519d627ce4b07916312f0a11"), "uid": "34662390", "username": "Brenda" } …
…
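The same read path can be sketched in plain Java against an in-memory list instead of MongoDB. This is only an illustration of the query shape ($in on the author, sort by time descending, limit); the class FanOutOnReadSketch and its Msg type are hypothetical names and are not part of the application shown in these slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// In-memory stand-in for the "messages" collection; all names are illustrative.
class FanOutOnReadSketch {
    static final class Msg {
        final String uid;     // author id
        final String text;
        final long created;   // creation timestamp
        Msg(String uid, String text, long created) {
            this.uid = uid;
            this.text = text;
            this.created = created;
        }
    }

    // Same shape as find({uid: {$in: fees}}).sort({created: -1}).limit(n)
    static List<Msg> inbox(List<Msg> messages, Set<String> followees, int limit) {
        List<Msg> matched = new ArrayList<>();
        for (Msg m : messages) {
            if (followees.contains(m.uid)) {
                matched.add(m);   // messages of every followed author are gathered
            }
        }
        matched.sort((x, y) -> Long.compare(y.created, x.created)); // newest first
        return new ArrayList<>(matched.subList(0, Math.min(limit, matched.size())));
    }

    public static void main(String[] args) {
        List<Msg> all = List.of(
            new Msg("a", "first", 1L),
            new Msg("b", "second", 2L),
            new Msg("c", "not followed", 3L),
            new Msg("a", "third", 4L));
        for (Msg m : inbox(all, Set.of("a", "b"), 2)) {
            System.out.println(m.uid + ": " + m.text);
        }
    }
}
```

The sort runs over everything that matched, not just the rows returned, which is why the work grows with the number of followees rather than with the page size.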
28. Fan out on read – Send Message
[Diagram: Shard 1, Shard 2, Shard 3, with a "Send Message" operation]
29. Fan out on read – Inbox Read
[Diagram: Shard 1, Shard 2, Shard 3, with a "Read Inbox" operation]
30. Fan out on read
> db.messages.find({ uid: { $in: fees } } ).sort({ created: -1 } ).limit(100).explain()
{
"cursor": "BtreeCursor uid_1_created_1 multi",
"isMultiKey": false,
"n": 100,
"nscannedObjects": 1319,
"nscanned": 1384,
"nscannedObjectsAllPlans": 1425,
"nscannedAllPlans": 1490,
"scanAndOrder": true, // it is sorting in RAM??
"indexOnly": false,
"nYields": 0,
"nChunkSkips": 0,
"millis": 31 // takes about 30ms
}
32. Fan out on write
• Tends to scale better than fan out on read
• 1 document per recipient
• Reading my inbox is just finding all of the messages with me as the recipient
• Can shard on recipient, so inbox reads hit one shard
• But still lots of random IO on the shard
33. Fan out on Write
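// Shard on "recipient" and "sent"
sh.shardCollection("myapp.inbox", { recipient: 1, sent: 1 })
msg = { from: "Joe", sent: new Date(), message: "Hi!" }
// Send a message: write one copy per follower
// followersOf() is a helper assumed to return an array of follower ids
followersOf(msg.from).forEach(function (follower) {
// insert a fresh document each time so every copy gets its own _id
db.inbox.insert({ recipient: follower, from: msg.from, sent: msg.sent, message: msg.message });
});
// Read my inbox, super easy
db.inbox.find({ recipient: "Joe" }).sort({ sent: -1 })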
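As a rough illustration of the write amplification, the per-follower write loop can be modeled in plain Java with a map from recipient to inbox entries. The names (FanOutOnWriteSketch, InboxEntry) are hypothetical and the map merely stands in for a collection sharded on recipient; it is a sketch, not the application's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of an inbox collection keyed by recipient; names are illustrative.
class FanOutOnWriteSketch {
    static final class InboxEntry {
        final String recipient;
        final String from;
        final String message;
        final long sent;
        InboxEntry(String recipient, String from, String message, long sent) {
            this.recipient = recipient;
            this.from = from;
            this.message = message;
            this.sent = sent;
        }
    }

    private final Map<String, List<InboxEntry>> inboxByRecipient = new HashMap<>();

    // Writes one copy per follower and returns the number of copies written.
    int send(String from, String message, long sent, List<String> followers) {
        for (String follower : followers) {
            inboxByRecipient
                .computeIfAbsent(follower, k -> new ArrayList<>())
                .add(new InboxEntry(follower, from, message, sent));
        }
        return followers.size();
    }

    // Reading touches only the recipient's own entries, newest first.
    List<InboxEntry> read(String recipient) {
        List<InboxEntry> copy =
            new ArrayList<>(inboxByRecipient.getOrDefault(recipient, List.of()));
        copy.sort((x, y) -> Long.compare(y.sent, x.sent));
        return copy;
    }

    public static void main(String[] args) {
        FanOutOnWriteSketch store = new FanOutOnWriteSketch();
        int copies = store.send("Joe", "Hi!", 1L, List.of("ann", "bob"));
        System.out.println(copies + " copies written, ann has " + store.read("ann").size());
    }
}
```

The cost moves from the reader to the sender: one send produces as many writes as there are followers, while a read only looks at a single key.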
34. Fan out on write – Send Message
[Diagram: Shard 1, Shard 2, Shard 3, with a "Send Message" operation]
35. Fan out on write – Read Inbox
[Diagram: Shard 1, Shard 2, Shard 3, with a "Read Inbox" operation]
36. Bucketed Fan out on write
• Each “inbox” document is an array of messages
• Append a message onto “inbox” of recipient
• Bucket inbox documents so there aren't too many messages per document
• Can shard on recipient, so inbox reads hit one shard
• 1 or 2 documents to read the whole inbox
37. Bucketed Fan out on Write
// Shard on "owner / sequence"
sh.shardCollection("myapp.buckets", { owner: 1, sequence: 1 })
sh.shardCollection("myapp.users", { user_name: 1 })
msg = { from: "Joe", sent: new Date(), message: "Hi!" }
// Send a message: find the right sequence (bucket) document for each follower
// followersOf() is a helper assumed to return an array of follower ids
followersOf(msg.from).forEach(function (follower) {
// atomically bump the follower's message counter; about 50 messages per bucket
sequence = Math.floor(db.users.findAndModify({
query: { user_name: follower },
update: { $inc: { msg_count: 1 } },
upsert: true,
new: true }).msg_count / 50);
db.buckets.update({ owner: follower, sequence: sequence },
{ $push: { messages: msg } },
{ upsert: true });
});
// Read my inbox: the two newest buckets
db.buckets.find({ owner: "Joe" }).sort({ sequence: -1 }).limit(2)
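To make the bucket arithmetic concrete, here is a minimal in-memory sketch in Java. It assumes the sequence number is the recipient's incremented message counter divided by 50 and rounded down; the class and method names (BucketedInboxSketch, sequenceFor, deliver, read) are hypothetical and do not come from the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of bucketed inboxes; all names are illustrative.
class BucketedInboxSketch {
    static final int BUCKET_SIZE = 50;

    private final Map<String, Long> msgCount = new HashMap<>();
    private final Map<String, TreeMap<Long, List<String>>> buckets = new HashMap<>();

    // Integer division rounds down for non-negative counts.
    static long sequenceFor(long count) {
        return count / BUCKET_SIZE;
    }

    void deliver(String owner, String message) {
        // Plays the role of findAndModify with $inc and new: true
        long count = msgCount.merge(owner, 1L, Long::sum);
        long sequence = sequenceFor(count);
        // Plays the role of the upserting $push into the bucket document
        buckets.computeIfAbsent(owner, k -> new TreeMap<>())
               .computeIfAbsent(sequence, k -> new ArrayList<>())
               .add(message);
    }

    // Returns at most `limit` buckets, highest sequence first.
    List<List<String>> read(String owner, int limit) {
        List<List<String>> out = new ArrayList<>();
        TreeMap<Long, List<String>> mine = buckets.getOrDefault(owner, new TreeMap<>());
        for (List<String> bucket : mine.descendingMap().values()) {
            if (out.size() == limit) {
                break;
            }
            out.add(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        BucketedInboxSketch inbox = new BucketedInboxSketch();
        for (int i = 1; i <= 120; i++) {
            inbox.deliver("joe", "m" + i);
        }
        for (List<String> bucket : inbox.read("joe", 2)) {
            System.out.println(bucket.size() + " messages, first is " + bucket.get(0));
        }
    }
}
```

Because the counter is incremented before dividing, bucket 0 ends up with 49 messages and every later full bucket with 50. Reading two buckets therefore returns between 50 and 100 of the most recent messages, except for very new inboxes.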
38. Bucketed fan out on write - Send
[Diagram: Shard 1, Shard 2, Shard 3, with a "Send Message" operation]
39. Bucketed fan out on write - Read
[Diagram: Shard 1, Shard 2, Shard 3, with a "Read Inbox" operation]
40. Cached inbox
• Recent messages are fast, but older messages are slower
• Store a cache of the last N messages per user
• Use a capped array to age out older messages
• Create the cache lazily when the user accesses the inbox
• Only write the message if the cache exists
• Use a TTL collection to time out caches for inactive users
41. Cached Inbox
// Shard on "owner"
sh.shardCollection("myapp.caches", { owner: 1 })
// Send a message: add it to the existing caches of followers (no upsert,
// so users without a cache are skipped)
followersOf(msg.from).forEach(function (follower) {
db.caches.update({ owner: follower }, { $push: { messages: {
$each: [ msg ],
$sort: { sent: 1 },
$slice: -50 } } });
});
// Read my inbox
function readInbox(user) {
var cache = db.caches.findOne({ owner: user });
if (cache) {
// cache document exists
return cache.messages;
}
// fall back to "fan out on read" and cache the result
// friendsOf() is assumed to return an array of the ids this user follows
var msgs = db.outbox.find({ sender: { $in: friendsOf(user) } }).sort({ sent: -1 }).limit(50).toArray();
// the next $push re-sorts this array by "sent"
db.caches.save({ owner: user, messages: msgs });
return msgs;
}
readInbox("Joe")
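The cache behavior (write only if a cache exists, keep the newest 50, build lazily on first read) can be sketched in Java as follows. This is an in-memory illustration under stated assumptions: timestamps stand in for whole messages, the fallback supplier stands in for the "fan out on read" query, TTL expiry is not modeled, and all names (InboxCacheSketch, pushIfCached, read) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// In-memory model of per-user inbox caches; all names are illustrative.
class InboxCacheSketch {
    static final int CACHE_SIZE = 50;

    // owner -> "sent" timestamps, standing in for cached messages
    private final Map<String, List<Long>> caches = new HashMap<>();

    // Mirrors $sort: { sent: 1 } followed by $slice: -50 (keep the newest 50).
    private static void sortAndTrim(List<Long> cache) {
        Collections.sort(cache);
        while (cache.size() > CACHE_SIZE) {
            cache.remove(0);   // drop the oldest entry
        }
    }

    // Only writes when the owner already has a cache (an update without upsert).
    boolean pushIfCached(String owner, long sent) {
        List<Long> cache = caches.get(owner);
        if (cache == null) {
            return false;
        }
        cache.add(sent);
        sortAndTrim(cache);
        return true;
    }

    // Builds the cache lazily from the fallback query on first access.
    List<Long> read(String owner, Supplier<List<Long>> fanOutOnRead) {
        List<Long> cache = caches.get(owner);
        if (cache == null) {
            cache = new ArrayList<>(fanOutOnRead.get());
            sortAndTrim(cache);
            caches.put(owner, cache);
        }
        return Collections.unmodifiableList(cache);
    }

    public static void main(String[] args) {
        InboxCacheSketch caches = new InboxCacheSketch();
        System.out.println(caches.pushIfCached("joe", 1L));            // no cache yet
        System.out.println(caches.read("joe", () -> List.of(3L, 1L, 2L)));
        System.out.println(caches.pushIfCached("joe", 4L));            // cache exists now
    }
}
```

Skipping the write for users without a cache is what keeps the send cost proportional to the number of active followers rather than to all followers.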
45. Tradeoffs
Send Message Performance
• Fan out on Read: Best (single shard, single write)
• Fan out on Write: Good (shard per recipient, multiple writes)
• Bucketed Fan out on Write: Worst (shard per recipient, appends (grows))
• Inbox Cache: Mixed (depends on how many users are in cache)
Read Inbox Performance
• Fan out on Read: Worst (broadcast all shards, random reads)
• Fan out on Write: Good (single shard, random reads)
• Bucketed Fan out on Write: Best (single shard, single read)
• Inbox Cache: Mixed (recent messages fast, older messages are slow)
Data Size
• Fan out on Read: Best (message stored once)
• Fan out on Write: Worst (copy per recipient)
• Bucketed Fan out on Write: Worst (copy per recipient)
• Inbox Cache: Good (same as Fan out on Read + size of cache)
46. Things to consider
• Lots of recipients
– Fan out on write might become prohibitive
– Consider introducing a “Group”
– Make fan out asynchronous
• Very large message size
– Multiple copies of messages can be a burden
– Consider single copy of message with a “pointer” per inbox
• More writes than reads
– Fan out on read might be okay
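One possible reading of the single copy with a "pointer" per inbox idea is sketched below in Java: the message body is stored once and each inbox holds only its id. The class PointerInboxSketch and its methods are hypothetical names, and the maps stand in for two collections; the slides do not prescribe this layout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model: bodies stored once, inboxes hold ids only. Names are illustrative.
class PointerInboxSketch {
    private final Map<Long, String> bodies = new HashMap<>();         // one copy per message
    private final Map<String, List<Long>> inboxes = new HashMap<>();  // pointers per recipient
    private long nextId = 1;

    // Stores the body once, then fans out only the small id to each recipient.
    long send(String body, List<String> recipients) {
        long id = nextId++;
        bodies.put(id, body);
        for (String recipient : recipients) {
            inboxes.computeIfAbsent(recipient, k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    // Reading needs a second lookup to resolve each pointer into a body.
    List<String> read(String recipient) {
        List<String> out = new ArrayList<>();
        for (long id : inboxes.getOrDefault(recipient, List.of())) {
            out.add(bodies.get(id));
        }
        return out;
    }

    int storedBodies() {
        return bodies.size();
    }

    public static void main(String[] args) {
        PointerInboxSketch store = new PointerInboxSketch();
        store.send("a very large message", List.of("ann", "bob", "cid"));
        System.out.println(store.storedBodies() + " body stored, bob reads " + store.read("bob"));
    }
}
```

The trade is storage for read cost: data size stays close to fan out on read, but each inbox read pays an extra lookup per message.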
47. Summary
• Multiple ways to model status updates
• Think about characteristics of your network
– Number of users
– Number of edges
– Publish frequency
– Access patterns
• Try to minimize random IO