2. Thursday, July 7, 2011 2
APIdock.com is one of the services we’ve created for the Ruby community: a social
documentation site.
3.
- We did some “research” on the real-time web back in 2008.
- At the same time, we did software consulting for large companies.
- Flowdock is a product spinoff from our consulting company. It’s Google Wave done right,
with a focus on technical teams.
4.
Flowdock combines a group chat (on the right) with a shared team inbox (on the left).
Our promise: Teams stay up-to-date, react in seconds instead of hours, and never forget
anything.
5.
Flowdock gets messages from various external sources (like JIRA, Twitter, GitHub, Pivotal
Tracker, emails, RSS feeds) and from the Flowdock users themselves.
6.
All of the highlighted areas are objects in the “messages” collection. MongoDB’s document
model is perfect for our use case, where various data formats (tweets, emails, ...) are stored
inside the same collection.
11.
{
  "_id": ObjectId("4de92cd0097580e29ca5b6c2"),
  "id": NumberLong(45967),
  "app": "chat",
  "flow": "demo:demoflow",
  "event": "comment",
  "sent": NumberLong("1307126992832"),
  "attachments": [],
  "_keywords": [
    "good",
    "point", ...
  ],
  "uuid": "hC4-09hFcULvCyiU",
  "user": "1",
  "content": {
    "text": "Good point, I'll mark it as deprecated.",
    "title": "Updated JIRA integration API"
  },
  "tags": [
    "influx:45958"
  ]
}
Thursday, July 7, 2011 7
This is what a typical message looks like.
12. Browser: jQuery (+UI), Comet impl., MVC impl.
Rails app: Website, Admin, Payments, Account mgmt
Scala backend: Messages, Who’s online, API, RSS feeds, SMTP server, Twitter feed
Databases: PostgreSQL, MongoDB
Thursday, July 7, 2011 8
An overview of the Flowdock architecture: most of the code is JavaScript and runs inside the
browser.
The Scala (+Akka) backend does all the heavy lifting (mostly related to messages and online
presence), and the Ruby on Rails application handles all the easy stuff (public website,
account management, administration, payments etc).
We used PostgreSQL in the beginning, and migrated messages to MongoDB. Otherwise there
is no particular reason why we couldn’t use MongoDB for everything.
13. Thursday, July 7, 2011 9
One of the key features in Flowdock is tagging. For example, if I’m making some changes to
our production environment, I mention it in the chat and tag it as #production. Production
deployments are automatically tagged with the same tag, so I can easily get a log of
everything that’s happened.
It’s very easy to implement with MongoDB, since we just index the “tags” array and search
using it.
15. db.messages.ensureIndex({flow: 1, tags: 1, id: -1});
db.messages.find({flow: 123,
tags: {$all: ["production"]}})
.sort({id: -1});
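To make the query semantics concrete, here is a minimal in-memory sketch in plain JavaScript with no MongoDB involved. It only mimics what a `find` with `$all` followed by a descending sort on `id` returns. The sample messages and the helper name `findByTags` are invented for illustration.

```javascript
// In-memory stand-in for the "messages" collection (sample data is invented).
const messages = [
  { id: 1, flow: 123, tags: ["production", "deploy"] },
  { id: 2, flow: 123, tags: ["staging"] },
  { id: 3, flow: 123, tags: ["production"] },
  { id: 4, flow: 456, tags: ["production"] },
];

// Mimics find({flow: ..., tags: {$all: [...]}}).sort({id: -1}):
// keep messages of one flow whose tags array contains every requested tag,
// newest (highest id) first.
function findByTags(collection, flow, requiredTags) {
  return collection
    .filter((m) => m.flow === flow)
    .filter((m) => requiredTags.every((tag) => m.tags.includes(tag)))
    .sort((a, b) => b.id - a.id);
}

const hits = findByTags(messages, 123, ["production"]);
console.log(hits.map((m) => m.id)); // [ 3, 1 ]
```

Because the compound index starts with `flow` and ends with `id: -1`, the real query can narrow down by flow and tag and still return results in the required order.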
17. Library support
• Stemming
• Ranked probabilistic search
• Synonyms
• Spelling corrections
• Boolean, phrase, word proximity queries
Thursday, July 7, 2011 11
These are some of the features you might see in an advanced full-text search
implementation. There are libraries to do some parts of this (like libraries specific to
stemming), and more advanced search libraries like Lucene and Xapian.
Lucene is a Java library (also ported to C++ etc.), and Xapian is a C++ library.
Many of these are hackable with MongoDB by expanding the query.
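As one illustration of expanding the query, synonyms could be handled on the client by turning each search term into a list of alternatives. The sketch below is an assumption about how this might look, not something shown in the talk. The synonym table and the name `expandQuery` are hypothetical, the field name `_keywords` is taken from the sample message above, and the filter relies on the `$and` and `$in` operators available in current MongoDB versions.

```javascript
// Hypothetical synonym table; a real one would come from a thesaurus or config.
const SYNONYMS = new Map([
  ["deploy", ["deploy", "deployment", "release"]],
  ["bug", ["bug", "defect"]],
]);

// Builds a MongoDB filter document in which every search term must match,
// but each term may match through any of its synonyms.
function expandQuery(flow, terms) {
  if (terms.length === 0) return { flow: flow }; // $and must not be empty
  const clauses = terms.map((term) => ({
    _keywords: { $in: SYNONYMS.get(term) || [term] },
  }));
  return { flow: flow, $and: clauses };
}

console.log(JSON.stringify(expandQuery(123, ["deploy", "production"]), null, 2));
```

The function only builds the filter document; passing it to `find` is left to whichever driver or shell is in use.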
18. Standalone servers
• Lucene based; rich document support; result highlighting; distributed
• Lucene queries; REST/JSON API; real-time indexing; distributed
• MySQL integration; real-time indexing; distributed searching
Thursday, July 7, 2011 12
You can use the libraries directly, but they don’t do anything to guarantee replication &
scaling.
Standalone implementations usually handle clustering, query processing and some more
advanced features.
19. Things to consider
• Data access patterns
• Technology stack
• Data duplication
• Use cases: need to search Word
documents? Need to support boolean
queries? ...
Thursday, July 7, 2011 13
When choosing your solution, you’ll want to keep it simple. Consider how write-heavy your
app is, what special features you need, and whether you can afford to store the data 3 times in a
MongoDB replica set plus 2 times in a search server, etc.
20. Real-time search
Performance
Thursday, July 7, 2011 14
There are tons of use cases where search doesn’t need to be real-time. It’s a requirement
that will heavily impact your application.
21. KISS
Thursday, July 7, 2011 15
As a lean startup, we can’t afford to spend a lot of time on technology adventures. We need to
measure what customers want.
Many of the features are possible to achieve with MongoDB.
Facebook’s message search also matches only exact words (= it sucks), and people don’t
complain.
So we built a minimal implementation with MongoDB. No stemming or anything, just a
keyword search, but it needs to be real-time.
22. KISS
Even Facebook does.
23. “Good point. I’ll mark it as deprecated.”
_keywords: ["good", "point", "mark", "deprecated"]
Thursday, July 7, 2011 16
You need client-side code for this transformation.
What’s possible: stemming, search by beginning of the word
What’s not possible: intelligent ranking on the DB side (which is ok for us, since we want to
sort results by time anyway)
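A minimal sketch of what such client-side code might look like, in plain JavaScript. The talk does not show Flowdock’s actual implementation, so the stop-word list and the function name `extractKeywords` are assumptions made for illustration.

```javascript
// Hypothetical stop-word list; the talk does not say which words are dropped.
const STOP_WORDS = new Set([
  "a", "an", "and", "as", "i", "i'll", "it", "of", "or", "the", "to",
]);

// Turns message text into a de-duplicated list of lowercase keywords.
function extractKeywords(text) {
  const normalized = text.toLowerCase().replace(/\u2019/g, "'"); // curly -> straight apostrophe
  const words = normalized.match(/[a-z0-9']+/g) || [];
  const seen = new Set();
  const keywords = [];
  for (const raw of words) {
    const word = raw.replace(/^'+|'+$/g, ""); // strip surrounding quote marks
    if (word === "" || STOP_WORDS.has(word) || seen.has(word)) continue;
    seen.add(word);
    keywords.push(word);
  }
  return keywords;
}

console.log(extractKeywords("Good point. I'll mark it as deprecated."));
// [ 'good', 'point', 'mark', 'deprecated' ]
```

Stemming would slot in as one more step inside the loop, applied to each word before it is stored.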
24. db.messages.ensureIndex({
flow: 1,
_keywords: 1,
id: -1});
Thursday, July 7, 2011 17
Simply build the _keywords index the same way we already had our tags indexed.
25. db.messages.find({
flow: 123,
_keywords: {
$all: ["hello", "world"]}
}).sort({id: -1});
Thursday, July 7, 2011 18
Search is also trivial to implement. As said, our users want the messages in chronological
order, which makes this a lot easier.
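As a sketch of the step between the search box and this query, the user’s search text has to go through the same normalization that produced `_keywords`. The tokenizer below is deliberately simplistic, and both it and the name `buildSearchQuery` are assumptions rather than Flowdock’s code.

```javascript
// Minimal tokenizer, assumed to match whatever was used to fill _keywords.
function tokenize(text) {
  return [...new Set(text.toLowerCase().match(/[a-z0-9]+/g) || [])];
}

// Builds the filter and sort documents for a keyword search within one flow.
function buildSearchQuery(flow, searchText) {
  const keywords = tokenize(searchText);
  if (keywords.length === 0) return null; // nothing to search for
  return {
    filter: { flow: flow, _keywords: { $all: keywords } },
    sort: { id: -1 }, // newest first, served by the index order
  };
}

const query = buildSearchQuery(123, "Hello, world!");
console.log(JSON.stringify(query));
// {"filter":{"flow":123,"_keywords":{"$all":["hello","world"]}},"sort":{"id":-1}}
```

In the mongo shell the result could then be used as `db.messages.find(query.filter).sort(query.sort)`.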
26. That’s it! Let’s take it to production.
Thursday, July 7, 2011 19
A minimal search implementation is the easy part. We faced quite a few operational issues
when deploying it to production.
27. Index size:
2500 MB per 1M messages
Thursday, July 7, 2011 20
As it turns out, the _keywords index is pretty big.
28. 10M messages: Size in gigabytes
[Bar chart: Messages, Index: Keywords, Index: Tags, Index: Others; y-axis from 0 to 20 GB]
Thursday, July 7, 2011 21
It would be great to fit the indices in memory. At this size they obviously don’t fit. Stemming would reduce
the index size.
The index size also affects insert/update performance, for example.
30. Option #1:
Just generate _keywords and build
the index in background.
Thursday, July 7, 2011 22
The naive solution: try to do it with no downtime. Didn’t work, site slowed down too much.
31. Option #2:
Try to do it during a 6 hour
service break.
Thursday, July 7, 2011 23
It worked much faster when our users weren’t constantly accessing the data. But 6 hours
during a weekend wasn’t enough, and we had to cancel the migration.
32. Option #3:
Delete _keywords, build the index
and re-generate keywords in the background.
Thursday, July 7, 2011 24
Generating an index is much faster when there is no data to index. Building the index was
fine, but generating keywords was very slow and took the site down.
33. Option #4:
As previously, but add sleep(5).
Thursday, July 7, 2011 25
You know you’re a great programmer when you’re adding sleep()s to your production code.
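A rough sketch of what such a throttled backfill loop might look like. The talk does not say which language the migration script was written in or what the argument to sleep meant, so everything here is an assumption: the in-memory array stands in for the collection, `backfillKeywords` is a made-up name, and `pause` is injected so that a blocking sleep (for example the mongo shell’s `sleep(ms)` helper) could be passed in.

```javascript
// Hypothetical throttled backfill over an in-memory array that stands in for
// the collection. `pause` is called between batches, not after the last one.
function backfillKeywords(messages, extract, batchSize, pause) {
  let batches = 0;
  for (let start = 0; start < messages.length; start += batchSize) {
    const batch = messages.slice(start, start + batchSize);
    for (const message of batch) {
      message._keywords = extract(message.content.text);
    }
    batches += 1;
    const moreToDo = start + batchSize < messages.length;
    if (moreToDo) pause(); // give regular traffic room before the next batch
  }
  return batches;
}

// Demo with a trivial extractor and a pause that only counts its calls.
const sample = [
  { content: { text: "Good point" } },
  { content: { text: "Deploy done" } },
  { content: { text: "Hello world" } },
];
let pauses = 0;
const batchesRun = backfillKeywords(
  sample,
  (text) => text.toLowerCase().split(/\s+/),
  2,
  () => { pauses += 1; }
);
console.log(batchesRun, pauses); // 2 1
```

The batch size and pause length trade total migration time against load on the live site.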
34. Option #5:
As previously, but add Write Concerns.
Thursday, July 7, 2011 26
Let the updates block, so that if MongoDB slows down, the migration script doesn’t flood the
server.
In practice, it would either have taken a month or slowed down the service.
35. Option #6:
Shard.
Thursday, July 7, 2011 27
Would have been a solution, but we didn’t want to host all that data in-memory, since it’s not
accessed that often.
36. Option #7:
SSD!
Thursday, July 7, 2011 28
We had the possibility to try it on an SSD.
This is not a viable option for AWS users, but AWS users could temporarily shard their data
across a large number of high-memory instances.
39. Thursday, July 7, 2011 29
My reactions to using SSD. Decided to benchmark it.
40. Messages: 10M messages in 100 “flows”, 100k each
Total size: 19.67 GB
Indices:
• _id: 1
• flow: 1, app: 1, id: -1
• flow: 1, event: 1, id: -1
• flow: 1, id: -1
• flow: 1, tags: 1, id: -1
• flow: 1, _keywords: 1, id: -1
Total size: 22.03 GB
Thursday, July 7, 2011 30
This is the starting point for my next benchmark. Wanted to test it with a real-size database,
instead of starting from scratch.
41. mongorestore time in minutes
[Bar chart comparing SSD and SATA; y-axis from 0 to 300 minutes]
Thursday, July 7, 2011 31
First used mongorestore to populate the test database.
133 vs. 235 minutes (SSD vs. SATA). Index generation is mostly CPU-bound,
so it doesn’t really benefit from the faster seek times.
42. Insert performance test
A total of 100 workspaces
And 3 workers each accessing 30 workspaces
Performing 1000 inserts to each
= 90 000 inserts, as quickly as possible
Thursday, July 7, 2011 32
43. insert benchmark: time in minutes
[Bar chart comparing SSD and SATA; y-axis from 0 to 200 minutes]
Thursday, July 7, 2011 33
4.25 minutes (SSD) vs. 155 minutes (SATA). That’s about 4 minutes vs. about 2.5 hours.
44. 9.67 inserts/sec
vs.
352.94 inserts/sec
Thursday, July 7, 2011 34
The same numbers as inserts/sec.
45. 36x
Thursday, July 7, 2011 35
36x performance improvement with SSD. So we ended up using it in production.
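The arithmetic behind these figures can be re-derived from the numbers on the slides. The snippet below only recomputes them. Rounded to two decimals the SATA rate comes out as 9.68, so the 9.67 on the slide appears to be truncated rather than rounded.

```javascript
// Figures taken from the slides above.
const workers = 3;
const workspacesPerWorker = 30;
const insertsPerWorkspace = 1000;
const totalInserts = workers * workspacesPerWorker * insertsPerWorkspace; // 90000

const ssdMinutes = 4.25;
const sataMinutes = 155;

// Converts a run time in minutes into an insert rate per second.
const insertsPerSecond = (minutes) => totalInserts / (minutes * 60);

const ssdRate = insertsPerSecond(ssdMinutes);
const sataRate = insertsPerSecond(sataMinutes);
const speedup = ssdRate / sataRate;

console.log(ssdRate.toFixed(2));  // 352.94
console.log(sataRate.toFixed(2)); // 9.68
console.log(speedup.toFixed(1));  // 36.5
```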
46. Thursday, July 7, 2011 36
Works well, searches from all kinds of content (here Git commit messages and deployment
emails), queries typically take only tens of milliseconds max.
47. Questions / Comments?
@flowdock / otto@flowdock.com
Thursday, July 7, 2011 37
This was a very specific full-text search implementation. The fact that we didn’t need to rank
search results made it trivial.
I’m happy to discuss other use cases. Please share your thoughts and experiences.