Building a Mongo DSL in Scala at Hot Potato (Lincoln Hochberg)
5. Technology Stack
Design principles: Simple and stateless
API
• Scala / MongoDB
API Clients
• Web application (Python/Django)
• iPhone (Obj-C)
6. Hot Potato API - Why MongoDB?
• Good documentation and excellent support
• Fully featured (like an RDBMS) but with a migration path to scale-out
• Replication
• Easy administration / scripting
• Fast
• Auto-sharding coming soon
15. Hot Potato API - Why MongoDB?
RDBMS Optimization process
1. too many reads -> add cache
2. too many joins -> de-normalize your data
3. too many writes -> scale up hardware, CPU, RAM, IO
4. too much db load -> eliminate triggers, stored procedures
5. too much db load -> pre-materialize complex queries
6. writes bottleneck -> drop secondary indexes
At this point you are using the RDBMS as a KV store. The full-featured nature of the RDBMS is merely getting in the way of scaling.
24. Hot Potato API - Why MongoDB?
Mongo comparison
1. reads start out fast, can be externally cached, and can also be horizontally scaled
2. mongo doesn’t support joins, so data starts out de-normalized
3. writes start out fast and can be horizontally scaled
4. much less opportunity for logic in the db
5. map/reduce for pre-materialization/aggregation, which can be horizontally scaled
6. indexes can be removed over time as usage becomes more KV-like
The gradual scaling process is more natural with Mongo. Horizontal scaling is not an afterthought or a bolt-on addition. Mongo works perfectly well as a KV store if you ever get to that point.
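The "data starts de-normalized" point can be illustrated with a small sketch. The schema below (User, Event, CheckinRow, CheckinDoc) is hypothetical and not taken from the Hot Potato code base; it only shows the join being done once, at write time, so that a later read is a plain key lookup.

```scala
object DenormalizationSketch {
  // Normalized rows, as an RDBMS would hold them (hypothetical schema).
  case class User(id: Int, name: String)
  case class Event(id: Int, title: String)
  case class CheckinRow(userId: Int, eventId: Int)

  // Denormalized document: what a reader needs is embedded in one value.
  case class CheckinDoc(userName: String, eventTitle: String)

  // The "join" happens once, when the document is written.
  def denormalize(
      row: CheckinRow,
      users: Map[Int, User],
      events: Map[Int, Event]
  ): Option[CheckinDoc] =
    for {
      u <- users.get(row.userId)
      e <- events.get(row.eventId)
    } yield CheckinDoc(u.name, e.title)

  // Small usage example with made-up data.
  def demo(): Option[CheckinDoc] = {
    val users  = Map(1 -> User(1, "alice"))
    val events = Map(7 -> Event(7, "Launch party"))
    denormalize(CheckinRow(1, 7), users, events)
  }
}
```

The trade-off is the usual one: reads get cheaper, while a change to a user name or event title has to be propagated to every document that embeds it.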
25. Hot Potato API - Why Scala?
• Runs on JVM (Stable, Fast)
• Access to Java’s many libraries
• Language benefits
  • Terse
  • Supports immutability
  • Functional
  • Concurrent
  • Easy to write DSLs
26. Three key Scala features
• Pattern Matching / Case classes
• Implicit conversions
• Actors
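As a rough illustration of the first two features, here is a minimal sketch. The Event, Checkin and Comment types and the tuple conversion are invented for this example and are not from the talk. Actors are left out because the deck uses Lift Actors, an external library.

```scala
import scala.language.implicitConversions

object ScalaFeaturesSketch {
  // Case classes: immutable values with structural equality (hypothetical types).
  sealed trait Event
  case class Checkin(user: String, eventId: Int) extends Event
  case class Comment(user: String, text: String) extends Event

  // Pattern matching destructures case classes by shape.
  def describe(e: Event): String = e match {
    case Checkin(user, id)   => user + " checked in to event " + id
    case Comment(user, text) => user + " said: " + text
  }

  // Implicit conversion: lets a plain tuple stand in where a Checkin is expected.
  implicit def tupleToCheckin(t: (String, Int)): Checkin = Checkin(t._1, t._2)

  def demo(): String = {
    // The compiler inserts tupleToCheckin here because a Checkin is expected.
    val c: Checkin = ("alice", 42)
    describe(c)
  }
}
```

Because Event is sealed, the compiler can warn when a match misses a case, which is part of what makes case classes a convenient basis for a document AST.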
27. Building a DSL for Mongo
Documents are a flexible building block:
• Insertion
• Updates
• Queries
• Sorting
• parts of Map / Reduce
• Indexes
28. Building a DSL for Mongo
Goals
• Stay close to the MongoDB Java API
• Keep it flexible
• Focus on document creation
  • New documents, queries, updates
• Protect against misnamed fields
Key classes and objects
• Collection - wraps MongoDB DBCollection
• MongoAST - defines the types for building Mongo documents
• MongoDSL - defines DSL syntax
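The slides do not show the actual MongoAST or MongoDSL code, so the following is only a sketch of how such a document-building DSL could look. Every name in it (MValue, MDoc, Field, `===`, `gt`, `inc`, `render`) is an assumption made for illustration. It avoids the MongoDB driver entirely and renders documents to a JSON-like string so it can run on its own.

```scala
import scala.language.implicitConversions

object MongoDslSketch {
  // Hypothetical AST: a tiny set of value types for building documents.
  sealed trait MValue
  case class MString(value: String) extends MValue
  case class MInt(value: Int) extends MValue
  case class MDoc(fields: List[(String, MValue)]) extends MValue {
    // Combine two documents, keeping field order.
    def &(other: MDoc): MDoc = MDoc(fields ++ other.fields)
  }

  // Implicit conversions lift plain Scala values into the AST.
  implicit def stringToValue(s: String): MValue = MString(s)
  implicit def intToValue(i: Int): MValue = MInt(i)

  // A field name declared once and reused, to guard against misnamed fields.
  case class Field(name: String) {
    def ===(v: MValue): MDoc = MDoc(List(name -> v))
    def gt(v: MValue): MDoc  = MDoc(List(name -> MDoc(List("$gt" -> v))))
    def inc(n: Int): MDoc    = MDoc(List("$inc" -> MDoc(List(name -> MInt(n)))))
  }

  // Render to a JSON-like string (no escaping; enough for a sketch).
  def render(v: MValue): String = v match {
    case MString(s) => "\"" + s + "\""
    case MInt(i)    => i.toString
    case MDoc(fs) =>
      fs.map { case (k, x) => "\"" + k + "\": " + render(x) }.mkString("{", ", ", "}")
  }

  val Name     = Field("name")
  val Age      = Field("age")
  val Checkins = Field("checkins")

  // A query document and an update document built from the same pieces.
  def demoQuery: String  = render((Name === "alice") & Age.gt(21))
  def demoUpdate: String = render(Checkins.inc(1))
}
```

Here `demoQuery` yields `{"name": "alice", "age": {"$gt": 21}}` and `demoUpdate` yields `{"$inc": {"checkins": 1}}`. In a real wrapper, the last step would convert the AST to the driver's document type instead of a string.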
29. Usage Patterns
Asynchronous atomic updates
• Simple observer pattern implemented with Lift Actors
  • HpActor, HpActorPool, Notifier
• Main action object with a Notifier
  • Example: checkins
• Listeners that use MongoDB atomic updates
  • Example: EventAdjuster
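The pattern above can be sketched without Lift or MongoDB. In this hypothetical version, MiniActor is a stand-in for a Lift actor (a single-threaded mailbox), Notifier is a plain observer registry, and an in-memory atomic counter stands in for a MongoDB atomic update such as `$inc`. None of these classes reproduce the HpActor, HpActorPool or EventAdjuster code from the talk.

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object NotifierSketch {
  case class Checkin(user: String, event: String)

  // Stand-in for an actor: messages are handled one at a time on its own thread.
  class MiniActor[A](handler: A => Unit) {
    private val mailbox = Executors.newSingleThreadExecutor()
    def !(msg: A): Unit =
      mailbox.execute(new Runnable { def run(): Unit = handler(msg) })
    // Stop accepting messages and wait for queued ones to finish.
    def stop(): Unit = {
      mailbox.shutdown()
      mailbox.awaitTermination(5, TimeUnit.SECONDS)
      ()
    }
  }

  // Observer registry: the main action publishes, listeners react asynchronously.
  class Notifier[A] {
    private var listeners = List.empty[MiniActor[A]]
    def subscribe(l: MiniActor[A]): Unit = synchronized { listeners = l :: listeners }
    def publish(msg: A): Unit = {
      val snapshot = synchronized(listeners)
      snapshot.foreach(_ ! msg)
    }
  }

  def demo(): Int = {
    // In-memory counters standing in for a collection updated atomically.
    val counts = new ConcurrentHashMap[String, AtomicInteger]()
    val adjuster = new MiniActor[Checkin]({ c =>
      counts.computeIfAbsent(c.event, _ => new AtomicInteger(0)).incrementAndGet()
      ()
    })
    val notifier = new Notifier[Checkin]
    notifier.subscribe(adjuster)
    List("alice", "bob", "carol").foreach(u => notifier.publish(Checkin(u, "launch")))
    adjuster.stop()
    counts.get("launch").get()
  }
}
```

The publisher returns as soon as the message is queued, and the listener applies an increment rather than a read-modify-write, so concurrent checkins do not overwrite each other's counts.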