This document covers migrating the website Sluggy.com from a MySQL database to MongoDB. It gives an overview of Sluggy.com and the technologies the site used from 1997 to the present. Key lessons learned: over-reliance on memcached can mask deeper problems, and MongoDB outperformed MySQL for this workload while allowing flexible, denormalized schemas and reducing the need for a separate caching layer.
MySQL to MongoDB @ Sluggy.com: MongoDB Boston, September 20, 2010
1. Migrating from MySQL to MongoDB at Sluggy.com
Brendan W. McAdams
Evil Monkey Labs, LLC
Mongo Boston Conference - Sep. 20, 2010
B.W. McAdams (Evil Monkey Labs) Sluggy.com: MySQL to MongoDB Mongo Boston - 9/20/10 1 / 42
2. Outline
1 Introduction
Basic Rundown
Technology History
2 What We Learned
Lessons Learned Over Time
Open Source Code Yielded
Why MongoDB?
3 Show Me The Code!
The Old: MySQL Snippets
4 Final Items
4. Sluggy.com Rundown
Live since August 25, 1997
Updated every day (even if it's just filler)
More Users == More Load
More Load == More Hardware
On advertising revenues (i.e. paying the bills), Pete says: I tend to think of advertising as a finicky spastic mentally retarded cat who sometimes wants to jump in my lap and other times wants to hiss at me and run for the litterbox and often walks in circles trying to figure out which of the two it wants, followed by dropping dead with a final thought... "ohhh! food!"
Not Google... not trying to be Google. (Can't afford to think or scale like Google.)
No dedicated staff or operations budget - advertising revenues cover server costs. Any downtime (be it bugs or system failure) means I get up at 3am to fix it.
7. Sluggy.com Rundown
Some Stats
50GB/day (1.5TB of traffic/month on a single virtual box)
13 years of daily comics = 6500 image files (just for the comics)
Artist is frequently late in updating; the system has to handle random unexpected cache flushes & data updates.
Erratic access behavior: today's comic is always popular and its load can be easily mitigated, but the archives may also be hit heavily by new readers or by links from a newer strip to previous storylines. Hard to predict where in 6500+ strips people may be digging from day to day.
"LUMP" Stack (Lighttpd, Ubuntu, MongoDB & Python)
9. Sluggy.com Technology History
1997 - 2000
August 25, 1997: site started, with basic code by Pete's friends/coworkers.
Static HTML generated via midnight cron executing Perl.
No dynamic content - hand-edited HTML for news, navigation, etc.
File format requires globs:
000217a.gif
000217b.gif
000217c.gif
All make up the panels for February 17, 2000. The artist likes & understands this format. Code looks for yyMMdd*.(gif|jpg) via glob and organizes the matches in order.
2000 - original developers split off and formed KeenSpot.com using the same code & navigation concepts.
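The yyMMdd glob convention above is simple enough to sketch in a few lines of Python (the directory layout and function name here are illustrative, not the site's actual code):

```python
# Sketch of the yyMMdd*.(gif|jpg) panel lookup described above.
# Filenames like 000217a.gif, 000217b.gif sort naturally by panel letter.
import glob
import os

def panels_for(date_code, comic_dir="images/comics"):
    """Return the ordered panel files for a yyMMdd date code."""
    matches = []
    for ext in ("gif", "jpg"):
        matches.extend(glob.glob(os.path.join(comic_dir, f"{date_code}*.{ext}")))
    # alphabetical basename order == panel order (a, b, c, ...)
    return sorted(matches, key=os.path.basename)
```

Because the artist already named files this way, "the database" for comics was just the filesystem plus a glob - which is also why glob results became a caching hotspot later.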
10. Sluggy.com Technology History
2002
Rewrite using MySQL & PHP (original goal: "No More HTML Editing").
Scope creep == new features: dynamic headlines, news, predefined templates, dynamic navigation and a "Members Only Club".
First folly - reading MySQL and dynamically generating on each page request. Running dynamic code for essentially static content == FAIL. Disk I/O DoSing... "call datacenter and cross fingers".
Moved to Smarty template caching, generating an on-disk cache file upon first request (expires at midnight).
The next 4 years became hellish, with frequent midnight crashes/failures as readers pound the server for the new comic.
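The render-once, cache-to-disk fix above can be sketched as follows - a minimal Python illustration, not the site's original PHP/Smarty code; the cache path and render callback are hypothetical:

```python
# Minimal render-once, cache-to-disk pattern: serve the cached file if it
# exists, otherwise render, write it, then serve. Keying the filename on
# today's date makes the cache expire naturally at midnight.
import datetime
import os

CACHE_DIR = "/tmp/page_cache"  # hypothetical location

def cached_page(name, render):
    """Return page content, rendering at most once per calendar day."""
    today = datetime.date.today().isoformat()
    path = os.path.join(CACHE_DIR, f"{name}.{today}.html")
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    html = render()  # the expensive dynamic generation
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "w") as f:
        f.write(html)
    return html
```

Note the midnight failure mode this scheme bakes in: every cached page expires at the same instant, so the first requests after midnight all pay the render cost at once - exactly the crash pattern the slide describes.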
11. Sluggy.com Technology History I
2006
Server move (for cost reasons) introduced new architecture problems.
Perceived cost savings pushed a move from SCSI/SAS disks to SATA.
Between template files & comic file globs, the disks couldn't keep up.
Implemented memcached to cache templates off-disk, in memory. Cached glob results (but not files). Cached anything else not likely to change - expiry set to a week (midnight for "index").
Sessions performed poorly both on disk and in MySQL - caching them in memcached helped.
Apache began crushing memory & disk I/O. PHP isn't thread safe; it requires forked Apache workers (children are EXPENSIVE).
12. Sluggy.com Technology History II
2006
Migrated to Lighttpd + FastCGI - I/O & RAM usage of the webserver & PHP became negligible (lots of tweaking of static-file handling, esp. stat caching w/ FAM, & good event handling):
# without raising default fds (1024) we hit a lot of out of fds @ midnight
# File descriptors include network connections *AND* filesystem handles...
server.max-fds = 4096
# Use a well tuned event handler for connection handling
server.event-handler = "linux-sysepoll"
# Don’t shred the disk pulling static files; w/o a custom engine
# like fam stat() runs for *EVERY* static file fetch.
server.stat-cache-engine = "fam"
# Severely limiting keep-alives paired w/ good Expires headers
server.max-keep-alive-requests = 4
server.max-keep-alive-idle = 2
# Ask politely that browsers don’t keep redownloading static content
expire.url = (
"/javascripts/" => "access 2 weeks",
"/stylesheets/" => "access 2 weeks",
"/icons/" => "access 2 weeks",
"/images" => "access 1 weeks",
"/images/comics/" => "access 1 days"
)
13. Sluggy.com Technology History
2009
Rewrote the system in Pylons (Python + SQLAlchemy [MySQL]).
Integrated Beaker caching decorators (templates & code blocks) - simplified adding caching code as needed.
Clean ORM model, light & fast with lots of caching.
Ran significantly better than on PHP - infinitely more tunable, sensible, and sane (not necessarily a knock on PHP - but it was a 10-year-old codebase).
memcached continued to become a big, rickety crutch (cascading failure sucks).
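Decorator-based caching of the kind Beaker provides works roughly like this simplified stand-in (not Beaker's actual API or implementation; the store and expiry handling are illustrative):

```python
# Simplified decorator caching in the spirit of Beaker's cache decorators:
# wrap an expensive function so repeat calls with the same arguments hit an
# in-memory store until the entry expires.
import functools
import time

_store = {}

def cached(expire=60):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args):
            key = (fn.__name__, args)
            hit = _store.get(key)
            if hit and time.time() - hit[0] < expire:
                return hit[1]  # fresh cached value
            value = fn(*args)
            _store[key] = (time.time(), value)
            return value
        return inner
    return wrap
```

The appeal the slide points at: turning an uncached code path into a cached one becomes a one-line decorator instead of hand-written get/set plumbing at every call site.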
14. Sluggy.com Technology History
Aug. 2009
August 3, 2009: Pylons system (v1.0) live with MySQL backend.
A huge amount of our code (as much as 80%) was dedicated to converting UI objects to and from database objects. WTF?... Most initial bugs occurred in this model <-> view layer.
No more forking - Pylons & Python run threaded via SCGI (similar to FastCGI). System resources significantly less taxed by the presentation stack.
15. Sluggy.com Technology History
Aug. 2009
August 14, 2009: v1.10 went live - MySQL replaced by MongoDB (and MongoKit).
Easy migration - MongoKit was quickly dropped in place and queries adjusted to the new model (stuck to the MySQL schema as much as possible).
Maintained all bug fixes on the MySQL branch for a few weeks "just in case".
Performance vastly improved.
Over the next few months, built tools to use MongoDB in place of memcached for caching (mongodb_beaker).
LAMP replaced by LUMP (Lighttpd, Ubuntu, MongoDB & Python).
A few things left in memcached through a combination of "makes sense there" and indolence.
16. Sluggy.com Technology History
Sept. 2009
MongoDB completely obviated the dependency on dedicated physical hardware; when a major issue with our ISP came up, migrated to virtual hosting (Slicehost) instead of physical hardware.
Average system load is 0.05 on a 2G slice. MongoDB uses 1% of CPU on average.
Switchover to the MongoDB version took 2 minutes (ran the data conversion script, deployed the new code tag, bounced the webserver / Pylons app).
No downtime in any way attributable to MongoDB since go-live - now live on MongoDB for over a year.
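The deck doesn't show the data conversion script, but a migration of this shape typically turns each MySQL row (plus its joined child rows) into one MongoDB document. A sketch with invented table and field names:

```python
# Shape of a row-to-document conversion: each MySQL row becomes one
# MongoDB document, with related child rows embedded rather than joined.
# Field names here are made up for illustration.
def strip_to_document(strip_row, panel_rows):
    """Combine a comic-strip row and its panel rows into one document."""
    return {
        "_id": strip_row["id"],
        "date": strip_row["date"],
        "title": strip_row["title"],
        # embed the child rows instead of keeping a separate joined table
        "panels": [
            {"file": p["filename"], "order": p["panel_order"]}
            for p in sorted(panel_rows, key=lambda p: p["panel_order"])
        ],
    }
```

Because the conversion is a pure per-row transformation, the whole cutover can run in one pass - consistent with the two-minute switchover described above.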
17. Outline
1 Introduction
Basic Rundown
Technology History
2 What We Learned
Lessons Learned Over Time
Open Source Code Yielded
Why MongoDB?
3 Show Me The Code!
The Old: MySQL Snippets
4 Final Items
18. memcached can rapidly become a crutch...
Meant to make up for an RDBMS’s shortfalls, it often masks other issues...
memcached can be great for things you can afford to lose.
But it’s not only about what you “can’t afford to lose”. Beware of
cascading failures.
Over-reliance can cause self-DoSing after a crash, reboot, or
accidental flush (even of just one keyset). Lesson learned the
hard way.
See Coders at Work (Siebel) for a great discussion with its
creator, Brad Fitzpatrick (founder of LiveJournal, now a Google
employee), about what led to memcached’s creation.
19. As long as I’m being hyperbolic... I
MongoDB is a bionic leg replacement...
MongoDB’s MMAP storage gives you a “free” MRU cache. Done right
and kept simple, caching on MongoDB is durable, light and fast.
Significantly reduces the amount of scalability-glue code.
No piles of special code to manage caching; if a document falls out
of the memory cache, it is still safely persisted to disk.
The more you can put in memory, the less you beat on your disks.
Especially important on virtual hosting: Be a Good Neighbor.
But... don’t build your MongoDB system like a MySQL system (it’ll
work, but you rapidly lose speed and flexibility)
20. As long as I’m being hyperbolic... II
MongoDB is a bionic leg replacement...
DBRefs should be used sparingly - favor embedded objects (don’t
be afraid to denormalize and duplicate data); autorefs can be even
worse, as they impose a performance penalty.
Flexible schemas are good.
Wasting your time mapping data back and forth between your
presentation layer & RDBMS is not just tedious - it’s error prone.
Object Mappers for MongoDB are fantastic tools but don’t overuse
them - you take a huge flexibility & performance hit.
Use field specifications, query operators, and atomic updates for
maximum effectiveness. MongoDB excels at slicing out specific
parts of a document - especially from embedded/nested fields.
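To make those last points concrete, here is a minimal sketch of the document shapes involved (field names are hypothetical, not Sluggy's actual schema):

```python
# Hypothetical news document: the author is embedded (denormalized)
# rather than referenced via a DBRef.
story = {
    "headline": "LUMP stack live",
    "author": {"username": "bwmcadams", "display_name": "B.W. McAdams"},
    "archived": False,
}

# Field specification (projection): slice out only what you need,
# including a nested field from the embedded author document.
projection = {"headline": 1, "author.display_name": 1}

# Atomic update: flip the archive flag server-side with $set instead of
# a read-modify-write cycle on the whole document.
update = {"$set": {"archived": True}}

# With pymongo these would be used roughly as:
#   db.news.find({"archived": False}, projection)
#   db.news.update({"_id": story_id}, update)
```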
21. Caveats
While there is a lot “wrong” with our first pass implementation,
MongoDB has been consistent, performant and most importantly:
forgiving.
Someone has to enforce a consistent schema - if it’s not your
datastore (as it is with an RDBMS), then your code or your ops
people (or both) have to.
The MongoDB community is vibrant, supportive and consistently
brilliant. Use your community to build the best possible product.
Corollary: If there is not a vibrant, supportive and intelligent
community behind a product you are evaluating. . . run.
Participate: A community that takes and never gives back cannot
thrive. Sharing your knowledge and experience goes a long way.
23. Open Source Code I
mongodb_beaker: Beaker Caching for MongoDB
Open Source caching plugin for the Python Beaker stack.
Uses setuptools plugin entry points.
Switching from memcached to Beaker + MongoDB required a 2
line config file change:
- beaker.session.type = libmemcached
- beaker.session.url = 127.0.0.1:11211
+ beaker.session.type = mongodb
+ beaker.session.url = mongodb://localhost:27017/emergencyPants#sessions
Lots of useful options in MongoDB Beaker.
A few limitations on the Beaker side which need changes in
Beaker itself (manipulable cache data).
Patch incubating for better replica, shard and master/slave
support.
24. Open Source Code I
MongoKit-Pylons: Pylons patches for Python MongORM
Added support to MongoKit to run within a Pylons environment
(threadlocal setup of connection pool)
Adding a globally available thread-safe connection pool to Pylons
was simple. Add 2 lines to config/environment.py:
from mongokit.ext.pylons_env import MongoPylonsEnv
MongoPylonsEnv.init_mongo()
25. Open Source Code II
MongoKit-Pylons: Pylons patches for Python MongORM
Added a few other features to simplify SQLAlchemy migration
setattr / getattr support to allow mongoDoc.field instead of the dict
interface (mongoDoc['field'])
DB Authentication
A few missing corners such as additional datatypes, enhanced
index definitions on-document, group statement shortcuts, etc.
Integrated support for autoreferences (which were mostly a very
bad idea)
Changes merged into MongoKit Trunk (MongoKit has stellar unit
test coverage)
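That attribute-access sugar can be sketched in isolation; this is a simplified stand-in for illustration, not the actual MongoKit patch:

```python
class AttrDoc(dict):
    """Simplified stand-in for MongoKit-style attribute access:
    doc.field works alongside the plain dict interface doc['field']."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so dict
        # internals are unaffected.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Route attribute writes into the underlying dict.
        self[name] = value


doc = AttrDoc(headline='Hello')
doc.archived = False            # attribute write lands in the dict
assert doc['archived'] is False
assert doc.headline == 'Hello'  # dict keys read back as attributes
```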
27. Why MongoDB?
Dynamic Querying
Flexibility (embedded documents, atomic updates, query
operators and server-side javascript.)
CouchDB’s approach appeared obtuse and rather unPythonic (not
an indictment of CouchDB, simply reflective of my knowledge &
opinion at the time)
Tools like MongoKit allowed for easy replacement of existing
MySQL ORM code with something almost identical
FAST
Great Support & Community Available.
Core developers/founders have serious experience in real
scalability, and are active in community.
MongoDB Mailing List
IRC: freenode.net #mongodb
29. MySQL Schema Snippets I
The Admin Table...
CREATE TABLE `admin_users` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `username` varchar(45) collate latin1_general_ci NOT NULL default '',
  `password` char(32) collate latin1_general_ci NOT NULL,
  `display_name` varchar(64) collate latin1_general_ci default NULL,
  `email` varchar(255) collate latin1_general_ci NOT NULL default '',
  `avatar` varchar(255) collate latin1_general_ci default NULL,
  `last_ip` int(10) unsigned default NULL,
  `last_login_date` timestamp NOT NULL default '0000-00-00 00:00:00',
  `disabled` tinyint(1) default '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `UNIQUE` (`username`),
  UNIQUE KEY `admin_users_uniq` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1
  COLLATE=latin1_general_ci PACK_KEYS=1;
30. MySQL Schema Snippets I
The News Table...
CREATE TABLE `news` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `author_id` int(11) unsigned NOT NULL,
  `start_date` date NOT NULL,
  `end_date` date default NULL,
  `headline` varchar(255) NOT NULL,
  `story` text NOT NULL,
  `archive` tinyint(1) default '0',
  PRIMARY KEY (`id`),
  KEY `news_author` (`author_id`),
  CONSTRAINT `news_author` FOREIGN KEY (`author_id`) REFERENCES `admin_users` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=201 DEFAULT CHARSET=latin1;
31. SQLAlchemy Model I
The Admin Object
class AdminUser(ORMObject):
    pass

t_admin_users = Table('admin_users', meta.metadata,
    Column('id', Integer, primary_key=True),
    Column('username', Unicode(45), nullable=False, unique=True),
    Column('password', Unicode(32), nullable=False),
    Column('display_name', Unicode(64)),  # Should be unique?
    Column('email', Unicode(255), nullable=False, unique=True),
    Column('last_ip', IPV4Address, nullable=True),
    Column('last_login_date', MSTimeStamp, nullable=False),
    Column('avatar', Unicode(255), nullable=True),
    Column('disabled', Boolean, default=False),
    mysql_engine='InnoDB'
)

mapper(AdminUser, t_admin_users)
32. SQLAlchemy Model I
The News Object
class NewsStory(ORMObject):
    pass

t_news = Table('news', meta.metadata,
    Column('id', Integer, primary_key=True),
    Column('author_id', Integer, ForeignKey('admin_users.id'), nullable=False),
    Column('start_date', Date, nullable=False),
    Column('end_date', Date),
    Column('headline', Unicode(255), nullable=False),
    Column('story', Unicode, nullable=False),
    Column('archive', Boolean, default=False),
    mysql_engine='InnoDB'
)

mapper(NewsStory, t_news, properties={
    'author': relation(AdminUser, backref='news_stories')
})
38. MongoKit Model II
News
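The model listing itself is missing from this slide; judging from the migration script a few slides later, the NewsStory model presumably covered fields along these lines (a hypothetical reconstruction, not the original code):

```python
import datetime

# Hypothetical reconstruction of the NewsStory document shape, inferred
# from the MySQL -> Mongo migration script later in the deck.
news_structure = {
    'author': dict,             # embedded AdminUser document, not a DBRef
    'headline': str,
    'story': str,
    'start_date': datetime.date,
    'end_date': datetime.date,  # None for open-ended stories
    'archived': bool,
}

# In MongoKit this dict would live in a Document subclass's `structure`
# attribute; here it simply documents the shape.
```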
39. MongoKit Model
PayPal: OH THE... that's not so bad.
40. MongoKit Model I
PayPal: “Instead of using the ORM...”
ipn_db = None
try:
    dbh = MongoPylonsEnv.mongo_conn()[MongoPylonsEnv.get_default_db()]['defenders_paypal_ipn']
    ipn_id = dbh.insert(dict(request.POST.items()), safe=True)
    ipn_db = dbh.find_one({'_id': ipn_id})
    if not ipn_db:
        raise Exception('could not lookup, post-insert')
except Exception, e:
    log.exception("Paypal IPN Error: Unable to commit IPN data to "
                  "Database: %s" % repr(e))
    raise DefendersIPNException(repr(e))
41. MySQL -> Mongo Migration I
Admin Migration
db = mongoModel.AdminUser._get_connection()
conn = db.connection()
conn.drop_database('emergencyPants')

admins = {}
for user in meta.Session.query(AdminUser).all():
    _admin = mongoModel.AdminUser(doc={
        'username': user.username,
        'password': user.password,
        'avatar': user.avatar,
        'disabled': user.disabled,
        'display_name': user.display_name,
        'email': user.email,
        'last_ip': unicode(user.last_ip),
        'last_login_date': user.last_login_date,
    }).save()
    admins[user.id] = _admin

mongoModel.AdminUser.get_collection().ensure_index('password',
    direction=ASCENDING, unique=True)
mongoModel.AdminUser.get_collection().ensure_index(
    [('username', ASCENDING),
     ('password', ASCENDING)])
42. MySQL -> Mongo Migration I
News Migration
print "Importing News Stories."
for story in meta.Session.query(NewsStory).all():
    _story = mongoModel.NewsStory(doc={
        'author': admins[story.author_id],
        'headline': story.headline,
        'story': story.story,
        'start_date': convert_date(story.start_date),
        'end_date': convert_date(story.end_date),
        'archived': story.archive
    }).save()

print "Setting up news story indices."
mongoModel.NewsStory.get_collection().ensure_index(
    'archived', direction=ASCENDING)
mongoModel.NewsStory.get_collection().ensure_index(
    [('start_date', ASCENDING),
     ('end_date', ASCENDING)])
43. Looking for news I
In Mongo...
Before, with MySQL/SQLAlchemy:

def _get_news(date):
    news = meta.Session.query(NewsStory).filter(
        and_(NewsStory.archive == False,
             NewsStory.start_date <= date,
             or_(NewsStory.end_date == None,
                 NewsStory.end_date > date))
    ).order_by(NewsStory.start_date.desc()).all()
    return news

After, with MongoKit:

def _get_news(date):
    news = NewsStory.all({
        'archived': False,
        'start_date': {'$lte': c._today}
    }).where(
        'this.end_date == null || this.end_date >= new Date()'
    ).sort('start_date', -1)
    return news
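The `$where` clause above runs server-side JavaScript against every candidate document. Once MongoDB's `$or` operator became available (around 1.6), the same null-or-still-active test could be expressed with plain query operators; a sketch of that alternative, not what the site actually ran:

```python
import datetime

def news_query(today):
    """Build the same news filter with plain query operators instead of
    $where: not archived, already started, and either open-ended or not
    yet expired."""
    return {
        'archived': False,
        'start_date': {'$lte': today},
        '$or': [
            {'end_date': None},             # open-ended story
            {'end_date': {'$gte': today}},  # still running
        ],
    }

q = news_query(datetime.date(2010, 9, 20))
```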
44. Questions?
Contact Info
Contact Me
twitter: @rit
email: bwmcadams@gmail.com
bitbucket: http://hg.evilmonkeylabs.net (For MongoKit &
Beaker code)
github: http://github.com/bwmcadams (Where most of my
newer code lives)
Pressing Questions?
IRC - freenode.net #mongodb
MongoDB Users List -
http://groups.google.com/group/mongodb-user
Mongo Python Language Center -
http://api.mongodb.org/python/index.html (Tutorial, API Docs
and links to third party toolkits)