Wordnik migrated their live application from MySQL to MongoDB to address scaling issues. They moved over 5 billion documents totaling over 1.2 TB of data with zero downtime. The migration involved setting up MongoDB infrastructure, designing the data model and software to match their existing object model, migrating the data, and optimizing performance of the new system. They achieved insert rates of over 100,000 documents per second during the migration process and saw read speeds increase to 250,000 documents per second after completing the move to MongoDB.
MongoDB 3.0, Wired Tiger, and the era of pluggable storage engines
With MongoDB 3.0 the Wired Tiger storage engine will be included. In addition, 3rd party pluggable storage engines are possible as well. Kenny will present some performance benchmarks, show typical configuration options, and help attendees make sense of these new changes and how the effect MongoDB workloads. He will detail the various components of the Wired Tiger engine and the impact they make on overall performance.
Kenny will share benchmarks, code and general tunables for the Wired Tiger engine and more.
Understanding and tuning WiredTiger, the new high performance database engine...Ontico
MongoDB 3.0 introduced the concept of different storage engine. The new engine known as WiredTiger introduces document level MVCC locking, compression and a choice between Btree or LSM indexes. In this talk you will learn about the storage engine architecture and specifically WiredTiger, and how to tune and monitor it for best performance.
MongoDB 3.0 представил новый концепт движков хранения. Новый движок известен как WiredTiger и предоставляет новый уровень документов MVCC фикс, компрессию и выбор между Btree или индексами LSM. В этом докладе вы поймете, как тюнить и мониторить архитектуры движка базы данных, а точнее WiredTiger для получения максимальной производительности.
A presentation on the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally details on Wordnik's speed and stability are covered as well as how NoSQL technologies have changed the way Wordnik scales.
MongoDB 3.0, Wired Tiger, and the era of pluggable storage engines
With MongoDB 3.0 the Wired Tiger storage engine will be included. In addition, 3rd party pluggable storage engines are possible as well. Kenny will present some performance benchmarks, show typical configuration options, and help attendees make sense of these new changes and how the effect MongoDB workloads. He will detail the various components of the Wired Tiger engine and the impact they make on overall performance.
Kenny will share benchmarks, code and general tunables for the Wired Tiger engine and more.
Understanding and tuning WiredTiger, the new high performance database engine...Ontico
MongoDB 3.0 introduced the concept of different storage engine. The new engine known as WiredTiger introduces document level MVCC locking, compression and a choice between Btree or LSM indexes. In this talk you will learn about the storage engine architecture and specifically WiredTiger, and how to tune and monitor it for best performance.
MongoDB 3.0 представил новый концепт движков хранения. Новый движок известен как WiredTiger и предоставляет новый уровень документов MVCC фикс, компрессию и выбор между Btree или индексами LSM. В этом докладе вы поймете, как тюнить и мониторить архитектуры движка базы данных, а точнее WiredTiger для получения максимальной производительности.
A presentation on the selection criteria, testing + evaluation and successful, zero-downtime migration to MongoDB. Additionally details on Wordnik's speed and stability are covered as well as how NoSQL technologies have changed the way Wordnik scales.
We went over what Big Data is and it's value. This talk will cover the details of Elasticsearch, a Big Data solution. Elasticsearch is an NoSQL-backed search engine using a HDFS-based filesystem.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
Living with SQL and NoSQL at craigslist, a Pragmatic ApproachJeremy Zawodny
From the 2012 Percona Live MySQL Conference in Santa Clara, CA.
Craigslist uses a variety of data storage systems in its backend systems: in-memory, SQL, and NoSQL. This talk is an overview of how craigslist works with a focus on the data storage and management choices that were made in each of its major subsystems. These include MySQL, memcached, Redis, MongoDB, Sphinx, and the filesystem. Special attention will be paid to the benefits and tradeoffs associated with choosing from the various popular data storage systems, including long-term viability, support, and ease of integration.
How to migrate your existing MongoDB and Cassandra Apps to Azure Cosmos DBMicrosoft Tech Community
Bring your Mongo DB and Cassandra applications to Azure Cosmos DB and benefit turnkey global distribution, guaranteed low latency for cloud scale. Learn how easy it is to migrate your existing NoSQL applications to Azure Cosmos DB by using the MongoDB API and Cassandra API. In this sessions you will find out about the tools used in the process, see the code that leverages Azure Cosmos DB, and learn about techniques for working around the differences between the two products.
These are the slides from my talk at the 2012 Sphinx Search Day in Santa Clara, California. It provides a high-level picture of where Sphinx is used at craigslist, a bit of history, issues, and future work.
Dev Jumpstart: Build Your First App with MongoDBMongoDB
New to MongoDB? This talk will introduce the philosophy and features of MongoDB. We’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app. We’ll cover inserting, updating, and querying the database of books. This session will jumpstart your knowledge of MongoDB development, providing you with context for the rest of the day's content.
In this session Sam Weaver, Product Manager at MongoDB will introduce MongoDB Compass, a new tool developed by MongoDB that allows you to easily visualize your MongoDB schema and build queries graphically. The session will include a practical demonstration of Compass as well as a discussion of its features and applications.
We went over what Big Data is and it's value. This talk will cover the details of Elasticsearch, a Big Data solution. Elasticsearch is an NoSQL-backed search engine using a HDFS-based filesystem.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
Living with SQL and NoSQL at craigslist, a Pragmatic ApproachJeremy Zawodny
From the 2012 Percona Live MySQL Conference in Santa Clara, CA.
Craigslist uses a variety of data storage systems in its backend systems: in-memory, SQL, and NoSQL. This talk is an overview of how craigslist works with a focus on the data storage and management choices that were made in each of its major subsystems. These include MySQL, memcached, Redis, MongoDB, Sphinx, and the filesystem. Special attention will be paid to the benefits and tradeoffs associated with choosing from the various popular data storage systems, including long-term viability, support, and ease of integration.
How to migrate your existing MongoDB and Cassandra Apps to Azure Cosmos DBMicrosoft Tech Community
Bring your Mongo DB and Cassandra applications to Azure Cosmos DB and benefit turnkey global distribution, guaranteed low latency for cloud scale. Learn how easy it is to migrate your existing NoSQL applications to Azure Cosmos DB by using the MongoDB API and Cassandra API. In this sessions you will find out about the tools used in the process, see the code that leverages Azure Cosmos DB, and learn about techniques for working around the differences between the two products.
These are the slides from my talk at the 2012 Sphinx Search Day in Santa Clara, California. It provides a high-level picture of where Sphinx is used at craigslist, a bit of history, issues, and future work.
Dev Jumpstart: Build Your First App with MongoDBMongoDB
New to MongoDB? This talk will introduce the philosophy and features of MongoDB. We’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app. We’ll cover inserting, updating, and querying the database of books. This session will jumpstart your knowledge of MongoDB development, providing you with context for the rest of the day's content.
In this session Sam Weaver, Product Manager at MongoDB will introduce MongoDB Compass, a new tool developed by MongoDB that allows you to easily visualize your MongoDB schema and build queries graphically. The session will include a practical demonstration of Compass as well as a discussion of its features and applications.
This talk will cover lessons learned at Community Engine regarding MongoDB, including: why we moved away from an Hybrid solution using SQL and MongoDB; an outline of the technologies and what we learned using MongoDB on Amazon Web Services; the MongoDB C# driver; MongoDB with SOLR for Full Text Search; how we do migration, deployment and more.
A fotopedia presentation made at the MongoDay 2012 in Paris at Xebia Office.
Talk by Pierre Baillet and Mathieu Poumeyrol.
French Article about the presentation:
http://www.touilleur-express.fr/2012/02/06/mongodb-retour-sur-experience-chez-fotopedia/
Video to come.
Experiences using CouchDB inside Microsoft's Azure teamBrian Benz
Co-presented with Will Perry (@willpe). Real-world experiences using CouchDB inside Microsoft, and also how to get started with CouchDB on Microsoft Azure.
SQL vs NoSQL, an experiment with MongoDBMarco Segato
A simple experiment with MongoDB compared to Oracle classic RDBMS database: what are NoSQL databases, when to use them, why to choose MongoDB and how we can play with it.
Ever wonder how some applications are built? Ever wonder how to combine components of the Windows Azure platform? Stop wondering and learn how we’ve built MyGet.org, a multi-tenant software-as-a-service. In this session we’ll discuss architecture, commands, events, access control, multi tenancy and how to mix and match those things together. Learn about the growing pains and misconceptions we had on the Windows Azure platform. The result just may be a reliable, cost-effective solution that scales.
Topics covered :
- What is Meteor
- What is inside
- What is reactivity
- Reactivity in Meteor
- DDP
- Minimongo
- To use or Not to use
- File Structure
Similar to From MySQL to MongoDB at Wordnik (Tony Tam) (20)
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. What is Wordnik Project to track language like GPS for English Dictionary is a road block to the language Roughly 200 new words created daily Language is not static Capture information about all words Meaning is often undefined in traditional sense Machines can determine meaning through analysis Needs LOTS of data
3. Why should You care Every Developer can use a Robust Language API! Wordnik migrated to MongoDB > 5 Billion documents > 1.2 TB Zero application downtime Learn from our Experience
4. Wordnik Not just a website! But we have one Launched Wordnik entirely on MySQL Hit road bumps with insert speed ~4B rows on MyISAMtables Tables locked for 10’s of seconds during inserts But we need more data! Created elaborate update schemes to work around it Lost lots of sleep babysitting servers while researching LT solution
5. Wordnik + MongoDB What are our storage needs? Database vs. Application Logic No PK/FK constraints No Stored Procedures Consistency? Lots of R&D Tried most all noSQL solutions
6. Migrating Storage Engines Many parts to this effort Setup & Administration Software Design Optimization Many types of data at Wordnik Corpus Structured HierarchicalData User Data Migrated #1 & #2
7. Server Infrastructure Wordnik is Heavily Read-only Master / Slave deployment Looking at replica pairs MongoDB loves system resources Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out) Memory + Disk = Happy Mongo Many X the disk space of MySQL Easy pill to swallow until…
8. Server Infrastructure Physical Hardware 2 x 4 core CPU, 32gb RAM, FC SAN Had bad luck on VMs (you might not) Disk speed => performance
9. Software Design Two distinct use cases for MongoDB Identical structure, different storage engine Same underlying objects, same storage fidelity (largelykey/value) Hierarchical data structure Same underlying objects, document-oriented storage
10. Software Design Create BasicDBObjects from POJOs and used collection methods BasicDBObjectdbo = new BasicDBObject("sentence",s.getSentence()) .append("rating",s.getRating()).append(...); ID Generation to manage unique _ID values Analogous to MySQL AutoIncrement behavior Compatible with MySQL Ids (more later) dbo.append("_ID", getId()); collection.save(dbo); Implemented all CRUD methods in DAO Swappable between MongoDB and MySQL at runtime
11. Software Design Key-Value storage use case Easy as implementing new DAOs SentenceHandlerh = new MongoDBSentenceHandler(); Save methods construct BasicDBObject and call save() on collection Implement same interface Same methods against DAO between MySQL and MongoDB versions Data Abstraction 101
12. Software Design What about bulk inserts? FAF Queued approach Add objects to queue, return to caller Every X seconds, process queue All objects from same collection are appended to a single List<DBObject> Call collection.insert(…) before 2M characters Reduces network overhead Very fast inserts
13. Software Design Hierarchical Data done more elegantly Wordnik Dictionary Model Java POJOs already had JAXB annotations Part of public REST api Used Mysql 12+ tables 13 DAOs 2500 lines of code 50 requests/second uncached Memcache needed to maintain reasonable speed
15. Software Design MongoDB’s Document Storage let us… Turn the Objects into JSON via Jackson Mapper (fasterxml.com) Call save Support all fetch types, enhanced filters 1000 requests / second No explicit caching No less scary code
17. Migrating Data Migrating => existing data logic Use logic to select DAOs appropriately Read from old, write with new Great system test for MongoDB SentenceHandlermysqlSh = new MySQLSentenceHandler(); SentenceHandlermongoSh = new MongoDbSentenceHandler(); while(hasMoreData){ mongoSh.asyncWrite(mysqlSh.next()); ... }
18. Migrating Data Wordnik moved 5 billion rows from MySQL Sustained 100,000 inserts/second Migration tool was CPU bound ID generation logic, among other Wordnik reads MongoDB fast Read + create java objects @ 250k/second (!)
19. Going live to Production Choose your use case carefully if migrating incrementally Scary no matter what Test your perf monitoring system first! Use your DAOs from migration Turn on MongoDB on one server, monitor, tune (rollback, repeat) Full switch over when comfortable
20. Going live to Production Really? SentenceHandlerh = null; if(useMongoDb){ h = new MongoDbSentenceHandler(); } else{ h = new MySQLDbSentenceHandler(); } return h.find(...);
21. Optimizing Performance Home-grown connection pooling Master only ConnectionManager.getReadWriteConnection() Slave only ConnectionManager.getReadOnlyConnection() Round-robin all servers, bias on slaves ConnectionManager.getConnection()
22. Optimizing Performance Caching Had complex logic to handle cache invalidation Out-of-process caches are not free MongoDB loves your RAM Let it do your LRU cache (it will anyway) Hardware Do not skimp on your disk or RAM Indexes Schema-less design Even if no values in any document, needs to read document schema to check
23. Optimizing Performance Disk space Schemaless => schema per document (row) Choose your mappings wisely ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
25. Other Tips Data Types Use caution when changing DBObjectobj = cur.next(); long id = (Long) obj.get(“IWasAnIntOnce”) Attribute names Don’t change w/o migrating existing data! WTFDMDG????
26. What’s next? GridFS Store audio files on disk Requires clustered file system for shared access Capped Collections (rolling out this week) UGC from MySQL => MongoDB Beg/Bribe 10gen for some Features