This document discusses MongoDB sharding as a case study for scaling MongoDB. It provides background on CIGNEX Datamatics and their big data analytics practice. It then describes a use case of 7 million users accessing digital assets across 8 devices each. It recommends MongoDB due to its flexibility and performance. The solution involves sharding across multiple MongoDB nodes to distribute the data and handle the high volume of concurrent requests. Benchmarking shows that sharding significantly improves the performance of inserts and updates over a non-sharded architecture. The key takeaway is that sharding is very effective but requires careful planning, benchmarking, and choice of shard key.
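Why the choice of shard key matters can be illustrated with a small simulation. The sketch below is not taken from the case study: the shard count, the user counts, and both routing functions are assumptions made for illustration, and MD5 is only a stand-in for whatever hashing a real cluster applies. It compares a hashed key with a range-partitioned, monotonically increasing key when only the newest users are writing.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size, not a figure from the case study


def hashed_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a key to a shard by hashing it (simplified stand-in for hashed sharding)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def ranged_shard(user_id: int, max_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a numeric key to a shard by splitting [0, max_id) into equal ranges."""
    return min(user_id * num_shards // max_id, num_shards - 1)


# Assumed workload: the newest 10,000 of 100,000 users are writing at the same time.
total_users = 100_000
recent_users = range(90_000, 100_000)

hashed = Counter(hashed_shard(f"user-{u}") for u in recent_users)
ranged = Counter(ranged_shard(u, total_users) for u in recent_users)

print("hashed key:", dict(sorted(hashed.items())))
print("ranged key:", dict(sorted(ranged.items())))
```

With the ranged key every recent write lands on the last shard, while the hashed key spreads the same writes across all shards. This is the kind of effect that benchmarking a candidate shard key is meant to reveal before going to production.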
MMS - Monitoring, backup and management at a single click | Matias Cascallares
MongoDB Management Service (MMS) makes operations effortless, reducing complicated tasks in big deployments to a couple of clicks. You can monitor, back up and manage your replica sets and sharded clusters through the MMS interface. In this presentation we are going to explore how to set up, use and get the best out of MMS.
MongoDB 2.6 is the biggest MongoDB release ever. In this presentation you are going to explore which features, improvements and capabilities were added to the latest version and how you can smoothly upgrade your deployments.
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo... | Prasoon Kumar
MongoDB is a leading NoSQL database. It is a horizontally scalable document datastore. In this introduction, given at the Dr Dobbs Conference in Bangalore and Pune in April 2014, I show schema design with an example blog application and Python code snippets. I delivered the same talk at the maiden MongoDB Evening events in Delhi and Gurgaon in May 2014.
When constructing a data model for your MongoDB collection for a CMS, there are various options you can choose from, each of which has its strengths and weaknesses. The three basic patterns are:
1. Store each comment in its own document.
2. Embed all comments in the “parent” document.
3. Use a hybrid design that stores comments separately from the “parent” but aggregates them into a small number of documents, each containing many comments.
Code samples and wiki documentation are available at https://github.com/prasoonk/mycms_mongodb/wiki.
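The three patterns can be sketched with plain Python dictionaries standing in for MongoDB documents. This is a minimal illustration and is not taken from the linked repository: the field names (`post_id`, `page`, `comments`), the helper names, and the bucket size are all hypothetical.

```python
BUCKET_SIZE = 3  # hypothetical; a real design would size buckets to the workload

# Seven sample comments for one post (made-up data).
comments = [{"author": f"user{i}", "text": f"comment {i}"} for i in range(7)]


def one_doc_per_comment(post_id: str, items: list) -> list:
    """Pattern 1: each comment is its own document that references the post."""
    return [{"post_id": post_id, **c} for c in items]


def embedded(post_id: str, items: list) -> dict:
    """Pattern 2: all comments are embedded in the parent document."""
    return {"_id": post_id, "comments": list(items)}


def bucketed(post_id: str, items: list, size: int = BUCKET_SIZE) -> list:
    """Pattern 3 (hybrid): comments live apart from the parent, grouped into pages."""
    return [
        {"post_id": post_id, "page": n, "comments": items[i:i + size]}
        for n, i in enumerate(range(0, len(items), size))
    ]


separate = one_doc_per_comment("post-1", comments)
parent = embedded("post-1", comments)
buckets = bucketed("post-1", comments)

print(len(separate), "comment documents")
print(len(parent["comments"]), "comments embedded in one parent document")
print(len(buckets), "bucket documents")
```

The trade-off shows up in the shapes: pattern 1 produces many small documents, pattern 2 produces one parent document that grows with every comment, and pattern 3 bounds both the document count and the size of each document.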
NoSQL datastores fall under the following categories: key-value stores, document databases, column-family stores and graph databases. The traditional TPC-* tests are not sufficient for these heterogeneous database systems. MongoDB, CouchDB, Cassandra, HBase, Memcached, etc. each belong to one of these four families, and a common workload can be generated with YCSB to simulate your use case and benchmark them.
This presentation contains a preview of the upcoming MongoDB 3.2 release, in which we explore the new storage engines, aggregation framework enhancements and utility features like document validation and partial indexes.
MongoDB Ops Manager is the easiest way to manage, monitor and operationalize your MongoDB footprint across your enterprise. Ops Manager automates key operations such as deployments, scaling, upgrades, and backups, all with the click of a button and integration with your favorite tools. It also provides the ability to monitor and alert on dozens of platform-specific metrics. In this webinar, we'll cover the components of Ops Manager, as well as how it integrates and accelerates your use of MongoDB.
MongoDB has been conceived for the cloud age. Making sure that MongoDB is compatible and performant across cloud providers is mandatory to achieve complete integration with platforms and systems. Azure is one of the biggest IaaS platforms available and is very popular amongst developers who work on the Microsoft stack.
Hype, buzzword, threat; however you want to characterize it, the Internet of Things (IoT) is here.
IoT scenarios that were hypothetical only a few years ago are real today. Still thinking along the lines of fleet management and temperature measurements? You’re out. Endless possibilities of IoT applications are surfacing every day, from the connected cow (huh?) to things that monitor and analyze your daily life (really?).
In this webinar, we will discuss the architecture of IoT data management solutions and the challenges that arise. We will explore how MongoDB features provide solutions to those problems. Time permitting, we will demonstrate an IoT Cloud service built on top of MongoDB.
New generations of database technologies are allowing organizations to build applications never before possible, at a speed and scale that were previously unimaginable. MongoDB is the fastest growing database on the planet, and the new 3.2 release will bring the benefits of modern database architectures to an ever broader range of applications and users.
Shift: Real World Migration from MongoDB to Cassandra | DataStax
Presentation on SHIFT's migration from MongoDB to Cassandra. Topics will include reasons behind choosing to move to Cassandra, zero downtime migration strategy, data modeling patterns, and the benefits of using CQL3.
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present... | MongoDB
This session will be a case study of eBay’s experience running MongoDB for project Zoom, in which eBay stores all media metadata for the site. This includes references to pictures of every item for sale on eBay. This cluster is eBay's first MongoDB installation on the platform and is a mission critical application. Yuri Finkelstein, an Enterprise Architect on the team, will provide a technical overview of the project and its underlying architecture.
In this session, Sam Weaver, Product Manager at MongoDB, will introduce MongoDB Compass, a new tool developed by MongoDB that allows you to easily visualize your MongoDB schema and build queries graphically. The session will include a practical demonstration of Compass as well as a discussion of its features and applications.
New generations of database technologies are allowing organizations to build applications never before possible, at a speed and scale that were previously unimaginable. MongoDB is the fastest-growing database in the world. The new 3.2 release brings the benefits of modern database architectures to an ever broader range of applications and users.
MongoDB .local Bengaluru 2019: MongoDB Atlas Data Lake Technical Deep Dive | MongoDB
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long term, archival data in cost-effective storage like S3, GCP, and Azure Blobs. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
MongoDB Days Silicon Valley: A Technical Introduction to WiredTiger | MongoDB
Presented by Osmar Olivo, Product Manager, MongoDB
Experience level: Introductory
WiredTiger is MongoDB's first officially supported pluggable storage engine as well as the new default engine in 3.2. It exposes several new features and configuration options. This talk will highlight the major differences between the MMAPv1 and WiredTiger storage engines, including concurrency, compression, and caching.
Webinar: Simplifying the Database Experience with MongoDB Atlas | MongoDB
MongoDB Atlas is our database as a service for MongoDB. In this webinar you’ll learn how it provides all of the features of MongoDB, without all of the operational heavy lifting, and all through a pay-as-you-go model billed on an hourly basis.
Jane Uyvova
Senior Solutions Architect, MongoDB
March 21, 2017
MongoDB Evenings San Francisco
Learn how easy it is to set up, operate, and scale your MongoDB deployments in the cloud with MongoDB Atlas.
MongoDB is a leading database technology that combines the foundations of RDBMS with the innovations of NoSQL, allowing organizations to simultaneously boost productivity and lower TCO.
MongoDB Enterprise Advanced is a finely-tuned package of advanced software, enterprise-grade support, and other services designed to accelerate your success with MongoDB in every stage of your app lifecycle, from early development to the scale-out of mission-critical production environments.
With the release of 3.2, MongoDB Enterprise Advanced now includes:
- MongoDB Ops Manager 2.0
- MongoDB Compass, the MongoDB GUI
- MongoDB Connector for Business Intelligence
- Encrypted Storage Engine
- In-Memory Storage Engine (beta)
Attend this webinar to learn how MongoDB Enterprise Advanced can help you get to market faster and de-risk your mission critical deployments.
Presented by Claudius Li, Solutions Architect at MongoDB, at MongoDB Evenings New England 2017.
MongoDB Atlas is the premier database as a service offering. Find out how MongoDB Atlas can help your team to deploy more easily, develop faster and easily manage deployment, maintenance, upgrades and expansions. We will also demonstrate some of the key features and tools that come with MongoDB Atlas.
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ... | Fwdays
We will start by understanding how real-time analytics can be implemented on enterprise-level infrastructure, then go into the details and discover how different business intelligence cases can be handled in real time on streaming data. We will cover different stream data processing architectures and discuss their benefits and disadvantages. With live demos, I'll show how to build a Fast Data Platform in the Azure cloud using open source projects: Apache Kafka, Apache Cassandra, Mesos. I'll also show examples and code from real projects.
Disney+ Hotstar: Scaling NoSQL for Millions of Video On-Demand Users | ScyllaDB
Disney+ Hotstar is the fastest growing branch of Disney+. Join Disney+ Hotstar Architect Vamsi Subhash and senior data engineer Balakrishnan Kaliyamoorthy to learn…
- How Disney+ Hotstar architected their systems to handle massive data loads
- Why they chose to replace both Redis and Elasticsearch
- Their requirements for massively scalable data infrastructure and evolving data models
- How they migrated their data to Scylla Cloud, ScyllaDB’s fully managed NoSQL database-as-a-service, without suffering downtime
Introducing MongoDB Stitch, Backend-as-a-Service from MongoDB | MongoDB
Watch this webinar to learn about our new Backend as a Service (BaaS) – MongoDB Stitch.
MongoDB Stitch lets developers focus on building applications rather than on managing data manipulation code, service integration, or backend infrastructure. Whether you’re just starting up and want a fully managed backend as a service, or you’re part of an enterprise and want to expose existing MongoDB data to new applications, Stitch lets you focus on building the app users want, not on writing boilerplate backend logic.
This webinar will cover the what, why, and how of MongoDB Stitch. We’ll cover everything from the features it provides to the architecture that makes it possible. By the end of the session, you should understand how Stitch can kickstart your new project or take your existing application to the next level.
Attendees will learn:
- The basics of MongoDB Stitch and how to use it for new projects or to expose existing data to new applications
- How to control what data and services individual users can access
- How to integrate your favorite services with your MongoDB application without writing extra code
Learn about the various approaches to sharding your data with MongoDB. This presentation will help you answer questions such as when to shard and how to choose a shard key.
Webinar: Faster Big Data Analytics with MongoDB | MongoDB
Learn how to leverage MongoDB and Big Data technologies to derive rich business insight and build high performance business intelligence platforms. This presentation includes:
- Uncovering Opportunities with Big Data analytics
- Challenges of real-time data processing
- Best practices for performance optimization
- Real world case study
This presentation was given in partnership with CIGNEX Datamatics.
Best Practices & Lessons Learned from Deployment of PostgreSQL | EDB
This talk will review best practices and lessons learned from working with large and mid-size companies on their deployment of PostgreSQL. We will explore the practices that helped industry leaders move through the stages of PostgreSQL adoption and get as much value out of their deployment as possible without incurring undue risk.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub-50 ms query responses for hundreds of billions of data points. From granular DNS traffic data to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements that led them to choose DSE as their go-to Big Data solution, the path that led to Spark, and the lessons they have learned in the process.
Designing your SaaS Database for Scale with Postgres | Ozgun Erdogan
If you’re building a SaaS application, you probably already have the notion of tenancy built in your data model. Typically, most information relates to tenants / customers / accounts and your database tables capture this natural relation.
With smaller amounts of data, it’s easy to throw more hardware at the problem and scale up your database. As these tables grow however, you need to think about ways to scale your multi-tenant (B2B) database across dozens or hundreds of machines.
In this talk, we're first going to talk about motivations behind scaling your SaaS (multi-tenant) database and several heuristics we found helpful on deciding when to scale. We'll then describe three design patterns that are common in scaling SaaS databases: (1) Create one database per tenant, (2) Create one schema per tenant, and (3) Have all tenants share the same table(s). Next, we'll highlight the tradeoffs involved with each design pattern and focus on one pattern that scales to hundreds of thousands of tenants. We'll also share an example architecture from the industry that describes this pattern in more detail.
Last, we'll talk about key PostgreSQL properties, such as semi-structured data types, that make building multi-tenant applications easy. We'll also mention Citus as a method to scale out your multi-tenant database. We'll conclude by answering frequently asked questions on multi-tenant databases and Q&A.
Webinar: High Performance MongoDB Applications with IBM POWER8 | MongoDB
Innovative companies are building Internet of Things, mobile, content management, single view, and big data apps on top of MongoDB. In this session, we'll explore how the IBM POWER8 platform brings new levels of performance and ease of configuration to these solutions which already benefit from easier and faster design and development using MongoDB.
Learn more about the tools, techniques and technologies for working productively with data at any scale. This presentation introduces the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Jon Einkauf, Senior Product Manager, Elastic MapReduce, AWS
Alan Priestley, Marketing Manager, Intel and Bob Harris, CTO, Channel 4
AWS Summit Berlin 2013 - Big Data Analytics | AWS Germany
Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi... | Data Con LA
Spark is in-memory, Redis is in-memory. The Spark-Redis connector gives Spark access to Redis' data structures as RDDs. Redis, with its blazing fast performance and optimized in-memory data structures, reduces Spark processing time by up to 98%. In this talk, Dave will share the top use cases for Spark-Redis such as time-series, recommendations and real-time bid management.
Removing Uninteresting Bytes in Software Fuzzing | Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are the slides of the talk given at the IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf | Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI | Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
- People: The contributors and community that have supported Albumentations.
- Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
- Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
- Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
- Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
- Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
- Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Mind map of terminologies used in context of Generative AI
Cignex mongodb-sharding-mongodbdays
1. CIGNEX Datamatics Confidential www.cignex.com
Scaling MongoDB with Sharding – A Case Study
Presented by: Nikhil Naib
Title: Lead Consultant – Big Data
For MongoDB and CIGNEX Datamatics Use Only
2. Who We Are?
• Since 2000, delivering solutions using Open Source technologies to
– Address business goals
– Increase business velocity
– Lower the cost of doing business
– Gain competitive advantage
• Dramatically reduce Total Cost of Ownership (TCO) & deployment time of IT solutions
400+ Implementations | 450+ Experts | 200+ Integrations | 13 Books | 5000+ Community Contributions
Offices: America | India | UK | Europe | Singapore | Australia
Portal Solutions | Content Solutions | Big Data Analytics Solutions
3. Our Big Data Analytics Practice
Team Size: 110+ | Projects: 10+
• 20+ Big Data, 100+ Analytics & DW/BI
• Partnership – MongoDB, Cloudera, IBM
• Technical expertise – MongoDB, Hadoop, Neo4j, Solr, Pentaho, Talend, Cognos, Business Objects, Tableau, Jasper Reports
• Research & Analytics division with data scientists
• Connectors/Accelerators, Frameworks
• BIGArchive – Enterprise Scale Archival
• Liferay MongoDB Store
• Drupal MongoDB Connector
Big Data Partners
Business Intelligence Expertise
4. Agenda
• Use Case & Database Requirements
• Why MongoDB?
• Solution
• To Shard Or Not To Shard
• Scaling with Sharding
– Sharding Basics
– Architecture and Hardware Sizing
– Sharding – Choosing the RIGHT Shard Key
– Benchmarking with Results
• Key Takeaways
5. Use Case
Ability to access the digital assets of the service provider across an array of devices registered by the user, with the facility of resuming (session shifting).
Diagram: Users, Devices, Load Balancer, Database
• Users: 7 million users across geographies
• Devices: 8 devices per user (home, office, anywhere)
• Load Balancer: high volume of concurrent CRUD requests routed to the DB cluster
• Database: MongoDB data storage cluster enabled with sharding, auto replication for failover, and indexes
6. Database Requirements
• Agility in Development & Deployment
• High Availability
• Flexibility in Schema
• Enterprise Level Support
• High Performance
7. Why MongoDB?
• Agility in Development & Deployment (Driver Support)
– Programming language drivers
– Shorter dev cycle
– Faster deployment
• High Availability (Replication)
– Automatic failover
– Redundancy
– ~100% uptime
• Flexibility in Schema (Loose Schema)
– Easy integration
– Ease of schema design
– Document oriented storage
• Enterprise Level Support (Strong Community)
– Global coverage
– 24x7 support
– Ease of maintenance
• High Performance (Indexes & Sharding)
– Concurrent CRUD
– Fast updates
– Write distribution with sharding
8. Sharding – What is it?
• Distributes a single logical database across multiple mongod nodes
• Advantages:
– Raises limits of data size beyond a single node
– Increases write capacity
– Ability to support larger working sets
– Read scaling, by targeting specific shards through routed requests over distributed data. A good amount of scatter-gather requests can also be supported if used judiciously.
9. Sharding – When to use?
• Storage (drive): Your data set approaches or exceeds the storage capacity of a single node in your system.
• Working set (RAM): The size of your system's active working set will soon exceed the capacity of the maximum amount of RAM for your system.
• Storage (drive), write load: Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.
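The three conditions above can be read as a checklist. The following is a minimal Python sketch of that checklist, not a method taken from the slides: the `NodeProfile` fields, the 80% `headroom` factor, and every number in the usage example are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeProfile:
    """Capacity of a single mongod node and the load it must carry.

    All fields are illustrative assumptions, not figures from the slides.
    """
    disk_gb: float                  # usable storage on the node
    ram_gb: float                   # RAM available for the working set
    max_writes_per_sec: float       # write ceiling measured after single-node tuning
    data_gb: float                  # current or projected data set size
    working_set_gb: float           # actively accessed data plus indexes
    required_writes_per_sec: float  # write rate the application needs


def sharding_reasons(node: NodeProfile, headroom: float = 0.8) -> List[str]:
    """Return which of the three criteria are met.

    `headroom` is an assumed safety factor: a resource counts as
    "approaching" its limit once usage passes 80% of capacity.
    """
    reasons = []
    if node.data_gb >= headroom * node.disk_gb:
        reasons.append("storage")
    if node.working_set_gb >= headroom * node.ram_gb:
        reasons.append("working set")
    if node.required_writes_per_sec > node.max_writes_per_sec:
        reasons.append("write throughput")
    return reasons


# Hypothetical node: plenty of disk, but RAM and write capacity are tight.
node = NodeProfile(disk_gb=1000, ram_gb=32, max_writes_per_sec=2900,
                   data_gb=400, working_set_gb=30, required_writes_per_sec=6000)
print(sharding_reasons(node))
```

An empty result suggests that single-node tuning is still the cheaper path, which matches the advice to shard only after other approaches have been exhausted.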
10. Sharding – Features
• Range-based data partitioning
• Automatic data volume distribution
• Transparent query routing
• Horizontal capacity
– Additional write capacity through distribution
– Right shard key allows expansion of working set
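Range-based partitioning and query routing can be illustrated with a toy model. This is a simplified sketch, not MongoDB's implementation: the `RangeRouter` class, the split points, and the shard names are all hypothetical, and real clusters split and migrate chunks automatically.

```python
import bisect


class RangeRouter:
    """Toy model: each chunk owns a contiguous key range and lives on one shard."""

    def __init__(self, split_points, chunk_to_shard):
        # N sorted split points define N + 1 chunks.
        if len(chunk_to_shard) != len(split_points) + 1:
            raise ValueError("need exactly one shard assignment per chunk")
        self.split_points = list(split_points)
        self.chunk_to_shard = list(chunk_to_shard)

    def chunk_for(self, key):
        # Chunk i holds keys in [split_points[i - 1], split_points[i]).
        return bisect.bisect_right(self.split_points, key)

    def shard_for(self, key):
        # The router hides this lookup from the client ("transparent" routing).
        return self.chunk_to_shard[self.chunk_for(key)]


router = RangeRouter(
    split_points=[1000, 2000, 3000],
    chunk_to_shard=["shard0", "shard1", "shard0", "shard1"],
)
for key in (500, 1000, 2500, 9999):
    print(key, "->", router.shard_for(key))
```

Spreading chunks across shards, as in `chunk_to_shard` here, is what gives the additional write capacity: writes to different key ranges land on different nodes.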
11. Solution: Approach
• Schema: schema design; collections and field definitions
• Database Size: document size; total expected data size
• Concurrent Load: frequency of CRUD operations; read/write ratio
• Availability: replication, backup and automatic failover; right replication factor (RF); read scaling for the use cases with eventual consistency
• Indexing: working set; access patterns
• Sharding: horizontal scaling; read/write scaling
• Hardware Sizing: cluster sizing; RAM and disk storage
12. To Shard Or Not To Shard?
• Sharding is a very powerful technique provided by MongoDB to scale, but it should be used only after due diligence; otherwise it proves to be overkill.
• It brings a substantial amount of overhead from an infrastructure and maintenance standpoint.
• It should be used only when you have done all the possible optimizations for the single node and the write capacity of the single node still proves to be a bottleneck.
• In production, a minimum of 6 server instances is required for a sharded cluster with no failover capability.
• In production we cannot afford to have no redundancy/failover. Hence a minimum RF of 2 is required, which also brings an arbiter node into the picture.
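The instance counts above can be worked through with simple arithmetic. The slides do not break down the figure of 6; the sketch below assumes one plausible reading (two single-node shards, three config servers, one mongos router). The function name and its defaults are illustrative assumptions.

```python
def sharded_cluster_instances(shards, data_nodes_per_shard, arbiters_per_shard,
                              config_servers=3, mongos_routers=1):
    """Count server instances in a sharded cluster.

    The defaults (3 config servers, 1 mongos) are an assumed breakdown,
    not one stated in the slides.
    """
    per_shard = data_nodes_per_shard + arbiters_per_shard
    return shards * per_shard + config_servers + mongos_routers


# No failover: each shard is a single mongod.
no_failover = sharded_cluster_instances(
    shards=2, data_nodes_per_shard=1, arbiters_per_shard=0)

# RF of 2 plus one arbiter per shard, so each replica set can elect a primary.
with_failover = sharded_cluster_instances(
    shards=2, data_nodes_per_shard=2, arbiters_per_shard=1)

print(no_failover, with_failover)
```

Under these assumptions the cluster grows from 6 to 10 instances once failover is added, which illustrates the infrastructure overhead the slide warns about.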
15. Shard Keys
Shard keys exist in every document in a collection. MongoDB uses the shard key to distribute documents among the shards. Just like indexes, they can be either a single field or a compound key.
• The ideal shard key:
– High cardinality, which makes it easy for MongoDB to split the chunks
– Higher "randomness"
– Targeted queries
– May need to be computed
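To make the cardinality point concrete, here is a small Python sketch that treats a shard key as a tuple of field values and counts distinct keys. The documents and the field names (`userId`, `assetId`, `device`) are hypothetical; only the idea of single versus compound keys comes from the slide.

```python
def shard_key(doc, fields):
    """Extract a (possibly compound) shard key as a tuple, in field order."""
    missing = [f for f in fields if f not in doc]
    if missing:
        # Every document in a sharded collection must carry the shard key.
        raise KeyError(f"shard key field(s) missing from document: {missing}")
    return tuple(doc[f] for f in fields)


def cardinality(docs, fields):
    """Number of distinct shard key values; more values allow more split points."""
    return len({shard_key(d, fields) for d in docs})


docs = [
    {"userId": "u1", "assetId": "a1", "device": "tv"},
    {"userId": "u1", "assetId": "a2", "device": "phone"},
    {"userId": "u2", "assetId": "a1", "device": "tv"},
    {"userId": "u2", "assetId": "a2", "device": "tv"},
]

print(cardinality(docs, ["device"]))             # 2
print(cardinality(docs, ["userId"]))             # 2
print(cardinality(docs, ["userId", "assetId"]))  # 4
```

A compound key raises cardinality without requiring any single field to be unique, which is one reason a key may need to be combined or computed.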
16. Choosing the Right Shard Key
Different approaches for shard keys:
• Approach 1: Random Key – UserId + AssetId
• Approach 2: Coarsely ascending key + Random Key – YearMonth + UserId + AssetId
• Hashed Shard Keys (not tested/applicable here)
– New in version 2.4.
– Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.
– The field should have good cardinality.
– Hashed keys work well with fields that increase monotonically.
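The claim that hashed keys suit monotonically increasing fields can be illustrated with a small simulation. This is a sketch under stated assumptions: SHA-256 modulo the shard count is a stand-in for a hash function, not MongoDB's actual hashed index or chunk assignment, and the range bounds and key values are hypothetical.

```python
import hashlib
from collections import Counter


def range_shard(key, upper_bounds):
    """Range partitioning: first shard whose upper bound exceeds the key.

    The last shard is open-ended and receives every key above the final bound.
    """
    for shard, bound in enumerate(upper_bounds):
        if key < bound:
            return shard
    return len(upper_bounds)


def hashed_shard(key, num_shards):
    """Hash partitioning using a stand-in hash (not MongoDB's own)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


NUM_SHARDS = 4
existing_bounds = [2500, 5000, 7500]  # ranges laid out for keys 0..9999
new_keys = range(10_000, 20_000)      # monotonically increasing keys arriving next

by_range = Counter(range_shard(k, existing_bounds) for k in new_keys)
by_hash = Counter(hashed_shard(k, NUM_SHARDS) for k in new_keys)

print("range:", dict(by_range))
print("hash: ", dict(sorted(by_hash.items())))
```

With range partitioning every new key lands on the shard that owns the highest range, so one node absorbs all inserts. Hashing spreads the same keys roughly evenly, at the cost of turning range queries on that field into scatter-gather requests.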
18. Results - INSERTS
• Approach 1: Over 80 million documents inserted with a decreasing threshold over 10 million
• Approach 2: Over 225 million documents inserted at a stable rate of 6000 documents/sec
Benchmarks done on bare-metal machines: 2.2 GHz 8-core, 32 GB RAM, 7200 RPM spinning drives with no RAID support.
19. Results - UPDATES
• Approach 1: Over 50 million documents updated at avg. 400 documents/sec
• Approach 2: Over 100 million documents updated at as high as 4000 documents/sec
20. Results – INSERT, UPDATE (Approach 2)
• Simultaneous INSERT: >6000 documents/second, >70 million records
• Simultaneous UPDATE: >6000 documents/second, >50 million records
21. Benchmarking – Sharding Vs Non Sharding
Operation | Sharding (YearMonth + UserId) | Non-Sharding
INSERTS | ~6000 docs/sec | ~2900 docs/sec
UPDATES | ~4000 docs/sec | ~620 updates/sec
INSERT & UPDATES | ~6000 docs/sec & ~6100 docs/sec | ~2000 docs/sec & ~600 docs/sec
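The relative gains in this comparison follow from dividing the sharded rate by the non-sharded rate. The sketch below does that arithmetic with the approximate figures from the table. It assumes that in the combined row the first number of each pair is the insert rate and the second is the update rate, which the slide does not state explicitly.

```python
# (sharded docs/sec, non-sharded docs/sec), approximate values from the table.
benchmarks = {
    "inserts": (6000, 2900),
    "updates": (4000, 620),
    "concurrent inserts": (6000, 2000),  # assumed to be the first figure of each pair
    "concurrent updates": (6100, 600),   # assumed to be the second figure of each pair
}


def speedup(sharded, non_sharded):
    """How many times faster the sharded setup was."""
    return sharded / non_sharded


for operation, (sharded, non_sharded) in benchmarks.items():
    print(f"{operation}: {speedup(sharded, non_sharded):.1f}x")
```

On these numbers the gain is roughly 2x for inserts and 6.5x for updates in isolation, rising to about 3x and 10x when both run concurrently.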
22. Key Takeaways
• MongoDB scales & shines.
– Expected: 690 million CRUD operations per day.
– Achieved: 840 million CRUD operations per day.
• Plan early for sharding.
• Sharding scales INSERTS/UPDATES compared with a non-sharded setup.
• There is no magic recipe for finding an ideal shard key.
• DO NOT go to production without benchmarking the shard key. The shard key cannot be changed for the given configuration.
• Use MMS. It's a great tool to assess the health of the cluster and identify bottlenecks well in advance.
• Sharding with Approach 2 (Coarsely ascending Key + Random Key) provides sustained results & better utilization of the RAM (better index locality).
Disclaimer: Suitable shard key depends on your data, so while this Shard Key approach delivers good results for this use case, it is not a generic approach.
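The daily totals above can be converted to per-second rates for comparison with the benchmark figures. This sketch assumes load is spread evenly over 24 hours, which the slides do not claim. Real traffic is likely to peak well above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

EXPECTED_PER_DAY = 690_000_000
ACHIEVED_PER_DAY = 840_000_000


def per_second(ops_per_day):
    """Average operations per second, assuming a uniform load over the day."""
    return ops_per_day / SECONDS_PER_DAY


expected = per_second(EXPECTED_PER_DAY)
achieved = per_second(ACHIEVED_PER_DAY)
margin = ACHIEVED_PER_DAY / EXPECTED_PER_DAY - 1

print(f"expected: {expected:,.0f} ops/sec")
print(f"achieved: {achieved:,.0f} ops/sec")
print(f"margin over target: {margin:.1%}")
```

Under that assumption the target works out to just under 8,000 operations per second on average, and the achieved figure exceeds it by a little over 20%.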
23. Key Takeaways
• Routed requests are always faster than scatter/gather requests.
• Identify the consistency requirements for the read queries. Where eventual consistency is acceptable, using the read preference secondaryPreferred can help you squeeze out more performance.
• Use a different set of servers for non-sharded collections.
• Define indexes carefully. A larger number of indexes substantially reduces write throughput.
• Sharded collections should have a minimal number of indexes.
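The difference between routed and scatter/gather requests comes down to how many shards a query must contact. The toy model below is a sketch, not MongoDB's router: `ToyCluster`, its byte-sum placement function, and the `userId`/`assetId` fields are hypothetical stand-ins.

```python
class ToyCluster:
    """Places documents on shards by a deterministic function of the shard key."""

    def __init__(self, num_shards, shard_key_field):
        self.shard_key_field = shard_key_field
        self.shards = [[] for _ in range(num_shards)]

    def _shard_index(self, value):
        # Simple deterministic placement; a stand-in for real chunk routing.
        return sum(str(value).encode("utf-8")) % len(self.shards)

    def insert(self, doc):
        index = self._shard_index(doc[self.shard_key_field])
        self.shards[index].append(doc)

    def find(self, query):
        """Return (matching documents, number of shards contacted)."""
        if self.shard_key_field in query:
            # Routed request: the shard key pins the query to one shard.
            targets = [self._shard_index(query[self.shard_key_field])]
        else:
            # Scatter/gather: without the shard key, every shard must be asked.
            targets = list(range(len(self.shards)))
        hits = [doc for i in targets for doc in self.shards[i]
                if all(doc.get(k) == v for k, v in query.items())]
        return hits, len(targets)


cluster = ToyCluster(num_shards=4, shard_key_field="userId")
for n in range(100):
    cluster.insert({"userId": f"u{n}", "assetId": f"a{n % 10}"})

routed_hits, routed_shards = cluster.find({"userId": "u7"})
scatter_hits, scatter_shards = cluster.find({"assetId": "a3"})
print(len(routed_hits), routed_shards)
print(len(scatter_hits), scatter_shards)
```

The routed query touches one shard regardless of cluster size, while the scatter/gather query's cost grows with the number of shards. This is why including the shard key in frequent queries matters.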
24. Our Success Stories: At a Glance
1) Big Data Analytics for Telecom: Optimum network bandwidth management & policy configuration for telecom companies
2) Social Media Research Platform for Legal Firms: Leverage social media & unstructured data analytics for collecting supporting evidence for trials
3) US-based Advanced GPS Solutions Provider: Real-time analysis of data accumulated from 200,000 GPS-based devices
4) Global Provider of Risk Management Solutions: Collection and analysis of data from external and internal applications delivered to a dashboard
5) US-based Networking Equipment Leader: Cluster configuration of high volume video uploads including 30 million inserts/hour
6) European Chemical Giant: Patent search – 10x increase in performance and 20x reduction in TCO
7) US-based Social Security e-Benefits System: Managing billion object repository with enterprise search and retrieval
25. Thank You. Any Questions?
For queries, reach out to us at info@cignex.com
Making Open Source Work