In this session, you will learn how to translate one-to-one, one-to-many, and many-to-many relationships, and how MongoDB's JSON document structures, atomic updates, and rich indexes can influence your design. We will also explore the implications of storage engines, indexing, and query patterns, along with available tools and related new features in MongoDB 3.2.
As your data grows, the need to establish proper indexes becomes critical to performance. MongoDB supports a wide range of indexing options to enable fast querying of your data, but what are the right strategies for your application?
In this talk we’ll cover how indexing works, the various indexing options, and use cases where each can be useful. We'll dive into common pitfalls using real-world examples to ensure that you're ready for scale.
Indexing in MongoDB works similarly to indexing in relational databases. An index is a data structure that makes certain queries more efficient by maintaining a sorted order of documents. Indexes are created with the ensureIndex() method (renamed createIndex() in later versions); they take up additional space and slow down writes. The explain() method shows whether a query is using an index.
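To make the idea concrete, here is a toy sketch in plain Python (not MongoDB's actual B-tree internals) of why a sorted index beats a collection scan; the document shape and field names are invented for illustration:

```python
import bisect

# Toy model: a collection scan checks every document, while an "index"
# keeps (key, doc_id) pairs sorted so a lookup is a binary search plus
# a short walk over the run of equal keys.
docs = [{"_id": i, "age": (i * 7) % 50} for i in range(1000)]

def collection_scan(docs, age):
    """O(n): examine every document."""
    return [d["_id"] for d in docs if d["age"] == age]

# "Index" on age: a sorted list of (age, _id) pairs.
index = sorted((d["age"], d["_id"]) for d in docs)

def index_lookup(index, age):
    """O(log n + matches): binary search, then walk equal keys."""
    lo = bisect.bisect_left(index, (age, -1))
    out = []
    while lo < len(index) and index[lo][0] == age:
        out.append(index[lo][1])
        lo += 1
    return out

# Both strategies return the same documents; the index just finds them faster.
assert sorted(collection_scan(docs, 21)) == sorted(index_lookup(index, 21))
```

Note the trade-off the summary mentions: every insert would also have to splice a new pair into the sorted structure, which is why indexes slow down writes.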
Inside MongoDB: the Internals of an Open-Source Database (Mike Dirolf)
The document discusses MongoDB, including how it stores and indexes data, handles queries and replication, and supports sharding and geospatial indexing. Key points covered include how MongoDB stores data in BSON format across data files that grow in size, uses memory-mapped files for data access, supports indexing with B-trees, and replicates operations through an oplog.
NoSQL databases only unfold their full strength when you also embrace their concepts of usage and schema design. These slides give an overview of MongoDB's features and concepts.
Indexes are references to documents that are efficiently ordered by key and maintained in a tree structure for fast lookup. They improve the speed of document retrieval, range scanning, ordering, and other operations by enabling the use of the index instead of a collection scan. While indexes improve query performance, they can slow down document inserts and updates since the indexes also need to be maintained. The query optimizer aims to select the best index for each query but can sometimes be overridden.
MongoDB .local Toronto 2019: Tips and Tricks for Effective Indexing (MongoDB)
Query performance can be either a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. I will share the more common mistakes observed and some tips and tricks for avoiding them.
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15) (MongoDB)
The document discusses different data modeling approaches for structuring data in MongoDB, including embedding data versus referencing data in collections. It provides examples of modeling one-to-one, one-to-many, and many-to-many relationships between entities using embedding and referencing. The document recommends different approaches depending on the use case and prioritizes flexibility, performance, and optimal data representation.
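The embedding-versus-referencing choice can be sketched with plain Python dicts; all field names and values below are invented for illustration, not taken from the slides:

```python
# One-to-many relationship (user -> addresses), modeled two ways.

# Embedding: addresses live inside the user document, so one read
# fetches everything and a document update is atomic.
user_embedded = {
    "_id": "u1",
    "name": "Kate Monster",
    "addresses": [
        {"street": "123 Sesame St", "city": "Anytown"},
        {"street": "1 Avenue Q", "city": "New York"},
    ],
}

# Referencing: addresses are separate documents pointing back at the
# user, which suits unbounded growth or data accessed from many places.
user_ref = {"_id": "u1", "name": "Kate Monster"}
addresses = [
    {"_id": "a1", "user_id": "u1", "street": "123 Sesame St", "city": "Anytown"},
    {"_id": "a2", "user_id": "u1", "street": "1 Avenue Q", "city": "New York"},
]

def addresses_for(user, addresses):
    """With referencing, the application performs the 'join' itself."""
    return [a for a in addresses if a["user_id"] == user["_id"]]

assert len(addresses_for(user_ref, addresses)) == len(user_embedded["addresses"])
```

In a real deployment the referencing variant would want an index on `user_id` to back the lookup.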
MongoDB .local Toronto 2019: Aggregation Pipeline Power++: How MongoDB 4.2 Pi... (MongoDB)
The aggregation pipeline has powered analysis of your data since version 2.2. In 4.2 we added more power, and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups, and materialized views.
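The shape of such a pipeline can be sketched as a list of stage documents. The stage and operator names below ($group, $sum, $merge) are real aggregation operators; the tiny evaluator is only an illustration, since a real deployment sends the pipeline to the server:

```python
# A 4.2-style pipeline: group orders by customer, then write the result
# into another collection via $merge (the 4.2 "output to existing
# collections" feature). Collection and field names are invented.
pipeline = [
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
    {"$merge": {"into": "customer_totals"}},
]

orders = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "B", "amount": 5},
    {"cust_id": "A", "amount": 7},
]

def run_group_sum(docs, spec):
    """Evaluate a {$group: {_id: '$f', total: {$sum: '$g'}}} stage in Python."""
    key = spec["_id"].lstrip("$")
    amt = spec["total"]["$sum"].lstrip("$")
    totals = {}
    for d in docs:
        totals[d[key]] = totals.get(d[key], 0) + d[amt]
    return [{"_id": k, "total": v} for k, v in sorted(totals.items())]

result = run_group_sum(orders, pipeline[0]["$group"])
assert result == [{"_id": "A", "total": 17}, {"_id": "B", "total": 5}]
```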
This document discusses MongoDB performance tuning. It emphasizes that performance tuning requires planning schema design, statement tuning, and instance tuning, in that order. It provides examples of using the MongoDB profiler and the explain() method to analyze statements and identify tuning opportunities such as queries not covered by indexes, unnecessary document scans, and low data locality. Instance tuning focuses on optimizing writes through fast update operations and careful secondary index usage, and on optimizing reads by ensuring statements are tuned and data is sharded appropriately. Overall performance depends on properly tuning both reads and writes.
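One common way to read explain output when hunting for tuning opportunities is to compare documents examined against documents returned. In the sketch below, executionStats.nReturned and totalDocsExamined are real explain() fields, while the numbers and the ratio heuristic are illustrative assumptions:

```python
# A made-up (but realistically shaped) fragment of an explain() result.
explain_output = {
    "executionStats": {
        "nReturned": 10,
        "totalDocsExamined": 50000,
        "totalKeysExamined": 50000,
    }
}

def scan_ratio(explain):
    """Documents examined per document returned; near 1.0 is ideal,
    large values suggest an uncovered or poorly selective query."""
    stats = explain["executionStats"]
    returned = stats["nReturned"] or 1  # avoid divide-by-zero
    return stats["totalDocsExamined"] / returned

ratio = scan_ratio(explain_output)
assert ratio == 5000.0  # 5000 docs examined per doc returned: worth tuning
```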
MongoDB is a document-oriented NoSQL database written in C++. It uses a document data model and stores data in BSON format, which is a binary form of JSON that is lightweight, traversable, and efficient. MongoDB is schema-less, supports replication and high availability, auto-sharding for scaling, and rich queries. It is suitable for big data, content management, mobile and social applications, and user data management.
MongoDB is a non-relational database that stores data in JSON-like documents with dynamic schemas. It features flexibility with JSON documents that map to programming languages, power through indexing and queries, and horizontal scaling. The document explains that MongoDB uses JSON and BSON formats to store data, has no fixed schema so fields can evolve freely, and demonstrates working with the mongo shell and RoboMongo GUI.
Media owners are turning to MongoDB to drive social interaction with their published content. The way customers consume information has changed and passive communication is no longer enough. They want to comment, share and engage with publishers and their community through a range of media types and via multiple channels whenever and wherever they are. There are serious challenges with taking this semi-structured and unstructured data and making it work in a traditional relational database. This webinar looks at how MongoDB’s schemaless design and document orientation gives organisations like the Guardian the flexibility to aggregate social content and scale out.
The document discusses PostgreSQL's roadmap for supporting JSON data. It describes how PostgreSQL introduced JSONB in 2014 to allow binary storage and indexing of JSON data, providing better performance than the text-based JSON type. The document outlines how PostgreSQL has implemented features from the SQL/JSON standard over time, including JSON path support. It proposes a new Generic JSON API (GSON) that would provide a unified way to work with JSON and JSONB data types, removing duplicated code and simplifying the addition of new features like partial decompression or different storage formats like BSON. GSON would help PostgreSQL work towards a single unified JSON data type as specified in SQL standards.
Slide deck presented at http://devternity.com/ around MongoDB internals. We review the usage patterns of MongoDB, the different storage engines and persistency models, as well as the definition of documents and general data structures.
This document provides an overview of indexing in MongoDB. It discusses what indexes are, why they are needed to optimize queries, and how to work with indexes in MongoDB. Some key points covered include how to create, manage, and optimize indexes. Common indexing mistakes are also discussed, such as trying to use multiple indexes per query, having indexes with low selectivity, and queries that cannot use indexes like regular expressions and negation queries.
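The low-selectivity pitfall can be sketched with a quick estimate over a sample of documents; the helper and sample data below are invented for illustration:

```python
# Selectivity = distinct values / documents. An index on a field with
# only a handful of values (e.g. a status flag) narrows the search very
# little, so it rarely pays for its maintenance cost.
def selectivity(docs, field):
    """Closer to 1.0 means more selective (better index key)."""
    values = {d.get(field) for d in docs}
    return len(values) / len(docs)

sample = [{"status": "active", "email": f"user{i}@example.com"} for i in range(100)]
for d in sample[90:]:
    d["status"] = "inactive"

assert selectivity(sample, "status") == 0.02  # 2 values / 100 docs: poor key
assert selectivity(sample, "email") == 1.0    # unique per doc: good key
```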
This document discusses how MongoDB can help enterprises meet modern data and application requirements. It outlines the many new technologies and demands placing pressure on enterprises, including big data, mobile, cloud computing, and more. Traditional databases struggle to meet these new demands due to limitations like rigid schemas and difficulty scaling. MongoDB provides capabilities like dynamic schemas, high performance at scale through horizontal scaling, and low total cost of ownership. The document examines how MongoDB has been successfully used by enterprises for use cases like operational data stores and as an enterprise data service to break down silos.
The document provides an introduction and overview of MongoDB, including what NoSQL is, the different types of NoSQL databases, when to use MongoDB, its key features like scalability and flexibility, how to install and use basic commands like creating databases and collections, and references for further learning.
MongoDB World 2019: The Sights (and Smells) of a Bad Query (MongoDB)
“Why is MongoDB so slow?” you may ask yourself on occasion. You’ve created indexes, you’ve learned how to use the aggregation pipeline. What the heck? Could it be your queries? This talk will outline what tools are at your disposal (both in MongoDB Atlas and in MongoDB server) to identify inefficient queries.
MongoDB World 2019: Tips and Tricks++ for Querying and Indexing MongoDB (MongoDB)
Query performance can be either a constant headache or the unsung hero of an application. MongoDB provides extremely powerful querying capabilities when used properly. As a senior member of the support team, I will share the more common mistakes observed and some tips and tricks for avoiding them.
MongoDB is one of the best-known and most-loved NoSQL databases. It has many features that are easier to work with than those of a conventional RDBMS. These slides cover the basics of MongoDB.
This document discusses schema design patterns for MongoDB. It begins by comparing terminology between relational databases and MongoDB. Common patterns for modeling one-to-one, one-to-many, and many-to-many relationships are presented using examples of patrons, books, authors, and publishers. Embedded documents are recommended when related data always appears together, while references are used when more flexibility is needed. The document emphasizes focusing on how the application accesses and manipulates data when deciding between embedded documents and references. It also stresses evolving schemas to meet changing requirements and application logic.
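A minimal sketch of the many-to-many case using arrays of references, loosely following the books/authors example (all names and helper functions are illustrative, not the talk's exact schema):

```python
# Many-to-many (books <-> authors): each book carries an array of
# author references, so no join table is needed.
authors = [
    {"_id": "a1", "name": "Author One"},
    {"_id": "a2", "name": "Author Two"},
]
books = [
    {"_id": "b1", "title": "Book One", "author_ids": ["a1", "a2"]},
    {"_id": "b2", "title": "Book Two", "author_ids": ["a2"]},
]

def authors_of(book, authors):
    """Resolve a book's references to author names."""
    return [a["name"] for a in authors if a["_id"] in book["author_ids"]]

def books_by(author_id, books):
    """The reverse direction: all books referencing an author."""
    return [b["title"] for b in books if author_id in b["author_ids"]]

assert authors_of(books[0], authors) == ["Author One", "Author Two"]
assert books_by("a2", books) == ["Book One", "Book Two"]
```

Server-side, an index on the `author_ids` array (a multikey index) would make the reverse lookup efficient.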
This presentation will demonstrate how you can use the aggregation pipeline with MongoDB similar to how you would use GROUP BY in SQL, along with the new stage operators coming in 3.4. MongoDB’s Aggregation Framework has many operators that give you the ability to get more value out of your data, discover usage patterns within your data, or power your application. Considerations regarding version, indexing, operators, and saving the output will be reviewed.
MongoDB Schema Design: Practical Applications and Implications (MongoDB)
Presented by Austin Zellner, Solutions Architect, MongoDB
Schema design is as much art as it is science, but it is central to understanding how to get the most out of MongoDB. Attendees will walk away with an understanding of how to approach schema design, what influences it, and the science behind the art. After this session, attendees will be ready to design new schemas, as well as re-evaluate existing schemas with a new mental model.
During this session we will cover the best practices for implementing a product catalog with MongoDB. We will cover how to model an item properly when it can have thousands of variations and thousands of properties of interest. You'll learn how to index properly and allow for faceted search with millisecond response latency, and how to implement per-store, per-SKU pricing while still keeping a sane number of documents. We will also cover operational considerations, like how to bring the data closer to users to cut down the network latency.
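One plausible shape for such a catalog, sketched as plain dicts; every field name here is an assumption for illustration, not the session's exact schema:

```python
# Parent item document with a bounded variants array, plus a separate
# price entry per (store, sku) pair. Keeping prices out of the item
# avoids embedding every store's price in every variant, which is how
# the document count stays sane.
item = {
    "_id": "shirt-001",
    "name": "T-Shirt",
    "attrs": {"brand": "Acme", "material": "cotton"},  # faceted-search fields
    "variants": [
        {"sku": "shirt-001-S-red", "size": "S", "color": "red"},
        {"sku": "shirt-001-M-blue", "size": "M", "color": "blue"},
    ],
}

# Per-store, per-SKU pricing keyed by a compound identifier; in MongoDB
# a compound index on (store_id, sku) would back this lookup.
prices = {
    ("store-7", "shirt-001-S-red"): 999,    # cents
    ("store-7", "shirt-001-M-blue"): 1099,
    ("store-9", "shirt-001-S-red"): 949,
}

def price_for(store_id, sku, prices):
    return prices.get((store_id, sku))

assert price_for("store-7", "shirt-001-S-red", prices) == 999
```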
SQL vs NoSQL, an experiment with MongoDB (Marco Segato)
A simple experiment comparing MongoDB to a classic Oracle RDBMS: what NoSQL databases are, when to use them, why to choose MongoDB, and how we can play with it.
This document provides an agenda and background information for a presentation on PostgreSQL. The agenda includes topics such as practical use of PostgreSQL, features, replication, and how to get started. The background section discusses the history and development of PostgreSQL, including its origins from INGRES and POSTGRES projects. It also introduces the PostgreSQL Global Development Team.
Webinar: Schema Design and Performance Implications (MongoDB)
This document discusses schema design in MongoDB and how it differs from relational databases. It provides examples of modeling one-to-one, one-to-many, and many-to-many relationships using embedding and referencing. The document also discusses two examples - a medical records system and time series device data - and how different schema designs can significantly impact performance and hardware requirements. Overall, the key recommendation is to tailor the schema to the specific queries and workload of the application.
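The time-series point is often made with the bucket pattern: one document per device per hour with a pre-allocated per-minute array, instead of one document per reading. The Python below simulates the bucketing in memory; field names are assumptions, and in MongoDB each minute slot would be filled with an atomic positional update:

```python
from datetime import datetime

def bucket_key(device_id, ts):
    """One bucket per device per hour."""
    return (device_id, ts.strftime("%Y-%m-%dT%H"))

def record(buckets, device_id, ts, value):
    key = bucket_key(device_id, ts)
    if key not in buckets:
        buckets[key] = {"device": device_id, "hour": key[1],
                        "readings": [None] * 60}  # pre-allocated minute slots
    buckets[key]["readings"][ts.minute] = value

buckets = {}
record(buckets, "dev1", datetime(2015, 3, 11, 9, 0), 20.5)
record(buckets, "dev1", datetime(2015, 3, 11, 9, 42), 21.0)
record(buckets, "dev1", datetime(2015, 3, 11, 10, 1), 21.2)

# 3 readings, but only 2 documents (one per device-hour):
assert len(buckets) == 2
assert buckets[("dev1", "2015-03-11T09")]["readings"][42] == 21.0
```

Fewer, larger documents mean fewer index entries and better data locality, which is the performance and hardware difference the summary alludes to.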
MongoDB Schema Design and its Performance Implications (Lewis Lin 🦊)
This document discusses schema design in MongoDB and how it differs from relational databases. It provides examples of modeling one-to-one, one-to-many, and many-to-many relationships using embedding and referencing. The document also discusses two examples - a medical records system and time series device data - and how different schema designs can significantly impact performance and hardware requirements. Relationships can be modeled flexibly in MongoDB and the best approach depends on the application's data and queries.
Presented by Andrew Erlichson, Vice President, Engineering, Developer Experience, MongoDB
Audience level: Beginner
MongoDB’s basic unit of storage is a document. Documents can represent rich, schema-free data structures, meaning that we have several viable alternatives to the normalized, relational model. In this talk, we’ll discuss the tradeoff of various data modeling strategies in MongoDB. You will learn:
- How to work with documents
- How to evolve your schema
- Common schema design patterns
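Schema evolution in a schema-free store means old and new documents coexist, and the application tolerates (or lazily backfills) missing fields. A minimal sketch, with invented documents and helpers:

```python
# Two "versions" of a user document living side by side in one collection.
old_doc = {"_id": 1, "name": "Ada"}                       # written before the change
new_doc = {"_id": 2, "name": "Grace", "tags": ["admin"]}  # written after

def tags_of(doc):
    """Read across versions: treat a missing 'tags' field as empty."""
    return doc.get("tags", [])

assert tags_of(old_doc) == []
assert tags_of(new_doc) == ["admin"]

def upgrade(doc):
    """Lazy migration: bring a document up to date when next touched."""
    doc.setdefault("tags", [])
    return doc

assert upgrade(old_doc)["tags"] == []
```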
Health Sciences Research Informatics, Powered by Globus (Globus)
This document summarizes an effort to create an end-to-end infrastructure for cancer researchers to request, explore, and receive de-identified cancer registry data. It involves building a research portal using Globus technologies to allow researchers to search a federated index of de-identified registry data across multiple institutions while maintaining local data control. The goal is to create a scalable network of interoperable cancer registries through federation to support collaborative multi-institutional research queries.
Accelerate Pharmaceutical R&D with Big Data and MongoDB (MongoDB)
This document provides a summary of a presentation on using MongoDB and big data technologies to accelerate pharmaceutical research and development at AstraZeneca. The presentation discusses:
- AstraZeneca's focus on using next generation sequencing and big data to predict drug effectiveness and find associations between gene sequences and drug responses.
- Pilot projects using MongoDB to store and query unstructured genomic and clinical trial data at scale in a flexible document format.
- How these pilots helped prove the value of NoSQL databases for enabling faster exploration and analysis of large, complex datasets by researchers.
- Future visions for using experimental management systems and big data analytics to integrate multiple data types and power predictive analytics across AstraZeneca's drug development pipelines.
How MongoDB is Transforming Healthcare Technology (MongoDB)
Healthcare providers continue to feel increased margin pressure, due to both macro-economic factors as well as significant regulatory change. In response to these pressures, leading healthcare organizations are leveraging new technologies to increase quality of care while simultaneously reducing costs.
In this session, we'll cover:
- How MongoDB has enabled successful real world projects with EHR / EMR in the healthcare industry
- How MongoDB allows providers to create a single view in order to collect patient information from multiple systems
- The challenges with healthcare data collection and how MongoDB handles various data types, HIPAA/PII and hybrid deployments
Data Management 2: Conquering Data Proliferation (MongoDB)
Today's customers demand applications which integrate intelligently with data from mobile, social media and cloud sources. A system of engagement meets these expectations by applying data and analytics drawn from an array of master systems. The enormous scale and performance required overwhelm relational approaches, but we can use MongoDB to meet the challenge. We'll learn to capture and transmit data changes among disparate systems, expose batch data as interactive operational queries, and build systems with a strong separation of concerns, agility and flexibility.
Accelerate Pharmaceutical R&D with MongoDB (MongoDB)
This document provides a summary of a presentation on using MongoDB and big data technologies to accelerate pharmaceutical research and development at AstraZeneca. The presentation discusses:
- AstraZeneca's focus on using next generation sequencing and big data to predict drug effectiveness and identify new drug targets
- Pilot projects using MongoDB to store and query unstructured genomic data at scale, which proved the technology's ability to enable researchers more quickly
- A vision for an experiment management system to integrate various data sources and processing pipelines using big data technologies
Painting the Future of Big Data with Apache Spark and MongoDB (MongoDB)
MongoDB is the fastest growing non-relational database, while Apache Spark is the fastest growing data processing engine, and the most active big data project in the history of Apache. Databricks, founded by the creators of Spark, will present how they see Spark evolving to address new use cases, and how to combine the power of MongoDB with Spark.
Next generation electronic medical records and search a test implementation i... (lucenerevolution)
Presented by David Piraino, Chief Imaging Information Officer, Imaging Institute, Cleveland Clinic, and Daniel Palmer, Imaging Institute, Cleveland Clinic
Most patient-specific medical information is document-oriented, with varying amounts of associated metadata, and most of it is textual and semi-structured. Electronic Medical Record (EMR) systems are not optimized to present this textual information to users in the most understandable ways; present EMRs show information only in a reverse-time-oriented, patient-specific manner. This talk describes the construction and use of Solr search technologies to provide relevant historical information at the point of care while interpreting radiology images.
Radiology reports over a 4-year period were extracted from our Radiology Information System (RIS) and passed through a text-processing engine to extract the results, impression, exam description, location, history, and date. Fifteen cases reported during clinical practice were used as test cases to determine whether "similar" historical cases were found. The results were evaluated by the number of searches that returned any result in less than 3 seconds, and by the number of cases that illustrated the questioned diagnosis in the top 10 results returned, as determined by a bone and joint radiologist. Methods to better optimize the search results were also reviewed.
An average of 7.8 out of the 10 highest-rated reports showed a similar case highly related to the present case. The best search showed 10 out of 10 cases that were good examples, and the lowest-match search showed 2 out of 10. The talk will highlight this specific use case and the issues and advances of using Solr search technology in medicine, with a focus on point-of-care applications.
Webinar: Best Practices for Getting Started with MongoDBMongoDB
MongoDB adoption continues to grow at a record pace due to the significant enhancements in developer productivity and scalability that the database provides. Occasionally, however, organizations new to the technology make mistakes that limit their ability to leverage the significant advantages MongoDB provides. This webinar will discuss some of the common mistakes made by users when they first start working with MongoDB, how to identify when you've made those mistakes, and how to resolve them.
This document provides MongoDB best practices for schema design and data modeling. It discusses when to embed documents versus reference them, and how that impacts performance. It also uses an example of medical device data to illustrate how aggregating data at regular intervals (e.g. hourly instead of per minute) can significantly reduce storage requirements and improve query performance. Proper schema design is important for determining the required hardware resources.
5. Medical Records
• Collects all patient information in a central repository
• Provides a central point of access for:
• Patients
• Care providers: physicians, nurses, etc.
• Billing
• Insurance reconciliation
• Hospitals, physicians, patients, procedures, records
[Diagram: patient records (medications, lab results, procedures) and hospital records (physicians, patients, nurses, billing) feed the central repository]
6. Medical Record Data
• Hospitals
• have physicians
• Physicians
• Have patients
• Perform procedures
• Belong to hospitals
• Patients
• Have physicians
• Are the subject of procedures
• Procedures
• Associated with a patient
• Associated with a physician
• Have a record
• Variable metadata
• Records
• Associated with a procedure
• Binary data
• Variable fields
10. MongoDB vs. Relational

Attribute      MongoDB                   Relational
Storage        N-dimensional             Two-dimensional
Field Values   0, 1, many, or embed      Single value
Query          Any field, at any level   Any field
Schema         Flexible                  Very structured
20. Embedding
• Advantages
• Retrieve all relevant information in a single query/document
• Avoid implementing joins in application code
• Update related information as a single atomic operation
• MongoDB doesn’t offer multi-document transactions
• Limitations
• Large documents mean more overhead if most fields are not relevant
• 16 MB document size limit
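The atomicity point can be sketched with the update document MongoDB would apply in a single operation; the field names and the tiny `$push` simulation below are my own illustration, not code from the deck:

```python
# With embedding, adding a related item is one atomic document update.
# Field names are illustrative assumptions.
patient = {"_id": 2, "first": "Joe",
           "procedures": [{"id": 12345, "type": "Cat scan"}]}

# The update document a driver would send: $push appends to an
# embedded array in one server-side operation.
update = {"$push": {"procedures": {"id": 12347, "type": "MRI"}}}

def apply_push(doc, update):
    """Tiny simulation of MongoDB's $push operator on one document."""
    for path, value in update["$push"].items():
        doc.setdefault(path, []).append(value)
    return doc

apply_push(patient, update)
assert len(patient["procedures"]) == 2
```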
23. Referencing
• Advantages
• Smaller documents
• Less likely to reach 16 MB document limit
• Infrequently accessed information not accessed on every query
• No duplication of data
• Limitations
• Two queries required to retrieve information
• Cannot update related information atomically
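The two-query cost of referencing can be sketched with in-memory stand-ins for collections (collection and field names assumed); an embedded design would return everything in the first lookup:

```python
# In-memory stand-ins for two collections. With referencing, fetching a
# patient's procedures takes a second query keyed on the stored ids.
patients = {2: {"_id": 2, "name": "Joe Patient",
                "procedure_ids": [12345, 12346]}}
procedures = {
    12345: {"_id": 12345, "type": "Cat scan"},
    12346: {"_id": 12346, "type": "blood test"},
}

patient = patients[2]                                          # query 1
procs = [procedures[pid] for pid in patient["procedure_ids"]]  # query 2

assert [p["type"] for p in procs] == ["Cat scan", "blood test"]
```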
24. 1-1: General Recommendations
• Embed
• No additional data duplication
• Can query or index on an embedded field
• e.g., “result.type”
• Exceptional cases…
• Embedding results in large documents
• Sets of infrequently accessed fields
{
"_id": 333,
"date": "2003-02-09T05:00:00",
"hospital": "County Hills",
"patient": "John Doe",
"physician": "Stephen Smith",
"type": "Chest X-ray",
"result": {
"type": "txt",
"size": 12,
"content": {
"value1": 343,
"value2": "abc"
}
}
}
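Dot-notation queries such as { "result.type": "txt" } work because the server walks the embedded path; a minimal stand-in for that matching logic (the helper name is mine):

```python
def get_path(doc, path):
    """Walk a dotted path (e.g. 'result.type') through embedded documents."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

record = {"_id": 333, "type": "Chest X-ray",
          "result": {"type": "txt", "size": 12}}

assert get_path(record, "result.type") == "txt"
assert get_path(record, "result.size") == 12
```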
27. 1-M : General Recommendations
• Embed, when possible
• Many are weak entities
• Access all information in a single query
• Take advantage of update atomicity
• No additional data duplication
• Can query or index on any field
• e.g., { “phones.type”: “mobile” }
• Exceptional cases:
• 16 MB document size limit
• Large number of infrequently accessed fields
{
_id: 2,
first: “Joe”,
last: “Patient”,
addr: { …},
procedures: [
{
id: 12345,
date: 2015-02-15,
type: “Cat scan”,
…},
{
id: 12346,
date: 2015-02-15,
type: “blood test”,
…}]
}
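Likewise, a query such as { "procedures.type": "blood test" } matches if any element of the embedded array has that type; a stand-in for those match-any array semantics (helper name is mine):

```python
def matches_array_field(doc, field, subfield, value):
    """True if any element of doc[field] has subfield == value,
    mirroring MongoDB's match-any semantics on embedded arrays."""
    return any(e.get(subfield) == value for e in doc.get(field, []))

patient = {"_id": 2, "first": "Joe",
           "procedures": [{"id": 12345, "type": "Cat scan"},
                          {"id": 12346, "type": "blood test"}]}

assert matches_array_field(patient, "procedures", "type", "blood test")
assert not matches_array_field(patient, "procedures", "type", "MRI")
```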
32. M-M: General Recommendations
• Use case determines whether to reference or embed:
1. Data duplication
• Embedding may result in data duplication
• Duplication may be okay if reads dominate updates
• Of the two, which one changes the least?
2. Referencing may be required if there are many related items
3. Hybrid approach
• Potentially do both – it’s okay!
{
_id: 2,
name: “Oak Valley Hospital”,
city: “New York”,
beds: 131,
physicians: [12345, 12346]}
{
_id: 12345,
name: “Joe Doctor”,
address: {…},
…}
{
_id: 12346,
name: “Mary Well”,
address: {…},
…}
(Hospitals reference Physicians by _id)
39. Vital Sign Monitoring Device
Vital Signs Measured:
• Blood Pressure
• Pulse
• Blood Oxygen Levels
Produces data at regular intervals
• Once per minute
• Many Devices, Many Hospitals
40. Data From Vital Signs Monitoring Device
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:07:00.000-0500"),
spO2: 88,
pulse: 74,
bp: [128, 80]
}
• One document per minute per device
• Mirrors the relational row-per-reading approach
41. Document Per Hour (By minute)
{
deviceId: 123456,
ts: ISODate("2013-10-16T22:00:00.000-0500"),
spO2: { 0: 88, 1: 90, …, 59: 92},
pulse: { 0: 74, 1: 76, …, 59: 72},
bp: { 0: [122, 80], 1: [126, 84], …, 59: [124, 78]}
}
• One document per device per hour
• Store per-minute data at the hourly level
• Update-driven workload
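The per-hour shape above can be sketched as a fold over the per-minute readings (the helper name and `ts` handling are mine, not from the deck):

```python
def bucket_by_hour(readings):
    """Collapse per-minute readings into one document per device-hour,
    keyed by minute, matching the slide's document-per-hour shape."""
    doc = {"deviceId": readings[0]["deviceId"],
           "ts": readings[0]["ts"],  # hour boundary in a real system
           "spO2": {}, "pulse": {}, "bp": {}}
    for minute, r in enumerate(readings):
        doc["spO2"][minute] = r["spO2"]
        doc["pulse"][minute] = r["pulse"]
        doc["bp"][minute] = r["bp"]
    return doc

readings = [{"deviceId": 123456, "ts": "2013-10-16T22:00:00",
             "spO2": 88 + m % 3, "pulse": 74, "bp": [128, 80]}
            for m in range(60)]

hour_doc = bucket_by_hour(readings)
assert len(hour_doc["spO2"]) == 60
assert hour_doc["pulse"][59] == 74
```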
42. Characterizing Write Differences
• Example: data generated every minute
• Recording the data for 1 patient for 1 hour:
• Document per event: 60 inserts
• Document per hour: 1 insert, 59 updates
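Each of the 59 updates can set just that minute's slots via dot notation; a sketch of building that update document (helper name is mine):

```python
def minute_update(reading, minute):
    """Build the $set document that writes one minute's readings into
    the hourly document (dot notation addresses the minute slot)."""
    return {"$set": {
        f"spO2.{minute}": reading["spO2"],
        f"pulse.{minute}": reading["pulse"],
        f"bp.{minute}": reading["bp"],
    }}

u = minute_update({"spO2": 90, "pulse": 76, "bp": [126, 84]}, 5)
assert u == {"$set": {"spO2.5": 90, "pulse.5": 76, "bp.5": [126, 84]}}
```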
43. Characterizing Read Differences
• Want to graph 24 hours of vital signs for a patient:
• Read performance is greatly improved
Document Per Event
1440 reads
Document Per Hour
24 reads
44. Characterizing Memory and Storage Differences
                        Document Per Minute    Document Per Hour
Number of Documents     52.6 Billion           876 Million
Total Index Size        6,364 GB               106 GB
  _id index             1,468 GB               24.5 GB
  {ts: 1, deviceId: 1}  4,895 GB               81.6 GB
Document Size           92 Bytes               758 Bytes
Database Size           4,503 GB               618 GB
• 100K Devices
• 1 year's worth of data, at minute resolution (365 x 24 x 60 minutes)
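The document counts above can be sanity-checked with a little arithmetic:

```javascript
// 100K devices reporting once per minute for one year.
const devices = 100000;
const minutesPerYear = 365 * 24 * 60; // 525,600

const docsPerMinuteSchema = devices * minutesPerYear; // one doc per reading
const docsPerHourSchema = docsPerMinuteSchema / 60;   // one doc per hour

console.log((docsPerMinuteSchema / 1e9).toFixed(2) + " billion"); // 52.56 billion
console.log((docsPerHourSchema / 1e6).toFixed(0) + " million");   // 876 million
```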
46. MongoDB 3.2 – a GIANT Release
2.2: Aggregation Framework, Location-Aware Sharding
2.4: Hash-Based Sharding, Roles, Kerberos, On-Prem Monitoring, Text Search
2.6: $out, Index Intersection, Field-Level Redaction, LDAP & x509, Auditing
3.0: Document-Level Concurrency, Compression, Storage Engine API, ≤50 Replicas, Auditing ++, Ops Manager
3.2: Document Validation, Fast Failover, Simpler Scalability, Aggregation ++, Encryption At Rest, In-Memory Storage Engine, BI Connector, $lookup, MongoDB Compass, APM Integration, Profiler Visualization, Auto Index Builds, Backups to File System
47. Tools
• mgenerate
• Part of mtools: https://github.com/rueckstiess/mtools/wiki/mgenerate
• Model your schema using a JSON definition
• Generate millions of documents with random data
• How well does the schema work?
• Queries, Indexes, Data Size, Index Size, Replication
• Demo
48. Documents are Rich Data Structures
{
  first_name: "Paul",
  last_name: "Miller",
  cell: 1234567890,
  city: "London",
  location: [45.123, 47.232],
  professions: ["banking", "finance", "trader"],
  physicians: [
    { name: "Canelo Álvarez, M.D.",
      last_visit: "Mission Hospital",
      last_visit_dt: "20160501", … },
    { name: "Érik Morales, M.D.",
      last_visit: "Del Prado Hospital",
      last_visit_dt: "20160302", … }
  ]
}
• Fields can contain an array of sub-documents
• Typed field values
• Fields can contain arrays
• Fields can be indexed and queried at any level
• No ORM layer needed: the data is already an object!
51. Visual Query Profiler & Index Suggestions
Identify your slow-running queries with the click of a button.
Index recommendations to improve your deployment.
55. MongoDB 3.2 Document Validation
db.runCommand( {
collMod: "Patients",
validator: { $and: [
{ "first_name": { "$type": "string" }},
{ "last_name": { "$type": "string"}},
{ "physicians": { "$type": "array"}}
] },
validationLevel: "strict"
});
https://docs.mongodb.com/manual/core/document-validation/
All Patient records must have string data for the first and last name, and an array of Physicians.
56. Summary
01 Embedding and Referencing
02 Context of application data and query workload drives decisions
03 1-1: Embed; 1-M: Embed when possible; M-M: Hybrid
04 Iterate: different schemas may result in dramatically different query performance, data/index size and hardware requirements!
05 Tools! Measure data/index size and query performance:
- mgenerate/mtools
- Compass
- Cloud Manager / Ops Manager
06 3.2: $lookup, Document Validation
Hi, my name is Sigfrido Narvaez, and I like to go by Sig.
Today we will be talking about MongoDB schema design and some of its performance implications. We will also explore some of the new features in MongoDB 3.2 that are relevant to schema design, and some additional tools that will help you iterate and try out different approaches quickly.
During the webinar, please feel free to type any questions in the chat box, and at the end, we will have a Q&A session and answer as much as we can.
Ok, so I am a Sr. Solutions Architect here at MongoDB, based out of Southern California. Prior to joining, I was the Principal Software Architect for a hybrid cloud & polyglot persistence solution that used MongoDB, and that required leveraging MongoDB's dynamic, flexible schema to power cloud and mobile apps whose main source of data originated from many on-premise ERPs. I have also been organizing the Orange County MUG for almost 4 years.
I have provided my email address and my Twitter handle in case I don't cover all the questions or we need any follow-ups, so please feel free to reach out with any questions afterwards, and I will make sure I find the information you are looking for.
The agenda for today's presentation: we will use a medical record example and explore its schema in MongoDB vs. relational, using Embedding & Referencing and comparing against the classic 1-1, 1-M & M-M relationships. We will then jump into a performance analysis examining data and index growth, and finally explore new features in MongoDB 3.2.
We will design a schema for a medical information system, where we need to store data for the Patients, the Physicians, the Procedures, and many other aspects of a medical system.
All this data is interrelated, and we have to assume the system will be around for many years and will grow over time.
Let's examine the data entities that are going to be part of this system.
First we have hospitals, and hospitals have many physicians.
Then we have the physicians, who attend many patients, perform many procedures, and themselves belong to many hospitals.
The patients, who again are attended by many physicians, are the subject of many procedures.
The Procedures are of course applied by a physician to a patient, inside a hospital at a particular time, and the data produced by each of these procedures can vary a lot. For example, an x-ray procedure will produce a bunch of data along with an image or set of images, but a blood test will only produce a bunch of data. Each procedure produces different data, which is a schema design problem.
As we can see, the main entities and their relationships may be a great fit for a relational database.
But the procedures data is not, and, over time, procedures will change, use new medical devices, or go through improvements, and may produce even more data with more variability, and we still have to keep historical records too.
This is a real challenge for a relational database
But for MongoDB and the flexible document model, this is easy. The way we would model this is by having some common data points that all procedures share, such as the timestamp, the physician, the patient, and the hospital, plus any other common fields, and then a variable section in the JSON document for the unique data points of each procedure.
This will make great use of the polymorphic schema capabilities of MongoDB, and with modern languages, this can be modeled using base classes and extensions or inheritance.
Before we go into the modeling exercises, let's do a level set of understanding of MongoDB concepts versus Relational concepts
In MongoDB data is stored in a collection and that is analogous to a table. Collections contain Documents and that is analogous to a Row or Record.
More importantly, in MongoDB we think about what data we need and how it will be used, versus how the data will be stored.
In MongoDB, we look at queries to guide schema design decisions, whereas in relational we model first and then answer questions, eventually adding indexes and, in some cases, denormalizing data to support queries and performance over time.
Another difference is that in MongoDB fields have many dimensions, versus just having two (rows and columns).
Each field can contain 0, 1 or many values, such as an array, or even embedded sub-documents, and the type can vary from document to document, versus a single value of a pre-defined type.
I can also query on any field at any level in the document, versus a single field.
Okay, so when we start modeling data, the first thing to avoid is thinking of every single little thing that we may not use immediately, which usually leads to creating complex, over-normalized schemas.
DO NOT perform 3rd normal form modeling and create hundreds of tables, with join tables for M-M relationships, storing all kinds of entities which will be very difficult to join, will slow down performance, and will be hard to maintain over time.
Instead what we do is create rich data structures that are single documents. As you can see in this example we have many fields about a patient, where they live, what professions they practice, a list of the physicians they're currently seeing, when was the last visit, etc. So I can get a quick view of a patient in a single document.
Now we have talked about MongoDB having strong data types, such as strings and numbers, but we also have more advanced data types such as coordinates, and arrays of other sub-documents
In MongoDB I can query and index using any number of fields at any level, and the document is already in object form so I don't need an ORM layer like Hibernate or Entity Framework to translate data from relational to object, the data is already an object.
Two ways to model relationships: Referencing and Embedding.
Referencing is a very relational-like approach where I duplicate IDs across collections. But take into account that MongoDB does not enforce foreign key constraints, so if you were to delete a master document, you would likely end up with orphans, and this has to be handled at the application level. Embedding is more natural to MongoDB, and it works by nesting data inside a single document. There may or may not be a need to generate an ID for nested data, but there is certainly no need to duplicate IDs, as everything lives together.
So how does this apply to our medical schema? Let's look at Procedures and Results. With Referencing, I could use two collections and have a relationship between them. With Embedding, I could embed the results inside of the procedure. Now, something to think about: which of these two entities is strong, and which is weak? Clearly the Results entity is weak, as it cannot exist without a Procedure.
Here is what the referencing approach would look like. Obtaining all the data I need requires two reads and two round trips to the database. Notice we have placed the Result ID in the Procedure. Why? Because my application will display Procedures and their Results. This way I only need to read the Procedure collection and then look up the Result document by its ID, and I can perform this lookup in the application layer.
And to give you a hint about the latter section of the presentation, with MongoDB 3.2 I can use the $lookup pipeline stage to perform what is essentially a left-outer join performed at the database layer.
Take a second to think about this design, using classic relational modeling and considering the strong and weak entities, I would have probably placed the ProcedureID in the Results. But then I would need to create an additional index which costs disk and memory
However, with the Embedding approach, this is quite easy to model and getting my data requires a single read and a single roundtrip to the database.
So the advantage of embedding is that I can retrieve all relevant information with a single read of a single document.
I don't have to implement any joins in my application code and also when I update or insert data, it is a single atomic operation. Consider that MongoDB, at this time, does not offer multi-document, multi-collection transactions.
Let’s talk about Atomicity for a bit.
In a single database command, we can update many fields, or the whole document. If there are concurrent reads and writes to the same document, the application will see the document before or after the update, but not in between. So a single update statement can alter either the complete document or parts of it, as we see in this example, and that is atomic.
But what is not possible, up to MongoDB 3.2, is to do multi-document transactions. You cannot begin a transaction, perform operations and then either commit or rollback.
What you may have guessed already, is that Embedding takes advantage of mongodb’s document-level atomicity
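As a sketch of what a single atomic, multi-field update document looks like (the field names and modifiers below are illustrative, not from the slides):

```javascript
// One update statement touching several fields of one document.
// MongoDB applies all of these modifiers atomically — concurrent readers see
// the document either entirely before or entirely after the update.
const update = {
  $set:  { "addr.city": "Boston" },                  // change a nested field
  $inc:  { visits: 1 },                              // increment a counter
  $push: { procedures: { id: 12347, type: "MRI" } }  // append to an array
};

// Shell equivalent (field names are illustrative):
//   db.patients.updateOne({ _id: 2 }, update)
console.log(Object.keys(update)); // [ '$set', '$inc', '$push' ]
```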
But there are limitations. A large document can also cost more overhead, and there is a 16 MB limit, although 16 MB of JSON is a considerable amount of data. So larger documents can cost more to read and update, especially if the data does not change much.
Referencing is the exact opposite of embedding:
Avoid duplication (1-M)
Always look at embedding first, and then prove that embedding doesn't work
You can always query on any embedded information
Careful: extra-large documents, or embedded data not accessed frequently
Mixed or Hybrid approach: reference to keep master data, but also embed the latest or most-used data for speed
Avoid join tables! What is a join table? A list of key pairs that relate two independent entities.
In MongoDB we have arrays
The relationship can be done as embedded or referencing
Using Embedding, arrays can be used. Data duplication will happen, and this is not as bad an idea as it is in relational. Notice how we are denormalizing some of the fields that we need most often (like the doctors' names) and can still satisfy our queries very quickly.
Downside: if the fields we duplicate change, then we do have maintenance work or stale data. So take into account which fields will most likely not change, such as a doctor's name.
What to do if the fields change often?
If the fields change quite often, then perhaps we could revert to Referencing, knowing we may need to hit the DB multiple times.
Decision is really dependent on your application
Fast queries
Atomic updates
Data maintenance when duplication - How often does data change?
Read or Write intensive?
Let’s look at Patients and Procedures
Hypothetically, we decided to always use Referencing
Look at the queries: find all patients from a state that have had a particular procedure
A very difficult query!! Bad performance
Query the Patients collection for New Hampshire, and get the Patient IDs
Now go against all procedures of type X-ray for these Patient IDs: join code in the application
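The two-step, application-level join described above can be sketched in plain JavaScript (the data is made up for illustration; shell equivalents are in the comments):

```javascript
// Step 1: find patient ids in a state; Step 2: find matching procedures.
const patients = [
  { _id: 1, name: "Ann", state: "NH" },
  { _id: 2, name: "Joe", state: "NY" },
  { _id: 3, name: "Sue", state: "NH" }
];
const procedures = [
  { _id: 101, patientId: 1, type: "x-ray" },
  { _id: 102, patientId: 2, type: "x-ray" },
  { _id: 103, patientId: 3, type: "blood test" }
];

// Step 1 — shell: db.patients.find({ state: "NH" }, { _id: 1 })
const nhIds = patients.filter(p => p.state === "NH").map(p => p._id);

// Step 2 — shell: db.procedures.find({ type: "x-ray", patientId: { $in: nhIds } })
const result = procedures.filter(
  pr => pr.type === "x-ray" && nhIds.includes(pr.patientId)
);

console.log(result.map(r => r._id)); // [ 101 ]
```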
Referencing and embedding
Contains the Type of the Procedure
Can now embed a small amount of Procedure info and can now execute in a single query
If the "Chest X-Ray" name changes, we have to change it everywhere, but it very seldom changes, maybe once a decade!
Tons of data into MongoDB every second
Patients pulse, heart pressure, from which device, when, etc.
The schema is easy if we create a record per event, but let's analyze the consequences.
We get millions of records very quickly, and a lot of the same data repeats, e.g. the device ID, the patient ID, and most of the timestamp.
Index space will grow significantly, and operations and queries will be expensive too!
Store one document per hour, vs. 1 doc per minute
Each doc will contain 60 minutes of data
The bp field is effectively two-dimensional: one [systolic, diastolic] pair per minute
In general, an update is less costly than an insert; here we create less write workload by doing more updates than inserts
Graph 1 day of activity
Substantially fewer IOPS for reads, which means reading is faster
Order-of-magnitude differences when planning 1 year's worth of data!
ALWAYS consider what indexes are needed, and the size of those indexes
Consider the hardware needed! Servers with 100s of GB of RAM are easier to obtain than TB-sized ones; the same goes for disk space
Use mgenerate to model your data and see actual data sizes!
Quickly identify your slow-running queries.
Part of MongoDB Ops Manager, the Visual Query Profiler displays how query and write latency vary over time
With the click of a button, the Visual Query Profiler consolidates and displays metrics from all your nodes on a single screen
Let’s go back to this example from earlier, and imagine that the Procedure Name changes quite often, and we have decided to reference instead of embed.
But I also want a view of the data that has just the Patient and his/her Procedures, but not the Physicians.
Using $lookup I can do this
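A sketch of what such a $lookup stage could look like; the collection and field names (procedures, patientId) are assumptions for this example, and the pipeline would be run with db.patients.aggregate(pipeline):

```javascript
// Shape of a $lookup stage that joins Procedures onto Patients at the
// database layer (collection and field names are assumed for illustration).
const pipeline = [
  { $match: { state: "NH" } },
  { $lookup: {
      from: "procedures",        // collection to join
      localField: "_id",         // field in the patients documents
      foreignField: "patientId", // field in the procedures documents
      as: "procedures"           // joined docs land in this array field
  }}
];
console.log(pipeline.length); // 2
```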
Finally, when the schema is done, working and performing well, and I am in production, I may want to lock it down. I can do this with Document Validation in 3.2.