Deciding among the many database options in Azure can be overwhelming. There are many choices, and it’s impossible for everyone to know all of them. Traditionally, the decision has been which relational database to use. But with all the NoSQL databases now available, there are many more options that may be a better fit for your application. What are the trade-offs among all the choices? And why pick just one? I will give some practical examples of how to combine different types of databases. Microsoft released DocumentDB a couple of years ago, its first managed NoSQL cloud database. More recently, Cosmos DB has expanded those offerings and made them easier than ever to use. Cosmos DB is a service that supports several types of databases: relational, key-value, document, and graph. I will explain what each of these is, along with some code samples for each one to get you started. You will leave this session with a greater understanding of the different types of NoSQL databases and the kinds of problems each of them solves best.
Webinar: Live Data Visualisation with Tableau and MongoDB (MongoDB)
MongoDB 3.2 introduces a new way for familiar Business Intelligence (BI) tools to access your real-time operational data – opening it up to data analysts and data scientists, enabling new insights to be discovered faster than ever before. Tableau accesses the JSON document data stored in MongoDB via this new BI Connector. We will cover how the BI Connector works by creating a relational view definition of a JSON data set that is then used to present a tabular SQL/ODBC interface to Tableau. Then we will set up a live connection from Tableau Desktop to the MongoDB Connector for BI. Once we have Tableau Desktop and MongoDB connected, we will demonstrate the visual power of Tableau to explore the agile data storage of MongoDB. This webinar will cover:
What is the MongoDB BI Connector?
Setting up a connection from Tableau to the MongoDB BI Connector.
How to perform data discovery with Tableau connected to live MongoDB data.
Publishing a Tableau Dashboard for sharing insights.
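To make the "relational view of a JSON data set" concrete, here is a minimal sketch, in plain Python, of the flattening idea: nested document fields become dotted column names that a tabular SQL/ODBC interface can expose. The `flatten_doc` helper and the sample document are illustrative assumptions, not the BI Connector's actual implementation.

```python
# Sketch only: how a nested MongoDB document could map to the dotted column
# names of a tabular row, the way a SQL/ODBC interface presents it to Tableau.

def flatten_doc(doc, prefix=""):
    """Flatten nested dicts into {'a.b.c': value} column/value pairs."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten_doc(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row

order = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "London"},
    "total": 42.5,
}

print(flatten_doc(order))
# {'_id': 1, 'customer.name': 'Ada', 'customer.city': 'London', 'total': 42.5}
```

The real connector additionally unwinds arrays into child tables; this sketch covers only nested sub-documents.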
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau (MongoDB)
Pairing your real-time operational data stored in a modern database like MongoDB with first-class business intelligence platforms like Tableau enables new insights to be discovered faster than ever before.
Many leading organizations already use MongoDB in conjunction with Tableau including a top American investment bank and the world’s largest airline. With the Connector for BI 2.0, it’s never been easier to streamline the connection process between these two systems.
In this webinar, we will create a live connection from Tableau Desktop to a MongoDB cluster using the Connector for BI. Once we have Tableau Desktop and MongoDB connected, we will demonstrate the visual power of Tableau to explore the agile data storage of MongoDB.
You’ll walk away knowing:
- How to configure MongoDB with Tableau using the updated connector
- Best practices for working with documents in a BI environment
- How leading companies are using big data visualization strategies to transform their businesses
Beyond the Basics 3: Introduction to the MongoDB BI Connector (MongoDB)
Watch this presentation to learn how the MongoDB BI Connector lets you use MongoDB as a data source for your SQL-based BI and analytics platforms.
Learn how to seamlessly create the visualizations and dashboards that will help you extract the insights and hidden value in your multi-structured data.
MongoDB .local Munich 2019: A Complete Methodology to Data Modeling for MongoDB (MongoDB)
Are you new to schema design for MongoDB, or are you looking for a more complete or agile process than what you are following currently? In this talk, we will guide you through the phases of a flexible methodology that you can apply to projects ranging from small to large with very demanding requirements.
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
This document provides a high-level summary of MongoDB and its features. It begins with an overview of MongoDB, including its employees, customers, offices, and public status. It then discusses MongoDB's document model and how it allows for flexible, schema-less structures. It also covers MongoDB's rich query language and secondary indexing capabilities. Other sections summarize MongoDB's availability and workload isolation with replica sets, its scalability features including sharding and data locality, its security features, and management tools like Ops Manager and Compass. The document also briefly discusses MongoDB's integration with BI tools and running MongoDB in the cloud with MongoDB Atlas.
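The "rich query language" mentioned above is query-by-example: a filter document is matched against data documents, with operator sub-documents such as `{"$gt": ...}` for range conditions. The following is a toy, pure-Python illustration of that matching semantics (the `matches` helper is an assumption for this sketch, not MongoDB's implementation, which also runs server-side against indexes).

```python
# Toy sketch of MongoDB-style query-by-example matching. Supports plain
# equality plus the $gt and $in operators; real MongoDB has many more.

def matches(doc, query):
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):  # operator expression, e.g. {"$gt": 25}
            for op, operand in cond.items():
                if op == "$gt" and not (value is not None and value > operand):
                    return False
                if op == "$in" and value not in operand:
                    return False
        elif value != cond:  # plain equality match
            return False
    return True

people = [
    {"name": "Ada", "age": 36},
    {"name": "Grace", "age": 28},
]
print([p["name"] for p in people if matches(p, {"age": {"$gt": 30}})])
# ['Ada']
```

In real MongoDB the equivalent call would be `collection.find({"age": {"$gt": 30}})`, and a secondary index on `age` would let the server satisfy it without scanning every document.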
MongoDB .local Munich 2019: Managing a Heterogeneous Stack with MongoDB & SQL (MongoDB)
Data administrators face the challenge of integrating disparate data technologies into a cohesive and performant data platform. This is especially true when using diverse query languages and protocols. This session will focus on how to integrate SQL-aware applications into a MongoDB data platform.
SlamData - How MongoDB Is Powering a Revolution in Visual Analytics (John De Goes)
Slides from my presentation at MongoDB Days Silicon Valley. I discuss what SlamData is, the challenges it had to solve to build an analytics solution in the NoSQL space, and how specific features of MongoDB help power its advanced analytics functionality.
MongoDB .local Munich 2019: MongoDB Atlas Data Lake Technical Deep Dive (MongoDB)
MongoDB Atlas Data Lake is a new service offered by MongoDB Atlas. Many organizations store long-term, archival data in cost-effective object storage such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. However, many of them do not have robust systems or tools to effectively utilize large amounts of data to inform decision making. MongoDB Atlas Data Lake is a service allowing organizations to analyze their long-term data to discover a wealth of information about their business.
This session will take a deep dive into the features that are currently available in MongoDB Atlas Data Lake and how they are implemented. In addition, we'll discuss future plans and opportunities and offer ample Q&A time with the engineers on the project.
Modern architectures are moving away from a "one size fits all" approach. We are well aware that we need to use the best tools for the job. Given the large selection of options available today, chances are that you will end up managing data in MongoDB for your operational workload and with Spark for your high speed data processing needs.
When we model documents or data structures, there are key aspects to examine not only for functional and architectural purposes, but also to account for the distribution of data across nodes, streaming capabilities, aggregation and queryability options, and how we can integrate data processing software such as Spark, which can benefit from subtle but substantial model changes. A clear example is the choice between embedding and referencing documents, and its implications for high-speed processing.
Over the course of this talk we will detail the benefits of a good document model for the operational workload, as well as the transformations we should incorporate into our document model to suit the high-speed processing capabilities of Spark.
We will look into the different options that we have to connect these two different systems, how to model according to different workloads, what kind of operators we need to be aware of for top performance and what kind of design and architectures we should put in place to make sure that all of these systems work well together.
Over the course of the talk we will showcase different libraries that enable the integration between Spark and MongoDB, such as the MongoDB Hadoop Connector, the Stratio Connector, and the native MongoDB Spark Connector.
By the end of the talk I expect the attendees to have an understanding of:
How to connect their MongoDB clusters with Spark
Which use cases show a net benefit for connecting these two systems
What kind of architecture design should be considered for making the most of Spark + MongoDB
How documents can be modeled for better performance, both for the operational workload and when processing these data sets stored in MongoDB.
The talk is suitable for:
Developers that want to understand how to leverage Spark
Architects that want to integrate their existing MongoDB cluster and have real-time, high-speed processing needs
Data scientists that know about Spark, are playing with Spark and want to integrate with MongoDB for their persistency layer
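The embedding-vs-referencing trade-off the talk describes can be shown with a small sketch: under a referenced model, an analytics job must join orders back to customers (an expensive shuffle on a distributed engine like Spark), while embedding denormalizes the data up front. The collection shapes and the `embed` helper below are illustrative assumptions for this sketch, not a prescribed schema.

```python
# Sketch: transforming a referenced model into an embedded one, the kind of
# "subtle but substantial model change" that suits high-speed processing.

customers = {1: {"_id": 1, "name": "Ada"}}
orders = [
    {"_id": 101, "customer_id": 1, "total": 42.5},  # references customer 1
    {"_id": 102, "customer_id": 1, "total": 10.0},
]

def embed(customers, orders):
    """Denormalize: embed each customer's orders inside the customer doc,
    so a per-customer analysis needs no join at processing time."""
    docs = {cid: dict(c, orders=[]) for cid, c in customers.items()}
    for order in orders:
        docs[order["customer_id"]]["orders"].append(
            {"_id": order["_id"], "total": order["total"]})
    return list(docs.values())

embedded = embed(customers, orders)
print(embedded[0]["name"], len(embedded[0]["orders"]))
# Ada 2
```

The cost of embedding is larger documents and duplicated writes, which is why the talk frames it as a workload-dependent design decision rather than a default.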
MongoDB Evenings Dallas: What's the Scoop on MongoDB & Hadoop (MongoDB)
What's the Scoop on MongoDB & Hadoop
Jake Angerman, Sr. Solutions Architect, MongoDB
MongoDB Evenings Dallas
March 30, 2016 at the Addison Treehouse, Dallas, TX
MongoDB Launchpad 2016: What’s New in the 3.4 Server (MongoDB)
Asya Kamsky, a lead product manager at MongoDB, discussed improvements, extensions, and innovations in MongoDB. These included improvements to the WiredTiger storage engine, the replica set election process, and the initial sync process. MongoDB was also extended with features like document validation, partial indexes, $lookup, read-only views, and faceted search. Innovations involved improvements to the aggregation pipeline, mixed storage engine sets, zones, and BI connectors.
Tutorial: Building Your First App with MongoDB Stitch (MongoDB)
MongoDB Stitch allows developers to easily access and integrate MongoDB databases with key services. It provides integrated rules, functions and SDKs to handle complex connection logic and orchestrate databases and third party services. Requests made through Stitch applications are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch offers scalable hosted JavaScript functions and declarative access controls to securely manage data and service access.
This document provides an overview of MongoDB and discusses its installation and configuration on Windows systems. It covers downloading the appropriate MongoDB version, installing the downloaded file, setting up the MongoDB environment by creating a data directory and log files, and connecting to MongoDB using the mongo shell. The document is divided into multiple sections covering MongoDB's features, data modeling using documents, database and collection management operations, and connecting to MongoDB from Java applications.
The document outlines an agenda for discussing MongoDB, including an overview of MongoDB as a NoSQL, document-based database using dynamic schemas. It then compares SQL and MongoDB concepts like databases, tables, and indexes. Key features and how MongoDB achieves performance are mentioned, as well as where MongoDB fits and doesn't fit. The agenda closes with discussing pros and cons, a demo, customers and references, and Q&A.
Document Model for High Speed Spark Processing (MongoDB)
The document discusses Apache Spark and its integration with MongoDB. It provides an overview of Spark's architecture and capabilities including Spark SQL, streaming, machine learning libraries. It then covers use cases and benefits of using Spark with MongoDB, including real-time analytics, fraud detection, and time series analysis. The document demonstrates how the Stratio Spark-MongoDB connector allows querying and analyzing MongoDB data using Spark SQL and DataFrames.
This document provides an introduction to schema design in MongoDB. It begins by explaining why schema design is important, even though MongoDB is often described as "schema-less". The goals of the session are to explain the document model, differences from relational databases, how schema design impacts performance, and thinking about schema design in a new way. Various schema design patterns are discussed, including embedding vs referencing for different types of data relationships. Examples of schema designs for a guitar collector application and healthcare application are provided. Tools like Atlas, Stitch, and Compass that are useful for MongoDB are also mentioned.
The MongoDB Spark Connector integrates MongoDB and Apache Spark, providing users with the ability to process data in MongoDB with the massive parallelism of Spark. The connector gives users access to Spark's streaming capabilities, machine learning libraries, and interactive processing through the Spark shell, Dataframes and Datasets. We'll take a tour of the connector with a focus on practical use of the connector, and run a demo using both Spark and MongoDB for data processing.
Analyze and visualize non-relational data with DocumentDB + Power BI (Sriram Hariharan)
This session will show how to analyze and visualize non-relational data with DocumentDB + Power BI. We are in the midst of a paradigm shift in how we store and analyze data. Unstructured or flexible-schema data represents a large portion of data within an organization, and everyone is eager to turn this data into meaningful business information. Unstructured data analytics does not need to be time-consuming or complex. Come learn how to analyze and visualize unstructured data in DocumentDB.
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts! (MongoDB)
These days, everyone is expected to be a data analyst. But with so much data available, how can you make sense of it and be sure you're making the best decisions? One great approach is to use data visualizations. In this session, we take a complex dataset and show how the breadth of capabilities in MongoDB Charts can help you turn bits and bytes into insights.
MongoDB and Hadoop: Driving Business Insights (MongoDB)
MongoDB and Hadoop can work together to solve big data problems facing today's enterprises. We will take an in-depth look at how the two technologies complement and enrich each other with complex analyses and greater intelligence. We will take a deep dive into the MongoDB Connector for Hadoop and how it can be applied to enable new business insights with MapReduce, Pig, and Hive, and demo a Spark application to drive product recommendations.
The document discusses purpose-built databases and how developers need to be able to use multiple databases within their applications. It provides examples of companies like Airbnb and Expedia using different databases for different purposes. The rest of the document outlines common data models, use cases, and Amazon database offerings and provides a demo and additional resources.
Addressing Your Backup Needs Using Ops Manager and Atlas (MongoDB)
This document discusses disaster recovery options for MongoDB databases using MongoDB Ops Manager, Cloud Manager, and Atlas. It describes the differences between replication and disaster recovery and the importance of restore point and time objectives. It then outlines features of MongoDB Ops Manager and Cloud Manager for backup, restore, and point-in-time recovery. Finally, it details how MongoDB Atlas provides automated, secure, globally available databases with continuous backups and new options for cloud provider snapshots for disaster recovery.
Data Analytics: Understanding Your MongoDB Data (MongoDB)
This document discusses data visualization and analytics using MongoDB data. It covers the importance of data visualization, different architectures for analytics, and tooling options for visualizing MongoDB data, including building custom solutions, MongoDB Compass, the MongoDB BI Connector, and the new MongoDB Charts tool. The goal is to help users understand which visualization methods and tools are best suited to their specific needs and data.
[WITH THE VISION 2017] A Data Platform for Surviving the IoT/AI Era (Leveraging Azure Data Se...) (Naoki (Neo) SATO)
The document discusses the Microsoft data platform and its capabilities for building applications in the IoT/AI era. It outlines various Azure data services for relational, non-relational, caching, and search needs. Examples are provided for when to use SQL Database, Table Storage, Cosmos DB, HBase on HDInsight, Redis Cache, and Azure Search. The platform allows capturing, managing, transforming, analyzing, and visualizing data to power intelligent applications.
This document provides an overview of NoSQL databases in Azure. It discusses several database types, including key-value, column family, document, graph, and Hadoop. For each database type it provides information on what it is, examples of use cases, and how to query or model data. It encourages attendees to explore these databases and stresses that choosing the right database for the job is important.
This document discusses building a single database containing all web data by creating a scalable web crawler, data store, and data retrieval system. It describes the challenges of collecting and structuring data from millions of websites, building a NoSQL data store using Cassandra to handle terabytes of data, and providing an intuitive RESTful API for querying the unified database. The project aims to make web data easily accessible through a single source as if querying a database.
Big Data Analytics from Azure Cloud to Power BI MobileRoy Kim
This document discusses using Azure services for big data analytics and data insights. It provides an overview of Azure services like Azure Batch, Azure Data Lake, Azure HDInsight and Power BI. It then describes a demo solution that uses these Azure services to analyze job posting data, including collecting data using a .NET application, storing in Azure Data Lake Store, processing with Azure Data Lake Analytics and Azure HDInsight, and visualizing results in Power BI. The presentation includes architecture diagrams and discusses implementation details.
Designing big data analytics solutions on azureMohamed Tawfik
This document discusses designing big data analytics solutions on Azure. It provides an overview of Azure's data landscape and common architectural patterns and scenarios for building analytics solutions using various Azure data and analytics services. These include Azure SQL Data Warehouse, Azure Data Lake Store, Azure Data Factory, Azure Machine Learning, and Power BI for reporting and visualization. The document also discusses using these services to build solutions for scenarios like data warehousing, data lakes, ETL/ELT, machine learning, streaming analytics and more.
This was a very interesting conference, TIC students oriented where I take him to the azure ecosystem for data warehousing architecture and best practices to reach powerful Business Intelligence Solutions according to the new era
Our own Sean Doherty was in Madrid this week, presenting at the Red Hat Partner summit on the rise of big data and what it means for the future of the RDBMS in the enterprise. Check out his presentation!
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
Sig Narvaez, Executive Solution Architect at MongoDB
MongoDB is now a Developer Data Platform. Come learn what�s new in the 6.0 release and Atlas following all the recent announcements made at MongoDB World 2022. Topics will include
- Atlas Search which combines 3 systems into one (database, search engine, and sync mechanisms) letting you focus on your product's differentiation.
- Atlas Data Federation to seamlessly query, transform, and aggregate data from one or more MongoDB Atlas databases, Atlas Data Lake and AWS S3 buckets
- Queryable Encryption lets you run expressive queries on fully randomized encrypted data to meet the most stringent security requirements
- Relational Migrator which analyzes your existing relational schemas and helps you design a new MongoDB schema.
- And more!
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
DBA's used to be Relational Database centric for instance managing Microsoft SQL Server or Oracle, in this changing world of polyglot database environments their role has expanded not just into new platforms other than SQL but also new legal governance, modelling techniques, architecture etc. They need to have a base knowledge of Kimball, Inmon, Data Vault, what CAP theorem is, LAMBDA, Big Data, Data Science etc.
Myth Busters II: BI Tools and Data Virtualization are InterchangeableDenodo
Watch Here: https://bit.ly/2NcqU6F
We take on the 2nd myth about data virtualization and it’s one that suggests a BI tool can substitute a data virtualization software.
You might be thinking: If I can have multi-source queries and define a logical model in my reporting tool, why would I need a data virtualization software?
Reporting tools, no doubt important and necessary, focus on the visualization of data and it’s presentation to the business user. Data virtualization is a governed data access layer designed to connect to and provide transparency of all enterprise data.
Yet the myth suggests that these technologies are interchangeable. So we’re going to take it on!
Watch this webinar as we compare and contrast BI tools and data virtualization to draw a final conclusion.
Innovation Track AWS Cloud Experience Argentina - Data Lakes & Analytics en AWS Amazon Web Services LATAM
Data lakes allow organizations to store all types of data in a centralized repository at scale. AWS Lake Formation makes it easy to build secure data lakes by automatically registering and cleaning data, enforcing access permissions, and enabling analytics. Data stored in data lakes can be analyzed using services like Amazon Athena, Redshift, and EMR depending on the type of analysis and latency required.
The document discusses the evolution of databases and the rise of content applications. It argues that traditional databases are no longer suitable for next-generation applications that require flexible querying of content. New specialized database systems optimized for XML and content are needed to power applications that help users complete tasks and gain insights from unstructured data. Content applications will blend software and content to seamlessly support users in interactive, read-write experiences.
This document summarizes a presentation about using events and metrics to manage web operations. It discusses how the presenter's company Datadog aggregates and correlates metrics and events data from multiple sources to provide visibility and insights for developers and operations teams. It describes some of the challenges of dealing with large and diverse data streams. It also covers some of the tradeoffs and techniques for managing infrastructure in both on-premise and cloud environments, particularly around networking, storage, and scaling of compute and data resources.
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
The document provides an overview of NoSQL databases, including what NoSQL means, the rise of NoSQL as an alternative to relational databases, different classifications of NoSQL databases, pros and cons, use cases, and real-world examples. It discusses how NoSQL databases provide more flexible schemas and scalability than relational databases for applications like logging, shopping carts, and user preferences, while relational databases remain better for transactions and business critical data. The presenter then demonstrates CouchDB as one example of a NoSQL database.
Relational databases vs Non-relational databasesJames Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
Data Integration through Data Virtualization (SQL Server Konferenz 2019)Cathrine Wilhelmsen
Data Integration through Data Virtualization - PolyBase and new SQL Server 2019 Features (Presented at SQL Server Konferenz 2019 on February 21st, 2019)
"Building Data Warehouse with Google Cloud Platform", Artem NikulchenkoFwdays
In this talk, we would explore available options for building Data Warehouse for data-oriented business using Google Cloud Platform. We will start by discussing why Data Warehouse can be needed, move to the differences between "traditional" and Cloud Data Warehouses, and finally discuss steps and options for building your own Data Warehouse.
Making the leap to BI on Hadoop by Mariani, dave @ atscaleTin Ho
This document discusses making the transition to business intelligence (BI) on Hadoop from traditional data warehousing. It notes that traditionally only 3% of data is analyzed due to the complexity of data warehousing which requires multiple copies of data and rigid schemas. Hadoop provides an alternative that allows for schema on read, direct analysis of raw data without transformation, and horizontal scaling without joins. The document demonstrates this through an example analyzing match data from the online game Dota 2, showing how different questions can be answered directly from raw JSON logs stored on Hadoop without loading to a data warehouse.
Data warehousing systems are changing to address new data types and sources. Data is increasingly coming from real-time and non-relational sources as well as the cloud. Data lakes are emerging to handle diverse data in its native format and provide a single storage system. Data factories are being used to orchestrate movement of data between sources and facilitate analytics across data lakes and data warehouses.
Similar to Azure Database Options - NoSql vs Sql (20)
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
We are pleased to share with you the latest VCOSA statistical report on the cotton and yarn industry for the month of May 2024.
Starting from January 2024, the full weekly and monthly reports will only be available for free to VCOSA members. To access the complete weekly report with figures, charts, and detailed analysis of the cotton fiber market in the past week, interested parties are kindly requested to contact VCOSA to subscribe to the newsletter.
Open Source Contributions to Postgres: The Basics POSETTE 2024ElizabethGarrettChri
Postgres is the most advanced open-source database in the world and it's supported by a community, not a single company. So how does this work? How does code actually get into Postgres? I recently had a patch submitted and committed and I want to share what I learned in that process. I’ll give you an overview of Postgres versions and how the underlying project codebase functions. I’ll also show you the process for submitting a patch and getting that tested and committed.
1. The name “NoSQL” was in fact first used by Carlo Strozzi in 1998 as the name of a file-based database he was developing. It was a relational database.
Neal Ford coined the term Polyglot Programming in 2006.
SABRE, a database still used in the airline industry, predates relational databases (however, they began using a relational database in 2001).
Ingres and System R were two early relational database prototypes in the 70s.
The term "relational database" was invented by E. F. Codd at IBM in 1970.
Some claim that “NoSQL” now means “Not Only SQL”, and that it isn’t anti-relational.
Rackspace used the term “NoSQL” for a conference in 2009.
Terms: “Relational Winter” gives way to “Database Thaw” (ok, not a fact, but still fun)
2. Which Azure Database Option Should I Choose?
Anne Bougie, Senior Software Developer, Concurrency, Inc.
Twitter: @bougiefever
Understanding Database Options
11. Order: Player: Mojo Jojo, In-App Purchase, Order Id: 1234. Financial: Credit Card: 4012888888881881, Expiration: 5/2020. Line items: 1 Gems 250 $5; 2 Potions 10 $10. Tables: Order, Player Assets, Line Items, Credit Card.
The object-relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program (or multiple application programs) written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.
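To make the mismatch concrete, here is a minimal Python sketch (the table and field names are illustrative, not taken from the deck) showing the same order split across relational-style tables versus round-tripped as a single document:

```python
import json

# One cohesive in-app object: an order with nested payment info and line items.
order = {
    "orderId": 1234,
    "player": "Mojo Jojo",
    "creditCard": {"number": "4012888888881881", "expiration": "5/2020"},
    "lineItems": [
        {"line": 1, "item": "Gems", "qty": 250, "price": 5},
        {"line": 2, "item": "Potions", "qty": 10, "price": 10},
    ],
}

def to_relational(order):
    """Split the object across normalized tables, as an ORM would."""
    orders = [{"orderId": order["orderId"], "player": order["player"]}]
    credit_cards = [{"orderId": order["orderId"], **order["creditCard"]}]
    line_items = [{"orderId": order["orderId"], **li} for li in order["lineItems"]]
    return {"Order": orders, "CreditCard": credit_cards, "LineItem": line_items}

# Relational: one object becomes rows in three tables, re-joined on every read.
tables = to_relational(order)
assert set(tables) == {"Order", "CreditCard", "LineItem"}

# Document: the whole object round-trips as a single JSON document, no mapping.
doc = json.dumps(order)
assert json.loads(doc) == order
```

The point of the sketch is the shape of the work, not the details: the relational path needs a mapping layer in both directions, while the document path stores the object as the program already sees it.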
12. Order: Player: Mojo Jojo, In-App Purchase, Order Id: 1234. Financial: Credit Card: 4012888888881881, Expiration: 5/2020. Line items: 1 Gems 250 $5; 2 Potions 10 $10. Tables: Order, Player Assets, Line Items, Credit Card.
16. Fast reads & writes
Flat data structure
Schema-less
Partition key
Row key
Time stamp
Service will scale out using the partition key
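A toy illustration of how partition key plus row key addressing works. This is a plain-Python stand-in for the idea, not the Azure Table Storage SDK; the keys and entities are invented:

```python
from collections import defaultdict

# In-memory stand-in for a partitioned table: partition key -> {row key: entity}.
table = defaultdict(dict)

def upsert(partition_key, row_key, entity):
    table[partition_key][row_key] = entity

def get(partition_key, row_key):
    # A point read touches exactly one partition; this is what lets the
    # service spread partitions across servers and still stay fast.
    return table[partition_key][row_key]

upsert("player-mojo", "order-1234", {"item": "Gems", "qty": 250})
upsert("player-mojo", "order-1235", {"item": "Potions", "qty": 10})

assert get("player-mojo", "order-1234")["qty"] == 250
# Scanning one partition is cheap; scanning across partitions touches them all.
assert len(table["player-mojo"]) == 2
```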
17. In-memory key-value store
Very fast reads (faster than table storage)
Used as a database, cache, and message broker
Transactions
Expiration of items
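A rough sketch of the key-value-with-expiration idea. This toy class only mimics the SET-with-TTL / GET pattern of a Redis-style cache; it is not the actual Redis client, and the clock is injectable so the example is deterministic:

```python
import time

class ExpiringCache:
    """Toy in-memory key-value store with per-item expiry."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired item
            return None
        return value

# Fake clock so expiry can be demonstrated without sleeping.
now = [0.0]
cache = ExpiringCache(clock=lambda: now[0])
cache.set("leaderboard:top", ["Mojo Jojo"], ex=30)
assert cache.get("leaderboard:top") == ["Mojo Jojo"]
now[0] += 31
assert cache.get("leaderboard:top") is None
```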
18.
19. Stores data in JSON
Schema-less
Stores complex, hierarchical data
Highly scalable
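A small Python sketch of what schema-less, hierarchical JSON documents buy you; the document contents are invented for illustration:

```python
import json

# A schema-less document: the hierarchy lives inside one JSON value.
doc = json.loads("""
{
  "orderId": 1234,
  "player": "Mojo Jojo",
  "lineItems": [
    {"item": "Gems", "qty": 250, "price": 5},
    {"item": "Potions", "qty": 10, "price": 10}
  ]
}
""")

# Parts of the document can be pulled out directly, without joins.
total = sum(li["price"] for li in doc["lineItems"])
assert total == 15

# Schema-less: another document in the same collection may add fields freely.
doc2 = {"orderId": 1235, "player": "Him", "giftMessage": "Enjoy!"}
collection = [doc, doc2]
gift_orders = [d for d in collection if "giftMessage" in d]
assert [d["orderId"] for d in gift_orders] == [1235]
```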
20.
21. Part of the Hadoop ecosystem
Commonly augmented with Hive
Can handle very large amounts of writes in a short period of time
22.
23. Nodes and relationships
Data with many complex relationships
Typically used to augment the system of record
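A toy adjacency-list sketch of the graph idea, using the customers-and-orders example discussed later in the notes (all names are invented). Inbound relationships answer "who ordered product X?" without multi-table joins:

```python
# Relationships are first-class: each edge is (source, relationship, target).
edges = [
    ("alice",   "ORDERED",  "order-1"),
    ("bob",     "ORDERED",  "order-2"),
    ("order-1", "CONTAINS", "gems"),
    ("order-2", "CONTAINS", "gems"),
    ("order-2", "CONTAINS", "potions"),
]

def inbound(node, rel):
    """All nodes with an edge of type `rel` pointing at `node`."""
    return [src for src, r, dst in edges if r == rel and dst == node]

def customers_for_product(product):
    # product <-CONTAINS- order <-ORDERED- customer
    return sorted({c for o in inbound(product, "CONTAINS")
                     for c in inbound(o, "ORDERED")})

assert customers_for_product("gems") == ["alice", "bob"]
assert customers_for_product("potions") == ["bob"]
```

A real graph database indexes these traversals so each hop is constant-time, but the shape of the query (walking inbound edges) is the same.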
27. What are the relationships?
How much data is there, and how fast is it coming in?
How will the data be accessed?
Data access performance requirements
Consistency/Transactional requirements
Entity complexity
Programmer skill level
28. Key Value: lots of data; simple data structure; quickly perform small read and write operations; inexpensive, fairly simple; need to add data items willy-nilly.
Document: lots of data; need high performance; more complex data structure; need to add data items willy-nilly.
Column Family: lots of data, insanely huge amounts of data; need high performance; no joins.
Graph: lots of data; lots of connections between entities; quickly changing relationships between entities.
29. Azure Redis Cache
Leaderboards, Shopping Carts
Latest x items of anything
Deletes and filters
Cache
Azure Table Storage
Large amounts of data with a flat structure
Fast querying using the partition and row keys
DocumentDb
Product catalogs, gaming, social networking
Hbase
Voting, Race, anything with huge amounts of data being generated in huge bursts, telemetry
Gremlin
Social networking relationships, anything with complex and changing relationships between entities
30. Problem
Lots of data
Fairly simple data structure
Lots of small reads and writes
Need high performance
Need high availability
Need fast searching on columns other than the key
Solution
Use Azure Storage
Augment with Redis Cache for searching
31. Problem
Really need high consistency
Highly structured data
Current data is not really large
Historical data is huge
And we need to report on historical data
Solution
Use Azure Sql
Archive to DocumentDb
33. Slides on GitHub https://github.com/Bougiefever/AzureNoSqlDataPrimer
A Newbie Guide to Databases https://blog.appdynamics.com/engineering/a-newbie-guide-to-databases/
That NoSQL Thing: Column (Family) Databases https://ayende.com/blog/4500/that-no-sql-thing-column-family-databases
NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence http://a.co/1YwtJ47
On Sharding Graph Databases http://jimwebber.org/2011/02/on-sharding-graph-databases/
Azure Cosmos DB Documentation https://docs.microsoft.com/en-us/azure/cosmos-db/
Anne Bougie
anne.bougie@gmail.com
@bougiefever
http://www.bougiefever.com
https://github.com/Bougiefever
Editor's Notes
What I'm going to cover today is an overview of several of the types of storage options in Azure and how they compare to one another.
In the past, before my time even, there were many ways to store data. Punch cards, paper reels, dewey decimal system.
In the 80s, relational databases became popular, and have remained so, and will continue to be. In fact, RDBMSs have been so popular that the choice hasn't been which data storage option to choose, but how we are going to set up our tables, where the foreign keys and indexes go, and so on.
Talk about paper machine telemetry.
Since NoSQL covers such a broad range of data storage types, it really doesn't have a clear-cut definition. Let's start out by listing some of the words we use to describe them.
On the SQL side, we talk about schema, table relationships, transactions, enforcing data and referential integrity
On the NoSQL side, you hear a lot of chatter about no schemas - this is considered one of the many benefits of NoSQL - Scalability, partitioning - and you may have heard about the fast data access
Talk about Schema-less vs schema here!
Today, however, with NoSQL offerings, that choice is not as clear cut. Designing the data storage has never been a clear cut, easy task, and it’s even more complex now.
So, hopefully I can shed some light on what options there are, and what the trade-offs are between choosing one over another.
Since Relational database have been so popular, let's try to define what NoSQL is.
if you're starting a new strategic enterprise application, you should no longer assume that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.
Since NoSQL is not that easy to define, I thought it made sense to compare some of the features against one another.
SQL - Bubbles/NoSQL Buttercup
Querying - (SQL) very strong query support/joining - you can even join tables in different databases - an RDBMS is an integration tool
NoSQL side - There are tools to help with indexing and querying, but the tools themselves fall short in this area.
This may or may not be a problem, depending on your application
Transactions - Relational databases are the gold standard. That's not to say that NoSQL doesn't have transactional support, but Bubbles is the clear winner. Redis Cache's transaction support (MULTI/EXEC) is limited - there are no rollbacks.
Scalability - NoSQL it's one of the biggest reasons No SQL was invented. Sharding is something that is planned for when you first set up your database - you must provide a partition key.
Configuration/Management - (NoSQL) Had to go with NoSQL. Adding or dropping fields is as simple as either saving or not saving them in your code. Both Azure SQL and DocumentDb offer geo-redundancy and failover, but DocDb does it more gracefully and without any changes to your apps to reflect the change
Schema - If you want strong control over your data at the database level, clearly Bubbles is the winner. However, if you want the flexibility that schema-less has, then Buttercup wins.
Talk about schematic vs schema-less
Speed - Again, one of the reasons it was invented was to deliver fast speeds when you have lots and lots of data. We all know SQL queries against many millions of rows are slow, even with indexing.
Scalability - So - we have our data on our servers in the cloud and we need to scale.
social networks, activity in logs, mapping data.
In SQL Server, our only option, really, is to scale up to a larger, faster server. There are some horizontal scaling options, but it is tricky to set up and maintain.
(show)
(NoSql) Partitioning and scaling horizontally is very easy to do, and after setting up the partition key, you really don't have to do much at all. That being said, you need to think a lot about what you choose for your partition key, because a wrong choice will mean poor performance. Entities that are retrieved together need to be in the same partition so you don't end up hitting multiple partitions.
When object-oriented program is persisted to a RDBMS (relational database management system)
A cohesive set of data in your program gets persisted to several different tables
ORMs created to handle this problem
Order – we think of an order as one thing, but Order/LineItem – when we save it, we need to split it out
Talk about schema/not lack of structure
Document: DocumentDb/MongoDb/CouchDb
Json – Complex hierarchical structures – can pull out parts of document
------
KeyValue: Azure Table Storage/SimpleDb/DynamoDb/Riak
Simplest – glorified dictionary/hashmap
Redis Cache/Redis/Memcached
ColumnFamily: Azure Hbase/Apache Hbase/Cassandra (Apache)/BigTable
Graph: Neo4J/OrientDb
Azure Table Storage/Azure Redis Cache/DynamoDb/Riak
No set schema – need structure – Implicit Schema to get info out and to deserialize
Partition along partition key
Azure Storage - Retrieve data with partition key and row key
Redis Cache – Retrieve data with key
500 TB per storage account
1 MB each row (entity)
252 properties each entity
Property types: Edm.Binary, Edm.Boolean, Edm.DateTime, Edm.Double, Edm.Guid, Edm.Int32, Edm.Int64, Edm.String
PartitionKeys and RowKeys Drive Performance and Scalability
Memcached
CouchDb, MongoDb
Similar to key/value, but may store complex objects
Documents in a document store are roughly equivalent to the programming concept of an object.
No set schema – need structure – Implicit Schema to get info out and to deserialize
Document size 2MB
Types of data: Json
ColumnFamily: Azure Hbase/Apache Hbase/Cassandra (Apache)/BigTable
In column-oriented NoSQL database, data is stored in cells grouped in columns of data rather than as rows of data. Columns are logically grouped into column families. Column families can contain a virtually unlimited number of columns that can be created at runtime or the definition of the schema. Read and write is done using columns rather than rows
Instead of calling them tables, they are called column families. Unlike a table, however, the only thing that you define in a column family is the name and the key sort options (there is no schema).
A value is a tuple consisting of a name, a value, and a timestamp.
More complicated
Row key – can store multiple column families – columns that fit together
Retrieve by row key and column family name
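A rough in-memory sketch of the column-family model just described; cells are stored under column names as (value, timestamp) pairs, columns appear at runtime, and reads address a row key plus a column family. The keys and values are invented for illustration:

```python
# row key -> {column family -> {column name: (value, timestamp)}}
store = {}

def put(row_key, family, column, value, ts):
    # Columns need no schema: they are created the first time they're written.
    store.setdefault(row_key, {}).setdefault(family, {})[column] = (value, ts)

def get_family(row_key, family):
    # Reads address a row key and a column family, not a fixed set of columns.
    return store.get(row_key, {}).get(family, {})

put("sensor-42", "telemetry", "temp", 98.6, ts=1)
put("sensor-42", "telemetry", "rpm", 1200, ts=1)
put("sensor-42", "meta", "site", "mill-3", ts=1)

assert get_family("sensor-42", "telemetry")["temp"] == (98.6, 1)
assert set(get_family("sensor-42", "telemetry")) == {"temp", "rpm"}
```

Real column-family stores (HBase, Cassandra) add sorted row keys, versioned cells, and distribution across a cluster, but the addressing scheme is the same.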
Open source, built on Hadoop, modeled after Google's BigTable
framework that supports the processing and storage of extremely large data sets in a distributed computing environment
Hive provides a database query interface to Apache Hadoop.
Imagine you had a file that was larger than your PC's capacity. You could not store that file, right? Hadoop lets you store files bigger than what can be stored on one particular node or server. So you can store very, very large files. It also lets you store many, many files.
Move data to server
Nodes and relationships – relationships are first-class citizens of the db
Inbound/outbound -
Movie
Club
Customer with orders – products
Find all customers who've ordered a particular product – lots of work in a traditional db
In a graph – you have an order node, look at the inbound relationships, connect to customer nodes
Typically used to augment the system of record
Strong consistency is only guaranteed within a geo region; across geo regions the consistency guarantees are weaker. So the CAP theorem is only overcome within a certain geographic scope.
Consistent: Every read receives the most recent write
Available: Every request receives a response, but it may not be the most recent
Partition Tolerance: The system continues to operate despite dropped requests between nodes
Relationships: Data that has a lot of relationships doesn't do as well in relational databases – a graph can get data with lots of connections faster than lots of joining
Scaling needs: FB – Column family database
Complexity – DocumentDb with JSON, especially if you are going to serialize to JSON anyway
ORM has helped a lot – performance is often a problem
A CFDB is what happens when you take a database, strip everything away that makes it hard to run on a cluster, and see what happens. (Ayende)
It always sounds easy - "use the best tool for the job". With very isolated systems, it's easy to decide on an RDBMS for one application, Redis for another, and Cassandra for something else. When it comes time to build systems with multiple persistent stores, we're met with challenges in integration, existing applications, and pushback from IT administrators.