Data analysts, data engineers, and application developers are supporting unprecedented rates of change, from tighter latency requirements to an expanding range of data usage scenarios. While technology must evolve rapidly to meet customer needs and respond to competitive pressures, how can we enhance the data platform to help manage this unpredictability?
To help address these realities, data practitioners from a diverse set of backgrounds are increasingly relying on schema-free, distributed, scalable, and high-performance data storage (also known as NoSQL databases). In this session, we will showcase a wide variety of customer scenarios, business goals, and technical challenges faced by real-world customers. More importantly, we will show how adding Azure DocumentDB to a data practitioner's arsenal within the Microsoft/Azure data ecosystem lets you solve these complex design patterns at massive scale.
DocumentDB is a fast, globally distributed, multi-model NoSQL database service. It provides automatic scaling of storage and throughput, high availability across regions, flexible data models, and developer productivity with support for SQL and JavaScript queries. Customers can use DocumentDB for building scalable applications that need to handle large volumes of data across any number of regions worldwide with low latency and high availability.
Introducing Azure DocumentDB - NoSQL, No Problem | Andrew Liu
Application developers support unprecedented rates of change – functionality must rapidly evolve to meet changing customer needs and to respond to competitive pressures while user populations can grow dramatically and unpredictably. To address these realities, developers are selecting document-oriented databases for schema flexibility, scalability and high performance data storage.
In this session, we will get hands-on with Azure’s NoSQL document database service. Azure DocumentDB offers full indexing of JSON documents, SQL query capabilities, and multi-document transactions. Learn how to get started with Azure DocumentDB and hear about some of the recent improvements to the service.
[PASS Summit 2016] Azure DocumentDB: A Deep Dive into Advanced Features | Andrew Liu
Let's talk about how you can get the most out of Azure DocumentDB. In this session we will dive deep into the mechanics of DocumentDB and explain the various levers available to tune performance and scale. From partitioned collections to global databases to advanced indexing and query features - this session will equip you with the best practices and nuggets of information that will become invaluable tools in your toolbox for building blazingly fast large-scale applications.
Has your app taken off? Are you thinking about scaling? MongoDB makes it easy to horizontally scale out with built-in automatic sharding, but did you know that sharding isn't the only way to achieve scale with MongoDB?
In this webinar, we'll review three different ways to achieve scale with MongoDB. We'll cover how you can optimize your application design and configure your storage to achieve scale, as well as the basics of horizontal scaling. You'll walk away with a thorough understanding of options to scale your MongoDB application.
When it comes time to select database software for your project, there are a bewildering number of choices. How do you know if your project is a good fit for a relational database, or whether one of the many NoSQL options is a better choice?
In this webinar you will learn when to use MongoDB and how to evaluate if MongoDB is a fit for your project. You will see how MongoDB's flexible document model is solving business problems in ways that were not previously possible, and how MongoDB's built-in features allow running at scale.
Topics covered include:
Performance and Scalability
MongoDB's Data Model
Popular MongoDB Use Cases
Customer Stories
Are you in the process of evaluating or migrating to MongoDB? We will cover key aspects of migrating to MongoDB from an RDBMS, including schema design, indexing strategies, data migration approaches as your implementation reaches various SDLC stages, and achieving operational agility through MongoDB Management Services (MMS).
The document discusses migrating from an RDBMS to MongoDB. It covers determining if a migration is worthwhile based on evaluating current pain points and target value. It also discusses the roles and responsibilities that will change during a migration, including data architects, developers, DBAs and more. Bulk migration techniques are reviewed including using mongoimport to import JSON data. System cutover is also mentioned as an important part of the migration process.
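As a rough sketch of the bulk-migration step mentioned above, the following Python turns rows from two relational tables into one embedded document per order and writes them one JSON document per line, which is the input format mongoimport reads by default. The table layout, field names, and sample rows are hypothetical and only serve as illustration; they are not taken from the summarized slides.

```python
import json


def rows_to_json_lines(orders, items):
    """Embed each order's line items and emit one JSON document per line."""
    # Group the child rows by their foreign key so they can be embedded.
    items_by_order = {}
    for item in items:
        items_by_order.setdefault(item["order_id"], []).append(
            {"sku": item["sku"], "qty": item["qty"]}
        )
    lines = []
    for order in orders:
        doc = {
            "_id": order["id"],  # reuse the relational primary key
            "customer": order["customer"],
            "items": items_by_order.get(order["id"], []),
        }
        lines.append(json.dumps(doc, sort_keys=True))
    return "\n".join(lines)


# Hypothetical sample rows standing in for an RDBMS export.
ORDERS = [{"id": 1, "customer": "Ada"}, {"id": 2, "customer": "Grace"}]
ITEMS = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]
EXPORT = rows_to_json_lines(ORDERS, ITEMS)
```

Saved to a file such as orders.json, the output could then be loaded with something like `mongoimport --db shop --collection orders --file orders.json`, where the database, collection, and file names are placeholders.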
This document summarizes a MongoDB webinar on advanced schema design patterns. It introduces common schema design patterns like attribute, subset, computed, and approximation patterns. It discusses how to use these patterns to address issues like large documents with many fields, working sets that don't fit in RAM, high CPU usage from repeated calculations, and changing schemas over time. The webinar provides examples of each pattern and encourages learning a common vocabulary for designing MongoDB schemas by applying these reusable patterns.
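Two of the patterns named above can be sketched with plain Python dictionaries standing in for documents. This is an illustrative assumption of how the attribute and computed patterns are commonly applied, not the webinar's own code; the function and field names (`attributes`, `k`, `v`, `rating_avg`, and so on) are hypothetical.

```python
def to_attribute_pattern(doc, sparse_fields):
    """Attribute pattern: move many similar fields into a list of k/v pairs."""
    # Keep every field that is not being folded into the attribute list.
    reshaped = {k: v for k, v in doc.items() if k not in sparse_fields}
    # One list of {"k", "v"} pairs can be covered by a single index.
    reshaped["attributes"] = [
        {"k": name, "v": doc[name]} for name in sparse_fields if name in doc
    ]
    return reshaped


def record_rating(product, rating):
    """Computed pattern: maintain the aggregate at write time, not read time."""
    product["rating_count"] = product.get("rating_count", 0) + 1
    product["rating_sum"] = product.get("rating_sum", 0) + rating
    product["rating_avg"] = product["rating_sum"] / product["rating_count"]
    return product
```

The first function addresses documents with many similar, sparsely populated fields; the second trades a slightly more expensive write for reads that no longer repeat the same calculation.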
This document discusses how to achieve scale with MongoDB. It covers optimization tips like schema design, indexing, and monitoring. Vertical scaling involves upgrading hardware like RAM and SSDs. Horizontal scaling involves adding shards to distribute load. The document also discusses how MongoDB scales for large customers through examples of deployments handling high throughput and large datasets.
Topics covered include:
- Scaling Vertically
- Hardware Considerations
- Index Optimization
- Schema Design
- Sharding
Webinar: Choosing the Right Shard Key for High Performance and Scale | MongoDB
Read these webinar slides to learn how selecting the right shard key can future-proof your application.
The shard key that you select can impact the performance, capability, and functionality of your database.
- MongoDB is well-suited for systems of engagement that have demanding real-time requirements, diverse and mixed data sets, massive concurrency, global deployment, and no downtime tolerance.
- It performs well for workloads with mixed reads, writes, and updates and scales horizontally on demand. However, it is less suited for analytical workloads, data warehousing, business intelligence, or transaction processing workloads.
- MongoDB shines for use cases involving single views of data, mobile and geospatial applications, real-time analytics, catalogs, personalization, content management, and log aggregation. It is less optimal for workloads requiring joins, full collection scans, high-latency writes, or five nines u
Determining the root cause of performance issues is a critical task for Operations. In this webinar, we'll show you the tools and techniques for diagnosing and tuning the performance of your MongoDB deployment. Whether you're running into problems or just want to optimize your performance, these skills will be useful.
Webinar: Schema Patterns and Your Storage Engine | MongoDB
How do MongoDB’s different storage options change the way you model your data?
Each storage engine (WiredTiger, the In-Memory storage engine, MMAPv1, and other community-supported drivers) persists data differently, writes data to disk in different formats, and handles memory resources in different ways.
This webinar will go through how to design applications around different storage engines based on your use case and data access patterns. We will look at concrete examples of schema design practices that were previously applied on MMAPv1 and ask whether those practices still apply to other storage engines like WiredTiger.
Topics for review: Schema design patterns and strategies, real-world examples, sizing and resource allocation of infrastructure.
To understand how to make your application fast, it's important to understand what makes the database fast. We will take a detailed look at how to think about performance, and at how different choices in schema design affect your cluster's performance depending on the storage engines used and the physical resources available.
MongoDB World 2019: Finding the Right MongoDB Atlas Cluster Size: Does This I... | MongoDB
How do you determine whether your MongoDB Atlas cluster is over-provisioned, whether the new feature in your next application release will crush your cluster, or when to increase cluster size based on planned usage growth? MongoDB Atlas provides over a hundred metrics enabling visibility into the inner workings of MongoDB performance, but how do you apply all this information to make capacity planning decisions? This presentation will enable you to effectively analyze your MongoDB performance to optimize your MongoDB Atlas spend and ensure smooth application operation into the future.
CyberAgent is a leading Internet company in Japan focused on smartphone social communities and a game platform known as Ameba, which has 40M users. In this presentation, we will introduce how we use HBase for storing social graph data and as a basis for ad systems, user monitoring, log analysis, and recommendation systems.
This document discusses best practices for using DynamoDB for game data, including tips for indexing, scaling, data modeling, and real-world use cases. It provides examples of using local secondary indexes (LSI) and global secondary indexes (GSI) to query game data efficiently. It also recommends modeling time series data with separate tables per time period. The document concludes with an overview of how Nexon Korea uses DynamoDB for mobile game databases.
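The recommendation to keep time series data in separate tables per time period can be sketched as a small routing helper. The monthly granularity, the `game_events` prefix, and the naming scheme below are assumptions chosen for illustration; the summarized document does not specify them, and no DynamoDB API is called here.

```python
from datetime import datetime, timezone


def table_for(prefix, timestamp):
    """Return the name of the monthly table that holds this timestamp."""
    return f"{prefix}_{timestamp:%Y_%m}"


def tables_between(prefix, start, end):
    """List every monthly table a query over [start, end] has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return names
```

With this layout, writes always go to the current period's table, and older tables can be given less throughput or dropped as a whole once their data is no longer needed.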
The document provides an overview of a webinar on transitioning from SQL to MongoDB. It introduces the presenter Buzz Moschetti and his background. It then discusses how developers currently spend their time integrating with different components and systems like databases, and how the mismatch between data at the business level versus the database level has been a long-standing problem. The document uses examples to show how MongoDB can help by allowing richer data structures and a more direct match between data in code and the database.
Apache Spark and MongoDB - Turning Analytics into Real-Time Action | João Gabriel Lima
This document discusses combining Apache Spark and MongoDB for real-time analytics. It provides an overview of MongoDB's native analytics capabilities including querying, data aggregation, and indexing. It then discusses how Apache Spark can extend these capabilities by providing additional analytics functions like machine learning, SQL queries, and streaming. Combining Spark and MongoDB allows organizations to perform real-time analytics on operational data without needing separate analytics infrastructure.
DynamoDB is a scalable NoSQL database service provided by Amazon that allows developers to purchase throughput rather than storage. It automatically spreads data and traffic across servers and SSDs for predictable performance. While it does not automatically scale, administrators can request more throughput. DynamoDB integrates with other AWS services like EMR for Hadoop and Redshift for data warehousing.
This document provides an overview of MongoDB, a popular NoSQL database. It discusses key features of MongoDB like its schemaless and document-oriented data model. It also covers how MongoDB supports high availability through replica sets and horizontal scaling through sharding. The document aims to help developers understand how MongoDB works and when it may be suitable for different use cases.
This document discusses common use cases for MongoDB and why it is well-suited for them. It describes how MongoDB can handle high volumes of data feeds, operational intelligence and analytics, product data management, user data management, and content management. Its flexible data model, high performance, scalability through sharding and replication, and support for dynamic schemas make it a good fit for applications that need to store large amounts of data, handle high throughput of reads and writes, and have low latency requirements.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
The document discusses NoSQL databases and CouchDB. It provides an overview of NoSQL, the different types of NoSQL databases, and when each type would be used. It then focuses on CouchDB, explaining its features like document centric modeling, replication, and fail fast architecture. Examples are given of how to interact with CouchDB using its HTTP API and tools like Resty.
Test driving Azure Search and DocumentDB | Andrew Siemer
This document provides an overview and comparison of DocumentDB and Azure Search. It discusses what NoSQL and search are, when each service is better to use, how to set up and structure data in each, and examples of querying. DocumentDB is described as a NoSQL database that uses a flexible JSON document structure and scales easily. Azure Search is an elastic search service that indexes and scores search results. The document provides examples of setting up databases and indexes, adding and querying data, and considerations for different field types and scoring profiles. It also discusses where each service may fit in different parts of an application architecture.
This is an excerpt from "Tier-1 BI in the World of Big Data" by Thomas Kejser, Denny Lee, and Kenneth Lieu, specific to the Yahoo! TAO Case Study published at: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001707
SQL Server vs. Azure DocumentDB: A Battle Between XML and JSON | Sascha Dittmann
Since SQL Server 2000, XML support has gradually made its way into the Microsoft RDBMS world.
With Azure DocumentDB, a second in-house NoSQL database arrived in the Microsoft cloud, one that processes data in JSON format.
In this session we will compare these two technologies step by step using a practical example, from the preparatory steps through writing the data to reading it back.
Along the way we will work out the advantages and disadvantages of each approach and point out best practices.
The document discusses DocumentDB, a NoSQL database service. It covers the three V's of data today - variety, velocity, and volume. It also discusses some key features of DocumentDB like its flexible schema, fast performance, scalability to high volumes, and support for queries on JSON documents. Examples of common uses of DocumentDB are given like product catalogs, game data, sensor data from IoT, and social analytics.
This document provides an overview of Azure DocumentDB presented by Marco Parenzan. Key points include:
- DocumentDB is a fully managed NoSQL database that is schema-agnostic and scalable.
- It allows for tunable consistency levels and indexing policies. Queries use a familiar SQL syntax.
- Documents are stored in JSON format across collections within databases.
- DocumentDB has appeal for developers as documents map directly to JSON and objects, requiring no ORM.
- The document discusses strategies like using view models and data normalization versus embedding versus referencing.
- It also covers the resource model, performance, partitioning, indexing, querying, and programmability features of DocumentDB
Azure DocumentDB is a fully managed NoSQL document database by Microsoft that stores data as JSON documents. It offers high scalability, availability, and performance. The .NET API provides asynchronous methods for CRUD operations on DocumentDB resources like databases, collections, and documents. Queries can be performed using SQL or LINQ and results are returned as .NET objects or in a paged feed. DocumentDB is currently in preview and accessible via the Azure portal.
DocumentDB is a powerful NoSQL solution. It provides elastic scale, high performance, global distribution, a flexible data model, and is fully managed. If you are looking for a scaled OLTP solution that is too much for SQL Server to handle (i.e. millions of transactions per second) and/or will be using JSON documents, DocumentDB is the answer.
Denny Lee introduced Azure DocumentDB, a fully managed NoSQL database service. DocumentDB provides elastic scaling of throughput and storage, global distribution with low latency reads and writes, and supports querying JSON documents with SQL and JavaScript. Common scenarios that benefit from DocumentDB include storing product catalogs, user profiles, sensor telemetry, and social graphs due to its ability to handle hierarchical and de-normalized data at massive scale.
Quick trip around the Cosmos - Things every astronaut supposed to know | Rafał Hryniewski
Slides for my talk, which gives an overview of Microsoft's new(ish) multi-model cloud database known as CosmosDB.
Recorded talk (in Polish) is available here: https://youtu.be/ZWpJne0kcds?t=1h52m45s
The document discusses data partitioning and distribution across multiple machines in a cluster. It explains that data replication does not scale well, but data partitioning, where each record exists on only one machine, allows write latency to scale with the number of machines in the cluster. Coherence provides a distributed cache that partitions data and offers functions for server-side processing near the data through tools like entry processors.
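The core idea above, that each record lives on exactly one machine, can be illustrated with a minimal hash-partitioning sketch. This is not Coherence's API or its actual partitioning algorithm; the function names and the use of SHA-256 are assumptions made only to show the technique.

```python
import hashlib


def partition_for(key, partition_count):
    """Map a key to exactly one partition using a stable hash."""
    # A cryptographic digest is used because it is stable across processes,
    # unlike Python's built-in hash() for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def assign(keys, partition_count):
    """Group keys by owning partition; every key lands in one list only."""
    partitions = {p: [] for p in range(partition_count)}
    for key in keys:
        partitions[partition_for(key, partition_count)].append(key)
    return partitions
```

Because a write for a given key only ever touches its owning partition, adding machines spreads the write load instead of multiplying it, which is the contrast with full replication drawn in the summary.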
Let's make a brief introduction to Azure Data eXplorer, with many examples using Kusto dialect and C# client.
With a particular focus on IIoT contexts and proces control data, let's discover how to implement time series analysis in terms of pattern recognition, and trend correlation.
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...Andre Essing
This document summarizes an introduction presentation about Azure Cosmos DB. It discusses key aspects of Cosmos DB including that it is a globally distributed, massively scalable database that supports multiple data models. It also covers request units, partitioning, indexing, consistency models, and other architectural aspects that allow Cosmos DB to elastically scale storage and throughput worldwide.
Tour de France Azure PaaS 3/7 Stocker des informationsAlex Danvy
3 possibilités de stocker des données dans Azure :
- Evolution : Le compte de stockage est plus que jamais essentiel. Bien que basic, il ne cesse d'évoluer.
- Innovation : Le Cloud permet d'imaginer de nouveaux scénarios mettant à rude épreuve les technologies de stockage. Il faut parfois en inventer de nouvelles : Cosmos DB
- Open Source : S'il est possible de faire fonctionner les solutions Open Source dans des VM, celle n'apporte que très rarement de la valeur. Autant en laisser la gestion au fournisseur de Cloud. MySQL, PostegreSQL et Maria DB sont maintenant disponibles sous la forme de service managé.
TechEd NZ 2014 - DCIM211 - Aben Samuel
This session with take IT Pros, Managers through various aspects of Azure, but with a focus on SharePoint and how organizations should be looking at Azure with regards to: 1. Hybrid Approach 2. Complete Warm SharePoint Platform 3. Disaster Recovery , Business Continuity The session would also look into some of the newer features that have been made available recently and also look into some of the experiences with deploying SharePoint implementations on Azure.
Dyn delivers exceptional Internet Performance. Enabling high quality services requires data centers around the globe. In order to manage services, customers need timely insight collected from all over the world. Dyn uses DataStax Enterprise (DSE) to deploy complex clusters across multiple datacenters to enable sub 50 ms query responses for hundreds of billions of data points. From granular DNS traffic data, to aggregated counts for a variety of report dimensions, DSE at Dyn has been up since 2013 and has shined through upgrades, data center migrations, DDoS attacks and hardware failures. In this webinar, Principal Engineers Tim Chadwick and Rick Bross cover the requirements which led them to choose DSE as their go-to Big Data solution, the path which led to SPARK, and the lessons that we’ve learned in the process.
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics
This document discusses using DDN's parallel file systems to improve the performance of kdb+ analytics queries on large datasets. Running kdb+ on a parallel file system can significantly reduce query latency by distributing data and queries across multiple file system servers. This allows queries to achieve near linear speedups as more servers are added. The shared namespace also allows multiple independent kdb+ instances to access the same consolidated datasets.
Docker is an open platform for developers and system administrators to build, ship and run distributed applications. Using Docker, companies in Jordan have been able to build powerful system architectures that allow speeding up delivery, easing deployment processes and at the same time cutting major hosting costs.
Osama Jaber shares his experience at ArabiaWeather in how they moved away from AWS to a highly-redundant, high-performance and low-cost solution using docker and other open-source technologies.
(BAC404) Deploying High Availability and Disaster Recovery Architectures with...Amazon Web Services
The document discusses disaster recovery strategies for AWS including backup and restore, pilot light, and warm standby approaches. It provides examples of architectures using these approaches including replicating databases across Availability Zones and regions for high availability and disaster recovery. CloudFormation templates are shown that can automate the deployment of load balanced auto-scaled web servers across Availability Zones for disaster recovery.
This document provides an overview of Amazon Redshift presented by Pavan Pothukuchi and Chris Liu. The agenda includes an introduction to Redshift, its benefits, use cases, and Coursera's experience using Redshift. Some key benefits highlighted are that Redshift is fast, inexpensive, fully managed, secure, and innovates quickly. Example use cases from NTT Docomo and Nasdaq are discussed. Chris Liu then discusses Coursera's experience moving from no data warehouse to using Redshift over three years, including their current ecosystem involving Redshift, other AWS services, and business intelligence applications. Lessons learned around thinking in Redshift, communicating with users, surprises, and reflections are also shared.
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build
and run applications that work with highly connected datasets. The core of Neptune is a purpose-built,
high-performance graph database engine. This engine is optimized for storing billions of relationships
and querying the graph with milliseconds latency. Neptune supports the popular graph query languages
Apache TinkerPop Gremlin, the W3C’s SPARQL, and Neo4j's openCypher, enabling you to build
queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as
recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon
S3, and replication across Availability Zones. Neptune provides data security features, with support
for encryption at rest and in transit. Neptune is fully managed, so you no longer need to worry about
database management tasks like hardware provisioning, software patching, setup, configuration, or
backups
Red Hat Storage Day Atlanta - Designing Ceph Clusters Using Intel-Based Hardw...Red_Hat_Storage
This document discusses the need for storage modernization driven by trends like mobile, social media, IoT and big data. It outlines how scale-out architectures using open source Ceph software can help meet this need more cost effectively than traditional scale-up storage. Specific optimizations for IOPS, throughput and capacity are described. Intel is presented as helping advance the industry through open source contributions and optimized platforms, software and SSD technologies. Real-world examples are given showing the wide performance range Ceph can provide.
Solving enterprise challenges through scale out storage & big compute finalAvere Systems
Google Cloud Platform, Avere Systems, and Cycle Computing experts will share best practices for advancing solutions to big challenges faced by enterprises with growing compute and storage needs. In this “best practices” webinar, you’ll hear how these companies are working to improve results that drive businesses forward through scalability, performance, and ease of management.
The slides were from a webinar presented January 24, 2017. The audience learned:
- How enterprises are using Google Cloud Platform to gain compute and storage capacity on-demand
- Best practices for efficient use of cloud compute and storage resources
- Overcoming the need for file systems within a hybrid cloud environment
- Understand how to eliminate latency between cloud and data center architectures
- Learn how to best manage simulation, analytics, and big data workloads in dynamic environments
- Look at market dynamics drawing companies to new storage models over the next several years
Presenters communicated a foundation to build infrastructure to support ongoing demand growth.
Similar to [PASS Summit 2016] Blazing Fast, Planet-Scale Customer Scenarios with Azure DocumentDB (20)
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
1. Blazing Fast, Planet-Scale
Customer Scenarios with
Azure DocumentDB
Denny Lee
Program Manager
Azure DocumentDB
@dennylee
Andrew Liu
Program Manager
Azure DocumentDB
@aliuy8
11. “If all you have is a hammer, everything looks like a nail”
-Abraham Maslow
12. Choose the right tools for the right job
Microsoft Data Platform (diagram labels): SQL Server 2016, SQL Database, Azure DocumentDB, Azure Search, Azure HDInsight, Azure Data Lake, Azure DW, APS, Azure Stream Analytics, Azure Data Factory, Azure ML, Azure Data Catalog, Power BI
13. 3 V’s of data : Endless possibilities
Learning
Gaming
Retail
Telematics
Mobile Apps
IoT
24. Request Unit (RU) is the normalized currency
• Replica gets a fixed budget of Request Units
• Predictable Performance
(Diagram labels: % Memory, % IOPS, % CPU; resources and resource sets; operations: Documents, SQL, sprocs with args; Request units)
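The idea of a fixed per-second budget can be sketched in a few lines. This is only an illustration: the per-operation charges below are made-up numbers, not published DocumentDB costs, and `Workload`, `required_rus`, and `fits_budget` are hypothetical helpers rather than part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    ops_per_second: float
    ru_per_op: float  # hypothetical charge, for illustration only


def required_rus(workloads):
    """Total request units consumed per second by all workloads."""
    return sum(w.ops_per_second * w.ru_per_op for w in workloads)


def fits_budget(workloads, provisioned_rus):
    """True when the combined load stays within the provisioned RU/s budget."""
    return required_rus(workloads) <= provisioned_rus


workloads = [
    Workload("point reads", ops_per_second=500, ru_per_op=1.0),
    Workload("inserts", ops_per_second=100, ru_per_op=5.0),
    Workload("queries", ops_per_second=20, ru_per_op=10.0),
]

print(required_rus(workloads))       # 500 + 500 + 200 request units per second
print(fits_budget(workloads, 1000))  # budget too small for this mix
print(fits_budget(workloads, 2000))  # budget large enough
```

Because every operation is expressed in the same unit, one number is enough to check a mixed workload against the provisioned throughput.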
30. … with well-defined consistency models!
Strong, Bounded Staleness, Session, Eventual
Left to right: relaxed consistency => better performance and availability

| Consistency Level | Strong | Bounded Staleness | Session | Eventual |
| --- | --- | --- | --- | --- |
| Total global order | Yes | Yes, outside of the “staleness window” | No, partial “session” order | No |
| Consistent prefix guarantee | Yes | Yes | Yes | Yes |
| Monotonic reads | Yes | Yes, across regions outside of the staleness window and within a region all the time | Yes, for the given session | No |
| Monotonic writes | Yes | Yes | Yes | Yes |
| Read your writes | Yes | Yes (in the write region) | Yes | No |

Observed Distribution (pie chart; segment values 27%, 3%, 54%, 16%; legend: Bounded Staleness, Eventual, Session, Strong)
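One way to read the table is as a lookup: given the guarantees an application needs, pick the most relaxed level that still provides them. The sketch below encodes the table as data; `most_relaxed_level` and the key names are hypothetical and exist only for this illustration, and the conditional entries are shortened paraphrases of the table cells.

```python
# Consistency levels ordered from strongest to most relaxed, as in the table.
LEVELS = ["Strong", "Bounded Staleness", "Session", "Eventual"]

# True = guaranteed, False = not guaranteed,
# str = guaranteed only under the stated condition.
GUARANTEES = {
    "total_global_order": [True, "outside the staleness window", False, False],
    "consistent_prefix": [True, True, True, True],
    "monotonic_reads": [True, "outside the staleness window across regions",
                        "for the given session", False],
    "monotonic_writes": [True, True, True, True],
    "read_your_writes": [True, "in the write region", True, False],
}


def most_relaxed_level(required, accept_conditional=False):
    """Return the most relaxed level that provides every required guarantee."""
    for index in reversed(range(len(LEVELS))):
        values = [GUARANTEES[name][index] for name in required]
        if all(v is True or (accept_conditional and isinstance(v, str))
               for v in values):
            return LEVELS[index]
    return None


print(most_relaxed_level(["read_your_writes"]))
print(most_relaxed_level(["total_global_order"], accept_conditional=True))
```

Searching from the relaxed end reflects the trade-off stated with the table: weaker consistency buys better performance and availability, so the weakest level that satisfies the requirements is the natural default.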
44. Common scenarios
Retail
• Product Catalog
• Product Recommendations + Personalization
Gaming
• Multiplayer + Social Gameplay
IoT / Sensor Data
• Telemetry + Event Store
• Device Registry
Social Analytics + Ad Technology
• User behavior telemetry
• 3rd-Party Data from Web Crawlers
45. Common scenarios
IoT / Sensor Data
• Telemetry + Event Store
• Device Registry
IoT / Sensor Data Challenges:
• Hardware is relatively hard to update
• Different generations of devices => different schema (Variety)
• Lots of sensors emitting telemetry => high rate of ingestion (Volume + Velocity)
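The schema-variety challenge can be made concrete with a small sketch. The two document layouts and the field names below are invented for illustration; the point is only that schema-free JSON lets documents from both device generations sit side by side while the reader tolerates either shape.

```python
import json

# Two hypothetical device generations emitting differently shaped telemetry.
raw_events = [
    '{"deviceId": "d1", "gen": 1, "temp": 21.5}',
    '{"deviceId": "d2", "gen": 2, "readings": {"temp": 22.0, "humidity": 40}}',
    '{"deviceId": "d3", "gen": 2, "readings": {"humidity": 55}}',
]


def temperature(doc):
    """Return the temperature from either layout, or None when absent."""
    if "temp" in doc:
        return doc["temp"]
    return doc.get("readings", {}).get("temp")


docs = [json.loads(line) for line in raw_events]
temps = {d["deviceId"]: temperature(d) for d in docs}
print(temps)
```

No migration is needed when a new generation ships: older documents stay as they are, and only the reading code learns about the new layout.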
48. Common Scenarios
Social Analytics + Ad Technology
• User behavior telemetry
• 3rd-Party Data from Web Crawlers
Social Analytics + Ad Technology:
• Ingest + Analyze 3rd-Party Data => Who dictates schema? How do you index? (Variety)
• Lots of social / user profiles => high rate of ingestion (Volume + Velocity)
49. Social Analytics + Ad Technology
>1B
Social Media
Profiles
>50M
Tweets per Day
50. Social Analytics + Ad Technology
>1B
Social Media
Profiles
>50M
Tweets per Day
“Before moving to DocumentDB, my developers would need to come to me to confirm that our Elasticsearch deployment would support their data or if I would need to scale things to handle it. DocumentDB removed me as a bottleneck, which has been great for me and them.”
-Stephen Hankinson, CTO, Affinio
55. Flight Graph with
Spark and DocumentDB
Notebook
View: https://aka.ms/docdb-spark-graph
Code: https://aka.ms/docdb-spark-graph-code
Demo
56. Graph Calculations: Degrees, PageRank
Understanding most important airport (most flights in / out):
tripGraph.inDegrees.sort(desc("inDegree")).limit(10)
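The in-degree ranking on this slide counts, for each airport, how many flights arrive there. As a rough stand-in that runs without Spark, the same calculation can be sketched in plain Python; the edge list and the helper name `top_in_degrees` are hypothetical and are not taken from the demo notebook.

```python
from collections import Counter

# Hypothetical (origin, destination) flight edges, for illustration only.
flights = [
    ("SEA", "SFO"), ("SEA", "LAX"), ("SFO", "LAX"),
    ("JFK", "LAX"), ("LAX", "SEA"), ("JFK", "SEA"),
]


def top_in_degrees(edges, n=10):
    """Rank vertices by number of incoming edges, highest first."""
    in_degree = Counter(dst for _, dst in edges)
    return in_degree.most_common(n)


print(top_in_degrees(flights, n=2))  # [('LAX', 3), ('SEA', 2)]
```

Swapping `dst` for the origin in the counter gives out-degrees, which covers the "flights out" half of the question.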
57. Advantages of DocumentDB in Data Science Scenarios
• Blazing Fast IoT Scenarios
• Updateable columns
• Push-down predicate filtering
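Push-down predicate filtering means the filter is evaluated where the data lives, so only matching documents travel to the compute engine. The toy sketch below shows the effect on transfer volume; `FakeStore` is a hypothetical in-memory stand-in and does not represent the DocumentDB or Spark connector API.

```python
class FakeStore:
    """Stand-in for a remote document store; counts documents sent to the client."""

    def __init__(self, docs):
        self._docs = docs
        self.transferred = 0

    def scan(self, predicate=None):
        # With a predicate, filtering happens "server side" before transfer.
        matches = [d for d in self._docs if predicate is None or predicate(d)]
        self.transferred += len(matches)
        return matches


docs = [{"origin": "SEA", "delay": d} for d in range(100)]

# Without pushdown: fetch everything, then filter on the client.
store = FakeStore(docs)
client_side = [d for d in store.scan() if d["delay"] > 90]
without_pushdown = store.transferred

# With pushdown: the store applies the predicate before sending results.
store = FakeStore(docs)
server_side = store.scan(lambda d: d["delay"] > 90)
with_pushdown = store.transferred

print(without_pushdown, with_pushdown)  # 100 9
```

Both paths return the same nine documents; the difference is how many had to cross the wire to get them.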
58. Advantages: Blazing Fast IoT Scenarios
(Diagram labels: Flight information, global safety alerts, weather, Data Science Scenarios, Device Notifications, Web / REST API)
63. More Resources / Coming Soon
Want to know more about Spark-to-DocumentDB
Connector?
Have any other questions?
64. Session Evaluations
Your feedback is important and valuable.
3 ways to access:
• Go to passSummit.com
• Download the GuideBook App and search: PASS Summit 2016
• Follow the QR code link displayed on session signage throughout the conference venue and in the program guide
Submit by 5pm Friday November 6th to WIN prizes
65. Thank You
Learn more from
Azure DocumentDB
askdocdb@microsoft.com or follow @DocumentDB
Editor's Notes
Independently scale storage and throughput. Provisioned throughput guaranteed.
Elastically scale throughput from 100 to 10s of millions of requests/sec
Transparent server side partitioning
Optionally evict old data with TTL
Cheaper than hosted OSS NoSQL databases or DynamoDB
Watch “Predictable performance” module
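Server-side partitioning generally works by mapping each document's partition key to one of several partitions. The sketch below shows a generic hash-based mapping to make that idea concrete; it is an assumption for illustration and not a description of DocumentDB's internal algorithm, and `partition_for` is a hypothetical helper.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition index using a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Hypothetical device IDs used as partition keys.
for device_id in ["device-001", "device-002", "device-003"]:
    print(device_id, partition_for(device_id, partition_count=4))
```

A stable hash keeps all documents with the same key on the same partition, which is what lets the service route a keyed read or write to a single place.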
Write optimized, SSD-based database engine with low latency access
Synchronous and automatic indexing at sustained ingestion rates
Globally distributed with reads and writes served from local region
Watch “Predictable performance” module
Scale across any number of Azure regions
Turn-key high availability with transparent failover
Multi-homing
Well-defined consistency models
Watch “Achieve planet scale with DocumentDB: Multi-region replication”
Rich SQL, JavaScript, MongoDB
Multi-model: key-value, column-family, or documents
No impedance mismatch - JavaScript is the type system
Write business logic entirely in JavaScript with stored procedures and triggers
Integrated multi-document transactions with snapshot isolation
.NET, Java, Node, Python SDKs
Well nested, multiple properties and values
Not Word documents
Well nested, multiple properties and values
Today our lives are informed and influenced by lots of data. Data is the new currency. From how we find a ride to what we watch and how we shop, data drives all these experiences.
We have moved from a world of structured data for operational and analytical needs to one where each LOB application and service we consume uses a wide variety of data.
Variety of data dictates that operational databases become schema-agnostic and support schema-free data storage.
Beyond variety, the volume of data being generated and the number of applications deriving intelligence from it are increasing.
Velocity of data is also increasing, which means applications expect high throughput for data ingestion and low latency for data retrieval.
Databases that can provide intelligence to modern applications from high-velocity, high-variety data at very high volume become what we call intellibases.
These design patterns create endless possibilities in all forms of business applications that improve products and services for consumers.
Online retailing experiences, gaming experiences, mobile experiences, vehicle telematics and IoT are some of the key applications that take advantage of the 3V’s of data.
That’s right, you were waiting for the zombies … The Walking Dead is a show about a zombie apocalypse … or ‘walkers’ as they’re referred to in the show.
The Walking Dead No Man’s Land is a mobile game based on this very successful AMC series. No Man’s Land is developed by a game company by the name of Next Games.
Some of you may have heard of Next Games from Scott’s keynote this morning but for those who missed the keynote, I’ll roll the video … > PLAY VIDEO
#### summary of what we just saw
They look at a customer’s social media account (e.g., Nike’s Twitter account) and provide reporting and analytics on the customer’s followers, including segmenting followers (e.g., you have a high number of soccer moms and athletes following you) and analyzing what kind of content is trending among each segment of followers. If links are included in the content, they scrape each link and provide further analysis of any content found there. They use DocumentDB as their data store for storing and querying the scraped content.