Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...NoSQLmatters
When deploying your service to Microsoft Azure, you have a number of options in terms of noSQL: you can install databases on Linux or Windows virtual machines yourself or via the marketplace, or you can use open source databases available as a service, like HBase, or proprietary managed databases like Document DB. After showing these options, we'll look at Document DB in more detail. This is a noSQL database as a service that stores JSON.
December 1, 2015
Azure Group
Topic: Introduction to DocumentDB
Speaker: Vincent-Philippe Lauzon, Microsoft
Azure DocumentDB is a NoSQL database. In this introduction to DocumentDB, you will see:
• What a NoSQL database is
• How DocumentDB compares with other Azure databases
• How DocumentDB compares with other NoSQL databases
• How to create and manage a DocumentDB database
• How to use it (tools + C#)
• Security
• Performance / Capacity
Vincent-Philippe Lauzon is a Microsoft Azure Solution Architect & Machine Learning / Senior Consultant at CGI. You can read his blog at http://vincentlauzon.com and follow him on Twitter at https://twitter.com/vplauzon
Analyze and visualize non-relational data with DocumentDB + Power BI - Sriram Hariharan
This session shows how to analyze and visualize non-relational data with DocumentDB + Power BI. We are in the midst of a paradigm shift in how we store and analyze data. Unstructured or flexible-schema data represents a large portion of the data within an organization, and everyone is eager to turn this data into meaningful business information. Unstructured data analytics does not need to be time consuming and complex. Come learn how to analyze and visualize unstructured data in DocumentDB.
Azure DocumentDB for Healthcare Integration - BizTalk360
In this session:
You will learn what the series is about and see what we want to accomplish.
You will learn about Azure DocumentDB, its features and capabilities.
You will learn how to create a DocumentDB database and configure it to support CRUD operations.
You will also learn about the two APIs provided for DocumentDB.
You will learn how DocumentDB can be leveraged as a repository for HL7 documents.
We will take a look at using DocumentDB with both API apps and Logic apps.
Performance Benchmarking of Key-Value Store NoSQL Databases IJECEIAES
Increasing requirements for scalability and elasticity of data storage for web applications have made NoSQL databases increasingly valuable to web developers. One such NoSQL database is Redis. A budding alternative to Redis is SSDB, which is also a key-value store but is disk-based. The aim of this research work is to benchmark both databases (Redis and SSDB) using the Yahoo Cloud Serving Benchmark (YCSB). YCSB is a platform that has been used to compare and benchmark similar NoSQL database systems. Both databases were given variable workloads to measure the throughput of all given operations. The results show that SSDB gives better throughput than Redis for the majority of operations.
An Intro to NoSQL Databases -- NoSQL databases will not become the new dominators. Relational will still be popular, and used in the majority of situations. They, however, will no longer be the automatic choice. (source : http://martinfowler.com/)
Data Virtualization in the Cloud: Accelerating Data Virtualization Adoption - Denodo
This presentation introduces our new product: Denodo Platform for AWS. You will see the current data virtualization landscape, the new cloud deployment options that are being introduced with the Denodo Platform 6.0 and some examples of when it will be useful to deploy Denodo in the cloud.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/PcvHmj.
In this lecture we analyze document-oriented databases. In particular, we consider why they are the first approach to NoSQL and what their main features are. Then we analyze MongoDB as an example. We consider the data model, CRUD operations, write concerns, and scaling (replication and sharding).
Finally, we present other document-oriented databases and discuss when to use, or not use, document-oriented databases.
This presentation contains a preview of MongoDB 3.2 upcoming release where we explore the new storage engines, aggregation framework enhancements and utility features like document validation and partial indexes.
MongoDB is a popular NoSQL database. This presentation was delivered during a workshop.
First, it talks about NoSQL databases and the shift in their design paradigm, focuses a little more on document-based NoSQL databases, and tries to draw some parallels with SQL databases.
The second part is a hands-on session on MongoDB using the mongo shell, for which the slides are of limited help.
Finally, it touches on advanced topics like data replication for disaster recovery and handling big data using map-reduce as well as sharding.
Speaker: Ronan Bohan, Solutions Architect, MongoDB
Speaker: Viady Krishnan
Level: 100 (Beginner)
Track: Jumpstart
Get started with the BI connector and Tableau in this introductory session. We will give you insight into how you can view your MongoDB data in traditional BI tools and an overview of connecting Tableau with MongoDB. After attending this session, students should be able to connect their analytics tool of choice to a MongoDB data store using the BI connector, secure their client connection, and know how to enable authentication. Audience members should be familiar with analytics tools like Tableau and know how to set up and run analytics in a BI tool. This session will use Tableau as an example.
This is a Jumpstart session, held before the keynotes, designed to give you an overview of MongoDB basics so you can dive into more advanced technical sessions later in the day.
What You Will Learn:
- How to connect your analytics tool of choice to a MongoDB data store using the BI connector.
- How to view MongoDB data in Tableau or another BI tool.
- How to secure your client connection to MongoDB.
Data is as critical as ever. Storage costs are lower but we have more and more data to store. This is where Microsoft Azure Data Storage solutions come in. This slide deck provides an overview of the most important data storage options available in Azure.
Note: I did not create this deck. I instead combined slides from the Microsoft Azure-Readiness/DevCamp repo on GitHub (https://github.com/Azure-Readiness/DevCamp) while adding additional material from a slide deck of David Chappell's.
This talk was given at Cloud Camp Kitchener 2015.
A really quick introduction to Microsoft Azure Storage and all of its services. It's one of the core components of Azure and it's really important to understand it if you want to "move to the cloud".
Cosmos DB Real-time Advanced Analytics Workshop - Databricks
The workshop implements an innovative fraud detection solution as a PoC for a bank that provides payment processing services for commerce to its merchant customers all across the globe, helping them save costs by applying machine learning and advanced analytics to detect fraudulent transactions. Since the customers are around the world, the right solution should minimize any latency experienced when using the service by distributing as much of the solution as possible, as closely as possible, to the regions in which customers use the service. The workshop designs a data pipeline solution that leverages Cosmos DB for both the scalable ingest of streaming data and the globally distributed serving of both pre-scored data and machine learning models. Cosmos DB's major advantage when operating at a global scale is its high concurrency with low latency and predictable results.
This combination is unique to Cosmos DB and ideal for the bank's needs. The solution leverages the Cosmos DB change data feed in concert with the Azure Databricks Delta and Spark capabilities to enable a modern data warehouse solution that can be used to create risk reduction solutions for scoring transactions for fraud in an offline, batch approach and in a near real-time, request/response approach. https://github.com/Microsoft/MCW-Cosmos-DB-Real-Time-Advanced-Analytics Takeaway: How to leverage Azure Cosmos DB + Azure Databricks along with Spark ML for building innovative advanced analytics pipelines.
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist Audrey Hammonds. In this video she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
What is in a modern BI architecture? In this presentation, we explore PaaS, Azure Active Directory and Storage options including SQL Database and SQL Datawarehouse.
This document is an overview of OpenProdoc, describing functionality and architecture.
OpenProdoc is an ECM document management system with the following characteristics:
- A complete portable version that can be used without installation in Linux, Windows, Mac.
- Open Source
- Multi-platform (Java)
- Multi-database (Derby, MySQL, Oracle, DB2, PostgreSQL, SQLServer, SQLite, HSQLDB)
- Low requirements for the engine (can work without a J2EE server)
- Several ways for Authentication (LDAP, DDBB, OS, Own system)
- Different ways to store documents (FileSystem, BLOB, FTP, Reference)
- Object oriented definitions for documents and folders (including inheritance)
- Fine granularity of administration and permissions, allowing delegation of different functions.
- Multi-language (English, Spanish and Portuguese)
- Thin (Web) and Thick (Swing) Clients
Download: http://code.google.com/p/openprodoc/
3. ww.aditi.com
Azure Document DB?
• NoSQL document database service designed for modern mobile and web applications
• Provides fast reads and writes
• Scales up and down easily on demand
• JavaScript integration
• Schema-less database
• Enables complex ad hoc queries using the SQL language
• Multi-document transaction processing using stored procedures, triggers & UDFs
• Natively supports JSON documents
6. ww.aditi.com
Resource: Description
Database account: A database account is associated with a set of databases and a fixed amount of blob storage for attachments (preview feature). You can create one or more database accounts using your Azure subscription. Every Standard database account is allocated a minimum capacity of one S1 collection. For more information, visit our pricing page.
Database: A database is a logical container of document storage partitioned across collections. It is also a container for users.
User: The logical namespace for scoping/partitioning permissions.
Permission: An authorization token associated with a user for authorized access to a specific resource.
Collection: A collection is a container of JSON documents and the associated JavaScript application logic. A collection is a billable entity, where the cost is determined by the performance level associated with the collection. The performance levels (S1, S2 and S3) provide 10 GB of storage and a fixed amount of throughput. For more information on performance levels, visit our performance page.
Stored Procedure: Application logic written in JavaScript which is registered with a collection and transactionally executed within the database engine.
Trigger: Application logic written in JavaScript modeling side effects associated with insert, replace or delete operations.
UDF: Side-effect-free application logic written in JavaScript. UDFs enable you to model a custom query operator and thereby extend the core DocumentDB query language.
Document: User-defined (arbitrary) JSON content. By default, no schema needs to be defined, nor do secondary indices need to be provided, for the documents added to a collection.
Attachment: An attachment is a special document containing references and associated metadata for external blob/media. The developer can choose to have the blob managed by DocumentDB or store it with an external blob service provider such as OneDrive, Dropbox, etc.
7. ww.aditi.com
Addressing a resource
Value of _self                                                    Description
/dbs                                                              Feed of databases under a database account.
/dbs/{_rid-db}                                                    Database with the unique id property with the value {_rid-db}.
/dbs/{_rid-db}/colls/                                             Feed of collections under a database.
/dbs/{_rid-db}/colls/{_rid-coll}                                  Collection with the unique id property with the value {_rid-coll}.
/dbs/{_rid-db}/users/                                             Feed of users under a database.
/dbs/{_rid-db}/users/{_rid-user}                                  User with the unique id property with the value {_rid-user}.
/dbs/{_rid-db}/users/{_rid-user}/permissions                      Feed of permissions under a user.
/dbs/{_rid-db}/users/{_rid-user}/permissions/{_rid-permission}    Permission with the unique id property with the value {_rid-permission}.
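The hierarchy in the table above can be sketched as a small path builder. This is only an illustration of how the path segments nest; the function name `build_self_path` and the resource ids are hypothetical, and it does not call any DocumentDB API.

```python
def build_self_path(*segments):
    """Join (resource_type, rid) pairs into a path such as /dbs/{rid}/colls/{rid}.

    A rid of None leaves the path at the feed level (for example /dbs).
    """
    parts = []
    for resource_type, rid in segments:
        parts.append(resource_type)
        if rid is not None:
            parts.append(rid)
    return "/" + "/".join(parts)


# Hypothetical resource ids, used only for illustration.
print(build_self_path(("dbs", None)))
print(build_self_path(("dbs", "db1"), ("colls", "coll1")))
print(build_self_path(("dbs", "db1"), ("users", "user1"), ("permissions", None)))
```

Each child resource is addressed under its parent, which is why a permission path always passes through a user and a database.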
10. ww.aditi.com
Modeling Data in Document DB
• Embedding data
When to embed
In general, use embedded data models when:
• There are "contains" relationships between entities.
• There are one-to-few relationships between entities.
• There is embedded data that changes infrequently.
• There is embedded data that won't grow without bound.
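As a minimal sketch of the embedding criteria above, the document below keeps a person's addresses inside the person document. The field names and values are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical document: a person "contains" a small, rarely changing set of
# addresses, so they are embedded instead of being stored as separate documents.
person = {
    "id": "person-1",
    "firstName": "Ada",
    "addresses": [
        {"city": "Seattle", "country": "USA"},
        {"city": "Paris", "country": "France"},
    ],
}


def cities(doc):
    """Read the embedded addresses straight from the document, with no second lookup."""
    return [address["city"] for address in doc["addresses"]]


print(json.dumps(person, indent=2))
print(cities(person))
```

Because everything lives in one document, a single read returns the person together with all addresses.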
11. ww.aditi.com
• Referencing data
When to reference
In general, use normalized data models when:
• Representing one-to-many relationships.
• Representing many-to-many relationships.
• Related data changes frequently.
• Referenced data could be unbounded.
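The referencing criteria above can be sketched with documents that point at each other by id. The in-memory dictionary stands in for a collection, and all ids, field names, and helper functions are hypothetical.

```python
# Hypothetical documents keyed by id, standing in for a collection.
collection = {
    "publisher-1": {"id": "publisher-1", "type": "publisher", "name": "Example Press"},
    "book-1": {"id": "book-1", "type": "book", "title": "First Book", "publisherId": "publisher-1"},
    "book-2": {"id": "book-2", "type": "book", "title": "Second Book", "publisherId": "publisher-1"},
}


def publisher_of(book_id):
    """Follow the reference stored on the book to the publisher document."""
    book = collection[book_id]
    return collection[book["publisherId"]]["name"]


def books_for(publisher_id):
    """Find every book that references the publisher; the list can grow without bound."""
    return sorted(
        doc["title"]
        for doc in collection.values()
        if doc.get("publisherId") == publisher_id
    )


print(publisher_of("book-2"))
print(books_for("publisher-1"))
```

The reference is stored on the book, not as a growing array on the publisher, so adding books never enlarges the publisher document. The cost is a second lookup whenever related data is needed.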
12. ww.aditi.com
• Hybrid data (combination of embedding & referencing)
Detail Info: https://azure.microsoft.com/en-in/documentation/articles/documentdb-modeling-data/
13. ww.aditi.com
Indexing
• Specified at the collection level
• Automatic indexing
– By default, all documents are automatically indexed, but you can choose to turn it off.
– When indexing is turned off, documents can be accessed only through their self-links or by queries using ID.
14. ww.aditi.com
• Indexing mode
– Consistent (default) – indexes are updated synchronously on insert, replace or delete
– Lazy – asynchronous index update (targeted at bulk ingestion)
15. ww.aditi.com
• Indexing Types
Hash – supports efficient equality and JOIN queries
Range – supports efficient equality queries, range queries (using >, <, >=, <=, !=), and ORDER BY queries
Hash:
SELECT * FROM collection c WHERE c.prop = "value"
SELECT tag FROM collection c JOIN tag IN c.props WHERE tag = 5
Range:
SELECT * FROM collection c WHERE c.prop = "value"
SELECT * FROM collection c WHERE c.prop > 5
SELECT * FROM collection c ORDER BY c.prop
Note: Range indexes are supported only for numeric values.
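To illustrate why the two index types answer different queries, here is a toy model, not DocumentDB's actual index implementation: a dictionary plays the role of a hash index and a sorted list plays the role of a range index. The documents and function names are hypothetical.

```python
import bisect
from collections import defaultdict

# Hypothetical documents with a numeric property.
docs = [
    {"id": "a", "prop": 3},
    {"id": "b", "prop": 7},
    {"id": "c", "prop": 5},
    {"id": "d", "prop": 7},
]

# Hash-style index: value -> ids. Good for equality, useless for ordering.
hash_index = defaultdict(list)
for doc in docs:
    hash_index[doc["prop"]].append(doc["id"])

# Range-style index: (value, id) pairs kept in sorted order.
range_index = sorted((doc["prop"], doc["id"]) for doc in docs)
range_keys = [value for value, _ in range_index]


def equal_to(value):
    """WHERE c.prop = value, answered by a single hash lookup."""
    return sorted(hash_index.get(value, []))


def greater_than(value):
    """WHERE c.prop > value, answered by a binary search plus a scan of the tail."""
    start = bisect.bisect_right(range_keys, value)
    return [doc_id for _, doc_id in range_index[start:]]


def ordered_ids():
    """ORDER BY c.prop, answered by walking the sorted index."""
    return [doc_id for _, doc_id in range_index]


print(equal_to(7))
print(greater_than(5))
print(ordered_ids())
```

The sorted structure can also serve equality lookups, which matches the slide's statement that range indexes support equality as well as range and ORDER BY queries.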
16. ww.aditi.com
Indexing logical View
16
Example:
JSON property: {"headquarters": "Belgium"}
Corresponds to the path: /"headquarters"/"Belgium".
JSON array: {"exports": [{"city": "Moscow"}, {"city": "Athens"}]}
Corresponds to the paths: /"exports"/0/"city"/"Moscow" and /"exports"/1/"city"/"Athens".
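The mapping from a JSON document to index paths can be sketched as a recursive walk. This is an illustrative reconstruction of the notation used in the example above, not the engine's actual code; the function name `index_paths` is hypothetical.

```python
import json


def index_paths(value, prefix=""):
    """Yield one path per leaf value, in the /"key"/position/"leaf" notation."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from index_paths(child, f'{prefix}/"{key}"')
    elif isinstance(value, list):
        for position, child in enumerate(value):
            yield from index_paths(child, f"{prefix}/{position}")
    else:
        # json.dumps quotes strings and leaves numbers bare.
        yield f"{prefix}/{json.dumps(value)}"


document = {
    "headquarters": "Belgium",
    "exports": [{"city": "Moscow"}, {"city": "Athens"}],
}

for path in index_paths(document):
    print(path)
```

Object keys become quoted segments, array elements become numeric positions, and the leaf value closes the path.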
17. ww.aditi.com
Consistency levels
The choice of consistency level has performance implications for both write and read operations. It applies to all the collections in the database (there are plans to allow overriding the consistency level on a per-collection basis in the future).
• Strong
– provides absolute guarantees on data consistency, but offers the lowest level of read and write performance.
• Bounded staleness
– provides more predictable behavior for read consistency while offering the lowest latency writes
• Session
– provides predictable read data consistency for a session while offering the lowest latency writes
• Eventual
– provides the weakest read consistency but offers the lowest latency for both reads and writes.
18. ww.aditi.com
Securing access
• Account administrator
– Full access to all of the resources (administrative and application) within a given DocumentDB account.
• Read-only administrator
– Read-only access to all of the resources (administrative and application) within a given DocumentDB account.
• Database user
– The DocumentDB user resource associated with a specific set of DocumentDB database resources (e.g. collections, documents, scripts).
Administrative resources
• Account
• Database
• User
• Permission
Application resources
• Collection
• Document
• Attachment
• Stored procedure
• Trigger
• User-defined function
22. ww.aditi.com
UDFs
Creates a UDF to calculate income tax based on rates for various income brackets, and then uses it inside a query to find all people who paid more than $20,000 in taxes.
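The slide's code is not part of this transcript, so the following is only a sketch of the logic such a UDF and query would express. It is written in Python for illustration, whereas an actual DocumentDB UDF is written in JavaScript; the bracket thresholds, rates, and sample people are assumptions.

```python
def tax(income):
    """Return the tax owed, using hypothetical flat rates per income bracket."""
    if income is None:
        raise ValueError("no input")
    if income < 1000:
        rate_percent = 10
    elif income < 10000:
        rate_percent = 20
    else:
        rate_percent = 40
    return income * rate_percent / 100


# Hypothetical taxpayer documents.
people = [
    {"name": "Ana", "income": 9000},
    {"name": "Ben", "income": 40000},
    {"name": "Cleo", "income": 60000},
]

# Equivalent in spirit to filtering on the UDF result inside a WHERE clause.
high_tax_payers = [p["name"] for p in people if tax(p["income"]) > 20000]
print(high_tax_payers)
```

The function has no side effects and depends only on its argument, which is the property the resource table attributes to UDFs and what makes them usable inside a query filter.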
26. ww.aditi.com
Limits and quotas
Entity: Quota (Standard offer)
Database accounts*: 5
Number of databases per database account: 100
Number of users per database account, across all databases: 500,000
Number of permissions per database account, across all databases: 2,000,000
Attachment storage per database account (preview feature): 2 GB
Maximum Request Units / second per collection: 2500
Number of stored procedures, triggers and UDFs per collection*: 25 each
Maximum execution time for stored procedure and trigger: 5 seconds
Provisioned document storage / collection: 10 GB
Maximum collections per database account*: 100
Maximum document storage per database (100 collections)*: 1 TB
Maximum length of the Id property: 255 characters
Maximum items per page: 1000
27. ww.aditi.com
Entity: Quota (Standard offer)
Maximum request size of document and attachment: 512 KB
Maximum request size of stored procedure, trigger and UDF: 512 KB
Maximum response size: 1 MB
String: All strings must conform to the UTF-8 encoding. Since UTF-8 is a variable width encoding, string sizes are determined using the UTF-8 bytes.
Maximum length of property or value: No practical limit
Maximum number of UDFs per query*: 1
Maximum number of built-in functions per query: No practical limit
Maximum number of JOINs per query*: 2
Maximum number of AND clauses per query*: 5
Maximum number of OR clauses per query*: 5
Maximum number of values per IN expression*: 100
Maximum number of collection creates per minute*: 5
Maximum number of scale operations per minute*: 5
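Two of the quotas above can be checked on the client before a write is attempted. The sketch below assumes that the 512 KB request limit can be approximated by the UTF-8 size of the serialized JSON document, which ignores any protocol overhead; the function and constant names are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 512 * 1024  # quota: maximum request size of a document
MAX_ID_LENGTH = 255              # quota: maximum length of the Id property


def check_document(doc):
    """Return a list of quota problems found in the document (empty if none)."""
    problems = []
    if len(doc.get("id", "")) > MAX_ID_LENGTH:
        problems.append("id longer than 255 characters")
    # Sizes are measured in UTF-8 bytes, not characters.
    size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
    if size > MAX_DOCUMENT_BYTES:
        problems.append(f"document is {size} bytes, over the 512 KB limit")
    return problems


print(check_document({"id": "order-1", "note": "ok"}))
# 300,000 two-byte characters exceed the limit even though the character count does not.
print(check_document({"id": "order-2", "note": "é" * 300000}))
```

Measuring bytes instead of characters matters for non-ASCII text, as the second call shows.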