5. Course Agenda
Date Time Webinar
25-May-2016 16:00 CEST Introduction to NoSQL
7-June-2016 16:00 CEST Your First MongoDB Application
21-June-2016 16:00 CEST Document-Oriented Schema Design
7-July-2016 16:00 CEST Advanced Indexing, Text and Geospatial Indexes
19-July-2016 16:00 CEST Introduction to the Aggregation Framework
28-July-2016 16:00 CEST Deploying to Production
6. Today's Agenda
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB
• Fault tolerance and data durability in MongoDB
• Scalability in MongoDB
• Questions
7. The origin of SQL (1979)
250 MB: $81,000/year
Dennis Ritchie and Brian Kernighan: $8,000/year (both)
14. Key Value Stores
• An associative array
• Single-key lookup
• Very fast single-key lookup
• Poor at “reverse lookups” (finding keys by value)
Key Value
12345 4567.3456787
12346 { addr1 : “The Grange”, addr2: “Dublin” }
12347 “top secret password”
12358 “Shopping basket value : 24560”
12787 12345
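A minimal Python sketch of the idea, assuming an in-memory `dict` as the associative array and reusing the slide's sample keys and values. `get` and `reverse_lookup` are hypothetical helper names. The point is that a key lookup is a single hash-table probe, while a reverse lookup has no index to use and must scan every entry.

```python
# A plain dict stands in for the key-value store (sample data from the slide).
store = {
    12345: 4567.3456787,
    12346: {"addr1": "The Grange", "addr2": "Dublin"},
    12347: "top secret password",
    12358: "Shopping basket value : 24560",
    12787: 12345,
}

def get(key):
    """Single-key lookup: one hash-table probe, average O(1)."""
    return store.get(key)

def reverse_lookup(value):
    """Find every key holding `value`. Values are not indexed,
    so the only option is a full scan of all entries (O(n))."""
    return [k for k, v in store.items() if v == value]

print(get(12347))             # top secret password
print(reverse_lookup(12345))  # [12787]
```

Note that key 12787 stores another key, 12345; following that reference means issuing a second lookup.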
15. Review: Row Stores (RDBMS)
• Store data aligned by rows (traditional RDBMS, e.g. MySQL)
• Reads retrieve a complete row every time
• Reads requiring only one or two columns are wasteful
ID Name Salary Start Date
1 Ruben T $24000 1/Jun/1970
2 Peter J $28000 1/Feb/1972
3 Phil G $23000 1/Jan/1973
Stored on disk row by row: 1, Ruben T, $24000, 1/Jun/1970 | 2, Peter J, $28000, 1/Feb/1972 | 3, Phil G, $23000, 1/Jan/1973
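To make the cost concrete, here is a hypothetical Python sketch that lays the sample table out row by row (salaries stored as plain integers, an assumption for simplicity) and then reads only the Salary column. Because each row's fields are stored together, every field of every row is touched to collect three values.

```python
# The sample table, laid out row by row as a row store writes it to disk.
rows = [
    (1, "Ruben T", 24000, "1/Jun/1970"),
    (2, "Peter J", 28000, "1/Feb/1972"),
    (3, "Phil G", 23000, "1/Jan/1973"),
]
ROW_WIDTH = 4    # fields per row
SALARY_POS = 2   # position of Salary within a row

def row_layout(table):
    """Flatten row by row: every field of row 1, then row 2, and so on."""
    return [field for row in table for field in row]

def read_salaries(layout):
    """Read one column from a row layout; returns (salaries, fields_read)."""
    salaries, fields_read = [], 0
    for start in range(0, len(layout), ROW_WIDTH):
        row = layout[start:start + ROW_WIDTH]  # the whole row comes off disk
        fields_read += len(row)
        salaries.append(row[SALARY_POS])
    return salaries, fields_read

layout = row_layout(rows)
print(read_salaries(layout))  # ([24000, 28000, 23000], 12)
```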
16. How a Column Store Does It
ID Name Salary Start Date
1 Ruben T $24000 1/Jun/1970
2 Peter J $28000 1/Feb/1972
3 Phil G $23000 1/Jan/1973
Stored on disk column by column: 1, 2, 3 | Ruben T, Peter J, Phil G | $24000, $28000, $23000 | 1/Jun/1970, 1/Feb/1972, 1/Jan/1973
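The same table in a column-oriented layout, again as a hypothetical sketch. Each column becomes one contiguous list, a single column can be read on its own, and a row is re-aligned by position (the "by order" option rather than a stored row ID).

```python
# The sample table pivoted into one contiguous list per column.
rows = [
    (1, "Ruben T", 24000, "1/Jun/1970"),
    (2, "Peter J", 28000, "1/Feb/1972"),
    (3, "Phil G", 23000, "1/Jan/1973"),
]
COLUMN_NAMES = ["ID", "Name", "Salary", "Start Date"]

def column_layout(table, names):
    """Store each column contiguously: all IDs, then all names, and so on."""
    return {name: [row[i] for row in table] for i, name in enumerate(names)}

def rebuild_row(columns, position):
    """Re-align a row by position: the n-th value of every column
    belongs to the n-th row."""
    return tuple(values[position] for values in columns.values())

columns = column_layout(rows, COLUMN_NAMES)
salaries = columns["Salary"]          # one contiguous read, no other columns
print(sum(salaries) / len(salaries))  # 25000.0
print(rebuild_row(columns, 1))        # (2, 'Peter J', 28000, '1/Feb/1972')
```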
17. Why is this Attractive?
• A column can be retrieved efficiently with a few sequential reads
• Similar data stored together compresses very efficiently
• So reads can grab more data off disk in a single seek
• How do I align my rows? By position (order) or by storing a row ID
• If you need only a small number of columns, you don’t need to read whole rows
• But updating and deleting by row is expensive
• Append-only workloads are preferred
• Better for OLAP than OLTP
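Compression is one reason column stores benefit from keeping similar values together. The sketch below uses run-length encoding, one common column-compression scheme chosen here for illustration (the slide does not name a specific scheme), on a hypothetical `country` column that is already sorted.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of identical consecutive values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical sorted column: equal values sit next to each other.
country = ["ES"] * 4 + ["FR"] * 3 + ["IE"] * 2
encoded = rle_encode(country)
print(encoded)  # [('ES', 4), ('FR', 3), ('IE', 2)]
```

Nine stored values shrink to three pairs. The same data laid out row by row would interleave countries with names and salaries, leaving no runs to collapse.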
18. Graph Stores
• Store graphs (edges and vertices)
• E.g. social networks
• Designed to allow efficient traversal
• Optimised for representing connections
• Can be implemented as a key-value store with the ability to store links
• If your use case is not a graph, you don’t need a graph database
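The slide notes that a graph store can be built on a key-value store that stores links. This hypothetical sketch follows that idea: each vertex's value is its list of outgoing edges, and a breadth-first traversal counts the hops between two people in a small, made-up social network.

```python
from collections import deque

# Key = vertex, value = list of linked vertices (outgoing edges).
follows = {
    "ana": ["ben", "carla"],
    "ben": ["dan"],
    "carla": ["dan", "eva"],
    "dan": ["eva"],
    "eva": [],
}

def degrees_of_separation(graph, start, goal):
    """Breadth-first traversal: number of hops from start to goal,
    or None if goal cannot be reached."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, hops = queue.popleft()
        if vertex == goal:
            return hops
        for neighbour in graph.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

print(degrees_of_separation(follows, "ana", "eva"))  # 2
```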
19. Multi-Model Databases
• Combine multiple storage/access models
• Often Graph plus “something else”
• Fixes the “polyglot persistence” issue of keeping multiple
independent databases consistent
• The “new new thing” in NoSQL Land
• Expect to hear more noise about these kinds of databases
20. Document Store
• Not PDFs, Microsoft Word or HTML
• Documents are nested structures created using JavaScript Object Notation (JSON)
{
  name : "Rubén Terceño",
  title : "Senior Solutions Architect",
  employee_number : 653,
  location : {
    type : "Point",
    coordinates : [ -3.26, 43.34 ] },
  expertise : [ "MongoDB", "Java", "Geospatial" ],
  address : {
    address1 : "Rutilo 11",
    address2 : "Piso 1, Oficina 2",
    zipcode : "28041"
  }
}
21. Documents are Rich Structures
{
  name : "Rubén Terceño",
  title : "Senior Solutions Architect",
  employee_number : 653,
  location : {
    type : "Point",
    coordinates : [ -3.26, 43.34 ] },
  expertise : [ "MongoDB", "Java", "Geospatial" ],
  address : {
    address1 : "Rutilo 11",
    address2 : "Piso 1, Oficina 2",
    zipcode : "28041"
  }
}
Fields can contain sub-documents (location, address)
Typed field values (e.g. employee_number is a number, zipcode is a string)
Fields can contain arrays (expertise)
Fields (name, title, employee_number, …)
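A small Python sketch of working with the sample document (the `location` field is omitted for brevity). `get_path` is a hypothetical helper that mimics MongoDB's dot notation, such as `address.zipcode` for a sub-document field or `expertise.1` for an array position. It is not a driver API.

```python
# The sample document as a Python dict (location omitted for brevity).
employee = {
    "name": "Rubén Terceño",
    "title": "Senior Solutions Architect",
    "employee_number": 653,                          # typed value: an integer
    "expertise": ["MongoDB", "Java", "Geospatial"],  # array field
    "address": {                                     # sub-document
        "address1": "Rutilo 11",
        "address2": "Piso 1, Oficina 2",
        "zipcode": "28041",                          # a string, not a number
    },
}

def get_path(doc, path):
    """Hypothetical helper: resolve a dotted path such as 'address.zipcode',
    treating numeric segments as array positions."""
    value = doc
    for part in path.split("."):
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value

print(get_path(employee, "address.zipcode"))  # 28041
print(get_path(employee, "expertise.1"))      # Java
```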
22. MongoDB Really Speaks JSON
• It has been a native JSON database since its very first version
• Understands and can index the sub-structures
• Stores JSON in a serialized binary format called BSON
• Efficient to encode and decode for network transmission
• MongoDB can create indexes on any document field
• (We will cover these areas in detail later on in the course)
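To show what "serialized binary format" means in practice, here is a minimal BSON encoder written from the BSON specification. It is a sketch that handles only strings, booleans, 32-bit integers, doubles, embedded documents and arrays. Real applications would use a complete implementation, such as the `bson` package that ships with PyMongo.

```python
import struct

def _cstring(s):
    """Field names are null-terminated UTF-8 strings."""
    return s.encode("utf-8") + b"\x00"

def _element(name, value):
    """Encode one element: type byte + field name + value."""
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return b"\x08" + _cstring(name) + (b"\x01" if value else b"\x00")
    if isinstance(value, int):   # int32 only in this sketch
        return b"\x10" + _cstring(name) + struct.pack("<i", value)
    if isinstance(value, float):
        return b"\x01" + _cstring(name) + struct.pack("<d", value)
    if isinstance(value, str):   # int32 length (incl. trailing null) + bytes
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + _cstring(name) + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):  # embedded document
        return b"\x03" + _cstring(name) + encode(value)
    if isinstance(value, list):  # arrays are documents keyed "0", "1", ...
        as_doc = {str(i): v for i, v in enumerate(value)}
        return b"\x04" + _cstring(name) + encode(as_doc)
    raise TypeError(f"unsupported type: {type(value).__name__}")

def encode(doc):
    """document = int32 total length + elements + 0x00 terminator."""
    body = b"".join(_element(k, v) for k, v in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

The length prefixes are what make BSON quick to decode and skip over: a reader can jump past a whole sub-document without parsing it.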
23. MongoDB is Full-Featured
• Rich queries: find all Solutions Architects; find all employees who know Java in Support or Consulting
• Geospatial: find all employees currently in France
• Text search: find all employees describing themselves as “self-driven”
• Aggregation: calculate the average distance to the office for all employees
• Map Reduce: what are the most common skills by region over time? (Is Node.js trending in Brazil?)
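The first rich-query example can be sketched in Python. The `matches` function below is a hypothetical, heavily simplified matcher that mimics two MongoDB rules: equality against an array field matches when any element is equal, and `$in` matches any of a list of values. The `department` field and the employee records are made up for the example. With a driver such as PyMongo, the same `query` dict would instead be passed to `collection.find(query)` and evaluated by the server.

```python
def field_matches(field_value, condition):
    """Simplified MongoDB-style condition check for a single field."""
    if isinstance(condition, dict) and "$in" in condition:
        options = condition["$in"]
        if isinstance(field_value, list):
            return any(v in options for v in field_value)
        return field_value in options
    if isinstance(field_value, list):  # equality on an array: any element
        return condition in field_value
    return field_value == condition

def matches(doc, query):
    """All top-level conditions must hold (an implicit AND)."""
    return all(field in doc and field_matches(doc[field], cond)
               for field, cond in query.items())

employees = [
    {"name": "A", "department": "Support", "expertise": ["Java", "Python"]},
    {"name": "B", "department": "Consulting", "expertise": ["MongoDB"]},
    {"name": "C", "department": "Sales", "expertise": ["Java"]},
    {"name": "D", "department": "Consulting", "expertise": ["Java", "Geospatial"]},
]

# "Find all employees knowing Java in Support or Consulting"
query = {"expertise": "Java", "department": {"$in": ["Support", "Consulting"]}}
print([e["name"] for e in employees if matches(e, query)])  # ['A', 'D']
```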
35. Elastic Scalability: Automatic Sharding
• Increase or decrease capacity as you go
• Automatic load balancing
• Three types of sharding
• Hash-based
• Range-based
• Tag-aware
Diagram: Shard 1, Shard 2, Shard 3 … Shard N (horizontally scalable)
36. Scalability with Sharding
• Shard key partitions the content
• MongoDB automatically balances the cluster
• Shards can be added dynamically to a live system
• Rebalancing happens in the background
• Shard key is immutable
• Shard key can route queries to a specific shard
• Queries without a shard key are sent to all shards
• Each shard processes its part in parallel
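A hypothetical sketch of how a router can use a range-based chunk map on the shard key (here `employee_number`, an assumption). Equality on the shard key targets exactly one shard, a range targets only the overlapping chunks, and a query without the shard key is broadcast to every shard. The chunk boundaries and shard names are invented for illustration.

```python
import bisect

# Hypothetical chunk map: chunk i covers [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 2000, 3000]
CHUNK_OWNER = ["shard1", "shard2", "shard3", "shard1"]
ALL_SHARDS = sorted(set(CHUNK_OWNER))
SHARD_KEY = "employee_number"

def shards_for_value(value):
    """Equality on the shard key: exactly one chunk, so one shard."""
    chunk = bisect.bisect_right(CHUNK_LOWER_BOUNDS, value) - 1
    return {CHUNK_OWNER[chunk]}

def shards_for_range(low, high):
    """Half-open range [low, high) on the shard key: only overlapping chunks."""
    first = bisect.bisect_right(CHUNK_LOWER_BOUNDS, low) - 1
    last = bisect.bisect_left(CHUNK_LOWER_BOUNDS, high) - 1
    return set(CHUNK_OWNER[first:last + 1])

def route(query):
    """Targeted when the query pins the shard key, otherwise broadcast."""
    if SHARD_KEY in query:
        return shards_for_value(query[SHARD_KEY])
    return set(ALL_SHARDS)

print(sorted(route({"employee_number": 653})))  # ['shard1']
print(sorted(shards_for_range(1500, 2500)))     # ['shard2', 'shard3']
print(sorted(route({"title": "Senior Solutions Architect"})))  # ['shard1', 'shard2', 'shard3']
```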
37. Query Routing
• Multiple query optimization models
• Each of the sharding options is appropriate for different apps and use cases
38. Query Routing
• With a sharded cluster we use a routing layer to guide queries
• We use a daemon called mongos (Mongo Shard Router)
• The daemon is stateless
• You can run as many as required
• Typically one per app server
39. Summary
• Why does NoSQL exist?
• Types of NoSQL databases
• Key features of MongoDB
• Fault tolerance and data durability in MongoDB
• Scalability in MongoDB
40. Next Webinar
Your First MongoDB Application
• 7 June 2016, 16:00 CEST, 11:00, 9:00
• Register if you haven’t already!
• Learn how to build your first application with MongoDB
• Create databases and collections
• Create queries
• Build indexes
• Understand how to analyze performance
• Register at: https://www.mongodb.com/webinars
• Please give us your feedback: back-to-basics@mongodb.com
Delighted to have you here. Hope you can make it to all the sessions. Sessions will be recorded and sent out afterwards, so don’t worry if you miss one.
If you have questions please pop them in the sidebar.
A lot of people expect us to come in and bash relational databases or say we don’t think they’re good. And that’s simply not true.
Relational databases have laid the foundation for what you’d want out of a database, and we absolutely think there are capabilities that remain critical today.
Expressive query language & secondary indexes. Users should be able to access and manipulate their data in sophisticated ways, and you need a query language that lets you do all that out of the box. Indexes are a critical part of providing efficient access to data. We believe these are table stakes for a database.
Strong consistency. Strong consistency has become second nature for how we think about building applications, and for good reason. The database should always provide access to the most up-to-date copy of the data. Strong consistency is the right way to design a database.
Enterprise Management and Integrations. Finally, databases are just one piece of the puzzle, and they need to fit into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing IT infrastructure and staff, such as operations teams, DBAs, and data analysts.
But of course the world has changed a lot since the 1980s when the relational database first came about.
First of all, data and risk are significantly up.
In terms of data:
90% of all data was created in the last 2 years: think about that for a moment. Of all the data ever created, 90% of it was created in the last 2 years.
80% of enterprise data is unstructured: this is data that doesn’t fit into the neat tables of a relational database.
Unstructured data is growing at twice the rate of structured data.
At the same time, the risks of running a database are higher than ever before. You are now faced with:
More users: apps have shifted from small internal departmental systems with thousands of users to large external audiences with millions of users.
No downtime: it’s no longer the case that apps only need to be available during standard business hours. They must be up 24/7.
All across the globe: your users are everywhere, and they are always connected.
On the other hand, time and costs are way down.
There’s less time to build apps than ever before. You’re being asked to:
Ship apps in a few months, not years: development methods have shifted from a waterfall process to an iterative process that ships new functionality in weeks, and in some cases multiple times per day at companies like Facebook and Amazon.
And costs are way down too. Companies want to:
Pay for value over time: companies have shifted to open-source and SaaS business models that allow them to pay for value over time.
Use cloud and commodity resources: to reduce the time to provision their infrastructure and to lower their total cost of ownership.
Because the relational database was not designed for modern applications, starting about 10 years ago a number of companies began to build their own databases that are fundamentally different. The market calls these NoSQL.
NoSQL databases were designed for this new world…
Flexibility. All of them have some kind of flexible data model to allow for faster iteration and to accommodate the data we see dominating modern applications. While they all have different approaches, what they have in common is they want to be more flexible.
Scalability + Performance. Similarly, they were all built with a focus on scalability, so they all include some form of sharding or partitioning. And they're all designed to deliver great performance. Some are better at reads, some are better at writes, but more or less they all strive to have better performance than a relational database.
Always-On Global Deployments. Lastly, NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run on many computers, and they include replication to automatically synchronize the data across servers, racks, and data centers.
However, when you take a closer look at these NoSQL systems, it turns out they have thrown out the baby with the bathwater. They have sacrificed the core database capabilities you’ve come to expect and rely on in order to build fully functional apps, like rich querying and secondary indexes, strong consistency, and enterprise management.
Think Redis, memcached or Couchbase.
Column stores you know and love: HP Vertica, Cassandra.
Rich queries, text search, geospatial, aggregation and MapReduce are the types of things you can build based on the richness of the query model.
MongoDB was built to address the way the world has changed while preserving the core database capabilities required to build modern applications.
Our vision is to leverage the work that Oracle and others have done over the last 40 years to make relational databases what they are today, and to take the reins from here. We pick up where they left off, incorporating the work that internet pioneers like Google and Amazon did to address the requirements of modern applications.
MongoDB is the only database that harnesses the innovations of NoSQL and maintains the foundation of relational databases – and we call this our Nexus Architecture.
High Availability – Ensure application availability during many types of failures
Meet stringent SLAs with fast-failover algorithm
Under 2 seconds to detect and recover from replica set primary failure
Disaster Recovery – Address the RTO and RPO goals for business continuity
Maintenance – Perform upgrades and other maintenance operations with no application downtime
Secondaries can be used for a variety of applications: failover, hot backup, rolling upgrades, data locality and privacy, and workload isolation.
MongoDB provides horizontal scale-out for databases using a technique called sharding, which is transparent to applications. Sharding distributes data across multiple physical partitions called shards. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O, without adding complexity to the application.
MongoDB automatically balances the data in the cluster as the data grows or the size of the cluster increases or decreases.
MongoDB supports three types of sharding:
• Range-based Sharding. Documents are partitioned across shards according to the shard key value. Documents with shard key values “close” to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries.
• Hash-based Sharding. Documents are distributed according to an MD5 hash of the shard key value. Documents with shard key values “close” to one another are unlikely to be co-located on the same shard. This approach provides a more uniform distribution of writes across shards, but is less optimal for range-based queries.
• Tag-aware Sharding. Documents are partitioned according to a user-specified configuration that associates shard key ranges with shards. Users can optimize the physical location of documents for application requirements such as locating data in specific data centers.
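To see why range-based and hash-based sharding place data differently, the sketch below assigns 100 monotonically increasing keys (for example, newly issued employee numbers) to three shards. The range bounds are hypothetical, and the hashed placement is a simplification: it hashes the key's decimal string with MD5 and splits the 64-bit hash space into equal ranges. MongoDB computes its own hash of the field value and manages the chunks itself.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3

def range_shard(key, upper_bounds=(1000, 2000)):
    """Range-based: shard i holds keys below upper_bounds[i]; the last shard
    holds everything else, so consecutive keys stay together."""
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)

def hash_shard(key):
    """Hash-based (simplified): take the first 64 bits of an MD5 digest and
    split that hash space into NUM_SHARDS equal ranges."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    hash64 = int.from_bytes(digest[:8], "big")
    return hash64 * NUM_SHARDS // 2**64

new_keys = range(2000, 2100)  # monotonically increasing inserts
print(Counter(range_shard(k) for k in new_keys))  # Counter({2: 100})
print(Counter(hash_shard(k) for k in new_keys))   # spread over shards 0, 1 and 2
```

Under range-based placement every new key lands on the last shard, which becomes a write hotspot. The hashed placement spreads the same inserts out, at the cost of scattering a range query such as 2000 to 2099 across all shards.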
Sharding is transparent to applications; whether there is one or one hundred shards, the application code for querying MongoDB is the same. Applications issue queries to a query router that dispatches the query to the appropriate shards.
For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will dispatch the query to all shards and aggregate and sort the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
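A hypothetical sketch of the scatter-gather path for a query that does not use the shard key. The router sends the query to every shard, each shard filters and sorts its own data in parallel (simulated here with a thread pool), and the router merges the already sorted streams with a k-way merge before applying a limit. Shard contents and field names are made up.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical shard contents; the query below does not use the shard key.
shards = {
    "shard1": [{"employee_number": 653, "expertise": ["Java"]},
               {"employee_number": 12, "expertise": ["MongoDB"]}],
    "shard2": [{"employee_number": 900, "expertise": ["Java"]},
               {"employee_number": 40, "expertise": ["Java", "Go"]}],
    "shard3": [{"employee_number": 7, "expertise": ["Java"]}],
}

def run_on_shard(docs, predicate, sort_field):
    """Each shard filters and sorts only its own documents."""
    return sorted((d for d in docs if predicate(d)), key=lambda d: d[sort_field])

def scatter_gather(shards, predicate, sort_field, limit=None):
    """Query every shard in parallel, then k-way merge the sorted
    partial results and apply the limit on the router."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(
            lambda docs: run_on_shard(docs, predicate, sort_field),
            shards.values()))
    merged = heapq.merge(*partials, key=lambda d: d[sort_field])
    return list(islice(merged, limit))

def knows_java(doc):
    return "Java" in doc["expertise"]

top = scatter_gather(shards, knows_java, "employee_number", limit=3)
print([d["employee_number"] for d in top])  # [7, 40, 653]
```

Because every shard returns its matches already sorted, the router only has to merge the streams rather than re-sort the whole result.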