Introduction to NoSQL and MongoDB Webinar

The database landscape is changing as new, scalable data stores emerge. Key-value stores, wide-column stores and document-oriented databases offer an attractive alternative to the traditional relational database. Earlier databases were built on certain traditional assumptions. By setting those assumptions aside, this new class of non-relational, or "NoSQL", solutions gains the ability to scale horizontally. NoSQL solutions also offer interesting alternatives to the traditional relational data model.

This presentation introduces the key concepts needed to understand and evaluate NoSQL data stores. We will explore the fundamental differences among the various classes of NoSQL solutions. We will then conclude with an in-depth look at MongoDB, a document-oriented database.

This presentation will cover:

Origins of the NoSQL movement
An overview of the NoSQL landscape
The philosophy behind MongoDB and its creation
MongoDB system architecture
MongoDB use cases

  • Intro: a dozen years in the RDBMS space, working on OLTP (transactional systems), OLAP (data warehousing), etc. Now NoSQL, at 10gen, the MongoDB company (it developed MongoDB and provides support, consulting and training for it).
  • No redundant data, complex ERDs, OLTP vs. OLAP. 1980s: indexing is invented and published, and becomes standard. Late 1990s: we learned how to scale the web with load balancers, etc., but there was still one DB behind it all. Static content was distributed and "hosted" near the edge, but what about writes?
  • Social media: writes start to catch up with reads. Also mobile/PDAs and large data projects such as genome, space, clickstream analysis and logs. BI comes of age. Traditional RDBMSs cannot handle the volume, which leads to caching data in memory, manual sharding/partitioning and replication.
  • Developers interact with an ORM layer and lose visibility of the relational data model. That model is unintuitive, and many developers are unfamiliar with its performance implications.
  • Developers started coming up with workarounds: denormalize data; avoid joins and long-running transactions; custom caches (which require more RAM!); application-level partitioning; distributed caches (which lose the ability to do joins and long-running transactions).
  • Today's apps need low and predictable response times, scalability "on demand" (high peak times, cloud deployment), HA for reads AND writes, and multi-data-center distribution.
  • Maybe no SQL or maybe not only SQL
  • Trade off some of the less critical components for speed, scale and ease of use. Some have no ad hoc queries; some offer only limited ways of reading or updating the data.
  • NoSQL = non-relational, next-generation operational data stores and databases. They have their own query language or a simple fetch by key, and do not use SQL as the query language. They give no ACID guarantees (transactions are limited to a single item). They use a distributed, fault-tolerant architecture.
  • Eventual consistency (but with a strong ability to distribute, and high availability). No joins + light transactional semantics = horizontally scalable architectures.
  • K/V stores: PNUTS (Yahoo), Dynamo (Amazon), Voldemort (originally by LinkedIn). Column stores: BigTable (Google), and Cassandra (Facebook), which is similar to BigTable. Also Riak (Basho), Membase, Neo4j and CouchDB/Couchbase.
  • How much data do you have? How many reads? Writes? What types of queries? How important is it never to lose data? (Too bad: all systems can lose data.) How easy is it to maintain? Will you get paged in the middle of the night? EASE OF USE: does it represent your data intuitively?
  • The founders (of DoubleClick, ShopWiki, GILT Groupe and Business Insider) kept running into the same problems over and over. They were originally working on a platform for the cloud (like Google Apps): an application stack that would scale out easily.
  • Maintain richness and depth of functionality of RDBMS combined with performance and scalability of in-memory key-value stores.
  • Documents: richer than flat structures
  • Let's look at a simple data model for blog posts, with authors, comments, tags and votes.
  • author/post/comments/tags
  • JSON document – contains key value pairs, different types, values can also be arrays and other documents
  • Meeting Notes (6/19/12 13:12): MongoDB lets you update a document atomically, so we can be sure the vote total and the list of voters stay in sync.
  • comments is an array of JSON documents. We can query by fields inside embedded documents as well as by array members.
  • Secondary indexes, compound indexes, multikey indexes. Meeting Notes (6/19/12 13:12): why is it important to keep the whole document together?
  • Seeks take a long time; reads, much less so. This is why joins are expensive!
  • REPLICA SETS
  • Horizontal scaling: MongoDB automatically manages partitioning the data across a large number of servers, transparently to the application.
  • Language drivers are replica set aware and keep a list of replica set members (and can query via isMaster to determine which is master).
  • Full deployment. As many mongoS processes as you have app servers (for example); Config DBs are small but hold the critical information about where ranges of data are located on disk/shards.
  • A special sharding router process; to applications it looks just like a stand-alone mongod.
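The notes above describe three things about the blog-post document. Queries can match array members. Queries can also match fields inside embedded comments. An atomic update keeps the vote count and the voter list in sync.

The sketch below imitates those semantics in plain JavaScript so they can run without a server. It is a hypothetical in-memory model, not the MongoDB driver or server:
- `matches` handles only exact-equality queries on dotted field paths.
- `vote` stands in for a single-document update. In the mongo shell that update would look roughly like `db.posts.update({ _id: id, voters: { $ne: user } }, { $inc: { votes: 1 }, $push: { voters: user } })`.

```javascript
// Hypothetical in-memory sketch; not the MongoDB driver or server.

// Collect every value reachable at a dotted path. Arrays are descended
// element by element, so "comments.by" yields the author of each comment.
function valuesAtPath(value, parts) {
  if (parts.length === 0) return [value];
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (value === null || typeof value !== "object") return [];
  return valuesAtPath(value[parts[0]], parts.slice(1));
}

// Equality-only matching: a field matches if any value found at its path
// equals the expected value, or is an array that contains it.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) =>
    valuesAtPath(doc, path.split(".")).some((candidate) =>
      Array.isArray(candidate) ? candidate.includes(expected) : candidate === expected
    )
  );
}

// Guarded vote: the check plays the role of {voters: {$ne: user}}, and both
// changes are made together, as {$inc: ..., $push: ...} would be on one
// document. (This single-threaded code is trivially atomic; on a server,
// the atomicity comes from MongoDB's single-document updates.)
function vote(doc, user) {
  if (doc.voters.includes(user)) return false;
  doc.votes += 1;
  doc.voters.push(user);
  return true;
}

const post = {
  _id: "4e2e3f92268cdda473b628f6", // a plain string standing in for ObjectId(...)
  title: "Too Big to Fail",
  when: new Date("2011-07-26"),
  author: "joe",
  text: "blah",
  tags: ["business", "news", "north america"],
  votes: 3,
  voters: ["dmerr", "sj", "jane"],
  comments: [
    { by: "tim157", text: "great story" },
    { by: "gora", text: "i don't think so" },
    { by: "dmerr", text: "also check out..." },
  ],
};

const posts = [post];
console.log(posts.filter((p) => matches(p, { tags: "news" })).length); // 1
console.log(posts.filter((p) => matches(p, { "comments.by": "gora" })).length); // 1
console.log(posts.filter((p) => matches(p, { "comments.by": "nobody" })).length); // 0
console.log(vote(post, "joe"), post.votes, post.voters.length); // true 4 4
console.log(vote(post, "joe"), post.votes); // false 4
```

The guard and the two modifications travel together in one operation, so a second vote by the same user is rejected. The count and the list therefore cannot drift apart.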
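The notes also argue that seeks, not reads, are what make joins expensive. A back-of-the-envelope sketch of that argument follows, using the "5+ ms" per seek shown on slide 38 as a lower bound.

It rests on a hypothetical layout assumption. A normalized post, its author and each of its comments sit in different places on disk, so each needs its own seek. An embedded document needs a single seek followed by a sequential read. Real costs also depend on caching, indexes and physical layout.

```javascript
// Back-of-the-envelope sketch; the layout below is an assumption, not a measurement.
const SEEK_MS = 5; // lower bound per seek, from the slide's "5+ ms"

// Hypothetical normalized layout: post, author and each comment are stored
// in separate places, so each one costs a seek.
function normalizedSeeks(commentCount) {
  return 1 /* post */ + 1 /* author */ + commentCount;
}

// Embedded layout: the whole post (author and comments included) is one
// contiguous document, so one seek followed by a sequential read.
function embeddedSeeks() {
  return 1;
}

// Minimum latency if sequential reads are treated as free.
function minLatencyMs(seeks) {
  return seeks * SEEK_MS;
}

const comments = 5; // slide 39 shows five comments
console.log(minLatencyMs(normalizedSeeks(comments))); // 35
console.log(minLatencyMs(embeddedSeeks())); // 5
```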
  • Transcript

    • 1. Introduction to NoSQL and MongoDB. September 13, 2012. Robert Stam
    • 2–4.
      • 1970s: relational databases appear
        – Storage is expensive
        – Data is normalized
        – Storage is abstracted away from the application
      • 1980s: commercial RDBMSs appear
        – Client/server model
        – SQL emerges as the standard
      • 1990s: things start to change
        – Client/server gives way to three-tier architecture
        – The Internet and the Web appear
    • 5–6.
      • 2000s: Web 2.0
        – "Social media" appears
        – E-commerce gains acceptance
        – Hardware prices keep falling
        – Massive increase in the amount of data collected
      • Result
        – An ongoing need to scale dramatically
        – How can we scale?
    • 7–11. Diagram: relational databases in two roles
      • BI / reporting ("Fewer problems here")
        + ad hoc queries
        + SQL as a standard protocol between clients and servers
        + scales horizontally better than operational databases
        - some scalability limits
        - rigid schemas
        - not real-time, but works well with bulk loads in the early-morning hours
      • OLTP / operational ("More problems here")
        + complex transactions
        + tabular data
        + ad hoc queries
        - O<->R mapping is hard
        - speed and scalability problems
        - not very agile
      • Workarounds shown: caching, application-level partitioning, flat files, map/reduce
    • 12–13.
      • Agile development methodology
        – Short development cycles
        – Constantly evolving requirements
        – Design flexibility
      • Relational schema
        – Hard to evolve
        – Slow, difficult migrations
        – Must stay in sync with the application
      • Few developers interact directly with the database
    • 16.
      • Horizontal scalability
      • More real-time results
      • Faster development
      • Flexible data model
      • Low upfront cost
      • Low operating cost
    • 17. What is NoSQL? Relational vs. non-relational
    • 18. Diagram: the non-relational ("NoSQL") quadrant, shown alongside BI / reporting and OLTP / operational, labeled "Scalable"
        + speed and scalability
        - limited ad hoc queries
        - not very transactional
        - no SQL / no standards
        + fit well with the OO model
        + agile
    • 19. The next generation of non-relational databases
      A collection of very different products:
      • Different (non-relational) data models
      • Most do not use SQL for queries
      • No predefined schema required
      • Some allow flexible data structures
    • 20–23. Data models: relational, key-value, document, XML, graph, column
      • ACID (atomicity, consistency, isolation, durability) vs. BASE (basically available, soft state, eventual consistency)
      • Two-phase commit vs. atomic transactions at the document level
      • Joins vs. no joins
    • 25.
      • Transaction volume
      • Reliability
      • Maintenance
      • Ease of use
      • Scalability
      • Cost
    • 26. MongoDB: Introduction
    • 27.
      • Designed and developed by the founders of DoubleClick, ShopWiki, GILT Groupe, etc.
      • Development begins in late 2007
      • First production site: businessinsider.com, March 2008
      • Open source: AGPL, written in C++
      • Version 0.8: first official release, February 2009
      • Version 1.0: August 2009
      • Version 2.0: September 2011
      • Version 2.2: August 2012
    • 28. MongoDB: design goals
    • 30.
      • Document-oriented
        – Based on JSON documents
        – Flexible schema
      • Scalable architecture
        – Auto-sharding
        – Replication and high availability
      • Key features
        – Secondary indexes
        – Query language (ad hoc queries)
        – Map/Reduce (aggregation)
    • 31.
      • Powerful, flexible data model
      • Transparent conversion of application (OO) objects to JSON documents
      • Flexibility for dynamic data
      • Better data locality
    • 33–37. A blog post document, built up slide by slide:
      – slide 34 adds tags and a query on them
      – slide 35 adds votes and voters
      – slide 36 adds comments
      – slide 37 adds a query and an index on comments.by
        {
          _id : ObjectId("4e2e3f92268cdda473b628f6"),
          title : "Too Big to Fail",
          when : ISODate("2011-07-26"),
          author : "joe",
          text : "blah",
          tags : [ "business", "news", "north america" ],
          votes : 3,
          voters : [ "dmerr", "sj", "jane" ],
          comments : [
            { by : "tim157", text : "great story" },
            { by : "gora", text : "i don't think so" },
            { by : "dmerr", text : "also check out..." }
          ]
        }
        > db.posts.find( { tags : "news" } )
        > db.posts.find( { "comments.by" : "gora" } )
        > db.posts.ensureIndex( { "comments.by" : 1 } )
    • 38. Seek = 5+ ms; read = very fast. Diagram: Post, Comment and Author as separate records
    • 39. Diagram: a Post with its Author and five Comments kept together
    • 40.
      • Secondary indexes
      • Dynamic queries
      • Sorting of results
      • Powerful operations: update, upsert
      • Aggregation functions
      • Viable as primary storage
    • 41.
      • Linear scalability
      • High availability
      • Add capacity without taking the application out of service
      • Transparent to the application
    • 42. Replica sets
      • High availability / automatic failover
      • Data redundancy
      • Disaster recovery
      • Transparent to the application
      • Maintenance without taking the application out of service
    • 43–45. Asynchronous replication (diagram)
    • 47. Automatic election (diagram)
    • 49–50. Sharding
      • Add capacity without taking the application out of service
      • Transparent to the application
      • Partitions based on ranges of values
      • Automatic partitioning and balancing
    • 51–53. Write scalability (diagram):
      – the key range 0..100 starts on a single mongod
      – it is split into 0..50 and 51..100 on two mongods
      – it is split again into 0..25, 26..50, 51..75 and 76..100 on four mongods
    • 54–57. Full deployment (diagram):
      – each key range (0..25, 26..50, 51..75, 76..100) is served by a replica set of one primary and two secondaries
      – the application connects through mongos, first a single mongos and then several mongos processes
      – three config servers run alongside them
    • 58.
      • Few configuration options
      • The default configuration works well
      • Easy to install and administer
    • 59–60. MySQL vs. MongoDB
      MySQL:
        START TRANSACTION;
        INSERT INTO contacts VALUES (NULL, 'joeblow');
        INSERT INTO contact_emails VALUES
          (NULL, 'joe@blow.com', LAST_INSERT_ID()),
          (NULL, 'joseph@blow.com', LAST_INSERT_ID());
        COMMIT;
      MongoDB:
        db.contacts.save( {
          userName: "joeblow",
          emailAddresses: [
            "joe@blow.com",
            "joseph@blow.com" ] } );
      • Drivers exist for dozens of programming languages
      • A natural mapping between (OO) objects and documents
    • 61. MongoDB use cases
    • 62. Content management; operational intelligence; e-commerce; high-volume data processing; user data management
    • 63. Wordnik uses MongoDB as the foundation for its "live" dictionary, which stores its entire text corpus: 3.5T of data in 20 billion records.
      Problem:
        – Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
        – Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
        – Initially launched entirely on MySQL but quickly hit performance roadblocks
      Why MongoDB:
        – Migrated 5 billion records in a single day with zero downtime
        – MongoDB powers every website request: 20m API calls per day
        – Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error
      Impact:
        – Reduced code by 75% compared to MySQL
        – Fetch time cut from 400ms to 60ms
        – Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
        – Significant cost savings and 15% reduction in servers
      "Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." (Tony Tam, Vice President of Engineering and Technical Co-founder)
    • 64. Intuit relies on a MongoDB-powered real-time analytics tool that helps small businesses derive interesting and actionable patterns from their customers' website traffic.
      Problem:
        – Intuit hosts more than 500,000 websites and wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers
        – With 10 years' worth of user data, it took several days to process the information using a relational database
      Why MongoDB:
        – MongoDB's querying and Map/Reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation
        – The strength of the MongoDB community
      Impact:
        – In one week, Intuit was able to become proficient in MongoDB development
        – Developed application features more quickly for MongoDB than for relational databases
        – MongoDB was 2.5 times faster than MySQL
      "We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" (Nirmala Ranganathan, Intuit)
    • 65. Shutterfly uses MongoDB to safeguard more than six billion images, in the form of photos and videos, for millions of customers, and to turn everyday pictures into keepsakes.
      Problem:
        – Managing 20TB of data (six billion images for millions of customers), partitioned by function
        – A home-grown key-value store on top of their Oracle database offered sub-par performance
        – The codebase for this hybrid store became hard to manage
        – High licensing and HW costs
      Why MongoDB:
        – JSON-based data structure
        – Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
        – Works seamlessly with Shutterfly's services-based architecture
      Impact:
        – 500% cost reduction and 900% performance improvement compared to the previous Oracle implementation
        – Accelerated time-to-market for nearly a dozen projects on MongoDB
        – Improved performance by reducing average latency for inserts from 400ms to 2ms
      The "really killer reason" for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to developing software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features. (Kenny Gorman, Director of Data Services)
    • 67. An open-source, high-performance database
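Slides 49–57 describe range-based partitioning. Each shard owns a key range, ranges are split as data grows, and mongos routes every operation using range metadata kept on the config servers. A minimal in-memory sketch of that routing idea follows. Some caveats:
- The function and shard names are hypothetical.
- It keeps the slides' inclusive integer ranges. MongoDB itself describes chunks with an inclusive lower bound and an exclusive upper bound.
- It folds "split" and "move to a new shard" into one step.
- It leaves out replication, the balancer and chunk migration.

```javascript
// Hypothetical sketch of range-based sharding; not MongoDB's implementation.
// The chunk table stands in for the config servers' metadata, and route()
// stands in for mongos.

// One chunk covering the inclusive key range [min, max], on one shard.
function createChunkTable() {
  return [{ min: 0, max: 100, shard: "shard0" }];
}

// Split the chunk that contains splitKey (and extends past it): the lower
// part [min, splitKey] stays put, and the upper part [splitKey + 1, max] is
// assigned to newShard. A simplification of a separate split plus migration.
function split(chunks, splitKey, newShard) {
  const i = chunks.findIndex((c) => c.min <= splitKey && splitKey < c.max);
  if (i === -1) throw new Error(`no chunk can be split at ${splitKey}`);
  const { min, max, shard } = chunks[i];
  chunks.splice(
    i,
    1,
    { min, max: splitKey, shard },
    { min: splitKey + 1, max, shard: newShard }
  );
  return chunks;
}

// mongos-style routing: the single chunk whose range holds the key decides
// which shard receives the operation.
function route(chunks, key) {
  const chunk = chunks.find((c) => c.min <= key && key <= c.max);
  if (!chunk) throw new Error(`key ${key} is outside every chunk`);
  return chunk.shard;
}

const chunks = createChunkTable();
split(chunks, 50, "shard1"); // 0..50 | 51..100
split(chunks, 25, "shard2"); // 0..25 | 26..50 | 51..100
split(chunks, 75, "shard3"); // 0..25 | 26..50 | 51..75 | 76..100
console.log(chunks.map((c) => `${c.min}..${c.max}@${c.shard}`).join(" "));
// 0..25@shard0 26..50@shard2 51..75@shard1 76..100@shard3
console.log(route(chunks, 30)); // shard2
```

The application only ever calls the router, so adding shards changes the chunk table but not the application code. This is the "transparent to the application" point on slides 49–50.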