
NoSQLDatabases

Solutions Architect at American Express
Apr. 10, 2016

  1. The complexity for minimum component costs has increased at a rate of roughly a factor of two per year...Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. -- Gordon Moore, 1965 …Then you better start swimmin’…Or you’ll sink like a stone…For the times they are a-changin’. -- Bob Dylan
  2. • NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility. Sounds great… What??? Definition of NoSQL
  3. Operational Data • Read and written by applications to carry out their ordinary functions. • Examples: • Shopping cart data in Amazon.com • Information about employees in a human resources system • Buy/Sell prices in Fidelity • Posts made by Facebook users • Travel Itineraries for bookings done on Expedia Two Categories of Data
  4. Analytical Data • Used to provide business intelligence (BI). • Data is often created by storing the operational data used by applications over time, and it’s commonly read-only. • Because these analytical datasets provide a historical record, they’re commonly much bigger than an application’s current operational data. • Examples: • An e-commerce company might record all of the purchase data from its web application, then analyze this data to learn about customer buying habits or market trends. • Facebook might sell all the posts made by its users to other companies, who can analyze the posts to determine each user’s significant events so that they can tailor offers based on user needs, likes and dislikes. Two Categories of Data
  5. The Problem called Big Data Cracks in the Single CPU RDBMS System due to pressure from the four business drivers of the current age.
  6. Volume • The need to query big data has always raised performance concerns in RDBMSs. • These performance concerns used to be solved by purchasing faster processors. • But the power wall was reached, which meant increasing processor speed was no longer an option. • System designers shifted their focus from increasing speed on a single chip (vertical scaling or scale up) to using more processors working together (horizontal scaling or scale out). The Problem called Big Data
  7. Velocity • Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries to the database made by public-facing websites. • RDBMSs frequently index many columns of every new row, a process which decreases system performance. • When single-processor RDBMSs are used as a back end to a web store front, the random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and high write throughput are desired. • This was another reason for engineers to look for a scaled-out solution. The Problem called Big Data
  8. Variability • Companies that want to capture and report on exception data struggle when attempting to use the rigid database schema structures imposed by RDBMSs. For example, if a business unit wants to capture a few custom fields for a particular customer, all customer rows within the database need to store this information even though it doesn’t apply. • Adding new columns to an RDBMS can require the system to be shut down and ALTER TABLE commands to be run. When a database is large, this process can impact system availability, costing time and money. • This was another reason engineers looked for a more viable solution. The Problem called Big Data
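The custom-fields problem above can be sketched with plain Python dicts standing in for schema-free documents. This is only an illustration: the customer records and the `field_values` helper are hypothetical, and no particular NoSQL product's API is implied.

```python
# Hypothetical customer records modelled as schema-free documents.
# Only the second customer carries the custom fields; the first is not
# forced to store empty placeholders for them.
customers = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "loyalty_tier": "gold", "preferred_store": "NYC"},
]


def field_values(docs, field):
    """Return {id: value} for the documents that actually carry `field`."""
    return {doc["id"]: doc[field] for doc in docs if field in doc}


tiers = field_values(customers, "loyalty_tier")
print(tiers)  # {2: 'gold'}
```

Adding another custom field for one customer means adding a key to one dict; no other record changes.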
  9. Agility • The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. • If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer. • This process isn’t simple and is associated with the largest barrier to rapid change when developing new or modifying existing applications. The Problem called Big Data
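To make the mapping effort concrete, here is a rough sketch that saves one nested order twice: once through hand-written SQL against an in-memory SQLite database, and once as a single serialized document. The table names, the `order` structure, and the dict used as a "document store" are all assumptions made for illustration; a real object-relational mapping layer would generate similar statements automatically.

```python
import json
import sqlite3

# A hypothetical order with a nested, repeated subgroup (its line items).
order = {
    "id": 1,
    "customer": "Ada",
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")


def save_relational(conn, order):
    """Split the object across two tables, one INSERT per row."""
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order["id"], order["customer"]),
    )
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
        [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
    )


def load_relational(conn, order_id):
    """Reassemble the object from two SELECTs."""
    oid, customer = conn.execute(
        "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    rows = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {"id": oid, "customer": customer,
            "items": [{"sku": sku, "qty": qty} for sku, qty in rows]}


save_relational(conn, order)
relational_copy = load_relational(conn, 1)

# Document-style alternative: the whole object is stored and fetched as one unit.
document_store = {}  # stand-in for a document database
document_store[order["id"]] = json.dumps(order)
document_copy = json.loads(document_store[1])
```

Both round trips return the same object; the difference is how much mapping code sits between the application and the store.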
  10. • It’s more than rows in tables • NoSQL systems store and retrieve data from many formats: key-value stores, graph databases, column-family stores, document stores, and even rows in tables. • It’s free of joins • NoSQL systems allow you to extract your data using simple interfaces without joins. • It’s schema-free • NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relationship model. The Solution called NoSQL
  11. • It works on many processors • NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance. • It uses shared-nothing commodity computers • Most NoSQL systems leverage low-cost commodity processors that have separate RAM and disk. • It supports linear scalability • When you add more processors, you get a consistent increase in performance. • It’s innovative • NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL.” What else?
  12. • It’s not about not using the SQL language • It’s not only open source • It’s not only about volume • It’s not about cloud computing • It’s not just a clever use of RAM and SSD • It’s not an elite group of products • It’s not just Hadoop What is NoSQL not…
  13. Single Complex Component vs. Multiple Simple Components • Removes complexity • Promotes reuse • Easier maintenance • Functions are distributed to many NoSQL (and SQL) databases that consist of simple tools with simpler interfaces and well-defined roles. • NoSQL products take a “master of one thing” rather than a “jack of all trades” approach. • Example: Memcached to share objects in RAM, MapReduce to run batch jobs, DynamoDB to store key-value items. NoSQL Concepts
  14. Use application tiers to simplify design NoSQL Concepts
  15. Strategic Use of RAM, SSD and HDD using Consistent Hashing NoSQL Concepts
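The slide names consistent hashing without showing the mechanism. A minimal sketch of the usual hash-ring formulation follows, assuming the goal is to spread keys across a set of cache nodes; the `HashRing` class, the node names, and the choice of MD5 with 100 virtual nodes per server are all illustrative assumptions, not something taken from the deck.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 used only for spreading)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at `vnodes` positions to even out the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        # A key belongs to the first ring position clockwise from its hash.
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved to the new node")
```

The property worth noticing is that adding a node only moves keys onto that node; every other key keeps its previous owner, so most cached data stays valid.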
  16. Transaction Control Using ACID • Atomicity • Consistency • Isolation • Durability NoSQL Concepts
  17. Transaction Control Using BASE • Basic Availability • Soft State • Eventual Consistency NoSQL Concepts
  18. ACID vs. BASE • ACID: Get transaction details right. BASE: Never block a write. • ACID: Block any reports while you are working. BASE: Focus on throughput, not consistency. • ACID: Be pessimistic, anything might go wrong! BASE: Be optimistic, if one service fails it will eventually get caught up. • ACID: Detailed testing and failure mode analysis. BASE: Some reports may be inconsistent for a while, but don’t worry. • ACID: Lots of locks and unlocks. BASE: Keep things simple and avoid locks. NoSQL Concepts
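The BASE side of the comparison ("never block a write", "some reports may be inconsistent for a while") can be sketched with a toy replicated store. The `EventuallyConsistentStore` class is hypothetical and far simpler than any real product: it keeps replicas as in-process dicts and propagates writes only when `sync()` is called.

```python
class EventuallyConsistentStore:
    """Toy BASE-style store: writes return at once, replicas catch up later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Never block a write: apply it to one replica and queue the rest.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # A read from a lagging replica may return stale (or no) data.
        return self.replicas[replica].get(key)

    def sync(self):
        # Background propagation: apply queued updates in order.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("cart:42", ["book"])

stale = store.read("cart:42", replica=2)  # replica 2 has not caught up yet
store.sync()
fresh = store.read("cart:42", replica=2)  # now consistent with replica 0
print(stale, fresh)  # None ['book']
```

An ACID-style store would instead hold the write (or block readers) until every replica agreed, trading throughput for consistency.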
  19. Automatic Sharding NoSQL Concepts
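The slide gives only the name, so the following is one possible reading of automatic sharding: a range-partitioned store that splits a shard in two once it grows past a size limit. The `AutoShardedStore` class and its `max_keys_per_shard` parameter are assumptions for illustration; real systems also move shards between servers, which is omitted here.

```python
import bisect


class AutoShardedStore:
    """Range-partitioned store that splits a shard when it grows too large."""

    def __init__(self, max_keys_per_shard=4):
        self.max_keys = max_keys_per_shard
        self.bounds = []    # bounds[i] is the smallest key held by shard i + 1
        self.shards = [{}]  # one dict per shard; starts as a single shard

    def _index(self, key):
        # Number of boundaries <= key gives the owning shard's index.
        return bisect.bisect_right(self.bounds, key)

    def put(self, key, value):
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.max_keys:
            self._split(i)

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def _split(self, i):
        # Split at the median key; the median becomes the new boundary.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [left, right]
        self.bounds.insert(i, mid)


store = AutoShardedStore(max_keys_per_shard=4)
for n in range(10):
    store.put(f"k{n:02d}", n)

print(len(store.shards), "shards, boundaries:", store.bounds)
```

The application only calls `put` and `get`; the split points are chosen by the store, which is the sense in which the sharding is automatic.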
  20. Eric Brewer’s CAP Theorem for Replication • Consistency: Having a single, up-to-date, readable version of your data available to all clients. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results. • High availability: Knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicated data shouldn’t prevent updates. • Partition tolerance: The ability of the system to keep responding to client requests even if there’s a communication failure between database partitions. This is analogous to a person still having an intelligent conversation even after a link between parts of their brain isn’t working. NoSQL Concepts
  21. NoSQL Concepts Eric Brewer’s CAP Theorem for Replication
  22. NoSQL Concepts in Action
  23. Four Quadrants of Data Technologies • Operational, relational (SQL): Relational Databases (Oracle, SQL Server, MySQL) • Analytical, relational (SQL): Relational Analytics (Oracle, SQL Server, MySQL) • Operational, NoSQL: Key-Value Stores (DynamoDB, Azure Tables, Riak, etc.); Column Family Stores (Apache HBase, Apache Cassandra, Google Bigtable, etc.); Document Stores (MongoDB, DocumentDB, etc.); Graph Stores (Neo4j, AllegroGraph, etc.) • Analytical, NoSQL: Big Data Analytics (Hadoop, HDInsight)
  24. Operational NoSQL
  25. RDBMS
  26. Key/Value Stores
  27. Column Family Stores
  28. Document Stores
  29. Document Stores
  30. Graph Stores
  31. Graph Store Example: Social Network
  32. Graph Store Example: User’s Order History
  33. Graph Store Example: Airport Terminal
  34. Analytical NoSQL
  35. Big Data Analytics using Hadoop
  36. Big Data Analytics using Hadoop
  37. Hadoop Core Technologies • Hadoop Distributed File System (HDFS) • Provides a way to store and access very large binary files across a cluster of commodity servers and disk drives. • Hadoop MapReduce • Supports the creation of applications that process large amounts of analytical data in parallel. That data is commonly stored in HDFS. • Hive • A Hadoop-based framework for querying and analyzing data. Among other things, it provides HiveQL, a SQL-like language that can generate MapReduce jobs. • Pig • Another Hadoop-based framework for working with data. It provides a language called Pig Latin for creating MapReduce jobs. Big Data Analytics using Hadoop
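The MapReduce model that Hive and Pig compile down to can be sketched in a few lines of single-process Python. This is not the Hadoop API: the function names are illustrative, and a real job would run the map and reduce phases in parallel across the cluster, reading its input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["to be or not", "to be"])
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the framework is free to spread those calls over many machines.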
  38. • NoSQL really means Not Only SQL • Volume, Velocity, Variability & Agility are the main business drivers for NoSQL. • Key NoSQL Concepts: Multiple Simple Components, Application Tiers With External Services, Strategic Use of RAM, SSD, HDD, BASE Transaction Control, Automatic Sharding, Replication Using CAP. • Popular NoSQL Datastores: Key-Value, Column Family, Document, Graph. • Big Data Analytics using Hadoop Quick Recap
  39. Q & A