The CIO's Guide to NoSQL 2012


Published in: Technology

  1. 1. The CIO's Guide to NoSQL • Dan McCreary • July 12th, 2012 • Version 6
  2. 2. Agenda • What is NoSQL? • What Triggered the NoSQL Movement? • How is NoSQL distinct from Big Data and Cloud Computing? • Common Characteristics of NoSQL Systems • Business Benefits of NoSQL • Core NoSQL Concepts • Selected NoSQL Implementations • Recent NoSQL Developments • Selecting the Right NoSQL System • Next Step: Selecting the Right NoSQL Pilot Project (Copyright Kelly-McCreary & Associates, LLC)
  3. 3. Manning NoSQL Books
  4. 4. Background for Dan McCreary • Bell Labs • NeXT Computer (Steve Jobs) • Owner of Custom Object-Oriented Software Consultancy • Federal data integration (National Information Exchange Model) • Native XML/XQuery – 2006 • Advocate of NoSQL/XRX systems • Working with Manning Publications on NoSQL Topics
  5. 5. NoSQL Definition • The NoSQL movement is a set of concepts and technologies that allow the rapid and efficient processing of large data sets with a focus on performance and resiliency.
  6. 6. Sample of NoSQL Jargon • Document orientation • Indexing • B-Tree • Schema free • Configurable durability • MapReduce • Documents for archives • Horizontal scaling • Functional programming • Sharding and auto-sharding • Document Transformation • Document Indexing and Search • Brewer's CAP Theorem • Alternate Query Languages • Consistency • Aggregates • Reliability • OLAP • XQuery • Partition tolerance • MDX • Single-point-of-failure • RDF • Object-Relational mapping • SPARQL • Key-value stores • Architecture Tradeoff Modeling • ATAM • Column stores • Document-stores • Memcached • Note that within the context of NoSQL many of these terms have different meanings!
  7. 7. Selecting a Database… • "Selecting the right data storage solution is no longer a trivial task." • Flowchart: Start, then "Does it look like a document?" If yes, use Microsoft Office; if no, use the RDBMS. Stop.
  8. 8. Pressures on SQL-Only Systems • Diagram: SQL at the center, under pressure from: Scalability • OLAP/BI/Data Warehouse • Social Networks • Agile Schema Free
  9. 9. Simplicity is a Virtue • Many systems derive their strength by dramatically limiting the features in their system • Simplicity allows database designers to focus on the primary business driver • Examples: – Touch screen interfaces – Key-value data stores
  10. 10. Historical Context • Mainframe Era: – 1 CPU – COBOL and FORTRAN – Punchcards and flat files – $10,000 per CPU hour • MapReduce Era: – 10,000 CPUs – Functional programming – MapReduce "server farms" – Pennies per CPU hour
  11. 11. Two Approaches to Computation (1930s and 40s) • John von Neumann: manage state with a program counter. • Alonzo Church: make computations act like math functions. • Which is simpler? Which is cheaper? Which will scale to 10,000 CPUs? (Copyright 2010 Dan McCreary & Associates)
  12. 12. Standard vs. MapReduce Prices • John's Way • Alonzo's Way
  13. 13. MapReduce CPUs Cost Less! • [Bar chart: cost per CPU hour in cents, Standard CPU vs. MapReduce CPU] • Cuts cost from 32 to 6 cents per CPU hour! • Perhaps Alonzo was right! Why? (Hint: how "shareable" is this process?)
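To make the size of that price gap concrete, here is a small worked calculation. The two hourly rates come from the slide; the 10,000 CPU-hour job size is an assumed figure chosen only for illustration.

```python
# Rates from the slide, in cents per CPU hour.
STANDARD_CENTS = 32
MAPREDUCE_CENTS = 6

# Assumed job size, for illustration only (not from the slide).
CPU_HOURS = 10_000

# Relative saving of the MapReduce rate over the standard rate.
saving_pct = (STANDARD_CENTS - MAPREDUCE_CENTS) / STANDARD_CENTS * 100

# Total cost of the assumed job at each rate, converted to dollars.
standard_dollars = STANDARD_CENTS * CPU_HOURS / 100
mapreduce_dollars = MAPREDUCE_CENTS * CPU_HOURS / 100

print(f"Saving: {saving_pct}%")
print(f"Standard: ${standard_dollars:,.2f}  MapReduce: ${mapreduce_dollars:,.2f}")
```

Under these assumptions the saving is 81.25%, or $3,200 versus $600 for the same job.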
  14. 14. Perspectives • Diagram: Object Stores • OLAP/MDX • Native XML • Graph Stores • NoSQL for Web 2.0 and BigData • Perspective depends on your context
  15. 15. Architectural Tradeoffs • "I want a fast car with good mileage." • "I want a scalable database with low cost that runs well on the 1,000 CPUs in our data center."
  16. 16. NoSQL on Google Trends
  17. 17. Recent History • The term NoSQL became re-popularized around 2009 • Used for conferences of advocates of non-relational databases • Became a contagious idea (a "meme") • First of many "NoSQL meetups" in San Francisco organized by Johan Oskarsson • Conversion from "No SQL" to "Not Only SQL" in recent years
  18. 18. NoSQL and Web 2.0 Startups • Many web 2.0 startups did not use Oracle or MySQL • They built their own data stores influenced by Amazon's Dynamo and Google's BigTable in order to store and process huge amounts of data • Most of these data stores, built for social community or cloud computing applications, became open source software
  19. 19. Google MapReduce • 2004 paper that had a huge impact on functional programming across the entire community • Copied by many organizations, including Yahoo
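The MapReduce idea can be sketched in a few lines of single-machine Python. This is a minimal illustration of the map, shuffle, and reduce phases using word counting as the task; the function names are chosen for this sketch and are not Google's or Hadoop's API, and a real system would run the map and reduce calls on many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit (word, 1) pairs. Depends only on its input, so any CPU could run it."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group the emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key; independent of every other key."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["NoSQL scales out", "SQL scales up"])
print(counts)
```

Because `map_phase` and `reduce_phase` have no side effects, the work can be divided among as many CPUs as are available without coordinating shared state.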
  20. 20. Google Bigtable Paper • 2006 paper that gave focus to scalable databases • Designed to reliably scale to petabytes of data and thousands of machines
  21. 21. Amazon's Dynamo Paper • Werner Vogels • CTO - • October 2, 2007 • Used to power Amazon's S3 service • One of the most influential papers in the NoSQL movement • Service in 2012 • Reference: Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall and Werner Vogels, "Dynamo: Amazon's Highly Available Key-Value Store", in the Proceedings of the 21st ACM Symposium on Operating Systems Principles, Stevenson, WA, October 2007.
  22. 22. NoSQL "Meetups" • "NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data." Computerworld magazine, July 1st, 2009
  23. 23. Key Motivators • Licensing RDBMS on multiple CPUs • The Three "V"s – Velocity: lots of data arriving fast – Volume: web-scale BigData – Variability: many exceptions • Desire to escape rigid schema design • Avoidance of complex Object-Relational Mapping (the "Vietnam" of computer science)
  24. 24. Many Processes Today Are Driven By the Constraints of Yesterday • Challenge: ask ourselves whether our current methods of solving problems with tabular data reflect the storage of the 1950s or our actual business requirements • What structures best solve the actual business problem? (Copyright 2008 Dan McCreary & Associates)
  25. 25. No Shredding! (image label: "My Data") • Relational databases take a single hierarchical document and shred it into many pieces so it will fit in tabular structures • Document stores prevent this shredding
  26. 26. Is Shredding Really Necessary? • Every time you take hierarchical data and put it into a traditional database, you have to put repeating groups in separate tables and use SQL "joins" to reassemble the data
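The shredding and reassembly described above can be sketched in plain Python. The `order` document, its field names, and the `shred` and `reassemble` helpers are all hypothetical, invented for this illustration; lists of dictionaries stand in for relational tables, and JSON serialization stands in for a document store.

```python
import json

# A hypothetical hierarchical document with a repeating group ("items").
order = {
    "order_id": 1001,
    "customer": "Acme Corp",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}

def shred(doc):
    """Split one document into two flat 'tables' (lists of rows)."""
    orders = [{"order_id": doc["order_id"], "customer": doc["customer"]}]
    order_items = [
        {"order_id": doc["order_id"], "sku": item["sku"], "qty": item["qty"]}
        for item in doc["items"]
    ]
    return orders, order_items

def reassemble(orders, order_items, order_id):
    """Emulate the SQL join that rebuilds the document from its pieces."""
    header = next(row for row in orders if row["order_id"] == order_id)
    items = [
        {"sku": row["sku"], "qty": row["qty"]}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return {**header, "items": items}

# Relational route: shred on write, join on read.
orders, order_items = shred(order)
rebuilt = reassemble(orders, order_items, 1001)

# Document route: the document is stored and loaded whole.
stored = json.dumps(order)
loaded = json.loads(stored)
```

Both routes return the same document; the relational route simply needs two extra translation steps to get there.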
  27. 27. Object Relational Mapping • Diagram: Web Browser, Object Middle Tier, and Relational Database, connected by translations T1 to T4 • T1 – HTML into Objects • T2 – Objects into SQL Tables • T3 – Tables into Objects • T4 – Objects into HTML
  28. 28. "The Vietnam of Applications" • Object-relational mapping has become one of the most complex components of building applications today • A "Quagmire" where many projects get lost • Many "heroic efforts" have been made to solve the problem: – Hibernate – Ruby on Rails • But sometimes the way to avoid complexity is to keep your architecture very simple
  29. 29. Document Stores Need No Translation • Diagram: Application Layer (Document) and Database (Document) • Documents in the database (JSON or XML) • Documents in the application • No object middle tier • No "shredding" • No reassembly • Simple!
  30. 30. The XML "Full Stack" • Diagram: Web Browser (XForms), REST Interfaces, XML database • XML lives in the web browser (XForms) • REST interfaces • XML in the database (Native XML, XQuery) • XRX Web Application Architecture • No translation!
  31. 31. "Schema Free" • Systems that automatically determine how to index data as the data is loaded into the database • No a priori knowledge of data structure • No need for up-front logical data modeling – …but some modeling is still critical • Adding new data elements or changing data elements is not disruptive • Searching millions of records still has sub-second response time
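One way a store can index data without knowing its structure in advance is to record every path and leaf value it encounters while loading. The sketch below is a simplified illustration of that idea, not how any particular product works; the `SchemaFreeStore` class and its path syntax are assumptions made for this example.

```python
from collections import defaultdict

class SchemaFreeStore:
    """Toy document store that indexes every (path, value) pair on load."""

    def __init__(self):
        self.docs = {}
        self.index = defaultdict(set)  # (path, value) -> ids of matching docs

    def _paths(self, value, prefix=""):
        """Walk a nested document and yield (path, leaf value) pairs."""
        if isinstance(value, dict):
            for key, child in value.items():
                yield from self._paths(child, f"{prefix}/{key}")
        elif isinstance(value, list):
            for child in value:
                yield from self._paths(child, prefix)
        else:
            yield prefix, value

    def load(self, doc_id, doc):
        """Store the document and index it; no schema is declared anywhere."""
        self.docs[doc_id] = doc
        for path, value in self._paths(doc):
            self.index[(path, value)].add(doc_id)

    def find(self, path, value):
        """Look up matching document ids directly in the index."""
        return sorted(self.index.get((path, value), set()))

store = SchemaFreeStore()
store.load(1, {"name": "Ann", "tags": ["vip", "beta"]})
# The second document adds a new element ("address") with no schema change.
store.load(2, {"name": "Bob", "address": {"city": "Minneapolis"}, "tags": ["beta"]})

beta_docs = store.find("/tags", "beta")
city_docs = store.find("/address/city", "Minneapolis")
```

Lookups go through the index rather than scanning every document, which is what keeps searches fast as the collection grows.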
  32. 32. Monoculture and Mono-architecture (Image source: Wikipedia)
  33. 33. Eric Evans • "The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for." Eric Evans, Rackspace
  34. 34. Evolution of Ideas in OpenSource • Diagram: New Database Ideas (Schema-free, Auto-sharding, MapReduce, Cloud Computing) pass through Proprietary Software or OpenSource to become New Products (Product A, Product B) • How quickly can new ideas be recombined into new database products? • OpenSource software has proved to be the most efficient way to quickly recombine new ideas into new products
  35. 35. Storage Architectural Patterns • Tables • Trees • Stars • Triples
  36. 36. Finding the Right Match • Schema-Free • Standards Compliant • Mature Query Language • Use CMU's Architecture Tradeoff Analysis Method (ATAM) process
  37. 37. Avoidance of Unneeded Complexity • Relational databases provide a variety of features to ALWAYS support strict data consistency • The rich feature set and the ACID properties implemented by RDBMSs might be more than necessary for particular applications and use cases
  38. 38. "One Size Fits…" • "One Size Does Not Fit All", James Hamilton, Nov. 3rd, 2009 • ,guid,afe46691-a293-4f9a-8900-5688a597726a.aspx
  39. 39. Different Thinking • Sequential Processing: – The output of any step can be used in the next step – State must be carefully managed • Parallel Processing: – Each loop of XQuery FLWOR statements is an independent thread (no side-effects)
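The contrast between the two styles can be shown with a short Python sketch. The slide refers to XQuery FLWOR loops; Python is used here only as a stand-in, and the function names are invented for this example. Threads in standard CPython illustrate the independence of the work rather than true multi-CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def running_total(values):
    """Sequential: every step depends on the state left by the previous one."""
    total = 0
    totals = []
    for value in values:
        total += value          # shared state that must be carefully managed
        totals.append(total)
    return totals

def square(value):
    """Pure function: the result depends only on the argument (no side effects)."""
    return value * value

def parallel_squares(values, workers=4):
    """Each item is processed independently, so the items can be farmed out."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, values))

sequential_result = running_total([1, 2, 3, 4])
parallel_result = parallel_squares([1, 2, 3, 4])
```

The running total cannot be split across workers without extra coordination, while the squares can be computed in any order on any worker and still come back correct.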
  40. 40. Cloud Computing • High scalability – Especially in the horizontal direction (multiple CPUs) • Low administration overhead – Simple web page administration
  41. 41. Databases work well in the cloud • Data-warehousing-specific databases for batch data processing and map/reduce operations • Simple, scalable and fast key/value stores • Databases with a richer feature set than key/value stores, filling the gap with traditional RDBMSs while offering good performance and scalability properties (such as document databases)
  42. 42. Auto-Sharding • When one database gets almost full it tells a "coordinator" system and the data automatically gets migrated to other systems • Systems have "Partition Tolerance" • Diagram: "Warning: Disk Full!" Before: one disk 90% full, time to "shard". After: two disks 45% full.
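The coordinator behavior described above can be sketched as follows. This is a deliberately naive illustration under stated assumptions: the `Coordinator` class, the 90% threshold parameter, and the CRC32-based placement are all choices made for this example. Doubling the shard count and rehashing every key is simple to read but moves more data than schemes such as range splitting or consistent hashing.

```python
import zlib

class Coordinator:
    """Toy coordinator that splits shards when one reaches a fill threshold."""

    def __init__(self, shard_capacity, split_threshold=0.9):
        self.shard_capacity = shard_capacity
        self.split_threshold = split_threshold
        self.shards = [{}]  # each shard is a plain dict in this sketch

    def _shard_for(self, key):
        # CRC32 gives a placement that is stable across runs.
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        shard = self._shard_for(key)
        shard[key] = value
        if len(shard) / self.shard_capacity >= self.split_threshold:
            self._split()

    def get(self, key):
        return self._shard_for(key).get(key)

    def _split(self):
        """Double the number of shards and migrate every key to its new home."""
        items = [item for shard in self.shards for item in shard.items()]
        self.shards = [{} for _ in range(len(self.shards) * 2)]
        for key, value in items:
            self._shard_for(key)[key] = value

coordinator = Coordinator(shard_capacity=10)
for i in range(9):
    coordinator.put(f"user:{i}", i)   # the ninth write reaches 90% and triggers a split
```

The application keeps calling `put` and `get` with the same keys before and after the split; only the coordinator knows that the data moved.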
  43. 43. Brewer's CAP Theorem • Consistency • Availability • Partition Tolerance • You cannot have all three, so pick two!
  44. 44. Migrating to Partition Tolerance • Diagram labels: Consistency • CA • CP • RDBMS • Availability • AP • Partition Tolerance
  45. 45. Scale Up vs. Scale Out • Scale Up: – Make a single CPU as fast as possible – Increase clock speed – Add RAM – Make disk I/O go faster • Scale Out: – Make many CPUs work together – Learn how to divide your problems into independent threads
  46. 46. Sample of NoSQL Systems • Document Stores • Key-Value Stores • Memcache • XML • Column Stores • Graph Stores • Object Stores
  47. 47. If you can't beat them…
  48. 48. Key-Value Stores (table columns: Key, Value) • A table with two columns and a simple interface – Add a key-value – For this key, give me the value – Delete a key • Blazingly fast and easy to scale
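The three-operation interface listed above is small enough to sketch in full. The `KeyValueStore` class and its method names are assumptions for this illustration, with an in-memory dictionary standing in for the storage engine.

```python
class KeyValueStore:
    """Minimal in-memory key-value store with the three operations above."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        """Add a key-value pair, replacing any existing value for the key."""
        self._table[key] = value

    def get(self, key, default=None):
        """For this key, give me the value (or a default if it is absent)."""
        return self._table.get(key, default)

    def delete(self, key):
        """Delete a key; report whether it was present."""
        if key in self._table:
            del self._table[key]
            return True
        return False

store = KeyValueStore()
store.put("session:42", {"user": "dan", "cart": ["book"]})
session = store.get("session:42")
removed = store.delete("session:42")
```

The store never inspects the value, which is part of why the interface stays simple and why keys can be spread across many machines.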
  49. 49. Types of Key-Value Stores • Eventually-consistent Key-Value Stores • Hierarchical Key-Value Stores • Key-Value Stores in RAM • Key-Value Stores on Disk • Ordered Key-Value Stores
  50. 50. Cassandra • Apache open source project • Originally developed by Facebook • Designed for highly distributed, highly reliable systems • No single point of failure • Column-family data model
  51. 51. MongoDB • Open Source License • Document/Collection centric • Sharding built-in, automatic • Stores data in JSON format • Query language is JSON • Can be 10x faster than MySQL • Many languages (C++, JavaScript, Java, Perl, Python etc.)
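To show what "query language is JSON" means in practice, the sketch below evaluates a JSON-style query document against a list of records. `$gt` and `$lt` are real MongoDB comparison operators, but the `matches` and `find` helpers are hypothetical stand-ins written for this example, not the MongoDB driver API, and they cover only a tiny part of the real query language.

```python
# Toy evaluator for a small subset of JSON-style queries.
OPERATORS = {
    "$gt": lambda actual, expected: actual > expected,
    "$lt": lambda actual, expected: actual < expected,
}

def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        if field not in document:
            return False
        actual = document[field]
        if isinstance(condition, dict):
            # Operator form, for example {"age": {"$gt": 30}}.
            if not all(OPERATORS[op](actual, arg) for op, arg in condition.items()):
                return False
        elif actual != condition:
            # Plain form, for example {"city": "Minneapolis"}, means equality.
            return False
    return True

def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]

people = [
    {"name": "Ann", "age": 34, "city": "Minneapolis"},
    {"name": "Bob", "age": 27, "city": "Minneapolis"},
    {"name": "Cy", "age": 41, "city": "Chicago"},
]

# The query is itself a document: same shape as the data it searches.
query = {"city": "Minneapolis", "age": {"$gt": 30}}
result = find(people, query)
```

The query has the same structure as the records, so an application can build, store, and pass queries around as ordinary data.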
  52. 52. Hadoop/HBase • Open source implementation of the MapReduce algorithm, written in Java • Initially created by Yahoo – 300 person-years of development • Column-oriented data store similar to Google's BigTable • Java interface • HBase is designed specifically to work with Hadoop and the Hadoop file system
  53. 53. CouchDB • Commercial Company • Apache Project • Written in Erlang • RESTful JSON API • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management
  54. 54. Memcached • Free and open source in-memory caching system • Designed to speed up dynamic web applications by alleviating database load • RAM-resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering • Simple interface • Designed for quick deployment and ease of development • APIs in many languages
  55. 55. MarkLogic • Native XML database designed to be used for petabyte data stores • ACID compliant • Role-based access control • Heavy use by federal agencies, document publishers and "high-variability" data • Arguably the most successful NoSQL company
  56. 56. eXist • Open source native XML database • Strong support for XQuery and XQuery extensions • Heavily used by the Text Encoding Initiative (TEI) community and XRX/XForms communities • Ideal for metadata management • Integrated Lucene search and structured search
  57. 57. Riak • Community and Commercial licenses • A "Dynamo-inspired" database • Written in Erlang • Query with JSON or Erlang
  58. 58. Hypertable • Open Source • Closely modeled after Google's Bigtable project • High performance distributed data storage system • Designed to support applications requiring maximum performance, scalability, and reliability • Hypertable Query Language (HQL) that is syntactically similar to SQL
  59. 59. Selecting a NoSQL Pilot Project • The "Goldilocks Pilot Project Strategy" • Not too big, not too small, just the right size • Duration • Sponsorship • Importance • Skills • Mentorship
  60. 60. The Future of the NoSQL Movement (diagram labels: Growth, Diversity) • Will data sets continue to grow at exponential rates? • Will new system options become more diverse? • Will new markets have different demands? • Will some ideas be "absorbed" into existing RDBMS vendors' products? • Will the NoSQL community continue to be the place where new database ideas and products are incubated? • Will the job of doing high-quality architectural tradeoff analysis become easier?
  61. 61. Using the Wrong Architecture (image labels: Start, Finish) • Credit: Isaac Homelund, MN Office of the Revisor
  62. 62. Using the Right Architecture (image labels: Finish, Start) • Find ways to remove barriers to empowering the non-programmers on your team.
  63. 63. Questions • Dan McCreary, President, Kelly-McCreary & Associates • dan@danmccreary.com