NoSQL Now! NoSQL Architecture Patterns

The NoSQL movement has introduced four new database architectural patterns that complement, but do not replace, traditional relational and analytical databases. This presentation introduces these four patterns and discusses their relative strengths and weaknesses for solving a variety of business problems, including Big Data (scalability), search, high availability and agility. For each type of problem, we look at how NoSQL databases take different approaches to solving it and how you can use this knowledge to find the right database architecture for your business challenges.

Transcript of "NoSQL Now! NoSQL Architecture Patterns"

  1. NoSQL Database Patterns, June 25th, 2013
  2. Background for Dan McCreary • Bell Labs • NeXT Computer (Steve Jobs) • Owner of 75-person software consulting firm • US Federal data integration (National Information Exchange Model, NIEM.gov) • Native XML/XQuery for metadata management since 2006 • Advocate of web standards, NoSQL and XRX systems. Copyright Kelly-McCreary & Associates, LLC
  3. Making Sense of NoSQL • Working with NoSQL since 2006 • Co-founders of the NoSQL Now! conference • Authors of Manning book on NoSQL (MEAP now, print July 2013) • Guide for managers with a focus on business benefits • Focus on NoSQL architectural tradeoff analysis • http://manning.com/mccreary
  4. Today 1. What are the new database "architecture patterns" introduced by the NoSQL movement? 2. What types of problems do they address? 3. How do you match the right problem with the right database pattern?
  5. Three Eras of Databases • RDBMS for transactions, Data Warehouse for analytics and NoSQL for …? (timeline) 1985-1995: RDBMS; 1995-2010: RDBMS, Data Warehouse; 2010-Now: RDBMS, Data Warehouse, NoSQL
  6. Before NoSQL: Relational, Analytical (OLAP)
  7. Pressures on Single Node RDBMS Architectures (diagram labels): OLAP/BI/Data Warehouse, Social Networks, Scalability, Agile Schema Free, Single Node RDBMS
  8. After NoSQL: Relational, Analytical (OLAP), Key-Value, Column-Family, Document, Graph
  9. Before NoSQL DB Selection Was Easy! (flowchart) Start. Does it look like a document? Yes: use Microsoft Office. No: use the RDBMS. Stop.
  10. An evolving tree of data types (diagram labels): Read Mostly, Read/Write, Structured, Unstructured, Transactional, RDBMS, BI/DW, Web Crawlers, Documents, Log Files, XML, JSON, Binary, Open Linked Data, Graph
  11. Many Uses of Data • Transactions (OLTP) • Analysis (OLAP) • Search and Findability • Enterprise Agility • Discovery and Insight • Speed and Reliability • Consistency and Availability
  12. Strong Selection Bias • Anchoring bias: the tendency to produce an estimate near a cue amount. "Our managers were expecting an RDBMS solution so that’s what we gave them." • Availability heuristic: the tendency to estimate that what is easily remembered is more likely than that which is not. "I hear that NoSQL does not support ACID." or "I hear that XML is verbose?" • Bandwagon effect: the tendency to do or believe what others do or believe. "Everyone else at this company and in our local area uses RDBMSs." • Confirmation bias: the tendency to seek out only that information that supports one's preconceptions. "We only read posts from the Oracle|Microsoft|IBM groups." • Framing effect: the tendency to react to how information is framed, beyond its factual content. "We know of some NoSQL projects that failed." • Gambler's fallacy (aka sunk cost bias): the failure to reset one's expectations based on one's current situation. "We already paid for our Oracle|Microsoft|IBM license so why spend more money?" • Hindsight bias: the tendency to assess one's previous decisions as more efficacious than they were. "Our last five systems worked on RDBMS solutions." • Halo effect: the tendency to attribute unverified capabilities in a person based on an observed capability. "Oracle|Microsoft|IBM sells billions of dollars of licenses each year, how could so many people be wrong?" • Representativeness heuristic: the tendency to judge something as belonging to a class based on a few salient characteristics. "Our accounting systems work on RDBMS so why not our product search?"
  13. Simplicity is a Virtue • Many modern systems derive their strength from dramatically limiting their features and focusing on a specific task • Simplicity allows database designers to focus on the primary business drivers. Photo from flickr by PSNZ Images
  14. Simplicity is a Design Style • Focus only on simple systems that solve many problems in a flexible way • Examples: – Touch screen interfaces – Key/Value data stores
  15. RDBMS vs. NoSQL • NoSQL is real and it’s here to stay http://www.google.com/trends/explore#q=nosql%2C%20rdbms&date=1%2F2009%2051m&cmpt=q (Google Trends chart comparing RDBMS and NoSQL)
  16. Eric Evans • “The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.” Eric Evans, Rackspace
  17. The NO-SQL Universe (diagram labels): Document Stores, Key-Value Stores, Graph/Triple Stores, Object Stores, Column-Family Stores, XML. Copyright 2010 Dan McCreary & Associates
  18. Relational • Data is usually stored row by row (row store) • Standardized query language (SQL) • Data model defined before you add data • Joins merge data from multiple tables • Results are tables • Pros: mature ACID transactions with fine-grained security controls • Cons: requires up-front data modeling, does not scale well • Examples: Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2
  19. Analytical (OLAP) • Based on "Star" schema with central fact table for each event • Optimized for read-mostly analysis of historical data • Use of MDX language to count query "measures" for "categories" of data • Pros: fast queries for large data • Cons: not optimized for transactions and updates • Examples: Cognos, Hyperion, Microstrategy, Pentaho, Microsoft, Oracle, Business Objects
  20. Key-Value Stores • Keys used to access opaque blobs of data • Values can contain any type of data (images, video) • Pros: scalable, simple API (put, get, delete) • Cons: no way to query based on the content of the value • Examples: Berkeley DB, Memcached, DynamoDB, S3, Redis, Riak
  21. Key Value Stores • A table with two columns and a simple interface – Add a key-value – For this key, give me the value – Delete a key • Blazingly fast and easy to scale (no joins) • (table) Key: string datatype; Value: blob datatype
  22. The Locker Metaphor • Key • Value: an arbitrary data container
  23. Key-Value Stores are Like Dictionaries
  24. No Subset Queries in Key-Value Stores
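The three-operation interface from slides 20 to 24 can be sketched in a few lines of Python. This is a toy in-memory illustration, not the API of any product named on the slides; the class name `KeyValueStore` and the example keys are assumptions made for the sketch.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: string keys, opaque byte values."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        """Add or overwrite the value stored under key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Return the value for key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> None:
        """Remove key if present; deleting a missing key is a no-op."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42:avatar", b"\x89PNG...")   # the store never looks inside the blob
store.put("user:42:name", b"Ann")

name = store.get("user:42:name")
store.delete("user:42:avatar")
missing = store.get("user:42:avatar")
```

There is deliberately no method for searching by value. Because the store treats values as opaque, answering "which values contain X" would require scanning every entry, which is the "no subset queries" limitation of slide 24.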
  25. Types of Key-Value Stores • Eventually-consistent key-value stores • Hierarchical key-value stores • Key-value stores in RAM • Key-value stores on disk • High availability key-value stores • Ordered key-value stores • Values that allow simple list operations
  26. Memcached • Open source in-memory key-value caching system • Makes effective use of RAM on many distributed web servers • Designed to speed up dynamic web applications by alleviating database load • RAM resident key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering • Simple interface for highly distributed RAM caches • 30ms read times typical • Designed for quick deployment, ease of development • APIs in many languages
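How a RAM cache "alleviates database load" is easiest to see as the cache-aside pattern: look in the cache first and call the database only on a miss. The sketch below uses a plain Python dictionary with expiry times as a stand-in for a cache server; it does not use a Memcached client, and `TTLCache`, `get_or_load` and `slow_database_query` are hypothetical names.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Toy stand-in for a RAM cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        """Return the cached value, calling loader only on a miss or expiry."""
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                       # cache hit
        value = loader(key)                       # cache miss: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls: List[str] = []


def slow_database_query(key: str) -> str:
    """Hypothetical expensive lookup; records each call so they can be counted."""
    db_calls.append(key)
    return f"rendered page for {key}"


cache = TTLCache(ttl_seconds=60.0)
first = cache.get_or_load("/home", slow_database_query)
second = cache.get_or_load("/home", slow_database_query)   # served from RAM
```

After the two requests, `db_calls` holds a single entry: the second request never reached the database.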
  27. Riak • Open source distributed key-value store with support and commercial versions by Basho • A "Dynamo-inspired" database • Focus on availability, fault-tolerance, operational simplicity and scalability • Support for replication and auto-sharding and rebalancing on failures • Support for MapReduce, full-text search and secondary indexes of value tags • Written in Erlang
  28. Redis • Open source in-memory key-value store with optional durability • Focus on high speed reads and writes of common data structures to RAM • Allows simple lists, sets and hashes to be stored within the value and manipulated • Many features that developers like – expiration, transactions, pub/sub, partitioning
  29. Amazon DynamoDB • Based around scalable key-value store • Fastest growing product in Amazon's history • SSD-only database service • Focus on throughput (not storage) and predictable read and write times • Strong integration with S3 and Elastic MapReduce
  30. Column-Family • Key includes a row, column family and column name • Store versioned blobs in one large table • Queries can be done on rows, column families and column names • Pros: good scale out, versioning • Cons: cannot query blob content, row and column designs are critical • Examples: Cassandra, HBase, Hypertable, Apache Accumulo, Bigtable
  31. Column Family (Bigtable) • The champion of "Big Data" • Excels at highly scalable systems • Tightly coupled with MapReduce • Technically a "sparse matrix" where most cells have no data • Generating a list of all columns is non-trivial • Examples: – Google Bigtable – Hadoop HBase – Hypertable
  32. Spreadsheets Use a Row/Column as a Key • Bigtable systems use a combination of row and column information as part of their key • (diagram labels: Column ID, Row ID)
  33. Keys Include Family and Timestamps • Bigtable systems have keys that include not just row and column ID but other attributes • Column families are created when a table is created • Timestamps allow multiple versions of values • Values are just ordered bytes and have no strongly typed data system
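The composite key described on slides 30 to 33 (row, column family, column name, timestamp) can be modeled as a tuple that indexes a sparse map of byte values. The following is a minimal sketch under that assumption; `CellKey` and `ColumnFamilyTable` are hypothetical names, and a real system would keep keys sorted on disk instead of scanning a dictionary.

```python
from typing import Dict, List, NamedTuple, Optional


class CellKey(NamedTuple):
    row: str
    family: str
    column: str
    timestamp: int   # version number or epoch time


class ColumnFamilyTable:
    """Toy sparse table: only cells that were written take up space."""

    def __init__(self, families: List[str]) -> None:
        self._families = set(families)           # fixed when the table is created
        self._cells: Dict[CellKey, bytes] = {}

    def put(self, row: str, family: str, column: str,
            timestamp: int, value: bytes) -> None:
        """Write one version of one cell; columns need no prior declaration."""
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        self._cells[CellKey(row, family, column, timestamp)] = value

    def get(self, row: str, family: str, column: str) -> Optional[bytes]:
        """Return the newest version of a cell, or None if it was never written."""
        versions = [(k.timestamp, v) for k, v in self._cells.items()
                    if (k.row, k.family, k.column) == (row, family, column)]
        if not versions:
            return None
        return max(versions, key=lambda tv: tv[0])[1]


table = ColumnFamilyTable(families=["profile", "activity"])
table.put("user42", "profile", "email", 1, b"ann@old.example")
table.put("user42", "profile", "email", 2, b"ann@new.example")
table.put("user42", "activity", "last_login", 1, b"2013-06-25")  # new column, no "alter table"
latest = table.get("user42", "profile", "email")
```

Both versions of the `email` cell remain stored; `get` simply returns the one with the highest timestamp.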
  34. Column Store Concepts • Preserve the table structure familiar to RDBMS systems • Not optimized for "joins" • One row could have millions of columns but the data can be very "sparse" • Ideal for high-variability data sets • Column families allow you to query all columns that have a specific property or properties • Allow new columns to be inserted without doing an "alter table" • Trigger new columns on inserts • (diagram: columns Col1 … Col100000)
  35. Column Families • Group columns into "Column families" • Group column families into "Super-Columns" • Be able to query all columns within a family or super family • Similar data grouped together to improve speed • (diagram labels: Table, Super Col X, Super Col Y, Fam1, Fam2, Col-A, Col-B)
  36. Hadoop/HBase • Open source implementation of MapReduce algorithm written in Java • Initially created by Yahoo – 300 person-years development • Column-oriented data store • Java interface • HBase designed specifically to work with Hadoop • High-level query language (Pig) • Strong support by many vendors
  37. Cassandra • Apache open source column family database supported by DataStax • Peer-to-peer distribution model • Strong reputation for linear scale out (millions of writes/second) • Database-side security • Written in Java and works well with HDFS and MapReduce
  38. Netflix
  39. Graph Store • Data is stored in a series of nodes, relationships and properties • Queries are really graph traversals • Ideal when relationships between data are key: – e.g. social networks • Pros: fast network search, works with public linked data sets • Cons: poor scalability when graphs don't fit into RAM, specialized query languages (RDF uses SPARQL) • Examples: Neo4j, AllegroGraph, Bigdata triple store, InfiniteGraph, Stardog
  40. Graph Stores • Used when the relationships and relationship types between items are critical • Used for – Social networking queries: "friends of my friends" – Inference and rules engines – Pattern recognition – Working with open-linked data • Automate "joins" of public data
  41. Nodes are "joined" to create graphs • How do you know that two items reference the same object? • Node identification – URI or similar structure
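The "friends of my friends" query from slide 40 is a two-hop traversal. A minimal sketch over an in-memory adjacency map follows; the names and the data are hypothetical, and a graph database would express the same traversal in its own query language.

```python
from typing import Dict, Set

# Adjacency sets for an undirected "friend" relationship (hypothetical data).
friends: Dict[str, Set[str]] = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}


def friends_of_friends(graph: Dict[str, Set[str]], person: str) -> Set[str]:
    """People exactly two hops away: friends of friends who are neither
    direct friends nor the person themselves."""
    direct = graph.get(person, set())
    two_hops: Set[str] = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


result = friends_of_friends(friends, "ann")   # {"dan", "eve"}
```

The traversal touches only the nodes adjacent to the starting point, so its cost depends on the size of the neighborhood and not on the size of the whole graph. In a relational model the same question needs a self-join on the friendship table.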
  42. Open Linked Data
  43. Neo4j • Graph database designed to be easy to use by Java developers • Dual license (community edition is GPL) • Works as an embedded Java library in your application • Disk-based (not just RAM) • Full ACID
  44. Document Store • Data stored in nested hierarchies • Logical data remains stored together as a unit • Any item in the document can be queried • Pros: no object-relational mapping layer, ideal for search • Cons: complex to implement, incompatible with SQL • Examples: MarkLogic, MongoDB, Couchbase, CouchDB, eXist-db
  45. Document Stores • Store machine readable documents together as a single blob of data • Use JSON or XML formats to store documents • Similar to "object stores" in many ways • No shredding of data into tables • Sub-trees and attributes of documents can still be queried using XQuery or other document query languages • Quickly maturing to include ACID transaction support • Lack of object-relational mapping permits agile development • Fastest growing revenues (MarkLogic, MongoDB, Couchbase)
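The difference from a key-value store is that a document store can look inside the value. As a rough illustration of "any item in the document can be queried", the sketch below parses a JSON document with Python's standard `json` module and walks its sub-trees; the document content and the helper `find_all` are made up for the example and stand in for a real document query language such as XQuery.

```python
import json
from typing import Any, Iterator, List

# A hypothetical book document, stored and retrieved as one unit.
raw = """
{
  "id": "b1",
  "title": "Example Book",
  "authors": [{"name": "A. Author"}, {"name": "B. Writer"}],
  "publisher": {"name": "Example Press"}
}
"""


def find_all(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under key, at any depth of the document."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_all(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, key)


book = json.loads(raw)
author_names: List[str] = [a["name"] for a in book["authors"]]   # path-style query
all_names = list(find_all(book, "name"))                          # descendant-style query
```

The first query follows a fixed path into the `authors` sub-tree; the second finds every `name` anywhere in the document, including the publisher's.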
  46. Estimated Big Data and NoSQL Sales (chart; highlighted label: Document Stores)
  47. Object Relational Mapping • T1 – HTML into Objects • T2 – Objects into SQL Tables • T3 – Tables into Objects • T4 – Objects into HTML • (diagram: Web Browser, Object Middle Tier, Relational Database, linked by T1 to T4)
  48. The Addition of XML Web Services • T1 – HTML into Java Objects • T2 – Java Objects into SQL Tables • T3 – Tables into Objects • T4 – Objects into HTML • T5 – Objects to XML • T6 – XML to Objects • (diagram: Web Browser, Object Middle Tier, Relational Database, Web Service, linked by T1 to T6) Copyright 2011 Kelly-McCreary & Associates
  49. "The Vietnam of Applications" • Object-relational mapping has become one of the most complex components of building applications today • A "Quagmire" where many projects get lost • Many "heroic efforts" have been made to solve the problem – Java Hibernate Framework – Ruby on Rails • But sometimes the best way to avoid complexity is to keep your architecture very simple
  50. Document Stores Need No Translation • Documents in the database • Documents in the application • No object middle tier • No "shredding" • No reassembly • Simple! • (diagram: Application Layer and Database both hold a Document)
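The "shredding" and "reassembly" steps of slides 47 to 50 can be made concrete with the standard library. The sketch below stores the same hypothetical order twice: once shredded into two SQLite tables and rebuilt with two queries, and once as a single JSON document in a dictionary that stands in for a document database. Table names, column names and the order data are all assumptions.

```python
import json
import sqlite3

order = {"id": 1, "customer": "Ann",
         "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}]}

# Relational path: shred the object into two tables, then reassemble it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany("INSERT INTO order_items VALUES (?, ?, ?)",
               [(order["id"], item["sku"], item["qty"]) for item in order["items"]])

row = db.execute("SELECT id, customer FROM orders WHERE id = ?", (1,)).fetchone()
items = db.execute(
    "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid", (1,)
).fetchall()
reassembled = {"id": row[0], "customer": row[1],
               "items": [{"sku": sku, "qty": qty} for sku, qty in items]}

# Document path: the same object is written and read back as one unit.
document_store = {}                        # stand-in for a document database
document_store["order:1"] = json.dumps(order)
round_tripped = json.loads(document_store["order:1"])
```

Both paths return the original object, but the relational path needed a schema, three statements to write and two queries plus hand-written mapping code to read. That mapping code is the translation layer the slides argue a document store removes.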
  51. Zero Translation (XML) • XML lives in the web browser (XForms) • REST interfaces • XML in the database (Native XML, XQuery) • XRX Web Application Architecture • No translation! • (diagram: Web Browser with XForms, REST interfaces, XML database)
  52. "Schema Free" • Systems that automatically determine how to index data as the data is loaded into the database • No a priori knowledge of data structure • No need for up-front logical data modeling – …but some modeling is still critical • Adding new data elements or changing data elements is not disruptive • Searching millions of records still has sub-second response time
  53. Schema-Free Integration • "We can easily store the data that we actually get, not the data we thought we would get." • (diagram: XML v1, XML v2 and XML v3 messages flow from an Enterprise Messaging System into a NoSQL Database)
  54. Upfront ER Modeling is Not Required • You do not have to finish modeling your data before you insert your first records • No Data Definition Language "DDL" is needed • Metadata is used to create indexes as data arrives • Modeling becomes a statistical process – write queries to find exceptions and normalize data • Exceptions make the rules but can still be used • Data validation can still be done on documents using tools such as XML Schema and business rules systems like Schematron
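Two ideas from slides 52 to 54 lend themselves to a small sketch: indexes that are built from whatever fields arrive, and modeling as "a statistical process" of counting what is actually in the data. The example below is an assumption-laden toy (flat documents with scalar values, made-up field names), not how any particular product builds its indexes.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List, Set

documents: List[Dict[str, Any]] = []
# field name -> field value -> ids of documents holding that value
index: Dict[str, Dict[Any, Set[int]]] = defaultdict(lambda: defaultdict(set))


def insert(doc: Dict[str, Any]) -> int:
    """Store doc as-is and index every top-level field it happens to contain."""
    doc_id = len(documents)
    documents.append(doc)
    for field, value in doc.items():
        index[field][value].add(doc_id)    # assumes hashable (scalar) values
    return doc_id


insert({"title": "Book A", "format": "pdf", "price": 10})
insert({"title": "Book B", "format": "PDF"})               # no price, odd code
insert({"title": "Book C", "format": "pdf", "isbn": "x"})   # a field nobody planned for

pdf_ids = index["format"]["pdf"]

# Count which fields and codes actually occur; the rare ones are the
# exceptions to review and, if needed, normalize.
field_counts = Counter(field for doc in documents for field in doc)
format_counts = Counter(doc["format"] for doc in documents if "format" in doc)
```

No schema was declared, yet the `isbn` field is indexed the moment it first appears, and `format_counts` surfaces the stray `"PDF"` code as a candidate for cleanup.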
  55. Document Structure (annotated schema diagram) • <books> is our root element • <books> contains a sequence of one to many <book> elements • Each <book> contains the following sequence of elements • Darker lines mean "required" and light lines mean optional elements • Id and title are required • Books have 0 to many author-names • Format and license elements are codes that must be in a fixed list of choices • Only valid URL characters • Must be a valid decimal number
  56. MarkLogic • Native XML database designed to scale to petabyte data stores • Leverages commodity hardware • ACID compliant, schema-free document store • Heavy use by federal agencies, document publishers and "high-variability" data • Arguably the most successful NoSQL company
  57. MongoDB • Open Source JSON data store created by 10gen • Master-slave scale out model • Strong developer community • Sharding built-in, automatic • Implemented in C++ with many APIs (C++, JavaScript, Java, Perl, Python etc.)
  58. Couchbase • Open source JSON document store • Code base separate from CouchDB • Built around memcached • Peer to peer scale out model • Written in C++ and Erlang • Strengths in scale out, replication and high-availability
  59. CouchDB • Apache CouchDB • Open source JSON data store • Document Model • Written in Erlang • RESTful JSON API • Distributed, featuring robust, incremental replication with bi-directional conflict detection and management • B-Tree based indexing • Mobile version
  60. eXist • Open source native XML database • Strong support for XQuery and XQuery extensions • Heavily used by the Text Encoding Initiative (TEI) community and XRX/XForms communities • Integrated Lucene search • Collection triggers and versioning • Extensive XQuery libs (EXPath) • Version 2.0 has replication
  61. Two Models • "Bag of Words": all keywords in a single container; only frequency counts are stored with each word • "Retained Structure": keywords associated with each sub-document component • (diagram: one doc-id with the words 'love', 'hate', 'new', 'fear' in a single bag, next to a document tree with keywords attached to each node)
  62. Keywords and Node IDs • Keywords in the reverse index are now associated with the node-id in every document • (diagram: one document-id whose nodes each carry a node-id and their own keywords)
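The two indexing models of slides 61 and 62 differ in what a keyword points back to. A minimal sketch, assuming a document is a map from node-id to text: the bag-of-words index keeps only per-document counts, while the retained-structure index maps each keyword to (document-id, node-id) pairs. The node names and text are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A hypothetical document: node-id -> text of that sub-document component.
doc_id = "doc1"
nodes: Dict[str, str] = {
    "title": "love and fear",
    "para1": "new love",
    "para2": "hate fear fear",
}

# Model 1, "bag of words": one container per document, only counts survive.
bag = Counter(word for text in nodes.values() for word in text.split())

# Model 2, "retained structure": each keyword maps to (doc-id, node-id) pairs.
node_index: Dict[str, Set[Tuple[str, str]]] = {}
for node_id, text in nodes.items():
    for word in text.split():
        node_index.setdefault(word, set()).add((doc_id, node_id))


def nodes_containing(word: str) -> List[str]:
    """Node-ids within doc_id whose text contains word, sorted for stable output."""
    return sorted(n for d, n in node_index.get(word, set()) if d == doc_id)


fear_count = bag["fear"]                  # how often, but not where
fear_nodes = nodes_containing("fear")     # where inside the document
```

With the bag alone, a search can say that "fear" occurs three times in `doc1`. With node-ids retained, it can also say the hits are in `para2` and `title`, which lets a query restrict or weight matches by document component.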
  63. Hybrid architectures • Most real world implementations use some combination of NoSQL solutions • Example: – Use document stores for data – Use S3 for image/pdf/binary storage – Use Apache Lucene for document index stores – Use MapReduce for real-time index and aggregate creation and maintenance – Use OLAP for reporting sums and totals
  64. Tools to Help You Select A System • ATAM – Architecture Tradeoff Analysis Method • CMU developed process to objectively select a system architecture based on business driven use-cases and quality metrics
  65. ATAM Process Flow (diagram labels): Business Drivers, Quality Attributes, User Stories, Analysis, Architecture Plan, Architectural Approaches, Architectural Decisions, Tradeoffs, Sensitivity Points, Non-Risks, Risks, Risk Themes, Distilled info, Impacts
  66. Insert/Select/Publish Comparison (chart comparing effort for SQL-based and XQuery-based stacks; labels: Insert, Query, Create Publishing Web Service, SQL, WebDAV, XQuery, Java, Tomcat, AXIS, JDBC, logical data modeling, Effort, Total Effort)
  67. Sample Quality Attribute Tree (diagram) • Root: Utility • Attribute labels: Searchability, XML Importability, Transformability, Affordability, Sustainability, Interoperability, Standards, Ease of Change, Security • Other labels: Fulltext Search, XML Search, Custom Scoring, Drag-and-drop, Bulk Import, No License Fees, XQuery, XSLT, Web Services, Fine Grain Control, Standards Based, No Translation, Role-based • Scenarios (Importance, Score): Easy to add new XML data (C, H); Use OpenSource Software (H, H); Use long standing Standards (VH, H); Use W3C Standards (VH, H); Fulltext search on document data (H, H); Easy to transform XML data (H, H); Use declarative languages (VH, H); Prevents unauthorized access (H, H); Scoring via XQuery (M, H); Works with W3C Forms (H, M); Works with web standards (VH, H); No complex languages to learn (H, H); Fast searching (H, H); Staff training (H, M); Collection-based access (H, M); Centralized security policy (H, M); Mashups with REST Interfaces (VH, H); Batch Import tools (M, M); Transform to HTML or PDF (VH, H); Customizable by non-programmers (H, H) • Key: Importance to the project: C=Critical, VH=Very High, H=High, M=Medium; Architectural score: H=High, M=Medium, L=Low
  68. Quality Attribute Tree App
  69. Making Sense of NoSQL • http://manning.com/mccreary
  70. 2013 NoSQL Now! • Dataversity's NoSQL Conference • August 20-22 • San Jose, California
  71. Questions • Thank You! • Dan McCreary, President, Kelly-McCreary & Associates • dan@danmccreary.com • twitter: dmccreary