UGC = User Generated Content
GGG = Giant Global Graph (what the web will become)
Ontologies are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.
A folksonomy is a system of classification derived from the practice and method of collaboratively creating and managing tags to annotate and categorize content; this practice is also known as collaborative tagging, social classification, social indexing, and social tagging. Folksonomy, a term coined by Thomas Vander Wal, is a portmanteau of folk and taxonomy.
RDFa (or Resource Description Framework in attributes) is a W3C Recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents. The RDF data-model mapping enables its use for embedding RDF subject-predicate-object expressions within XHTML documents; it also enables the extraction of RDF model triples by compliant user agents.
This is strictly about connected data – joins kill performance there.
No bashing of RDBMS performance for tabular transaction processing.
Green line denotes “zone of SQL adequacy”.
Fowler points out that KV/Column/Document stores are all aggregates: they’re different from graphs because they enforce structure at design time, as an aggregate of data.
An aggregate is a clump of data that can be co-located on a cluster instance and which is accessed together.
“a fundamental unit of storage which is a rich structure of closely related data: for key-value stores it's the value, for document stores it's the document, and for column-family stores it's the column family. In DDD terms, this group of data is an aggregate.”
History: Amazon decided that they always wanted the shopping basket to be available, but couldn’t take a chance on an RDBMS, so they built their own.
Big risk, but a simple data model with well-known computing science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication).
+ Massive read/write scale
- Simplistic data model moves heavy lifting into the app tier (e.g. map reduce)
MongoDB has a reputation for taking liberties with durability to get speed.
CouchDB has good multi-master replication, a heritage from Lotus Notes.
People talk about Codd’s relational model being mature because it was proposed in 1969 – 42 years old.
Euler’s graph theory was proposed in 1736 – 275 years old.
Can’t easily shard graphs like documents or KV stores.
This means that high-performance graph databases are limited to data sets that a single machine can handle.
Replicas can speed things up (and improve availability), but the data set size is still limited to a single machine’s disk/memory.
Some domains can shard easily (e.g. geo, most web apps) using a consistent routing approach and cache sharding; we’ll cover that later.
1. It's not "Never SQL"
NOSQL is simply… Not Only SQL
NOSQL (no-seek-wool) n. Describes the ongoing trend where developers increasingly opt for non-relational databases to help solve their problems, in an effort to use the right tool for the right job.
2. Why NOSQL now? Driving trends
3. Trend 1: Data Size
4. Trend 2: Connectedness
[Figure: information connectivity rising over time through Text Documents, Hypertext, Feeds, Blogs, UGC, Wikis, Tagging, Folksonomies, RDFa, Ontologies, GGG]
5. Trend 3: Semi-structured information
• Individualisation of content
  – 1970’s salary lists, all elements exactly one job
  – 2000’s salary lists, we need many job columns!
• All encompassing “entire world views”
• Store more data about each entity
• Trend accelerated by the decentralization of content generation
  – Age of participation (“web 2.0”)
6. Trend 4: Architecture
1980’s: Single Application
[Diagram: one Application, one DB]
7. Trend 4: Architecture
1990’s: Integration Database Antipattern
[Diagram: three Applications, one DB]
8. Trend 4: Architecture
2000’s: SOA
RESTful, hypermedia, composite apps
[Diagram: three Applications, three DBs]
9. Side note: RDBMS performance
[Chart labels: Salary list, Most Web apps, Social Network, Location-based services]
10. Four NOSQL Categories
11. Four NOSQL Categories
Aggregate-Oriented Databases
12. Key-Value Stores
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
• Data model:
  – Global key-value mapping
  – Big scalable HashMap
  – Highly fault tolerant (typically)
• Examples:
  – Riak, Redis, Voldemort
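The "big scalable HashMap" idea can be sketched in a few lines of Python. This is a toy illustration only, not how Dynamo, Riak, Redis or Voldemort are implemented: the class name `ConsistentHashRing`, the node names, and the choice of MD5 with 64 virtual nodes per machine are all assumptions made for the example. It shows consistent hashing, where each key is routed to the first node clockwise from its hash position on a ring.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable ring position (MD5 used for spread, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Toy key-value store that spreads keys over nodes with consistent hashing."""

    def __init__(self, nodes, vnodes=64):
        # Each node owns several virtual positions so keys spread more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]
        # One plain dict per node stands in for that node's local storage.
        self._data = {node: {} for node in nodes}

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position after the key's hash."""
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

    def put(self, key: str, value) -> None:
        self._data[self.node_for(key)][key] = value

    def get(self, key: str, default=None):
        return self._data[self.node_for(key)].get(key, default)


store = ConsistentHashRing(["node-a", "node-b", "node-c"])
store.put("basket:42", ["book", "kettle"])
basket = store.get("basket:42")
```

The value is opaque to the store: anything beyond get and put by key has to happen in the application tier, which is the trade-off the deck describes.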
13. Pros and Cons
• Strengths
  – Simple data model
  – Great at scaling out horizontally
    • Scalable
    • Available
• Weaknesses:
  – Simplistic data model
  – Poor for complex data
14. Column Family (BigTable)
• Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
• Data model:
  – A big table, with column families
  – Map-reduce for querying/processing
• Examples:
  – HBase, HyperTable, Cassandra
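As a rough mental model, a column-family table can be pictured as nested maps: row key, then column family, then column name, then value. The sketch below assumes that simplified picture; `ColumnFamilyTable` and its methods are hypothetical names for illustration and do not correspond to the HBase, HyperTable or Cassandra APIs. Column families are fixed up front, while the columns inside a family can differ from row to row.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy model: row key -> column family -> column name -> value."""

    def __init__(self, families):
        # Column families are declared when the table is created.
        self._families = set(families)
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are free-form, so rows may differ.
        self._rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns stored for one row in one family."""
        if row_key not in self._rows:
            return {}
        return dict(self._rows[row_key].get(family, {}))


table = ColumnFamilyTable(["profile", "orders"])
table.put("user:1", "profile", "name", "Ada")
table.put("user:1", "profile", "city", "London")
table.put("user:2", "profile", "name", "Bob")  # no "city" column for this row
```

Reading `table.get_family("user:1", "profile")` returns both columns, while `user:2` simply has fewer, which is what makes the model a fit for semi-structured data.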
15. Pros and Cons
• Strengths
  – Data model supports semi-structured data
  – Naturally indexed (columns)
  – Good at scaling out horizontally
• Weaknesses:
  – Unsuited for interconnected data
16. Document Databases
• Data model
  – Collections of documents
  – A document is a key-value collection
  – Index-centric, lots of map-reduce
• Examples
  – CouchDB, MongoDB
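To make "a document is a key-value collection" and "lots of map-reduce" concrete, here is a minimal in-memory sketch. The sample documents and the helper names `map_tags`, `reduce_counts` and `map_reduce` are invented for the example; real document stores such as CouchDB and MongoDB expose their own query and map-reduce interfaces, which this does not reproduce.

```python
from collections import defaultdict

# A collection of documents; each document is a key-value collection
# and documents need not share the same fields.
documents = {
    "post:1": {"author": "alice", "tags": ["nosql", "graph"]},
    "post:2": {"author": "bob", "tags": ["nosql"]},
    "post:3": {"author": "alice", "tags": ["graph"], "draft": True},
}


def map_tags(doc):
    """Map step: emit one (tag, 1) pair per tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Run map over every document, group by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs.values():
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


tag_counts = map_reduce(documents, map_tags, reduce_counts)
```

Counting tags works well because each document is processed on its own. A question such as "which authors are linked through shared tags" needs the application to stitch documents together, which is the interconnected-data weakness the deck points out.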
17. Pros and Cons
• Strengths
  – Simple, powerful data model (just like SVN!)
  – Good scaling (especially if sharding supported)
• Weaknesses:
  – Unsuited for interconnected data
  – Query model limited to keys (and indexes)
    • Map reduce for larger queries
18. Graph Databases
• Data model:
  – Nodes with properties
  – Named relationships with properties
  – Hypergraph, sometimes
• Examples:
  – Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
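The property-graph model (nodes with properties, named relationships with properties) can be sketched with adjacency lists. `PropertyGraph` and its methods are hypothetical names for this illustration and are not the Neo4j API; the sample people and the `since` property are made up. The traversal follows relationships directly from node to node rather than joining tables.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: nodes and named relationships both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.out = {}    # node id -> list of (relationship name, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.out.setdefault(node_id, [])

    def add_relationship(self, source, name, target, **props):
        self.out[source].append((name, target, props))

    def reachable(self, start, name, max_depth):
        """Breadth-first walk along relationships called `name`, up to max_depth hops."""
        seen = {start}
        found = []
        queue = deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for rel_name, target, _props in self.out[node]:
                if rel_name == name and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, depth + 1))
        return found


g = PropertyGraph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.add_relationship("alice", "KNOWS", "bob", since=2009)
g.add_relationship("bob", "KNOWS", "carol", since=2010)
g.add_relationship("carol", "KNOWS", "dave", since=2011)

friends_of_friends = g.reachable("alice", "KNOWS", max_depth=2)
```

Each hop only touches the relationships of the node being visited, so the cost tracks the size of the neighbourhood explored rather than the size of the whole data set.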
19. Pros and Cons
• Strengths
  – Powerful data model
  – Fast
    • For connected data, can be many orders of magnitude faster than RDBMS
• Weaknesses:
  – Sharding
    • Though they can scale reasonably well
    • And for some domains you can shard too!
20. Disclaimer
• I don’t hold any sort of copyright on any of the content used, including the photos, logos, text and trademarks. They all belong to the respective individuals and companies.
• I am not responsible for, and expressly disclaim all liability for, damages of any kind arising out of use of, reference to, or reliance on any information contained within this slide deck.