An Introduction to NOSQL, Graph Databases and Neo4j

Class lecture on "An Introduction to NOSQL, Graph Databases and Neo4j"

Published in: Technology

  • Future stores will be mixed: the right shape for the right job. Polyglot persistence. Frameworks (e.g. Spring Data) are embracing this already.
  • UGC = User Generated Content. GGG = Giant Global Graph (what the web will become).
  • This is strictly about connected data; joins kill performance there. No bashing of RDBMS performance for tabular transaction processing. The green line denotes the “zone of SQL adequacy”.
  • Fowler points out that KV/column/document stores are all aggregates: they differ from graphs because they enforce structure at design time, as an aggregate of data. An aggregate is a clump of data that can be co-located on a cluster instance and that is accessed together. “a fundamental unit of storage which is a rich structure of closely related data: for key-value stores it's the value, for document stores it's the document, and for column-family stores it's the column family. In DDD terms, this group of data is an aggregate.”
  • History: Amazon decided that they always wanted the shopping basket to be available, but couldn’t take a chance on an RDBMS, so they built their own. Big risk, but a simple data model with well-known computing science underpinning it (e.g. consistent hashing, Bloom filters for sensible replication). Pro: massive read/write scale. Con: the simplistic data model moves heavy lifting into the app tier (e.g. map reduce).
  • People talk about Codd’s relational model being mature because it was proposed in 1969 – 42 years old. Euler’s graph theory was proposed in 1736 – 275 years old.

    1. 1. NOSQL Databases and Neo4j
    2. 2. Database and DBMS
        • Database: an organized collection of data.
        • The term “database” is correctly applied to the data and their supporting data structures.
        • DBMS (Database Management System): a software package of computer programs that controls the creation, maintenance and use of a database.
    3. 3. Types of Happening Databases
        • Relational database – nothing new, but still in use, and it seems it will always be a happening one.
        • Cloud databases – everything is cloudy.
        • Data warehouse – Huge! Huge! Huge! archives.
        • Embedded databases – you can’t see them :P
        • Document-oriented database – the “in” thing.
        • Hypermedia database – the WWW.
        • Graph database – Facebook, Twitter, social networks.
    4. 4. NOSQL is simply… Not Only SQL
    5. 5. Why NOSQL now? Driving trends
    6. 6. Trend 1: Data Size [chart: data size by year, 2007 to 2011 (2011 projected); vertical axis from 0 to 3000]
    7. 7. Trend 2: Connectedness [chart: information connectivity rising over time through Text Documents, Hypertext, Feeds, Blogs, Wikis, UGC, Tagging, Folksonomies, RDFa, Ontologies, GGG]
    8. 8. Trend 3: Semi-structured information
        • Individualisation of content
            – 1970s salary lists: all elements had exactly one job
            – 2000s salary lists: we need many job columns!
        • Store more data about each entity
        • Trend accelerated by the decentralization of content generation
            – Age of participation (“web 2.0”)
    9. 9. Trend 4: Architecture. 1980s: single application [diagram: one application on top of one DB]
    10. 10. Trend 4: Architecture. 1990s: Integration Database antipattern [diagram: three applications sharing one DB]
    11. 11. Trend 4: Architecture. 2000s: SOA; RESTful, hypermedia, composite apps [diagram: three applications, each with its own DB]
    12. 12. Side note: RDBMS performance [chart: RDBMS performance across workloads: salary list, most web apps, social network, location-based services]
    13. 13. Four NOSQL Categories
    14. 14. Four NOSQL Categories
    15. 15. Key-Value Stores
        • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
        • Data model:
            – Global key-value mapping
            – Highly fault tolerant (typically)
        • Examples:
            – Riak, Redis, Voldemort
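
The slide notes name consistent hashing as part of the well-known computing science behind Dynamo-style key-value stores. The sketch below is only an illustration of that idea, not code from the lecture or from any of the listed products: the class name `HashRing`, the `replicas` parameter and the use of SHA-256 are all assumptions made for the example.

```python
import bisect
import hashlib


def _ring_position(label: str) -> int:
    """Map a string onto the ring (SHA-256 is an arbitrary, illustrative choice)."""
    return int(hashlib.sha256(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; not the API of Riak, Redis or Voldemort."""

    def __init__(self, nodes=(), replicas=64):
        self._replicas = replicas  # virtual points per physical node
        self._ring = []            # sorted list of (position, node)
        self._positions = []       # positions only, kept for bisect
        for node in nodes:
            self.add(node)

    def _reindex(self):
        self._ring.sort()
        self._positions = [position for position, _ in self._ring]

    def add(self, node):
        """Place `replicas` virtual points for the node on the ring."""
        self._ring.extend(
            (_ring_position(f"{node}#{i}"), node) for i in range(self._replicas)
        )
        self._reindex()

    def remove(self, node):
        """Drop all virtual points that belong to the node."""
        self._ring = [(pos, name) for pos, name in self._ring if name != node]
        self._reindex()

    def node_for(self, key):
        """Return the node owning the first point clockwise from the key."""
        if not self._ring:
            raise LookupError("the ring has no nodes")
        index = bisect.bisect_right(self._positions, _ring_position(key))
        return self._ring[index % len(self._ring)][1]
```

The point of the design is that removing a node only moves the keys that node owned; every other key keeps its owner, which is what makes rebalancing a cluster cheap.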
    16. 16. Column Family (BigTable)
        • Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)
        • Data model:
            – A big table, with column families
            – Map-reduce for querying/processing
        • Examples:
            – HBase, HyperTable, Cassandra
    17. 17. Document Databases
        • Data model:
            – Collections of documents
            – A document is a key-value collection
            – Index-centric, lots of map-reduce
        • Examples:
            – CouchDB, MongoDB
    18. 18. Graph Databases
        • Data model:
            – Nodes with properties
            – Named relationships with properties
            – Hypergraph, sometimes
        • Examples:
            – Neo4j (of course), SonesGraphDB, OrientDB, InfiniteGraph, AllegroGraph
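
To make the graph data model on this slide concrete, here is a minimal in-memory sketch of nodes with properties and named relationships with properties. It is an assumption-laden illustration: `Node`, `Relationship`, `PropertyGraph` and their methods are hypothetical names, not the Neo4j API, and the sample data is invented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    rel_type: str  # the relationship's name, e.g. "KNOWS"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy property graph, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def create_node(self, **properties):
        node = Node(node_id=len(self.nodes), properties=properties)
        self.nodes[node.node_id] = node
        return node

    def relate(self, start, end, rel_type, **properties):
        relationship = Relationship(start, end, rel_type, properties)
        self.relationships.append(relationship)
        return relationship

    def outgoing(self, node, rel_type=None):
        """Relationships leaving `node`, optionally filtered by name."""
        return [
            r for r in self.relationships
            if r.start is node and (rel_type is None or r.rel_type == rel_type)
        ]


# Usage with invented sample data: two people and one named relationship.
graph = PropertyGraph()
alice = graph.create_node(name="Alice")
bob = graph.create_node(name="Bob")
graph.relate(alice, bob, "KNOWS", since=2011)
```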
    19. 19. Why Graph Databases?
        • Schema-less, efficient storage of semi-structured information.
        • No O/R mismatch: it is very natural to map a graph to an object-oriented language like Ruby.
        • Express queries as traversals: fast deep traversal instead of slow SQL queries that span many table joins.
        • Very natural to express graph-related problems with traversals (recommendation engine, find shortest path, etc.).
        • Seamless integration with various existing programming languages.
        • ACID transactions with rollback support.
        • Whiteboard friendly: you use the language of nodes, properties and relationships to describe your domain (instead of e.g. UML), and there is no need for a complicated O/R mapping tool to implement it in your database. (http://video.neo4j.org/JHU6F/live-graph-session-how-allison-knows-james/)
    20. 20. Social Network “path exists” Performance
        • Experiment: ~1k persons, average 50 friends per person, pathExists(a,b) limited to depth 4
        • Results (database | # persons | query time):
            – Relational database | 1000 | 2000 ms
    21. 21. Social Network “path exists” Performance
        • Experiment: ~1k persons, average 50 friends per person, pathExists(a,b) limited to depth 4
        • Results (database | # persons | query time):
            – Relational database | 1000 | 2000 ms
            – Neo4j | 1000 | 2 ms
    22. 22. Social Network “path exists” Performance
        • Experiment: ~1k persons, average 50 friends per person, pathExists(a,b) limited to depth 4
        • Results (database | # persons | query time):
            – Relational database | 1000 | 2000 ms
            – Neo4j | 1000 | 2 ms
            – Neo4j | 1000000 | 2 ms
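
The slides do not show how `pathExists(a,b)` was implemented in either system. As a rough sketch of what a depth-limited "path exists" check means, the function below runs a breadth-first search over an adjacency dictionary and gives up after four hops. The function name, its parameters and the sample data are assumptions for illustration.

```python
from collections import deque


def path_exists(friends, a, b, max_depth=4):
    """Return True if b is reachable from a within max_depth friendship hops.

    `friends` maps each person to an iterable of that person's friends.
    """
    if a == b:
        return True
    visited = {a}
    frontier = deque([(a, 0)])
    while frontier:
        person, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(person, ()):
            if friend == b:
                return True
            if friend not in visited:
                visited.add(friend)
                frontier.append((friend, depth + 1))
    return False


# Invented sample data: a chain of six people, 0 - 1 - 2 - 3 - 4 - 5.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
```

On the chain above, person 4 is four hops from person 0 and is found, while person 5 is five hops away and is not.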
    23. 23. What are graphs good for?
        • Recommendations
        • Business intelligence
        • Social computing
        • Geospatial
        • Systems management
        • Web of things
        • Genealogy
        • Time series data
        • Product catalogue
        • Web analytics
        • Scientific computing (especially bioinformatics)
        • Indexing your slow RDBMS
        • And much more!
    24. 24. Graphs
    25. 25. Directed Graphs
    26. 26. Breadth First Search
    27. 27. Depth First Search
    28. 28. Graph Databases• A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way.
    29. 29. Graphs• “A Graph —records data in→ Nodes —which have→ Properties”
    30. 30. Graphs• “Nodes —are organized by→ Relationships — which also have→ Properties”
    31. 31. Query a graph with Traversal• “A Traversal —navigates→ a Graph; it — identifies→ Paths —which order→ Nodes”
    32. 32. Indexes• “An Index —maps from→ Properties —to either→ Nodes or Relationships”
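
The slide's definition of an index, a mapping from properties to nodes or relationships, can be sketched as a dictionary keyed by (property key, property value). This is a hypothetical toy, not Neo4j's index API; the class name and the string identifiers used for entities are assumptions.

```python
from collections import defaultdict


class PropertyIndex:
    """Hypothetical exact-match index from (key, value) to entity identifiers."""

    def __init__(self):
        self._entries = defaultdict(set)

    def add(self, entity, key, value):
        """Register a node or relationship identifier under a property."""
        self._entries[(key, value)].add(entity)

    def remove(self, entity, key, value):
        self._entries[(key, value)].discard(entity)

    def get(self, key, value):
        """Return a copy of the set of entities indexed under the property."""
        return set(self._entries.get((key, value), ()))


# Usage with invented identifiers for two nodes and one relationship.
index = PropertyIndex()
index.add("node:1", "name", "Alice")
index.add("node:2", "name", "Bob")
index.add("rel:7", "since", 2011)
```

An index like this gives a traversal its starting point: look up a node by property, then follow relationships from there.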
    33. 33. Neo4j is a Graph Database• “A Graph Database —manages a→ Graph and —also manages related→ Indexes”
    34. 34. Neo4j – Hey! This is why I am a Graph Database.
        • The fundamental units that form a graph are nodes and relationships.
        • In Neo4j, both nodes and relationships can contain properties.
        • Nodes are often used to represent entities, but depending on the domain, relationships may be used for that purpose as well.
    35. 35. Node in Neo4j
    36. 36. Relationships in Neo4j• Relationships between nodes are a key part of Neo4j.
    37. 37. Relationships in Neo4j
    38. 38. Twitter and relationships
    39. 39. Properties
        • Both nodes and relationships can have properties.
        • Properties are key-value pairs where the key is a string.
        • Property values can be either a primitive or an array of one primitive type. For example, String, int and int[] values are valid for properties.
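
The property rules on this slide (string keys, values that are a primitive or an array of a single primitive type) can be expressed as a small validation function. The sketch below is an approximation under stated assumptions: Python's `bool`, `int`, `float` and `str` stand in for the Java types the slide refers to, lists and tuples stand in for arrays, and accepting an empty array is a choice made here, not something the slide states.

```python
# Python stand-ins for the primitive types mentioned on the slide (an assumption).
PRIMITIVE_TYPES = (bool, int, float, str)


def is_valid_property(key, value):
    """Check a key-value pair against the rules stated on the slide."""
    if not isinstance(key, str):
        return False  # keys must be strings
    if type(value) in PRIMITIVE_TYPES:
        return True   # a single primitive value
    if isinstance(value, (list, tuple)):
        element_types = {type(item) for item in value}
        # An array must hold exactly one primitive type; empty arrays pass here.
        return len(element_types) <= 1 and element_types <= set(PRIMITIVE_TYPES)
    return False      # nested maps, None and other objects are rejected
```

Exact type checks are used instead of `isinstance` so that `[1, True]` counts as a mixed array, since `bool` is a subclass of `int` in Python.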
    40. 40. Properties
    41. 41. Paths in Neo4j• A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.
    42. 42. Traversals in Neo4j
        • Traversing a graph means visiting its nodes, following relationships according to some rules.
        • In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found.
        • Traversal API
        • Depth first and breadth first.
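
The two traversal orders named on this slide differ only in which node is expanded next. The sketch below shows both over a plain adjacency dictionary; it does not use or imitate the Neo4j Traversal API, and the function names and sample graph are assumptions for illustration.

```python
from collections import deque


def breadth_first(adjacency, start):
    """Visit nodes level by level, nearest nodes first."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def depth_first(adjacency, start):
    """Follow each branch as far as possible before backtracking."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                visit(neighbour)

    visit(start)
    return order


# Invented sample graph: A points to B and C, and both point to D.
sample = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# breadth_first(sample, "A") returns ["A", "B", "C", "D"]
# depth_first(sample, "A") returns ["A", "B", "D", "C"]
```

The recursive depth-first version is fine for small graphs; a large graph would need an explicit stack to avoid hitting Python's recursion limit.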
    43. 43. Starting and Stopping
    44. 44. Preparing the database
    45. 45. Wrap mutating operations in a transaction.
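
The code for this slide did not survive in the transcript. To illustrate the general idea of wrapping mutating operations in a transaction, with changes kept only on success and discarded on failure, here is a toy in-memory sketch. `TinyStore` and its `transaction()` method are hypothetical and are not the Neo4j transaction API.

```python
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory store with all-or-nothing updates."""

    def __init__(self):
        self.nodes = {}  # node name -> properties dict

    @contextmanager
    def transaction(self):
        # Work on a copy so that a failure leaves the committed state untouched.
        working = {name: dict(props) for name, props in self.nodes.items()}
        yield working
        # Reached only if the with-block finished without raising: commit.
        self.nodes = working


store = TinyStore()

# A successful transaction: the new node is committed.
with store.transaction() as tx:
    tx["first"] = {"message": "Hello"}

# A failing transaction: the exception propagates and nothing is committed.
try:
    with store.transaction() as tx:
        tx["second"] = {"message": "World"}
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
```

Copying the whole store per transaction is only workable for a toy; it is used here to keep the commit and rollback behaviour easy to see.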
    46. 46. Creating a small graph
    47. 47. Print the data
    48. 48. Remove the data
    49. 49. The Matrix Graph Database
    50. 50. Traversing the Graph
    51. 51. Resources & References
        • Neo4j website: http://neo4j.org/
        • Neo4j learning resources: http://neo4j.org/resources/
        • Videos about Neo4j: http://video.neo4j.org/
        • Neo4j tutorial: http://docs.neo4j.org/chunked/snapshot/tutorials.html
        • Neo4j Java API documentation: http://api.neo4j.org/current/
