
OrientDB: Unlock the Value of Document Data Relationships


a) A general introduction to graph databases and OrientDB,

b) Why connected data has more value than just data,

c) How to "have fun" with OrientDB combining documents with graphs via SQL,

d) A use case on how OrientDB has helped to raise standards in Irish Public Office.

On OrientDB: NOSQL document databases provide an elegant way to deal with data in different shapes, enabling developers to create better products faster. The main goal of these systems is to find the most efficient way to manage the data itself. With the Big Data explosion we need to deal with a myriad of highly interconnected information. The challenge now is not only how to store data but how to manage, analyse, traverse and use it within the context of its relationships. Graph databases shine at maintaining highly connected data and are the fastest-growing category of database management systems: 2014 registered a 250% increase in adoption, and Forrester Research predicts that more than a quarter of enterprises will be using graphs by 2017. OrientDB combines more than one NOSQL model, offering the unique flexibility of modelling data as either documents or graphs, while incorporating object-oriented programming as a way of encapsulating relationships.

Published in: Data & Analytics


  1. OrientDB: Unlock the Value of Document Data Relationships. Fabrizio Fortino (@fabriziofortino), 11th April 2016, #HUGIreland, @boistartups
  2. The world is changing: Unstructured Data; Big Data Explosion; Connected Data; Mobile, IoT. Image source: http://destinhaus.com/internet-of-things-the-rise-of-smart-manufacturing/
  3. Rethink how we store data: “… starting a new strategic enterprise application you should no longer be assuming that your persistence should be relational. The relational option might be the right one - but you should seriously look at other alternatives.” Martin Fowler, Polyglot Persistence [2011]
  4. A Polyglot Persistence example: E-commerce Application
     • Primary Store + Financial Data (RDBMS)
     • Recommendations (Graph)
     • Products Catalog (Document)
     • User Sessions (Key-Value)
     • ETL Jobs / Data Synchronisation
  5. More flexibility, at what price?
     • Hire experts for each database type
     • No standards between NOSQL products
     • Increased overall complexity
     • High TCO
     • Write and maintain ETL and data synchronisation
     • Hard to refactor
     • Testing can be tough
  6. Entering Multi-Model Databases: Graph, Document, Object, Key/Value, Full-Text, Spatial. Multi-Model represents the intersection of multiple models in a single product.
  7. Product Positioning Quadrant (axes: Relationship Complexity, Data Complexity): Relational, Key Value, Column, Graph, Document, Multi-Model
  8. OrientDB at a Glance
     • First Multi-Model DBMS with a Graph Engine
     • Community Edition FREE (Apache v2 License)
     • Enterprise Edition (profiler, live monitor, telereporter, etc)
     • Vibrant community (≈ 100 contributors, ≈ 15K commits)
     • Easy to install and use
     • Zero configuration Multi-Master Architecture
     • ACID
     • Reactive (Live Queries)
  9. Quite a long journey (timeline, 1998 to 2016). Milestones: Orient ODBMS, first ever ODBMS with index-free adjacency; R&D; OrientDB, first ever multi-model DBMS released as Open Source; OrientDB Enterprise Launch. Chart axis: Downloads / month (ticks 0, 200, 1K, 3K, 12K, 70K).
  10. Under the hood
     • User Application
     • Document API: Handles Records as Documents
     • Graph API: TinkerPop Blueprints Implementation
     • Object API: POJO to Document mapping
     • Storage, Memory: Works in Memory Only (Ideal for Integration Testing)
     • Storage, PLocal: Write/Read to/from File System
     • Storage, Remote: Delegates all Operations to a Remote Server
  11. Deployment options
     • Embedded (in-process)
     • Single, Standalone Node
     • Multi-Master Replica
     • Mixed
  12. Document API
     • Lowest level API
     • A document (record) is the unit of storage
     • An immutable id (ORID) is automatically assigned to each document
     • Documents can contain key-value pairs or nested/embedded documents (no ORID)
     • Transaction support (optimistic mode with MVCC)
     • Classes are logical sets of documents
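The "optimistic mode with MVCC" bullet can be illustrated with a minimal Python sketch. This is not OrientDB code: `VersionedStore`, `ConcurrentModificationError` and the method names are hypothetical, and the only idea carried over is that every record has a version which a writer must match for its update to be accepted.

```python
class ConcurrentModificationError(Exception):
    """Raised when a writer's copy of a record is stale (hypothetical name)."""


class VersionedStore:
    """Toy in-memory store: record id -> (version, document)."""

    def __init__(self):
        self._records = {}

    def create(self, rid, doc):
        # New records start at version 0.
        self._records[rid] = (0, dict(doc))

    def read(self, rid):
        # Return the version together with a copy of the document.
        version, doc = self._records[rid]
        return version, dict(doc)

    def update(self, rid, expected_version, doc):
        # Optimistic check: no locks are taken, the write simply fails
        # if somebody else has bumped the version since we read it.
        current_version, _ = self._records[rid]
        if current_version != expected_version:
            raise ConcurrentModificationError(
                f"{rid}: expected v{expected_version}, found v{current_version}"
            )
        self._records[rid] = (current_version + 1, dict(doc))
        return current_version + 1
```

Two writers that both read version 0 cannot both succeed: the first `update` moves the record to version 1, and the second raises `ConcurrentModificationError` and has to re-read and retry.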
  13. Schema-less, Schema-full or Hybrid?
     • Schema-less: relaxed model; the type of each field is inferred for each document
     • Schema-full: strict model; schema with constraints on fields and validation rules
     • Hybrid: mixed model; schema with mandatory and optional fields, with constraints and validation rules
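A rough way to picture the three modes is a single validation function whose behaviour depends on how much schema is declared. The sketch below is an illustration in Python, not OrientDB's validation logic; the `validate` function, the schema layout and the `strict` flag are assumptions made for this example.

```python
def validate(doc, schema, strict):
    """Return a list of problems found in `doc` (empty list means valid).

    schema: field name -> {"type": <python type>, "mandatory": <bool>}
    strict: when True, fields not declared in the schema are rejected.
    """
    problems = []
    for field, rule in schema.items():
        if field not in doc:
            if rule.get("mandatory"):
                problems.append(f"missing mandatory field: {field}")
            continue
        if not isinstance(doc[field], rule["type"]):
            problems.append(f"wrong type for field: {field}")
    if strict:
        for field in doc:
            if field not in schema:
                problems.append(f"undeclared field: {field}")
    return problems


# Hypothetical schema for a "user" class.
user_schema = {
    "name": {"type": str, "mandatory": True},
    "age": {"type": int, "mandatory": False},
}
```

With an empty schema and `strict=False` every document passes (schema-less); with `user_schema` and `strict=True` only declared fields are allowed (schema-full); with `user_schema` and `strict=False` declared fields are checked and extra fields are accepted (hybrid).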
  14. Class concept is taken from OOP
     • A class can inherit from other classes, creating a tree (similar to RDF Schema)
     • A sub-class inherits all the schema fields from its parents
     • An abstract class is used as the foundation for other classes (it cannot have records)
     • Class hierarchies allow native polymorphic queries
     • 1 to 1 mapping with domain objects
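What a polymorphic query over a class hierarchy means can be sketched in a few lines of Python. The class names (`person`, `user`, `organiser`), the `PARENT` table and the helper functions are hypothetical and exist only for this example; OrientDB resolves the hierarchy inside the database.

```python
# Hypothetical class tree: each class maps to its parent (None for a root).
PARENT = {"person": None, "user": "person", "organiser": "user"}


def is_subclass_of(cls, ancestor):
    """Walk up the tree from `cls` until `ancestor` or the root is reached."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT[cls]
    return False


def select_from(records, cls):
    """Polymorphic selection: records of `cls` and of all its sub-classes."""
    return [r for r in records if is_subclass_of(r["@class"], cls)]


records = [
    {"@class": "person", "name": "Uli"},
    {"@class": "user", "name": "Fabrizio"},
    {"@class": "organiser", "name": "John"},
]
```

Selecting from `user` returns Fabrizio and John, because `organiser` is a sub-class of `user`; selecting from `person` returns all three records.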
  15. Let’s create a Document
       {
         "@rid": "#12:216",
         "@class": "user",
         "name": "Fabrizio",
         "meetups": [
           { "name": "HUG Ireland", "city": "Dublin", "since": "14-03-2014" }
         ],
         "details": {
           "@type": "d",
           "@class": "user_details",
           "city": "Dublin",
           "nationality": "IT"
         }
       }
     Callouts: "@rid" is the immutable Record ID; "@class" is the logical set; "name" is a property; "meetups" is an array of objects; "details" is an embedded document.
  16. Let’s create a Document (continued): With a traditional Document DB you have to duplicate your data to some degree. The degree depends on how complex the interdependencies of the application domain are. OrientDB combines the unique flexibility of documents with the power of graphs to unlock the business value of Document Data Relationships.
  17. Graphs: everything old is new again. https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
  18. What is a Graph Database? “A Graph Database is any storage system that provides index-free adjacency”. Marko A. Rodriguez, The Graph Traversal Pattern [2010]. G = (V, E), where G is the graph, V the set of vertices and E the set of edges.
  19. What’s wrong with joins?
     • Given a User (Fabrizio)
     • Find Fabrizio (id=10) in the member table: O(log n)
     • Find 18 and 24 (HUG Ireland & Microservices) in the Meetup table: O(log n)
     User table (name, id): (Fabrizio, 10), (Uli, 12), (John, 13), (Eddie, 88)
     member table (user_id, meetup_id): (10, 18), (10, 24), (13, 18), (88, 66)
     Meetup table (id, name): (18, HUG Ireland), (57, AWS Users), (24, Microservices), (66, Scala Meetup)
     • Joins are computed every time you cross relationships
     • Time complexity grows with data: O(log n)
     • Joining 3-4 tables with millions of records could create billions of combinations
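The lookups on this slide can be sketched in Python with the same three tables, using binary search over sorted lists as a stand-in for an index. This is an illustration only: real databases use B-trees or similar structures, and the function name `meetups_for` is hypothetical.

```python
import bisect

# The three tables from the slide, each kept sorted so that a binary
# search (O(log n)) can play the role of an index lookup.
users = [(10, "Fabrizio"), (12, "Uli"), (13, "John"), (88, "Eddie")]
member = [(10, 18), (10, 24), (13, 18), (88, 66)]  # (user_id, meetup_id)
meetups = [(18, "HUG Ireland"), (24, "Microservices"),
           (57, "AWS Users"), (66, "Scala Meetup")]


def meetups_for(user_id):
    """Resolve a user's meetups by joining member -> meetups."""
    names = []
    # Index lookup 1: first row of `member` belonging to this user.
    start = bisect.bisect_left(member, (user_id,))
    for uid, meetup_id in member[start:]:
        if uid != user_id:
            break
        # Index lookup 2: one more O(log n) search per relationship crossed.
        i = bisect.bisect_left(meetups, (meetup_id,))
        if i < len(meetups) and meetups[i][0] == meetup_id:
            names.append(meetups[i][1])
    return names
```

Every relationship crossed costs another index search whose time depends on the size of the table, which is the cost the slide attributes to joins.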
  20. The Graph as an Index
     • Given a User (Fabrizio)
     • Traverse the member edges to reach HUG Ireland, O(1), and Microservices, O(1)
     • Fabrizio is the index to reach the linked Meetups!
     • Every vertex and edge is “hard wired” to its adjacent vertex or edge
     • Traversing an edge does not require complex computation, near O(1)
     • The traversal time is not affected by the database size
     Diagram: Fabrizio --member--> HUG Ireland; Fabrizio --member--> Microservices. Easier to sketch!
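Index-free adjacency can be sketched in Python by letting each vertex hold direct references to its edges, so that a traversal never consults a global index. The `Vertex` and `Edge` classes and the `out` helper are hypothetical names for this example and do not mirror OrientDB's storage format.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct references to Edge objects


class Edge:
    def __init__(self, label, source, target):
        self.label = label
        self.source = source
        self.target = target
        # "Hard wire" the edge into its source vertex.
        source.out_edges.append(self)


def out(vertex, label):
    """Follow outgoing edges with the given label; no index is involved."""
    return [edge.target for edge in vertex.out_edges if edge.label == label]


fabrizio = Vertex("Fabrizio")
hug = Vertex("HUG Ireland")
micro = Vertex("Microservices")
Edge("member", fabrizio, hug)
Edge("member", fabrizio, micro)
```

The cost of `out(fabrizio, "member")` depends only on how many edges are attached to Fabrizio, not on how many vertices the whole graph contains.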
  21. Combine Documents with Graphs
       {
         "@rid": "12:216",
         "@class": "user",
         "name": "Fabrizio",
         "details": { "@type": "d", "@class": "user_detail", "city": "Dublin", "nationality": "IT" }
       }
       { "@rid": "13:12", "@class": "meetup", "name": "HUG Ireland", "city": "Dublin" }
       { "@rid": "14:32", "@class": "member", "since": "14-03-2014", "in": "12:216", "out": "13:12" }
       { "@rid": "15:79", "@class": "talk", "title": "OrientDB", "on": "11-04-2016", "in": "12:216", "out": "13:12" }
     Links shown on the vertices: out_member=14:32, in_member=14:32, out_talk=15:79, in_talk=15:79
  22. Combine Documents with Graphs (continued): Multi-relational Document Graph
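The four records from the "Combine Documents with Graphs" slides can be loaded into a plain Python dictionary to show how a document and its graph links live side by side. This is a sketch, not the OrientDB API: the `neighbours` helper is hypothetical, and which vertex holds the `in_` or `out_` link for each edge is an assumption derived from the `in` and `out` fields of the edge records.

```python
# Records keyed by record id; vertices carry documents plus edge links.
records = {
    "12:216": {"@class": "user", "name": "Fabrizio",
               "details": {"@class": "user_detail",
                           "city": "Dublin", "nationality": "IT"},
               "in_member": ["14:32"], "in_talk": ["15:79"]},
    "13:12": {"@class": "meetup", "name": "HUG Ireland", "city": "Dublin",
              "out_member": ["14:32"], "out_talk": ["15:79"]},
    "14:32": {"@class": "member", "since": "14-03-2014",
              "in": "12:216", "out": "13:12"},
    "15:79": {"@class": "talk", "title": "OrientDB", "on": "11-04-2016",
              "in": "12:216", "out": "13:12"},
}


def neighbours(rid, label):
    """Follow every `label` edge attached to `rid`, in either direction."""
    vertex = records[rid]
    found = []
    edge_rids = vertex.get("out_" + label, []) + vertex.get("in_" + label, [])
    for edge_rid in edge_rids:
        edge = records[edge_rid]
        # Pick the endpoint of the edge that is not the starting vertex.
        other = edge["in"] if edge["out"] == rid else edge["out"]
        found.append(records[other])
    return found
```

Starting from the user record, `neighbours("12:216", "member")` reaches the HUG Ireland meetup through edge 14:32, while the embedded `details` document stays available on the same record, and the edge itself carries its own properties such as `since`.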
  23. Would you believe me if I said you can query documents/graphs with an SQL-like syntax? Show me something now! OK, time for a quick demo. Image source: http://www.sharegoodstuffs.com/2011_12_12_archive.html
  24. Use Case: raise standards in Irish Public Office
  25. The challenges
     • Aggressive deadline
     • Large amount of data from different sources with different formats
     • Messy, dirty data
     • Connecting records from different sources that represent the same thing without a common identifier
     • Multi-step traversal of fixed and inferred links to identify disparate entities connected by a path
  26. The solution: OrientDB, Fuzzy Inference Engine
  27. Technical Details
     • Main Language: Groovy
     • Database Type: OrientDB Embedded
     • Fuzzy Inference Engine: Duke
       • minHash proximity index based on Lucene to avoid a cartesian product
       • probabilistic model with configurable statistical algorithms (Levenshtein, NGram, Soundex, Custom, etc) to identify the same entities despite differences
     • End-To-End Process Time < 10 min
     • Deliverable: Database
       • Preset of queries to answer the main questions (analysts can add or modify WHERE conditions on their own)
       • GraphView to visually search and visualise data
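Of the comparison algorithms listed above, Levenshtein distance is the simplest to show. The Python sketch below is a textbook dynamic-programming implementation and not Duke's code or API; the `similarity` normalisation is an assumption made for this example.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two records whose names score above some chosen threshold on `similarity` could be treated as candidates for the same entity; comparing every pair this way is the cartesian product that the proximity index on the slide is meant to avoid.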
  28. What people from home perceived: ≈ 20K tweets; top hashtag in Ireland for 24 hours: #rteinvestigates
  29. OK but what about Big Data? “While we’ve long understood the value of Big Data to better understand how people interact with us, we’ve noticed an alarming trend of Big Data envy: organizations using complex tools to handle “not-really-that-big” Data. Distributed map-reduce algorithms are a handy technique for large data sets, but many data sets we see could easily fit in a single node relational or graph database. Even if you do have more data than that, usually the best thing to do is to first pick out the data you need, which can often then be processed on such a single node” ThoughtWorks Technology Radar, 5 April 2016
  30. Begin the journey! https://www.udemy.com/orientdb-getting-started/
  31. Resources
     • http://martinfowler.com/bliki/PolyglotPersistence.html
     • https://en.wikipedia.org/wiki/Multi-model_database
     • http://orientdb.com/
     • https://en.wikipedia.org/wiki/Seven_Bridges_of_Königsberg
     • http://arxiv.org/pdf/1004.1001.pdf
     • https://www.udemy.com/orientdb-getting-started/
     • http://www.rte.ie/news/investigations-unit/2015/1207/751833-rte-investigates/
     • https://github.com/larsga/Duke
     • https://www.thoughtworks.com/radar
  32. Q & A. Thank you! Fabrizio Fortino (@fabriziofortino), 11th April 2016, #HUGIreland, @boistartups
