Grails goes Graph

Slide deck from my talk at SpringOne 2GX in Washington, DC.

  • Eric Schmidt: “Every two days we create as much information as up to 2003”

Transcript

  • 1. Grails Goes Graph. Stefan Armbruster, presales engineer @neotechnology, stefan.armbruster@neotechnology.com, Twitter: @darthvader42. © 2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.
  • 2. about @self
  • 3. This talk: Grails goes Graph
      • Intro to Graph (Databases)
      • Intro to Neo4j
      • Grails Neo4j plugin
      • Live demo
      • Case study
  • 4. trend 1: data growth (source: Digital Universe Study 2011 by IDC)
  • 5. trend 2: data connectedness [Figure: chart of information connectivity; labels: GGG, Ontologies, RDF, Folksonomies, Tagging, Wikis, UGC, Blogs, Feeds, Hypertext, Text Documents]
  • 6. trend 3: semi-structured information
      • Individualisation of content
          – 1970s salary lists: every element has exactly one job
          – 2000s salary lists: we need many job columns!
      • All-encompassing “entire world views”
          – Store more data about each entity
      • Trend accelerated by the decentralization of content generation
      • Age of participation (“web 2.0”)
  • 7. trend 4: architecture – 1980s: mainframe
  • 8. trend 4: architecture – 1990s: DB as integration platform
  • 9. trend 4: architecture – 2000s: decoupling of services
  • 10. trend 4: architecture – 2010: SOA
  • 11. trend 4: scale for performance [Figure labels: Salary list, Most Web apps, Social Network, Location-based services]
  • 12. data changes over time: 4 trends
      1) amount of data grows (big data)
      2) data gets more connected
      3) less structure – semi-structured
      4) architecture – massive horizontal scalability
  • 13. NoSQL – what does that mean? NO to SQL? Not only SQL!
  • 14. simplistic cartography of NoSQL
  • 15. side note: aggregate oriented databases
      "There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? ... Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. This is why aggregate-oriented stores talk so much about map-reduce"
      Martin Fowler on http://martinfowler.com/bliki/AggregateOrientedDatabase.html
  • 16. graphs are everywhere
  • 17. graphs everywhere
      ● Relationships in
          ● Politics, Economics, History, Science, Transportation
      ● Biology, Chemistry, Physics, Sociology
          ● Body, Ecosphere, Reaction, Interactions
      ● Internet
          ● Hardware, Software, Interaction
      ● Social Networks
          ● Family, Friends
          ● Work, Communities
          ● Neighbours, Cities, Society
  • 18. relationships
      ● the world is rich, messy and related data
      ● relationships are at least as important as the things they connect
      ● Graphs = Whole > Σ parts
      ● complex interactions
      ● always changing, change of structures as well
      ● Graph: relationships are part of the data
      ● RDBMS: relationships are part of the fixed schema
  • 19. questions & answers
      ● Complex questions
      ● Answers lie between the lines (things)
      ● Locality of the information
      ● Global searches / operations very expensive
      ● constant query time, regardless of data volume
  • 20. categories
      ● Categories == Classes, Trees?
      ● What if more than one category fits?
      ● Tags
      ● Categories via relationships like „IS_A“
      ● any number, easy change
      ● „virtual“ relationships - traversals
      ● Category dynamically derived from queries
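The idea of deriving categories from „IS_A“ relationships instead of a fixed class tree can be sketched in a few lines. This is plain Python, not the Neo4j API; the `IS_A` data and the `categories` helper are made-up names for illustration only.

```python
# Hypothetical IS_A relationships: each item points at the categories it
# belongs to. An item may have any number of categories (not a strict tree).
IS_A = {
    "espresso": ["coffee"],
    "coffee": ["hot drink", "stimulant"],
    "hot drink": ["drink"],
    "stimulant": [],
    "drink": [],
}

def categories(is_a, item):
    """Collect every category reachable from item by following IS_A edges."""
    found = []
    stack = list(is_a.get(item, []))
    while stack:
        category = stack.pop()
        if category not in found:
            found.append(category)
            stack.extend(is_a.get(category, []))
    return found

print(sorted(categories(IS_A, "espresso")))
```

Adding a category is then just adding one more relationship; nothing resembling a schema has to change.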
  • 21. everyone is talking about graphs: Facebook Open Graph
  • 22. Neo4j
  • 23. example of a property graph
  • 24. querying the graph: your choice
      • Simple way: navigate relationship paths by core API
      • More powerful: simple traversers with callbacks for (deprecated)
          – Where to end traversal
          – What should be in the result set
      • Even more powerful: Traversal API
          – Fluent interface for specifying traversals
      • Shell: mimics unix filesystem commands (ls, cd, ...)
      • Gremlin: graph traversal language (to be deprecated)
      • Cypher: “the SQL for Neo4j”
          – Declarative
          – Designed for Humans (Devs + Domain experts)
  • 25. Cypher examples
      START john=node:node_auto_index(name = 'John')
      MATCH john-[:friend]->()-[:friend]->fof
      RETURN john, fof

      START user=node(5,4,1,2,3)
      MATCH user-[:friend]->follower
      WHERE follower.name =~ /S.*/
      RETURN user, follower.name
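To make the first query on this slide concrete, here is a minimal sketch of the same friend-of-a-friend pattern in plain Python. It is not the Neo4j API and says nothing about how Cypher executes the query; the `FRIEND` data and the `friends_of_friends` helper are illustrative assumptions.

```python
# Hypothetical in-memory graph: each person maps to the people reached by an
# outgoing "friend" relationship. The names are invented for this example.
FRIEND = {
    "John": ["Sara", "Joe"],
    "Sara": ["Maria"],
    "Joe": ["Steve"],
    "Maria": [],
    "Steve": [],
}

def friends_of_friends(graph, start):
    """Return the end node of every path of exactly two friend hops."""
    result = []
    for friend in graph.get(start, []):
        for fof in graph.get(friend, []):
            result.append(fof)
    return result

print(friends_of_friends(FRIEND, "John"))
```

Like the MATCH pattern, the sketch yields one result per matching path, so a person reachable through two different friends would appear twice.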
  • 26. query performance
      • a sample social graph
          – with ~1,000 persons
      • average 50 friends per person
      • pathExists(a,b) limited to depth 4
      • caches warmed up to eliminate disk I/O

      database        # persons    query time
      relational DB   1,000        2,000 ms
      Neo4j           1,000        2 ms
      Neo4j           1,000,000    2 ms
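One plausible reading of pathExists(a,b) with a depth limit is a breadth-first search that stops expanding after four hops. The sketch below is plain Python under that assumption; it is not the code used for the measurements, and `path_exists` and `CHAIN` are hypothetical names.

```python
from collections import deque

def path_exists(graph, a, b, max_depth=4):
    """Breadth-first search from a, following at most max_depth hops."""
    if a == b:
        return True
    seen = {a}
    frontier = deque([(a, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand this node
        for neighbour in graph.get(node, ()):
            if neighbour == b:
                return True
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return False

# Hypothetical chain 0 -> 1 -> 2 -> 3 -> 4 -> 5 to exercise the depth limit.
CHAIN = {i: [i + 1] for i in range(5)}
print(path_exists(CHAIN, 0, 4), path_exists(CHAIN, 0, 5))
```

The work done depends on how many nodes lie within four hops of the start node, not on the total number of nodes stored, which is the intuition behind the flat Neo4j timings in the table.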
  • 27. deployment options
      • Embedded in JVM
          – Just drop a couple of jars into your application
          – Use EmbeddedGraphDatabase
          – Very fast → no marshalling/unmarshalling, no network overhead
      • Neo4j as Server
          – Exposes rich REST interface
              • granular API → many requests, consider network overhead
              • use batching or Cypher if possible
          – Add custom modules to the server (plugins/unmanaged extensions)
      • Both embedded and server can be run as HA!
          – One master, multiple slaves
          – Zookeeper for managing the cluster, about to change in upcoming versions
  • 28. Neo4j HA architecture
  • 29. Licensing Neo4j
      3 editions available:
      • Community
          – GPL
      • Advanced
          – Community + enhanced Monitoring + enhanced Webadmin
          – AGPL or Commercial
      • Enterprise
          – Advanced + HA + online backup + GCR-Cache
          – AGPL or Commercial
  • 30. Neo4j - Overview [Figure: graph diagram; recoverable labels include Sharding, Master/Slave, TRAVERSALS, HIGH_AVAIL., INTEGRATES, RUNS_AS, SCALES_TO]
  • 31. graphconnect.com, Nov 6 – 7
  • 32. GORM
      • Grails Object Relational Mapping (GORM) aka grails-data-mapping
          – Lib: https://github.com/SpringSource/grails-data-mapping
      • manages meta-model of domain classes
      • Common data persistence abstraction layer
      • Methods for domain classes (CRUD + finders + X)
      • Extensible
      • Access to low level API of the implementation
      • TCK for implementations, 200+ test cases
      • Existing implementations
          – Simple (in-memory, hashmap based, for unit testing)
          – Hibernate, JPA
          – MongoDB, SimpleDB, Dynamo, Redis, (Riak), Neo4j
  • 33. some key abstractions in g-d-m
      • MappingContext:
          – holds meta-information about mapping domain classes to the underlying datastore, does type conversion, holds list of EntityPersisters
      • Datastore:
          – creates sessions
          – manages connection to low-level storage
      • Session:
          – similar to a Hibernate Session
      • EntityPersister:
          – does the dirty work: interacts with the low level datastore
      • Query:
          – knows how to query the datastore by criteria (criterion, projections, ...)
  • 34. GORM has a price tag ;-)
  • 35. Grails Neo4j Integration
      • Resources:
          – Lib: https://github.com/SpringSource/grails-data-mapping
          – Plugin: http://www.grails.org/plugin/neo4j
          – Plugin docs: http://springsource.github.com/grails-data-mapping/neo4j/manual/index.html
      • goal: use Neo4j as persistence layer for a standard Grails domain model
  • 36. Mapping Grails domain model to the nodespace [Figure: mapping diagram; labels: domain class, reference node, subreference, domain class instance, instance, domain instance property, properties, association]
  • 37. 2 “challenges” involved
      • Locking of domain nodes in HA mode
      • Category nodes become “super nodes”
          – causes potential bottleneck on domain node traversals
      Solutions:
      • add intermediate category nodes
      • use indexing instead
      [Figure labels: reference node, instance nodes]
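Why intermediate nodes relieve a super node can be shown with simple counting. The sketch below is plain Python, not the plugin's implementation: spreading instances by id modulo the bucket count is an assumption made purely for illustration, and `attach` and `max_degree` are hypothetical helpers.

```python
def attach(instance_ids, buckets):
    """Spread instances over `buckets` intermediate nodes by id modulo."""
    fanout = {b: [] for b in range(buckets)}
    for instance_id in instance_ids:
        fanout[instance_id % buckets].append(instance_id)
    return fanout

def max_degree(fanout):
    """Largest number of relationships hanging off any single node."""
    category_degree = len(fanout)  # category node -> intermediate nodes
    return max([category_degree] + [len(ids) for ids in fanout.values()])

single = attach(range(1000), 1)   # behaves like one super node
spread = attach(range(1000), 10)  # 10 intermediate nodes, 100 instances each
print(max_degree(single), max_degree(spread))
```

With fewer relationships per node, concurrent writers are less likely to contend for a lock on the same node, and a traversal never has to walk one enormous relationship list.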
  • 38. currently working in the neo4j plugin (1/2)
      • passing >98% of GORM TCK (hurray!)
      • accessing embedded, REST and HA datasources
          – and ImpermanentGraphdatabase for testing
      • property type conversion
      • support of schemaless properties
      • access to native API
          – instance.getNode(), bean: graphDatabaseService
      • GORM enhancements:
          – <DomainClass>.traverseStatic, <DomainClass>.cypherStatic
          – <instance>.traverse, <instance>.cypher
  • 39. currently working in the neo4j plugin (2/2)
      • prevention of locking exceptions by using intermediate category nodes
      • Declarative indexing
          – apply static mapping closure just the standard way
      • convenience methods on Neo4j's nodes and relationships:
          – node.<prop> = <value>
      • JSON marshalling for Neo4j's nodes and relationships
      • embed Neo4j's webadmin into the Grails application
  • 40. praying to the demo god...
  • 41. looking into the crystal ball
      • get rid of subreferences in favour of indexing
      • migrate plugin to use Cypher only instead of core API
      • option for mapping domain classes as a relationship
          – think of roads between cities having a distance property
      • fix open issues: http://bit.ly/KEmVX2
      • maybe use Spring Data Neo4j internally
      • … and more
  • 42. case study
      • back in 2010 a website to collect and aggregate opinions of soccer fans went live
      • votes can be based on almost everything
          – players, teams, matches, events in matches
      • hard to model with a classic RDBMS
      • Neo4j to the rescue, used in embedded mode
      • as always: hard and very tight schedule
          – built up technical debt due to lack of automated tests
      • Neo4j HA scales very well for reads
  • 43. case study: lessons learned
      • massive amount of very small write transactions in HA mode caused trouble:
          – e.g. locking exceptions upon user registration
          – aggregate multiple write transactions using a JMS queue
      • serious issues with full GCs
          – since app AND Neo4j reside in the same JVM, full GCs happen
          – if the “stop-the-world” pause is too long: master switch
      • have loadbalancer with 2 setups (planned):
          – write-driven requests go to master node
          – read-driven requests go to slave nodes
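Aggregating many small writes into fewer transactions, read here as the remedy for the locking trouble, might look roughly like the sketch below. Python's `queue.Queue` only stands in for the JMS queue, and `commit` is a hypothetical callback representing one database transaction; none of this is the case study's actual code.

```python
import queue

def drain_in_batches(pending, batch_size, commit):
    """Pull queued writes and hand them to commit() in groups.

    Returns the number of commit() calls, i.e. transactions opened.
    """
    batches = 0
    batch = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            commit(batch)
            batches += 1
            batch = []
    if batch:  # flush the final, partially filled batch
        commit(batch)
        batches += 1
    return batches

# Ten hypothetical small writes become three transactions instead of ten.
pending = queue.Queue()
for i in range(10):
    pending.put({"vote": i})
committed = []
transactions = drain_in_batches(pending, 4, committed.append)
print(transactions, [len(b) for b in committed])
```

The trade-off is latency: a write is not durable until its batch commits, so the batch size or a flush interval has to match what the application can tolerate.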
  • 44. References
      • general overview of nosql:
          – http://www.nosql-databases.org/
      • Neo4j itself: http://www.neo4j.org
          – http://api.neo4j.org
          – http://doc.neo4j.org
      • neo4j grails plugin:
          – source: https://github.com/SpringSource/grails-data-mapping
          – docs: http://springsource.github.com/grails-data-mapping/neo4j/
          – issues: http://jira.grails.org/browse/GPNEO4J
          – demo app: https://github.com/sarmbruster/neo4jsample
      • Java REST driver: https://github.com/jexp/neo4j-java-rest-binding
      • my blog: http://blog.armbruster-it.de
      • twitter: @darthvader42