Graph Analytics - Titan and Cassandra @NJ Data Science Meetup

An introduction to Titan and TinkerPop.

http://www.meetup.com/nj-datascience/events/222637527/

Published in: Software

  1. Titan, by Isaac Rieksts (@IsaacRieksts). These thoughts are my own and do not represent the company.
  2. Outline: Graph database overview; TinkerPop; Titan; Graph queries; Our use of Titan; Demo
  3. Graph Databases. Relational view: Person table (Id, Name): (1, Bob), (2, Tom), (3, Joe); Person crosswalk table (Knows): (1, 2), (1, 3). Graph view: Bob -Knows-> Tom; Bob -Knows-> Joe
  4. TinkerPop: Abstraction layer; Query language; Computing
  5. Graphs with Spark GraphX, Pregel, GraphLab
  6. Why Titan: Flexible backend; No added infrastructure cost; Community support
  7. Backends: Cassandra; HBase; Hazelcast cache; Persistit; Berkeley
  8. Database Triangle: http://blog.nahurst.com/visual-guide-to-nosql-systems
  9. HBase: Strong consistency at the record level; Transaction support; Stored procedures; Replication
  10. Cassandra: Tunable consistency; Multiple datacenter support; Built-in replication and fault tolerance; CQL query language; Keyspace passwords
  11. Indexing. Built-in: fast for exact matches. Lucene: more advanced queries; good for a single box. Elasticsearch: advanced queries; large-scale clusters
  12. Gremlin vs SPARQL. Gremlin: support for complex queries (http://gremlindocs.com/). SPARQL: easy query language (http://www.w3.org/TR/rdf-sparql-query/)
  13. Gremlin vs SPARQL example 1. Gremlin: g.v('tg:1').out('tg:knows') SPARQL: SELECT ?x WHERE { tg:1 tg:knows ?x }
  14. Gremlin vs SPARQL example 2. Gremlin: g.v('tg:1').out('tg:knows').out('tg:name') SPARQL: SELECT ?y WHERE { tg:1 tg:knows ?x . ?x tg:name ?y }
  15. Our Mission (Health Market Science, a LexisNexis Company): Deliver the most current information on the U.S. healthcare provider universe using integrated solutions so that customers can: prevent fraud, waste, and abuse across the healthcare system; comply with evolving state and federal regulations; improve market opportunity for non-retail drugs and devices.
  16. The Business
     Health Care Provider & Facilities (Master Data Solutions)
       Variety/Velocity: • >2000 sources • 6 million unique HCPs • 10+ years history
       Data Challenges: • Constant change in real-world data • Conflicting & partial info • Frequent changes to source structure • Authoritative sources vs. crowdsource • Predicting source quality
     Medical Procedures & Diagnosis (Medical Claims Data)
       Volume/Velocity: • ~1B claims annually • +5B records annually • 5+ years history
       Data Challenges: • Sources have incomplete capture • Overlapping source data • Statistical projections & biases • Social media type relationships
     Business Solutions: • Batch (CompleteView, Expense Manager) • Transactional (PRS/MDM/VerifyRx) • Big Data Relational DB & Analytics (Claims)
  17. Master Data Management. Data Sources: Government, Web, Customer. Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics. Structured Storage: Relational, Indexing. Flexible Storage: NoSQL, Graph(s). Interfacing: Web Services. Visualization: Dashboard / Reports. User Interface ("I'm happy")
  18. Our use of Titan: Link storage; Analytics of links; Affiliation of business influences; Visualization of relationships
  19. Demo
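
The Person and Knows tables from the Graph Databases slide, and the two Gremlin traversals from the Gremlin vs SPARQL example slides, can be sketched in plain Python. This is only an illustration of the idea, not Titan or TinkerPop code: the dict-based graph and the helper names `build_graph` and `out` are assumptions made for this sketch. Following the slides' RDF-style example, `tg:name` is modeled as an edge to a literal value.

```python
# Relational view: a Person table and a Knows crosswalk table.
persons = {1: "Bob", 2: "Tom", 3: "Joe"}
knows_rows = [(1, 2), (1, 3)]


def build_graph(persons, knows_rows):
    """Turn the two tables into labeled adjacency lists (hypothetical helper)."""
    edges = {}
    for pid, name in persons.items():
        # Each person gets a tg:name edge pointing at a literal value.
        edges[f"tg:{pid}"] = {"tg:name": [name], "tg:knows": []}
    for src, dst in knows_rows:
        edges[f"tg:{src}"]["tg:knows"].append(f"tg:{dst}")
    return edges


def out(edges, vertex_ids, label):
    """Follow outgoing edges with the given label, loosely like one out() step."""
    return [dst for vid in vertex_ids for dst in edges.get(vid, {}).get(label, [])]


edges = build_graph(persons, knows_rows)

# Example 1: g.v('tg:1').out('tg:knows')
friends = out(edges, ["tg:1"], "tg:knows")

# Example 2: g.v('tg:1').out('tg:knows').out('tg:name')
friend_names = out(edges, friends, "tg:name")

print(friends)
print(friend_names)
```

Each chained `out(...)` call takes the result of the previous step as its starting set, which is the same shape as the SPARQL pattern that joins `?x` across two triple patterns.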

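
The "tunable consistency" point on the Cassandra slide can be made concrete with a little arithmetic: a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The sketch below is a simplified single-datacenter model, not driver code; the function names are hypothetical, and only the ONE, TWO, THREE, QUORUM, and ALL consistency levels are modeled.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def reads_see_latest_write(read_level, write_level, replication_factor):
    """True when the read and write replica sets must share at least one replica."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


# With three replicas, QUORUM reads and writes overlap; ONE and ONE may not.
print(reads_see_latest_write("QUORUM", "QUORUM", 3))
print(reads_see_latest_write("ONE", "ONE", 3))
```

Lower levels trade that overlap guarantee for latency and availability, which is the tuning knob the slide refers to.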