C* Summit 2013: Crossing the Chasm - SQL to NoSQL by Isaac Rieksts

Over the past few years, Health Market Science has transitioned from traditional relational databases and enterprise systems to a massively scalable Big Data platform that combines Cassandra and Storm to ingest thousands of feeds of data from the health market industry to produce a single high-quality masterfile. Come hear the "Why?", "What for?" and "How?" of that evolution.

  1. CROSSING THE CHASM: SQL to NoSQL
     Isaac Rieksts, Software Developer
     @IsaacRieksts, irieksts@gmail.com
     © Health Market Science 2013, All Rights Reserved
     #Cassandra13
  2. Our Mission
     • Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:
       - Prevent fraud, waste and abuse across the healthcare system
       - Comply with evolving state and federal regulations
       - Improve market opportunity for non-retail drugs and devices
  3. The Business
     Business solutions:
     • Master Data Solutions: Health Care Provider & Facilities
       - Variety/Velocity: >2,000 sources; 6 million unique HCPs; 10+ years of history
       - Data challenges: constant change in real-world data; conflicting and partial information; frequent changes to source structure; authoritative sources vs. crowdsourced data; predicting source quality
     • Medical Claims Data: Medical Procedures & Diagnosis
       - Volume/Velocity: ~1B claims annually; 5B+ records annually; 5+ years of history
       - Data challenges: sources have incomplete capture; overlapping source data; statistical projections and biases; social-media-type relationships
     • Other diagram labels: Batch (CompleteView, Expense Manager, CompleteSpend); Transactional (PRS/PE); Big Data; Relational DB & Analytics (Claims)
  4. Master Data Management
     • Data sources: Government, Web, Customer
     • Distributed processing: Standardize, Validate, Match, Consolidate, Analytics
     • Structured storage: Relational, Indexing
     • Flexible storage: NoSQL, Graph(s)
     • Interfacing: Web Services
     • Visualization: Dashboard / Reports
     • User interface ("I'm happy")
  5. Consolidation
     • First Name: John / Middle Name: David / Last Name: Smith
     • First Name: Mike / Middle Name: Steve / Last Name: Smith
     • First Name: Mike / Middle Name: David / Last Name: Smith
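The Consolidation slide lists three name records but does not state the consolidation rule. The sketch below makes two assumptions: the three slide records have already been matched to one provider, and each field is settled by a majority vote. Ties go to the value seen first, and blank fields do not vote. The `Consolidator` class and the voting rule are hypothetical and are not HMS's actual survivorship logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical consolidation sketch: per-field majority vote across matched records.
// The voting rule is an assumption for illustration, not HMS's actual logic.
class Consolidator {

    static Map<String, String> consolidate(List<Map<String, String>> records) {
        // field name -> (candidate value -> vote count); insertion order is kept
        // so that ties resolve to the value seen first
        Map<String, Map<String, Integer>> votes = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            for (Map.Entry<String, String> e : record.entrySet()) {
                String value = e.getValue();
                if (value == null || value.isBlank()) {
                    continue; // partial records do not vote on fields they lack
                }
                votes.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>())
                     .merge(value, 1, Integer::sum);
            }
        }
        Map<String, String> master = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Integer>> field : votes.entrySet()) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> candidate : field.getValue().entrySet()) {
                if (candidate.getValue() > bestCount) { // strict '>' keeps first-seen on ties
                    best = candidate.getKey();
                    bestCount = candidate.getValue();
                }
            }
            master.put(field.getKey(), best);
        }
        return master;
    }

    static Map<String, String> person(String first, String middle, String last) {
        Map<String, String> r = new LinkedHashMap<>();
        r.put("First Name", first);
        r.put("Middle Name", middle);
        r.put("Last Name", last);
        return r;
    }

    public static void main(String[] args) {
        List<Map<String, String>> matched = new ArrayList<>();
        matched.add(person("John", "David", "Smith"));
        matched.add(person("Mike", "Steve", "Smith"));
        matched.add(person("Mike", "David", "Smith"));
        // prints {First Name=Mike, Middle Name=David, Last Name=Smith}
        System.out.println(consolidate(matched));
    }
}
```

A plain vote count is only one survivorship strategy. Weighting votes by predicted source quality, one of the data challenges listed on The Business slide, would replace the simple count.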
  6. Legacy System
     • Relational DB
     • JBoss
     • JBoss MQ
     • 1 week to process a record through the system
  7. Our Solutions
     • Business needs: Finance & Legal; Business Systems; Compliance; Sales & Marketing
     • Solutions: Compliance; Data Assessment, Integration & Outsourcing; Enrichment Services; Provider Data; Market Intelligence
     • HMS authoritative sources: PDC; Federal; State; Medical Claims; Web Derived
     • Advanced technology: Storm; HMS MDM
  8. Data Model
     • Think of the full entity
     • Build the entity as you go
     • Get the full view upon fetch
     • Choose the PK (primary key) carefully
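The data-model advice can be illustrated with an in-memory stand-in for a single column family. In this sketch, each feed upserts only the columns it knows about into the entity's row, and one fetch by primary key returns the full view. The `EntityStore` class, the `HCP-0001` key and the column names are all hypothetical. Later writes overwrite earlier ones by call order, a simplification of Cassandra's timestamp-based last-write-wins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for one Cassandra column family:
// one wide row per entity, keyed by the entity's primary key.
class EntityStore {
    // row key -> sorted columns, like the cells of a wide row
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // "Build the entity as you go": each source upserts only the columns it has;
    // an existing column is overwritten (simplified last-write-wins by call order).
    void upsert(String rowKey, Map<String, String> columns) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).putAll(columns);
    }

    // "Get the full view upon fetch": one lookup by PK returns every column.
    SortedMap<String, String> fetch(String rowKey) {
        return new TreeMap<>(rows.getOrDefault(rowKey, new TreeMap<>()));
    }

    public static void main(String[] args) {
        EntityStore store = new EntityStore();
        store.upsert("HCP-0001", Map.of("last_name", "Smith", "first_name", "John"));
        store.upsert("HCP-0001", Map.of("license_state", "PA"));
        // prints {first_name=John, last_name=Smith, license_state=PA}
        System.out.println(store.fetch("HCP-0001"));
    }
}
```

Because every read goes through the row key, the choice of PK decides which queries are cheap. That is why the slide stresses choosing it carefully.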
  9. Cassandra-Indexing
     • Fast wide-row alternate key (AK) for Cassandra
     • Two-row pull process:
       - Fetch the PKs matching the AK
       - Use each PK to fetch your data
     • https://github.com/hmsonline/cassandra-indexing
  10. Cassandra-Indexing
      • Key: Col1:Col2
      • Index: Col2:Col1
  11. Cassandra-Indexing Example
      • Key: <First Name>:<Last Name>
      • Index: <Last Name>:<First Name>
      • Data:
        - John:Smith
        - Steve:Smith
        - David:Jones
      • Index fetch "Smith" => John:Smith, Steve:Smith
      • Index fetch "Jones" => David:Jones
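The two-row pull from the example can be sketched in memory. The sketch assumes a `TreeSet` stands in for the sorted column names of the index row, and that names never contain `:`. `AlternateKeyIndex` and its methods are hypothetical illustrations of the pattern, not the API of the cassandra-indexing library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch of the alternate-key pattern; not the cassandra-indexing API.
class AlternateKeyIndex {
    // Data rows: primary key "<First Name>:<Last Name>" -> columns.
    private final Map<String, Map<String, String>> dataRows = new HashMap<>();
    // Index entries "<Last Name>:<First Name>", kept sorted like wide-row column names.
    private final NavigableSet<String> index = new TreeSet<>();

    void put(String first, String last) {
        Map<String, String> columns = new HashMap<>();
        columns.put("first_name", first);
        columns.put("last_name", last);
        dataRows.put(first + ":" + last, columns);
        index.add(last + ":" + first); // every write also maintains the index
    }

    // Pull 1: slice the index for entries starting with "<last>:" and rebuild the PKs.
    List<String> primaryKeysFor(String last) {
        String prefix = last + ":";
        List<String> pks = new ArrayList<>();
        for (String entry : index.subSet(prefix, true, prefix + Character.MAX_VALUE, false)) {
            pks.add(entry.substring(prefix.length()) + ":" + last);
        }
        return pks;
    }

    // Pull 2: use each PK to fetch the full data row.
    List<Map<String, String>> fetchByLastName(String last) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String pk : primaryKeysFor(last)) {
            rows.add(dataRows.get(pk));
        }
        return rows;
    }

    public static void main(String[] args) {
        AlternateKeyIndex idx = new AlternateKeyIndex();
        idx.put("John", "Smith");
        idx.put("Steve", "Smith");
        idx.put("David", "Jones");
        System.out.println(idx.primaryKeysFor("Smith")); // [John:Smith, Steve:Smith]
        System.out.println(idx.primaryKeysFor("Jones")); // [David:Jones]
    }
}
```

The range bounds `"Smith:"` to `"Smith:\uffff"` match only exact last names. An entry such as `Smithson:...` sorts outside the slice, so it is not returned.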
  12. System Phase 1
  13. System Phase 2
  14. System Phase 3
  15. Oracle Advanced Queue
      • Integrate the relational DB and JMS
      • Near-real-time processing of data:
        - Table trigger
      • Bulk exports:
        - Keep only what you need on the queue
  16. Oracle Advanced Queue (cont.)
      • Distributed processing:
        - Write to Cassandra as of queue time
        - Write only IDs and query back for the data
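The following is a minimal sketch of the "write only IDs and query back" flow. It assumes that "as of queue time" means each Cassandra write is stamped with the time the change was enqueued. Under that reading, a slow worker holding an older read cannot overwrite a newer result, mirroring Cassandra's highest-timestamp-wins reconciliation. In-memory maps and a `LinkedBlockingQueue` stand in for Oracle, Oracle Advanced Queue and Cassandra. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: the queue carries only (id, queue time); workers read the row back.
class IdOnlyQueueSketch {
    record Message(long id, long queueTime) {}
    record Cell(String value, long timestamp) {}

    final Map<Long, String> sourceTable = new HashMap<>(); // stand-in for the relational table
    final Map<Long, Cell> cassandra = new HashMap<>();     // stand-in for Cassandra
    final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();

    // What a table trigger would do: enqueue the changed row's ID, not its contents.
    void onRowChanged(long id, String newValue, long now) {
        sourceTable.put(id, newValue);
        queue.add(new Message(id, now));
    }

    // Highest timestamp wins (ties keep the stored value, a simplification),
    // so a delayed write stamped with an older queue time is ignored.
    void write(long id, String value, long timestamp) {
        cassandra.merge(id, new Cell(value, timestamp),
            (old, incoming) -> incoming.timestamp() > old.timestamp() ? incoming : old);
    }

    // A worker: take an ID, query back for the current data, write it as of queue time.
    void drain() {
        Message m;
        while ((m = queue.poll()) != null) {
            String current = sourceTable.get(m.id());
            write(m.id(), current, m.queueTime());
        }
    }
}
```

Keeping only IDs on the queue keeps messages small. Because each worker reads the source row when it processes a message, it always writes the row's state at read time rather than a stale payload captured at enqueue time.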
  17. Unit Testing
      • Module level:
        - In-memory mock
        - Map<String, Map<String, Map<String, Map<String, String>>>>
        - Map<Keyspace, Map<Column Family, Map<Column, Map<RowKey, Value>>>>
      • Integration:
        - Embedded Cassandra superclass
        - Schema migration
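The following is a minimal sketch of a module-level in-memory mock built from nested maps. It nests row key before column, mirroring how a Cassandra cell is addressed (keyspace, column family, row key, column). The slide's labeled signature lists Column before RowKey, so this ordering is a choice made for the sketch. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for Cassandra in module-level unit tests.
// Nesting: keyspace -> column family -> row key -> column -> value.
class InMemoryCassandraMock {
    private final Map<String, Map<String, Map<String, Map<String, String>>>> data = new HashMap<>();

    void insert(String keyspace, String columnFamily, String rowKey, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new HashMap<>())
            .computeIfAbsent(rowKey, k -> new HashMap<>())
            .put(column, value);
    }

    // Missing keyspaces, column families, rows or columns all read as empty.
    Optional<String> get(String keyspace, String columnFamily, String rowKey, String column) {
        return Optional.ofNullable(data.getOrDefault(keyspace, Map.of())
            .getOrDefault(columnFamily, Map.of())
            .getOrDefault(rowKey, Map.of())
            .get(column));
    }

    // Returns an immutable snapshot of one row's columns.
    Map<String, String> row(String keyspace, String columnFamily, String rowKey) {
        return Map.copyOf(data.getOrDefault(keyspace, Map.of())
            .getOrDefault(columnFamily, Map.of())
            .getOrDefault(rowKey, Map.of()));
    }
}
```

A mock like this keeps module tests fast and free of a running node. The embedded-Cassandra integration tests on the same slide still cover real schema and serialization behavior.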
  18. QA
      • Fail fast and early
      • SoapUI and Maven
  19. Organization Design
      • Project Manager
      • Business Analyst
      • Quality Assurance
      • Software Developer
      • Development Operations
  20. DevOps
      • Virtual hardware (VMware)
      • Puppet:
        - Puppet Master
        - Jenkins
      • Promote using config:
        - The same script runs in DEV as in PROD
  21. Real-time System
      [Diagram: Kafka queue(s) with offsets; Cassandra (C*) labeled A, B, C; ElasticSearch clusters ES1 and ES2; REST API]
  22. Storm
      • Guaranteed message processing (at-least-once semantics)
      • Well-designed processing abstraction
      • Beats BYODP
      • Momentum
  23. Storm and Cassandra
      • Use cases:
        - Write Storm tuple data to C*:
          · Computation results
          · Pre-computed indices
        - Read data from C* and emit Storm tuples:
          · Dynamic lookups
      • http://github.com/hmsonline/storm-cassandra
  24. Storm-Cassandra Project
      • ColumnsMapper interface:
        - Tells the CassandraLookupBolt how to transform a C* row into a Storm tuple
      • Given a C* row key and a list of columns:
        - Return a list of Storm tuples
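The ColumnsMapper contract can be sketched with plain Java types. Here a tuple is modeled as `List<Object>`, since Storm's `Values` class extends `ArrayList<Object>`. `RowToTuplesMapper`, `OneTuplePerColumnMapper` and `LookupSketch` are hypothetical names, not the exact storm-cassandra interfaces. The lookup helper plays the role the slide gives the CassandraLookupBolt: it reads a row by key and lets the mapper turn that row into tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper contract: row key + columns -> list of tuples.
interface RowToTuplesMapper {
    List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns);
}

// Example mapper: emit one (rowKey, columnName, columnValue) tuple per column.
class OneTuplePerColumnMapper implements RowToTuplesMapper {
    @Override
    public List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns) {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            List<Object> tuple = new ArrayList<>();
            tuple.add(rowKey);
            tuple.add(column.getKey());
            tuple.add(column.getValue());
            tuples.add(tuple);
        }
        return tuples;
    }
}

// Plays the lookup bolt's part: fetch the row by key, then delegate to the mapper.
class LookupSketch {
    static List<List<Object>> lookup(Map<String, Map<String, String>> table,
                                     String rowKey, RowToTuplesMapper mapper) {
        Map<String, String> columns = table.getOrDefault(rowKey, Map.of());
        return mapper.mapToTuples(rowKey, columns);
    }
}
```

Suppose the input is an index row whose column names are primary keys, such as a "Smith" row holding `John:Smith` and `Steve:Smith`. This mapper then emits one tuple per primary key, which a downstream bolt could use for the second pull.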
  25. Vision
      (Diagram labels: Engine, Data)
      • Unstructured data: unpredictable schema/layout; expand the data storage structure dynamically; fuzzy search
      • Graph database: traversing relationships; building connections; real-time relationship changes
      • Structured data: traditional database; predictable, logical structure; faceted search
      • Distributed processing: scalability; performance; processing power; virtual grow/shrink
  26. Summary
      • Cassandra-Indexing
      • Oracle Advanced Queue
      • Storm-Cassandra
  27. THE SCIENCE OF BETTER RESULTS
      www.healthmarketscience.com
      2700 Horizon Drive • King of Prussia, PA 19406 • 800.593.4467 • info@healthmarketscience.com
      Questions?
