Streaming Dimensional Models from 3NF Deltas in near real time

Integrate legacy systems by streaming updated dimensional models in near real time, in a schema-agnostic fashion.

Published in: Technology

  1. Streaming Dimensional Models from 3NF Deltas in near real time
     Scott Lin, Director of Engineering
     Copyright © 2017 QuintilesIMS. All rights reserved.
  2. Introduction (11/9/17)
     • Q: What is the business need?
       - All dependent systems should be able to access the most up-to-date information.
       - Each system should be able to pick and choose only the data areas it is interested in.
       - Example: up-to-date provider contact info.
     • Q: What is the solution?
       - Create a platform that sends the updated dimensional data (that is, de-normalized data) of your choosing to your system (DB) in near real time.
       - In short, when the data changes, we want to send you that piece of info and everything related to it.
  3. Architecture overview: message-driven system
     [Diagram. Labels shown: Mesosphere Platform; DCOS; Kafka Platform; Kafka Cluster; Subscription Manager; Subscription Manager REST API (Add Topics, Remove Topics, List Topics); Akka-HTTP; Schema Cacher; Dimension Assembler cluster; Dimension Data Synchronizer; Statistic Handler; ELK Cluster; ElasticSearch; Activity Logs; Performance Logs & Stats; App DB (Oracle); HDFS Cluster; HiveDB; GDM DB Reference Schema; Streaming data; GDM Compliant Table Topics; MetaData_Job Topics; MetaData Response Topics; Performance Stats; GDM Avro Messages.]
  4. Architecture: Where’s my delta?
  5. Architecture: process flow
     [Diagram. Labels shown: Kafka Platform; Kafka Cluster; Subscription Manager (DCOS); Schema Cacher (DCOS); Dimension Assembler cluster (DCOS); Dimension Data Synchronizer; HDFS Cluster; HiveDB; GDM DB Reference Schema; Streaming data; GDM Compliant Table Topics; MetaData_Job Topics; MetaData Response Topics; Performance Stats; GDM Avro Messages.]
  6. Architecture: schema agnostic, and the dimension model is data driven
     Source tables (3NF):
     • Provider: Provider_ID (PK), First, Last, Provider_Address_ID (FK), Provider_Type_ID (FK), NPI, Degree, ...
     • Provider_Address: Provider_Address_ID (PK), Facility_Type_ID (FK), Address1, Address2, Work_Number, Fax_Number
     • Provider_Type: Provider_Type_ID (PK), Provider_Type_Name, Provider_Type_Desc
     • Facility_Type: Facility_Type_ID (PK), Facility_Name, Facility_Desc
     Cardinality label shown on the diagram: 1..N
     Resulting dimension:
     • ProviderDim: Provider_ID (PK), First, Last, Provider_Address_ID, Provider_Type_ID, NPI, Degree, …
       - from Provider_Type: Provider_Type_Name, Provider_Type_Desc
       - from Provider_Address: Facility_Type_ID, Address1, Address2, Work_Number, Fax_Number
       - from Facility_Type: Facility_Name, Facility_Desc
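
The slide above does not show how the Dimension Assembler is implemented, so the following is only a minimal sketch of the general idea of a data-driven, schema-agnostic assembly: foreign-key metadata, rather than hard-coded joins, decides which related rows are folded into the dimension row. The table and column names come from the slide; the in-memory `TABLES` store, the `FOREIGN_KEYS` map, the `assemble` function, and all sample values are hypothetical and exist only for illustration.

```python
# Hypothetical in-memory stand-ins for the 3NF tables, keyed by primary key.
# Sample values are made up for illustration.
TABLES = {
    "Provider": {
        1: {"Provider_ID": 1, "First": "Ada", "Last": "Smith",
            "Provider_Address_ID": 10, "Provider_Type_ID": 100,
            "NPI": "0000000000", "Degree": "MD"},
    },
    "Provider_Address": {
        10: {"Provider_Address_ID": 10, "Facility_Type_ID": 7,
             "Address1": "1 Main St", "Address2": "",
             "Work_Number": "555-0100", "Fax_Number": "555-0101"},
    },
    "Provider_Type": {
        100: {"Provider_Type_ID": 100, "Provider_Type_Name": "Physician",
              "Provider_Type_Desc": "Licensed physician"},
    },
    "Facility_Type": {
        7: {"Facility_Type_ID": 7, "Facility_Name": "Clinic",
            "Facility_Desc": "Outpatient clinic"},
    },
}

# Assumed foreign-key metadata: table -> {fk_column: referenced_table}.
# In a schema-agnostic system this would be read from a schema catalog
# instead of being written by hand.
FOREIGN_KEYS = {
    "Provider": {
        "Provider_Address_ID": "Provider_Address",
        "Provider_Type_ID": "Provider_Type",
    },
    "Provider_Address": {"Facility_Type_ID": "Facility_Type"},
}


def assemble(table, key):
    """Return one flattened row by following foreign keys recursively."""
    row = dict(TABLES[table][key])
    for fk_column, ref_table in FOREIGN_KEYS.get(table, {}).items():
        ref_key = row.get(fk_column)
        if ref_key is None or ref_key not in TABLES[ref_table]:
            continue  # dangling or missing reference: keep what we have
        for column, value in assemble(ref_table, ref_key).items():
            # Do not overwrite columns already present on the parent row.
            row.setdefault(column, value)
    return row


provider_dim = assemble("Provider", 1)
```

With this sample data, `provider_dim` carries the same set of columns as ProviderDim on the slide. Adding a new related table would only require a new entry in the metadata map, which is the property the slide title points at.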
  7. Integration: topics management through REST APIs
     • List all active topics
       curl --header "token: cdtshei02d_RefStore_Tok" http://localhost:8082/subscriptions
     • Add or enable a topic
       curl -X POST --header "token: cdtshei02d_RefStore_Tok" -H "Content-Type: application/json" -d '{"topic": "Golden.IMS.SANDBOX.LOCATION.3NF.CDC.GDM_LOC"}' http://localhost:8082/subscriptions
     • Delete or disable a topic
       curl -X DELETE --header "token: cdtshei02d_RefStore_Tok" http://localhost:8082/subscriptions/Golden.IMS.SANDBOX.LOCATION.3NF.CDC.GDM_ACTVTY_CTR
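
For callers that prefer a script over curl, the three calls above could be wrapped roughly as follows. This is a sketch using only Python's standard library: the endpoint, the `token` header, and the JSON body shape are taken from the slide, while the function names and the placeholder token are assumptions. The functions only build the requests; nothing is sent until a request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8082/subscriptions"  # endpoint shown on the slide


def list_topics_request(token):
    """Build the GET request that lists all active topics."""
    return urllib.request.Request(
        BASE_URL, headers={"token": token}, method="GET")


def add_topic_request(token, topic):
    """Build the POST request that adds or enables a topic."""
    body = json.dumps({"topic": topic}).encode("utf-8")
    headers = {"token": token, "Content-Type": "application/json"}
    return urllib.request.Request(
        BASE_URL, data=body, headers=headers, method="POST")


def remove_topic_request(token, topic):
    """Build the DELETE request that deletes or disables a topic."""
    # Percent-encode the topic so it is safe as a single path segment.
    url = f"{BASE_URL}/{quote(topic, safe='')}"
    return urllib.request.Request(
        url, headers={"token": token}, method="DELETE")


# Usage (not executed here, since it needs a running Subscription Manager):
#   req = add_topic_request("<your-token>",
#                           "Golden.IMS.SANDBOX.LOCATION.3NF.CDC.GDM_LOC")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.status)
```

Note that `urllib.request.Request` normalizes header names to capitalized form (for example `Token`), which is harmless because HTTP header names are case-insensitive.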
  8. Process tracking. Subscription Manager: Kafka messages received by topic
  9. Process tracking: Assembler errors
  10. Process tracking: Avro messages generated by topic
  11. Conclusion: lessons learned
      1. Don’t go all-in on a particular DB technology unless you have run a data proof of concept with real data.
      2. Distributed backend processes are hard to debug, so capture critical statistics as part of the process output.
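
As one possible illustration of the second lesson, a processing stage can keep per-topic counters and emit them as a structured log line alongside its normal output. The deck does not describe how its Statistic Handler works, so the `StageStats` class, its field names, and the JSON layout below are all hypothetical; the sketch only shows the general pattern of making statistics part of the process output.

```python
import json
import time
from collections import Counter


class StageStats:
    """Hypothetical per-stage statistics collector (not from the deck)."""

    def __init__(self, stage):
        self.stage = stage
        self.received = Counter()  # messages seen, per topic
        self.errors = Counter()    # messages that failed, per topic
        self._started = time.monotonic()

    def record(self, topic, ok=True):
        """Count one message for a topic, and one error if it failed."""
        self.received[topic] += 1
        if not ok:
            self.errors[topic] += 1

    def snapshot(self):
        """Return the current counters as a plain dictionary."""
        return {
            "stage": self.stage,
            "elapsed_s": round(time.monotonic() - self._started, 3),
            "received": dict(self.received),
            "errors": dict(self.errors),
        }

    def to_json_line(self):
        """Serialize the snapshot as one JSON line for a log shipper."""
        return json.dumps(self.snapshot(), sort_keys=True)


stats = StageStats("assembler")
stats.record("Golden.IMS.SANDBOX.LOCATION.3NF.CDC.GDM_LOC")
stats.record("Golden.IMS.SANDBOX.LOCATION.3NF.CDC.GDM_LOC", ok=False)
line = stats.to_json_line()
```

Emitting one JSON object per line keeps the output easy to index in a log search tool, which fits the per-topic tracking charts shown in the process tracking slides.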
  12. Thank you! scott.lin@quintilesims.com
