Rethinking RDBMS Data Migration: Modernizing Traditional ETL Processes to Cloud-Native Event Driven Microservices

Published in: Technology

  1. Rethinking RDBMS Data Migration: Modernizing Traditional ETL Processes to Cloud-Native Event Driven Microservices
     Anupama Pradhan, Sr Technology Architect, HCSC
     Jeff Cherng, Advisory Data Engineer, Pivotal
     © 2016 Pivotal
  2. Traditional ETL
     • Built for analytics and reporting
     • Typically a batch process
        • Stale data
        • Shrinking batch window
     • High CPU usage
     • Monolithic
     • Long development cycles
     • Testing challenges
     • Specialized skill set needed for operations
  3. Modern ETL Needs
     • Near-real-time ETL processing
     • Scalability
     • Testability
     • Agility
  4. Simple ETL Process
     [Diagram labels: Source DB, ETL Process, Destination]
     metaData  StatusCd
     1         10
     2         10
     3         10
  5. Event Driven ETL
     • State must be managed.
     • Events are read in batches so that the rest of the process can scale.
     • The Read Events step must run as a single instance, so it cannot scale horizontally.
     [Diagram labels: ETL Process (Extract, Transform, Load), Source DB, Event Table (metadata), Read Events, Update Status, Target]
     metaData  StatusCd
     1         10
     2         10
     3         10
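The "read events in batches, then update status" step on this slide can be sketched roughly as follows. This is a minimal illustration using an in-memory SQLite table, not the presenters' implementation: the table name `event`, the function `read_event_batch`, and the in-progress status value `20` are hypothetical. Only the status value `10` appears on the slide, and reading it as "pending" is an assumption.

```python
import sqlite3

PENDING = 10      # status value shown on the slide; "pending" is an assumed meaning
IN_PROGRESS = 20  # hypothetical value, not taken from the slides


def read_event_batch(conn, batch_size):
    """Claim up to batch_size pending events and mark them in progress."""
    rows = conn.execute(
        "SELECT id FROM event WHERE status_cd = ? ORDER BY id LIMIT ?",
        (PENDING, batch_size),
    ).fetchall()
    ids = [row[0] for row in rows]
    with conn:  # commit all status updates for the batch together
        conn.executemany(
            "UPDATE event SET status_cd = ? WHERE id = ?",
            [(IN_PROGRESS, event_id) for event_id in ids],
        )
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, status_cd INTEGER NOT NULL)")
conn.executemany("INSERT INTO event VALUES (?, ?)", [(1, 10), (2, 10), (3, 10)])
conn.commit()

first_batch = read_event_batch(conn, 2)   # claims events 1 and 2
second_batch = read_event_batch(conn, 2)  # only event 3 is still pending
```

Because the reader selects and then updates, two concurrent readers could claim the same rows. That is one way to see why the slide requires Read Events to run as a single instance.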
  6. ETL Process: Distributed
     • Independently scalable
     • Microservice architecture over messaging middleware
     [Diagram labels: Source DB, Event Table, Read Events, Update Status, Processor (x2), Extract, Transform, Target]
     SrcId  FunctionUnit  StatusCd
     1      grp1          10
     2      grp2          10
     3      grp3          10
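To make "independently scalable" stages over messaging middleware concrete, here is a small in-process sketch. It assumes `queue.Queue` as a stand-in for the message broker and threads as a stand-in for separately deployed services. The helper names (`run_stage`, `stop_stage`) and the `extract` and `transform` bodies are hypothetical and are not from the talk.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker to shut down


def run_stage(handler, inbox, outbox, workers):
    """Start `workers` threads that apply handler to each message from inbox."""
    def loop():
        while True:
            msg = inbox.get()
            if msg is STOP:
                break
            outbox.put(handler(msg))

    threads = [threading.Thread(target=loop) for _ in range(workers)]
    for thread in threads:
        thread.start()
    return threads


def stop_stage(threads, inbox):
    """Send one STOP per worker, then wait for the stage to drain."""
    for _ in threads:
        inbox.put(STOP)
    for thread in threads:
        thread.join()


def extract(src_id):  # hypothetical extract step
    return {"src_id": src_id, "name": f"customer-{src_id}"}


def transform(record):  # hypothetical transform step
    return {**record, "name": record["name"].upper()}


events, extracted, loaded = queue.Queue(), queue.Queue(), queue.Queue()
extractors = run_stage(extract, events, extracted, workers=3)      # scaled to 3
transformers = run_stage(transform, extracted, loaded, workers=2)  # scaled to 2

for src_id in (1, 2, 3):  # the reader publishes one message per event
    events.put(src_id)

stop_stage(extractors, events)
stop_stage(transformers, extracted)

results = sorted((loaded.get() for _ in range(3)), key=lambda r: r["src_id"])
```

Each stage only knows its inbox and outbox, so the worker count of one stage can change without touching the others. That decoupling is the property the slide attributes to the distributed design.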
  7. Spring Cloud Stream
     • Framework for building message-driven microservice applications
     • Based on the Spring Integration framework
     • Provides an abstraction over messaging middleware
     • Provides out-of-the-box configuration
     • Cloud-native friendly
     • Easy continuous integration and continuous delivery
        • Each component can be tested independently.
     [Diagram labels: Source, Processor, Extract, Transform, Sink]
  8. ETL Process with Spring Cloud Stream
     • Independently scalable
     • Independently deployable
     [Diagram labels: Source DB, Event Table, Read Events, Update Status, Source, Processor (x3), Extract, Transform, Sink, Target]
     SrcId  FunctionUnit  StatusCd
     1      grp1          10
     2      grp2          10
     3      grp3          10
  9. Error Handling and Logging
     • Out-of-the-box error queue configuration
     • Asynchronous logging for message traceability
     [Diagram labels: Source DB, Event Table, Read Events, Update Status (x2), Source, Processor (x3), Extract, Transform, Sink (x2), Error Queue, Target]
     SrcId  FunctionUnit  StatusCd
     1      grp1          10
     2      grp2          10
     3      grp3          10
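The error-queue idea can be illustrated with a short sketch: a failing message is diverted to a separate queue along with the reason, and the remaining messages continue. This only shows the routing pattern. It is not Spring Cloud Stream's configuration, and `process_with_error_queue`, `transform`, and the message fields are hypothetical.

```python
from collections import deque


def process_with_error_queue(messages, handler, error_queue):
    """Apply handler to each message; divert failures to error_queue."""
    results = []
    for msg in messages:
        try:
            results.append(handler(msg))
        except Exception as exc:
            # Keep the original message and the failure reason for later handling.
            error_queue.append({"message": msg, "error": str(exc)})
    return results


def transform(msg):  # hypothetical transform that rejects bad input
    if msg["amount"] < 0:
        raise ValueError("negative amount")
    return {**msg, "amount_cents": msg["amount"] * 100}


errors = deque()
ok = process_with_error_queue(
    [{"src_id": 1, "amount": 5}, {"src_id": 2, "amount": -1}],
    transform,
    errors,
)
```

A consumer of `errors` could then play the role of the second sink in the diagram and write a failure status back to the event table.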
  10. ETL Flexibility
     • Easily extensible to multiple sources
     [Diagram labels: Source, File Source, Event Table, Processor (x4), Extract, Transform (x3), Sink, Target]
     SrcId  FunctionUnit  StatusCd
     1      grp1          10
     2      grp2          10
     3      grp3          10
  11. Extract Event Table Design
     ID  SRC_ID  FNCT_UNIT  STATUS_CD  ACTN_CD  ATMPT_CNT
     1   10001   CUST       10         A        0
     2   10021   CUST       10         D        0
     3   20001   ORD        10         A        0
     • SRC_ID: source key for the extract
     • FNCT_UNIT: extract group
     • STATUS_CD: status code for the extract
     • ACTN_CD: action code
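As a rough illustration of this design, the table can be sketched in SQLite. The column names and sample rows come from the slide. The table name `extract_event`, the SQL types, the reading of `ATMPT_CNT` as an attempt counter, and the helper `events_for_unit` are assumptions.

```python
import sqlite3

# Column names are from the slide; the types and table name are assumptions.
SCHEMA = """
CREATE TABLE extract_event (
    ID         INTEGER PRIMARY KEY,
    SRC_ID     INTEGER NOT NULL,           -- source key for the extract
    FNCT_UNIT  TEXT    NOT NULL,           -- extract group
    STATUS_CD  INTEGER NOT NULL,           -- status code for the extract
    ACTN_CD    TEXT    NOT NULL,           -- action code
    ATMPT_CNT  INTEGER NOT NULL DEFAULT 0  -- assumed to count attempts
)
"""

SAMPLE_ROWS = [
    (1, 10001, "CUST", 10, "A", 0),
    (2, 10021, "CUST", 10, "D", 0),
    (3, 20001, "ORD", 10, "A", 0),
]


def create_event_table(conn):
    """Create the event table and load the sample rows from the slide."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO extract_event VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    conn.commit()


def events_for_unit(conn, fnct_unit, status_cd=10):
    """Return (ID, SRC_ID, ACTN_CD) for one extract group and status."""
    return conn.execute(
        "SELECT ID, SRC_ID, ACTN_CD FROM extract_event "
        "WHERE FNCT_UNIT = ? AND STATUS_CD = ? ORDER BY ID",
        (fnct_unit, status_cd),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_event_table(conn)
cust_events = events_for_unit(conn, "CUST")
```

Filtering by `FNCT_UNIT` is what lets each extract group be routed to its own processor.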
  12. End-to-End ETL Process
     • Change Data Capture (CDC)
     • Initial load and incremental changes
     • Synthetic events for a full load
     • Same process for full and incremental loads
     [Diagram label: CDC]
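One way to read "synthetic events for a full load" is to generate an event row for every source key, so that a full load passes through the same pipeline as CDC-driven incremental changes. The sketch below assumes the event-table columns from the previous slide. The function name `synthetic_events` and the defaults (status `10`, action code `"A"`) are hypothetical choices, not values the talk specifies for this purpose.

```python
def synthetic_events(source_keys, fnct_unit, status_cd=10, actn_cd="A"):
    """Build one event row per source key so a full load looks like CDC events."""
    return [
        {
            "SRC_ID": key,
            "FNCT_UNIT": fnct_unit,
            "STATUS_CD": status_cd,
            "ACTN_CD": actn_cd,
            "ATMPT_CNT": 0,
        }
        for key in source_keys
    ]


# Hypothetical full load of two customer keys.
full_load = synthetic_events([10001, 10021], "CUST")
```

Since the rows have the same shape as captured change events, no separate full-load code path is needed downstream.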
  13. Spring Cloud Stream (SCS) on Pivotal Cloud Foundry
     • RabbitMQ
     • No special skills needed for production operations
     • Ease of scaling
     • Automatic recovery
     • Monitoring
     • CI/CD
  14. Demo
  15. Learn More. Stay Connected.
     • anupama_pradhan@bcbsil.com
     • jcherng@pivotal.io
     • https://github.com/jcherng-pivotal/scs-etl-demo
     • http://cloud.spring.io/spring-cloud-stream-app-starters/
     Twitter: twitter.com/springcentral
     YouTube: spring.io/video
     LinkedIn: spring.io/linkedin
     Google Plus: spring.io/gplus
