This presentation demystifies the problem of moving a major portion of a company’s technological infrastructure from an unsustainable legacy system to a newer, more scalable solution. The goal is to show that with the proper approach, such a transition can be undertaken successfully and without assuming unmanageable risk. To this end, we will share anecdotes from Conductor’s own endeavor to replace an immense relational datastore and a corresponding data ETL process which, together, form a crucial component of our application that nonetheless cannot be supported indefinitely. The presentation will cover the process of deciding to redesign a large component of your tech stack, along with best practices for perceiving and mitigating risk and ultimately securing performance and cost improvements.
2. 2
· BDPA Los Angeles Chapter
· 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone
WHO AM I?
3. 3
· Web Presence Management
· SaaS
· Big data
· Collect 6 TB of raw web data a week
· Scalable Collection & ETL pipelines
· Final Product: reports
· 6 years running
· Tons of data!
CONDUCTOR, INC.
4. 4
· Growth
· More users
· More data
· Systems have to keep up!
WHY WE CARE ABOUT SCALABILITY
7. 7
· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
· Can require massive changes
· No cure-all
SCALABILITY IN THE REAL WORLD
8. 8
· Save money
· Improve performance
· Clear the way for progress
WHY REPLACE AN UNSCALABLE SYSTEM?
9. 9
· If it ain’t broke…
· Significant Resource Investment
· Time
· Money
· Software Downtime
· Data Quality Concerns
WHY NOT?
10. 10
1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system
· while minimizing risk and cost
Simple, no???
YOUR TASK, AT A GLANCE
12. 12
· MySQL
· Normalized data model
· Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine
Overview
CASE STUDY: LEGACY REPORTING DATABASE
Talking about the Elephant: Diagnosing an Unscalable System
13. 13
· Powerful hardware isn’t cheap.
· Vertical Scaling
· Obsolete Schema
· Difficult to back up
· Queries aren’t getting any faster.
Unsustainable
CASE STUDY: LEGACY REPORTING DATABASE
14. 14
· If your solution…
· Scales vertically
· Prevents progress
· Can’t perform at scale
· Is difficult/slow/expensive to upgrade
…It’s time for a change!
SEE FOR YOURSELF
22. 22
· Time Frame
· Scheduling Constraints
· Operational Cost
· Resource Constraints
· Standards for data parity
INITIAL CONSIDERATIONS
Moving the Elephant: Migrating Legacy Data to the New System
23. 23
· Two-month finish line
· Developed COGS models
· Built data validation software
CASE STUDY: OUR UPFRONT PLANNING
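The data validation step above can be pictured as a parity check between the legacy and migrated stores. The following is a minimal sketch, not Conductor's actual validation software: the row dictionaries, the `id` key column, and the function names `row_fingerprint` and `parity_report` are all hypothetical, chosen only for illustration.

```python
import hashlib
from typing import Dict, Iterable, List

Row = Dict[str, object]


def row_fingerprint(row: Row) -> str:
    """Hash a row's column/value pairs in a stable, order-independent way."""
    canonical = "|".join(f"{key}={row[key]!r}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(
    legacy_rows: Iterable[Row],
    migrated_rows: Iterable[Row],
    key: str = "id",  # hypothetical primary-key column
) -> Dict[str, List[object]]:
    """Compare two row sets and list keys that are missing, unexpected, or changed."""
    legacy = {row[key]: row_fingerprint(row) for row in legacy_rows}
    migrated = {row[key]: row_fingerprint(row) for row in migrated_rows}
    return {
        # Present in the legacy system but absent after migration.
        "missing": sorted(k for k in legacy if k not in migrated),
        # Present after migration but never in the legacy system.
        "unexpected": sorted(k for k in migrated if k not in legacy),
        # Present in both, but with different contents.
        "mismatched": sorted(
            k for k in legacy if k in migrated and legacy[k] != migrated[k]
        ),
    }


# Made-up sample data, for illustration only.
legacy_rows = [{"id": 1, "visits": 10}, {"id": 2, "visits": 7}, {"id": 3, "visits": 4}]
migrated_rows = [{"id": 1, "visits": 10}, {"id": 2, "visits": 8}, {"id": 4, "visits": 1}]
report = parity_report(legacy_rows, migrated_rows)
print(report)
```

A report with three empty lists would meet a strict parity standard; a looser standard might tolerate a bounded number of mismatches, which is the kind of threshold worth agreeing on before the migration starts.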
24. 24
· Can be scaled up or down
· Speed up to save time
· Slow down to save resources
· Can be run in a testing capacity
· Configurable data sources/sinks
· Configurable hardware resource use
IDEAL MIGRATION SOFTWARE CHARACTERISTICS
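The characteristics listed above (pluggable sources and sinks, a speed/resource knob, and a testing mode) could be sketched roughly as follows. This is an illustrative assumption, not the tooling described in this talk: `MigrationConfig`, `run_migration`, `batch_size`, and `dry_run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class MigrationConfig:
    read_rows: Callable[[], Iterable[Row]]   # configurable data source
    write_rows: Callable[[List[Row]], None]  # configurable data sink
    batch_size: int = 1000                   # larger: faster; smaller: lighter on resources
    dry_run: bool = False                    # testing capacity: read and count, write nothing


def run_migration(config: MigrationConfig) -> int:
    """Copy rows from source to sink in batches; return the number of rows processed."""
    processed = 0
    batch: List[Row] = []

    def flush() -> None:
        nonlocal processed
        if not batch:
            return
        if not config.dry_run:
            config.write_rows(list(batch))
        processed += len(batch)
        batch.clear()

    for row in config.read_rows():
        batch.append(row)
        if len(batch) >= config.batch_size:
            flush()
    flush()  # write any final partial batch
    return processed


# In-memory stand-ins for a real source and sink.
source: List[Row] = [{"id": i} for i in range(5)]
sink: List[Row] = []

dry_count = run_migration(MigrationConfig(lambda: source, sink.extend, dry_run=True))
sink_after_dry_run = list(sink)
real_count = run_migration(MigrationConfig(lambda: source, sink.extend, batch_size=2))
```

Because the source and sink are plain callables, the same code path can be pointed at a QA environment first and at production later without modification.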
25. 25
· Oozie and Hive
· Controllable time/resource tradeoff
· Testable in a QA environment
OUR MIGRATION SOFTWARE
26. 26
· Easy to track progress
· Enables concurrency
· Dilutes failure risks
· E.g. Conductor “Time Periods”
AN INCREMENTAL MIGRATION: PARTITIONING DATA
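To make the partitioning benefits above concrete, here is a minimal sketch of migrating independent partitions concurrently while tracking which ones succeeded. It assumes each partition can be migrated on its own; the period labels such as `2014-W01`, and the names `migrate_partitions` and `fake_migrate`, are hypothetical and do not reflect Conductor's actual "Time Periods" implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List


def migrate_partitions(
    partitions: Iterable[str],
    migrate_one: Callable[[str], None],
    max_workers: int = 4,
) -> Dict[str, List[str]]:
    """Migrate each partition independently and report which succeeded or failed."""
    done: List[str] = []
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, p): p for p in partitions}
        for future in as_completed(futures):
            partition = futures[future]
            try:
                future.result()
                done.append(partition)
            except Exception:
                # A failure is confined to this one partition; the rest continue.
                failed.append(partition)
    return {"done": sorted(done), "failed": sorted(failed)}


def fake_migrate(period: str) -> None:
    """Stand-in for real migration work; one partition fails on purpose."""
    if period == "2014-W03":
        raise RuntimeError("simulated failure")


periods = ["2014-W01", "2014-W02", "2014-W03", "2014-W04"]
result = migrate_partitions(periods, fake_migrate, max_workers=2)

# Only the failed partition needs to be retried, not the whole migration.
retry = migrate_partitions(result["failed"], lambda period: None)
```

The `done` list doubles as a progress indicator, and `max_workers` is one place to trade speed against resource use.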
27. 27
· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· E.g. Conductor Searchlight 3.0 Beta Program
AN INCREMENTAL RELEASE