Data Migration at Scale
MOVING THE ELEPHANT IN THE ROOM
2
· BDPA Los Angeles Chapter
· 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone
WHO AM I?
3
· Web Presence Management
· SaaS
· Big data
· Collect 6TB of raw web data a week
· Scalable Collection & ETL pipelines
· Final Product: reports
· 6 years running
· Tons of data!
CONDUCTOR, INC.
4
· Growth
· More users
· More data
· Systems have to keep up!
WHY WE CARE ABOUT SCALABILITY
5
HORIZONTAL SCALING
6
VERTICAL SCALING
7
· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
· Can require massive changes
· No cure-all
SCALABILITY IN THE REAL WORLD
8
· Save money
· Improve performance
· Clear the way for progress
WHY REPLACE AN UNSCALABLE SYSTEM?
9
· If it ain’t broke…
· Significant Resource Investment
· Time
· Money
· Software Downtime
· Data Quality Concerns
WHY NOT?
10
1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system
· while minimizing risk and cost
Simple, no???
YOUR TASK, AT A GLANCE
TALKING ABOUT THE ELEPHANT
Identifying an Unscalable System
12
· MySQL
· Normalized data model
· Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine
Overview
CASE STUDY: LEGACY REPORTING DATABASE
Talking about the Elephant: Diagnosing an Unscalable System
13
· Powerful hardware isn’t cheap.
· Vertical Scaling
· Obsolete Schema
· Difficult to back up
· Queries aren’t getting any faster.
Unsustainable
CASE STUDY: LEGACY REPORTING DATABASE
14
· If your solution…
· Scales vertically
· Prevents progress
· Can’t perform at scale
· Is difficult/slow/expensive to upgrade
…It’s time for a change!
SEE FOR YOURSELF
FINDING A BIGGER ROOM
Vetting Scalable Alternatives
16
· Price-efficient
· Easy to maintain
· Scales Horizontally
WHAT TO LOOK FOR
Finding a Bigger Room: Vetting Scalable Alternatives
17
· Write once, read many
· De-normalized reports
· High storage capacity
· High Availability
Our Use Case
CASE STUDY: AWS S3 DATASTORE
18
· Write once, read many
· Decent write performance, great read performance
· De-normalized reports
· Flat files
· High storage capacity
· No defined space limit
· High Availability
· Configurable file replication
Technical Overview
CASE STUDY: AWS S3 DATASTORE
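As a rough illustration of the "write once, read many" flat-file layout above, here is a minimal sketch. It assumes a plain dict stands in for the object store, and the key scheme, `report_key`, `write_once`, and `read_report` are hypothetical names for illustration, not Conductor's actual layout or the S3 API.

```python
import csv
import io


def report_key(account_id, time_period, report_name):
    """Build a deterministic key for one de-normalized report (hypothetical scheme)."""
    return f"reports/account={account_id}/period={time_period}/{report_name}.csv"


def write_once(store, key, rows):
    """Serialize rows to a flat CSV file and store it, refusing to overwrite."""
    if key in store:
        raise FileExistsError(f"{key} has already been written")
    if not rows:
        raise ValueError("refusing to write an empty report")
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    store[key] = buf.getvalue()  # `store` is a dict standing in for the object store


def read_report(store, key):
    """Read a flat file back as a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(store[key])))
```

Because every report lives under a key derived from its account and time period, readers can fetch exactly one file without any joins, which is the point of de-normalizing.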
19
· Cheap
· Cloud-based
· Architecture facilitates testing
· Easy to back up
Benefits
CASE STUDY: AWS S3 DATASTORE
20
· “Eventual Consistency”
· Switching to non-relational storage is nontrivial
· Application code must change
· Migration path gets complicated
Caveats
CASE STUDY: AWS S3 DATASTORE
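One common way application code adapts to eventual consistency is to retry a read that comes back missing. The sketch below is an assumption about how that might look, not the approach described in the talk: `fetch` is a hypothetical callable that raises `KeyError` while an object is not yet visible, and the attempt count and delays are placeholders.

```python
import time


def read_with_retry(fetch, key, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch(key) until it stops raising KeyError, backing off exponentially."""
    for attempt in range(attempts):
        try:
            return fetch(key)
        except KeyError:
            if attempt == attempts - 1:
                raise  # still missing after the final attempt
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.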
MOVING THE ELEPHANT
Migrating Legacy Data to the New System
22
· Time Frame
· Scheduling Constraints
· Operational Cost
· Resource Constraints
· Standards for data parity
INITIAL CONSIDERATIONS
Moving the Elephant: Migrating Legacy Data to the New System
23
· Two-month finish line
· Developed COGS models
· Built data validation software
CASE STUDY: OUR UPFRONT PLANNING
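The talk does not show the validation software itself. As a minimal sketch of one way to check data parity, the code below compares order-independent row digests between the legacy and migrated copies. `row_digest` and `parity_report` are hypothetical names, and values are compared as strings on the assumption that flat files lose type information.

```python
import hashlib
from collections import Counter


def row_digest(row):
    """Hash a row's fields in sorted key order, comparing values as strings."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(legacy_rows, migrated_rows):
    """Summarize rows missing from, or unexpected in, the migrated copy."""
    legacy = Counter(row_digest(r) for r in legacy_rows)
    migrated = Counter(row_digest(r) for r in migrated_rows)
    missing = legacy - migrated      # in legacy but not migrated
    unexpected = migrated - legacy   # in migrated but not legacy
    return {
        "legacy_count": sum(legacy.values()),
        "migrated_count": sum(migrated.values()),
        "missing": sum(missing.values()),
        "unexpected": sum(unexpected.values()),
        "in_parity": not missing and not unexpected,
    }
```

Using a multiset of digests rather than a set means duplicated or dropped duplicate rows are also detected.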
24
· Can be scaled up or down
· Speed up to save time
· Slow down to save resources
· Can be run in a testing capacity
· Configurable data sources/sinks
· Configurable hardware resource use
IDEAL MIGRATION SOFTWARE CHARACTERISTICS
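To make "configurable data sources/sinks" and "can be run in a testing capacity" concrete, here is a small sketch. `MigrationConfig`, `run_migration`, `max_rows`, and `dry_run` are hypothetical names chosen for illustration; they are not taken from the talk or from any particular tool.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class MigrationConfig:
    source: Callable[[], Iterable[dict]]  # yields rows from the legacy system
    sink: Callable[[dict], None]          # writes one row to the new system
    max_rows: Optional[int] = None        # cap the run, e.g. for a test pass
    dry_run: bool = False                 # read and count, but write nothing


def run_migration(config):
    """Stream rows from source to sink and return how many were processed."""
    processed = 0
    for row in config.source():
        if config.max_rows is not None and processed >= config.max_rows:
            break
        if not config.dry_run:
            config.sink(row)
        processed += 1
    return processed
```

Swapping the `source` and `sink` callables lets the same code run against a QA copy before it touches production data.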
25
· Oozie and Hive
· Controllable time/resource tradeoff
· Testable in a QA environment
OUR MIGRATION SOFTWARE
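The time/resource tradeoff can be reasoned about with simple arithmetic. The sketch below is a back-of-the-envelope model under stated assumptions: partitions are equally sized, each worker handles one partition at a time, and all workers stay reserved until the last wave finishes. `estimate_run` and its parameters are hypothetical and do not come from Oozie or Hive.

```python
import math


def estimate_run(partitions, minutes_per_partition, workers):
    """Estimate wall-clock time and reserved worker-hours for a migration run."""
    waves = math.ceil(partitions / workers)  # rounds of parallel work
    wall_clock_hours = waves * minutes_per_partition / 60
    return {
        "waves": waves,
        "wall_clock_hours": wall_clock_hours,
        # Workers are assumed reserved for the whole run, idle or not.
        "worker_hours": workers * wall_clock_hours,
    }
```

Under these assumptions, adding workers shortens the calendar time while raising peak resource use, and a worker count that does not divide the partition count evenly leaves some capacity idle in the final wave.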
26
· Easy to track progress
· Enables concurrency
· Dilutes failure risks
· E.g. Conductor “Time Periods”
AN INCREMENTAL MIGRATION: PARTITIONING DATA
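A partitioned migration might be driven as in the sketch below, which assumes each time period can be migrated independently. `migrate_partitions` and the `migrate_one` callable are hypothetical, and a thread pool stands in for whatever scheduler actually runs the jobs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_partitions(periods, migrate_one, max_workers=4):
    """Migrate each period concurrently; return (finished periods, failures)."""
    done = []
    failed = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, period): period for period in periods}
        for future in as_completed(futures):
            period = futures[future]
            try:
                future.result()
                done.append(period)
            except Exception as exc:  # one bad partition must not stop the rest
                failed[period] = exc
    return sorted(done), failed
```

Progress is simply the size of `done` against the number of periods, and only the keys of `failed` need to be re-run, which is how partitioning dilutes the risk of a single failure.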
27
· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· E.g. Conductor Searchlight 3.0 Beta Program
AN INCREMENTAL RELEASE
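One way to limit client exposure during an incremental release is to route only a fixed share of accounts to the new datastore. The sketch below assumes a hash-based percentage rollout. It is not a description of how the Searchlight 3.0 beta was run, and `in_beta`, `choose_backend`, and the backend names are hypothetical.

```python
import hashlib


def in_beta(account_id, rollout_percent):
    """Deterministically place an account into one of 100 buckets."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


def choose_backend(account_id, rollout_percent):
    """Pick which store serves reports for this account."""
    if in_beta(account_id, rollout_percent):
        return "new_datastore"
    return "legacy_database"
```

Because the bucket depends only on the account ID, an account that enters the beta at 10% stays in it as the percentage grows.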
28
YOU CAN DO IT!
29
QUESTIONS?
Thanks for Listening!
30
(We’re Hiring!)

More Related Content

What's hot

MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7   Bigger Faster Data by Andrew Hood and Cameron Gray from LynchpinMeasureCamp 7   Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from LynchpinLynchpin Analytics Consultancy
 
Monitoring and scaling postgres at datadog
Monitoring and scaling postgres at datadogMonitoring and scaling postgres at datadog
Monitoring and scaling postgres at datadogSeth Rosenblum
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containerskbajda
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureJoey Bolduc-Gilbert
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisParis Data Engineers !
 
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...InfluxData
 
goto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Checkgoto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in CheckCoburn Watson
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15Zhenxiao Luo
 
Streaming options in the wild
Streaming options in the wildStreaming options in the wild
Streaming options in the wildAtif Akhtar
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWSPaolo latella
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in UberC4Media
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
How to Enable Industrial Decarbonization with Node-RED and InfluxDB
How to Enable Industrial Decarbonization with Node-RED and InfluxDBHow to Enable Industrial Decarbonization with Node-RED and InfluxDB
How to Enable Industrial Decarbonization with Node-RED and InfluxDBInfluxData
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogC4Media
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Zhenxiao Luo
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubolekbajda
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingFlink Forward
 
Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Petr Zapletal
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Coburn Watson
 

What's hot (20)

MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7   Bigger Faster Data by Andrew Hood and Cameron Gray from LynchpinMeasureCamp 7   Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
MeasureCamp 7 Bigger Faster Data by Andrew Hood and Cameron Gray from Lynchpin
 
Monitoring and scaling postgres at datadog
Monitoring and scaling postgres at datadogMonitoring and scaling postgres at datadog
Monitoring and scaling postgres at datadog
 
Presto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix ContainersPresto Summit 2018 - 04 - Netflix Containers
Presto Summit 2018 - 04 - Netflix Containers
 
Case Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa ArchitectureCase Study: Stream Processing on AWS using Kappa Architecture
Case Study: Stream Processing on AWS using Kappa Architecture
 
Building highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin FrançoisBuilding highly reliable data pipeline @datadog par Quentin François
Building highly reliable data pipeline @datadog par Quentin François
 
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
Using InfluxDB for Full Observability of a SaaS Platform by Aleksandr Tavgen,...
 
Realtime search
Realtime searchRealtime search
Realtime search
 
goto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Checkgoto; London: Keeping your Cloud Footprint in Check
goto; London: Keeping your Cloud Footprint in Check
 
presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15presto-at-netflix-hadoop-summit-15
presto-at-netflix-hadoop-summit-15
 
Streaming options in the wild
Streaming options in the wildStreaming options in the wild
Streaming options in the wild
 
Data Analysis on AWS
Data Analysis on AWSData Analysis on AWS
Data Analysis on AWS
 
Stream Processing in Uber
Stream Processing in UberStream Processing in Uber
Stream Processing in Uber
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
How to Enable Industrial Decarbonization with Node-RED and InfluxDB
How to Enable Industrial Decarbonization with Node-RED and InfluxDBHow to Enable Industrial Decarbonization with Node-RED and InfluxDB
How to Enable Industrial Decarbonization with Node-RED and InfluxDB
 
Elastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @DatadogElastic Data Analytics Platform @Datadog
Elastic Data Analytics Platform @Datadog
 
Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15Presto@Netflix Presto Meetup 03-19-15
Presto@Netflix Presto Meetup 03-19-15
 
Presto Summit 2018 - 10 - Qubole
Presto Summit 2018  - 10 - QubolePresto Summit 2018  - 10 - Qubole
Presto Summit 2018 - 10 - Qubole
 
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at KingGyula Fóra - RBEA- Scalable Real-Time Analytics at King
Gyula Fóra - RBEA- Scalable Real-Time Analytics at King
 
Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019Change Data Capture - Scale by the Bay 2019
Change Data Capture - Scale by the Bay 2019
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
 

Viewers also liked

Addressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyAddressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyRay Killebrew
 
The elephant in the room. discussion
The elephant in the room. discussionThe elephant in the room. discussion
The elephant in the room. discussionAndrew Gelston
 
asteRISK
asteRISKasteRISK
asteRISKkrnmcg
 
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCElephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCMike Lewis
 
Risk: the Elephant in the Room
Risk: the Elephant in the RoomRisk: the Elephant in the Room
Risk: the Elephant in the RoomLast Call Media
 
Strategy - The elephant in the room
Strategy - The elephant in the roomStrategy - The elephant in the room
Strategy - The elephant in the roomIIBA UK Chapter
 
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...ad:tech London, MMS & iMedia
 
The elephant in the room mongo db + hadoop
The elephant in the room  mongo db + hadoopThe elephant in the room  mongo db + hadoop
The elephant in the room mongo db + hadoopiammutex
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterRobert H. McDonald
 
CMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomCMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomScott Liewehr
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the roomJohn Gillis
 
The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013Howard Cooke
 
RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...Centre for Distance Education
 
Kanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timeKanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timejsonnevelt
 
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!Mitch Jackson
 

Viewers also liked (20)

Addressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content StrategyAddressing the Elephant in the Room - Content Strategy
Addressing the Elephant in the Room - Content Strategy
 
The elephant in the room. discussion
The elephant in the room. discussionThe elephant in the room. discussion
The elephant in the room. discussion
 
asteRISK
asteRISKasteRISK
asteRISK
 
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYCElephant in the Room: Social Media ROI - WEB 2.0 NYC
Elephant in the Room: Social Media ROI - WEB 2.0 NYC
 
Risk: the Elephant in the Room
Risk: the Elephant in the RoomRisk: the Elephant in the Room
Risk: the Elephant in the Room
 
Elephant in Room Version 2
Elephant in Room Version 2Elephant in Room Version 2
Elephant in Room Version 2
 
Strategy - The elephant in the room
Strategy - The elephant in the roomStrategy - The elephant in the room
Strategy - The elephant in the room
 
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOMELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
ELEARNING IN ART AND DESIGN: THE ELEPHANT IN THE ROOM
 
YUI The Elephant In The Room
YUI The Elephant In The RoomYUI The Elephant In The Room
YUI The Elephant In The Room
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the room
 
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
Lance Concannon, Sysomos: Simplifiying social - How marketers can manage the ...
 
The elephant in the room mongo db + hadoop
The elephant in the room  mongo db + hadoopThe elephant in the room  mongo db + hadoop
The elephant in the room mongo db + hadoop
 
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research CenterElephant in the Room: Scaling Storage for the HathiTrust Research Center
Elephant in the Room: Scaling Storage for the HathiTrust Research Center
 
CMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the RoomCMS Expo 2011 Keynote - The Elephant in the Room
CMS Expo 2011 Keynote - The Elephant in the Room
 
G!
G!G!
G!
 
The elephant in the room
The elephant in the roomThe elephant in the room
The elephant in the room
 
The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013The Elephant In The Room - Research Report 31 July 2013
The Elephant In The Room - Research Report 31 July 2013
 
RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...RIDE 2011: Student dropout – the elephant in the room of distance education (...
RIDE 2011: Student dropout – the elephant in the room of distance education (...
 
Kanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a timeKanban. Dealing with the elephant in the room. One chunk at a time
Kanban. Dealing with the elephant in the room. One chunk at a time
 
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
How to Tame the Elephant in the Room- 6 steps to build trust and close deals!
 

Similar to Moving the Elephant in the Room: Data Migration at Scale

Cassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeCassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeMarc Fielding
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL OverviewTu Hoang
 
Vargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtVargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtGenoveva Vargas-Solar
 
CommVault - Your Journey to A Secure Cloud Event
CommVault - Your Journey to A Secure Cloud EventCommVault - Your Journey to A Secure Cloud Event
CommVault - Your Journey to A Secure Cloud EventGoogle
 
Data flow in the data center
Data flow in the data centerData flow in the data center
Data flow in the data centerAdam Cataldo
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...Qian Lin
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio, Inc.
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Community
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.KGMGROUP
 
Understanding System Design and Architecture Blueprints of Efficiency
Understanding System Design and Architecture Blueprints of EfficiencyUnderstanding System Design and Architecture Blueprints of Efficiency
Understanding System Design and Architecture Blueprints of EfficiencyKnoldus Inc.
 
Big Data Rampage
Big Data RampageBig Data Rampage
Big Data RampageNiko Vuokko
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013StampedeCon
 
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTARaghavendra Prabhu
 
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdf
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdfDatabase & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdf
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdfInSync2011
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataAlexMiowski
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 

Similar to Moving the Elephant in the Room: Data Migration at Scale (20)

Cassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeCassandra Essentials Day Cambridge
Cassandra Essentials Day Cambridge
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Vargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbtVargas polyglot-persistence-cloud-edbt
Vargas polyglot-persistence-cloud-edbt
 
Big data pipelines
Big data pipelinesBig data pipelines
Big data pipelines
 
CommVault - Your Journey to A Secure Cloud Event
CommVault - Your Journey to A Secure Cloud EventCommVault - Your Journey to A Secure Cloud Event
CommVault - Your Journey to A Secure Cloud Event
 
Data flow in the data center
Data flow in the data centerData flow in the data center
Data flow in the data center
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016Alluxio Keynote at Strata+Hadoop World Beijing 2016
Alluxio Keynote at Strata+Hadoop World Beijing 2016
 
Data engineering
Data engineeringData engineering
Data engineering
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
 
Understanding System Design and Architecture Blueprints of Efficiency
Understanding System Design and Architecture Blueprints of EfficiencyUnderstanding System Design and Architecture Blueprints of Efficiency
Understanding System Design and Architecture Blueprints of Efficiency
 
Big Data Rampage
Big Data RampageBig Data Rampage
Big Data Rampage
 
Data stream mining
Data stream miningData stream mining
Data stream mining
 
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013Transforming Data Architecture Complexity at Sears - StampedeCon 2013
Transforming Data Architecture Complexity at Sears - StampedeCon 2013
 
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTAOrchestrating Cassandra with Kubernetes Operator and PaaSTA
Orchestrating Cassandra with Kubernetes Operator and PaaSTA
 
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdf
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdfDatabase & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdf
Database & Technology 1 | Andrew Holdsworth | Orace Database Performance.pdf
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning Data
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 

Recently uploaded

Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?What are the features of Vehicle Tracking System?
What are the features of Vehicle Tracking System?Watsoo Telematics
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 

Moving the Elephant in the Room: Data Migration at Scale

  • 1. Data Migration at Scale: MOVING THE ELEPHANT IN THE ROOM
  • 2. WHO AM I? · BDPA Los Angeles Chapter · 4-year HSCC participant · Columbia University, CC ‘14 · Conductor, Inc. · linkedin.com/in/calltyrone
  • 3. CONDUCTOR, INC. · Web Presence Management · SaaS · Big data · Collect 6 TB of raw web data a week · Scalable Collection & ETL pipelines · Final Product: reports · 6 years running · Tons of data!
  • 4. WHY WE CARE ABOUT SCALABILITY · Growth · More users · More data · Systems have to keep up!
  • 7. SCALABILITY IN THE REAL WORLD · Yesterday’s solution is tomorrow’s problem · Under-prioritized · It’s hard! · Can require massive changes · No cure-all
  • 8. WHY REPLACE AN UNSCALABLE SYSTEM? · Save money · Improve performance · Clear the way for progress
  • 9. WHY NOT? · If it ain’t broke… · Significant Resource Investment · Time · Money · Software Downtime · Data Quality Concerns
  • 10. YOUR TASK, AT A GLANCE · 1. Identify an unscalable system 2. Discover and vet a suitable successor 3. Replace the legacy system with the new system, while minimizing risk and cost · Simple, no???
  • 11. TALKING ABOUT THE ELEPHANT: Identifying an Unscalable System
  • 12. CASE STUDY: LEGACY REPORTING DATABASE (Overview) · MySQL · Normalized data model · Helpful for initial modeling of our problem space · Hosted by a single, very powerful machine · Section: Talking about the Elephant: Diagnosing an Unscalable System
  • 13. CASE STUDY: LEGACY REPORTING DATABASE (Unsustainable) · Powerful hardware isn’t cheap. · Vertical Scaling · Obsolete Schema · Difficult to back up · Queries aren’t getting any faster.
  • 14. SEE FOR YOURSELF · If your solution… · Scales vertically · Prevents progress · Can’t perform at scale · Is difficult/slow/expensive to upgrade · …It’s time for a change!
  • 15. FINDING A BIGGER ROOM: Vetting Scalable Alternatives
  • 16. WHAT TO LOOK FOR · Price-efficient · Easy to maintain · Scales Horizontally · Section: Finding a Bigger Room: Vetting Scalable Alternatives
  • 17. CASE STUDY: AWS S3 DATASTORE (Our Use Case) · Write once, read many · De-normalized reports · High storage capacity · High Availability
  • 18. CASE STUDY: AWS S3 DATASTORE (Technical Overview) · Write once, read many (decent write performance, great read performance) · De-normalized reports (flat files) · High storage capacity (no defined space limit) · High Availability (configurable file replication)
  • 19. CASE STUDY: AWS S3 DATASTORE (Benefits) · Cheap · Cloud-based · Architecture facilitates testing · Easy to back up
  • 20. CASE STUDY: AWS S3 DATASTORE (Caveats) · “Eventual Consistency” · Switching to non-relational storage is nontrivial · Application code must change · Migration path gets complicated
  • 21. MOVING THE ELEPHANT: Migrating Legacy Data to the New System
  • 22. INITIAL CONSIDERATIONS · Time Frame · Scheduling Constraints · Operational Cost · Resource Constraints · Standards for data parity · Section: Moving the Elephant: Migrating Legacy Data to the New System
  • 23. CASE STUDY: OUR UPFRONT PLANNING · Two-month finish line · Developed COGS models · Built data validation software
  • 24. IDEAL MIGRATION SOFTWARE CHARACTERISTICS · Can be scaled up or down · Speed up to save time · Slow down to save resources · Can be run in a testing capacity · Configurable data sources/sinks · Configurable hardware resource use
  • 25. OUR MIGRATION SOFTWARE · Oozie and Hive · Controllable time/resource tradeoff · Testable in a QA environment
  • 26. AN INCREMENTAL MIGRATION: PARTITIONING DATA · Easy to track progress · Enables concurrency · Dilutes failure risks · E.g. Conductor “Time Periods”
  • 27. AN INCREMENTAL RELEASE · Limit client exposure to subtler bugs · Incorporate customer feedback · Demonstrate progress early · E.g. Conductor Searchlight 3.0 Beta Program
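
The partitioned, incremental migration described in slides 23 to 26 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the software from the talk (which was built on Oozie and Hive): the function names, the in-memory dictionaries standing in for the legacy database and the new datastore, and the week-style partition keys are all hypothetical. It shows one partition per unit of work, a configurable worker count as the time/resource dial, and a checksum comparison as a simple data parity check.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def checksum(rows):
    """Order-independent digest of a partition's rows, used as the parity check."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()


def migrate_partition(period, source, sink):
    """Copy one immutable partition and report whether source and sink match."""
    rows = source[period]
    sink[period] = list(rows)  # hypothetical stand-in for the write to the new store
    return period, checksum(rows) == checksum(sink[period])


def migrate(source, sink, workers=2):
    """Migrate every unfinished partition; `workers` trades time for resources."""
    # Partitions already present in the sink are skipped, so a rerun resumes.
    pending = [p for p in sorted(source) if p not in sink]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: migrate_partition(p, source, sink), pending)
        return dict(results)


# Hypothetical sample data: one partition per time period.
legacy = {
    "2014-W01": [("example.com", 3), ("example.org", 7)],
    "2014-W02": [("example.com", 2)],
}
new_store = {}
report = migrate(legacy, new_store, workers=2)
```

Because each partition is copied and verified independently, a failure affects only that partition, and raising or lowering `workers` speeds the run up or slows it down without changing the logic. Skipping partitions that already exist in the sink is only safe if partitions never change after they are written.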

Editor's Notes

  1. Again, the section titles
  2. Talk about the SERP model in a slide. Define EC2.
  3. Mention that migrations aren’t feasible (downtime) · MySQL isn't distributed · It's HUGE
  4. Again, the section titles
  5. Flesh out use case
  6. Again, the section titles
  7. COGS were for prediction. Introduce need for migration software.
  8. The partitions must be immutable. Emphasize diluting failure risk.
  9. * No more "Crowd Source etc"