Kim Curtis, Brian Knauss, eBay
Evolution of eBay’s Enterprise Data Ecosystem with Apache Spark
#EntSAIS13
Evolution
• Principles
• Opportunities
• Investment
Principles
• Expand Capabilities
– Do more
• Increase Flexibility
– With fewer barriers
• Optimize Cost/Performance
– Cheaper/faster
Spark in the EDW
• Increase Flexibility
– Investment and Engineering
• Expand Capabilities
– Pre-Load Transformation
• Optimize Cost/Performance
– Let’s see…
TCO Comparison
[Chart: Vendor vs. Open Source cost components: HW Depr., HW/SW Maint., DC Costs]
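The comparison above adds up the same three cost components for each platform. A minimal sketch of that arithmetic follows. The class and function names are hypothetical, and the input values are placeholders in arbitrary units, not figures from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnualCost:
    """Yearly cost components named on the TCO slide (arbitrary units)."""
    hw_depreciation: float    # "HW Depr."
    hw_sw_maintenance: float  # "HW/SW Maint."
    dc_costs: float           # "DC Costs"

    def total(self) -> float:
        # TCO is modeled here as the plain sum of the three components.
        return self.hw_depreciation + self.hw_sw_maintenance + self.dc_costs


def savings_ratio(vendor: AnnualCost, open_source: AnnualCost) -> float:
    """Fraction of the vendor TCO saved by the open source platform."""
    return 1.0 - open_source.total() / vendor.total()


# Placeholder inputs for illustration only; NOT numbers from the presentation.
vendor = AnnualCost(hw_depreciation=50.0, hw_sw_maintenance=30.0, dc_costs=20.0)
open_source = AnnualCost(hw_depreciation=30.0, hw_sw_maintenance=10.0, dc_costs=20.0)
ratio = savings_ratio(vendor, open_source)
```

Keeping the components separate, rather than storing one total, makes it easy to see which line item drives the difference between the two platforms.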
Opportunity Assessment
[Chart: Vendor TCO vs. Open Source TCO by year, 2014 to 2019]
Investment
Scope, Design, Implement, Optimize
WHAT, HOW, WHEN, THEN WHAT?
DO IT
Scope
• Engage with Customers
– Isolated our impact
• Define Boundaries
– Relational Batch Processing
• Set Intermediate Targets
– Offset 2017 growth
Design
• Extensible Framework
– (ELnTn to ETLn)
• Optimize Hardware
– Processing and Storage Nodes
• Optimize Software
– Automated Spark SQL Tuning
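The slides do not say how Spark SQL tuning was automated. As one possible illustration, the sketch below derives a value for the real Spark setting `spark.sql.shuffle.partitions` from a job's input size. The function names, the 128 MiB target partition size, and the clamping bounds are all assumptions made for this example.

```python
import math
from typing import Dict

MIB = 1024 ** 2


def suggest_shuffle_partitions(
    input_bytes: int,
    target_partition_bytes: int = 128 * MIB,  # assumed target, not from the talk
    min_partitions: int = 1,
    max_partitions: int = 10_000,             # assumed upper bound
) -> int:
    """Pick a shuffle partition count so each partition is near the target size."""
    if input_bytes < 0:
        raise ValueError("input_bytes must be non-negative")
    wanted = math.ceil(input_bytes / target_partition_bytes)
    # Clamp so tiny jobs still get one partition and huge jobs stay bounded.
    return max(min_partitions, min(max_partitions, wanted))


def spark_conf_for(input_bytes: int) -> Dict[str, str]:
    """Build per-job Spark SQL settings as a plain dict of string values."""
    return {
        "spark.sql.shuffle.partitions": str(suggest_shuffle_partitions(input_bytes)),
    }
```

A scheduler could pass the resulting dict to each job at submit time, so that the setting follows the data volume instead of a single cluster-wide default.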
Before (ELnTn)
– Extract → Load (repeated) → Transform/Merge (repeated) → Extract → Load/Merge (repeated)
After (ETLn)
– Extract → Transform → Load/Merge (repeated)
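One reading of the ETLn flow is that data is extracted and transformed once, before loading, and the result is then merged into each of n targets. The sketch below shows that shape in plain Python. The names are hypothetical, and dictionaries keyed by primary key stand in for target tables; it is not the framework described in the talk.

```python
from typing import Callable, Dict, Hashable, Iterable, List

Row = Dict[str, object]
Target = Dict[Hashable, Row]  # stand-in for a target table, keyed by primary key


def merge_into(target: Target, rows: Iterable[Row], key: str) -> None:
    """Upsert rows into a target: insert new keys, overwrite existing ones."""
    for row in rows:
        target[row[key]] = dict(row)


def run_etl_n(
    extract: Callable[[], Iterable[Row]],
    transform: Callable[[Row], Row],
    targets: List[Target],
    key: str,
) -> int:
    """Extract once, transform once, then load/merge into every target."""
    transformed = [transform(row) for row in extract()]
    for target in targets:
        merge_into(target, transformed, key)
    return len(transformed)
```

Because the transform runs once and every target receives the same rows, the targets cannot drift apart through differences in per-target transformation logic.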
Implement
• Production HDFS Data Environment
– Tight Alignment with Platform
• Prioritized Effort
– Minimize effort to hit goals
• Distributed Engineering (internal Open Source)
– Built and tested framework additions
Optimize
• Scaling/DR(DA)
– Multi-platform architecture
• Cost/Performance Optimization
– HW/SW tuning
• Feature Expansion
– ML on platform
Challenges
• Scale of Data and Workload
– Data Validation Between Targets
• Migration Automation
– Migration of Non-Standard Processes
• Enterprise Readiness of Open Source
– Job-Level Workload Tracking/Management
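For the data validation challenge listed above, one common approach is to compare a row count and an order-independent checksum of the same table on two targets. The sketch below is an assumption about how such a check could look, not the method used at eBay, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple

Row = Dict[str, object]


def table_fingerprint(rows: Iterable[Row], columns: Sequence[str]) -> Tuple[int, int]:
    """Return (row count, order-independent checksum) over the chosen columns."""
    count = 0
    checksum = 0
    for row in rows:
        # Join column values with a unit separator so adjacent fields cannot blur.
        encoded = "\x1f".join(repr(row[c]) for c in columns).encode("utf-8")
        row_hash = int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")
        # Summing (not XOR) keeps duplicate rows from cancelling each other out.
        checksum = (checksum + row_hash) % 2 ** 64
        count += 1
    return count, checksum


def targets_match(a: Iterable[Row], b: Iterable[Row], columns: Sequence[str]) -> bool:
    """True when both targets hold the same multiset of rows, up to hash collisions."""
    return table_fingerprint(a, columns) == table_fingerprint(b, columns)
```

At warehouse scale the same count-and-checksum idea would run as an aggregate query on each platform, so that only two small values per table need to be compared.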
The End Beginning…
Opportunity
Investment
Optimization

Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platform at eBay with Kim Curtis and Brian Knauss