

















The presentation describes how eBay evolved its enterprise data ecosystem with Apache Spark, guided by three principles: expanding capabilities, increasing flexibility, and optimizing cost/performance. It outlines the approach taken: engaging with customers, designing an extensible framework, and implementing a production-ready HDFS data environment. It also addresses challenges such as data scale, workload migration, and the enterprise readiness of open-source software.

















Introduction to the evolution of eBay's enterprise data ecosystem using Apache Spark, focusing on expanded capabilities, flexibility, and cost/performance optimization.
Discussion of Spark's role in increasing flexibility, the investment in engineering, and a total cost of ownership (TCO) comparison between vendor and open-source solutions.
Overview of the project scope and design principles, including engaging with customers, defining boundaries, and optimizing both hardware and software for the data environment.
Comparison of data processing methods before and after adopting the ETLn model, highlighting the transformation and merging processes.
Description of the implementation phase, focusing on the production data environment, prioritized efforts, and key optimizations in hardware, software, and feature expansion.
Discussion of challenges in scaling, migration automation, and enterprise readiness, with a look ahead to new opportunities in data processing.