Budapest Spark Meetup - Apache Spark @enbrite.ly presentation held on March 30, 2016.
The vision we all share at enbrite.ly is to create the next-generation decision support system for online advertising, one that combines the market's needs (anti-fraud, viewability, brand safety, and traffic quality assurance) in a single platform. We do this by analyzing vast amounts of data to create value for our customers. Over the last six months we built our ETL pipeline, the core component of our data platform, on Apache Spark. In this presentation I share the journey from the whiteboard designs to the maintenance of a TB-scale data pipeline, along with the lessons we learned and the ups and downs of using Spark at scale.