Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.
Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.
Published on
Zalando is Europe’s leading online fashion retailer and currently on its way to become the platform for all fashion related business — designers, producers, logistic solutions, on-and off-line retailers. The new platform architecture challenges the company’s current in-house solutions – including the fraud detection infrastructure – to become more scalable, dependable and versatile.
This talk is a travelogue describing the journey of rewriting an in-production classification system from scratch using Scala and Spark to run on AWS. Dragiev and Krueger will start by looking at the drawbacks that are inherent to the old sklearn based solution running a static cluster, most prominently: hard maintenance, data bottlenecks, too coarse-grained parallelisation. Then, they will outline the design principles which are at the bottom of our new Scala/Spark-based solution. Often neglected aspects like model specification, data organisation and evaluation and keeping track of multiple models will also be discussed.
Dragiev and Krueger’s new solution mitigates the identified pain points by leveraging the features that Scala and Spark bring into play, in particular: strong typing, data parallelisation and easy scale out. The talk will end with a comparison between both solution, conducting measurements that highlight the performance gains experienced with Spark, for both learning and prediction times.
Login to see the comments