Building an AI-Powered Retail Experience with Delta Lake, Spark, and Databricks

Zalando SE is Europe’s leading online fashion platform and connects customers, brands and partners. With millions of visitors each month, we have petabytes of purchase, click-stream, product and other data in our data lake. This data is crucial to powering insights on shopper behavior and driving an AI-first strategy to improve site engagement.

Over 7 months ago, Zalando adopted Apache Spark, Delta Lake and Databricks as its de facto computation platform for analytics and machine learning. During this period, we onboarded well over 50 internal teams, ranging from BI teams with no knowledge of Spark or big data that run ETL pipelines to AI/ML teams already using EMR and Spark for heavy model training. Given the wide spectrum of business problems they were trying to solve, we worked with each team individually: understanding their use cases, helping them validate assumptions, developing working code and taking them to production. In this talk we will share best practices for building a unified data and analytics architecture on Databricks and the lessons learned rolling it out across the organization, and provide a deep dive on AI and analytics use cases in the fashion e-commerce space.

  1. WIFI SSID: Spark+AISummit | Password: UnifiedDataAnalytics
  2. Akhil Dhingra, Zalando | Saurav Verma, Zalando | AI-Powered Retail Experience with Databricks | #UnifiedDataAnalytics #SparkAISummit
  3. Zalando SE ● Founded in 2008 in Berlin. ● Europe's leading online fashion platform. ● Connects customers, brands and partners. | #UnifiedDataAnalytics #SparkAISummit
  4. Zalando SE
  5. Big-Data Stack @ Zalando
  6. About Us | Akhil Dhingra: Product Manager, Data Solutions @Zalando. Exp: 7+ Years, Ex-Groupon, Ex-Wingify | MBA | Saurav Verma: Senior Engineer, Data Lake @Zalando. Exp: 9+ Years, Ex-Visa | Masters NUS
  7. Data Platform | Data Sources
  8. Data Platform | Data Sources ● Data Lake on top of S3
  9. Data Platform | Data Sources ● Multi-tenant / single compute: more ingestion pipelines
  10. Many Use Cases | Data Sources | Team A
  11. Many Use Cases | Data Sources | Team B, Team A
  12. Many Use Cases | Data Sources | Team C, Team B, Team A
  13. Too Many Use Cases | Data Sources | Team C, Team B, Team N, Team M, Team A
  14. Too Many … Compute | Data Sources | Team C, Team B, Team N, Team M, Team A | Compute: Auto-Scale, Stream, Batch, Training, Python / Scala
  15. Too Many … Compute | Team C, Team B, Team N, Team M, Team A | Compute: Auto-Scale, Stream, Batch, Training, Python / Scala ● Cost control problem at scale ● More time to production ● No best practices ● Duplication of work / data ● Dependencies ● Inconsistent environment ● No community knowledge ● Accidental complexity
  16. Spark as a Service ● Foundational piece of Zalando’s Big Data Infrastructure ● GitOps management, decentralized clusters ● Security / Compliance / CI-CD ● XX clusters/Jobs ● ~20 teams in production ● Thriving #Databricks community in Zalando | Team C, Team B, Team N, Team M, Team A | Auto-Scale, Stream, Batch, Training, Python / Scala
  17. Spark as a Service | Migration Projects | ETLs | Data Preparation in Spark-S3
  18. Spark as a Service | Others: Structured Streams | Traceability
  19. Spectrum of use cases
  20. GDPR and Antitrust | Compliance with GDPR and antitrust laws
  21. GDPR and Antitrust | Probe (pilot) - Use marker event to create heat map of the data path. - List of all datasets within the heat map.
  22. GDPR and Antitrust | Pseudonymize/Remove - Identifier-based, on-demand, in-place record updater with field precision - Great for semi-structured formats like JSON - Use S3 Inventory + Streaming
  23. Search & Ranking | Personalized article ranking for relevance and user engagement.
  24. Search & Ranking | Using Spark in ML training pipeline!
  25. Search & Ranking | Article scoring and personalization! | ML Model
  26. Others • Sizing: Reducing return rates due to size and fit issues. • Experimentation @Scale • Merchant Analytics • Marketing Services
  27. First Impressions • GitOps | Self Service
  28. First Impressions • Multi-tiered support system • Delta adoption | But few readers outside Databricks ecosystem • Communicating pricing downstream • Exploding usage is good • Fits all sizes?
  29. Thank you. | AI-Powered Retail Experience with Databricks | Akhil Dhingra, Saurav Verma | www.zalando.com | www.jobs.zalando.com/tech | #UnifiedDataAnalytics #SparkAISummit
  30. DON’T FORGET TO RATE AND REVIEW THE SESSIONS | SEARCH SPARK + AI SUMMIT
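
The "Pseudonymize/Remove" slide (GDPR and Antitrust) describes an identifier-based, in-place record updater with field precision that suits semi-structured formats like JSON. The slides do not show how Zalando implemented it, so the following is only a minimal single-record sketch of the idea in plain Python. The function names, the dotted field-path syntax, the example record, and the choice of a keyed HMAC-SHA-256 digest as the pseudonym are all assumptions made for illustration. The S3 Inventory and streaming parts mentioned on the slide are not covered.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of value. Hypothetical choice of pseudonym."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def update_record(record, id_field, identifier, field_paths, key, remove=False):
    """Pseudonymize or remove selected fields of one JSON-like dict, in place.

    Only records whose id_field equals identifier are touched, and only the
    fields named in field_paths (dotted paths such as "address.street") change.
    Returns True if the record matched the identifier, False otherwise.
    """
    if record.get(id_field) != identifier:
        return False
    for path in field_paths:
        parts = path.split(".")
        node = record
        # Walk down to the parent of the target field; skip paths that do not exist.
        for part in parts[:-1]:
            node = node.get(part) if isinstance(node, dict) else None
            if node is None:
                break
        else:
            leaf = parts[-1]
            if isinstance(node, dict) and leaf in node:
                if remove:
                    del node[leaf]
                else:
                    node[leaf] = pseudonymize(str(node[leaf]), key)
    return True


# Hypothetical record and key, for illustration only.
record = {
    "customer_id": "c-42",
    "email": "a@example.com",
    "address": {"city": "Berlin", "street": "Example Str. 1"},
    "order_total": 59.9,
}
matched = update_record(
    record, "customer_id", "c-42", ["email", "address.street"], key=b"demo-key"
)
```

After the call, `email` and `address.street` hold hex digests while `address.city` and `order_total` are untouched, which is one way to read "field precision". In a Spark job the same function could plausibly be applied per record, but that wiring is not shown in the slides and is left out here.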
