Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake


Columbia is a data-driven enterprise that integrates data from all of its line-of-business systems to manage its wholesale and retail businesses. This includes integrating real-time and batch data to better manage purchase orders and generate accurate consumer demand forecasts.

Published in: Data & Analytics

  1. 1. We Connect Active People With Their Passions COLUMBIA SPORTSWEAR – SPARK + AI SUMMIT 2020 JUNE 2020 LARA MINOR AND BILAL OBEIDAT
  2. 2. WHAT IS EIM? VISION: Enterprise Information Management’s (EIM) vision is to connect the RIGHT PEOPLE with the RIGHT DATA at the RIGHT TIME to support informed business decisions. Increase Columbia’s ability to: • Be data-driven in support of business strategy and operations • Deliver enterprise data assets that meet global information needs • Scale, share, and grow using governed data, aligned processes, and shared products
  3. 3. DATA DELIVERY Technology • Azure Data Management Stack: – Azure Data Factory – Azure Data Lake – Azure Databricks – Azure Synapse Data Warehouse • SAP BW / HANA Development • Integration: – Columbia source systems into Azure – Integration to third-party analytic systems and applications – Partnered with Columbia’s integration team • Data Models: – Relational and dimensional models for business reporting and analytics – Data models for data science / analytics
  4. 4. WHERE WE STARTED
  5. 5. 5
  6. 6. BUSINESS ACCESS Info Consumer Business users (4k+) Data Analyst <15 Info Consumer Business SMEs 50-100 Databricks / Data Lake Data Warehouse Azure Analysis Services Power BI Internal / Open Restricted Internal / Select Restricted Internal / Select Restricted
  7. 7. DATA LAKE LAYOUT AND SECURITY Raw source: Internal • Source system name • Object name (table name) • Type (full, incremental) • Partition (date) Restricted_domain • Source system name • Object name (table name) • Type (full, incremental) • Partition (date) Curated: Internal • Data domain (sales) • Schema • Table name Restricted_domain • Schema • Table name Computed (Analysts directory): Dtc_restricted • Analyst determined
  8. 8. 8
  9. 9. POSITIVE OUTCOMES • Bringing in data from source systems used to take a week and now takes a day • All computed data is on the lake and available for use; microservices drop data to the lake for real-time reporting through Databricks Streaming • The Databricks external metastore allows for sharing with EIM • Everything we do goes through CI/CD integration • Cloud-based and elastic: speed and scalability enabled growth and efficient data processing at low cost • Expanded data access for the business: self-serve reporting and business analysis • Prepped for data science resources to engage • Easy team expansion and onboarding: we’ve increased the dev team from 8 to 20 in 1.5 years
  10. 10. LESSONS LEARNED General • Security groups • Security model • Security audit • Costs Vendor Engagement • Leverage vendors, but know there’s a limit • Professional services • Vendor experts and agreements Data Lake • Security • Organization, enterprise • Audit, monitoring • Backup, DR Team • A solid leader or two • Keep the team open to change, open to chaos • Allocate time for discovery • Managing expectations with senior leadership
  11. 11. Questions?
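
The data lake layout on slide 7 describes a folder hierarchy: raw data is organized by security level, source system, object name, load type, and date partition, while curated data is organized by security level, (for internal data) data domain, schema, and table name. A minimal sketch of that hierarchy as path-building helpers follows. The zone folder names (`raw`, `curated`), the security folder names, the ISO date format for partitions, and the function names are all assumptions for illustration; the slides give only the ordering of the levels, not literal paths.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical folder names: the slides name the levels, not the literal folders.
RAW_ZONE = "raw"
CURATED_ZONE = "curated"
LOAD_TYPES = {"full", "incremental"}  # load types listed on the slide


def raw_path(security: str, source_system: str, object_name: str,
             load_type: str, partition: date) -> PurePosixPath:
    """Raw zone: security / source system / object / load type / date partition."""
    if load_type not in LOAD_TYPES:
        raise ValueError(f"load_type must be one of {sorted(LOAD_TYPES)}")
    return PurePosixPath(RAW_ZONE, security, source_system, object_name,
                         load_type, partition.isoformat())


def curated_path(security: str, schema: str, table: str,
                 data_domain: Optional[str] = None) -> PurePosixPath:
    """Curated zone: security / [data domain] / schema / table.

    The slide lists a data-domain level only under the internal folder, so it
    is optional here.
    """
    parts = [CURATED_ZONE, security]
    if data_domain:
        parts.append(data_domain)
    parts += [schema, table]
    return PurePosixPath(*parts)


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(raw_path("internal", "sap", "orders", "incremental", date(2020, 6, 1)))
    print(curated_path("internal", "dbo", "orders", data_domain="sales"))
    print(curated_path("restricted_hr", "dbo", "payroll"))
```

Keeping the security level directly under the zone means access can be granted per folder subtree, which is one plausible reading of why the slide pairs layout with security.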