We Connect Active People With Their Passions
COLUMBIA SPORTSWEAR – SPARK + AI SUMMIT 2020
JUNE 2020
LARA MINOR AND BILAL OBEIDAT
WHAT IS EIM?
VISION
Enterprise Information Management’s vision is to connect the RIGHT PEOPLE with the RIGHT
DATA at the RIGHT TIME to support informed business decisions.
Increase Columbia’s ability to:
• Be data driven in support of business strategy & operations
• Deliver enterprise data assets that meet global information needs
• Scale, share and grow using governed data, aligned processes and shared products
DATA DELIVERY
Technology
• Azure Data Management Stack:
–Azure Data Factory
–Azure Data Lake
–Azure Databricks
–Azure Synapse Data Warehouse
• SAP BW / HANA
Development
• Integration:
–Columbia source systems into Azure
–Integration to 3rd-party analytic systems and applications
–Partnered with Columbia’s integration team
• Data Models:
–Relational and dimensional models for business reporting and analytics
–Data models for data science / analytics
WHERE WE STARTED
BUSINESS ACCESS
Info Consumer: Business users (4k+)
Data Analyst: <15
Info Consumer: Business SMEs (50-100)
Databricks / Data Lake
Data Warehouse
Azure Analysis Services
Power BI
Internal / Open Restricted
Internal / Select Restricted
Internal / Select Restricted
DATA LAKE LAYOUT AND SECURITY
Raw source
Internal
• Source system name
• Object name (table name)
• Type (full, incremental)
• Partition (date)
Restricted_domain
• Source system name
• Object name (table name)
• Type (full, incremental)
• Partition (date)
Curated
Internal
• Data domain (sales)
• Schema
• Table name
Restricted_domain
• Schema
• Table name
Computed (Analysts directory)
Dtc_restricted
• Analyst-determined
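The folder hierarchy above can be sketched as a pair of small path builders. This is only an illustration of the layout as listed on the slide, not Columbia's actual tooling: the function names, the literal folder names `raw` and `curated`, the ISO date format for the partition, and all example values (`sap`, `orders`, `reporting`, `daily_sales`) are assumptions made for the example.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional

# Load types listed on the slide for the raw zone.
LOAD_TYPES = {"full", "incremental"}


def raw_path(classification: str, source_system: str, object_name: str,
             load_type: str, partition: date) -> PurePosixPath:
    """Raw zone: classification / source system / object / type / date partition."""
    if load_type not in LOAD_TYPES:
        raise ValueError(f"load_type must be one of {sorted(LOAD_TYPES)}")
    # Assumption: partitions are named by ISO date (YYYY-MM-DD).
    return PurePosixPath("raw", classification, source_system,
                         object_name, load_type, partition.isoformat())


def curated_path(classification: str, schema: str, table: str,
                 domain: Optional[str] = None) -> PurePosixPath:
    """Curated zone: classification / [data domain] / schema / table.

    The slide lists a data-domain level only under Internal, so the
    domain is optional here (hypothetical modelling choice).
    """
    parts = ["curated", classification]
    if domain is not None:
        parts.append(domain)
    parts += [schema, table]
    return PurePosixPath(*parts)


if __name__ == "__main__":
    print(raw_path("internal", "sap", "orders", "incremental", date(2020, 6, 1)))
    print(curated_path("internal", "reporting", "daily_sales", domain="sales"))
    print(curated_path("restricted_domain", "reporting", "daily_sales"))
```

Keeping the classification (Internal vs. Restricted_domain) as the first level under each zone means access can be granted per folder subtree, which appears to be the point of pairing layout and security on this slide.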
POSITIVE OUTCOMES
• Bringing in data from source systems used to take a week; it now takes a day
• All computed data is on the lake and available for use. Microservices drop data to the lake for real-time reporting through Databricks Streaming.
• The Databricks external metastore allows for sharing with EIM
• Everything we do goes through CI/CD integration
• Cloud-based and elastic: speed and scalability enabled growth and efficient data processing at low cost
• Expanded data access for the business: self-serve reporting and business analysis
• Prepped for data science resources to engage
• Easy team expansion and onboarding: we’ve increased the dev team from 8 to 20 in 1.5 years
LESSONS LEARNED
General
• Security groups
• Security model
• Security audit
• Costs
Vendor Engagement
• Leverage vendors, but know there’s a limit
• Professional services
• Vendor experts and agreements
Data Lake
• Security
• Organization, Enterprise
• Audit, Monitoring
• Backup, DR
Team
• A solid leader or two
• Keep the team open to change, open to chaos
• Allocate time for discovery
• Managing expectations with senior leadership
Questions?

Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake
