Journey to self-service analytics on HADOOP
Hiren Patel
https://corporate.homedepot.com/about
QTD class sales?
How would you do this manually?
Selection | Value
Metric | Sales
Time Period | Quarter to Date
Current Date | Dec 13th
Identify your data sources
Selection | Value
Metric | Sales
Time Period | Quarter to Date
Current Date | Dec 13th
Source | Grain | Facts
Financial | Store/Class/Month |
Financial | Store/Class/Week |
Operational | Store/Class/Day |
Date
Select your cuts of data and sum
Selection | Value
Metric | Sales
Time Period | Quarter to Date
Current Date | Dec 13th
Source | Grain | Facts
Financial | Store/Class/Month | Nov 27th
Financial | Store/Class/Week | Dec 4th, Dec 11th
Operational | Store/Class/Day | Dec 12th
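The manual steps above can be sketched as a small calculation: take each source at its own grain, select the periods that cover the quarter without overlapping, and sum them. This is only an illustrative sketch. The `Fact` record, the `qtd_sales` helper, the year, and every sales amount are hypothetical; only the period-end dates and grains come from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    source: str        # "Financial" or "Operational"
    grain: str         # "month", "week", or "day"
    period_end: date   # last day covered by this fact row
    sales: float       # hypothetical sales amount


def qtd_sales(facts, selected):
    """Sum sales for the facts whose (grain, period_end) was selected."""
    return sum(f.sales for f in facts if (f.grain, f.period_end) in selected)


# Hypothetical fact rows for one store/class; the year is an assumption.
facts = [
    Fact("Financial", "month", date(2016, 11, 27), 1000.0),
    Fact("Financial", "week", date(2016, 12, 4), 250.0),
    Fact("Financial", "week", date(2016, 12, 11), 260.0),
    Fact("Operational", "day", date(2016, 12, 11), 38.0),  # already inside the Dec 11th week
    Fact("Operational", "day", date(2016, 12, 12), 40.0),
]

# The cuts chosen on the slide: one month, two weeks, one day.
selected = {
    ("month", date(2016, 11, 27)),
    ("week", date(2016, 12, 4)),
    ("week", date(2016, 12, 11)),
    ("day", date(2016, 12, 12)),
}

total = qtd_sales(facts, selected)
print(total)
```

The Dec 11th daily row is deliberately left out of `selected`, because the weekly fact ending Dec 11th already covers it; picking non-overlapping cuts is the part a user would otherwise have to get right by hand.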
Simplicity on the other side of complexity
One Sales Metric
https://en.wikiquote.org/wiki/Talk:Oliver_Wendell_Holmes_Jr.
OLAP cube consideration
Selection | Value
Metric | Sales
Time Period | Quarter to Date
Current Date | Dec 13th
Source | Grain | Facts | Adjustments
Financial | Store/Class/Month | Nov 27th | Ops/Fin Blend
Financial | Store/Class/Week | Dec 4th, Dec 11th | Ops/Fin Blend
Operational | Store/Class/Day | Dec 12th |
Our journey
Data warehouse
Microstrategy
2009
~50TB
1
Data warehouse
Microstrategy
2012
Cube
Excel
200 Users
Top 100 metrics
~200TB
Hadoop
1
Data warehouse
Microstrategy
2016
SuperCube
Excel
2500 Users
1200 metrics
8TB SSAS
400TB
8000 Tables
5000 Weekly jobs
Hadoop
Tableau
Rate of growth is not sustainable
https://giphy.com/gifs/vintage-smoking-aq6wETurhpbc4
SuperCube
• 16 hours to recast over the weekend
• 3 weeks to reload after a major release
Slowing down release cycle times
Consider impacts
1. Daily/Weekly processing windows
2. Disk
3. Data warehouse capacity
4. Complexity
Batching cube loads
Superheroes
Developing a path forward
Implementation
Started rollout at the beginning of 2017
Live Feb 2017…but…
1
Data warehouse
Microstrategy
Today
SuperCube
Excel
2500 Users
1200 metrics
12TB SSAS
400TB
8000 Tables
5000 Weekly jobs
Hadoop
Tableau
Excel | Tableau
160 | 100
AtScale
Future
How
1. Team
2. Frameworks
3. Monitoring
4. Regression Testing
Team
Lessons Learned
Develop physical intuition of technology components
1. Data Storage: Partition, locality, format
2. Disk or In-Memory
3. Query Plan
4. Run-time environment
5. Execution flow path
6. Authentication
Data Engineering
1. Business logic in the data layer
2. Move as much processing as possible into the MPP
3. Minimize data movement
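Points 2 and 3 can be illustrated with a toy comparison between aggregating on the client and aggregating inside the query engine. This is a sketch under stated assumptions: Python's built-in `sqlite3` stands in for an MPP engine purely for illustration, and the `sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database used only as a stand-in for an MPP engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store INTEGER, class TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "A", 10.0), (1, "B", 5.0), (2, "A", 7.5), (2, "B", 2.5)],
)

# Approach 1: move every row to the client, then aggregate there.
rows = conn.execute("SELECT class, amount FROM sales").fetchall()
client_totals = {}
for cls, amount in rows:
    client_totals[cls] = client_totals.get(cls, 0.0) + amount

# Approach 2: let the engine aggregate and move only the result rows.
engine_totals = dict(
    conn.execute("SELECT class, SUM(amount) FROM sales GROUP BY class").fetchall()
)

rows_moved_client = len(rows)           # one row per fact
rows_moved_engine = len(engine_totals)  # one row per class

print(client_totals == engine_totals, rows_moved_client, rows_moved_engine)
conn.close()
```

Both approaches give the same totals, but the second moves one row per class instead of one row per fact, and that gap widens as the fact table grows.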
Process
1. Reevaluate your assumptions on a periodic basis
2. Have a green / blue strategy
3. Team interrupt
careers.homedepot.com

How The Home Depot Creates Simplicity on the other Side of Complexity: A Journey to Self-Service Analytics on Big Data

Editor's Notes

  1. Notes: At The Home Depot, we live by a simple premise from our founders: Put customers and associates first, and the rest will take care of itself.
  2. Notes: To give you a little insight into our scale: we have 2200 stores company-wide and more than 385,000 associates. A typical store averages 100k square feet of retail space, and in that space we carry more than 35,000 products, with an additional 1 million available online. In 2016 the company generated $94.6 billion in revenue, $6 billion more than in 2015. https://ir.homedepot.com [2016 Annual Report] On a quarterly basis our company posts an infographic like this on our investor relations page. It's meant to communicate key takeaways from our quarter. Image from public page https://corporate.homedepot.com/about
  3. Notes: My customers are approximately 3000 leaders and analysts at our corporate headquarters and 15,000 leaders in the field. Our BI solutions allow these individuals to understand how their particular area of responsibility is performing and how it fits into the overall picture. On a weekly basis, leadership teams conduct performance reviews with our data and BI products. In turn, they prioritize actions through the rest of Home Depot's systems to ensure the right product is on the shelf at the right time in the most cost-effective manner possible. The goal here is to (a) ensure frontline associates can focus on taking care of the customers, and (b) effectively allocate capital to help drive productivity and efficiency. To support them, our department attempts to absorb as much complexity as possible, so that when leaders in finance, merchandising and operations sit down together they have a single view of the truth. To do this, we work with our product owners to understand the most appropriate information to show users based on the context of the question.
  4. What are the quarter-to-date sales for my class?
  5. Notes: We attempt to use as much financial data as is available and fill in the gaps with operational data. Moreover, we work with our product owners to decide which source of data should be shown to the user based on the metric, dimension and time period they are looking at. We do a lot of this with data engineering behind the scenes. This applies to sales in particular, because the appropriate source could be a blend of both financial and operational data depending on the time horizon they are looking at.
  6. By 2012, we had implemented several Microstrategy solutions. However, we could not keep up with the volume and variety of needs from our business partners. We concluded that our customers needed drill-down capability into the data so they could build their own dashboards, and that their preferred BI tool was Excel.
  7. By 2012, we had implemented several Microstrategy solutions. However, we could not keep up with the volume and variety of needs from our business partners. We concluded that our customers needed drill-down capability into the data so they could build their own dashboards, and that their preferred BI tool was Excel.
  8. Happy Customers … but starting to have to say no to analysis
  9. Data network effect: we can't add data warehouse capacity fast enough. SSAS is a scale-up technology, governed by the size of the server that can process the data. It requires us to move the data from Teradata to the cube, which causes a lot of CPU consumption on Teradata that has to be managed. Moving the data out of the MPP system is not efficient.
  10. SSAS is a scale-up technology, governed by the size of the server that can process the data. It requires us to move the data from Teradata to the cube, which causes a lot of CPU consumption on Teradata that has to be managed. Moving the data out of the MPP system is not efficient.
  11. Exploring solutions and new architectures. Finding the right use case.
  12. Most of the BI team's focus has been on product merchandising and store operations, which make up the bulk of our business. Meanwhile, the Home Services group, which is primarily focused on the Do-It-For-Me customer and generates about $4B in revenue (ir.homedepot.com 2016 annual report page 39), was underserved. Outside of Sales and product dimensions, there wasn't a lot of overlap with our primary products or customers. So we determined that this was a good opportunity to start building a path forward with this group.
  13. Notes: In the first half of 2016 we selected AtScale, deployed it to our lower lifecycles and started to learn. We had a pilot live in September. During this time we integrated the source data needed for the Core Services metrics and started building out the MVP data products: Leads, Measures, Sales. We deployed to our product owners and picked up a few pilot users every month after that.
  14. Piloted in the second half of 2016 with MVP metrics. Continued building out metrics. Stabilized products. Solution fully deployed as of February 2017. The Core Metrics for Services were live; however, we had an adoption issue. Most of our users were overserved by some of the legacy reporting that was still live. So the business teams started turning off the legacy reporting.
  15. Piloted in the second half of 2016 with MVP metrics. Continued building out metrics. Stabilized products. Solution fully deployed as of February 2017. The Core Metrics for Services were live; however, we had an adoption issue. Most of our users were overserved by some of the legacy reporting that was still live. So the business teams started turning off the legacy reporting.
  16. By April, we had 160 self-service analytics users via Excel and 100 Tableau dashboard consumers. We started getting feedback from them on performance. We partnered with AtScale and by the end of the month had addressed the bulk of the performance concerns with the release of 5.4. Additionally, we saw that one of the most complex dimension designs was getting the most usage and was also being misused. The organization hierarchy for Services was embedded in a self-serve store traits container. We left the container as-is and built a standalone hierarchy from it. This reduced the complexity of the queries and helped with performance. Currently, the executives responsible for the Services business are very happy with the solution. We have users on our application from 7 a.m. all the way to midnight, every day of the week.
  17. Notes: Continue partnering with AtScale to help branch out into new use cases (for us): eventual SuperCube refactoring and dashboard solutions.
  18. Ownership; decision making; junior team members; senior team members; operations manager
  19. Frameworks
  20. Monitoring
  21. Performance Testing
  22. Performance Testing
  23. Performance Testing
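
The source blending described in note 5 (a quarter-to-date sales figure assembled from financial month and week facts, with operational daily facts filling the gap) can be sketched roughly as follows. This is only an illustrative sketch, not The Home Depot's implementation: the function names, the fiscal calendar (a quarter starting on October 31, 2016 with a four-week first month), and the sales amounts are all assumptions chosen so that the cuts line up with the dates shown on the slides (month ending Nov 27th, weeks ending Dec 4th and Dec 11th, day Dec 12th, current date Dec 13th).

```python
from datetime import date, timedelta


def plan_qtd_cuts(quarter_start, fiscal_month_ends, as_of):
    """Pick the coarsest available grain for each stretch of the quarter.

    Returns a list of (source, grain, period_end) tuples covering
    quarter_start through the day before as_of, without overlap.
    Hypothetical helper: assumes fiscal months and weeks are contiguous.
    """
    last_complete_day = as_of - timedelta(days=1)
    cuts = []
    cursor = quarter_start  # first day not yet covered by a cut

    # 1. Closed fiscal months come from the financial month-grain facts.
    for month_end in sorted(fiscal_month_ends):
        if cursor <= month_end <= last_complete_day:
            cuts.append(("financial", "month", month_end))
            cursor = month_end + timedelta(days=1)

    # 2. Complete 7-day weeks come from the financial week-grain facts.
    while cursor + timedelta(days=6) <= last_complete_day:
        week_end = cursor + timedelta(days=6)
        cuts.append(("financial", "week", week_end))
        cursor = week_end + timedelta(days=1)

    # 3. Remaining days are filled from the operational day-grain facts.
    while cursor <= last_complete_day:
        cuts.append(("operational", "day", cursor))
        cursor += timedelta(days=1)

    return cuts


def qtd_sales(cuts, facts):
    """Sum one fact per cut; facts maps (grain, period_end) to sales."""
    return sum(facts[(grain, period_end)] for _source, grain, period_end in cuts)


# Assumed calendar: the year and the fiscal month ends are illustrative.
example_cuts = plan_qtd_cuts(
    quarter_start=date(2016, 10, 31),
    fiscal_month_ends=[date(2016, 11, 27), date(2016, 12, 25)],
    as_of=date(2016, 12, 13),
)

# Made-up sales amounts for a single store/class, for illustration only.
example_facts = {
    ("month", date(2016, 11, 27)): 4000,
    ("week", date(2016, 12, 4)): 1100,
    ("week", date(2016, 12, 11)): 1200,
    ("day", date(2016, 12, 12)): 150,
}

for cut in example_cuts:
    print(cut)
print("QTD sales:", qtd_sales(example_cuts, example_facts))
```

Preferring the coarsest closed period keeps as much of the total as possible on financial data and limits operational data to the most recent days. It does not model the Ops/Fin blend adjustments listed on the OLAP cube slide.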