How do you possibly deliver ‘self-service’, ‘interactive’ reporting to 2,500 business leaders when you have 200 TB of big data, generated by activity across 100MM store/SKU combinations, living in a traditional EDW that grows bigger and more complex every day? You think outside of the big box.
This session will share The Home Depot’s journey of data strategy innovation, from a traditional enterprise data warehouse (EDW) and Business Intelligence (BI) to our modern approach: ad-hoc analysis live on a data lake. What began in 2009 as a traditional Teradata EDW plus traditional MicroStrategy BI, built to support business leaders’ performance analysis, had ‘matured’ by 2011 to the point that we in IT could barely keep pace with the flood of report and data requests. Self-service demands prompted us to innovate with an SSAS ‘super-cube’, which in turn became so successful that business units far and wide requested more and more metrics and data. When our super-cube grew to 1,200 metrics, 12 TB and 2,500 ad hoc users, and took 16 hours to realign and up to 5 weeks to reload, it was time to innovate again. The business loved the multi-dimensionality of the ‘cube’ and the ease of using BI tools they knew (Tableau, Excel), while my IT team needed cost-effective data storage and processing power in the face of a data explosion. How did we solve it this time? The result is a journey we are still on, involving an interesting combination of technology, processes and people.
Please join me, Hiren Patel, Software Engineering Manager, as I share the details of The Home Depot’s journey to successfully deliver ad-hoc analysis on our big data: our history with a traditional EDW, the pros and cons of our ‘super-cube’, and the trials, hiccups and lessons learned across aggregates and partnering. I’ll share the good, the bad and the ugly that led my team and me to build a path forward for self-service analytics at THD on Hadoop, and in doing so helped THD continue to fulfill our goal of helping the do-it-yourselfer be their own hero in home improvement.
6. How would you do this manually?
Selection      Value
Metric         Sales
Time Period    Quarter to Date
Current Date   Dec 13th
7. Identify your data sources
Selection      Value
Metric         Sales
Time Period    Quarter to Date
Current Date   Dec 13th

Source        Grain              Facts
Financial     Store/Class/Month
Financial     Store/Class/Week
Operational   Store/Class/Day
8. Select your cuts of data and sum
Selection      Value
Metric         Sales
Time Period    Quarter to Date
Current Date   Dec 13th

Source        Grain              Facts
Financial     Store/Class/Month  Nov 27th
Financial     Store/Class/Week   Dec 4th, Dec 11th
Operational   Store/Class/Day    Dec 12th
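The manual blend in the table above can be sketched in a few lines: sum the closed fiscal months from the monthly financial source, add the weeks that closed after the last month close, then fill the open week from operational daily data. The dates match the slides; the dollar amounts and function name are illustrative only.

```python
from datetime import date

# Hypothetical per-grain sales for one store/class combination, mirroring
# the slide: monthly financial through Nov 27th, weekly financial through
# Dec 4th and Dec 11th, operational daily for Dec 12th (the open week).
monthly_sales = {date(2016, 11, 27): 120_000.0}
weekly_sales = {date(2016, 12, 4): 31_000.0,
                date(2016, 12, 11): 28_500.0}
daily_sales = {date(2016, 12, 12): 4_200.0}

def qtd_sales(monthly, weekly, daily):
    """Sum the coarsest closed grain first, then fill the gap with
    progressively finer grains -- the 'manual' blend from the slides."""
    total = sum(monthly.values())           # closed fiscal months
    last_month_close = max(monthly)         # Nov 27th
    for week_end, amt in weekly.items():    # weeks closed after the month
        if week_end > last_month_close:
            total += amt
    last_week_close = max(weekly)           # Dec 11th
    for day, amt in daily.items():          # operational days, open week
        if day > last_week_close:
            total += amt
    return total

print(qtd_sales(monthly_sales, weekly_sales, daily_sales))  # 183700.0
```

One metric, one answer, but only because the overlap rules between grains are applied in a fixed order.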
9. Simplicity on the other side of complexity
One Sales Metric
https://en.wikiquote.org/wiki/Talk:Oliver_Wendell_Holmes_Jr.
10. OLAP cube consideration
Selection      Value
Metric         Sales
Time Period    Quarter to Date
Current Date   Dec 13th

Source        Grain              Facts               Adjustments
Financial     Store/Class/Month  Nov 27th            Ops/Fin Blend
Financial     Store/Class/Week   Dec 4th, Dec 11th   Ops/Fin Blend
Operational   Store/Class/Day    Dec 12th
35. Develop physical intuition of technology components
1. Data Storage: Partition, locality, format
2. Disk or In-Memory
3. Query Plan
4. Run-time environment
5. Execution flow path
6. Authentication
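Item 1 above (storage partitioning and locality) is the kind of thing you can build intuition for with a toy. The sketch below, with hypothetical Hive-style paths, shows why a filter on the partition key lets an engine skip whole directories before reading a single byte of data.

```python
# Hypothetical Hive-style partition layout for a daily sales table.
partitions = [
    "sales/sls_dt=2016-12-10/part-0.parquet",
    "sales/sls_dt=2016-12-11/part-0.parquet",
    "sales/sls_dt=2016-12-12/part-0.parquet",
]

def pruned(paths, predicate_date):
    """Partition pruning in miniature: match the partition-key value
    encoded in each path against the query predicate, keeping only
    the files the engine would actually have to scan."""
    keep = []
    for path in paths:
        value = path.split("sls_dt=")[1].split("/")[0]
        if value == predicate_date:
            keep.append(path)
    return keep

print(pruned(partitions, "2016-12-12"))
# ['sales/sls_dt=2016-12-12/part-0.parquet']
```

A query filtered to one day touches one directory out of three; the same intuition scales to thousands of partitions.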
36. Data Engineering
1. Business logic in the data layer
2. Move as much processing as possible into the MPP
3. Minimize data movement
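Items 2 and 3 above can be sketched together: aggregate inside the engine and ship only the summary, instead of pulling raw rows out and summing client-side. Here sqlite3 stands in for the MPP, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for the MPP system.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_sales (store_id INT, class_id INT, sales_amt REAL)")
con.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)",
                [(1, 10, 100.0), (1, 10, 50.0), (2, 11, 75.0)])

# Push the GROUP BY down into the engine: only one row per class
# crosses the wire, instead of every detail row.
rows = con.execute(
    "SELECT class_id, SUM(sales_amt) FROM daily_sales "
    "GROUP BY class_id ORDER BY class_id"
).fetchall()
print(rows)  # [(10, 150.0), (11, 75.0)]
```

The anti-pattern is `SELECT store_id, class_id, sales_amt FROM daily_sales` followed by a client-side sum, which is exactly the data movement the slide warns against.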
37. Process
1. Re-evaluate your assumptions on a periodic basis
2. Have a blue/green strategy
3. Team interrupt
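The blue/green strategy in item 2 can be sketched for a data product: keep two copies of a table, load and validate the next build into the idle copy, then repoint a view so consumers switch in one step. sqlite3 stands in for the warehouse, and all object names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Two physical copies of the product; "blue" is currently live.
con.execute("CREATE TABLE sales_blue (total REAL)")
con.execute("CREATE TABLE sales_green (total REAL)")
con.execute("INSERT INTO sales_blue VALUES (100.0)")
con.execute("CREATE VIEW sales AS SELECT * FROM sales_blue")

# Load and validate the new build on the idle green side...
con.execute("INSERT INTO sales_green VALUES (110.0)")

# ...then flip the pointer; readers never query a half-loaded table.
con.execute("DROP VIEW sales")
con.execute("CREATE VIEW sales AS SELECT * FROM sales_green")

print(con.execute("SELECT total FROM sales").fetchone())  # (110.0,)
```

If the green build fails validation, nothing is flipped and users keep reading blue, which is the whole point of the strategy.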
Notes:
At The Home Depot, we live by a simple premise from our founders: Put customers and associates first, and the rest will take care of itself.
Notes:
To give you a little insight into our scale: we have 2,200 stores company-wide, with more than 385,000 associates. A typical store averages 100K square feet of retail space, and in that space we carry more than 35,000 products, with an additional 1 million available online.
In 2016 the company generated $94.6 billion in revenue, $6 billion more than in 2015. https://ir.homedepot.com [2016 Annual Report]
On a quarterly basis our company posts an infographic like this on our investor relations page. It's meant to communicate key takeaways from the quarter.
Image from public page https://corporate.homedepot.com/about
Notes:
My customers are approximately 3,000 leaders and analysts at our corporate headquarters and 15,000 leaders in the field.
Our BI solutions allow these individuals to understand how their particular area of responsibility is performing and how it fits in with the overall picture. On a weekly basis leadership teams conduct performance reviews with our data and BI products.
In turn, they prioritize actions through the rest of Home Depot's systems to ensure the right product is on the shelf at the right time, in the most cost-effective manner possible. The goal here is to:
- Ensure frontline associates can focus on taking care of customers
- Effectively allocate capital to help drive productivity and efficiency
To help support them, our department attempts to absorb as much complexity as possible, so that when leaders in finance, merchandising and operations sit down together, they have a singular view of the truth.
In order to do this we work with our product owners to understand the most appropriate information to show users based on the context of the question.
What is the quarter to date sales for my class?
Notes:
We attempt to use as much financial data as is available and fill in the gaps with operational data. Moreover, we work with our product owners to decide which source of data should be shown to the user based on the metric, dimension and time period they are looking at. We do a lot of this with data engineering behind the scenes, particularly with sales, because the appropriate source could be a blend of both financial and operational data depending on the time horizon being viewed.
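The source-selection rule described above can be sketched as a small function: once a period has closed in the financial systems, show financial data; otherwise fall back to operational data. The cutoff dates and function name are illustrative, not our production logic.

```python
from datetime import date

def sales_source(period_end, last_financial_close):
    """Pick the source for a sales cell: financial once the period
    has closed, operational for the still-open horizon."""
    return "financial" if period_end <= last_financial_close else "operational"

close = date(2016, 12, 11)                      # latest closed fiscal week
print(sales_source(date(2016, 12, 4), close))   # financial
print(sales_source(date(2016, 12, 12), close))  # operational
```

The real decision also depends on the metric and dimensions in play, which is why it lives in the data layer rather than in each report.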
By 2012, we had implemented several MicroStrategy solutions. However, we could not keep up with the volume and variety of needs from our business partners. We concluded that our customers needed drill-down capability on the data so they could build their own dashboards, and that their preferred BI tool was Excel.
Happy Customers … but starting to have to say no to analysis
Data Network Effect
Can’t add data warehouse capacity fast enough
- SSAS is a scale-up technology ... governed by the size of the server that can process the data
- Requires us to move the data from Teradata to the cube
- Lots of CPU consumption on Teradata that has to be managed
Moving the data out of the MPP system is not efficient
Exploring solutions and new architectures
Finding the right use case
Most of the focus for the BI team had been on product merchandising and store operations, which is the bulk of our business.
Meanwhile, the Home Services group, which is primarily focused on the Do-It-For-Me customer and generates about $4B in revenue (ir.homedepot.com, 2016 annual report, page 39), was underserved. Outside of sales and product dimensions there wasn't a lot of overlap with our primary products or customers.
So we determined that this was a good opportunity to start building a path forward with this group.
Notes:
In the first half of 2016 we selected AtScale, deployed it to our lower lifecycles and started to learn.
We had a pilot live in September.
During this time we integrated the source data needed for the Core Services metrics and started building out the MVP data products: Leads, Measures, Sales. We deployed to our product owners and picked up a few pilot users every month after that.
Piloted in second half of 2016 with MVP metrics
Continued building out metrics
Stabilized products
Solution fully deployed as of February 2017. The Core Metrics for Services were live; however, we had an adoption issue: most of our users were still well served by some of the legacy reporting that remained live. So the business teams started turning off the legacy reporting.
By April, we had 160 self-service analytics users via Excel and 100 Tableau dashboard consumers. We started getting feedback from them on performance.
We partnered with AtScale and, by the end of the month, addressed the bulk of the performance concerns with the release of AtScale 5.4. Additionally, we noticed that one of our most complex dimension designs was getting the most usage and also being misused.
The organization hierarchy for Services was embedded in a self-serve store traits container. We left the container as-is and built a standalone hierarchy from it. This reduced the complexity of the queries and helped with performance.
Currently, the executives responsible for the Services business are very happy with the solution. We have users on our application from 7 a.m. to midnight, every day of the week.
Notes:
Continue partnering with AtScale to help branch out into use cases that are new for us.
• Eventual SuperCube refactoring
• Dashboard Solutions
Ownership
Decision making
Junior team members
Senior team members
Operations
Manager