Like all consumer packaged goods (CPG) companies, PepsiCo relies on huge volumes of data to replenish its retailers accurately with the right amount and type of product. Across the CPG industry, most analysts rely exclusively on Excel and Access for data wrangling, but as PepsiCo’s data outgrew the capabilities of those tools, the company knew it needed a better way.
https://www.brighttalk.com/webcast/9573/227935
How PepsiCo's Big Data Strategy is Disrupting CPG Retail Analytics
1. How PepsiCo’s Big Data Strategy Is Disrupting CPG Retail Analytics
Mike Riegling, Analyst, PepsiCo
Presented by:
Will Davis, Trifacta
Jeff Huckaby, Tableau
Camilo Silva, Hortonworks
3. Q&A Session
With your hosts:
• Will Davis, Director of Product Marketing, Trifacta
• Jeff Huckaby, Market Segment Director, Retail & Consumer Goods, Tableau
• Camilo Silva, Enterprise Account Manager, Hortonworks
4. The Best-of-Breed Analytics Stack
Leading solutions for data processing, wrangling & visualization
Trifacta
• Industry-leading data wrangling solution for data analysts
• Self-service data exploration & preparation
• Supports desktop, cloud and big data deployments
Tableau
• Industry-leading enterprise analytics platform
• Governance & self-service analytics at scale
• Deploy on-premises, in the cloud, or fully hosted
Hortonworks
• Future-proof, scalable data platform that supports the storage and growth of expanding data
• Enables faster business decisions based on more actionable insight
• Enables corporate success in consumer markets
5. Agenda
Analytics Infrastructure at PepsiCo – Will Davis
• History of Big Data at PepsiCo
• IT/Business Collaboration for Analytics
• Analytics Stack: Hortonworks + Trifacta + Tableau
CPFR (Collaborative Planning, Forecasting & Replenishment) Data Wrangling & Analytics at PepsiCo – Mike Riegling
• CPFR Process at PepsiCo
• Challenges Managing Diverse Internal & External Data
• Walkthrough of Trifacta + Tableau
Question & Answer
• Will Davis – Trifacta
• Jeff Huckaby – Tableau
• Camilo Silva – Hortonworks
7. Analytics Journey at PepsiCo
• PepsiCo’s journey with Big Data started more than 4 years ago in response to ever-increasing data requirements across PepsiCo
• Focus on providing technology infrastructure and applications that bring shared success to Business & IT
• Eliminating traditional processes in which IT was a bottleneck for the business
• The Unified Data Architecture has 3 main pillars:
• Enterprise Data Warehouse
• Hortonworks Data Lake Environment
• Data Discovery, Analytics & Business Intelligence tools (Trifacta & Tableau)
8. Data Platform - Hortonworks
• Selected Hortonworks Data Platform (HDP) as the foundational technology to extend PepsiCo’s Unified Data Architecture
• Leveraging HDP to acquire, understand and incorporate new forms of internal/external business and consumer data
• HDP provides a platform that can scale to handle the rapid growth of more granular consumer data
• Still early days on Hadoop at PepsiCo: only hundreds of terabytes (TB) of data are managed in HDP
• Use cases on Hadoop include CPFR (the first use case) and Consumer & Marketing Analytics
• Only standard services are needed to support these use cases: Hive, YARN, Pig, etc.
• The CPFR use case with Trifacta consumes approximately 25–50% of HDP resources
9. Data Wrangling - Trifacta
• Trifacta was selected as the standard self-service data wrangling tool within our data discovery infrastructure.
• Provides PepsiCo users with a familiar yet powerful portal for data discovery and process development.
• By empowering business users, Trifacta helps bridge the time and resource gaps between business and IT
• Enables more rapid deployment of solutions that fit business needs precisely
• A collaborative effort, with both sides open to innovation and experimentation, reaches shared success faster
10. Data Visualization & Business Intelligence - Tableau
• Tableau is the data visualization & business intelligence standard at PepsiCo
• Over 2,000 users, 59 projects & 541 workbooks across PepsiCo
• 7+ Tableau servers in the production environment (each with 8 cores & 64 GB RAM)
• Tableau serves as the corporate standard for business intelligence throughout PepsiCo on top of the EDW, as well as for self-service analytics by departments and individual analysts
• The CPFR use case is a completely self-service process: end users discover and prep diverse data in Trifacta and build dashboards in Tableau (without the help of IT)
11. Hortonworks + Trifacta + Tableau in the PepsiCo Data Architecture
(Diagram: Unified Data Architecture. Data Sources: ERP, SCM, CRM, Social Media, Sensor Data, Machine Logs. Platform: ETL, Data Quality, Enterprise Data Warehouse, Data Discovery/Analytics, Business Intelligence. Tools and Apps: Marketing, Planning, Data Mining, Analytics Language. Users: Business Analyst, Data Analyst, Data Scientist, Customer Partners, Frontline Workers.)
17. Forecasting Collaboration Process
Why combine this data?
• Combining the data into a single master report gives a more accurate overall picture of performance
• Promotes collaboration between PepsiCo and the customer
• Traditionally the vendor–retailer relationship was contentious
• Combining PepsiCo data and retailer data helps promote shared success goals
• Through this process, PepsiCo’s forecast accuracy increased, which reduced spoilage for retailers
20. Challenges Leading to Hortonworks + Trifacta + Tableau Solution
• Data Outgrowing Tools: Existing infrastructure was pushed to its limits by the size of the source datasets
• Technical Skills Required: Datasets were connected through a long series of elaborate queries and macros.
• Data Quality Issues: Errors were difficult to locate.
• Slow, Manual Process: Building a single CPFR tool could take months.
21. PepsiCo’s Hortonworks + Trifacta Solution
(Diagram: Business: all structuring, enriching, and cleansing)
22. Hortonworks + Trifacta + Tableau Solution Benefits at PepsiCo
• Business Benefits:
– Reporting time has been reduced by 70%
– Build time has been reduced by as much as 90%
• Technical Benefits:
– Can easily work with large quantities of non-standard data
– Self-service prep for analysts reduces technical dependencies on IT
– Trifacta surfaces errors and data problems immediately to analysts
• PepsiCo CPFR teams can now respond more quickly to sales trends and adjust
forecasting and inventory distribution accordingly
23. DEMO Intro - Trifacta Wrangling Process for Retailer Data
• Structure the third-party data
– BOH: balance on hand (inventory) data
• Cleanse mismatched values and delimiters
– Remove the thousands-separator ‘,’ from values of 1,000 and above
• Extract embedded text/numbers
– Split the Customer Item Code and Item Description into two separate columns
• Convert the Customer Item Code to the PepsiCo UPC
– Join the BOH dataset with the Item Reference dataset and build a new master report
• Run the job at scale and profile the results
– Publish to Tableau
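In the demo these steps are performed interactively in Trifacta. To make the underlying transformations concrete, here is a minimal Python sketch using only the standard library. It is not Trifacta's recipe language or PepsiCo's actual workflow. The file layout, the column names `store`, `item` and `boh_units`, the item codes, the placeholder UPCs and all helper functions are hypothetical.

```python
import csv
import io
import re

# Hypothetical retailer BOH (balance on hand) extract. The column names, the
# "CODE DESCRIPTION" item layout, and every value below are invented examples.
RAW_BOH = '''store,item,boh_units
101,004512 CHIPS NACHO 11OZ,"1,250"
101,007731 CHIPS CLASSIC 10OZ,840
102,004512 CHIPS NACHO 11OZ,"2,005"
102,009999 UNKNOWN ITEM,15
'''

# Hypothetical item reference dataset: customer item code -> PepsiCo UPC.
# The UPCs are placeholders, not real product codes.
ITEM_REFERENCE = {
    "004512": "000000000011",
    "007731": "000000000028",
}

# Leading digits are the customer item code; the rest is the description.
ITEM_PATTERN = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def clean_units(value: str) -> int:
    """Remove thousands-separator commas (e.g. '1,250') and parse as int."""
    return int(value.replace(",", "").strip())


def split_item(field: str) -> tuple[str, str]:
    """Split 'CODE DESCRIPTION' into (customer_item_code, item_description)."""
    match = ITEM_PATTERN.match(field)
    if match is None:
        raise ValueError(f"unexpected item field: {field!r}")
    return match.group(1), match.group(2)


def build_master_report(raw_csv: str, reference: dict[str, str]):
    """Structure, cleanse, split, and join BOH rows to the item reference.

    Returns (matched, unmatched) lists of row dicts. Rows whose customer item
    code has no UPC are kept aside for review instead of being silently lost.
    """
    matched, unmatched = [], []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        code, description = split_item(row["item"])
        record = {
            "store": row["store"],
            "customer_item_code": code,
            "item_description": description,
            "boh_units": clean_units(row["boh_units"]),
        }
        upc = reference.get(code)
        if upc is None:
            unmatched.append(record)
        else:
            record["pepsico_upc"] = upc
            matched.append(record)
    return matched, unmatched


def profile(matched, unmatched) -> dict[str, int]:
    """A minimal profile of the output: join coverage and total units."""
    return {
        "rows_matched": len(matched),
        "rows_unmatched": len(unmatched),
        "total_boh_units": sum(r["boh_units"] for r in matched),
    }


def to_csv(records) -> str:
    """Serialize the master report as CSV text (e.g. an extract for a BI tool)."""
    fields = ["store", "pepsico_upc", "customer_item_code",
              "item_description", "boh_units"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    matched, unmatched = build_master_report(RAW_BOH, ITEM_REFERENCE)
    print(profile(matched, unmatched))
    print(to_csv(matched))
```

Keeping unmatched rows in a separate list, rather than letting an inner join drop them, is one simple way to surface data problems to the analyst as soon as the job runs. Publishing the result to Tableau is not shown, because it depends on the deployment.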
27. Q&A Session
28. Supporting Resources
• Trifacta Wrangler Enterprise for Hadoop: https://www.trifacta.com/gated-form/bringing-hadoop-to-an-analysts-fingertips/
• Empowering CPG to Drive Innovation with Data: https://www.trifacta.com/resources/empowering-consumer-packaged-goods-organizations-to-drive-innovation-with-data/
• About the Hortonworks Solution: http://hortonworks.com/solutions/
• Try Hortonworks Sandbox: http://hortonworks.com/products/sandbox/
• Big Data Analytics for Retail with Hadoop: http://hortonworks.com/info/big-data-analytics-for-retail-with-apache-hadoop/
• Tableau for Big Data Analysis: http://www.tableau.com/resource/big-data-analysis
• Faster, Smarter Retail Analytics with Tableau: http://www.tableau.com/resource/big-data-analysis