Getting from raw data to deployed data-driven solutions requires technology, data, and people, all of which exist. So why aren’t we seeing more truly data-driven companies? What's missing, and why? At Strata Hadoop World Singapore 2015, Pauline Brown, Director of Marketing at Dataiku, explained how a lack of collaboration keeps companies from building and deploying data products effectively. Learn more about Dataiku and Data Science Studio: www.dataiku.com
The 3 Key Barriers Keeping Companies from Deploying Data Products
1. The 3 Key Barriers Keeping Companies
from Acting Upon the Possibilities
That Big Data has to Offer
2. A little bit about me…
• Born & raised in Palo Alto, California
• BA in European History from Columbia University
• Masters in Marketing & Communication from Sciences Po Paris
• Director of Marketing at Dataiku
• Currently living in Paris
3. Shift from
Uomo Universale
« A man can do all things if he will. »
-Leon Battista Alberti (1404-72)
Excel at all things:
• Intellect
• Mathematics
• Science
• Art
• Social
• Physical
23. Building a Data Product
User
Interface
Stream / Real-time
Query
Data
Preprocessing
24. Building a (Predictive) Data Product
Predicted Data
Historical Data
Machine Learning
Model
Preprocessing
25. Building a (Predictive) Data Product
Predicted Data
Historical Data
Machine Learning
Model
Preprocessing
Pre-processing & cleaning data alone can take
up to 80% of the time spent on a data project
28. Running a (Predictive) Data Product
Deployment
Real-time / Stream
Model
Preprocessing
Predicted Data
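The build and run flow on the slides above (historical data, preprocessing, model, predicted data) can be sketched in a few lines. This is only an illustrative sketch: the record fields, the function names, and the threshold "model" are assumptions made for the example, not anything taken from the talk or from Data Science Studio.

```python
from statistics import mean

def preprocess(record):
    """Clean one raw record: normalise the name and convert the amount to float."""
    return {
        "customer": record["customer"].strip().lower(),
        "amount": float(record["amount"]),
    }

def train(historical):
    """Build step: the 'model' is just the mean amount seen in historical data."""
    rows = [preprocess(r) for r in historical]
    return {"threshold": mean(r["amount"] for r in rows)}

def predict(model, record):
    """Run step: score a new record using the same preprocessing as the build step."""
    row = preprocess(record)
    return "high" if row["amount"] > model["threshold"] else "low"

# Hypothetical historical data, still in its raw (string) form.
historical = [
    {"customer": " Ann ", "amount": "10"},
    {"customer": "Bob", "amount": "30"},
]
model = train(historical)
prediction = predict(model, {"customer": "Cy", "amount": "25.5"})
```

The point of the sketch is that `preprocess` is shared by `train` and `predict`, so the deployed product sees data in the same shape as the model did when it was built.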
42. Why is Production & Industrialisation of
(Predictive) Data Products Important?
43. Those who win are those who deliver
new data products continuously
44. Those who win are those
who deliver data products
Data products are supposed to deliver
business value… if you don’t deploy them,
where’s the long term value?
46. No Industrialisation =
Limited ROI
Those who win are those
who deliver data products
It’s like building your dream house but
never moving in. Absurd!
47. So Why Aren’t More Companies Deploying
(Predictive) Data Products?
48. A Data Product must be
business focused (ROI) & mathematically accurate (RELIABLE)
52. …But Your Data Product Needs Both
Patterns, patterns, patterns
Performance, Truth, Anomaly
BUSINESS
KNOWLEDGE
MATHEMATICAL
ACCURACY
1° Business Analytic & Algorithmic Minds Are Different…
53. MINDSET
• Project alignment from conception to execution – install team mindset with
common goal – even if the paths to get there are different
FRAMEWORK
• One common platform with enough flexibility for both mindsets to fully
exercise their individual skill and expertise on a common project
Resolving the Skill Gap
66. Data reliability is hard to guarantee
Technological Complexity
3° Production and Industrialisation is Complex
67. Technological Complexity
The assumption that production will identically reproduce the analysis phase
is a hard promise to make
3° Production and Industrialisation is Complex
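One way to test the assumption that production reproduces the analysis phase is to fingerprint the preprocessed output of a fixed reference sample in both environments and compare the results. The sketch below is a hedged illustration; the function names, the sample records, and the deliberately drifting `production_preprocess` are all hypothetical.

```python
import hashlib
import json

def preprocess(record):
    """Preprocessing as written during the analysis phase (rounds to 2 decimals)."""
    return {"amount": round(float(record["amount"]), 2)}

def production_preprocess(record):
    """Hypothetical production rewrite that silently differs: it truncates to int."""
    return {"amount": int(float(record["amount"]))}

def fingerprint(records, preprocess_fn):
    """Hash the preprocessed output of a fixed reference sample."""
    output = [preprocess_fn(r) for r in records]
    payload = json.dumps(output, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical reference sample kept alongside the model.
reference_sample = [{"amount": "10.004"}, {"amount": "3"}]
analysis_fingerprint = fingerprint(reference_sample, preprocess)

# Same code in production: fingerprints match.
same = fingerprint(reference_sample, preprocess) == analysis_fingerprint
# Reimplemented code in production: fingerprints differ, so the drift is caught.
drifted = fingerprint(reference_sample, production_preprocess) != analysis_fingerprint
```

A check like this does not remove the complexity, but it turns a silent difference between analysis and production into a visible failure.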
69. Human & Organisational Complexity
BUILDING
MAINTAINING
3° Production and Industrialisation is Complex
70. Human & Organisational Complexity
BUILDING
MAINTAINING
Business Analyst
• patterns
3° Production and Industrialisation is Complex
71. Human & Organisational Complexity
BUILDING
MAINTAINING
Business Analyst / Algorithmic
• patterns
• performance
• truth
3° Production and Industrialisation is Complex
72. Business Analyst / Algorithmic
• patterns
• performance
• truth
Data engineers
• stability
• reliability
• cost of ownership
Human & Organisational Complexity
BUILDING
MAINTAINING
3° Production and Industrialisation is Complex
74. TIP #1: Invest in a platform where development and production are the same
DEVELOPMENT
TEST
PRODUCTION
Making Complexity Work for You
75. TIP #2: Invest in monitoring capabilities & strategies
Making Complexity Work for You
76. Data Engineers must have visibility and understanding of the key business metrics
TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities
Making Complexity Work for You
77. Data Engineers must know if (and when) a model is diverging
TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities
Making Complexity Work for You
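A minimal sketch of what "knowing if a model is diverging" could look like in practice: compare recent prediction scores against a baseline recorded at deployment time and flag a large shift in the mean. The function name, the sample scores, and the three-standard-deviation threshold are assumptions chosen for illustration, not a method described in the talk.

```python
from statistics import mean, stdev

def is_diverging(baseline_scores, recent_scores, max_shift=3.0):
    """Flag divergence when the recent mean moves more than `max_shift`
    baseline standard deviations away from the baseline mean."""
    base_mean = mean(baseline_scores)
    base_std = stdev(baseline_scores)
    if base_std == 0:
        # Degenerate baseline: any change in the mean counts as divergence.
        return mean(recent_scores) != base_mean
    shift = abs(mean(recent_scores) - base_mean) / base_std
    return shift > max_shift

# Hypothetical prediction scores captured when the model was deployed.
baseline = [0.48, 0.52, 0.50, 0.49, 0.51]

stable = is_diverging(baseline, [0.50, 0.51, 0.49])
drifting = is_diverging(baseline, [0.80, 0.82, 0.79])
```

A real monitoring setup would likely track business metrics as well, but even a simple check like this gives the data engineer a concrete signal to act on.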
78. Data Engineers must be responsible for quality of service
TIP #3: Name your Data Engineer(s) Wisely & Define Responsibilities
Making Complexity Work for You
79. Differentiate builders of new data products
from those that maintain them.
TIP #4: It’s Not a One-Man Show
Making Complexity Work for You
81. From the Renaissance of Big Data…
Where the Data Science Superstar is one person that excels at all skill sets…
…and where actual data products are rarely deployed and maintained
82. To the Enlightenment of (Big) Data
Where the Data Science Superstar is a team of complementary skill sets…
… and where data products are designed, built, tested, and deployed by a
team of skilled individuals that each have a distinct role.
TEAM
84. The Rise of the
Data Science Team Manager
The Data Science Team Manager must understand the stakeholders’ needs
and translate them into a business need that can be answered with a data
product…
85. The Rise of the
Data Science Team Manager
The Data Science Team Manager must permit and enable collaboration
between business analysts, statisticians, & engineers…
Collaboratively
design, build, & deploy
Data Products
86. The Rise of the
Collaborative Data Science Team
…all the while maintaining the distinction between each individual role
and each individual skill set.
Business
Mathematics
Data
Engineering
87. The Secret to Building and Industrializing Data Products
is Collaboration.
Today, collaboration between different
skill sets, technologies, and data is finally possible.