21. Confidential 21
DataOps Framework
22. DataOps Framework Components
Technology
● Architecture - selection of tools which comprise data supply chain
● Infrastructure - selection of platform to support architecture
Organization
● Roles - division of labor across mixed-skill teams
● Structure - working model for projects across technical and business teams
Process
● Agile - incremental delivery model
23. Technology - Architecture Components
Diagram: Sources to Consumers, spanning Technology, Organization, Process
● Sources: Internal Tabular Data, External Tabular Data
● Components: Movement (ETL/ELT), Storage & Compute, Mastering/Quality, Governance, Catalog/Registry, Publish, Feedback
● Consumers: Citizens, Analysts, Data Scientists, Developers
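One way to picture the Catalog/Registry, Publish, and Feedback components is as a small service that preparers publish to and that consumers search and comment on. The sketch below is illustrative only: `Catalog`, `CatalogEntry`, and their methods are hypothetical names, not part of any tool named in this deck, and an in-memory dictionary stands in for a real registry.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One published dataset plus the consumer feedback it has received."""
    name: str
    owner: str
    rows: list
    feedback: list = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory stand-in for a catalog/registry service."""

    def __init__(self):
        self._entries = {}

    def publish(self, name, owner, rows):
        # Publishing registers the dataset under a name consumers can discover.
        entry = CatalogEntry(name=name, owner=owner, rows=rows)
        self._entries[name] = entry
        return entry

    def find(self, keyword):
        # Case-insensitive substring search over dataset names.
        return sorted(n for n in self._entries if keyword.lower() in n.lower())

    def get(self, name):
        return self._entries[name]

    def give_feedback(self, name, comment):
        # Feedback flows back from consumers to whoever prepared the data.
        self._entries[name].feedback.append(comment)


catalog = Catalog()
catalog.publish("supplier_master", owner="curator", rows=[{"id": 1, "name": "Acme"}])
catalog.give_feedback("supplier_master", "Row 1 is missing a country code")
```

The point of the sketch is the two directions of flow: data goes out through `publish`, and corrections come back through `give_feedback`.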
24. Technology - Architectural Principles
● Cloud First
● Continuous (assume data will change)
● Highly Automated - automate whenever possible
● Open/Best of Breed (not one platform/vendor)
● Bi-Directional (Feedback)
● Collaborative (Humans at the Core)
● Service Oriented (clear endpoints for data)
● Loosely Coupled (RESTful interfaces: table(s) in, table(s) out)
● Both aggregated AND federated storage
● Both batch AND Streaming
● Lineage/Provenance is essential
● Scale Out/Distributed
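Two of the principles above, loose coupling through table-in/table-out interfaces and lineage, can be sketched together. This is a minimal illustration under stated assumptions: a "table" is modeled as a list of row dictionaries, plain function calls stand in for the RESTful endpoints, and `run_pipeline`, `drop_missing_names`, and `normalize_names` are hypothetical names chosen for the example.

```python
def run_pipeline(table, steps):
    """Apply each step in order and record which steps touched the data."""
    lineage = []
    for step in steps:
        # Each step only sees a table and returns a table, so steps can be
        # swapped or reordered without knowing about each other.
        table = step(table)
        lineage.append(f"{step.__name__}: {len(table)} rows out")
    return table, lineage


def drop_missing_names(table):
    """Table in, table out: keep only rows with a non-empty name."""
    return [row for row in table if row.get("name")]


def normalize_names(table):
    """Table in, table out: trim and lowercase the name column."""
    return [{**row, "name": row["name"].strip().lower()} for row in table]


raw = [{"name": " Acme "}, {"name": ""}, {"name": "Globex"}]
clean, lineage = run_pipeline(raw, [drop_missing_names, normalize_names])
```

Because every step shares the same table-shaped contract, the lineage log is produced by the runner rather than by each tool, which is one way to keep provenance consistent across a best-of-breed toolset.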
25. Infrastructure - Key Components
● Management
● Compute
● Search
● Storage
26. Organization - Roles
● Data Suppliers: CIO, Source Owner, DBA, IT Professional
● Data Preparers: CDO, Data Engineer, Curator, Steward
● Data Consumers: Business Owners and Other CxOs, Citizens, Analysts, Data Scientists, Developers
27. Organization - Roles
Group | Role | Goals | Tools
Consumers | Citizen | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
Consumers | Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
Preparers | Engineer | Deliver and manage data pipelines | ETL, SQL
Preparers | Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
Preparers | Steward | Create policies and drive governance | MDM, Catalog, Governance
Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
28. Organization - Structure
Shared Services Model
Full-service development of data applications, in
collaboration with business
Advantages
● Centralized technical knowledge
● Centralized resourcing - one-stop shop
● Accretive experience
Disadvantages
● Bandwidth contention - how to prioritize
competing projects?
Advisory Model
Bootstraps projects with best of breed tools and
approach, but does not complete them
Advantages
● Centralized technical knowledge
● Minimal resourcing - experts, not implementers
● Flexibility - options to deviate from standard
tools
Disadvantages
● Resource burden is on each project / department
- both in development and ongoing maintenance
● Limited feedback - does the advice get better
after each project?
The appropriate model will vary with the scale of DataOps project work
29. Process - The Wrong Way
● Labor-intensive
● Monolithic
● IT driven
Chart: Remaining Work vs. Time, with phases Modeling, Rules, Testing, and Delivery
30. Process - The Right Way
● Automated
● Incremental
● Collaborative
Chart: Remaining Work vs. Time
31. Case Studies
32. Case Study - Financial Institution
A major financial institution built a data lab that works to invent solutions that harness data and advanced analytics.
Goals
● Better understanding of 60 million customers
● Create simpler, more intuitive and intelligent products and customer experiences
● Help businesses do more business with each other using the bank’s cards
Holistic approach
● Mingles human-centered design, full-stack engineering and data science
● Project manager oversees entire end-to-end data pipeline
● Interdisciplinary team is made up of DevOps and data scientists
Data unification at the center of the pipeline
● Raw data is cleaned and deduplicated, then fed into Tamr for classification and training
● Bulk matching lets the bank determine whether a supplier/vendor from its Master Data Source
overlaps with the list collected from the customer.
● Subject matter experts act as curators to improve accuracy of ML models
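The deduplicate-then-match steps described above can be illustrated with a small sketch. This is not Tamr's method or API: the case study describes machine-learning classification refined by curators, whereas this example swaps in plain string similarity from Python's standard `difflib` module as a stand-in. The function names, the sample supplier names, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def deduplicate(names):
    """Keep the first occurrence of each normalized name, preserving order."""
    seen = set()
    unique = []
    for name in names:
        key = normalize(name)
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


def bulk_match(master, customer, threshold=0.85):
    """Map each customer-supplied name to its best master match, if any."""
    matches = {}
    for candidate in deduplicate(customer):
        best_name, best_score = None, 0.0
        for known in master:
            score = SequenceMatcher(None, normalize(candidate), normalize(known)).ratio()
            if score > best_score:
                best_name, best_score = known, score
        # Only accept matches at or above the (assumed) similarity threshold.
        if best_name is not None and best_score >= threshold:
            matches[candidate] = best_name
    return matches


master_suppliers = ["Acme Corporation", "Globex Inc"]
customer_list = ["acme  corporation", "ACME Corporation", "Initech"]
overlap = bulk_match(master_suppliers, customer_list)
```

In a setup like the one described, names that fall below the threshold would be the natural candidates to route to subject matter experts for review.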
33. Case Study - Pharmaceutical Company
A major pharmaceutical company realized that its R&D environment wasn’t up to par, which was preventing them from developing new drugs with the level of innovation and speed required.
Goals
● Make it easier to access and use data for exploratory analysis and decision-making about new
medicines
Challenges
● Conducted a survey about data across the organization
○ Result: very difficult to work with data outside of a departmental silo
○ Identified top 10 use cases for integrating diverse data
Results and Benefits
● Turned to machine learning since a traditional MDM approach would have taken too long
● Use cases have expanded from 10 to 250
● Reduction in time to get answers to ad hoc questions
34. In Parting - What NOT to do
● Avoid “boil the ocean”/“waterfall” projects (measured in years/quarters)
○ Build rational long-term infrastructure while delivering real analytic value along the way
● Single “Platform”: Don’t overestimate what a single piece of software can do
○ Focus on a thoughtfully designed ecosystem of loosely coupled best of breed tools
● Single Vendor: Don’t overestimate what a single vendor can do
○ Align vendors with APIs and expectations that they MUST work together
● Don’t underestimate the effort required to make FOSS work
○ Just because Google does it doesn’t mean you can do it
● Don’t underestimate human/behavioral challenges with data
○ The reasons projects fail or stall are most often human/behavioral