Grab some coffee and enjoy the pre-show banter before the top of the hour!
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Eric Kavanagh
CEO, Inside Analysis
Mark Marinelli
Head of Product, Tamr
Wayne Eckerson
President, Eckerson Group
© Eckerson Group 2019 Twitter: @weckerson www.eckerson.com
Best Practices in DataOps
How to Create Agile, Automated Data Pipelines
Wayne W. Eckerson
May 8, 2019
10 Symptoms You Need DataOps
1. Your data team is flooded with minor request tickets and is burning out.
2. Business users don’t trust the data because it contains too many errors.
3. Source system changes keep breaking your ETL jobs and data pipelines.
4. Business users don’t understand why it takes so long to get data.
5. You have difficulty meeting service level agreements (SLAs).
6. Data analysts write the same jobs and reports with minor variations.
7. Data scientists wait months for data and computing resources.
8. Your company can’t discern the true cost of migrating to the cloud.
9. Your data environment is too chaotic to implement predictive analytics.
10. Your self-service initiative has spawned hundreds of data silos.
Bonus: Your data lake is more of a data swamp.
Bonus: It takes months to deploy a single predictive model.
What is DataOps?
DataOps = Data Operations
“A set of practices, processes, and technologies for building, operationalizing, automating, and managing data pipelines from source to consumption.”
DataOps draws on Agile, Lean, DevOps, and TQM:
• Agile: Scrum, Kanban; business engagement; self-organizing teams; retrospectives
• Lean: automation; orchestration; efficiency; simplicity
• DevOps: team-based development; version control; continuous integration/delivery; test-driven development
• TQM: performance management; performance metrics; continuous monitoring; benchmarking
DataOps History
DataOps applies the rigor of software engineering to the development and execution of data pipelines.
Evolution of development practice (1960s to 2020s): “cowboy coders” → team-based development → DevOps-based development
• Manifesto for Agile Software Development published (2001)
• First DevOps event (2009)
• DataOps Manifesto published (2017)
• First DataOps event (2019)
Primary Use Cases
The use cases span a spectrum from “agile but ungoverned” to “governed but not agile.”
• Big Data: Standardize and reuse core data pipeline components: ingest, transform, clean, etc. Biggest needs: reuse and collaboration.
• Data Science: Create data science sandboxes on demand; deploy models automatically; monitor data drift. Biggest needs: self-service and automation.
• Self-Service: Centralize logic and permissions to facilitate data access and analysis while eliminating data silos. Biggest needs: governance and infrastructure.
• Data Warehousing: Speed development by assigning agile teams to business groups to build end-to-end solutions. Biggest needs: speed and prioritization.
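The data science use case mentions monitoring data drift but does not say how. As a minimal sketch, one common approach (not named in the presentation) is the population stability index (PSI), which compares how a numeric feature is distributed in a baseline sample versus new data: PSI = Σ (q_i − p_i) · ln(q_i / p_i) over bins. The function names, the ten equal-width bins, and the 0.2 alert threshold are illustrative assumptions.

```python
import math
from typing import Sequence


def population_stability_index(baseline: Sequence[float],
                               current: Sequence[float],
                               bins: int = 10,
                               eps: float = 1e-6) -> float:
    """Hypothetical helper: PSI between a baseline and a current sample.

    Equal-width bin edges come from the baseline's min/max; values outside
    that range are clipped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(max(int((v - lo) / width), 0), bins - 1)  # clip
            counts[idx] += 1
        # eps keeps log() defined when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(baseline), proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drifted(baseline: Sequence[float], current: Sequence[float],
            threshold: float = 0.2) -> bool:
    """Flag drift when PSI exceeds an assumed rule-of-thumb threshold."""
    return population_stability_index(baseline, current) > threshold


if __name__ == "__main__":
    base = [float(x % 100) for x in range(1000)]         # uniform over 0..99
    same = [float((x * 7) % 100) for x in range(1000)]    # same values, reordered
    shifted = [float(50 + x % 50) for x in range(1000)]   # mass moved to 50..99
    print(drifted(base, same), drifted(base, shifted))
```

Running the module prints `False True`: the reordered sample has exactly the baseline's distribution, while the shifted sample empties the lower bins. In practice the threshold and binning would be tuned per feature.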
Adoption
Does your organization have a DataOps initiative?
• Yes: 27%
• Somewhat: 30%
• No: 43%
Based on 175 respondents from an Eckerson Group survey conducted in April 2019.
Respondent roles:
• IT or BI director or manager: 32%
• IT or BI architect,…: 29%
• Consultant: 10%
• Business manager - analytics: 9%
• Data analyst or scientist: 7%
• Business executive or…: 6%
• Data engineer: 5%
• Academic: 2%
• Vendor: 1%
• DataOps engineer: 0%
Company size:
• Very small <100…: 18%
• Small <500…: 15%
• Medium <1,000 emp…: 11%
• Large <10,000: 29%
• Very large >…: 26%
Benefits
• Faster cycle time: continuous integration/delivery, reuse, automation
• Fewer data defects: test-driven development and execution
• More scalability, reliability: team-based development, continuous monitoring
• Lower costs: higher development capacity, fewer errors
• More innovation: focus efforts on value-add solutions and technologies
• Happier customers: get more for less with greater trust and alignment
Benefits from Survey
Benefits of DataOps:
• Faster cycle times: 60%
• Happier business users: 55%
• Deliver new applications more quickly: 50%
• Fewer defects and errors: 50%
• Ingest new data sources more rapidly: 48%
• Faster change requests: 47%
• Increased development capacity: 47%
• Improved data governance: 42%
Challenges
DataOps challenges:
• Establishing formal processes: 55%
• Orchestrating code and data across tools: 53%
• Staff capacity: 50%
• Monitoring the end-to-end environment: 50%
• Building rigorous tests up front: 47%
• Lack of adequate automation tools: 42%
• Getting business users to buy into the process: 35%
• Adopting agile methods and teams: 34%
• Data is too hard to find: 26%
• Getting technical users to buy into the process: 23%
Components and Tools
Rate the importance of each DataOps component (percentage rating it High):
• Agile development: 58%
• Continuous delivery: 54%
• Collaboration and reuse: 53%
• Continuous integration: 50%
• Code repository: 50%
• Data pipeline orchestration: 46%
• Performance and application monitoring: 46%
• Continuous testing: 41%
• Workflow management: 38%
• Change management request: 32%
• Containers and orchestration tools: 28%
Use Cases
DataOps use cases:
• Data warehouses and marts: 66%
• Reporting and dashboarding: 60%
• Self-service analysis: 56%
• Data science and machine learning: 52%
• Data lake: 39%
• OLAP cubes for reporting and analysis: 29%
• Customer-facing applications: 34%
• Audit, compliance, security: 27%
Best Practices
The “Soft Stuff”
• Form a data department (with a CDO). If you don’t have one already, pull “data people” out of IT and unite data engineers, data scientists, and software engineers; add a CDO for executive clout.
• Map and assess your data environment. Map data flows; assess waste, inefficiencies, manual processes, error sources, and development capacity.
• Educate your team about DataOps. Expect resistance: “Data is different!” “Don’t slow us down!” Stick with it: “You can’t drive fast without brakes.”
• Create cross-functional dev teams. Build self-organizing, cross-trained, collaborative, agile teams that build end-to-end solutions.
• Align the teams with business priorities. Align agile themes, initiatives, epics, and stories with business goals; get cross-functional priorities.
• Continuously review and refine processes. It’s a journey: benchmark performance and continuously improve cycle times, capacity, reuse, and other core objectives.
Best Practices (cont)
The “Hard Stuff”
• Start small and build incrementally. Insist on business representation on the dev team; get cross-functional priorities monthly.
• Build for reuse. Standardize ingest, transforms, configurations, code, and data sets; use repositories and containers.
• Segregate duties and environments. Use tools to migrate code from development to test to production environments and to segregate duties.
• Test and monitor everything. Build tests before and after coding; use tests to monitor and automate data pipelines.
• Use DevOps and DataOps tools. Repositories for data, code, and configurations; tools for agile collaboration, CI/CD, testing, data catalog, orchestration, data glossary, and unification.
• Create a self-service infrastructure. Centralize logic; apply permissions for data access and functionality; automate report and model deployment; serverless, Kubernetes.
• Build for the enterprise. Plan for security, governance, auditability, scalability, reliability, portability, and continuous monitoring.
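The “Test and monitor everything” practice calls for tests built around the code that then gate each pipeline run. A minimal sketch in plain Python, assuming tables are lists of dicts: input checks encode the contract a step relies on (which also catches source schema changes such as a dropped column), and output checks verify what the step must guarantee. All names (`Check`, `run_step`, the email-normalizing `transform`) are hypothetical; dedicated data-testing tools exist, and this only illustrates the pattern.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]
Table = list[Row]


@dataclass
class Check:
    """A named data test: a predicate over a whole table."""
    name: str
    predicate: Callable[[Table], bool]


class DataTestFailure(Exception):
    """Raised when any check fails, so the pipeline run stops."""


def run_checks(stage: str, rows: Table, checks: list[Check]) -> None:
    failed = [c.name for c in checks if not c.predicate(rows)]
    if failed:
        raise DataTestFailure(f"{stage} checks failed: {failed}")


def has_columns(*cols: str) -> Check:
    # Catches upstream schema changes such as dropped or renamed columns.
    return Check(f"has_columns{cols}",
                 lambda rows: all(c in r for r in rows for c in cols))


def not_null(col: str) -> Check:
    return Check(f"not_null({col})",
                 lambda rows: all(r.get(col) is not None for r in rows))


def unique(col: str) -> Check:
    return Check(f"unique({col})",
                 lambda rows: len({r.get(col) for r in rows}) == len(rows))


def transform(rows: Table) -> Table:
    """Hypothetical step under test: normalize email addresses."""
    return [{**r, "email": str(r["email"]).strip().lower()} for r in rows]


def run_step(rows: Table) -> Table:
    # Input checks: the contract this step relies on.
    run_checks("input", rows, [has_columns("id", "email"), not_null("id")])
    out = transform(rows)
    # Output checks: properties the step must guarantee.
    run_checks("output", out, [unique("id"), not_null("email")])
    return out


if __name__ == "__main__":
    rows = [{"id": 1, "email": " A@x.com "}, {"id": 2, "email": "b@x.com"}]
    print(run_step(rows)[0]["email"])  # prints a@x.com
```

A failing check raises `DataTestFailure`, which an orchestrator could treat as a failed run and alert on, turning the same tests into continuous monitoring.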
Summary
DataOps puts your data on a solid foundation.
It speeds cycle time, improves quality, increases capacity, and reduces cost.
It lets your data team focus on value-add work such as predictive analytics, streaming data, and cloud computing, increasing customer satisfaction and business value.
DataOps is lights-out, automated data operations.
Questions?
I’m listening!
Wayne Eckerson
• 25+ year thought leader in data and analytics
• Sought-after speaker and consultant
• President, Eckerson Group
• Former director of research at TDWI
• Author of hundreds of articles and reports
Performance
Management
BI/Analytics
Get More Value from Data and Analytics
Confidential
DataOps Framework
DataOps Framework Components
Technology
● Architecture - selection of tools that comprise the data supply chain
● Infrastructure - selection of platform to support the architecture
Organization
● Roles - division of labor across mixed-skill teams
● Structure - working model for projects across technical and business teams
Process
● Agile - incremental delivery model
Technology - Architecture Components
Sources: internal tabular data, external tabular data
Components: movement (ETL/ELT); storage & compute; mastering/quality; catalog/registry; publish; governance; feedback
Consumers: citizens, analysts, data scientists, developers
Technology - Architectural Principles
● Cloud First
● Continuous (assume data will change)
● Highly Automated - automate whenever possible
● Open/Best of Breed (not one platform/vendor)
● Bi-Directional (Feedback)
● Collaborative (Humans at the Core)
● Service Oriented (clear endpoints for data)
● Loosely Coupled (RESTful interfaces; table(s) in/out)
● Both aggregated AND federated storage
● Both batch AND streaming
● Lineage/Provenance is essential
● Scale Out/Distributed
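Two of these principles, loosely coupled steps with table(s) in/out and essential lineage/provenance, can be illustrated together. In this minimal sketch each step is a plain function from one table to another, and a hypothetical `Pipeline` runner records, for every step, which data version went in and which came out, identified by a content hash. The RESTful interfaces the slide mentions are out of scope here; the class, step functions, and record fields are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable

Table = list[dict]                # a table is a list of row dicts
Step = Callable[[Table], Table]   # a step: table in, table out


def fingerprint(table: Table) -> str:
    """Short content hash identifying an exact version of a table."""
    payload = json.dumps(table, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass
class Pipeline:
    """Hypothetical runner: executes named steps in order and records lineage."""
    steps: list[tuple[str, Step]]
    lineage: list[dict] = field(default_factory=list)

    def run(self, source_name: str, table: Table) -> Table:
        upstream = f"{source_name}@{fingerprint(table)}"
        for name, step in self.steps:
            out = step(table)  # steps share no state; they only exchange tables
            node = f"{name}@{fingerprint(out)}"
            self.lineage.append({
                "step": name,
                "input": upstream,
                "output": node,
                "rows_in": len(table),
                "rows_out": len(out),
            })
            table, upstream = out, node
        return table


# Two illustrative, independently replaceable steps.
def drop_blank_names(t: Table) -> Table:
    return [r for r in t if str(r.get("name", "")).strip()]


def upper_country(t: Table) -> Table:
    return [{**r, "country": str(r.get("country", "")).upper()} for r in t]


if __name__ == "__main__":
    p = Pipeline([("clean", drop_blank_names), ("standardize", upper_country)])
    rows = [{"name": "Acme", "country": "us"}, {"name": " ", "country": "de"}]
    print(p.run("crm_export", rows))
    for record in p.lineage:
        print(record)
```

Because each lineage record's `input` is the previous record's `output`, the chain can be walked back from any published table to the exact source version it came from; swapping a step only requires that it keep the table-in/table-out contract.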
Infrastructure - Key Components
● Management
● Compute
● Search
● Storage
Organization - Roles
● Data suppliers: CIO, source owner, DBA, IT professional
● Data preparers: CDO, data engineer, curator, steward
● Data consumers: citizens, analysts, data scientists, developers
● Business owners and other CxOs
Organization - Roles
Consumers
● Citizen. Goal: use data to make business decisions. Tools: Viz, CRM, Excel, PowerPoint, Word, web search.
● Analyst. Goal: deliver insights to the business, typically through dashboards and reports. Tools: Viz, Excel, SSDP, web search.
● Scientist. Goal: deliver insights to the business, typically through models and algorithms. Tools: R, Python, SAS, SSDP.
● Developer. Goal: build applications that leverage corporate data. Tools: Python, Java, JS, SQL, REST.
Preparers
● Engineer. Goal: deliver and manage data pipelines. Tools: ETL, SQL.
● Curator. Goal: ensure consumers have the data they need, in the form they need it. Tools: MDM, catalog.
● Steward. Goal: create policies and drive governance. Tools: MDM, catalog, governance.
Suppliers
● Source owner. Goal: define and manage the purpose, processes (data creation, consumption), and users (i.e., access) of the data source. Tools: EDW, SQL, ERWin, LDAP, SAP.
Organization - Structure
Shared Services Model
Full-service development of data applications, in collaboration with the business
Advantages
● Centralized technical knowledge
● Centralized resourcing - one-stop shop
● Accretive experience
Disadvantages
● Bandwidth contention - how to prioritize competing projects?
Advisory Model
Bootstraps projects with best-of-breed tools and approach, but does not complete them
Advantages
● Centralized technical knowledge
● Minimal resourcing - experts, not implementers
● Flexibility - options to deviate from standard tools
Disadvantages
● Resource burden is on each project/department, both in development and ongoing maintenance
● Limited feedback - does the advice get better after each project?
The appropriate model will fluctuate with the scale of DataOps project work.
Process - The Wrong Way
● Labor-intensive
● Monolithic
● IT driven
(Chart: remaining work over time across modeling, rules, and testing phases before delivery.)
Process - The Right Way
● Automated
● Incremental
● Collaborative
(Chart: remaining work over time, with value delivered in repeated increments.)
Case Studies
Case Study - Financial Institution
A major financial institution built a data lab that works to invent solutions that harness data and advanced analytics.
Goals
● Better understanding of 60 million customers
● Create simpler, more intuitive and intelligent products and customer experiences
● Help businesses do more business with each other using the bank’s cards
Holistic approach
● Mingles human-centered design, full-stack engineering, and data science
● Project manager oversees the entire end-to-end data pipeline
● Interdisciplinary team is made up of DevOps engineers and data scientists
Data unification at the center of the pipeline
● Raw data is cleaned and deduplicated, then fed into Tamr for classification and training
● Bulk matching allows the bank to determine whether a supplier/vendor from its Master Data Source overlaps with the list collected from the customer
● Subject matter experts act as curators to improve the accuracy of ML models
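The bulk-matching step can be illustrated with a much simpler technique than the ML-based classification the case study attributes to Tamr. This sketch, which is not Tamr’s method, normalizes company names and scores them with Python’s standard `difflib.SequenceMatcher`; a customer-supplied name with no master record at or above an assumed 0.85 similarity comes back as `None`, the kind of case subject matter experts acting as curators would review. The suffix list, threshold, and function names are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Illustrative legal-entity suffixes ignored when comparing names (assumption).
SUFFIXES = {"inc", "incorporated", "llc", "ltd", "corp", "corporation", "co", "company"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def bulk_match(master: list[str], customer: list[str],
               threshold: float = 0.85) -> dict[str, Optional[str]]:
    """Map each customer-supplied name to its best master record, or None
    when no master record scores at or above the threshold."""
    results: dict[str, Optional[str]] = {}
    for name in customer:
        best = max(master, key=lambda m: similarity(name, m), default=None)
        if best is not None and similarity(name, best) >= threshold:
            results[name] = best
        else:
            results[name] = None  # route to a curator for review
    return results


if __name__ == "__main__":
    master = ["Acme Corporation", "Globex LLC", "Initech Inc."]
    customer = ["ACME Corp.", "Globex, LLC", "Umbrella Ltd"]
    print(bulk_match(master, customer))
```

Here “ACME Corp.” and “Globex, LLC” resolve to their master records, while “Umbrella Ltd” is left unmatched for review. Pairwise comparison is quadratic, so at the bank’s scale a real system would need blocking or learned models rather than this brute-force loop.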
Case Study - Pharmaceutical Company
A major pharmaceutical company realized that its R&D environment wasn’t up to par, which was preventing it from developing new drugs with the required level of innovation and speed.
Goals
● Make it easier to access and use data for exploratory analysis and decision-making about new medicines
Challenges
● Conducted a survey about data across the organization
○ Result: very difficult to work with data outside of a departmental silo
○ Identified top 10 use cases for integrating diverse data
Results and Benefits
● Turned to machine learning because a traditional MDM approach would have taken too long
● Use cases have expanded from 10 to 250
● Reduced time to get answers to ad hoc questions
In Parting - What NOT to Do
● Avoid “boil the ocean”/“waterfall” projects (measured in years/quarters)
○ Build rational long-term infrastructure while delivering real analytic value along the way
● Single “Platform”: Don’t overestimate what a single piece of software can do
○ Focus on a thoughtfully designed ecosystem of loosely coupled, best-of-breed tools
● Single Vendor: Don’t overestimate what a single vendor can do
○ Align vendors with APIs and the expectation that they MUST work together
● Don’t underestimate the effort required to make FOSS work
○ Just because Google does it doesn’t mean you can do it
● Don’t underestimate the human/behavioral challenges with data
○ Projects most often fail or stall for human/behavioral reasons
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Best Practices in DataOps

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
  • 3. Eric Kavanagh, CEO, Inside Analysis; Mark Marinelli, Head of Product, Tamr; Wayne Eckerson, President, Eckerson Group
  • 4. Best Practices in DataOps: How to Create Agile, Automated Data Pipelines. Wayne W. Eckerson, May 8, 2019. © Eckerson Group 2019 | Twitter: @weckerson | www.eckerson.com
  • 5. 10 Symptoms You Need DataOps: 1. Your data team is flooded with minor request tickets and is burning out. 2. Business users don’t trust the data because it contains too many errors. 3. Source system changes keep breaking your ETL jobs and data pipelines. 4. Business users don’t understand why it takes so long to get data. 5. You have difficulty meeting service level agreements (SLAs). 6. Data analysts write the same jobs and reports with minor variations. 7. Data scientists wait months for data and computing resources. 8. Your company can’t discern the true cost of migrating to the cloud. 9. Your data environment is too chaotic to implement predictive analytics. 10. Your self-service initiative has spawned hundreds of data silos. Bonus: Your data lake is more of a data swamp. Bonus: It takes months to deploy a single predictive model.
  • 6. What is DataOps? DataOps draws on Lean, TQM, Agile, and DevOps practices: • Scrum, Kanban • Business engagement • Self-organizing teams • Retrospectives • Automation • Orchestration • Efficiency • Simplicity • Team-based development • Version control • Continuous integration/delivery • Test-driven development • Performance management • Performance metrics • Continuous monitoring • Benchmarking. DataOps = Data Operations: “A set of practices, processes, and technologies for building, operationalizing, automating, and managing data pipelines from source to consumption.”
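The “Orchestration” practice listed above can be illustrated with a minimal sketch that runs pipeline steps strictly in dependency order. It assumes Python 3.9+ for the standard-library `graphlib` module; the step names, `DEPENDENCIES`, and the `run_pipeline` helper are hypothetical and do not come from the presentation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each step maps to the set of steps it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "ingest": set(),
    "clean": {"ingest"},
    "transform": {"clean"},
    "test": {"transform"},
    "publish": {"test"},
}


def run_pipeline(deps: Dict[str, Set[str]],
                 tasks: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each task once, only after all of its upstream steps have run.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    executed: List[str] = []
    for step in TopologicalSorter(deps).static_order():
        tasks[step]()
        executed.append(step)
    return executed


if __name__ == "__main__":
    demo_tasks = {name: (lambda n=name: print(f"running {n}")) for name in DEPENDENCIES}
    print(run_pipeline(DEPENDENCIES, demo_tasks))
```

A production orchestrator would add scheduling, retries, and alerting; the sketch only shows the dependency-ordering idea.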
  • 7. DataOps History: DataOps applies the rigor of software engineering to the development and execution of data pipelines. Timeline (1960s to 2020s): “cowboy coders” → team-based development → DevOps-based development. Milestones: Manifesto for Agile Software Development published (2001); first DevOps event (2009); DataOps Manifesto published (2017); first DataOps event (2019).
  • 8. Primary Use Cases: Big Data: standardize and reuse core data pipeline components (ingest, transform, clean, etc.); biggest need: reuse and collaboration. Data Science: create data science sandboxes on demand, deploy models automatically, and monitor data drift; biggest need: self-service and automation. Self Service: centralize logic and permissions to facilitate data access and analysis while eliminating data silos; biggest need: governance and infrastructure. Data Warehousing: speed development by assigning agile teams to business groups to build end-to-end solutions; biggest need: speed and prioritization. The use cases span a spectrum from “agile but ungoverned” to “governed but not agile.”
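The Big Data use case calls for standardizing and reusing core pipeline components such as ingest, transform, and clean. A minimal sketch of that idea, assuming records arrive as CSV text and are handled as Python dictionaries; every function name here (`ingest_csv`, `strip_whitespace`, `drop_missing`, `compose`) is hypothetical.

```python
import csv
import io
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def ingest_csv(text: str) -> List[Record]:
    """Reusable ingest step: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def strip_whitespace(records: List[Record]) -> List[Record]:
    """Reusable clean step: trim leading and trailing spaces from every value."""
    return [{k: v.strip() for k, v in r.items()} for r in records]


def drop_missing(field: str) -> Step:
    """Factory for a reusable clean step that drops records with an empty `field`."""
    def step(records: List[Record]) -> List[Record]:
        return [r for r in records if r.get(field)]
    return step


def compose(*steps: Step) -> Step:
    """Chain standard steps into one pipeline; each step's output feeds the next."""
    def pipeline(records: List[Record]) -> List[Record]:
        for s in steps:
            records = s(records)
        return records
    return pipeline


if __name__ == "__main__":
    raw = "id,email\n1, a@x.com \n2,\n"
    clean_customers = compose(strip_whitespace, drop_missing("email"))
    print(clean_customers(ingest_csv(raw)))
```

Because each step shares the same signature, teams can assemble new pipelines from a shared library of tested steps instead of rewriting similar jobs with minor variations.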
  • 9. Adoption: Does your organization have a DataOps initiative? Yes 27%, Somewhat 30%, No 43%. Based on 175 respondents from an Eckerson Group survey conducted in April 2019. Respondent roles: IT or BI director or manager 32%; IT or BI architect,… 29%; consultant 10%; business manager - analytics 9%; data analyst or scientist 7%; business executive or… 6%; data engineer 5%; academic 2%; vendor 1%; DataOps engineer 0%. Company size: very small <100… 18%; small <500… 15%; medium <1,000 emp… 11%; large <10,000 29%; very large >… 26%.
  • 10. Benefits: faster cycle time (continuous integration/delivery, reuse, automation); fewer data defects (test-driven development and execution); more scalability and reliability (team-based development, continuous monitoring); lower costs (higher development capacity, fewer errors); more innovation (focus efforts on value-add solutions and technologies); happier customers (get more for less with greater trust and alignment).
  • 11. Benefits from Survey (benefits of DataOps): faster cycle times 60%; happier business users 55%; deliver new applications more quickly 50%; fewer defects and errors 50%; ingest new data sources more rapidly 48%; faster change requests 47%; increased development capacity 47%; improved data governance 42%.
  • 12. Challenges (DataOps challenges): establishing formal processes 55%; orchestrating code and data across tools 53%; staff capacity 50%; monitoring the end-to-end environment 50%; building rigorous tests upfront 47%; lack of adequate automation tools 42%; getting business users to buy into the process 35%; adopting agile methods and teams 34%; data is too hard to find 26%; getting technical users to buy into the process 23%.
  • 13. Components and Tools: Rate the importance of each DataOps component (share rating it “High”): agile development 58%; continuous delivery 54%; collaboration and reuse 53%; continuous integration 50%; code repository 50%; data pipeline orchestration 46%; performance and application monitoring 46%; continuous testing 41%; workflow management 38%; change management request 32%; containers and orchestration tools 28%.
  • 14. Use Cases (DataOps use cases): data warehouses and marts 66%; reporting and dashboarding 60%; self-service analysis 56%; data science and machine learning 52%; data lake 39%; OLAP cubes for reporting and analysis 29%; customer-facing applications 34%; audit, compliance, security 27%.
  • 15. Best Practices: The “Soft Stuff”. Form a data department (with a CDO): if you don’t have one already, add a CDO for executive clout; pull “data people” out of IT and unite data engineers, data scientists, and software engineers. Map and assess your data environment: map data flows; assess waste, inefficiencies, manual processes, error sources, and development capacity. Educate your team about DataOps: expect resistance (“Data is different!” “Don’t slow us down!”); stick with it, because “You can’t drive fast without brakes.” Create cross-functional dev teams: self-organizing, cross-trained, collaborative, agile teams that build end-to-end solutions. Align the teams with business priorities: align agile themes, initiatives, epics, and stories with business goals; get cross-functional priorities. Continuously review and refine processes: it’s a journey; benchmark performance and continuously improve cycle times, capacity, reuse, and other core objectives.
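Benchmarking cycle time, one of the core objectives named above, can start with simple arithmetic: cycle time = delivery timestamp minus request timestamp. The sketch below assumes each request is recorded as a pair of ISO-8601 timestamps; the function names and sample data are hypothetical, and using the median (so a few very slow requests do not dominate the benchmark) is an assumption, not the presenter's prescription.

```python
from datetime import datetime
from statistics import median
from typing import List, Tuple

Request = Tuple[str, str]  # (opened, delivered) as ISO-8601 strings


def cycle_times_days(requests: List[Request]) -> List[float]:
    """Cycle time of each request in days: delivered minus opened."""
    times = []
    for opened, delivered in requests:
        delta = datetime.fromisoformat(delivered) - datetime.fromisoformat(opened)
        times.append(delta.total_seconds() / 86_400)  # seconds per day
    return times


def median_cycle_time(requests: List[Request]) -> float:
    """Median cycle time in days for a batch of requests."""
    return median(cycle_times_days(requests))


def improvement_pct(before: float, after: float) -> float:
    """Percentage reduction in cycle time from one period to the next."""
    return (before - after) / before * 100


if __name__ == "__main__":
    q1 = [("2019-01-02T09:00", "2019-01-16T09:00"),
          ("2019-01-07T09:00", "2019-01-17T09:00")]
    q2 = [("2019-04-01T09:00", "2019-04-03T09:00"),
          ("2019-04-02T09:00", "2019-04-09T09:00"),
          ("2019-04-05T12:00", "2019-04-06T00:00")]
    before, after = median_cycle_time(q1), median_cycle_time(q2)
    print(before, after, round(improvement_pct(before, after), 1))
```

Tracking the same metric period over period turns “continuously improve” into a number the team can review in retrospectives.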
  • 16. Best Practices (cont.): The “Hard Stuff”. Start small and build incrementally: insist on business representation on the dev team; get cross-functional priorities monthly. Build for reuse: standardize ingest, transforms, configurations, code, and data sets; use repositories and containers. Segregate duties and environments: use tools to migrate code from dev to test to production environments and to segregate duties. Test and monitor everything: build tests before and after coding; use tests to monitor and automate data pipelines. Use DevOps and DataOps tools: repositories for data, code, and configurations; tools for agile collaboration, CI/CD, testing, data catalog, orchestration, data glossary, and unification. Create a self-service infrastructure: centralize logic; apply permissions for data access and functionality; automate report and model deployment (serverless, Kubernetes). Build for the enterprise: plan for security, governance, auditability, scalability, reliability, portability, and continuous monitoring.
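“Test and monitor everything” can be made concrete with small, reusable data-quality checks that run after each pipeline step and stop bad data before it is published. A minimal sketch in plain Python; the check names, the `DataQualityError` exception, and the sample rows are hypothetical, and a real team might use a dedicated data-testing tool instead.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], bool]


class DataQualityError(Exception):
    """Raised when a batch of rows fails one or more checks."""


def not_null(field: str) -> Check:
    """Every row has a non-null value for `field`."""
    return lambda rows: all(r.get(field) is not None for r in rows)


def unique(field: str) -> Check:
    """No two rows share the same value for `field`."""
    def check(rows: List[Row]) -> bool:
        values = [r.get(field) for r in rows]
        return len(values) == len(set(values))
    return check


def in_range(field: str, low: float, high: float) -> Check:
    """Every value of `field` is numeric and within [low, high]."""
    return lambda rows: all(
        isinstance(r.get(field), (int, float)) and low <= r[field] <= high
        for r in rows
    )


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> List[str]:
    """Return the names of the checks that failed (empty list means all passed)."""
    return [name for name, check in checks.items() if not check(rows)]


def quality_gate(rows: List[Row], checks: Dict[str, Check]) -> List[Row]:
    """Pass rows through unchanged, or raise so the pipeline stops."""
    failed = run_checks(rows, checks)
    if failed:
        raise DataQualityError(f"failed checks: {failed}")
    return rows
```

Writing the checks before the transformation code (test-driven) and running the same checks on every scheduled load is what turns them into monitoring.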
  • 17. Summary: DataOps puts your data on a solid foundation: it speeds cycle time, improves quality, increases capacity, and reduces cost. It lets your data team focus on value-add work such as predictive analytics, streaming data, and cloud computing, increasing customer satisfaction and business value. DataOps is lights-out, automated data operations.
  • 18. Questions? I’m listening!
  • 19. Wayne Eckerson • 25+ year thought leader in data and analytics • Sought-after speaker and consultant • President, Eckerson Group • Former director of research at TDWI • Author of hundreds of articles and reports Performance Management BI/Analytics
  • 20. Get More Value from Data and Analytics
  • 21. Confidential. DataOps Framework
  • 22. DataOps Framework Components: Technology ● Architecture: selection of tools that make up the data supply chain ● Infrastructure: selection of the platform that supports the architecture. Organization ● Roles: division of labor across mixed-skill teams ● Structure: working model for projects across technical and business teams. Process ● Agile: incremental delivery model
  • 23. Technology - Architecture Components: Movement (ETL/ELT), Storage & Compute, Mastering/Quality, Catalog/Registry, Governance, Publish, Feedback. Sources: internal tabular data, external tabular data. Consumers: citizens, analysts, data scientists, developers.
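The Catalog/Registry component, together with the “Lineage/Provenance is essential” principle, can be sketched as an in-memory registry that records each published dataset’s owner and upstream inputs and can trace its full lineage. This is a hypothetical illustration: the `Catalog` and `DatasetEntry` names and the sample datasets are invented here, and production catalogs persist this metadata and usually capture it automatically.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetEntry:
    name: str
    owner: str
    inputs: List[str] = field(default_factory=list)  # upstream dataset names


class Catalog:
    """Hypothetical in-memory dataset registry with lineage tracking."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Refuse datasets whose inputs are unknown, so lineage is never broken.
        missing = [i for i in entry.inputs if i not in self._entries]
        if missing:
            raise KeyError(f"unregistered inputs: {missing}")
        self._entries[entry.name] = entry

    def owner(self, name: str) -> str:
        return self._entries[name].owner

    def lineage(self, name: str) -> Set[str]:
        """All upstream datasets that `name` depends on, directly or indirectly."""
        seen: Set[str] = set()
        stack = list(self._entries[name].inputs)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._entries[current].inputs)
        return seen
```

Requiring inputs to be registered first is a design choice that keeps every published dataset traceable back to its sources.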
  • 24. Technology - Architectural Principles: ● Cloud first ● Continuous (assume data will change) ● Highly automated: automate whenever possible ● Open/best of breed (not one platform/vendor) ● Bi-directional (feedback) ● Collaborative (humans at the core) ● Service oriented (clear endpoints for data) ● Loosely coupled (RESTful interfaces, table(s) in/out) ● Both aggregated AND federated storage ● Both batch AND streaming ● Lineage/provenance is essential ● Scale out/distributed
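The “Continuous (assume data will change)” principle implies checking each incoming batch against the schema a pipeline expects, so that source-system changes are caught before they break downstream jobs. A minimal sketch, assuming a schema is a mapping of column name to Python type name; the function names and the policy that added columns are tolerated while removed or retyped columns are breaking are assumptions, not part of the presentation.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> Python type name


def infer_schema(rows: List[Dict[str, object]]) -> Schema:
    """Infer column types from the first non-null value seen for each column.

    Columns that are null in every row are omitted (and so show up as removed).
    """
    schema: Schema = {}
    for row in rows:
        for column, value in row.items():
            if value is not None and column not in schema:
                schema[column] = type(value).__name__
    return schema


def diff_schema(expected: Schema, observed: Schema) -> Dict[str, List[str]]:
    """Report added, removed, and retyped columns."""
    shared = set(expected) & set(observed)
    return {
        "added": sorted(set(observed) - set(expected)),
        "removed": sorted(set(expected) - set(observed)),
        "retyped": sorted(c for c in shared if expected[c] != observed[c]),
    }


def is_breaking(diff: Dict[str, List[str]]) -> bool:
    """Assumed policy: new columns are tolerated; removed or retyped ones are not."""
    return bool(diff["removed"] or diff["retyped"])
```

Running this check at ingest lets a pipeline quarantine a batch and alert the source owner instead of failing somewhere downstream.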
  • 25. Infrastructure - Key Components: Management, Compute, Search, Storage.
  • 26. Organization - Roles: Data Suppliers (CIO, Source Owner, DBA, IT Professional); Data Preparers (CDO, Data Engineer, Curator, Steward); Data Consumers (citizens, analysts, data scientists, developers); Business Owners and Other CxOs.
  • 27. Organization - Roles:
    Group | Role | Goals | Tools
    Consumers | Citizen | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
    Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
    Consumers | Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
    Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
    Preparers | Engineer | Deliver and manage data pipelines | ETL, SQL
    Preparers | Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
    Preparers | Steward | Create policies and drive governance | MDM, Catalog, Governance
    Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
  • 28. Organization - Structure: Shared Services Model: full-service development of data applications, in collaboration with the business. Advantages ● Centralized technical knowledge ● Centralized resourcing: one-stop shop ● Accretive experience. Disadvantages ● Bandwidth contention: how to prioritize competing projects? Advisory Model: bootstraps projects with best-of-breed tools and approach, but does not complete them. Advantages ● Centralized technical knowledge ● Minimal resourcing: experts, not implementers ● Flexibility: options to deviate from standard tools. Disadvantages ● Resource burden is on each project/department, both in development and ongoing maintenance ● Limited feedback: does the advice get better after each project? The appropriate model will fluctuate with the scale of DataOps project work.
  • 29. Process - The Wrong Way: ● Labor-intensive ● Monolithic ● IT-driven. (Chart: remaining work over time as the project moves through modeling, rules, and testing before delivery.)
  • 30. Process - The Right Way: ● Automated ● Incremental ● Collaborative. (Chart: remaining work over time, with value delivered in several increments.)
  • 31. Case Studies
  • 32. Case Study - Financial Institution: A major financial institution built a data lab that works to invent solutions that harness data and advanced analytics. Goals ● Better understand its 60 million customers ● Create simpler, more intuitive, and intelligent products and customer experiences ● Help businesses do more business with each other using the bank’s cards. Holistic approach ● Mingles human-centered design, full-stack engineering, and data science ● A project manager oversees the entire end-to-end data pipeline ● The interdisciplinary team is made up of DevOps engineers and data scientists. Data unification at the center of the pipeline ● Raw data is cleaned and deduplicated, then fed into Tamr for classification and training ● Bulk matching allows the bank to determine whether a supplier/vendor from its master data source overlaps with the list collected from the customer ● Subject matter experts act as curators to improve the accuracy of ML models
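The bulk-matching step described above (checking whether suppliers in a customer’s list already exist in the bank’s master data) can be approximated with name normalization plus string similarity. This is not Tamr’s method, which the slide describes as machine-learning based; it is a simple rule-based stand-in using Python’s standard-library `difflib`, and the suffix list, the 0.9 threshold, and the sample names are all assumptions.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Assumed list of legal-form suffixes to ignore when comparing names.
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "company"}


def normalize(name: str) -> str:
    """Lowercase, keep alphanumeric tokens, and drop legal-form suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


def bulk_match(master: List[str], customer: List[str],
               threshold: float = 0.9) -> Dict[str, Tuple[str, float]]:
    """Map each customer name to its best master-data match at or above `threshold`."""
    # Deduplicate master records that normalize to the same key (first one wins).
    index: Dict[str, str] = {}
    for m in master:
        index.setdefault(normalize(m), m)
    matches: Dict[str, Tuple[str, float]] = {}
    for name in customer:
        key = normalize(name)
        best: Optional[str] = None
        best_score = 0.0
        for master_key, original in index.items():
            score = SequenceMatcher(None, key, master_key).ratio()
            if score > best_score:
                best, best_score = original, score
        if best is not None and best_score >= threshold:
            matches[name] = (best, round(best_score, 3))
    return matches
```

Comparing every customer name with every master name grows as n × m, which is one reason large-scale unification relies on blocking and learned models rather than pairwise string rules like these.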
  • 33. Case Study - Pharmaceutical Company: A major pharmaceutical company realized that its R&D environment wasn’t up to par, which was preventing it from developing new drugs with the required level of innovation and speed. Goals ● Make it easier to access and use data for exploratory analysis and decision-making about new medicines. Challenges ● Conducted a survey about data across the organization ○ Result: very difficult to work with data outside of a departmental silo ○ Identified top 10 use cases for integrating diverse data. Results and Benefits ● Turned to machine learning, since a traditional MDM approach would have taken too long ● Use cases have expanded from 10 to 250 ● Reduction in time to get answers to ad hoc questions
  • 34. In Parting - What NOT to Do ● Avoid “boil the ocean”/waterfall projects (measured in years or quarters) ○ Build rational long-term infrastructure while delivering real analytic value along the way ● Single “platform”: don’t overestimate what a single piece of software can do ○ Focus on a thoughtfully designed ecosystem of loosely coupled, best-of-breed tools ● Single vendor: don’t overestimate what a single vendor can do ○ Align vendors with APIs and the expectation that they MUST work together ● Don’t underestimate the effort required to make FOSS work ○ Just because Google does it doesn’t mean you can do it ● Don’t underestimate the human/behavioral challenges with data ○ Most often, projects fail or stall for human/behavioral reasons