DataOps @ Scale:
A Modern Framework for Data
Management in the Public Sector
Mark Marinelli, Head of Product, Tamr
May, 2020
Confidential – Tamr, Inc.
Agenda
● How DataOps Began
● A DataOps Framework
● DataOps in the Real World
○ DHS
○ Air Force
● Getting Started
● Q&A
About the Speakers
Mark Marinelli
Head of Product at Tamr
Katie Everett
Marketing Manager at Tamr (Moderator)
How DataOps Got Started &
What Actually is ‘DataOps’?
Modern internet companies have data advantages
Unified Dataportal
Greenfield Infrastructure & High End Talent Pool
Traditional orgs have significant “legacy drag coefficient”
Problem: Thousands of systems generating data every day that were built over decades to support business processes.
Result: “Random Data Salad”
Data debt from constant change/entropy.
Data silos created due to:
● Security concerns
● Organizational mismatch
Diagram labels: Restructuring, Leadership Changes, Politics, Dynamic Schema DBs (Mongo et al.), “Data Hoarding”, Legacy Burden, M&A
Consequences:
1. Too much time spent on data prep vs. analysis / action
2. High failure rate of BI / analytics projects
3. Game-changing initiatives deemed ‘impossible’ and never start
Why now? 7 years ago: we need data scientists!
Today: we have data scientists! (and want to do cool AI stuff)
Just not yet in the government… but it’s growing
● Within the last 6 months, U.S. agencies have begun defining a “Data Science Occupational Series”.
● This means adding the term “(Data Scientist)” at the end of a job title to increase the odds of finding a candidate who understands data.
“How do we take the data that we have — which is ubiquitous and it’s incredible across the
federal government — understand it, be able to leverage it at every step in the chain.”
- Deputy Federal CIO, Margie Graves
Human/behavioral challenges don’t help
● Afraid to share data - Due to organizational
policies and security levels
● Hoarding data - A method of organizational
control or job preservation
● Obscuring data complexity - Failure to
embrace the complexity, diversity, and
idiosyncrasy of data generated in a large
organization
● Limiting access to a small number of users - A method of control or a reflection of insecurity about data quality
This is a solved problem
Waterfall Approach
● Top-down: Architects drive the spec
● Monolithic: Single application
● A Priori Modeling: Front-loaded view of all components
● Quality Assurance: Manual QA
● Traditional SDLC: Dev/test/prod → major/minor release
Agile Development
● Bottom-Up: Users drive the spec
● Distributed: Loosely coupled, scalable
● Learn from Use: Emergent feature set
● Continuous Integration: Automated testing
● Modern DevOps: Continuous delivery
Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
What is DataOps? = Modern data engineering practice
DataOps is an automated, process-oriented methodology, used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
A DataOps Framework:
Process, Technology, Organization
DataOps framework components
Process
● Agile - Incremental delivery method
Technology
● Architecture - Tools which comprise data supply chain
● Infrastructure - Platform to support architecture
Organization
● Roles - Division of labor across mixed-skill teams
● Structure - Working model for projects across technical and business teams
Process - The Wrong Way
● Labor-intensive
● Monolithic
● IT driven
[Diagram: Sources (Internal Tabular Data, External Tabular Data) pass through Modeling, Rules, and Testing to Delivery for Consumers (Business Users, Analysts, Data Scientists, Developers). Chart: Remaining Work vs. Time.]
Process - The Right Way
● Automated
● Incremental
● Collaborative
[Diagram: Sources (Internal Tabular Data, External Tabular Data) feed Consumers (Business Users, Analysts, Data Scientists, Developers). Chart: Remaining Work vs. Time.]
Why Use Human-Guided Machine Learning to
Master Data
Before: Data scientists spent months and 100% of their energy preparing data.
Today: ML can do 80% of the data mastering lift, enabling data scientists to put the final touches on the last 20%.
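One way to picture this division of labor is a confidence-based triage: a model scores candidate record pairs, confident predictions are accepted automatically, and the uncertain remainder is queued for expert review. The sketch below is only an illustration under assumed values; the thresholds, scores, and record IDs are hypothetical and are not taken from Tamr's product.

```python
from dataclasses import dataclass

# Hypothetical thresholds: scores at or above AUTO_MATCH are accepted as matches,
# scores at or below AUTO_DISTINCT are accepted as non-matches.
AUTO_MATCH = 0.9
AUTO_DISTINCT = 0.1


@dataclass
class ScoredPair:
    left_id: str
    right_id: str
    match_score: float  # model's estimated probability that the records match


def triage(pairs):
    """Split scored pairs into machine-decided and human-review queues."""
    auto, review = [], []
    for pair in pairs:
        if pair.match_score >= AUTO_MATCH or pair.match_score <= AUTO_DISTINCT:
            auto.append(pair)
        else:
            review.append(pair)
    return auto, review


# Made-up scores for five candidate pairs.
pairs = [
    ScoredPair("a1", "b1", 0.98),
    ScoredPair("a2", "b2", 0.03),
    ScoredPair("a3", "b3", 0.55),
    ScoredPair("a4", "b4", 0.95),
    ScoredPair("a5", "b5", 0.02),
]
auto, review = triage(pairs)
print(f"decided automatically: {len(auto)}, sent to experts: {len(review)}")
```

Expert decisions on the review queue would typically be fed back as new training labels, which is what makes the loop "human-guided" rather than fully manual or fully automatic.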
The DataOps Component Architecture
Sources: Internal Tabular Data, External Tabular Data
Components:
● Catalog & Crawling
● Movement & Automation
● Mastering & Quality
● Publishing & Versioning
● Feedback & Usage
● Governance & Policy
● Storage & Compute
Consumers: Business Users, Analysts, Data Scientists, Developers
Technology - Architectural Principles
● Scale Out/Distributed
○ Cloud First
● Collaborative (Humans at the Core)
○ Highly Automated - automate whenever possible
○ Bi-Directional (Feedback)
● Open/Best of Breed (not one platform/vendor)
○ Service Oriented (clear endpoints for data)
○ Loosely Coupled (RESTful Interfaces, Table(s) In/Out)
● Continuous (assume data will change)
○ Both aggregated AND federated storage
○ Both batch AND Streaming
● Lineage/Provenance is essential
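Two of these principles, "table(s) in/out" and lineage, can be illustrated together with a small sketch. It assumes each pipeline step is a plain function from a table (a list of row dictionaries) to a table, and that a lineage record is simply a log entry linking input and output content hashes. The step names and sample rows are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(rows):
    """Stable content hash of a table, used to identify it in lineage records."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_step(name, transform, rows, lineage):
    """Run one table-in/table-out step and append a lineage record."""
    output = transform(rows)
    lineage.append({
        "step": name,
        "input": fingerprint(rows),
        "output": fingerprint(output),
        "ran_at": datetime.now(timezone.utc).isoformat(),
    })
    return output


def normalize_names(rows):
    """Trim whitespace and title-case the name column."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def drop_duplicates(rows):
    """Keep the first row seen for each (name, country) key."""
    seen, unique = set(), []
    for row in rows:
        key = (row["name"], row["country"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


lineage = []
source = [
    {"name": " acme corp ", "country": "US"},
    {"name": "Acme Corp", "country": "US"},
    {"name": "globex", "country": "DE"},
]
table = run_step("normalize_names", normalize_names, source, lineage)
table = run_step("drop_duplicates", drop_duplicates, table, lineage)

for record in lineage:
    print(record["step"], record["input"], "->", record["output"])
```

Because every step has the same shape, any one of them can be swapped for another tool without touching its neighbors, and the chain of hashes shows which input produced which output.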
Infrastructure - Key Components
Infrastructure (Cloud & On-Prem):
● Management
● Compute
● Search
● Storage
Organization - Roles
● Data Suppliers: CIO, Source Owner, DBA, IT Professional
● Data Preparers: CDO, Data Engineer, Curator, Steward
● Data Consumers: Business Owners / CxOs, Business Users, Analysts, Data Scientists, Developers
Organization - Roles
Group | Role | Goals | Tools
Consumers | Business Users | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
Consumers | Data Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
Preparers | Data Engineer | Deliver and manage data pipelines | ETL, SQL
Preparers | Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
Preparers | Steward | Create policies and drive governance | MDM, Catalog, Governance
Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
Organization - Structure
Appropriate model will fluctuate with scale of DataOps project work
Shared Services Model
Full-service development of data applications, in
collaboration with business
Advantages
● Centralized technical knowledge
● Centralized resourcing - one-stop shop
● Accretive experience
Disadvantages
● Bandwidth contention - how to prioritize
competing projects?
Advisory Model
Bootstraps projects with best of breed tools and
approach, but does not complete them
Advantages
● Centralized technical knowledge
● Minimal resourcing - experts, not
implementers
● Flexibility - options to deviate from standard
tools
Disadvantages
● Resource burden is on each project / department - both in development and ongoing maintenance
● Limited feedback - does the advice get better
after each project?
DataOps in the Real World
GTAS: CBP’s Passenger Data Screening and Analysis System
Global Travel Assessment System (GTAS)
U.S. Customs and Border Protection (CBP) developed GTAS, an open source application providing nation-states and border security entities the capacity to screen persons against risk criteria for threat prevention, public health, or a variety of other use cases.
DHS selected Tamr to provide improved entity resolution capability due to its fast, scalable, human-guided machine learning-based approach.
● Receive and store air traveler reservation and manifest data
● Perform real-time risk assessment
● Manage risk criteria and watch lists
● View high-risk travelers with associated flight details and reservation information
● Query flight and travel history
Risk Criteria:
● Human trafficker profile match
● Terrorist database match
● Recent travel in pandemic-affected area
Diagram labels: Passenger Manifest, Travel Reservation, Intergovernmental INTERPOL Exchanges
DHS Case Study cont’d - 4 Phases to Deployment
Phase 1 - Accuracy (2017)
Improve accuracy of entity resolution with biographical and reservation data
● Leverage all data
● Label examples of matching/distinct
● Measure precision and recall
● Iterate & optimize
Phase 2 - Automation (2017)
Build data pipeline and automate data movement and system controls
● Prepare, ingest, export data
● Triggering, data exchanges and error handling
● Security and authorization
Phase 3 - Performance (2018)
Optimize performance through a new low latency match service and reliable, robust communications
● Optimize models for risk, timing, cultural patterns, etc.
● Create low latency match capability
● Measure and iterate end-to-end latency
Phase 4 - Interoperability (2019)
Ensure stability for deployment into variety of environments
● Create documentation and communication channels for international support
● Installation, testing and sustainment
● Advanced feature offerings
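The Phase 1 loop of labeling examples and measuring precision and recall can be sketched as follows. This is a generic pairwise evaluation, not the GTAS implementation: the record IDs are hypothetical, and it assumes the human-labeled set of matching pairs is complete for the records under test, so any predicted pair missing from it counts as a false positive.

```python
def precision_recall(predicted_matches, labeled_matches):
    """Compute pairwise precision and recall for an entity-resolution run."""
    # frozenset makes ("p1", "p2") and ("p2", "p1") the same pair.
    predicted = {frozenset(pair) for pair in predicted_matches}
    labeled = {frozenset(pair) for pair in labeled_matches}
    true_positives = len(predicted & labeled)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(labeled) if labeled else 0.0
    return precision, recall


# Hypothetical record IDs; pairs labeled as matches by human reviewers.
labeled = [("p1", "p2"), ("p3", "p4"), ("p5", "p6"), ("p7", "p8")]
# Pairs the model predicted as matches.
predicted = [("p2", "p1"), ("p3", "p4"), ("p5", "p9")]

precision, recall = precision_recall(predicted, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the pairs the model merged, how many were right?" and recall answers "of the true matches, how many did the model find?". Tracking both across iterations shows whether new labels are actually improving the model.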
DHS Case Study
“When we were looking for companies, Tamr fit our bill perfectly. They were interested in the mission, they understood what we were trying to do and why it was important to international security, and they demonstrated the capacity to execute at a commercial level.”
- Ari Schuler, Advisor, Office of the Commissioner, U.S. Customs and Border Protection
Recent Tamr recognition by DHS
● Science & Technology-Funded 2019 Performer Award: Crossing the Valley of Death
■ This honor is awarded to an effort that has a technical transition success story.
● DHS Silicon Valley Innovation Program: Snapshot Article
■ The article highlights the transition of Tamr technology to CBP, to be released January 2019 via Meltwater and GovDelivery
Case Study: US Air Force
Semi-Automated Aircraft Wing Flutter (Vibration) Analysis
Business Challenge
● Deliver on big data initiative, enable end users easy search of historic data
● Present SMEs (PhD engineers) with short list of relevant antecedents and flutter predictions
Technical Challenge
● Use ML to understand a large corpus (30 yrs’ worth) of past testing, simulations, and analyses
● Automate large portions of the process to predict aircraft “flutter”
Business Outcome
● Users quickly interrogate decades’ worth of technical data via rich metadata
● Reduce engineer process time dramatically by identifying relevant antecedents and technical predictions
Technical Outcome
● Tamr tagged 35K files with 645K descriptive labels in 22 tag types (aircraft, stores, author, etc.)
● Automatically create recommendation based on a machine learning model built on the historical data
Accelerating Subject Matter Expert Recommendations
● INPUT: New Config. Request
● AUTOMATED ANALYSIS: Metadata extraction; machine learning models for each discipline
● AUTOMATED OUTPUTS: Relevant documents; discipline-specific recommendations
● PRODUCTIVITY TOOLS: Document browsing powered by clean, consistent metadata
Getting Started With DataOps
Getting started - Process
Agile is the key
● If not already there, choose a model that works (Scrum,
SAFe)
Inventory the set of available projects
● Score on availability of data vs. value of solving a
problem
Define high-value, data-rich project that will demand a
complex solution
● Forcing function to ensure end-to-end functionality will
be covered
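The inventory-and-score step above can be sketched as a simple ranking. The project names, the 1-5 scales, and the choice to multiply the two axes are all assumptions made for illustration; a team might reasonably weight them differently.

```python
# Hypothetical candidate projects, each scored 1-5 by the team.
projects = [
    {"name": "Supplier deduplication", "data_availability": 5, "business_value": 4},
    {"name": "Fraud early warning", "data_availability": 2, "business_value": 5},
    {"name": "Archive cleanup", "data_availability": 4, "business_value": 1},
]


def score(project):
    """Multiply the two axes so a project must rate well on both to rank highly."""
    return project["data_availability"] * project["business_value"]


ranked = sorted(projects, key=score, reverse=True)
for project in ranked:
    print(f"{score(project):>2}  {project['name']}")
```

Multiplying rather than adding penalizes projects that are strong on only one axis, which matches the advice to pick something both high-value and data-rich.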
Getting started - Technology
Identify path to a modern, modular service architecture
● Create blueprint for next generation data
management platform
● Revisit cloud migration strategy
Inventory current tool set
● TCO / skill requirements / etc.
● Determine which should be replaced, and when this is
viable
Decouple monolithic processes
● Wrap components in APIs, expose as services
Start building with new tech
● Choose subset of tools for proof of concepts to replace old
tech
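As a rough illustration of "wrap components in APIs, expose as services", the sketch below puts a small piece of legacy logic behind a minimal WSGI endpoint using only the Python standard library. The address-cleaning function, the JSON request shape, and the helper that calls the app in-process are all hypothetical stand-ins, not part of any system described in this deck.

```python
import json
from io import BytesIO


def legacy_standardize_address(raw):
    """Stand-in for logic buried inside a monolithic batch job (hypothetical)."""
    return " ".join(raw.upper().replace(".", "").split())


def app(environ, start_response):
    """Minimal WSGI service: POST a JSON list of addresses, get a cleaned list back."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "application/json")])
        return [b'{"error": "use POST"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    addresses = json.loads(environ["wsgi.input"].read(length))
    body = json.dumps([legacy_standardize_address(a) for a in addresses]).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(app, method, payload=b""):
    """Invoke the WSGI app in-process, without opening a network socket."""
    captured = {}
    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": BytesIO(payload),
    }
    chunks = app(environ, lambda status, headers: captured.update(status=status))
    return captured["status"], b"".join(chunks)


status, body = call(app, "POST", json.dumps(["12  main st.", "5 Elm Rd."]).encode("utf-8"))
print(status, body.decode("utf-8"))
```

To serve the same app over HTTP, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` from the standard library is enough for a proof of concept; the port number here is arbitrary. Once callers depend only on the endpoint, the implementation behind it can be replaced without coordinating a monolithic release.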
Getting started - Optimization
Inventory current team
● Identify existing key roles - data engineers and their
consumers
● Find best candidates for new roles - data curators and
data stewards
Create cross-functional team
● Data consumers - will depend upon project
● Data Engineer(s)
● Curator
● Steward
Choose your operating model
● Start with Shared Services for first project
Ensure executive alignment
● CDO or equivalent
What NOT to do
● Avoid boil-the-ocean/“waterfall” projects (measured in years/quarters)
○ Build rational long-term infra while delivering real analytic value along the way
● Single “Platform”: Don’t overestimate what a single piece of software can do
○ Focus on a thoughtfully designed ecosystem of loosely coupled best-of-breed tools
● Single Vendor: Don’t overestimate what a single vendor can do
○ Align vendors with APIs and expectations that they MUST work together
● Don’t underestimate the effort required to make FOSS work
○ Just because Google does it doesn’t mean you can do it
● Don’t underestimate human/behavioral challenges with data
○ Most often, the reasons that projects fail/stall are human/behavioral
Key DataOps Principles
Process
● Agile - Quick wins + incremental value delivery
Technology
● Architecture - Loosely coupled best-of-breed components which incorporate automation + human feedback
● Infrastructure - Cloud-native, scalable and elastic tooling
Organization
● Roles - Specialization and separation of duties
● Structure - Centralized expertise + knowledge capture across projects
New Podcast Series - DataMasters
Featured guests include:
● Nick Sinai - Deputy CTO in the Obama Administration
● Eric Iverson - Former CIO at Sony
● And more data leaders...
Listen today on Spotify, Apple Podcasts and Google Podcasts!
https://www.tamr.com/datamasters/
Questions?
Contact Michael Gormey with additional questions after the
webinar:
michael.gormey@tamr.com

More Related Content

What's hot

Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...Edge AI and Vision Alliance
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldDATAVERSITY
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesDATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
How to Implement Data Governance Best Practice
How to Implement Data Governance Best PracticeHow to Implement Data Governance Best Practice
How to Implement Data Governance Best PracticeDATAVERSITY
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Adrien Blind
 
Data Profiling, Data Catalogs and Metadata Harmonisation
Data Profiling, Data Catalogs and Metadata HarmonisationData Profiling, Data Catalogs and Metadata Harmonisation
Data Profiling, Data Catalogs and Metadata HarmonisationAlan McSweeney
 

What's hot (20)

Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Data ops in practice
Data ops in practiceData ops in practice
Data ops in practice
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Strategy
Data StrategyData Strategy
Data Strategy
 
How to Implement Data Governance Best Practice
How to Implement Data Governance Best PracticeHow to Implement Data Governance Best Practice
How to Implement Data Governance Best Practice
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
Data Profiling, Data Catalogs and Metadata Harmonisation
Data Profiling, Data Catalogs and Metadata HarmonisationData Profiling, Data Catalogs and Metadata Harmonisation
Data Profiling, Data Catalogs and Metadata Harmonisation
 

Similar to DataOps @ Scale: A Modern Framework for Data Management in the Public Sector

Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperativeTrillium Software
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallTrillium Software
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckTamrMarketing
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
EPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfEPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfcedrinemadera
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData Blueprint
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...CompTIA
 
Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Safe Software
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?Christopher Bradley
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxMrityunjay Emmi
 
Delivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeDelivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeJean-Michel Franco
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Precisely
 

Similar to DataOps @ Scale: A Modern Framework for Data Management in the Public Sector (20)

Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperative
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deck
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
EPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfEPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdf
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing Strategies
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
 
Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...
 
DataOps @ Scale: A Modern Framework for Data Management in the Public Sector

  • 1. DataOps @ Scale: A Modern Framework for Data Management in the Public Sector Mark Marinelli, Head of Product, Tamr May, 2020
  • 2. Confidential – Tamr, Inc. Agenda 2 ● How DataOps Began ● A DataOps Framework ● DataOps in the Real World ○ DHS ○ Air Force ● Getting Started ● Q&A
  • 3. Confidential – Tamr, Inc. About the Speakers 3 Mark Marinelli Head of Product at Tamr Katie Everett Marketing Manager at Tamr (Moderator)
  • 4. Confidential – Tamr, Inc. How DataOps Got Started & What Actually is ‘DataOps’? 4
  • 5. Confidential – Tamr, Inc. Modern internet companies have data advantages 5 Unified Dataportal Greenfield Infrastructure & High End Talent Pool
  • 6. Confidential – Tamr, Inc. Traditional orgs have significant “legacy drag coefficient” Problem: Thousands of systems generating data every day that were built over decades to support business processes. 6 Result: “Random Data Salad” Data debt from constant change/entropy. Data Silos created due to: ● Security concerns ● Organizational mismatch Consequences: 1. Too much time spent on data prep vs. analysis / action 2. High failure rate of BI / analytics projects 3. Game changing initiatives deemed ‘impossible’ and never start Restructuring Leadership Changes Politics Dynamic Schema DBs - Mongo et al “Data Hoarding” Legacy Burden M&A
  • 7. Confidential – Tamr, Inc. 7 Why now? 7 years ago: we need data scientists!
  • 8. Confidential – Tamr, Inc. 8 Today: we have data scientists! (and want to do cool AI stuff)
  • 9. Confidential – Tamr, Inc. Just not yet in the government… but it’s growing ● Within the last 6 months, the U.S. agencies have begun defining a “Data Science Occupational Series”. ● This means adding the term “(Data Scientist)” at the end of a job title to increase the odds of finding a candidate that understands data. 9
  • 10. Confidential – Tamr, Inc. “How do we take the data that we have — which is ubiquitous and it’s incredible across the federal government — understand it, be able to leverage it at every step in the chain.” - Deputy Federal CIO, Margie Graves 10
  • 11. Confidential – Tamr, Inc. Human/behavioral challenges don’t help 11 ● Afraid to share data - Due to organizational policies and security levels ● Hoarding data - A method of organizational control or job preservation ● Obscuring data complexity - Failure to embrace the complexity, diversity, and idiosyncrasy of data generated in a large organization ● Limiting access to a small number of users - A method of control or as a reflection of insecurity of data quality Human Behavior Challenges
  • 12. Confidential – Tamr, Inc. This is a solved problem 12 Waterfall Approach ● Top-down: Architects drive the spec ● Monolithic: Single application ● A Priori Modeling: Front-loaded view of all components ● Quality Assurance: Manual QA ● Traditional SDLC: Dev/test/prod → major/minor release Agile Development ● Bottom-Up: Users drive the spec ● Distributed: Loosely coupled, scalable ● Learn from Use: Emergent feature set ● Continuous Integration: Automated testing ● Modern DevOps: Continuous delivery Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
  • 13. Confidential – Tamr, Inc. What is DataOps? = Modern data engineering practice 13 DataOps is an automated, process oriented methodology, used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
  • 14. Confidential – Tamr, Inc. A DataOps Framework: Process, Technology, Organization 14
  • 15. Confidential – Tamr, Inc. DataOps framework components 15 Process: Agile - Incremental delivery method Technology: Architecture - Tools which comprise data supply chain; Infrastructure - Platform to support architecture Organization: Roles - Division of labor across mixed-skill teams; Structure - Working model for projects across technical and business teams
  • 16. Confidential – Tamr, Inc. Process - The Wrong Way 16 ● Labor-intensive ● Monolithic ● IT driven [Diagram: Sources (Internal Tabular Data, External Tabular Data) → Modeling, Rules, Testing → Delivery → Consumers (Business Users, Analysts, Data Scientists, Developers); chart of Remaining Work vs. Time]
  • 17. Confidential – Tamr, Inc. Process - The Right Way 17 ● Automated ● Incremental ● Collaborative [Diagram: Sources (Internal Tabular Data, External Tabular Data) → Consumers (Business Users, Analysts, Data Scientists, Developers); chart of Remaining Work vs. Time]
  • 18. Confidential – Tamr, Inc. Why Use Human-Guided Machine Learning to Master Data 18 Before: Data Scientists spent months and 100% of energy preparing data. Today: ML can do 80% of data mastering lift... …. Enabling Data Scientists to put final touches on the last 20%.
  • 19. Confidential – Tamr, Inc. The DataOps Component Architecture 19 ● Movement & Automation ● Storage & Compute ● Governance & Policy ● Catalog & Crawling ● Publishing & Versioning ● Mastering & Quality ● Feedback & Usage [Diagram: Sources (Internal Tabular Data, External Tabular Data) → Consumers (Business Users, Analysts, Data Scientists, Developers)]
  • 20. Confidential – Tamr, Inc. Technology - Architectural Principles 20 ● Scale Out/Distributed ○ Cloud First ● Collaborative (Humans at the Core) ○ Highly Automated - automate whenever possible ○ Bi-Directional (Feedback) ● Open/Best of Breed (not one platform/vendor) ○ Service Oriented (clear endpoints for data) ○ Loosely Coupled (RESTful Interfaces Table(s) In/Out) ● Continuous (assume data will change) ○ Both aggregated AND federated storage ○ Both batch AND Streaming ● Lineage/Provenance is essential [Diagram: Sources (Internal Tabular Data, External Tabular Data) → Consumers (Business Users, Analysts, Data Scientists, Developers)]
  • 21. Confidential – Tamr, Inc. Infrastructure - Key Components 21 Management Compute Search Storage Infrastructure (Cloud & On-Prem) [Diagram: Sources (Internal Tabular Data, External Tabular Data) → Consumers (Business Users, Analysts, Data Scientists, Developers)]
  • 22. Confidential – Tamr, Inc. Organization - Roles 22 Data Suppliers: CIO, Source Owner, DBA, IT Professional Data Preparers: CDO, Data Engineer, Curator, Steward Data Consumers: Business Owners / CxOs, Business Users, Analysts, Data Scientists, Developers Sources: Internal Tabular Data, External Tabular Data
  • 23. Confidential – Tamr, Inc. Organization - Roles (Role | Goals | Tools)
      Consumers
      ○ Business Users | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
      ○ Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
      ○ Data Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
      ○ Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
      Preparers
      ○ Data Engineer | Deliver and manage data pipelines | ETL, SQL
      ○ Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
      ○ Steward | Create policies and drive governance | MDM, Catalog, Governance
      Suppliers
      ○ Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
  • 24. Confidential – Tamr, Inc. Organization - Structure 24 Appropriate model will fluctuate with scale of DataOps project work Shared Services Model Full-service development of data applications, in collaboration with business Advantages ● Centralized technical knowledge ● Centralized resourcing - one-stop shop ● Accretive experience Disadvantages ● Bandwidth contention - how to prioritize competing projects? Advisory Model Bootstraps projects with best of breed tools and approach, but does not complete them Advantages ● Centralized technical knowledge ● Minimal resourcing - experts, not implementers ● Flexibility - options to deviate from standard tools Disadvantages ● Resource burden is on each project / department - both in development and ongoing maintenance ● Limited feedback - does the advice get better after each project?
  • 25. Confidential – Tamr, Inc. DataOps in the Real World 25
  • 26. Confidential – Tamr, Inc. 26 Global Travel Assessment System (GTAS) U.S. Customs and Border Protection (CBP) developed GTAS - an open source application providing nation-states and border security entities the capacity to screen persons against risk criteria for threat prevention, public health or a variety of other use cases. DHS selected Tamr to provide improved entity resolution capability due to its fast, scalable, human-guided machine learning-based approach. GTAS: CBP’s Passenger Data Screening and Analysis System ● Receive and store air traveler reservation and manifest data ● Perform real time risk assessment ● Manage risk criteria and watch lists ● View high risk travelers with associated flight details and reservation information ● Query flight and travel history Risk Criteria: Human trafficker profile match, Terrorist database match, Recent travel in pandemic affected area Passenger Manifest Travel Reservation Intergovernmental INTERPOL Exchanges
  • 27. Confidential – Tamr, Inc. DHS Case Study cont’d - 4 Phases to Deployment 27 Phase 1 - Accuracy Improve accuracy of entity resolution with biographical and reservation data 4 PHASES Results ● Leverage all data ● Label examples of matching/distinct ● Measure precision and recall ● Iterate & optimize Phase 2 - Automation Build data pipeline and automate data movement and system controls Phase 3 - Performance Optimize performance through a new low latency match service and reliable, robust communications Phase 4 - Interoperability Ensure stability for deployment into variety of environments ● Prepare, ingest, export data ● Triggering, data exchanges and error handling ● Security and authorization ● Optimize models for risk, timing, cultural patterns, etc. ERP Consolidation ● Create low latency match capability ● Measure and iterate end-to-end latency ● Create documentation and communication channels for international support ● Installation, testing and sustainment ● Advanced feature offerings 2017 2017 2018 2019
  • 28. Confidential – Tamr, Inc. 28 DHS Case Study “When we were looking for companies, Tamr fit our bill perfectly. They were interested in the mission, they understood what we were trying to do and why it was important to international security, and they demonstrated the capacity to execute at a commercial level.” Ari Schuler, Advisor Office of the Commissioner U.S. Customs and Border Protection Recent Tamr recognition by DHS ● Science & Technology-Funded 2019 Performer Award: Crossing the Valley of Death ■ This honor is awarded to an effort that has a technical transition success story. ● DHS Silicon Valley Innovation Program: Snapshot Article ■ The article highlights the transition of Tamr technology to CBP, to be released January 2019 via Meltwater and GovDelivery
  • 29. Confidential – Tamr, Inc. Case Study: US Air Force 29 Semi-Automated Aircraft Wing Flutter (vibration) Analysis Technical Outcome Business Outcome Technical Challenge Business Challenge ● Use ML to understand a large corpus (30 yrs worth) of past testing, simulations, and analyses ● Automate large portions of the process to predict aircraft “flutter” ● Users quickly interrogate decades worth of technical data via rich metadata ● Reduce engineer process time dramatically by identifying relevant antecedents and technical predictions ● Deliver on big data initiative, enable end users easy search of historic data ● Present SMEs (PhD engineers) with short list of relevant antecedents and flutter predictions ● Tamr tagged 35K files with 645K descriptive labels in 22 tag types (aircraft, stores, author, etc.) ● Automatically create recommendation based on a machine learning model built on the historical data
  • 30. Confidential – Tamr, Inc. Accelerating Subject Matter Expert Recommendations 30 Machine learning models for each discipline Discipline-specific recommendations Metadata extraction Relevant documents INPUT AUTOMATED ANALYSIS AUTOMATED OUTPUTS PRODUCTIVITY TOOLS Document browsing powered by clean, consistent metadata New Config. Request
  • 31. Confidential – Tamr, Inc. Getting Started With DataOps 31
  • 32. Confidential – Tamr, Inc. Getting started - Process 32 Agile is the key ● If not already there, choose a model that works (Scrum, SAFe) Inventory the set of available projects ● Score on availability of data vs. value of solving a problem Define high-value, data-rich project that will demand a complex solution ● Forcing function to ensure end-to-end functionality will be covered Process
  • 33. Confidential – Tamr, Inc. Getting started - Technology 33 Identify path to a modern, modular service architecture ● Create blueprint for next generation data management platform ● Revisit cloud migration strategy Inventory current tool set ● TCO / skill requirements / etc. ● Determine which should be replaced, and when this is viable Decouple monolithic processes ● Wrap components in APIs, expose as services Start building with new tech ● Choose subset of tools for proof of concepts to replace old tech Technology
  • 34. Confidential – Tamr, Inc. Getting started - Optimization 34 Inventory current team ● Identify existing key roles - data engineers and their consumers ● Find best candidates for new roles - data curators and data stewards Create cross-functional team ● Data consumers - will depend upon project ● Data Engineer(s) ● Curator ● Steward Choose your operating model ● Start with Shared Services for first project Ensure executive alignment ● CDO or equivalent Optimization
  • 35. Confidential – Tamr, Inc. What NOT to do 35 ● Avoid boil the ocean/“waterfall” (projects measured in years/quarters) ○ Build rational long term infra while delivering real analytic value along the way ● Single “Platform”: Don’t overestimate what single piece of software can do ○ Focus on thoughtfully designed ecosystem of loosely coupled best of breed tools ● Single Vendor: Don’t overestimate what single vendor can do ○ Align vendors with APIs and expectations that they MUST work together ● Don’t Underestimate effort required to make FOSS work ○ Just because Google does it doesn’t mean you can do it ● Don’t underestimate human/behavioral challenges with data ○ Most often the reason that projects fail/stall are human/behavioral
  • 36. Confidential – Tamr, Inc. Key DataOps Principles 36 Process: Agile - Quick wins + incremental value delivery Technology: Architecture - Loosely-coupled best of breed components which incorporate automation + human feedback; Infrastructure - Cloud-native, scalable and elastic tooling Organization: Roles - Specialization and separation of duties; Structure - Centralized expertise + knowledge capture across projects
  • 37. Confidential – Tamr, Inc. 37 New Podcast Series - DataMasters Featured guests include: ● Nick Sinai - Deputy CTO in the Obama Administration ● Eric Iverson - Former CIO at Sony ● And more data leaders... Listen today on Spotify, Apple Podcasts and Google Podcasts! https://www.tamr.com/datamasters/
  • 38. Confidential – Tamr, Inc. 38 Questions? Contact Michael Gormey with additional questions after the webinar: michael.gormey@tamr.com
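
Slide 27 lists "Label examples of matching/distinct" and "Measure precision and recall" as Phase 1 steps for entity resolution. A minimal sketch of how those two numbers might be computed from labeled record pairs is below. It is an illustration only, not Tamr's or GTAS's implementation: the function names, record IDs, and the choice of pairwise scoring are all assumptions made for the example.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def as_pairs(pairs: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Normalize pairs so that (a, b) and (b, a) count as the same match."""
    return {frozenset(p) for p in pairs}


def precision_recall(
    predicted: Iterable[Tuple[str, str]],
    labeled_matches: Iterable[Tuple[str, str]],
) -> Tuple[float, float]:
    """Pairwise precision and recall of predicted matches against expert labels.

    precision = correct predicted pairs / all predicted pairs
    recall    = correct predicted pairs / all labeled matching pairs
    """
    pred = as_pairs(predicted)
    truth = as_pairs(labeled_matches)
    true_positives = len(pred & truth)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record IDs, for illustration only.
predicted_pairs = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
expert_labels = [("p2", "p1"), ("p3", "p4"), ("p7", "p8"), ("p9", "p10")]

precision, recall = precision_recall(predicted_pairs, expert_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up example two of the three predicted pairs are confirmed by the labels, and two of the four labeled matches are found. Tracking both numbers across iterations is one way to read the "Iterate & optimize" step: a model change that raises one while lowering the other is a trade-off, not a clear improvement.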

Editor's Notes

  1. Thank you all for joining us today. We are thrilled to put on today’s webinar, DataOps at Scale, a modern framework for data management in the public sector. My name is Katie Everett, and I’m the public sector marketing manager at Tamr and will serve as today’s moderator.
  2. In the webinar, we’ll review how DataOps began, what DataOps is and the components that go into a successful framework. Additionally, we’ll give you real life examples of DataOps successfully deployed in the public sector as well as actionable steps on how you can get started. We’ll leave some time at the end for Q&A.
  3. I’d like to introduce today’s speaker who will take us through the webinar, Mark Marinelli. Mark is the Head of Product at Tamr and is a 20-year veteran of Enterprise Data Management and Analytics software. He is well versed in coaching companies through deploying DataOps at scale across multiple industries. Mark has held engineering, product management, and technology strategy roles at Lucent Technologies, Macrovision, and most recently at Lavastorm, where he was Chief Technology Officer. So, over to you Mark.
  4. Manage data from their business systems more as “exhaust” than “asset” > “significant data debt”
  5. Heavy shortage of data scientists. Rush to fill the gap.
  6. Companies started filling the gaps… rapidly scooping up data talent
  7. https://www.fedscoop.com/data-scientist-hiring-margie-graves/
  8. As a Chief Data Officer begins to tackle the human/behavioral challenges, they also need to begin establishing their next generation technical infrastructure. Having worked with dozens of Global 2000 customers on their data/analytics initiatives at Tamr, we’ve seen some key principles that work well as companies begin to establish their next generation data infrastructure.
  9. Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
  10. Components: What There are many ways to think about the potential components of a next gen data ecosystem for the enterprise. Our friends at DataKitchen have done a good job with this post which refers to some solid work by the Eckerson group. In the interest of trying to simplify the context of what you might consider buying vs. building and which vendors you might consider, I’ve tried to lay out the primary components of a next gen enterprise data ecosystem based on the environments I’ve seen people configuring over the past 8-10 years and the tools (new and old) that are available. These components can be summarized as follows :
  11. Components: What There are many ways to think about the potential components of a next gen data ecosystem for the enterprise. Our friends at DataKitchen have done a good job with this post which refers to some solid work by the Eckerson group. In the interest of trying to simplify the context of what you might consider buying vs. building and which vendors you might consider, I’ve tried to lay out the primary components of a next gen enterprise data ecosystem based on the environments I’ve seen people configuring over the past 8-10 years and the tools (new and old) that are available. These components can be summarized as follows :
  12. Components: What There are many ways to think about the potential components of a next gen data ecosystem for the enterprise. Our friends at DataKitchen have done a good job with this post which refers to some solid work by the Eckerson group. In the interest of trying to simplify the context of what you might consider buying vs. building and which vendors you might consider, I’ve tried to lay out the primary components of a next gen enterprise data ecosystem based on the environments I’ve seen people configuring over the past 8-10 years and the tools (new and old) that are available. These components can be summarized as follows :
  13. Q: Who are the people within the agencies that you work with that tend to champion DataOps initiatives from a leadership perspective?
  14. Q: Are you advocating to eliminate the need for data scientists? Can you talk a little bit more about that? ---- Taking the human out of the loop, is ML as good as the humans? - ML runs 24/7, doesn’t need coffee or sleep Q: Can subject matter experts really train this ML model? How is that possible?
  15. Q: Do you see processes/tools in DataOps becoming open-sourced? If so, which processes do you think should be open-sourced to enable better integration of multiple vendors?
  16. Thanks Mark. We have some very exciting news to share with you all today. Tamr launched a new podcast series called DataMasters. The podcast features data leaders who share stories about their journeys and offer insights on how they made their organizations more data driven. Featured guests on the inaugural launch include Nick Sinai, Deputy CTO in the Obama Administration, and Eric Iverson, former CIO at Sony. We invite you to listen and subscribe today. The podcast will host many more data leaders across government organizations and enterprises.