DataOps @ Scale:
A Modern Framework for Data
Management in the Public Sector
Mark Marinelli, Head of Product, Tamr
May, 2020
Confidential – Tamr, Inc.
Agenda
● How DataOps Began
● A DataOps Framework
● DataOps in the Real World
○ DHS
○ Air Force
● Getting Started
● Q&A
About the Speakers
Mark Marinelli
Head of Product at Tamr
Katie Everett
Marketing Manager at Tamr (Moderator)
How DataOps Got Started &
What Actually is ‘DataOps’?
Modern internet companies have data advantages
Unified data portal
Greenfield Infrastructure & High End Talent Pool
Traditional orgs have significant “legacy drag coefficient”
Problem: Thousands of systems generating data every day that were built over decades to support business processes.
Result: “Random Data Salad”
Data debt from constant change/entropy.
Data silos created due to:
● Security concerns
● Organizational mismatch
Consequences:
1. Too much time spent on data prep vs. analysis / action
2. High failure rate of BI / analytics projects
3. Game-changing initiatives deemed ‘impossible’ and never start
[Figure labels: Restructuring; Leadership Changes; Politics; Dynamic Schema DBs (Mongo et al.); “Data Hoarding”; Legacy Burden; M&A]
Why now? 7 years ago: we need data scientists!
Today: we have data scientists! (and want to do cool AI stuff)
Just not yet in the government… but it’s growing
● Within the last 6 months, U.S. agencies have begun defining a “Data Science Occupational Series”.
● This means adding the term “(Data Scientist)” at the end of a job title to increase the odds of finding a candidate who understands data.
“How do we take the data that we have — which is ubiquitous and it’s incredible across the
federal government — understand it, be able to leverage it at every step in the chain.”
- Deputy Federal CIO, Margie Graves
Human/behavioral challenges don’t help
● Afraid to share data - Due to organizational policies and security levels
● Hoarding data - A method of organizational control or job preservation
● Obscuring data complexity - Failure to embrace the complexity, diversity, and idiosyncrasy of data generated in a large organization
● Limiting access to a small number of users - A method of control or a reflection of insecurity about data quality
This is a solved problem
Waterfall Approach
● Top-down: Architects drive the spec
● Monolithic: Single application
● A priori modeling: Front-loaded view of all components
● Quality assurance: Manual QA
● Traditional SDLC: Dev/test/prod → major/minor release
Agile Development
● Bottom-up: Users drive the spec
● Distributed: Loosely coupled, scalable
● Learn from use: Emergent feature set
● Continuous integration: Automated testing
● Modern DevOps: Continuous delivery
Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
What is DataOps? = Modern data engineering practice
DataOps is an automated, process-oriented methodology, used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
A DataOps Framework:
Process, Technology, Organization
DataOps framework components
Process
● Agile - Incremental delivery method
Technology
● Architecture - Tools which comprise data supply chain
● Infrastructure - Platform to support architecture
Organization
● Roles - Division of labor across mixed-skill teams
● Structure - Working model for projects across technical and business teams
Process - The Wrong Way
● Labor-intensive
● Monolithic
● IT driven
[Diagram labels: Sources (Internal Tabular Data, External Tabular Data); stages (Modeling, Rules, Testing, Delivery); Consumers (Business Users, Analysts, Data Scientists, Developers); chart axes (Time, Remaining Work)]
Process - The Right Way
● Automated
● Incremental
● Collaborative
[Diagram labels: Sources (Internal Tabular Data, External Tabular Data); Consumers (Business Users, Analysts, Data Scientists, Developers); chart axes (Time, Remaining Work)]
Why Use Human-Guided Machine Learning to
Master Data
Before: Data scientists spent months and 100% of energy preparing data.
Today: ML can do 80% of data mastering lift, enabling data scientists to put final touches on the last 20%.
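A minimal sketch of how human-guided mastering might divide the work, assuming a model that scores candidate record pairs between 0 and 1. The function name `route_pairs`, both thresholds, and the sample scores are illustrative assumptions, not Tamr's implementation: confident pairs are decided automatically, and the uncertain remainder goes to a human review queue.

```python
from typing import List, Tuple

def route_pairs(
    scored_pairs: List[Tuple[str, float]],
    auto_match: float = 0.9,      # hypothetical threshold: at or above, accept as a match
    auto_distinct: float = 0.1,   # hypothetical threshold: at or below, accept as distinct
):
    """Split model-scored pairs into automatic decisions and a human review queue."""
    decided, review = [], []
    for pair_id, score in scored_pairs:
        if score >= auto_match:
            decided.append((pair_id, "match"))
        elif score <= auto_distinct:
            decided.append((pair_id, "distinct"))
        else:
            review.append(pair_id)  # model is unsure: ask a person
    return decided, review

# Made-up scores for five candidate pairs.
scored = [("a", 0.97), ("b", 0.03), ("c", 0.55), ("d", 0.92), ("e", 0.08)]
decided, review = route_pairs(scored)
print(decided)
print(review)
```

Labels collected from the review queue would typically be fed back as training examples, which is what shrinks the queue over successive iterations.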
The DataOps Component Architecture
[Diagram: components sitting between Sources (Internal Tabular Data, External Tabular Data) and Consumers (Business Users, Analysts, Data Scientists, Developers)]
● Movement & Automation
● Storage & Compute
● Governance & Policy
● Catalog & Crawling
● Publishing & Versioning
● Mastering & Quality
● Feedback & Usage
Technology - Architectural Principles
● Scale Out/Distributed
○ Cloud First
● Collaborative (Humans at the Core)
○ Highly Automated - automate whenever possible
○ Bi-Directional (Feedback)
● Open/Best of Breed (not one platform/vendor)
○ Service Oriented (clear endpoints for data)
○ Loosely Coupled (RESTful interfaces, table(s) in/out)
● Continuous (assume data will change)
○ Both aggregated AND federated storage
○ Both batch AND Streaming
● Lineage/Provenance is essential
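One way to picture the "table(s) in/out" and lineage principles together is the sketch below. It assumes in-memory tables represented as lists of dicts; `Table`, `run_step`, and the two sample steps are hypothetical names invented for illustration. Each step takes a table, returns a new table, and appends itself to a provenance trail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]

@dataclass
class Table:
    name: str
    rows: List[Row]
    lineage: List[str] = field(default_factory=list)  # ordered provenance trail

def run_step(step_name: str, fn: Callable[[List[Row]], List[Row]], table: Table) -> Table:
    """Apply a table-in/table-out step and record it in the lineage."""
    out_rows = fn([dict(r) for r in table.rows])  # copy rows so the input stays untouched
    return Table(
        name=f"{table.name}.{step_name}",
        rows=out_rows,
        lineage=table.lineage + [f"{step_name}({table.name})"],
    )

def normalize_names(rows: List[Row]) -> List[Row]:
    for r in rows:
        r["name"] = str(r["name"]).strip().upper()
    return rows

def drop_duplicates(rows: List[Row]) -> List[Row]:
    seen, out = set(), []
    for r in rows:
        if r["name"] not in seen:
            seen.add(r["name"])
            out.append(r)
    return out

raw = Table(
    "vendors_raw",
    [{"name": " acme "}, {"name": "ACME"}, {"name": "Globex"}],
    lineage=["source:vendors_raw"],
)
clean = run_step("normalize", normalize_names, raw)
unique = run_step("dedupe", drop_duplicates, clean)
print(unique.rows)
print(unique.lineage)
```

Because every step has the same shape, steps can be swapped or reordered without touching their neighbors, and the lineage list answers "where did this table come from?" for any output.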
Infrastructure - Key Components
Infrastructure (Cloud & On-Prem)
● Management
● Compute
● Search
● Storage
Organization - Roles
Data Suppliers: CIO, Source Owner, DBA, IT Professional
Data Preparers: CDO, Data Engineer, Curator, Steward
Data Consumers: Business Owners / CxOs, Business Users, Analysts, Data Scientists, Developers
Organization - Roles
Group | Role | Goals | Tools
Consumers | Business Users | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
Consumers | Data Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
Preparers | Data Engineer | Deliver and manage data pipelines | ETL, SQL
Preparers | Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
Preparers | Steward | Create policies and drive governance | MDM, Catalog, Governance
Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
Organization - Structure
Appropriate model will fluctuate with scale of DataOps project work
Shared Services Model
Full-service development of data applications, in
collaboration with business
Advantages
● Centralized technical knowledge
● Centralized resourcing - one-stop shop
● Accretive experience
Disadvantages
● Bandwidth contention - how to prioritize
competing projects?
Advisory Model
Bootstraps projects with best of breed tools and
approach, but does not complete them
Advantages
● Centralized technical knowledge
● Minimal resourcing - experts, not
implementers
● Flexibility - options to deviate from standard
tools
Disadvantages
● Resource burden is on each project / department - both in development and ongoing maintenance
● Limited feedback - does the advice get better
after each project?
DataOps in the Real World
Global Travel Assessment System (GTAS)
U.S. Customs and Border Protection (CBP) developed GTAS - an open source application providing nation-states and border security entities the capacity to screen persons against risk criteria for threat prevention, public health or a variety of other use cases.
DHS selected Tamr to provide improved entity resolution capability due to its fast, scalable, human-guided, machine learning-based approach.
GTAS: CBP’s Passenger Data Screening and Analysis System
● Receive and store air traveler reservation and manifest data
● Perform real time risk assessment
● Manage risk criteria and watch lists
● View high risk travelers with associated flight details and
reservation information
● Query flight and travel history
[Diagram labels: Inputs (Passenger Manifest, Travel Reservation, Intergovernmental / INTERPOL Exchanges); Risk Criteria (Human trafficker profile match, Terrorist database match, Recent travel in pandemic-affected area)]
DHS Case Study cont'd - 4 Phases to Deployment
4 Phases
● Phase 1 - Accuracy: Improve accuracy of entity resolution with biographical and reservation data
● Phase 2 - Automation: Build data pipeline and automate data movement and system controls
● Phase 3 - Performance: Optimize performance through a new low latency match service and reliable, robust communications
● Phase 4 - Interoperability: Ensure stability for deployment into variety of environments
Results
● Leverage all data
● Label examples of matching/distinct
● Measure precision and recall
● Iterate & optimize
● Prepare, ingest, export data
● Triggering, data exchanges and error handling
● Security and authorization
● Optimize models for risk, timing, cultural patterns, etc.
● Create low latency match capability
● Measure and iterate end-to-end latency
● Create documentation and communication channels for international support
● Installation, testing and sustainment
● Advanced feature offerings
Timeline (Phases 1 to 4): 2017, 2017, 2018, 2019
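The Phase 1 loop of labeling pairs as matching or distinct and then measuring precision and recall can be sketched as follows. This is a generic illustration, not the GTAS or Tamr code: the record ids, labels, and the `precision_recall` helper are made up for the example.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]

def precision_recall(predicted: Set[Pair], labeled: Dict[Pair, bool]) -> Tuple[float, float]:
    """Score predicted matches against human labels (True = match, False = distinct).

    Predicted pairs with no label are ignored because their truth is unknown.
    """
    tp = sum(1 for p in predicted if labeled.get(p) is True)
    fp = sum(1 for p in predicted if labeled.get(p) is False)
    fn = sum(1 for p, is_match in labeled.items() if is_match and p not in predicted)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical expert labels for four candidate record pairs.
labeled = {
    ("r1", "r2"): True,
    ("r1", "r3"): False,
    ("r4", "r5"): True,
    ("r6", "r7"): True,
}
# Hypothetical model output: the pairs it believes are the same entity.
predicted = {("r1", "r2"), ("r1", "r3"), ("r4", "r5")}

precision, recall = precision_recall(predicted, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both numbers after each round of labeling shows whether an iteration reduced false matches, missed matches, or both.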
DHS Case Study
“When we were looking for companies, Tamr fit our bill perfectly. They were interested in the mission, they understood what we were trying to do and why it was important to international security, and they demonstrated the capacity to execute at a commercial level.”
Ari Schuler, Advisor, Office of the Commissioner, U.S. Customs and Border Protection
Recent Tamr recognition by DHS
● Science & Technology-Funded 2019 Performer Award: Crossing the Valley of Death
■ This honor is awarded to an effort that has a technical transition success story.
● DHS Silicon Valley Innovation Program: Snapshot Article
■ The article highlights the transition of Tamr technology to CBP, to be released January 2019 via Meltwater and GovDelivery
Case Study: US Air Force
Semi-Automated Aircraft Wing Flutter (vibration) Analysis
[Slide quadrant headings: Business Challenge, Technical Challenge, Business Outcome, Technical Outcome]
● Use ML to understand a large corpus (30 yrs worth) of past testing, simulations, and analyses
● Automate large portions of the process to predict aircraft “flutter”
● Users quickly interrogate decades worth of technical data via rich metadata
● Reduce engineer process time dramatically by identifying relevant antecedents and technical predictions
● Deliver on big data initiative, enable end users easy search of historic data
● Present SMEs (PhD engineers) with short list of relevant antecedents and flutter predictions
● Tamr tagged 35K files with 645K descriptive labels in 22 tag types (aircraft, stores, author, etc.)
● Automatically create recommendation based on a machine learning model built on the historical data
Accelerating Subject Matter Expert Recommendations
[Diagram: workflow columns]
● INPUT: New Config. Request
● AUTOMATED ANALYSIS: Metadata extraction; Machine learning models for each discipline
● AUTOMATED OUTPUTS: Relevant documents; Discipline-specific recommendations
● PRODUCTIVITY TOOLS: Document browsing powered by clean, consistent metadata
Getting Started With DataOps
Getting started - Process
Agile is the key
● If not already there, choose a model that works (Scrum,
SAFe)
Inventory the set of available projects
● Score on availability of data vs. value of solving a
problem
Define high-value, data-rich project that will demand a
complex solution
● Forcing function to ensure end-to-end functionality will
be covered
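Scoring the project inventory on data availability versus value could look like the sketch below. The 1 to 5 scales, the multiplicative score, the tie-break on value, and the sample projects are all assumptions made for illustration; any consistent scoring rule would serve the same purpose.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Project:
    name: str
    data_availability: int  # assumed scale: 1 (scarce) to 5 (rich)
    business_value: int     # assumed scale: 1 (low) to 5 (high)

def rank_projects(projects: List[Project]) -> List[Project]:
    """Rank by availability x value, breaking ties in favor of higher value."""
    return sorted(
        projects,
        key=lambda p: (p.data_availability * p.business_value, p.business_value),
        reverse=True,
    )

# Hypothetical inventory.
inventory = [
    Project("Vendor dedup", data_availability=5, business_value=2),
    Project("Case backlog analytics", data_availability=4, business_value=4),
    Project("Fraud detection", data_availability=2, business_value=5),
]
ranked = rank_projects(inventory)
for p in ranked:
    print(p.name, p.data_availability * p.business_value)
```

The top-ranked entry is a candidate for the high-value, data-rich first project described above.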
Getting started - Technology
Identify path to a modern, modular service architecture
● Create blueprint for next generation data
management platform
● Revisit cloud migration strategy
Inventory current tool set
● TCO / skill requirements / etc.
● Determine which should be replaced, and when this is
viable
Decouple monolithic processes
● Wrap components in APIs, expose as services
Start building with new tech
● Choose subset of tools for proof of concepts to replace old
tech
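As a rough illustration of "wrap components in APIs, expose as services", the sketch below puts a JSON-in/JSON-out handler around a routine that would otherwise live inside a monolithic job. Both `legacy_standardize_address` and `handle_request` are hypothetical stand-ins; a real deployment would mount the handler behind whatever HTTP framework the team already uses.

```python
import json

def legacy_standardize_address(raw: str) -> str:
    """Stand-in for an existing routine buried in a monolithic job (hypothetical)."""
    return " ".join(raw.upper().split())

def handle_request(body: str) -> str:
    """Service-style wrapper: JSON request in, JSON response out.

    Callers depend only on this contract, not on how the legacy routine works.
    """
    payload = json.loads(body)
    cleaned = [legacy_standardize_address(a) for a in payload["addresses"]]
    return json.dumps({"addresses": cleaned})

response = handle_request('{"addresses": ["  12 main   st ", "5 Elm Rd"]}')
print(response)
```

Once the contract is fixed, the routine behind it can be replaced with newer tooling without changing any caller.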
Getting started - Organization
Inventory current team
● Identify existing key roles - data engineers and their
consumers
● Find best candidates for new roles - data curators and
data stewards
Create cross-functional team
● Data consumers - will depend upon project
● Data Engineer(s)
● Curator
● Steward
Choose your operating model
● Start with Shared Services for first project
Ensure executive alignment
● CDO or equivalent
What NOT to do
● Avoid boiling the ocean / “waterfall” (projects measured in years/quarters)
○ Build rational long-term infra while delivering real analytic value along the way
● Single “Platform”: Don’t overestimate what a single piece of software can do
○ Focus on a thoughtfully designed ecosystem of loosely coupled best of breed tools
● Single Vendor: Don’t overestimate what a single vendor can do
○ Align vendors with APIs and expectations that they MUST work together
● Don’t underestimate the effort required to make FOSS work
○ Just because Google does it doesn’t mean you can do it
● Don’t underestimate human/behavioral challenges with data
○ Most often the reason that projects fail/stall is human/behavioral
Key DataOps Principles
Process
● Agile - Quick wins + incremental value delivery
Technology
● Architecture - Loosely-coupled best of breed components which incorporate automation + human feedback
● Infrastructure - Cloud-native, scalable and elastic tooling
Organization
● Roles - Specialization and separation of duties
● Structure - Centralized expertise + knowledge capture across projects
New podcast series!
Featured guests include:
● Nick Sinai - Deputy CTO in the Obama Administration
● Eric Iverson - Former CIO at Sony
● And more data leaders...
Listen today on Spotify, Apple Podcasts and Google Podcasts!
https://www.tamr.com/datamasters/
New Podcast Series - DataMasters
Questions?
Contact Michael Gormey with additional questions after the
webinar:
michael.gormey@tamr.com

More Related Content

What's hot

The Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsThe Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsStephanie Locke
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019DataKitchen
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaScyllaDB
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Ibm data governance framework
Ibm data governance frameworkIbm data governance framework
Ibm data governance frameworkkaiyun7631
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformDatabricks
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud ServicesDavid J Rosenthal
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 

What's hot (20)

The Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data AnalyticsThe Microsoft Well Architected Framework For Data Analytics
The Microsoft Well Architected Framework For Data Analytics
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Washington DC DataOps Meetup -- Nov 2019
Washington DC DataOps Meetup   -- Nov 2019Washington DC DataOps Meetup   -- Nov 2019
Washington DC DataOps Meetup -- Nov 2019
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
Cloud Migration: A How-To Guide
Cloud Migration: A How-To GuideCloud Migration: A How-To Guide
Cloud Migration: A How-To Guide
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Ibm data governance framework
Ibm data governance frameworkIbm data governance framework
Ibm data governance framework
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Microsoft Azure Cloud Services
Microsoft Azure Cloud ServicesMicrosoft Azure Cloud Services
Microsoft Azure Cloud Services
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 

Similar to DataOps @ Scale: A Modern Framework for Data Management in the Public Sector

Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperativeTrillium Software
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallTrillium Software
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckTamrMarketing
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsDatameer
 
EPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfEPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfcedrinemadera
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData Blueprint
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...CompTIA
 
Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Safe Software
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonIBM Danmark
 
BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?Christopher Bradley
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxMrityunjay Emmi
 
Delivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeDelivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeJean-Michel Franco
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Precisely
 

Similar to DataOps @ Scale: A Modern Framework for Data Management in the Public Sector (20)

Big data and the data quality imperative
Big data and the data quality imperativeBig data and the data quality imperative
Big data and the data quality imperative
 
The Bigger They Are The Harder They Fall
The Bigger They Are The Harder They FallThe Bigger They Are The Harder They Fall
The Bigger They Are The Harder They Fall
 
Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deck
 
dq_fail.pdf
dq_fail.pdfdq_fail.pdf
dq_fail.pdf
 
Fight Fraud with Big Data Analytics
Fight Fraud with Big Data AnalyticsFight Fraud with Big Data Analytics
Fight Fraud with Big Data Analytics
 
EPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdfEPF-datagov-part1-1.pdf
EPF-datagov-part1-1.pdf
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Data-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing StrategiesData-Ed: Data Warehousing Strategies
Data-Ed: Data Warehousing Strategies
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
Is Your Staff Big Data Ready? 5 Things to Know About What It Will Take to Suc...
 
Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...Best Practices to Navigating Data and Application Integration for the Enterpr...
Best Practices to Navigating Data and Application Integration for the Enterpr...
 
Big Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter JönssonBig Data & Analytics, Peter Jönsson
Big Data & Analytics, Peter Jönsson
 
BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?BDA 2012 Big data why the big fuss?
BDA 2012 Big data why the big fuss?
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Tamr overview
Tamr overviewTamr overview
Tamr overview
 
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptxch1vsat2k_BDA_Introduction11Jan17-converted.pptx
ch1vsat2k_BDA_Introduction11Jan17-converted.pptx
 
Delivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data LakeDelivering Analytics at Scale with a Governed Data Lake
Delivering Analytics at Scale with a Governed Data Lake
 
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
Engineering Machine Learning Data Pipelines Series: Big Data Quality - Cleans...
 


DataOps @ Scale: A Modern Framework for Data Management in the Public Sector

  • 1. DataOps @ Scale: A Modern Framework for Data Management in the Public Sector Mark Marinelli, Head of Product, Tamr May, 2020
  • 2. Confidential – Tamr, Inc. Agenda
      ● How DataOps Began
      ● A DataOps Framework
      ● DataOps in the Real World
        ○ DHS
        ○ Air Force
      ● Getting Started
      ● Q&A
  • 3. About the Speakers
      ● Mark Marinelli, Head of Product at Tamr
      ● Katie Everett, Marketing Manager at Tamr (Moderator)
  • 4. How DataOps Got Started & What Actually is ‘DataOps’?
  • 5. Modern internet companies have data advantages
      ● Unified Dataportal
      ● Greenfield Infrastructure & High End Talent Pool
  • 6. Traditional orgs have significant “legacy drag coefficient”
      Problem: Thousands of systems generating data every day that were built over decades to support business processes.
      Result: “Random Data Salad”. Data debt from constant change/entropy.
      Data silos created due to:
      ● Security concerns
      ● Organizational mismatch
      Consequences:
      1. Too much time spent on data prep vs. analysis / action
      2. High failure rate of BI / analytics projects
      3. Game changing initiatives deemed ‘impossible’ and never start
      Diagram labels: Restructuring; Leadership Changes; Politics; Dynamic Schema DBs - Mongo et al; “Data Hoarding”; Legacy Burden; M&A
  • 7. Why now? 7 years ago: we need data scientists!
  • 8. Today: we have data scientists! (and want to do cool AI stuff)
  • 9. Just not yet in the government… but it’s growing
      ● Within the last 6 months, U.S. agencies have begun defining a “Data Science Occupational Series”.
      ● This means adding the term “(Data Scientist)” at the end of a job title to increase the odds of finding a candidate who understands data.
  • 10. “How do we take the data that we have – which is ubiquitous and it’s incredible across the federal government – understand it, be able to leverage it at every step in the chain.” - Deputy Federal CIO, Margie Graves
  • 11. Human/behavioral challenges don’t help
      ● Afraid to share data - Due to organizational policies and security levels
      ● Hoarding data - A method of organizational control or job preservation
      ● Obscuring data complexity - Failure to embrace the complexity, diversity, and idiosyncrasy of data generated in a large organization
      ● Limiting access to a small number of users - A method of control or a reflection of insecurity about data quality
  • 12. This is a solved problem
      Waterfall Approach
      ● Top-down: Architects drive the spec
      ● Monolithic: Single application
      ● A Priori Modeling: Front-loaded view of all components
      ● Quality Assurance: Manual QA
      ● Traditional SDLC: Dev/test/prod → major/minor release
      Agile Development
      ● Bottom-Up: Users drive the spec
      ● Distributed: Loosely coupled, scalable
      ● Learn from Use: Emergent feature set
      ● Continuous Integration: Automated testing
      ● Modern DevOps: Continuous delivery
      Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
  • 13. What is DataOps? = Modern data engineering practice
      DataOps is an automated, process-oriented methodology, used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
  • 14. A DataOps Framework: Process, Technology, Organization
  • 15. DataOps framework components
      Process
      ● Agile - Incremental delivery method
      Technology
      ● Architecture - Tools which comprise data supply chain
      ● Infrastructure - Platform to support architecture
      Organization
      ● Roles - Division of labor across mixed-skill teams
      ● Structure - Working model for projects across technical and business teams
  • 16. Process - The Wrong Way
      ● Labor-intensive
      ● Monolithic
      ● IT driven
      [Diagram: Sources (Internal Tabular Data, External Tabular Data) pass through Modeling, Rules, Testing and Delivery to Consumers (Business Users, Analysts, Data Scientists, Developers); chart of Remaining Work over Time]
  • 17. Process - The Right Way
      ● Automated
      ● Incremental
      ● Collaborative
      [Diagram: Sources (Internal Tabular Data, External Tabular Data) to Consumers (Business Users, Analysts, Data Scientists, Developers); chart of Remaining Work over Time]
  • 18. Why Use Human-Guided Machine Learning to Master Data
      Before: Data Scientists spent months and 100% of energy preparing data.
      Today: ML can do 80% of the data mastering lift, enabling Data Scientists to put final touches on the last 20%.
  • 19. The DataOps Component Architecture
      Sources: Internal Tabular Data; External Tabular Data
      Components:
      ● Movement & Automation
      ● Storage & Compute
      ● Governance & Policy
      ● Catalog & Crawling
      ● Publishing & Versioning
      ● Mastering & Quality
      ● Feedback & Usage
      Consumers: Business Users; Analysts; Data Scientists; Developers
  • 20. Technology - Architectural Principles
      ● Scale Out/Distributed
        ○ Cloud First
      ● Collaborative (Humans at the Core)
        ○ Highly Automated - automate whenever possible
        ○ Bi-Directional (Feedback)
      ● Open/Best of Breed (not one platform/vendor)
        ○ Service Oriented (clear endpoints for data)
        ○ Loosely Coupled (RESTful interfaces, table(s) in/out)
      ● Continuous (assume data will change)
        ○ Both aggregated AND federated storage
        ○ Both batch AND streaming
      ● Lineage/Provenance is essential
  • 21. Infrastructure - Key Components
      ● Management
      ● Compute
      ● Search
      ● Storage
      Infrastructure (Cloud & On-Prem)
  • 22. Organization - Roles
      ● Data Suppliers: CIO, Source Owner, DBA, IT Professional
      ● Data Preparers: CDO, Data Engineer, Curator, Steward
      ● Data Consumers: Business Owners / CxOs, Business Users, Analysts, Data Scientists, Developers
  • 23. Organization - Roles
      Group | Role | Goals | Tools
      Consumers | Business Users | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
      Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
      Consumers | Data Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
      Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
      Preparers | Data Engineer | Deliver and manage data pipelines | ETL, SQL
      Preparers | Curator | Ensure consumers have the data they need, in the form they need it | MDM, Catalog
      Preparers | Steward | Create policies and drive governance | MDM, Catalog, Governance
      Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
  • 24. Organization - Structure
      Shared Services Model: Full-service development of data applications, in collaboration with business
      Advantages
      ● Centralized technical knowledge
      ● Centralized resourcing - one-stop shop
      ● Accretive experience
      Disadvantages
      ● Bandwidth contention - how to prioritize competing projects?
      Advisory Model: Bootstraps projects with best of breed tools and approach, but does not complete them
      Advantages
      ● Centralized technical knowledge
      ● Minimal resourcing - experts, not implementers
      ● Flexibility - options to deviate from standard tools
      Disadvantages
      ● Resource burden is on each project / department - both in development and ongoing maintenance
      ● Limited feedback - does the advice get better after each project?
      Appropriate model will fluctuate with scale of DataOps project work.
  • 25. DataOps in the Real World
  • 26. Global Travel Assessment System (GTAS)
      U.S. Customs and Border Protection (CBP) developed GTAS, an open source application that gives nation-states and border security entities the capacity to screen persons against risk criteria for threat prevention, public health, or a variety of other use cases. DHS selected Tamr to provide improved entity resolution capability because of its fast, scalable, human-guided, machine learning-based approach.
      GTAS: CBP’s Passenger Data Screening and Analysis System
      ● Receive and store air traveler reservation and manifest data
      ● Perform real time risk assessment
      ● Manage risk criteria and watch lists
      ● View high risk travelers with associated flight details and reservation information
      ● Query flight and travel history
      Risk Criteria: Human trafficker profile match; Terrorist database match; Recent travel in pandemic affected area
      Diagram labels: Passenger Manifest; Travel Reservation; Intergovernmental INTERPOL Exchanges
  • 27. DHS Case Study (cont’d) - 4 Phases to Deployment
      Phase 1 - Accuracy (2017): Improve accuracy of entity resolution with biographical and reservation data
      ● Leverage all data
      ● Label examples of matching/distinct
      ● Measure precision and recall
      ● Iterate & optimize
      Phase 2 - Automation (2017): Build data pipeline and automate data movement and system controls
      ● Prepare, ingest, export data
      ● Triggering, data exchanges and error handling
      ● Security and authorization
      ● Optimize models for risk, timing, cultural patterns, etc.
      Phase 3 - Performance (2018): Optimize performance through a new low latency match service and reliable, robust communications
      ● Create low latency match capability
      ● Measure and iterate end-to-end latency
      Phase 4 - Interoperability (2019): Ensure stability for deployment into variety of environments
      ● Create documentation and communication channels for international support
      ● Installation, testing and sustainment
      ● Advanced feature offerings
  • 28. DHS Case Study
      “When we were looking for companies, Tamr fit our bill perfectly. They were interested in the mission, they understood what we were trying to do and why it was important to international security, and they demonstrated the capacity to execute at a commercial level.” - Ari Schuler, Advisor, Office of the Commissioner, U.S. Customs and Border Protection
      Recent Tamr recognition by DHS
      ● Science & Technology-Funded 2019 Performer Award: Crossing the Valley of Death
        ■ This honor is awarded to an effort that has a technical transition success story.
      ● DHS Silicon Valley Innovation Program: Snapshot Article
        ■ The article highlights the transition of Tamr technology to CBP, to be released January 2019 via Meltwater and GovDelivery
  • 29. Case Study: US Air Force - Semi Automated Aircraft Wing Flutter (vibration) Analysis
      Quadrants: Business Challenge; Technical Challenge; Business Outcome; Technical Outcome
      ● Use ML to understand a large corpus (30 yrs worth) of past testing, simulations, and analyses
      ● Automate large portions of the process to predict aircraft “flutter”
      ● Users quickly interrogate decades worth of technical data via rich metadata
      ● Reduce engineer process time dramatically by identifying relevant antecedents and technical predictions
      ● Deliver on big data initiative, enable end users easy search of historic data
      ● Present SMEs (PhD engineers) with short list of relevant antecedents and flutter predictions
      ● Tamr tagged 35K files with 645K descriptive labels in 22 tag types (aircraft, stores, author, etc.)
      ● Automatically create recommendation based on a machine learning model built on the historical data
  • 30. Accelerating Subject Matter Expert Recommendations
      Stages: INPUT; AUTOMATED ANALYSIS; AUTOMATED OUTPUTS; PRODUCTIVITY TOOLS
      Diagram labels: New Config. Request; Metadata extraction; Machine learning models for each discipline; Relevant documents; Discipline-specific recommendations; Document browsing powered by clean, consistent metadata
  • 31. Getting Started With DataOps
  • 32. Getting started - Process
      Agile is the key
      ● If not already there, choose a model that works (Scrum, SAFe)
      Inventory the set of available projects
      ● Score on availability of data vs. value of solving a problem
      Define high-value, data-rich project that will demand a complex solution
      ● Forcing function to ensure end-to-end functionality will be covered
  • 33. Getting started - Technology
      Identify path to a modern, modular service architecture
      ● Create blueprint for next generation data management platform
      ● Revisit cloud migration strategy
      Inventory current tool set
      ● TCO / skill requirements / etc.
      ● Determine which should be replaced, and when this is viable
      Decouple monolithic processes
      ● Wrap components in APIs, expose as services
      Start building with new tech
      ● Choose subset of tools for proof of concepts to replace old tech
  • 34. Getting started - Organization
      Inventory current team
      ● Identify existing key roles - data engineers and their consumers
      ● Find best candidates for new roles - data curators and data stewards
      Create cross-functional team
      ● Data consumers - will depend upon project
      ● Data Engineer(s)
      ● Curator
      ● Steward
      Choose your operating model
      ● Start with Shared Services for first project
      Ensure executive alignment
      ● CDO or equivalent
  • 35. What NOT to do
      ● Avoid boiling the ocean/“waterfall” (projects measured in years/quarters)
        ○ Build rational long term infra while delivering real analytic value along the way
      ● Single “Platform”: Don’t overestimate what a single piece of software can do
        ○ Focus on a thoughtfully designed ecosystem of loosely coupled best of breed tools
      ● Single Vendor: Don’t overestimate what a single vendor can do
        ○ Align vendors with APIs and expectations that they MUST work together
      ● Don’t underestimate the effort required to make FOSS work
        ○ Just because Google does it doesn’t mean you can do it
      ● Don’t underestimate human/behavioral challenges with data
        ○ Most often the reasons that projects fail/stall are human/behavioral
  • 36. Key DataOps Principles
      Process
      ● Agile - Quick wins + incremental value delivery
      Technology
      ● Architecture - Loosely-coupled best of breed components which incorporate automation + human feedback
      ● Infrastructure - Cloud-native, scalable and elastic tooling
      Organization
      ● Roles - Specialization and separation of duties
      ● Structure - Centralized expertise + knowledge capture across projects
  • 37. New Podcast Series - DataMasters
      Featured guests include:
      ● Nick Sinai - Deputy CTO in the Obama Administration
      ● Eric Iverson - Former CIO at Sony
      ● And more data leaders...
      Listen today on Spotify, Apple Podcasts and Google Podcasts! https://www.tamr.com/datamasters/
  • 38. Questions? Contact Michael Gormey with additional questions after the webinar: michael.gormey@tamr.com
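
Slide 27 lists "Label examples of matching/distinct" and "Measure precision and recall" as Phase 1 steps for entity resolution. The deck does not show how Tamr or GTAS computes these figures, so the following is only a minimal sketch of the general technique, assuming matches are expressed as unordered pairs of record IDs. The function name and the record IDs are hypothetical.

```python
def pairwise_precision_recall(predicted_pairs, labeled_matches):
    """Compare predicted match pairs against human-labeled match pairs.

    Each pair is an unordered pair of record IDs, so (a, b) and (b, a)
    are treated as the same match.
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    truth = {frozenset(pair) for pair in labeled_matches}
    true_positives = len(predicted & truth)
    # Precision: share of predicted matches that the labels confirm.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of labeled matches that the model found.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record IDs, for illustration only.
labeled = [("p1", "p2"), ("p3", "p4"), ("p5", "p6")]
predicted = [("p2", "p1"), ("p3", "p4"), ("p7", "p8"), ("p1", "p9")]

precision, recall = pairwise_precision_recall(predicted, labeled)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tracking both numbers after each round of labeling is one way to make the "Iterate & optimize" step concrete: precision falls when the model merges distinct records, recall falls when it misses true matches.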

Editor's Notes

  1. Thank you all for joining us today. We are thrilled to put on today’s webinar, DataOps at Scale, a modern framework for data management in the public sector. My name is Katie Everett, and I’m the public sector marketing manager at Tamr and will serve as today’s moderator.
  2. In the webinar, we’ll review how DataOps began, what DataOps is and the components that go into a successful framework. Additionally, we’ll give you real life examples of DataOps successfully deployed in the public sector as well as actionable steps on how you can get started. We’ll leave some time at the end for Q&A.
  3. I’d like to introduce today’s speaker who will take us through the webinar, Mark Marinelli. Mark is the Head of Product at Tamr and is a 20-year veteran of Enterprise Data Management and Analytics software. He is well versed in coaching companies through deploying DataOps at scale across multiple industries. Mark has held engineering, product management, and technology strategy roles at Lucent Technologies, Macrovision, and most recently at Lavastorm, where he was Chief Technology Officer. So, over to you Mark.
  4. Manage data from their business systems more as “exhaust” than “asset” > “significant data debt”
  5. Heavy shortage of data scientists. Rush to fill the gap.
  6. Companies started filling the gaps… rapidly scooping up data talent.
  7. https://www.fedscoop.com/data-scientist-hiring-margie-graves/
  8. As a Chief Data Officer begins to tackle the human/behavioral challenges, they also need to begin establishing their next generation technical infrastructure. Having worked with dozens of Global 2000 customers on their data/analytics initiatives at Tamr, we’ve seen some key principles that work well as companies begin to establish their next generation data infrastructure.
  9. Just as DevOps drove rapid delivery of high-quality, scalable software applications, DataOps is the path forward for data applications.
  10. Components: What. There are many ways to think about the potential components of a next gen data ecosystem for the enterprise. Our friends at DataKitchen have done a good job with this post which refers to some solid work by the Eckerson group. In the interest of trying to simplify the context of what you might consider buying vs. building and which vendors you might consider, I’ve tried to lay out the primary components of a next gen enterprise data ecosystem based on the environments I’ve seen people configuring over the past 8-10 years and the tools (new and old) that are available. These components can be summarized as follows:
  11. Q: Who are the people within the agencies that you work with that tend to champion DataOps initiatives from a leadership perspective?
  12. Q: Are you advocating to eliminate the need for data scientists? Can you talk a little bit more about that? Taking the human out of the loop, is ML as good as the humans? (ML runs 24/7, doesn’t need coffee or sleep.) Q: Can subject matter experts really train this ML model? How is that possible?
  13. Q: Do you see processes/tools in DataOps becoming open-sourced? If so, which processes do you think should be open-sourced to enable better integration of multiple vendors?
  14. Thanks Mark. We have some very exciting news to share with you all today. Tamr launched a new podcast series called DataMasters. The podcast features data leaders who share stories about their journeys and offer insights on how they made their organizations more data driven. Featured guests on the inaugural launch include Nick Sinai, Deputy CTO in the Obama Administration, and Eric Iverson, former CIO at Sony. We invite you to listen and subscribe today. The podcast will host many more data leaders across government organizations and enterprises.