Building a Fault-Tolerant ETL Pipeline for Claims CAFé
Internship Presentation - Summer 2018
Kim Hammar
Data Analytics Engineer
khamq@allstate.com or kimham@kth.se
August 30, 2018
Kim Hammar (DAE) Claims CAFé August 30, 2018 1 / 24
Extract
Transform
Load
Kim Hammar (DAE) Claims CAFé August 30, 2018 2 / 24
Extract
Transform
Load
Corrupt data that cannot be parsed
Inconsistent Schema
Wrong File paths
Unexpected null values
Duplicates
Code bugs
Missing values
Inconsistent data types
Target database unavailable
Permission error
Scalability problems
Kim Hammar (DAE) Claims CAFé August 30, 2018 2 / 24
Outline
1 Claims CAFé Background
2 Ensuring Data Quality
3 DEMO
4 Conclusion
5 Questions
Kim Hammar (DAE) Claims CAFé August 30, 2018 3 / 24
What is Claims CAFé?
Claims CAFé: Claims (C)entral (A)nalytical (F)il(e)
claimId policy participant persVeh ...
B005 auto John Doe URK389 Audi ...
B007 home A.Svensson PTO291 Ford ...
B003 life C.Strömbäck RNU999 Volvo ...
B004 property G.Åsbrink WEM650 Benz ...
B002 health L.Löfven KQO209 Tesla ...



Claims
A one stop shop for claims analytics.
Optimized for analytical use-cases by saving raw denormalized data in
hadoop:
Increase scalability
Makes data processing easier: No more slow and complex SQL-Joins
Kim Hammar (DAE) Claims CAFé August 30, 2018 4 / 24
What Actually Goes Into Building a Model
Data Science & Analytics
Data Engineering
Building models,
reports &
Visualizations
ETL processes,
data transformations &
data cleaning
Kim Hammar (DAE) Claims CAFé August 30, 2018 5 / 24
Now:
complex and slow
join to pull data
complex and slow
join to pull data
complex and slow
join to pull data
complex and slow
join to pull data
transform & clean data
create derived fields
transform & clean data
create derived fields
transform & clean data
create derived fields
transform & clean data
create derived fields
Predict claim automation:
Deep FNN model
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Customer report
& dashboards
Link Analysis
Fraud Detection
Customer1
Customer2
Claim1
Claim2
Claimant
Participant
3
10
10
10
4 9 4
5
10
11
1
Classify Image Vehicle Loss
Deep CNN Model






repairable = 0
total loss = 1
...
no loss = 0






Kim Hammar (DAE) Claims CAFé August 30, 2018 6 / 24
Now:
complex and slow
join to pull data
complex and slow
join to pull data
complex and slow
join to pull data
complex and slow
join to pull data
transform & clean data
create derived fields
transform & clean data
create derived fields
transform & clean data
create derived fields
transform & clean data
create derived fields
Predict claim automation:
Deep FNN model
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Customer report
& dashboards
Link Analysis
Fraud Detection
Customer1
Customer2
Claim1
Claim2
Claimant
Participant
3
10
10
10
4 9 4
5
10
11
1
Classify Image Vehicle Loss
Deep CNN Model






repairable = 0
total loss = 1
...
no loss = 0






Future:
complex and slow
join to pull data
transform & clean data
create derived fields
Claims
CAFé
b0
x0,1
x0,2
x0,3
b1
x1,1
x1,2
x1,3
ˆy
Customer1
Customer2
Claim1
Claim2
Claimant
Participant
3
10
10
10
4 9 4
5
10
11






repairable = 0
total loss = 1
...
no loss = 0






Kim Hammar (DAE) Claims CAFé August 30, 2018 6 / 24
What Goes Into Making CAFé
Coffee Plants Harvest
Processing:
pulping the cherries,
filtering out bad ones,
sorting&drying
Prepare for consumption:
roasting,
grinding,
brewing
Consume:
drink,
sell
Making Coffee
Raw Data Sources ETL
X f () Y
Processing:
fill in null values,
create derived fields,
data cleaning
Prepare for consumption:
flatten,
create different views,
save to Hive
x
f(x)
Sekante
P
f(x0)
x0
Q
x0 + ε
f(x0 + ε) − f(x0)
ε
ε
f(x0 + ε)
Consume:
build models,
generate business reports
Making Claims CAFé
Kim Hammar (DAE) Claims CAFé August 30, 2018 7 / 24
Claims CAFé Data Architecture
Data Sources ADW NextGen . . . Allstate
Canada
Claims
MDM
Claims MDM
Synchronizes and reconciles data
from various sources into a single place
Claims
CAFé
Claims CAFé
Serves the need of data scientists.
Contains data optimized for analytics
Derived fields
Data cleaning
Data restructuring
Machine Learning
x
f(x)
Sekante
P
f(x0)
x0
Q
x0 + ε
f(x0 + ε) − f(x0)
ε
ε
f(x0 + ε)
Analytics
[−4, −2) [−2, 0) [0, 2) [2, 4)
0
500
1,000
1,500
2,000
2,500
Reports
Number of
fraudulent claims
during the
third quarter: X
Kim Hammar (DAE) Claims CAFé August 30, 2018 8 / 24
What Is a Claim From A Data Pespective?
Claim
Claim Details
...
Policy Details
...
Participants Details
...
Vehicle Details
...
...
...
Figure: Some of the data that a single Claim in Claims CAFé contains.
Kim Hammar (DAE) Claims CAFé August 30, 2018 9 / 24
Data Quality Assurance
Data Quality is key to the success of Claims CAFé
Examples of data quality issues:
Null values in the wrong place
Duplicates
Missing values
Inconsistent data types
...
Data Quality Issues
Data Size
Why does my Model not work?
John Doe, Data Scientist
Kim Hammar (DAE) Claims CAFé August 30, 2018 10 / 24
Data Quality Mechanisms Motivation
Failures are the norm, they are not an exception
Optimistic error estimate: P("CAFé error") =
P("Data quality issue")∪P("Hadoop failure")∪P("Network failure")∪
P("ClaimsMDM failure") ∪ P("CAFé code bug") ≈ 1/1000 = 0.001
CAFé stretch goal: near-real-time updates, say we pull data every 15
minutes =⇒ 96 pulls per day =⇒ 672 pulls per week
failure probability per week: 0.001 × 672 = 0.672 =⇒
a failure will happen on average every other week
We want built-in mechanisms in the CAFé pipeline to detect and
deal with errors before they affect end-users: Tests!
Kim Hammar (DAE) Claims CAFé August 30, 2018 11 / 24
How to detect data quality issues? Know your data
. . . Claim 1
...
Claim 2
...
Claim 3
...
Claim 4
...
Claim 5
...
How many claims
were recorded
in Rhode Island
2016-2017?
How many
null claims
exists in
Idaho?
What is the average
number of claim par-
ticipants?
What is the average
number of covered
claimants in a claim?
What is the maximum
number of vehicles re-
lated to a claim?
Kim Hammar (DAE) Claims CAFé August 30, 2018 12 / 24
Anomaly Detection for Detecting Potential Data Issues
(1/3)
Spikes in the number of null values indicate a data issue.
06/18/18 06/18/18 06/19/18 06/19/18 06/20/18 06/20/18 06/21/18 06/21/18
Date
0
2000
4000
6000
8000
10000
12000
14000
16000
NullCounts
Indication of a data pull issue
Kim Hammar (DAE) Claims CAFé August 30, 2018 13 / 24
Anomaly Detection for Detecting Potential Data Issues
(2/3)
04.2017
05.2017
06.2017
07.2017
08.2017
09.2017
10.2017
11.2017
12.2017
01.2018
02.2018
03.2018
6,000
8,000
10,000
12,000
14,000
Date
Count
Number of claims in Rhode Island over time
Why did this drop happen?
Kim Hammar (DAE) Claims CAFé August 30, 2018 14 / 24
Anomaly Detection for Detecting Potential Data Issues
(3/3)
CoverageClaimants PersVeh ClaimsParty
0
20
40
2
1
3
1
4
39
Count
Comparing a single claim statistics with the average
Average cross entire dataset Actual value for this claim
39 Parties in this claim vs average on 3
is this valid data or a bug?
Kim Hammar (DAE) Claims CAFé August 30, 2018 15 / 24
Regression Tests Background (1/2)
What is regression testing?
Regression tests verify modifications of a program or data.
If the modification fail the tests, the program can regress back.
I just refactored the CAFé pipeline, how do I know that I didn’t
break anything?
I just pulled new data into CAFé, how do I know I did not
introduce data quality issues?
Why do we want regression tests?
It increases the confidence in making code and data changes
We can detect bugs before they bother end-users
We can avoid unnecessary work: If the tests fail we can abort early
and save time.
Kim Hammar (DAE) Claims CAFé August 30, 2018 16 / 24
Regression Tests Background (2/2)
Naive ETL Pipeline For Updating Claims CAFé:
Load & Transform
Claims
MDM
Claims
CAFé
Kim Hammar (DAE) Claims CAFé August 30, 2018 17 / 24
Regression Tests Background (2/2)
Naive ETL Pipeline For Updating Claims CAFé:
Load & Transform
Claims
MDM
Claims
CAFé
A More Robust ETL Pipeline:
Claims
MDM
Regression
Tests
Claims
CAFé
Tests failed
Email
maintainers
Missing columns:...
Null values:...
Anomalies:...
...
Generate
report
Abort ETL
Load &
Transform
Tests passed
Kim Hammar (DAE) Claims CAFé August 30, 2018 17 / 24
Regression Tests Pipeline for CAFé
Data Sources
Abort
Abort
Abort
Data paths does not exist
Invalid schema
Data paths exist
CAFé Regression Tests
Verify data paths of sources
Verify schema of sources
Verify data quality
Count null values and
compare with snapshot
Count columns and
compare with snapshot
Verdict
¬∃ duplicates ∧ null increase < x ∧ counts increase < y =⇒ build
∃ duplicates ∧ null increase < x ∧ counts increase < y =⇒ build & email
∃ duplicates ∧ null increase > x ∧ counts increase > y =⇒ abort & email
Valid schema
Email
maintainersSave test reportSave snapshot
Kim Hammar (DAE) Claims CAFé August 30, 2018 18 / 24
Regression Tests Reporting
Test Report: 26/7-2018
Passed
79.0%
Failed
21.0%
Test Results
valid_data_paths: passed, time: 7s
valid_source_schema: passed, time: 19s
no_duplicates: failed, time: 26s
claim_counts: passed, time: 41s
claim_null_counts: passed, time: 38s
...
claimCount
claimNullCount
financialsCount
financialsNullCount
summaryCount
summaryNullCountidCount
idNullCount
claimOccurrenceCount
claimOccurrenceNullCount
policyCount
policyNullCount
persVehCount
persVehNullCount
claimsPartyCount
claimsPartyNullCount
locationCount
locationNullCount
Metric
0
500
Counts
Snapshot counts
07/23/18
08/06/18
08/20/18
09/03/18
09/17/18
10/01/18
10/15/18
10/29/18
Date
0
20
40
60
FailureCount
Build failures over time
07/23/18
08/06/18
08/20/18
09/03/18
09/17/18
10/01/18
10/15/18
10/29/18
Date
0
20
40
60
80
100
Count
Claims counts over time
claimNullCount
claimCount
Kim Hammar (DAE) Claims CAFé August 30, 2018 19 / 24
Data Quality Mechanisms: Putting It All Together
Cluster
Daily
Data Pull
Build → Unit tests → Regression Tests
Build → Unit tests → Regression Tests
Abort
Merge Master Branch
Update CAFé
Push Commit
Trigger Jenkins Job
Trigger Jenkins Job
Kim Hammar (DAE) Claims CAFé August 30, 2018 20 / 24
DEMO
Kim Hammar (DAE) Claims CAFé August 30, 2018 21 / 24
Conclusion
Fault-Tolerant ETL
What
An ETL pipeline that can detect
and deal with data quality errors and bugs
Why
Avoid errors in production.
Make full use of data.
More productive analytics.
How
Know your data.
Use tests.
Integrate with CI jobs.
Lessons Learned
Setting verdicts is hard.
Without CI the tests will not be run.
Security must be in the design.
Release management is important.
Trade-offs when choosing architecture.
Kim Hammar (DAE) Claims CAFé August 30, 2018 22 / 24
Thank You!
Questions?
Kim Hammar (DAE) Claims CAFé August 30, 2018 23 / 24
Links
Claims CAFé Schema Repository1
Claims CAFé Schema Confluence Page2
Claims CAFé Data Pipeline and Tests Repository3
Claims CAFé Tests Pipeline Confluence Page4
Claims CAFé Continuous Integration Confluence Page5
Claims CAFé API Documentation Confluence Page6
1
https://github.allstate.com/d3-cafe/ClaimsCAFeSchema
2
http://conflu.allstate.com/display/DOMF2/Schema
3
https://github.allstate.com/d3-cafe/CAFe
4
http://conflu.allstate.com/display/DOMF2/Tests
5
http://conflu.allstate.com/display/DOMF2/Continuous+Integration
6
http://conflu.allstate.com/display/DOMF2/API+Documentation
Kim Hammar (DAE) Claims CAFé August 30, 2018 24 / 24

Kim Hammar - Allstate Internship Presentation Data Engineering Analytics - Claims CAFe

  • 1.
    Building a Fault-TolerantETL Pipeline for Claims CAFé Internship Presentation - Summer 2018 Kim Hammar Data Analytics Engineer khamq@allstate.com or kimham@kth.se August 30, 2018 Kim Hammar (DAE) Claims CAFé August 30, 2018 1 / 24
  • 2.
    Extract Transform Load Kim Hammar (DAE)Claims CAFé August 30, 2018 2 / 24
  • 3.
    Extract Transform Load Corrupt data thatcannot be parsed Inconsistent Schema Wrong File paths Unexpected null values Duplicates Code bugs Missing values Inconsistent data types Target database unavailable Permission error Scalability problems Kim Hammar (DAE) Claims CAFé August 30, 2018 2 / 24
  • 4.
    Outline 1 Claims CAFéBackground 2 Ensuring Data Quality 3 DEMO 4 Conclusion 5 Questions Kim Hammar (DAE) Claims CAFé August 30, 2018 3 / 24
  • 5.
    What is ClaimsCAFé? Claims CAFé: Claims (C)entral (A)nalytical (F)il(e) claimId policy participant persVeh ... B005 auto John Doe URK389 Audi ... B007 home A.Svensson PTO291 Ford ... B003 life C.Strömbäck RNU999 Volvo ... B004 property G.Åsbrink WEM650 Benz ... B002 health L.Löfven KQO209 Tesla ...    Claims A one stop shop for claims analytics. Optimized for analytical use-cases by saving raw denormalized data in hadoop: Increase scalability Makes data processing easier: No more slow and complex SQL-Joins Kim Hammar (DAE) Claims CAFé August 30, 2018 4 / 24
  • 6.
    What Actually GoesInto Building a Model Data Science & Analytics Data Engineering Building models, reports & Visualizations ETL processes, data transformations & data cleaning Kim Hammar (DAE) Claims CAFé August 30, 2018 5 / 24
  • 7.
    Now: complex and slow jointo pull data complex and slow join to pull data complex and slow join to pull data complex and slow join to pull data transform & clean data create derived fields transform & clean data create derived fields transform & clean data create derived fields transform & clean data create derived fields Predict claim automation: Deep FNN model b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Customer report & dashboards Link Analysis Fraud Detection Customer1 Customer2 Claim1 Claim2 Claimant Participant 3 10 10 10 4 9 4 5 10 11 1 Classify Image Vehicle Loss Deep CNN Model       repairable = 0 total loss = 1 ... no loss = 0       Kim Hammar (DAE) Claims CAFé August 30, 2018 6 / 24
  • 8.
    Now: complex and slow jointo pull data complex and slow join to pull data complex and slow join to pull data complex and slow join to pull data transform & clean data create derived fields transform & clean data create derived fields transform & clean data create derived fields transform & clean data create derived fields Predict claim automation: Deep FNN model b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Customer report & dashboards Link Analysis Fraud Detection Customer1 Customer2 Claim1 Claim2 Claimant Participant 3 10 10 10 4 9 4 5 10 11 1 Classify Image Vehicle Loss Deep CNN Model       repairable = 0 total loss = 1 ... no loss = 0       Future: complex and slow join to pull data transform & clean data create derived fields Claims CAFé b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Customer1 Customer2 Claim1 Claim2 Claimant Participant 3 10 10 10 4 9 4 5 10 11       repairable = 0 total loss = 1 ... no loss = 0       Kim Hammar (DAE) Claims CAFé August 30, 2018 6 / 24
  • 9.
    What Goes IntoMaking CAFé Coffee Plants Harvest Processing: pulping the cherries, filtering out bad ones, sorting&drying Prepare for consumption: roasting, grinding, brewing Consume: drink, sell Making Coffee Raw Data Sources ETL X f () Y Processing: fill in null values, create derived fields, data cleaning Prepare for consumption: flatten, create different views, save to Hive x f(x) Sekante P f(x0) x0 Q x0 + ε f(x0 + ε) − f(x0) ε ε f(x0 + ε) Consume: build models, generate business reports Making Claims CAFé Kim Hammar (DAE) Claims CAFé August 30, 2018 7 / 24
  • 10.
    Claims CAFé DataArchitecture Data Sources ADW NextGen . . . Allstate Canada Claims MDM Claims MDM Synchronizes and reconciles data from various sources into a single place Claims CAFé Claims CAFé Serves the need of data scientists. Contains data optimized for analytics Derived fields Data cleaning Data restructuring Machine Learning x f(x) Sekante P f(x0) x0 Q x0 + ε f(x0 + ε) − f(x0) ε ε f(x0 + ε) Analytics [−4, −2) [−2, 0) [0, 2) [2, 4) 0 500 1,000 1,500 2,000 2,500 Reports Number of fraudulent claims during the third quarter: X Kim Hammar (DAE) Claims CAFé August 30, 2018 8 / 24
  • 11.
    What Is aClaim From A Data Pespective? Claim Claim Details ... Policy Details ... Participants Details ... Vehicle Details ... ... ... Figure: Some of the data that a single Claim in Claims CAFé contains. Kim Hammar (DAE) Claims CAFé August 30, 2018 9 / 24
  • 12.
    Data Quality Assurance DataQuality is key to the success of Claims CAFé Examples of data quality issues: Null values in the wrong place Duplicates Missing values Inconsistent data types ... Data Quality Issues Data Size Why does my Model not work? John Doe, Data Scientist Kim Hammar (DAE) Claims CAFé August 30, 2018 10 / 24
  • 13.
    Data Quality MechanismsMotivation Failures are the norm, they are not an exception Optimistic error estimate: P("CAFé error") = P("Data quality issue")∪P("Hadoop failure")∪P("Network failure")∪ P("ClaimsMDM failure") ∪ P("CAFé code bug") ≈ 1/1000 = 0.001 CAFé stretch goal: near-real-time updates, say we pull data every 15 minutes =⇒ 96 pulls per day =⇒ 672 pulls per week failure probability per week: 0.001 × 672 = 0.672 =⇒ a failure will happen on average every other week We want built-in mechanisms in the CAFé pipeline to detect and deal with errors before they affect end-users: Tests! Kim Hammar (DAE) Claims CAFé August 30, 2018 11 / 24
  • 14.
    How to detectdata quality issues? Know your data . . . Claim 1 ... Claim 2 ... Claim 3 ... Claim 4 ... Claim 5 ... How many claims were recorded in Rhode Island 2016-2017? How many null claims exists in Idaho? What is the average number of claim par- ticipants? What is the average number of covered claimants in a claim? What is the maximum number of vehicles re- lated to a claim? Kim Hammar (DAE) Claims CAFé August 30, 2018 12 / 24
  • 15.
    Anomaly Detection forDetecting Potential Data Issues (1/3) Spikes in the number of null values indicate a data issue. 06/18/18 06/18/18 06/19/18 06/19/18 06/20/18 06/20/18 06/21/18 06/21/18 Date 0 2000 4000 6000 8000 10000 12000 14000 16000 NullCounts Indication of a data pull issue Kim Hammar (DAE) Claims CAFé August 30, 2018 13 / 24
  • 16.
    Anomaly Detection forDetecting Potential Data Issues (2/3) 04.2017 05.2017 06.2017 07.2017 08.2017 09.2017 10.2017 11.2017 12.2017 01.2018 02.2018 03.2018 6,000 8,000 10,000 12,000 14,000 Date Count Number of claims in Rhode Island over time Why did this drop happen? Kim Hammar (DAE) Claims CAFé August 30, 2018 14 / 24
  • 17.
    Anomaly Detection forDetecting Potential Data Issues (3/3) CoverageClaimants PersVeh ClaimsParty 0 20 40 2 1 3 1 4 39 Count Comparing a single claim statistics with the average Average cross entire dataset Actual value for this claim 39 Parties in this claim vs average on 3 is this valid data or a bug? Kim Hammar (DAE) Claims CAFé August 30, 2018 15 / 24
  • 18.
    Regression Tests Background(1/2) What is regression testing? Regression tests verify modifications of a program or data. If the modification fail the tests, the program can regress back. I just refactored the CAFé pipeline, how do I know that I didn’t break anything? I just pulled new data into CAFé, how do I know I did not introduce data quality issues? Why do we want regression tests? It increases the confidence in making code and data changes We can detect bugs before they bother end-users We can avoid unnecessary work: If the tests fail we can abort early and save time. Kim Hammar (DAE) Claims CAFé August 30, 2018 16 / 24
  • 19.
    Regression Tests Background(2/2) Naive ETL Pipeline For Updating Claims CAFé: Load & Transform Claims MDM Claims CAFé Kim Hammar (DAE) Claims CAFé August 30, 2018 17 / 24
  • 20.
    Regression Tests Background(2/2) Naive ETL Pipeline For Updating Claims CAFé: Load & Transform Claims MDM Claims CAFé A More Robust ETL Pipeline: Claims MDM Regression Tests Claims CAFé Tests failed Email maintainers Missing columns:... Null values:... Anomalies:... ... Generate report Abort ETL Load & Transform Tests passed Kim Hammar (DAE) Claims CAFé August 30, 2018 17 / 24
  • 21.
    Regression Tests Pipelinefor CAFé Data Sources Abort Abort Abort Data paths does not exist Invalid schema Data paths exist CAFé Regression Tests Verify data paths of sources Verify schema of sources Verify data quality Count null values and compare with snapshot Count columns and compare with snapshot Verdict ¬∃ duplicates ∧ null increase < x ∧ counts increase < y =⇒ build ∃ duplicates ∧ null increase < x ∧ counts increase < y =⇒ build & email ∃ duplicates ∧ null increase > x ∧ counts increase > y =⇒ abort & email Valid schema Email maintainersSave test reportSave snapshot Kim Hammar (DAE) Claims CAFé August 30, 2018 18 / 24
  • 22.
    Regression Tests Reporting TestReport: 26/7-2018 Passed 79.0% Failed 21.0% Test Results valid_data_paths: passed, time: 7s valid_source_schema: passed, time: 19s no_duplicates: failed, time: 26s claim_counts: passed, time: 41s claim_null_counts: passed, time: 38s ... claimCount claimNullCount financialsCount financialsNullCount summaryCount summaryNullCountidCount idNullCount claimOccurrenceCount claimOccurrenceNullCount policyCount policyNullCount persVehCount persVehNullCount claimsPartyCount claimsPartyNullCount locationCount locationNullCount Metric 0 500 Counts Snapshot counts 07/23/18 08/06/18 08/20/18 09/03/18 09/17/18 10/01/18 10/15/18 10/29/18 Date 0 20 40 60 FailureCount Build failures over time 07/23/18 08/06/18 08/20/18 09/03/18 09/17/18 10/01/18 10/15/18 10/29/18 Date 0 20 40 60 80 100 Count Claims counts over time claimNullCount claimCount Kim Hammar (DAE) Claims CAFé August 30, 2018 19 / 24
  • 23.
    Data Quality Mechanisms:Putting It All Together Cluster Daily Data Pull Build → Unit tests → Regression Tests Build → Unit tests → Regression Tests Abort Merge Master Branch Update CAFé Push Commit Trigger Jenkins Job Trigger Jenkins Job Kim Hammar (DAE) Claims CAFé August 30, 2018 20 / 24
  • 24.
    DEMO Kim Hammar (DAE)Claims CAFé August 30, 2018 21 / 24
  • 25.
    Conclusion Fault-Tolerant ETL What An ETLpipeline that can detect and deal with data quality errors and bugs Why Avoid errors in production. Make full use of data. More productive analytics. How Know your data. Use tests. Integrate with CI jobs. Lessons Learned Setting verdicts is hard. Without CI the tests will not be run. Security must be in the design. Release management is important. Trade-offs when choosing architecture. Kim Hammar (DAE) Claims CAFé August 30, 2018 22 / 24
  • 26.
    Thank You! Questions? Kim Hammar(DAE) Claims CAFé August 30, 2018 23 / 24
  • 27.
    Links Claims CAFé SchemaRepository1 Claims CAFé Schema Confluence Page2 Claims CAFé Data Pipeline and Tests Repository3 Claims CAFé Tests Pipeline Confluence Page4 Claims CAFé Continuous Integration Confluence Page5 Claims CAFé API Documentation Confluence Page6 1 https://github.allstate.com/d3-cafe/ClaimsCAFeSchema 2 http://conflu.allstate.com/display/DOMF2/Schema 3 https://github.allstate.com/d3-cafe/CAFe 4 http://conflu.allstate.com/display/DOMF2/Tests 5 http://conflu.allstate.com/display/DOMF2/Continuous+Integration 6 http://conflu.allstate.com/display/DOMF2/API+Documentation Kim Hammar (DAE) Claims CAFé August 30, 2018 24 / 24