0
Rethinking Data Management
Data Sharing in Business Ecosystems
Prof. Dr. Christine Legner
Competence Center Corporate Data Quality (CC CDQ)
cc-cdq.ch / meta.cdq.ch
1
The sharing economy is in full swing
... only data assets are still managed in silos!
Why not start with data sharing?
Asset sharing reduces costs, improves utilization and sustainability
YardClub
2
1 Background and motivation
2 Example: Business partner data sharing in trusted networks
3 Lessons learned and outlook
Agenda
3
Competence Center Corporate Data Quality (CC CDQ)
The CC CDQ is a research consortium and expert community in data management
• Foundation: 2006
• Members: +35
• CC CDQ workshops: +60
• PhD graduates: 14
• Contacts within CDQ community: +1500
CC CDQ co-creates methods, tools and solutions for managing data assets
NB: Overview comprises both current and former partner companies
4
Companies face major challenges in managing their data assets
• Data silos: Data is stored and maintained in silos (Company A, Company B), with significant overlaps and gaps.
• Compliance: Regulations impose transparency and traceability, but most companies deal with them as one-time efforts.
• Data quality: Despite awareness of data quality issues, data maintenance is mostly reactive and involves high manual effort.
Data is a strategic asset in the data economy.
But: Existing data management approaches do not scale!
5
1 Background and motivation
2 Example: Business partner data sharing in a trusted network
3 Lessons learned and outlook
Agenda
6
The CDQ Data Sharing Community is a trusted network of
companies that manage business partner data collaboratively
Example
https://meta.cdq.ch
7
Two levels of data sharing in the CDQ Data Sharing Community
1. Shared data knowledge → Data validation, curation and enrichment
2. Shared data assets → Shared data maintenance (and beyond)
8
Two levels of data sharing in the CDQ Data Sharing Community
Level 1: Shared data knowledge is the foundation
1. Shared data knowledge (ontology & linked data)
Linking reference data:
• Open / public data
Defining shared semantics:
• Semantic data model
• Business rules
→ Data validation, curation and enrichment
Data quality ⬈: accuracy, timeliness, consistency, completeness
9
[Figure: ontology excerpt.
Concepts: Business Partner (definition: "An organization which has…"), Name, Address (definition: "A physical location that…"), Country, Legal form.
Instances: Germany (is a Country), Aktiengesellschaft (is a Legal form).
Relations: has name, has address, has country, has legal form, is a, has definition, used in, has constraint.
Rules attached via "has constraint": Rule legal form allowed …, Rule legal form valid …, Rule country valid …]
Data model and semantics are collaboratively defined
Data model and business rules represented in a single ontology
Example
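To make the idea of a single ontology for data model and business rules more concrete, here is a minimal sketch that stores the slide's labels as subject-predicate-object triples. The identifiers, the `TRIPLES` list and the helper functions are hypothetical, and the exact edges are an assumption, since the connections of the original diagram are not recoverable from the text.

```python
# Hypothetical, simplified triple list; names mirror the labels on the slide.
# Which concept each rule is attached to is an assumption for illustration.
TRIPLES = [
    ("BusinessPartner", "has_name", "Name"),
    ("BusinessPartner", "has_address", "Address"),
    ("BusinessPartner", "has_legal_form", "LegalForm"),
    ("Address", "has_country", "Country"),
    ("Germany", "is_a", "Country"),
    ("Aktiengesellschaft", "is_a", "LegalForm"),
    ("LegalForm", "has_constraint", "Rule legal form valid"),
    ("LegalForm", "has_constraint", "Rule legal form allowed"),
    ("Country", "has_constraint", "Rule country valid"),
]


def objects(subject, predicate):
    """Return all objects linked to `subject` via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def instances_of(concept):
    """Return all entries declared as an instance of `concept`."""
    return [s for s, p, o in TRIPLES if p == "is_a" and o == concept]


def rules_for(concept):
    """Return the business rules attached to `concept`."""
    return objects(concept, "has_constraint")


if __name__ == "__main__":
    print(instances_of("Country"))
    print(rules_for("LegalForm"))
```

Because model elements and rules live in the same structure, one query can answer both "what is a legal form?" and "which rules apply to it?".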
10
Business rules and reference data are collaboratively maintained
Example
A semantic wiki acts as shared repository
• Comprehensive rule base, with >1200 data quality rules (documented in business terms and executable)
• External reference data, including 258 countries, 2224 legal forms, >70 externally managed data sources
11
Validation services check quality of business partner records
according to business rules
• Missing legal form (some business partners, e.g. hospitals, do not need a legal form; category is assigned automatically)
• Known legal form, but not valid for Germany
• c/o information in name
• For Germany, either a VAT number or an "old" tax number is OK
Example
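The checks listed on this slide could be expressed as executable rules along the following lines. This is a sketch under stated assumptions, not the community's validation service: the record fields, the `validate` function and the tiny legal-form table are hypothetical and stand in for the shared reference data.

```python
import re

# Tiny illustrative sample, NOT the community's reference data.
LEGAL_FORMS_BY_COUNTRY = {
    "DE": {"AG", "GmbH"},
    "CH": {"AG", "GmbH"},
    "GB": {"Ltd", "PLC"},
}
KNOWN_LEGAL_FORMS = set().union(*LEGAL_FORMS_BY_COUNTRY.values())

# Assumed categories for which a legal form is not required.
CATEGORIES_WITHOUT_LEGAL_FORM = {"hospital"}


def validate(record):
    """Return a list of data quality findings for one business partner record."""
    findings = []
    country = record.get("country")
    legal_form = record.get("legal_form")

    if not legal_form:
        # Some categories (e.g. hospitals) do not need a legal form.
        if record.get("category") not in CATEGORIES_WITHOUT_LEGAL_FORM:
            findings.append("missing legal form")
    elif legal_form not in KNOWN_LEGAL_FORMS:
        findings.append("unknown legal form")
    elif legal_form not in LEGAL_FORMS_BY_COUNTRY.get(country, set()):
        findings.append(f"legal form not valid for {country}")

    if re.search(r"\bc/o\b", record.get("name", ""), flags=re.IGNORECASE):
        findings.append("c/o information in name")

    # For Germany, either a VAT number or an "old" tax number is accepted.
    if country == "DE" and not (record.get("vat_number") or record.get("tax_number")):
        findings.append("missing VAT or tax number")

    return findings


if __name__ == "__main__":
    print(validate({"name": "Muster Ltd c/o Max", "country": "DE", "legal_form": "Ltd"}))
```

Each rule returns a finding in business terms, which matches the slide's point that rules are both documented and executable.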
12
Data sharing platform – Level 1: shared data knowledge
Workflows and platform
How does it work?
1. You provide your data as-is,
mapping is managed in CDQ Cloud
2. You can check your own data for errors or
duplicates and compare to reference data
3. You receive a weekly DQ summary report
4. You can select which records to update,
and/or subscribe to automatic updates
5. Updates in your local systems are performed through API,
connector or your data management team
[Figure: platform overview.
Components: Company A (Business Partner Data, Your Data Mgt. Team), CDQ Data Sharing Cloud (Data Mirror for Company A, Reference Data Pool, CDQ Web Apps, Exchange API), Open Data / Public Sources.
Flow labels: regular updates, regular data quality checks, check for updates, reports, alerts, emails, push info, weekly report]
Example
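Steps 2 and 4 of the workflow above (checking your own data for duplicates, comparing against reference data and selecting updates) can be sketched as follows. The record layout and the functions `find_duplicates` and `propose_updates` are hypothetical; the actual matching logic in the platform is not described on the slides and is likely more sophisticated than exact matching on a normalized name.

```python
def normalize(name):
    """Lower-case a name and strip punctuation and extra whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def find_duplicates(records):
    """Return pairs of record ids that share a normalized name and country."""
    seen = {}
    duplicates = []
    for rec in records:
        key = (normalize(rec["name"]), rec["country"])
        if key in seen:
            duplicates.append((seen[key]["id"], rec["id"]))
        else:
            seen[key] = rec
    return duplicates


def propose_updates(record, reference):
    """Return {field: (own value, reference value)} for every differing field."""
    return {
        field: (record.get(field), value)
        for field, value in reference.items()
        if field != "id" and record.get(field) != value
    }


if __name__ == "__main__":
    own = [
        {"id": 1, "name": "Muster GmbH", "country": "DE", "city": "Munchen"},
        {"id": 2, "name": "muster gmbh.", "country": "DE", "city": "München"},
    ]
    print(find_duplicates(own))
    print(propose_updates(own[0], {"name": "Muster GmbH", "city": "München"}))
```

The proposed updates are returned rather than applied, so a data management team can select which records to update or subscribe to automatic updates, as described in step 4.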
13
Data sharing in the CDQ Data Sharing Community
From shared data knowledge to peer-based sharing of validated data
1. Shared data knowledge (ontology & linked data)
Linking reference data:
• Open / public data
Defining shared semantics:
• Semantic data model
• Business rules
→ Data validation, curation and enrichment
Data quality ⬈: accuracy, timeliness, consistency, completeness
2. Shared data assets (automation & machine learning)
Fraud detection, risk mgmt.:
• Whitelists, trust scores
Peer-based sharing of validated data:
• Cross-company workflows
• Integration into IT systems
→ Shared data maintenance (and beyond)
Risks ⬊, trust ⬈
Maintenance efforts ⬊
14
Data sharing platform – Level 2: shared data assets
Workflows and platform
How does it work?
1. You provide your data as-is,
mapping is managed in CDQ Cloud
2. You can check your own data for errors or
duplicates and compare to reference data
3. You receive a weekly DQ summary report
4. You can select which records to update,
and/or subscribe to automatic updates
5. Updates in your local systems are performed through API,
connector or your data management team
6. Sharing with peers can be decoupled from syncing with public sources;
managed in reference data pool
[Figure: platform overview, extended by the Sharing Community.
Components: Company A (Business Partner Data, Your Data Mgt. Team), CDQ Data Sharing Cloud (Data Mirror for Company A, Reference Data Pool, Community Data Pool, CDQ Web Apps, Exchange API), Open Data / Public Sources, Sharing Community.
Flow labels: regular updates, regular data quality checks, check for updates, changes, reports, alerts, emails, push info, weekly report]
15
Data sharing for detecting fraud and increasing trust
Whitelist and trust score
Companies document known fraud cases in a CDL database and use this database to check if a given bank
account was used for a known attack or if a given business partner was affected by a known attack
[Figure panels: Whitelist approach; Trust score]
Example
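The lookup described on this slide can be illustrated with a small sketch. Everything here is an assumption made for illustration: the account numbers are placeholders, the data structures and the `check_account` function are hypothetical, and the trust score formula (share of community members that confirmed an account) is invented, because the slides do not define how the score is computed.

```python
def normalize_account(account):
    """Remove spaces and upper-case an account number for comparison."""
    return account.replace(" ", "").upper()


# Placeholder data; real entries would come from the shared database.
FRAUD_CASES = {normalize_account("XX00 1111 2222 3333")}
WHITELIST = {
    # (business partner id, account) -> companies that confirmed the account
    ("partner-42", normalize_account("XX00 9999 8888 7777")): {"Company A", "Company B"},
}


def check_account(partner_id, account, community_size=4):
    """Check an account against known fraud cases and the shared whitelist."""
    normalized = normalize_account(account)
    if normalized in FRAUD_CASES:
        return {"status": "known fraud", "trust_score": 0.0}
    confirmed_by = WHITELIST.get((partner_id, normalized), set())
    # Assumed score: fraction of community members that confirmed the account.
    score = len(confirmed_by) / community_size
    status = "whitelisted" if confirmed_by else "unknown"
    return {"status": status, "trust_score": score}


if __name__ == "__main__":
    print(check_account("partner-42", "xx00 9999 8888 7777"))
    print(check_account("partner-7", "XX00 1111 2222 3333"))
```

The known-fraud check runs before the whitelist check, so an account reported in a fraud case is never scored as trusted.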
16
Two levels of data sharing in the CDQ Data Sharing Community
From shared data knowledge to peer-based sharing of validated data
1. Shared data knowledge (ontology & linked data)
Linking reference data:
• Open / public data
Defining shared semantics:
• Semantic data model
• Business rules
→ Data validation, curation and enrichment
Data quality ⬈: accuracy, timeliness, consistency, completeness
2. Shared data assets (automation & machine learning)
Fraud detection, risk mgmt.:
• Whitelists, trust scores
Peer-based sharing of validated data:
• Cross-company workflows
• Integration into IT systems
→ Shared data maintenance efforts (and beyond)
Risks ⬊, trust ⬈
Maintenance efforts ⬊
17
1 Background and motivation
2 Example: Business partner data sharing in trusted networks
3 Lessons learned and outlook
Agenda
18
Lessons learned from sharing data
• Does data sharing really work?
Yes, it works! But it requires trusted networks of (mature) companies and collaborative efforts for
data, process and platform design.
• What are the benefits of data sharing?
Shared data knowledge → higher data quality.
Shared data assets → lower data maintenance efforts, reduced risks, trust in data.
• Are there further opportunities for data sharing?
Yes, there are many, and we are working on further data sharing ideas:
– Material (spare parts)
– Medical licenses in pharmaceutical industry
– Sustainability reporting (certificates, labels, claims)
– …
19
Food for thought and outlook
My questions to you
1. Do you know how much time and effort
you spend on managing your most
critical data assets?
2. How do you get rid of your data silos
and leverage synergies within and
outside of your company?
3. Where can you make use of external
(open) data and data sharing with
peers?
Data sharing may become reality for you
earlier than you expect
https://www.government.nl/documents/reports/2019/02/01/dutch-vision-on-data-sharing-between-businesses
https://skywise.airbus.com/
https://www.sophiagenetics.com
20
Contact
Prof. Dr. Christine Legner
Academic Director, Competence Center Corporate Data Quality (CC CDQ)
christine.legner@unil.ch
CDQ Data Sharing Community
Wiki: https://meta.cdq.ch
Competence Center Corporate Data Quality (CC CDQ)
https://www.cc-cdq.ch/
Executive Education
CDQ Academy for data managers
with University of St. Gallen
https://www.cdq.ch/training
CAS Data Science & Management
with EPFL / Swiss Data Science Center
and University of Lausanne
https://execed.unil.ch/certificat-data
