SlideShare a Scribd company logo
1 of 25
Download to read offline
Are you ready for Open
Metadata and the CDLA?
@twitterhandle
2
JEFFREY BOREK
WW Program Director, Open
Technology & IP Mgt.
IBM Cognitive Applications
jborek@us.ibm.com
@jeffborek
MANDY CHESSELL
CBE FREng CEng FBCS,
IBM Distinguished Engineer
ODPi Egeria leader
mandy_chessell@uk.ibm.com
@MandyChessell
Explosion of data over the last decade
”If Data is the new oil, should you keep it for yourself or share it?”
• Open source and big data have grown
exponentially over the last decade – now
seeing tightly coupled use of data and open
source in the same projects, especially in AI
• Data sets are widely available for research
and non-commercial use
• Enterprise friendly data sets are not as
common
“Data is replacing concrete as the
foundation of 21st century
transportation, and knitting this
increasingly complex array of public
and private data sources together
requires new approaches to data
licensing and data governance,
The CDLA provides a critical new
tool to facilitate collaboration and
data sharing between government
and private sector innovators.”
- Kevin Webb, Executive Director,
Open Transport Partnership.
Advancements are driving interest in “Open Data”
Interest in sharing data has grown significantly due to machine learning, AI,
blockchain and expansion of open geolocation solutions
Establishing the ability to analyze and “mine” open data is key to its value
Governments, companies and organizations are interested in sharing data
“just like we share source code”
• They’re looking to open source principles for how they may be applied
to data problems
• Open source development is viewed as an ideal model for
collaborating on datasets
• Open data is a still a new area, without many established licensing
models
Connected civil infrastructure and private systems are starting to intersect
(e.g. infrastructure-to-vehicle systems) with data created and shared
Community Data License Agreement: Summary
The Community Data License Agreement (“CDLA”) family is a set of two model agreements,
introduced and sponsored by the Linux Foundation. IBM was an early originator of concepts
• Announced on October 23, 2017
• Modeled on the structure of leading open source agreements
• Designed for use by independent data communities looking for “open” licenses that
reflect the nuances of data licensing (as opposed to software or copyright licensing)
CDLA promotes free exchange of data
• Permits data to be freely used, modified, and republished
• Explicit permissions to create separate works and analyses of licensed data
• Authorship and source attribution statements must be preserved
• No use restrictions permitted
• Broader license coverage than mere copyright
• Characteristic open-source style warranty disclaimers to minimize the legal exposure
of publishing data
Open Data at IBM: OSPO & Data Asset Exchange (DAX)
Benefit through data sharing with the Community Data License
Agreement (CDLA)
Enable individuals and organizations of all types to share data as
easily as they currently share open source software code
Our first contribution under the CDLA
At the Neural Information Processing Systems (NeurIPS 2018)
conference held December 3-8th in Montreal, IBM Research
presented "Learning beyond simulated physics" during the
Spatiotemporal Workshop
In July, IBM announces Data Asset eXchange (DAX) to help
developers use free and open data and AI
• Online hub for developers and data scientists to find carefully
curated free and open datasets under open data licenses
IBM extending our existing open source process model
Tiered approvals based on additional dimensions specific to data content
CDLA Sharing
• You can redistribute,
publish, or modify licensed
data, as long as it remains
under the CDLA Sharing
license. Copyleft-style
license
CDLA Permissive
• You can redistribute,
publish, and modify licensed
data under any license,
including a commercial
license.
Creative Commons Non-commercial variants
CDLA Sharing
Open Commons Open Database License (ODC-ODbL)
Creative Commons (CC-BY, CC-BY-SA, CC-BY-ND)
Open Data Commons Public Domain Dedication and
License (PDDL)
CDLA Permissive
Open Data Commons Attribution (ODC-BY)
AdditionalapprovalsEasier
A hybrid multi-cloud world
Data Lake
Mobile
Apps
Databases
ApplicationsFiles
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
Creating the protected shell
• Knowledge and action (to scale)
– Metadata captures the knowledge
– Metadata driven technologies provide the action
• Automation is needed to keep costs down and
to ensure action is complete and consistent
• Business agility is critical
– Flexibility to change actions
– Enable the adoption of new technologies and
techniques as appropriate.
9
Using a metadata repository to describe data
10
Metadata
Repository
Today’s reality – organizations buy lots of tools
11
The value of open, standardized metadata
12
ODPi Egeria enables the exchange of metadata
between tools from different vendors
13
Open and
Unified Metadata
Development DevOps Data Science
Open metadata ecosystem
Data Lake
Mobile
Apps
Databases
ApplicationsFiles
Independent
metadata
Repository
Linked
metadata
Repositories
Business Partners
Sharing data
IoT devices and
systems
Applications
New applications
deployed to cloud
Scale-out, scale down
15
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
OMAG
Server
Platform
Egeria Server 1
Egeria Server 2
Egeria Server 3
Kubernetes
OMAG Server
Platform
Egeria Server 1
Egeria Server 2
Egeria Server 3
Multi-tenant
OMAG Server
Platform
Egeria Server 1
Edge
Automating governance example
16
IBM
Information
Governance
Catalog
Apache Atlas
Apache Ranger
Gaian
Define
Policies
Hadoop
Metadata
Manage Data Access
Egeria Cohort
(Open metadata exchange and federated queries)
Access
Data
Egeria
Open
Governance APIs
configure
configure
Using ODPi Egeria …
• Eases the cost of metadata integration
through
– Comprehensive standards and libraries.
– Active vendor recruitment program.
• Provides direct support to many
governance roles, filling the gaps
between function offered through
commercial tools.
• Provides best practices and content
packs to accelerate an organization’s
journey to becoming data driven.
17
The ODPi foundation is part of
the Linux Foundation
• Delivering core
technology
• Recruiting vendors
• Assisting practitioners
18
Vendors
Practitioners
Core
Technology
Conformance
Suite
Best Practices
Project
Egeria
Project
Data
Governance
Policy
Taking the next step with ODPi Egeria
• Kubernetes/Docker
deployment
• Hands on labs
• Coding samples
• Guidance on
governance
PolicyPolicy
PolicyOperations
Development
Metadata
Remediate
Deploy
Develop
Classify
Rollout
M
onitorAudit
Define
Design
Detect
Execute
Governance Teams
Enterprise Architecture
Developers
Asset Owners
Business Users
Data Scientists
DevOps
Stewards
Time for questions?
z
zz
zz
z
z
Questions?
For more information on Open Data initiatives
CDLA agreements and FAQs
https://CDLA.io
ODPi Egeria Home Page
https://egeria.odpi.org/
Project Egeria on GitHub
https://github.com/odpi/egeria
Do we need another agreement for data?
There are current agreements but none has gained traction for a variety of reasons.
• There is a clear consensus that open source licenses useful for software are not appropriate for
data. For example, software license agreements typically do not address text and data mining
(“TDM”) rights
• The Creative Commons licenses have been used – in particular CC0 – but many think that a
license specific to data would be preferable.
• Use restrictions abound and licenses too often enable adding them
• Many companies hesitate to publish or use data without clear license terms for their use
• Copyright law is not entirely uniform across national boundaries, particularly with respect to “fair
use” rights
The CDLA is offered to avoid a period of license proliferation and aggregations of valuable data under
licenses that prevent combinations necessary to optimally analyze the data over time.
• Tenure of data value is much longer than software – and potentially perpetual.
• Many data collections (e.g., temperature readings) grow over time and consistent rights to
analyze them (irrespective of changes in the law) is of increasing value over time
Data is not the same as source code
Traditional open source agreements are primarily based on copyright law
• Software source and object code are works of authorship protected by copyright
• Some agreements also contain express or implied licenses to patents that may apply to licensed code
Data is different from code
• In the US and elsewhere, data itself is often not protectable by copyright
(see Feist Publications, Inc., v. Rural Telephone Service Co.)1
• Only the creative expression of the data is protectable by copyright; Facts are not
• Patents are typically not applicable to data itself
• Much of the value of open data (unlike code) is in the ability to study and analyze it, but traditional
open source licenses do not explicitly provide for these rights
In order to protect data usage rights, a data license needs to invoke a broader spectrum of rights to
make up for the incomplete coverage of copyright law
• Some data provider organizations are trying any means available to lock down access to data,
sometimes with direct or ambiguous terms around usage rights
• The CDLA contains explicit terms to prevent access to licensed data from being restricted
1Available at: http://caselaw.findlaw.com/us-supreme-court/499/340.html
Community Data License Agreement: Key Terms
Two versions of the CDLA:
• Sharing Version: All data re-published must be licensed under the terms of the CDLA
Sharing License
• Permissive Version: Data may be republished under any terms (including commercial
terms) not inconsistent with the terms of the CDLA Permissive License
Both versions of the CDLA license provide for the following:
• The right to make “Computational Use” of data
• “Computational Use” is broadly defined as enabling the right to analyze data and
create analytical works based upon it (i.e., a TDM right)
• Analytical works do not need to be relicensed under the terms of the CDLA licenses
• Analytical works cannot contain more that a de minimis amount of the data itself
• No prohibition on commercial use of data
• Minimal reps and no warranties
• Broad Limitation of Liability

More Related Content

What's hot

Doc Book Vs Dita Teresa
Doc Book Vs Dita TeresaDoc Book Vs Dita Teresa
Doc Book Vs Dita Teresa
day
 
E Discovery Cloud
E Discovery CloudE Discovery Cloud
E Discovery Cloud
gjohansen
 
Amcto presentation final
Amcto presentation finalAmcto presentation final
Amcto presentation final
Dan Michaluk
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous Events
Edward Curry
 

What's hot (20)

Dr H K Kaul
Dr H K KaulDr H K Kaul
Dr H K Kaul
 
Doc Book Vs Dita Teresa
Doc Book Vs Dita TeresaDoc Book Vs Dita Teresa
Doc Book Vs Dita Teresa
 
i4Trust Info-Sessions - Edition 1
i4Trust Info-Sessions - Edition 1i4Trust Info-Sessions - Edition 1
i4Trust Info-Sessions - Edition 1
 
Building Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked DataBuilding Optimisation using Scenario Modeling and Linked Data
Building Optimisation using Scenario Modeling and Linked Data
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
 
Introduction to Data Licences
Introduction to Data LicencesIntroduction to Data Licences
Introduction to Data Licences
 
Daisy: CMS or Wiki?
Daisy: CMS or Wiki?Daisy: CMS or Wiki?
Daisy: CMS or Wiki?
 
An Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersAn Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing Consumers
 
Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...Transforming the European Data Economy: A Strategic Research and Innovation A...
Transforming the European Data Economy: A Strategic Research and Innovation A...
 
E Discovery Cloud
E Discovery CloudE Discovery Cloud
E Discovery Cloud
 
Cloud computing kp final
Cloud computing kp finalCloud computing kp final
Cloud computing kp final
 
Querying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data WebQuerying Heterogeneous Datasets on the Linked Data Web
Querying Heterogeneous Datasets on the Linked Data Web
 
Linked Building (Energy) Data
Linked Building (Energy) DataLinked Building (Energy) Data
Linked Building (Energy) Data
 
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
Towards Lightweight Cyber-Physical Energy Systems using Linked Data, the Web ...
 
Eclipse IoT Summit 2016: In The Age of IoT Think Data-Centric
Eclipse IoT Summit 2016: In The Age of IoT Think Data-CentricEclipse IoT Summit 2016: In The Age of IoT Think Data-Centric
Eclipse IoT Summit 2016: In The Age of IoT Think Data-Centric
 
Bangalore Executive Seminar 2015: Case Study - Text Analysis on MongoDB for a...
Bangalore Executive Seminar 2015: Case Study - Text Analysis on MongoDB for a...Bangalore Executive Seminar 2015: Case Study - Text Analysis on MongoDB for a...
Bangalore Executive Seminar 2015: Case Study - Text Analysis on MongoDB for a...
 
Linked Water Data For Water Information Management
Linked Water Data For Water Information ManagementLinked Water Data For Water Information Management
Linked Water Data For Water Information Management
 
Virtual & Remote Practice: Reach from the Beach & Manage from the Mountains
Virtual & Remote Practice: Reach from the Beach & Manage from the MountainsVirtual & Remote Practice: Reach from the Beach & Manage from the Mountains
Virtual & Remote Practice: Reach from the Beach & Manage from the Mountains
 
Amcto presentation final
Amcto presentation finalAmcto presentation final
Amcto presentation final
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous Events
 

Similar to Open metadataos summit_28oct2019vfinal

The Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance LandscapeThe Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance Landscape
Trillium Software
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo
 
Data Tactics Open Source Brief
Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source Brief
DataTactics
 
ICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
ICIC 2013 Conference Proceedings Kim Zwollo Rights DirectICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
ICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
Dr. Haxel Consult
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
Dr. Haxel Consult
 

Similar to Open metadataos summit_28oct2019vfinal (20)

Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
Finding Your Ideal Data Architecture: Data Fabric, Data Mesh or Both?
 
Data Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for StandardsData Residency: Challenges and the Need for Standards
Data Residency: Challenges and the Need for Standards
 
The Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance LandscapeThe Changing Data Quality & Data Governance Landscape
The Changing Data Quality & Data Governance Landscape
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Happily Married or Warring Factions? Open Source and Standards
Happily Married or Warring Factions? Open Source and StandardsHappily Married or Warring Factions? Open Source and Standards
Happily Married or Warring Factions? Open Source and Standards
 
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
Your Data is Waiting. What are the Top 5 Trends for Data in 2022? (ASEAN)
 
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
Denodo Partner Connect: A Review of the Top 5 Differentiated Use Cases for th...
 
Data Tactics Open Source Brief
Data Tactics Open Source BriefData Tactics Open Source Brief
Data Tactics Open Source Brief
 
Belgium & Luxembourg dedicated online Data Virtualization discovery workshop
Belgium & Luxembourg dedicated online Data Virtualization discovery workshopBelgium & Luxembourg dedicated online Data Virtualization discovery workshop
Belgium & Luxembourg dedicated online Data Virtualization discovery workshop
 
ICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
ICIC 2013 Conference Proceedings Kim Zwollo Rights DirectICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
ICIC 2013 Conference Proceedings Kim Zwollo Rights Direct
 
Open source 101
Open source 101Open source 101
Open source 101
 
Establishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry developmentEstablishing data sharing standards to promote global industry development
Establishing data sharing standards to promote global industry development
 
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualizationMyth Busters VII: I’m building a data mesh, so I don’t need data virtualization
Myth Busters VII: I’m building a data mesh, so I don’t need data virtualization
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Open Source in Storage -- Myth or Reality?
Open Source in Storage -- Myth or Reality?Open Source in Storage -- Myth or Reality?
Open Source in Storage -- Myth or Reality?
 
Translating Open Source Value to the Cloud
Translating Open Source Value to the CloudTranslating Open Source Value to the Cloud
Translating Open Source Value to the Cloud
 
Open Source License Compliance in the Cloud (CELESQ) (October 2012)
Open Source License Compliance in the Cloud (CELESQ) (October 2012)Open Source License Compliance in the Cloud (CELESQ) (October 2012)
Open Source License Compliance in the Cloud (CELESQ) (October 2012)
 
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
Data Fabric - Why Should Organizations Implement a Logical and Not a Physical...
 
Back to school: Big Data IDEA 101
Back to school: Big Data IDEA 101Back to school: Big Data IDEA 101
Back to school: Big Data IDEA 101
 
II-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in NiceII-SDV 2015, 20 - 21 April, in Nice
II-SDV 2015, 20 - 21 April, in Nice
 

Recently uploaded

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Navigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern EnterpriseNavigating Identity and Access Management in the Modern Enterprise
Navigating Identity and Access Management in the Modern Enterprise
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 

Open metadataos summit_28oct2019vfinal

  • 1. Are you ready for Open Metadata and the CDLA? @twitterhandle
  • 2. 2 JEFFREY BOREK WW Program Director, Open Technology & IP Mgt. IBM Cognitive Applications jborek@us.ibm.com @jeffborek MANDY CHESSELL CBE FREng CEng FBCS, IBM Distinguished Engineer ODPi Egeria leader mandy_chessell@uk.ibm.com @MandyChessell
  • 3. Explosion of data over the last decade ”If Data is the new oil, should you keep it for yourself or share it?” • Open source and big data have grown exponentially over the last decade – now seeing tightly coupled use of data and open source in the same projects, especially in AI • Data sets are widely available for research and non-commercial use • Enterprise friendly data sets are not as common “Data is replacing concrete as the foundation of 21st century transportation, and knitting this increasingly complex array of public and private data sources together requires new approaches to data licensing and data governance, The CDLA provides a critical new tool to facilitate collaboration and data sharing between government and private sector innovators.” - Kevin Webb, Executive Director, Open Transport Partnership.
  • 4. Advancements are driving interest in “Open Data” Interest in sharing data has grown significantly due to machine learning, AI, blockchain and expansion of open geolocation solutions Establishing the ability to analyze and “mine” open data is key to its value Governments, companies and organizations are interested in sharing data “just like we share source code” • They’re looking to open source principles for how they may be applied to data problems • Open source development is viewed as an ideal model for collaborating on datasets • Open data is a still a new area, without many established licensing models Connected civil infrastructure and private systems are starting to intersect (e.g. infrastructure-to-vehicle systems) with data created and shared
  • 5. Community Data License Agreement: Summary The Community Data License Agreement (“CDLA”) family is a set of two model agreements, introduced and sponsored by the Linux Foundation. IBM was an early originator of concepts • Announced on October 23, 2017 • Modeled on the structure of leading open source agreements • Designed for use by independent data communities looking for “open” licenses that reflect the nuances of data licensing (as opposed to software or copyright licensing) CDLA promotes free exchange of data • Permits data to be freely used, modified, and republished • Explicit permissions to create separate works and analyses of licensed data • Authorship and source attribution statements must be preserved • No use restrictions permitted • Broader license coverage than mere copyright • Characteristic open-source style warranty disclaimers to minimize the legal exposure of publishing data
  • 6. Open Data at IBM: OSPO & Data Asset Exchange (DAX) Benefit through data sharing with the Community Data License Agreement (CDLA) Enable individuals and organizations of all types to share data as easily as they currently share open source software code Our first contribution under the CDLA At the Neural Information Processing Systems (NeurIPS 2018) conference held December 3-8th in Montreal, IBM Research presented "Learning beyond simulated physics" during the Spatiotemporal Workshop In July, IBM announces Data Asset eXchange (DAX) to help developers use free and open data and AI • Online hub for developers and data scientists to find carefully curated free and open datasets under open data licenses
  • 7. IBM extending our existing open source process model Tiered approvals based on additional dimensions specific to data content CDLA Sharing • You can redistribute, publish, or modify licensed data, as long as it remains under the CDLA Sharing license. Copyleft-style license CDLA Permissive • You can redistribute, publish, and modify licensed data under any license, including a commercial license. Creative Commons Non-commercial variants CDLA Sharing Open Commons Open Database License (ODC-ODbL) Creative Commons (CC-BY, CC-BY-SA, CC-BY-ND) Open Data Commons Public Domain Dedication and License (PDDL) CDLA Permissive Open Data Commons Attribution (ODC-BY) AdditionalapprovalsEasier
  • 8. A hybrid multi-cloud world Data Lake Mobile Apps Databases ApplicationsFiles Independent metadata Repository Linked metadata Repositories Business Partners Sharing data IoT devices and systems Applications New applications deployed to cloud
  • 9. Creating the protected shell • Knowledge and action (to scale) – Metadata captures the knowledge – Metadata driven technologies provide the action • Automation is needed to keep costs down and to ensure action is complete and consistent • Business agility is critical – Flexibility to change actions – Enable the adoption of new technologies and techniques as appropriate. 9
  • 10. Using a metadata repository to describe data 10 Metadata Repository
  • 11. Today’s reality – organizations buy lots of tools 11
  • 12. The value of open, standardized metadata 12
  • 13. ODPi Egeria enables the exchange of metadata between tools from different vendors 13 Open and Unified Metadata Development DevOps Data Science
  • 14. Open metadata ecosystem Data Lake Mobile Apps Databases ApplicationsFiles Independent metadata Repository Linked metadata Repositories Business Partners Sharing data IoT devices and systems Applications New applications deployed to cloud
  • 15. Scale-out, scale down 15 OMAG Server Platform OMAG Server Platform OMAG Server Platform OMAG Server Platform Egeria Server 1 Egeria Server 2 Egeria Server 3 Kubernetes OMAG Server Platform Egeria Server 1 Egeria Server 2 Egeria Server 3 Multi-tenant OMAG Server Platform Egeria Server 1 Edge
  • 16. Automating governance example 16 IBM Information Governance Catalog Apache Atlas Apache Ranger Gaian Define Policies Hadoop Metadata Manage Data Access Egeria Cohort (Open metadata exchange and federated queries) Access Data Egeria Open Governance APIs configure configure
  • 17. Using ODPi Egeria … • Eases the cost of metadata integration through – Comprehensive standards and libraries. – Active vendor recruitment program. • Provides direct support to many governance roles, filling the gaps between function offered through commercial tools. • Provides best practices and content packs to accelerate an organization’s journey to becoming data driven. 17
  • 18. The ODPi foundation is part of the Linux Foundation • Delivering core technology • Recruiting vendors • Assisting practitioners 18 Vendors Practitioners Core Technology Conformance Suite Best Practices Project Egeria Project Data Governance
  • 19. Policy Taking the next step with ODPi Egeria • Kubernetes/Docker deployment • Hands on labs • Coding samples • Guidance on governance PolicyPolicy PolicyOperations Development Metadata Remediate Deploy Develop Classify Rollout M onitorAudit Define Design Detect Execute Governance Teams Enterprise Architecture Developers Asset Owners Business Users Data Scientists DevOps Stewards
  • 21. For more information on Open Data initiatives CDLA agreements and FAQs https://CDLA.io ODPi Egeria Home Page https://egeria.odpi.org/ Project Egeria on GitHub https://github.com/odpi/egeria
  • 22.
  • 23. Do we need another agreement for data? There are current agreements but none has gained traction for a variety of reasons. • There is a clear consensus that open source licenses useful for software are not appropriate for data. For example, software license agreements typically do not address text and data mining (“TDM”) rights • The Creative Commons licenses have been used – in particular CC0 – but many think that a license specific to data would be preferable. • Use restrictions abound and licenses too often enable adding them • Many companies hesitate to publish or use data without clear license terms for their use • Copyright law is not entirely uniform across national boundaries, particularly with respect to “fair use” rights The CDLA is offered to avoid a period of license proliferation and aggregations of valuable data under licenses that prevent combinations necessary to optimally analyze the data over time. • Tenure of data value is much longer than software – and potentially perpetual. • Many data collections (e.g., temperature readings) grow over time and consistent rights to analyze them (irrespective of changes in the law) is of increasing value over time
  • 24. Data is not the same as source code Traditional open source agreements are primarily based on copyright law • Software source and object code are works of authorship protected by copyright • Some agreements also contain express or implied licenses to patents that may apply to licensed code Data is different from code • In the US and elsewhere, data itself is often not protectable by copyright (see Feist Publications, Inc., v. Rural Telephone Service Co.)1 • Only the creative expression of the data is protectable by copyright; Facts are not • Patents are typically not applicable to data itself • Much of the value of open data (unlike code) is in the ability to study and analyze it, but traditional open source licenses do not explicitly provide for these rights In order to protect data usage rights, a data license needs to invoke a broader spectrum of rights to make up for the incomplete coverage of copyright law • Some data provider organizations are trying any means available to lock down access to data, sometimes with direct or ambiguous terms around usage rights • The CDLA contains explicit terms to prevent access to licensed data from being restricted 1Available at: http://caselaw.findlaw.com/us-supreme-court/499/340.html
  • 25. Community Data License Agreement: Key Terms Two versions of the CDLA: • Sharing Version: All data re-published must be licensed under the terms of the CDLA Sharing License • Permissive Version: Data may be republished under any terms (including commercial terms) not inconsistent with the terms of the CDLA Permissive License Both versions of the CDLA license provide for the following: • The right to make “Computational Use” of data • “Computational Use” is broadly defined as enabling the right to analyze data and create analytical works based upon it (i.e., a TDM right) • Analytical works do not need to be relicensed under the terms of the CDLA licenses • Analytical works cannot contain more that a de minimis amount of the data itself • No prohibition on commercial use of data • Minimal reps and no warranties • Broad Limitation of Liability