UKSG Conference, April 2013. Phil Nicolson
Data Governance What is Data Governance What is Data Quality The challenges Data governance programme A publisher app...
Data governance“I think that the key issue here, is that theinformation is probably incorrect, inaccurate and in aform tha...
Data Governance – a definition Data governance is defined as the processes, policies, standards, organisation, and techno...
Data Quality - definitions Data are of high quality "if they are fit for their intended uses  in operations, decision mak...
Data Quality Data quality attributes:   Accurate   Reliable   Complete   Appropriate   Timely   Credible   Up-to-d...
The challenge: Data Sources Multiple data sources – ‘system’ data silos Multiple locations – ‘geographic’ data silos Da...
The challenge: Data SourcesTypical publisher systems:   Data can be entered by:    Financial system         Organisation...
The challenge: Institutions UCL:         University College London (UK)         Université Catholique de Louvain (Belgi...
The challenge: IndividualsHow can we uniquely identify individuals? Of the 700,000individuals known to the RSC in 2012 the...
Consequences of poor data
Biggest obstacle(s) to data qualityimprovement in your organization?Lack of accountability and responsibility for data qua...
Data Governance – why it is vital            “processes, policies, standards… ensure quality and consistency” Increase co...
Data governance – a new culture
Data governance programme
Plan & prioritise Sponsorship: director level sponsor? Program management: business or IT driven? Organisational struct...
Plan & prioritise Resources: dedicated staff? Funding: which area of the business will fund the program? Business drive...
Audit & Analyse Audit existing data quality Review all relevant systems How poor is it?   Incomplete data   Invalid  ...
Clean existing data Prioritise Quick wins Highlight progress What can be automated? Introduce unique identifiers
Identifiers available People                           Organisations   International Standard Name      International ...
ISNIISNI is designed      ISNI Number          ISNI Numberto be a “bridgeidentifier”                       Party ID 1     ...
Author IDs ORCID is designed to persistently identify and disambiguate  scholarly researchers and attach them to research...
Use cases Disambiguation of researchers  and connection to all their  research Links to  contributors, editors, compiler...
Institutional IDs Ringgold is an ISNI Registration Agency Unique institutional ID number maps data across systems ISNI ...
Minimising the impact of data silos Standard identifiers (both individual and institution) can be  used to breakdown silo...
Improve data capture Data quality policy Web forms Closer collaboration with 3rd parties to encourage use of  industry ...
Data capture - data quality policy Design to ensure accuracy, quality and consistency Individual responsibilities:    A...
Improve data capture – web forms Required fields Validation Address validation – postcode lookup Institution validatio...
On-going monitoring Dashboards Regular audits Metrics – Institutional  Linking Rate Staff awareness Reporting of errors
A publisher example Develop a Data Governance Programme   Data ‘champion’   Engagement – at all levels   Ownership – a...
A publisher example Ringgold and DataSalon client   All institutional records contain Ringgold Identifiers   System lin...
Author database1.       Create a data governance dashboard to         monitor problem areas:     •      Book authors with ...
Author database3.       Ensure new records are created correctly     •      Raise staff understanding of the importance of...
Author database – results                    100.00%   10% will never link:                              • Missing data (o...
ICEDIS The international standards organization EDItEUR is working to    encourage improvements in the ways that "party" ...
Summary Your data is a very valuable asset when managed correctly Establishing a data governance programme will enable y...
Phil NicolsonData ManagerRinggold Inc.phil.nicolson@ringgold.com
Upcoming SlideShare
Loading in...5
×

Rubbish in Rubbish out: applying good data governance techniques to gain maximum benefit from publisher data

By Phil Nicolson.
Presented at UKSG, April 2013.

Slide notes:
  • Smith: 1,418; Jones: 982; Li: 9,500+ (RSC, 700,000 individuals)
  • Data amnesty
  • Quick wins: something as simple as standardising country names
  • DUNS: MDR:
  • RSC: ScholarOne
  • C Able example: 3rd party fulfilment house

    1. UKSG Conference, April 2013. Phil Nicolson
    2. Data Governance
       • What is Data Governance?
       • What is Data Quality?
       • The challenges
       • Data governance programme
       • A publisher approach
       • The outcome: Book author example
       • ICEDIS
       • Summary
    3. Data governance
       “I think that the key issue here, is that the information is probably incorrect, inaccurate and in a form that almost certainly shouldn't have been used.”
       Dr John Thomson, cardiologist at Leeds General Infirmary, Sky News, 30/3/2013
    4. Data Governance – a definition
       Data governance is defined as the processes, policies, standards, organisation, and technologies required to manage and ensure the availability, accessibility, quality, consistency, auditability, and security of data.
    5. Data Quality – definitions
       • Data are of high quality "if they are fit for their intended uses in operations, decision making and planning".
       • Data are deemed of high quality if they correctly represent the real-world construct to which they refer.
    6. 6. Data Quality Data quality attributes:  Accurate  Reliable  Complete  Appropriate  Timely  Credible  Up-to-date
    7. 7. The challenge: Data Sources Multiple data sources – ‘system’ data silos Multiple locations – ‘geographic’ data silos Data entered through multiple channels Data entered by different people
    8. The challenge: Data Sources
       Typical publisher systems:
       • Financial system
       • CRM/Sales database
       • Authentication system
       • Fulfilment
       • Usage statistics
       • Submissions system
       • Author database
       • …..
       Data can be entered by:
       • Organisation staff
       • Authors
       • Society members
       • Agents in the supply chain
       • 3rd party organisations
       • …..
    9. 9. The challenge: Institutions UCL:  University College London (UK)  Université Catholique de Louvain (Belgium)  Universidad Cristiana Latinoamericana (Ecuador)  University College Lillebælt (Denmark)  Centro Universitario Celso Lisboa (Brazil)  Union County Library (USA) NPL:  National Physical Laboratory (UK)  National Physical Laboratory (India) York Uni.  University of York (UK)  York University (Canada) Northeastern University:  Northeastern University (Boston, USA)  Northeastern University (Shenyang, China)
    10. The challenge: Individuals
        How can we uniquely identify individuals? Of the 700,000 individuals known to the RSC in 2012 there were:
        • Smith: ~1,500
        • Jones: ~1,000
        • Li: >10,000
    11. Consequences of poor data
    12. Biggest obstacle(s) to data quality improvement in your organization?
        • Lack of accountability and responsibility for data quality: 55.4%
        • Too many information silos: 51.8%
        • Lack of awareness or communication of the magnitude of data quality problems: 51.4%
        • Lack of common understanding of what data quality means: 50.2%
        • Lack of awareness or communication of the opportunities associated with high quality data: 45.0%
        • Lack of senior leadership in tackling data quality issues: 44.2%
        • Lack of data quality policies, plans, and procedures: 42.2%
        • Perception that data quality is an IT issue only rather than an organisation-wide issue: 41.8%
        Source: The State of Information and Data Quality 2012 Industry Survey & Report (IAIDQ), Understanding how Organizations Manage the Quality of their Information and Data Assets. Pierce, Yonke, Malik, Nagaraj.
    13. Data Governance – why it is vital
        “processes, policies, standards… ensure quality and consistency”
        • Increase consistency and confidence in our decision making
        • Maximise the income generation potential of our data
        • Provide excellent customer service
        • Designate accountability for information quality
        • Minimise or eliminate re-work
        • Optimise staff effectiveness
        • Decrease the risk of regulatory fines
        • Improve data security
        Data is one of the most valuable assets within an organisation.
    14. Data governance – a new culture
    15. Data governance programme
    16. 16. Plan & prioritise Sponsorship: director level sponsor? Program management: business or IT driven? Organisational structure: local, national, international? Scope: focus on the most important data? Ownership: who are the business owners of critical data? New system implementation: protect investment
    17. 17. Plan & prioritise Resources: dedicated staff? Funding: which area of the business will fund the program? Business drivers: what are the major business drivers? Barriers: what are the main barriers (cultural, funding, resources, priorities etc.) and can they be mitigated
    18. Audit & Analyse
        • Audit existing data quality
        • Review all relevant systems
        • How poor is it?
          – Incomplete data
          – Invalid
          – Out of date
          – ….
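A first-pass audit can be automated by measuring, per field, how often values are missing and how many records look out of date. This is a minimal Python sketch under stated assumptions: records are plain dicts, staleness is judged by a hypothetical `last_updated` date field, and the three-year threshold is an arbitrary default. "Invalid" values need field-specific rules and are not covered here.

```python
from datetime import date


def _is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def audit(records, fields, today, max_age_days=3 * 365):
    """Return the share of records (0.0-1.0) missing each field, plus the
    share not updated within max_age_days (or with no update date at all)."""
    total = len(records)
    if total == 0:
        return {}
    report = {
        field: sum(_is_blank(r.get(field)) for r in records) / total
        for field in fields
    }
    # "last_updated" is a hypothetical field holding a datetime.date.
    stale = sum(
        1
        for r in records
        if r.get("last_updated") is None
        or (today - r["last_updated"]).days > max_age_days
    )
    report["out_of_date"] = stale / total
    return report


if __name__ == "__main__":
    sample = [
        {"name": "A", "email": "a@example.org", "last_updated": date(2013, 1, 1)},
        {"name": "B", "email": "", "last_updated": date(2005, 1, 1)},
    ]
    print(audit(sample, ["name", "email"], today=date(2013, 4, 1)))
```

Reporting shares rather than raw counts makes audits of systems of very different sizes comparable.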
    19. Clean existing data
        • Prioritise
        • Quick wins
        • Highlight progress
        • What can be automated?
        • Introduce unique identifiers
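Standardising country names, mentioned in the slide notes as a quick win, is a good example of cleaning that can be automated. The sketch below is hypothetical throughout: the variant table is a tiny illustrative sample and the canonical spellings are placeholders (a real implementation might normalise to ISO 3166-1 codes instead). Values it does not recognise are set aside for manual review rather than guessed.

```python
# Hypothetical variant table: keys are lower-cased, whitespace-collapsed
# spellings seen in free-text data; values are the chosen canonical form.
COUNTRY_VARIANTS = {
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
    "united kingdom": "United Kingdom",
    "usa": "United States",
    "u.s.a.": "United States",
    "us": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def standardise_country(raw):
    """Return the canonical country name, or None if the spelling is unknown."""
    key = " ".join(str(raw or "").lower().split())
    return COUNTRY_VARIANTS.get(key)


def clean_countries(records):
    """Split records into (standardised, needs_review) lists.

    Records are not modified in place; standardised ones are copies.
    """
    standardised, needs_review = [], []
    for rec in records:
        canonical = standardise_country(rec.get("country"))
        if canonical is None:
            needs_review.append(rec)
        else:
            standardised.append({**rec, "country": canonical})
    return standardised, needs_review
```

Routing unknown values to a review queue keeps the automated step safe: it only ever applies mappings someone has already approved.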
    20. 20. Identifiers available People  Organisations  International Standard Name  International Standard Name Identifier (ISNI) Identifier (ISNI)  Open Researcher and  Ringgold ID Contributor ID (ORCID)  DUNS Number (D&B) and  Scopus Author Identifier other business and finance  ResearcherID IDs  MDR PID Numbers and other marketing IDs  Library of Congress MARC Code List for Organizations
    21. ISNI
        ISNI is designed to be a “bridge identifier”.
        [Diagram: ISNI Number linked to Party ID 1 and to Party ID 2, each with its own proprietary information and/or metadata]
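The bridge idea can be sketched as a join on the shared identifier: two systems keep their own proprietary party IDs, and any parties for which both hold the same ISNI can be linked. In this Python sketch the system names, party IDs and the `isni-N` placeholder values are all hypothetical; real ISNIs are 16-character identifiers.

```python
from collections import defaultdict


def bridge(system_a, system_b):
    """Link records in two systems that share an ISNI.

    Each argument maps a proprietary party ID to that party's ISNI, or to
    None when the ISNI is unknown. Returns sorted (isni, id_a, id_b) triples;
    if one system holds duplicates of a party, every pairing is returned so
    the duplicates become visible.
    """
    b_by_isni = defaultdict(list)
    for id_b, isni in system_b.items():
        if isni:
            b_by_isni[isni].append(id_b)

    links = []
    for id_a, isni in system_a.items():
        if not isni:
            continue  # no shared identifier, so no reliable link
        for id_b in b_by_isni.get(isni, []):
            links.append((isni, id_a, id_b))
    return sorted(links)
```

For example, a finance system `{"F-001": "isni-1", "F-002": "isni-2"}` and a submissions system `{"S-7": "isni-1", "S-9": "isni-2"}` link F-001 to S-7 and F-002 to S-9 without either system adopting the other's internal IDs.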
    22. 22. Author IDs ORCID is designed to persistently identify and disambiguate scholarly researchers and attach them to research output ORCID identifiers utilize a format compliant with the ISNI ISO standard ISNI has reserved a block of identifiers for use by ORCID, so there will be no overlaps in assignments Recorded as http://orcid.org/0000-0001-2345-6789http://about.orcid.org/http://www.isni.org/
    23. 23. Use cases Disambiguation of researchers and connection to all their research Links to contributors, editors, compiler s and others involved in the research process Embed IDs into research workflows and the supply chain Integrate systems
    24. 24. Institutional IDs Ringgold is an ISNI Registration Agency Unique institutional ID number maps data across systems ISNI numbers should be used across the scholarly supply chain to:  Disambiguate institutional records  Eradicate duplication of data  Map institutions into their hierarchy  Link systems using the institutional ID as the lynchpin
    25. Minimising the impact of data silos
        Standard identifiers (both individual and institution) can be used to break down silos by enabling better system linking.
    26. 26. Improve data capture Data quality policy Web forms Closer collaboration with 3rd parties to encourage use of industry standard identifiers such as ISNI or ORCID
    27. Data capture – data quality policy
        Design to ensure accuracy, quality and consistency.
        Individual responsibilities:
        • All staff are responsible for the accuracy and consistency of data
        • Capture data in such a way that it is uniquely identifiable and easily shared within the organisation and with 3rd parties:
          – Records relating to individuals
          – Records relating to institutions
        • Reporting of inaccuracies to Data Owners
        Data Owner responsibilities:
        • All source data systems must have a designated Data Owner
        • The Data Owner retains overall responsibility for all records within their source data system
    28. Improve data capture – web forms
        • Required fields
        • Validation
        • Address validation – postcode lookup
        • Institution validation – institution lookup
        • ‘Internal’ and ‘external’ web form consistency
        • Language barriers
        • Help and hints
        • Free-text fields
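Several of these web-form ideas fit in one server-side check. This Python sketch is hypothetical: the field names, the `known_institutions` lookup standing in for an institution lookup service, and the deliberately loose email pattern (not a full RFC 5322 validator) are all assumptions for illustration. Storing the chosen institution ID rather than free text is what keeps the record linkable later.

```python
import re

# Hypothetical form definition; field names are illustrative only.
REQUIRED_FIELDS = ("name", "email", "institution_id")
# Loose sanity check: something@something.something, no spaces.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _text(form, field):
    """Field value as a stripped string; None becomes ''."""
    return str(form.get(field) or "").strip()


def validate_form(form, known_institutions):
    """Return a list of error messages; an empty list means the form passes.

    known_institutions maps institution ID -> display name.
    """
    errors = [f"{field} is required" for field in REQUIRED_FIELDS
              if not _text(form, field)]
    email = _text(form, "email")
    if email and not EMAIL_PATTERN.fullmatch(email):
        errors.append("email is not a valid address")
    institution = _text(form, "institution_id")
    if institution and institution not in known_institutions:
        errors.append("institution must be chosen from the lookup list")
    return errors
```

Returning every error at once, rather than stopping at the first, lets the form show all problems to the user in one pass.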
    29. 29. On-going monitoring Dashboards Regular audits Metrics – Institutional Linking Rate Staff awareness Reporting of errors
    30. 30. A publisher example Develop a Data Governance Programme  Data ‘champion’  Engagement – at all levels  Ownership – at all levels  Allocate necessary resources  Guidelines/Policy - Data quality policy  Processes put in place  Education - raise awareness  New staff – training on Data Governance and their wider impact  Change of culture
    31. 31. A publisher example Ringgold and DataSalon client  All institutional records contain Ringgold Identifiers  System linking via Individual and Institutional identifiers  Data (both good and bad) visible to all via MasterVision  Use of data governance dashboards  Tidying of existing data  Simple reporting of incorrect data across organisation  New data captured correctly
    32. Author database
        1. Create a data governance dashboard to monitor problem areas:
           • Book authors with no related institution
           • Unknown book authors
           • Author records without an affiliation entry
           • Author records with commas in the affiliation entry
           • Book authors without an email address
           • Book authors with an invalid email address
        2. Correct problem records in existing data
           • The dashboard clearly highlighted all records of concern, and these records were corrected
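The dashboard checks listed in step 1 translate naturally into per-record predicates. This Python sketch covers five of them (it omits "Unknown book authors", which the slide does not define); the field names `id`, `institution_id`, `affiliation` and `email` and the loose email pattern are assumptions for illustration.

```python
import re

# Loose sanity check only, not a full address validator.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _blank(value):
    return value is None or not str(value).strip()


def _invalid_email(rec):
    """Present but malformed; missing emails are counted by a separate check."""
    email = rec.get("email")
    return not _blank(email) and EMAIL_PATTERN.fullmatch(str(email).strip()) is None


# Each predicate returns True when a record belongs on the dashboard.
CHECKS = {
    "no related institution": lambda r: _blank(r.get("institution_id")),
    "no affiliation entry": lambda r: _blank(r.get("affiliation")),
    "comma in affiliation": lambda r: "," in str(r.get("affiliation") or ""),
    "no email address": lambda r: _blank(r.get("email")),
    "invalid email address": _invalid_email,
}


def dashboard(authors):
    """Return ({check: count}, {check: [record ids]}) for the author records."""
    flagged = {name: [] for name in CHECKS}
    for rec in authors:
        for name, check in CHECKS.items():
            if check(rec):
                flagged[name].append(rec.get("id"))
    counts = {name: len(ids) for name, ids in flagged.items()}
    return counts, flagged
```

Keeping the flagged record IDs alongside the counts supports step 2: the same output that feeds the dashboard also lists exactly which records to correct.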
    33. Author database
        3. Ensure new records are created correctly
           • Raise staff understanding of the importance of capturing data correctly and the impact it has across the organisation as a whole (data silos)
           • Training covering data governance
        4. Ensure appropriate Ringgold coverage
           • Where institutions found in the Author database did not exist within Identify, they were reported to Ringgold. This means not only that individual authors can be linked to the new institution, but also that any individuals at the same institution in other data sources can be linked. This benefits all users of our data and potentially highlights new sales opportunities.
        5. Monitor data quality on an on-going basis
           • Books data governance dashboard updated on a weekly basis
    34. Author database – results
        [Chart: y-axis from 70.00% to 100.00%; series labelled “All data sources” and “ANKO”]
        10% will never link:
        • Missing data (old records)
        • Institution no longer exists
        • Retired author
        • Genuinely no related institution
        End of process:
        • 15% increase in authors linked to institutions – information valuable in supporting all areas of the business
        • Ready for data migration
    35. 35. ICEDIS The international standards organization EDItEUR is working to encourage improvements in the ways that "party" information is communicated Some parts of the supply chain continue to send unstructured name & address records, making matching, disambiguation and automatic ingest near impossible ICEDIS has collaborated with EDItEUR to develop a highly structured data model for exchanging names, addresses and standard identifiers. The group has recently been validating the model by means of a "paper pilot", using a small library of about 100 name & address types An XML schema and HTML documentation are freely availablewww.editeur.orgwww.editeur.org/138/Structured-Name-and-Address-Modelinfo@editeur.org
    36. 36. Summary Your data is a very valuable asset when managed correctly Establishing a data governance programme will enable you to gain maximum benefit from that data Data governance is as much about changing the culture of an organisation as it is about processes and procedures It will take time but the benefits can be enormous
    37. Phil Nicolson, Data Manager, Ringgold Inc., phil.nicolson@ringgold.com