A member of the CGIAR Consortium www.iita.org
Introduction to Data Management
Olatunbosun Obileye
Institutional Data Manager (IITA)
Outline
• Why data management and what for?
• What is data management?
– 3 major topics of data management
– Data life cycle
– Definition
• Data management policy
Approx. 40 min.

Basic understanding of Data Management
Why data management?

Human thinking vs. computing
Computers compare patterns of electronic signals without “understanding” what makes sense.
http://www.rochester.edu/newscenter/machine-learning-advances-human-computer-interaction/
Human cognition is based on categorization and similarity to things we have already experienced through our senses.

Researchers' reputation
Data Management paves the way for data sharing, reuse and recognition for the researchers' scientific efforts.
Data Management
Data sharing
Data reuse & citation
Credibility and recognition

Improved Management
Data Management is Management.
It influences, and is influenced by, resource availability, achievements, bottlenecks, opportunities and costs.
Project Mgt.
Data Mgt.
Time Mgt.

Basic understanding of Data Management
What are data?

What is data?
• Data are the values recorded in the field books, record books or data-logging devices, that are to be entered into the computer and then analyzed.
• Facts and statistics collected together for reference or analysis. (Oxford dictionary)
• Data is a set of values of qualitative or quantitative variables. (Wikipedia)
• Things known or assumed as facts, making the basis of reasoning or calculation.

Which data do you manage?
data
Administrative
personal
professional
Communication
email
Self-organizing
private
Financial & inventory
Org structure
project
Unit/team
Human resources
health
Research
By research area
By output type
By processing
By publishing
Delivery
external

What is data?
What is Data Management?

Definition
"Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets."
Data Management Association International, 2001

Major topics of Data Mgt.
• Data security
• Data organizing
• Data quality

Data management principles
Data quality
Data are…
• correct
• consistent, i.e. uniform in:
– content,
– content structure,
– notation,
– units,
– methods used,
– meaning,
– language
• complete
• up to date
• relevant
• precise
• reliable and comprehensible
• understandable by all involved users and processable by machines
• unambiguous/explicit
• Datasets are free of redundancies
Data security
• All data are backed up frequently
• There is no data without access permission control
• Treatment of data of different ownership (private) is clarified
• Data are saved physically and electronically
Data organization
• There is no data without a person responsible for it (clear roles & responsibilities)
• There is no data without one clearly defined, easy-to-find and communicated location for it
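Some of the data quality principles above (complete, consistent in units, free of redundancies) can be checked automatically. The following is only a minimal sketch in Python: the records, the column names (`plot_id`, `crop`, `yield`, `unit`) and the expected unit are invented for illustration and are not taken from any IITA dataset or tool.

```python
from collections import Counter

# Hypothetical field-book records; the column names are illustrative only.
records = [
    {"plot_id": "P001", "crop": "cassava", "yield": 12.5, "unit": "kg"},
    {"plot_id": "P002", "crop": "cassava", "yield": None, "unit": "kg"},
    {"plot_id": "P003", "crop": "maize", "yield": 3.1, "unit": "t"},
    {"plot_id": "P001", "crop": "cassava", "yield": 12.5, "unit": "kg"},
]

REQUIRED_FIELDS = ("plot_id", "crop", "yield", "unit")
EXPECTED_UNIT = "kg"  # assumption: one agreed unit per variable


def check_quality(rows):
    """Return a list of (row_index, problem) tuples for simple quality rules."""
    problems = []
    seen = Counter()
    for i, row in enumerate(rows):
        # Complete: every required field is present and not empty.
        for field in REQUIRED_FIELDS:
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
        # Consistent: all values use the same unit.
        if row.get("unit") not in (None, "", EXPECTED_UNIT):
            problems.append((i, f"unexpected unit {row['unit']!r}"))
        # Free of redundancies: flag exact repeats of an earlier row.
        key = tuple(row.get(f) for f in REQUIRED_FIELDS)
        seen[key] += 1
        if seen[key] > 1:
            problems.append((i, "duplicate record"))
    return problems


for index, problem in check_quality(records):
    print(f"row {index}: {problem}")
```

Rules like these cover only the principles that can be tested mechanically; correctness and relevance still need a person who knows the experiment.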

Data “life” time
Data mgt. principles apply from the generation date to the deletion date.

Data lifecycle
• create metadata and documentation
• back up data
• collect data (experiment, observe, sense, measure, simulate)
• re-assess situation
• locate and assess existing data
• enter data, digitize, transcribe, translate
• check, validate, clean data
• migrate data to best format
• locate, explore and understand data
• scrutinize findings
• define access permission
• control temporal and personal access
• establish copyright
• follow-up research
• undertake research reviews
• teach and learn
• expose metadata through a searchable interface
• derive data (apply statistical and analytical methods)
• produce research outputs
• author publications
• design research
• promote data
• specify question
• interpret input data
• aggregate results
• determine tool
• anonymize data where necessary
• identify (tracking)
• categorize
• migrate data to suitable medium
• archive data in OA repository
• discuss with peers on collaboration platform
• generate reports
• disseminate recommendations
• share data
• interpret output data
• prepare data for storage
• centralize access to data
• plan data management (interoperability, formats, data type, storage, access, responsibilities etc.)
• plan consent for sharing
• identify sources
• create templates
• identify and respect standards, metadata standards, laws and regulations
• create/apply protocols
• organize data transfers
• apply naming standards, terminology
• distribute & “sell” data
• define information / data demand
• collect feedback
• prepare existing data
• define mode of collection
• data criticism
• create validation rules and tools

Data Management Policy
• Research data management follows the donor's policy
• Data from IITA-funded research belong to IITA
• Permission must be obtained before the research data are used elsewhere

Data Management Plan Content
• https://www.iita.org/wp-content/uploads/2018/09/Data-Management-Plan-Template-for-IITA.pdf

Elements of DMP
Element: Guidance
• Data description: Give a description of the information to be gathered, including the volume, scale and nature of the data that will be generated or collected.
• Data creation, collection and re-use: Outline how the data will be collected or created. Include details of relevant existing data and how these data will be integrated.
• Data format: Describe the format in which the data will be generated, maintained and made available. Explain the reason for your procedure and chosen archival format.
• Metadata standard: Describe the types of documentation that will accompany the data to help secondary users understand and reuse it. This should at least include basic details that will help people find the data, including who created or contributed to the data, its title, date of creation and under what conditions it can be accessed. Metadata represents the who, what, when, where, why and how of the collected data.
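The minimum metadata described above (creators, title, date of creation, access conditions) plus the "who, what, when, where, why and how" can be captured in a small structured record. The sketch below is one possible way to do this in Python and does not follow any particular metadata standard; the class name, field names and example values are assumptions made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

# Minimum details named in the guidance: title, creators, creation date, access conditions.
MINIMUM_FIELDS = ("title", "creators", "date_created", "access_conditions")


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative."""
    title: str = ""
    creators: list = field(default_factory=list)  # who
    date_created: str = ""                         # when (ISO 8601 date)
    access_conditions: str = ""
    description: str = ""                          # what
    location: str = ""                             # where
    purpose: str = ""                              # why
    method: str = ""                               # how

    def missing_minimum(self):
        """Names of the minimum fields that are still empty."""
        return [name for name in MINIMUM_FIELDS if not getattr(self, name)]


record = DatasetMetadata(
    title="Cassava yield trial 2018",  # invented example values
    creators=["A. Researcher"],
    date_created="2018-09-01",
    description="Plot-level fresh root yield",
    method="Field book, weighed at harvest",
)

print(record.missing_minimum())               # lists what still has to be filled in
print(json.dumps(asdict(record), indent=2))   # record as JSON, ready to store with the data
```

Storing such a record next to the data file keeps the documentation and the dataset together; a repository-specific schema would replace these ad hoc field names in practice.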

Elements of DMP (cont’d)
• **Data storage and backup: State how often the data will be backed up and to which locations. How many copies are being made? Storing data on laptops, computer hard drives or external storage devices alone is very risky. The use of a robust, managed collaboration platform (SharePoint) provided by DIMU is preferable. If you choose to use a third-party service (like Amazon or other cloud services), you should ensure that this does not conflict with any funder, institutional, departmental or group policies, for example in terms of the legal jurisdiction in which data are held or the protection of sensitive data.
• **Access and security: Give a description of the technical process to protect confidential and non-confidential information, as well as how access permission/restriction will be provided during the embargo period.
• **Ethics and privacy: Ethical issues affect how you store data, who can see/use it and how long it is kept. Managing ethical concerns may include: anonymization of data; referral to departmental or institutional ethics committees; and formal consent agreements. You should show that you are aware of any issues and have planned accordingly. If you are carrying out research involving human

Elements of DMP (cont’d)
• Copyright and Intellectual Property Rights (IPR): State who will own the copyright and IPR of any data that you will collect or create, along with the license(s) for its use and re-use. For multi-partner projects, IPR ownership may be worth covering in a consortium agreement. Consider any relevant funder, institutional, departmental or group policies on copyright or IPR. Also, consider permissions to reuse third-party data and any restrictions needed on data sharing.
• Data sharing: Give a description of how data will be shared, including access procedures, embargo periods, technical mechanisms for dissemination and whether access will be open or granted only to specific user groups. A timeframe for data sharing and publishing should also be provided.
• Selection and embargo period: Provide a description of how data will be selected for archiving, how long the data will be held, and plans for eventual transition or termination of the data collection in the future.
• Long-term preservation: Where will you store your data for long-term access after the end of the project (e.g. CKAN)?

Elements of DMP (cont’d)
• Responsibilities: Outline the roles and responsibilities for all activities, e.g. data capture, metadata production, data quality, storage and backup, data archiving & data sharing. Consider who will be responsible for ensuring relevant policies will be respected. The person holding each role should be named where possible.
• Data organization: Explain your folder structure, version control, naming convention etc.
• Quality assurance: Explain how you will ensure good data quality during the project life cycle.
• Budget: Estimate the cost of data management planning, data storage, archiving and data personnel, and state how the cost will be paid. A request for funding may be included.
• Legal requirement: Make a list of all relevant federal or funder requirements for data management and data sharing.
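The "Data organization" element above asks for a naming convention and version control. As an illustration only, the Python sketch below builds and checks file names of the form `project_content_YYYYMMDD_vNN.ext`. This pattern is an assumption chosen for the example, not a convention prescribed by IITA or by the DMP template, and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed convention: <project>_<content>_<YYYYMMDD>_v<NN>.<ext>, lower case, no spaces.
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, content, day, version, ext):
    """Compose a file name and confirm that it matches the convention."""
    name = f"{project}_{content}_{day:%Y%m%d}_v{version:02d}.{ext}"
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def follows_convention(filename):
    """True if an existing file name matches the assumed pattern."""
    return NAME_PATTERN.match(filename) is not None


print(build_name("cassava", "yield-trial", date(2018, 9, 1), 2, "csv"))
print(follows_convention("Final data (new).xlsx"))
```

Putting the date and a zero-padded version number in the name keeps files sorted in order of creation in a folder listing, which is one reason such patterns are often chosen.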
Questions and Answers
Thank you

Data management for proposal writing
