
Where do we currently stand at ICARDA?



Data management: Where do we currently stand at ICARDA?

Published in: Environment


  1. Data management for CRP DS research: where do we currently stand at ICARDA?
     • CO: CGIAR Open Access and Data Management Plans & Implementation (Article 4.1.9) states: "Open Access and Data Management Plans should be prepared in order to ensure implementation of this Policy. Such Plans shall, in particular, outline a strategy for maximizing opportunities to make information products Open Access."
     • Output: research quality and data quality issues in CRP DS research, and a data management mechanism/workflow
  2. Data management for CRP DS research: where do we currently stand at ICARDA?
     Plan
     • Sources of data under CRP DS
     • Status of DM at ICARDA
     • CRP DS research areas for data generation
     • Issues and solutions related to research quality and data quality
     • Workflow for DM and sharing
  3. Scope: Sources of data
     The scope of work is determined by considering a complex interplay of:
     • Base components: crops, livestock, rangelands, trees, etc. and production systems ×
     • Biophysical environment constraints: water scarcity, land degradation ×
     • Technological access: access to the product and the regulatory environment
  4. Partners in the DM (OA)
     • Who generates the data? Who owns them? Who regulates their sharing?
     Outcome: what comes after archiving in an Open Access (OA) system?
     • Data mining
     • Exploration of large or even big data, giving a wider picture, as viewed from the bridge
     • No shortage of random factors/sources of variation in the data
     • Availability of prior information
     • Bayesian analysis to bridge statistical inference and reality
  5. CRP DS DM: Current Status
     DM status at ICARDA and its Flagship target regions
     • ICARDA projects: data (D) and DM lie with scientists, archived on their laptops at various locations/countries
     • GU data on central servers, Amman, Jordan
     • Data manager to be recruited
     • NARS data with NARS
  6. ICARDA Projects dealing with CRP DS
     ICARDA programs: DSIPS, IWLM, SEPR, BIGM (also generating data for other CRPs)
     1. Cropping systems and agronomy, on-station and on-farm
     • On-station trials: single factor and multi-factor, including:
       – systems of rotations, intercrops, monocrops
       – crop components
       – fertilizer input
       – IPDM controls and other management factors
  7. CRP DS DM: Research quality
     On-farm trials:
     • Less frequent: research for technology generation
       – Experimental design with a small number of treatments, small blocks, a variable treatment designated as control or farmer technology, and a relatively large number of replications
     • Most frequent: technology verification and demonstration
     • Sampling design: with large plots, a small number of samples is a concern
  8. ICARDA Projects dealing with CRP DS
     A list of data sources for CRP DS and other ICARDA projects:
     • Design of crop rotation trials [general]
     • DM and analyses of data from the 2-course long-term wheat rotations (productivity and sustainability aspects, including time-trend estimation) [NAWA: long-term crop rotation trials on wheat and barley at Tel Hadya, Syria; long-term wheat rotation trial at Kamishly, Syria; long-term sustainability trials in Egypt, etc.]
     • Evaluation of conservation tillage data [CA trials in Jordan and Iraq]
     • Analyses of data from livestock evaluation experiments [long-term trials, wheat and barley at Tel Hadya]
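The slide lists time-trend estimation among the analyses of the long-term rotation data without naming a method. As a minimal sketch, assuming a straight-line trend is adequate, the ordinary least-squares slope of yield on year can be computed as below. The function name and the toy series are hypothetical; a real analysis of long-term trials would also have to deal with rotation phases and correlation between years.

```python
from statistics import mean


def linear_time_trend(years, yields):
    """Ordinary least-squares fit of yield = intercept + slope * year.

    Hypothetical helper for illustration; it ignores serial correlation
    between years and any rotation-phase structure in the trial.
    """
    if len(years) != len(yields) or len(years) < 2:
        raise ValueError("need at least two paired (year, yield) values")
    x_bar, y_bar = mean(years), mean(yields)
    sxx = sum((x - x_bar) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, yields))
    slope = sxy / sxx  # change in yield per year
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Toy series (not trial data): yield rises by exactly 0.1 t/ha per year.
years = [2000, 2001, 2002, 2003, 2004]
yields = [2.0, 2.1, 2.2, 2.3, 2.4]
slope, intercept = linear_time_trend(years, yields)
print(f"trend: {slope:.3f} t/ha per year")
```

A positive slope that holds up across rotation phases would be one indicator of sustainability; the straight-line form is only the simplest choice.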
  9. ICARDA Projects dealing with CRP DS
     ICARDA outlook:
     • Decentralization of ICARDA has changed the way we do our business.
     • Archiving and sharing data has [essentially] become part of the way we do business.
     • We need to extend [quality] data sharing from within ICARDA to the public, NARS and the five Flagship target regions:
       1) the West African Sahel and dry savannas, 2) East and Southern Africa, 3) North Africa and West Asia, 4) Central Asia, 5) South Asia
  10. ICARDA Projects dealing with CRP DS: Key to DQ
      Research quality (RQ)
      • Experimental design could be an issue (in terms of blocking and replications)
      • Approach/solution: thorough discussion with subject-matter specialists and the biometrician/statistician
      • Resources for enhancing RQ and DQ:
        – J.N.R. Jeffers (1978). Design of Experiments (Statistical Checklist 1). Institute of Terrestrial Ecology, Natural Environment Research Council, Cambridge, UK. http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Design.htm
        – J.N.R. Jeffers (1979). Sampling (Statistical Checklist 2). Institute of Terrestrial Ecology, Natural Environment Research Council, Cambridge, UK. http://www.sawleystudios.co.uk/jnrj/StatisticalCheck/Sampling.htm
        – David J. Finney (1990). Statistical data: their care and maintenance. Indian Society of Agricultural Statistics.
          "This bulletin is extremely useful for students and research workers … topics dealt with are: acquisition of data, design of data gathering, care for data, types and units of data analysis and databases, copying, statistical ethics, data-entry to the computer, data scrutiny, integrity and some illustrations."
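Blocking and replication are the design points this slide flags. A minimal sketch of a randomized complete block (RCB) layout, one of the designs listed for variety trials, is shown below: every treatment appears once in each block, with an independent randomization within each block. The function and field names are hypothetical; in practice the biometrician/statistician would generate and check the field plan.

```python
import random


def rcb_layout(treatments, n_blocks, seed=None):
    """Randomized complete block layout (hypothetical helper).

    Every block receives each treatment exactly once; the order within
    each block is randomized independently.
    """
    if n_blocks < 1 or not treatments:
        raise ValueError("need at least one block and one treatment")
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)
        for plot, treatment in enumerate(order, start=1):
            layout.append({"block": block, "plot": plot, "treatment": treatment})
    return layout


plan = rcb_layout(["T1", "T2", "T3", "T4"], n_blocks=3, seed=2014)
for row in plan:
    print(row["block"], row["plot"], row["treatment"])
```

Storing the seed with the trial metadata lets anyone regenerate the same plan later, which also gives the design data a reference to be checked against.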
  11. ICARDA Projects dealing with CRP DS
      Examples of data quality issues:
      • Experimental design accepted; crop management properly followed.
      • Experimental plots: plot size, harvested areas [2-row, 3-row, 4-row plots], calculation on a per-hectare basis
      • Days to 50% flowering: how many plants were actually observed?
      • Plant height (cm): number of plants
      • Seed yield, biological yield: area used; drying methods
      • Data entry?
      • Lack of electronic data-recording devices and of transfer to files on laptops
        – Early days: field books
        – Recent: Android apps, etc.
        – Data in Excel worksheets
      • What checks should we perform?
      • What should be the level of experimental data quality for public sharing?
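The plot-size issue is mostly arithmetic: the per-hectare yield depends on the area actually harvested, so a 2-row and a 4-row harvest of the same plot must be divided by different areas. A minimal sketch, assuming the harvested area is rows × row spacing × row length; the helper names are hypothetical, and the optional adjustment to a standard moisture content is an added assumption, shown as one common way to put differently dried samples on the same basis.

```python
def harvested_area_m2(n_rows, row_spacing_m, row_length_m):
    """Net harvested area when only n_rows rows of the plot are harvested."""
    if min(n_rows, row_spacing_m, row_length_m) <= 0:
        raise ValueError("rows, spacing and length must all be positive")
    return n_rows * row_spacing_m * row_length_m


def yield_kg_per_ha(plot_yield_kg, area_m2, moisture_pct=None, standard_moisture_pct=None):
    """Scale a plot yield to kg/ha (1 ha = 10,000 m^2).

    If both moisture values are given, the yield is also adjusted to the
    standard moisture content: dry matter is held constant, so
    adjusted = yield * (100 - measured) / (100 - standard).
    """
    if area_m2 <= 0:
        raise ValueError("harvested area must be positive")
    kg_ha = plot_yield_kg * 10_000 / area_m2
    if moisture_pct is not None and standard_moisture_pct is not None:
        kg_ha *= (100 - moisture_pct) / (100 - standard_moisture_pct)
    return kg_ha


area = harvested_area_m2(n_rows=4, row_spacing_m=0.3, row_length_m=5.0)  # 6 m^2
print(round(yield_kg_per_ha(1.2, area)))  # 2000
# Dividing a 2-row harvest (0.6 kg from 3 m^2) by the 4-row area halves the yield:
print(round(yield_kg_per_ha(0.6, area)))  # 1000
```

Recording the number of harvested rows, row spacing and row length with each plot is what makes this calculation checkable later.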
  12. ICARDA Projects dealing with CRP DS
      2. Crop Improvement CRPs (CRP Wheat, CRP DC, CRP GL)
      • Single factor: crop varieties
      • Unreplicated designs for test materials + replicated or repeated checks
      • Replicated variety trials in RCB, IBD (alpha-designs), p-rep designs
      • METs (multi-environment/multi-location and multi-year trials)
      • Two-factor experiments
        – Crops + crop varieties
        – Sometimes agronomic trials: planting dates, IPDMs, etc.
      • Result outputs from commodity CRPs, where breeding is the key component (CRP Wheat, CRP DC, CRP GL), flow to CRP DS. Where are the data?
      • Data with scientists on their laptops
      • Status in relation to DM (OA)/sharing is unknown to me
  13. ICARDA Projects dealing with CRP DS
      3. Issues of poor data quality: indicators and resolutions
      A frequent data quality issue:
      • Is something really wrong with my data? Some statistical procedures work and others do not, BUT the data are the same. Regression and GLM work but ANOVA does not. What is wrong with the statistics?
      • ANOVA may turn out to be a great tool for data checking; missing values in data variables may simply be the reality.
      • What about factor levels or factorial combinations that are missing, or repeated by mistake?
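The situation described here (regression or a GLM runs, ANOVA does not, on the same data) can, as the slide suggests, be a sign that the design factors in the file no longer match the planned layout. A minimal sketch of the check: compare the factor-level combinations present in the data with those implied by the design, and report missing, unexpected and wrongly counted cells. Function and field names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def check_design_cells(records, levels, per_cell=1):
    """Compare factor combinations in the data with the planned design.

    `levels` maps each design factor to its planned levels; `per_cell` is
    the number of plots expected per combination (1 for an RCB with
    block as a factor). Returns the combinations that are missing, not in
    the design at all, or present the wrong number of times.
    """
    factors = list(levels)
    observed = Counter(tuple(r[f] for f in factors) for r in records)
    expected = set(product(*(levels[f] for f in factors)))
    seen = set(observed)
    return {
        "missing": sorted(expected - seen, key=str),
        "unexpected": sorted(seen - expected, key=str),
        "wrong_count": sorted(
            (c for c in expected & seen if observed[c] != per_cell), key=str
        ),
    }


# Toy RCB: the plot of treatment B in block 2 was keyed in as A.
data = [
    {"block": 1, "treatment": "A", "GY": 2.1},
    {"block": 1, "treatment": "B", "GY": 2.6},
    {"block": 2, "treatment": "A", "GY": 2.0},
    {"block": 2, "treatment": "A", "GY": 2.7},
]
report = check_design_cells(data, {"block": [1, 2], "treatment": ["A", "B"]})
print(report)
```

A missing response value leaves the design cell intact, whereas a mistyped factor level shows up here as a missing cell paired with a duplicated one, which is the distinction the slide draws.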
  14. ICARDA Projects dealing with CRP DS
      Some cases of data quality issues:
      • 1. Research quality: the experimental design is OK, but the design data are not (design factors incorrectly entered). This is frequently encountered and must be corrected before analysis; otherwise we end up analysing a study different from the one we planned, while still believing it is the planned one.
        – Factor combinations not aligning with the design (not missing observations)
      • 2. Observed data values (trait values): errors in recording or in transferring data to files
        – Values out of range (a variable that should lie within 0-100 or 0-1 falls outside it: a recording error)
        – Outliers (recorded values that appear too extreme) require validation with the assistant/scientists; any errors found must be corrected. They are generally viewed in the context of univariate analysis.
        – Outliers may raise issues of interpretation and detection: a value may look like an outlier in BY but not in log(BY) or sqrt(BY). There may also be multivariate outliers. A remarks column, possibly in the field book, may support the recorded data.
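Two of the checks on this slide can be sketched directly: a range check for variables bounded by 0-100 or 0-1, and an outlier screen run on both the raw and a transformed scale, echoing the BY versus log(BY) point. Tukey's 1.5 × IQR fences are used here as one common univariate screen; this is an assumption, since the slide names no rule. Flagged values are candidates for validation with the assistant or scientist, not for automatic deletion, and multivariate outliers are not covered. The helper names and toy values are hypothetical; `statistics.quantiles` needs Python 3.8 or later.

```python
import math
from statistics import quantiles


def out_of_range(values, low, high):
    """Indices of values outside [low, high], e.g. 0-100 for a percentage."""
    return [i for i, v in enumerate(values) if not low <= v <= high]


def tukey_outliers(values, k=1.5):
    """Indices of values beyond Tukey's fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


print(out_of_range([12, 105, -1, 50], 0, 100))  # [1, 2]

# Toy BY values that double at each step: the largest is flagged on the raw
# scale, but nothing is flagged on the log scale.
by = [1, 2, 4, 8, 16, 32, 64, 128]
print(tukey_outliers(by))                         # [7]
print(tukey_outliers([math.log(v) for v in by]))  # []
```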
  15. ICARDA Projects dealing with CRP DS
      Some cases of data quality issues (continued):
      • 3. Relationships between traits observed along the crop development cycle can also be identified and used to build in data quality checks:
        – DAF << DMAT
        – GY << BY
      • 4. Helpful: electronic data loggers (balances, Android apps with GIS/date)
      • 5. The role of the scientist/data supervisor must be made effective: random checks on data recording in the field book as well as in the file. Observations, particularly visual scores, should be validated by another researcher experienced in the same discipline. Random checks could be more effective. Data errors could be linked to the observer.
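The trait relationships on this slide can be turned into automatic consistency rules. A minimal sketch, reading DAF as days to flowering, DMAT as days to maturity, GY as grain yield and BY as biological yield, and reading "<<" as "must be smaller than"; these readings are our interpretation of the abbreviations. The rule list, field names and toy plots are hypothetical and would be agreed with the scientists.

```python
# Each rule: (trait that must be smaller, trait that must be larger, reason).
# Trait names follow the slide; the reading of each abbreviation is an assumption.
ORDER_RULES = [
    ("DAF", "DMAT", "flowering must come before maturity"),
    ("GY", "BY", "grain yield cannot exceed biological yield"),
]


def check_trait_order(records, rules=ORDER_RULES, id_field="plot"):
    """Return (plot id, smaller trait, larger trait, reason) for each violation."""
    problems = []
    for record in records:
        for smaller, larger, reason in rules:
            a, b = record.get(smaller), record.get(larger)
            if a is None or b is None:
                continue  # missing values are handled by other checks
            if not a < b:
                problems.append((record[id_field], smaller, larger, reason))
    return problems


plots = [
    {"plot": 101, "DAF": 95, "DMAT": 140, "GY": 2.1, "BY": 5.8},
    {"plot": 102, "DAF": 150, "DMAT": 140, "GY": 6.0, "BY": 5.5},  # both rules broken
    {"plot": 103, "DAF": 98, "DMAT": None, "GY": 1.9, "BY": 4.7},   # DMAT missing
]
for problem in check_trait_order(plots):
    print(problem)
```

Keeping the rules as data rather than code means a data manager can extend the list for other crops without touching the checking function.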
  16. ICARDA Projects dealing with CRP DS
      Some checks and balances:
      • Data care bulletin (Finney, 1990; see References)
      Tools:
      • Design-, experiment- or survey-specific tools (from biometrician/statistician to data manager). Clearly define the roles.
      • Examine the factor combinations appearing in the data
      • Examine tables/cross-tables for qualitative data
      • Descriptive statistics
        – min, max, range, ratio = max/min (min > 0)
        – histograms
      • Box plots and other diagnostics
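The descriptive statistics listed here are simple to compute for each variable before any analysis. A minimal sketch in plain Python (the helper name and toy values are hypothetical; the same summaries are available in Excel, Genstat, SAS, SPSS or R). The max/min ratio acts as a quick scale check, since a value keyed in the wrong unit can inflate it far beyond what the trait normally shows.

```python
def describe(values):
    """Descriptive statistics from the slide: min, max, range and max/min.

    None marks a missing value; the ratio is only reported when min > 0.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no non-missing values")
    low, high = min(present), max(present)
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "min": low,
        "max": high,
        "range": high - low,
        "ratio": high / low if low > 0 else None,
    }


# A max/min ratio in the thousands for a grain-yield column in t/ha hints at a
# unit or decimal-point error (e.g. one value keyed in as kg/ha).
print(describe([2.1, 2.6, None, 2400.0, 1.9]))
```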
  17. ICARDA Projects dealing with CRP DS
      Some checks and balances (continued):
      • An ANOVA that fails to run may turn out to be a good way to catch bad data. However, as noted above:
        – Missing values in response/covariate variables are a reality
        – But a missing factor level or factor combination arises from a data-entry error: the combinations then differ from those in the design
        – Repeated units: data-entry errors
      • Outliers, if detected via model fitting, should stay in the data. Of course, data validation, where possible, is encouraged.
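The point that outliers detected by model fitting stay in the data can be made concrete: fit the model, then add flags next to the values instead of deleting rows. A minimal sketch for complete RCB data, assuming the additive model y = mean + treatment + block with one plot per treatment × block; residuals are scaled by the root residual mean square on (t - 1)(b - 1) degrees of freedom. The function, field names, toy data and the threshold of 3 are hypothetical choices; in small trials the largest attainable standardized residual is limited, so the threshold should be agreed with the statistician.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def flag_rcb_residuals(records, response, threshold=3.0):
    """Flag large residuals from y = mean + treatment + block; delete nothing.

    Expects complete RCB data: exactly one plot per treatment x block
    (run a design-combination check first).
    """
    by_trt, by_blk = defaultdict(list), defaultdict(list)
    for r in records:
        by_trt[r["treatment"]].append(r[response])
        by_blk[r["block"]].append(r[response])
    t, b = len(by_trt), len(by_blk)
    cells = {(r["treatment"], r["block"]) for r in records}
    if t < 2 or b < 2 or len(cells) != t * b or len(records) != t * b:
        raise ValueError("expects one plot per treatment x block, with t, b >= 2")
    grand = mean(r[response] for r in records)
    trt_mean = {k: mean(v) for k, v in by_trt.items()}
    blk_mean = {k: mean(v) for k, v in by_blk.items()}
    # Least-squares fitted value for balanced two-way additive data.
    resid = [
        r[response] - (trt_mean[r["treatment"]] + blk_mean[r["block"]] - grand)
        for r in records
    ]
    s = sqrt(sum(e * e for e in resid) / ((t - 1) * (b - 1)))
    flagged = []
    for r, e in zip(records, resid):
        z = e / s if s > 0 else 0.0
        flagged.append({**r, "residual": e, "std_resid": z, "flag": abs(z) > threshold})
    return flagged


# Toy data: 6 treatments x 4 blocks, exactly additive except one plot (+5).
rows = [
    {"treatment": f"T{i + 1}", "block": j + 1,
     "GY": 10 + i + 0.5 * j + (5 if (i, j) == (2, 1) else 0)}
    for i in range(6) for j in range(4)
]
result = flag_rcb_residuals(rows, "GY")
print(len(result), [(r["treatment"], r["block"]) for r in result if r["flag"]])
```

Every input row comes back, with the residual and flag added as extra fields, so the flagged plot can be validated against the field book without losing it from the dataset.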
  18. ICARDA Projects dealing with CRP DS
      Some checks and balances (continued):
      • Benefiting from ICRISAT tools and techniques
        – on data checking tools
        – on archiving the data on public platforms (an enforcer of data quality), e.g. data systems from ICRISAT, Dataverse (http://dvn.iq.harvard.edu/dvn/)
      • Computing tools/procedures: training and development
        – Excel macros, Genstat/SAS/SPSS/R/other software
        – Database development/datasheet preparation/archiving
  19. ICARDA Projects dealing with CRP DS
      An attractive specialization:
      • Data Science
      • The Data Scientist's Toolbox: https://www.coursera.org/course/datascitoolbox
  20. ICARDA Projects dealing with CRP DS
      Crystallizing an approach:
      4. CRP Dryland Systems management: workflow components
      Home center: project/metadata: project ID, objectives, location, year, personnel (planner, M&E team, data collector, etc.), trial-level information, factors (design and treatment), variables, etc.; a report of the data validation in step 2; links to data
      • Data undergo validation (via agreed tools)
      • Mechanism for data quality check:
        1. Scientists ---> 2. Statistician/DM team: apply the agreed tools
           a) If fails ---> back to (1), the scientist, for update
           b) If passes ---> get metadata and links to data
        2. Archiving (what? who will do this? DM team?)
      • Sharing permissions, etc. This could be a workflow of permissions: Requester ---> Approval 1 ---> Approval 2 ---> … ---> Director CRP DS/nominee.
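The quality-check loop and the permission chain on this slide can be sketched as two small functions. This is a minimal sketch under stated assumptions: the dataset fields, status strings, check names, project ID and approver labels are hypothetical, and the slide itself leaves open who performs the archiving.

```python
from dataclasses import dataclass, field


@dataclass
class Submission:
    """A dataset submitted by a scientist (fields are hypothetical)."""
    project_id: str
    records: list
    status: str = "submitted"
    log: list = field(default_factory=list)


def quality_gate(sub, checks):
    """Step 2: the statistician/DM team applies the agreed checks.

    `checks` maps a check name to a function returning True when the
    records pass. Failures send the data back to the scientist (step 1).
    """
    failed = [name for name, check in checks.items() if not check(sub.records)]
    if failed:
        sub.status = "returned_to_scientist"
        sub.log.append(("QC failed", failed))
    else:
        sub.status = "ready_for_metadata_and_archiving"
        sub.log.append(("QC passed", sorted(checks)))
    return sub.status


def approve_sharing(chain, decisions):
    """Walk a sharing request through an ordered approval chain.

    Returns (True, None) if every approver agrees, otherwise
    (False, first approver who refused or has not yet decided).
    """
    if not chain:
        raise ValueError("approval chain is empty")
    for approver in chain:
        if not decisions.get(approver, False):
            return False, approver
    return True, None


checks = {"yield_recorded": lambda recs: all(r.get("GY") is not None for r in recs)}
sub = Submission("PRJ-001", [{"plot": 101, "GY": 2.1}, {"plot": 102, "GY": None}])
print(quality_gate(sub, checks))  # returned_to_scientist

chain = ["Approval 1", "Approval 2", "Director CRP DS/nominee"]
print(approve_sharing(chain, {"Approval 1": True, "Approval 2": True}))
# (False, 'Director CRP DS/nominee')
```

Any of the earlier checks (design cells, ranges, trait order) could be registered in `checks`, so the "agreed tools" become an explicit, versionable list.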
  21. ICARDA Projects dealing with CRP DS
      (continued)
      • Information management: this refers to the [statistically analysed] results files/publications generated.
      • Knowledge management: key findings, implications, lessons learned
      NARS
      • Identify the active NARS partners
      • Training on the above tools and workflow; share the policy and procedure on CRP DS DM (OA)
      • Identify the risk factors and their indicators, and develop an action plan with the resources required
      • Measure and monitor the impact
  22. Thank you
