
Efficient & effective data management for research projects: ILRI's Data Management Platform

Presented by Carlos Quiros (ILRI) at the Forum on Open Data and Open Science in Agriculture on 15 June 2015.

1. Efficient & effective data management for research projects: ILRI's Data Management Platform
   Carlos Quiros, June 2015

2. Contents
   • Back in 2011
   • Current status
   • How we did it
   • Example of a process
   • CKAN
   • Key decisions made
   • Technology and skills required

3. Back in 2011 vs. now
   Back in 2011
   • Survey design: too many; no common indicators; different variables; different calculations
   • Survey implementation: too many tools; no protocols; poor field data cleaning; no standard process
   • Storage: in files; too many formats; too many versions; messy data cleaning; no accountability
   • Availability & accessibility: nothing
   Now
   • Survey design: too many; common indicators; same variables; same calculations
   • Survey implementation: 2 tools (ODK, CSPro); protocols; field data cleaning; standard process; standard tools
   • Storage: server database; no formats; one version; central cleaning; accountability
   • Availability & accessibility: CKAN; OData

4. How we went around it
   Storage: server database
   • How to integrate ODK and CSPro?
   • How to make it easy for scientists?
   • How to manage user decentralization?
   • How to increase accountability?
   Availability and accessibility: what to use? CKAN, Dataverse, etc. → CKAN
   • How to extend it to serve our purpose?
   • How to integrate it with a server database?
   • How to manage our metadata and vocabularies?
   • How to do this?
   • Data interoperability? RDF, OData, GData, etc.? → OData
     • How to do it?
   Survey implementation
   • Supported only two tools
   • Wrote protocols
   • Wrote field data cleaning applications
   • Wrote policies and implementation plans
   • Wrote standard processes and tools for processing the data
   • Worked closely with teams
   • Created a central place for all the surveys
   • Separated surveys into modules
   • Worked on common indicators
   • Management supports this process
   Survey design: ongoing

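The field data cleaning applications mentioned on this slide are not described in detail. As a minimal sketch of the general idea, assuming each survey record arrives as a Python dictionary, the checks below flag out-of-range and internally inconsistent values per record. The field names (respondent_age, num_adults, household_size) and both rules are hypothetical examples, not ILRI's actual checks.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class Rule:
    name: str                        # label reported back to the field team
    check: Callable[[Record], bool]  # returns True when the record passes


# Hypothetical rules: one range check and one cross-field consistency check.
RULES = [
    Rule("age_in_range", lambda r: 0 <= r["respondent_age"] <= 110),
    Rule("adults_within_household",
         lambda r: r["num_adults"] <= r["household_size"]),
]


def validate(records: list[Record], rules: list[Rule] = RULES) -> list[tuple[str, str]]:
    """Return (record id, rule name) for every rule a record fails."""
    problems = []
    for record in records:
        for rule in rules:
            if not rule.check(record):
                problems.append((str(record["id"]), rule.name))
    return problems


if __name__ == "__main__":
    sample = [
        {"id": "HH001", "respondent_age": 34, "num_adults": 2, "household_size": 5},
        {"id": "HH002", "respondent_age": 140, "num_adults": 6, "household_size": 4},
    ]
    for record_id, rule_name in validate(sample):
        print(f"{record_id}: failed {rule_name}")
```

Keeping rules as named data rather than scattered if-statements makes it easy to report which check failed back to enumerators while they are still in the field.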
5. Example of a process
   Detailed breakdown of ILRI's RDM workflow with ODK (flowchart; the diagram marks several steps with S = scientist input / usage)
   • Start → Draft tool (.doc) → Consultation → Final tool (.doc)
   • Coding (.doc → .xls); who codes: RMG staff or a project team member
   • Testing & review (.xls) → Upload to Formhub test account → Testing & review (ODK Collect) → OK?
   • Once OK: upload to Formhub project account → Field deployment → Data collection → Upload data to Formhub → End of data collection
   • Create MySQL schema with ODKToMySQL → MySQL schema in server
   • Convert data to JSON with FormhubToJSON → Data in JSON format → Upload JSON into MySQL schema with JSONToMySQL
   • Initialize META in schema → Metadata for portal
   • Data cleaning from server using MySQL for Excel → Sharing in Data Portal

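The workflow turns an ODK form into a MySQL schema (ODKToMySQL) and then loads the Formhub data, exported as JSON, into that schema (FormhubToJSON, JSONToMySQL). The sketch below illustrates only the general idea and is not the ODK Tools implementation. It assumes a simplified survey definition of (question type, field name) pairs in the style of an XLSForm survey sheet, flat JSON submissions without repeat groups, and a hypothetical type mapping; Python's built-in sqlite3 stands in for MySQL so it runs without a server.

```python
import json
import sqlite3

# Hypothetical, simplified survey definition: (question type, field name),
# loosely modelled on the "type" and "name" columns of an XLSForm survey sheet.
SURVEY = [
    ("text", "hh_id"),
    ("integer", "household_size"),
    ("decimal", "farm_area_ha"),
    ("select_one yes_no", "owns_cattle"),
]

# Assumed mapping from question type to SQL column type.
TYPE_MAP = {"text": "TEXT", "integer": "INTEGER", "decimal": "REAL", "select_one": "TEXT"}


def create_table_sql(table: str, survey: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement with one column per question.

    Names are interpolated directly, so they must come from a trusted form.
    """
    columns = []
    for qtype, name in survey:
        base_type = qtype.split()[0]  # "select_one yes_no" -> "select_one"
        columns.append(f"{name} {TYPE_MAP[base_type]}")
    return f"CREATE TABLE {table} ({', '.join(columns)})"


def load_json(conn: sqlite3.Connection, table: str,
              survey: list[tuple[str, str]], json_text: str) -> int:
    """Insert a JSON array of flat submissions; missing answers become NULL."""
    names = [name for _, name in survey]
    placeholders = ", ".join("?" for _ in names)
    sql = f"INSERT INTO {table} ({', '.join(names)}) VALUES ({placeholders})"
    rows = [tuple(sub.get(n) for n in names) for sub in json.loads(json_text)]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql("maintable", SURVEY))
    submissions = json.dumps([
        {"hh_id": "HH001", "household_size": 5, "farm_area_ha": 1.5, "owns_cattle": "yes"},
        {"hh_id": "HH002", "household_size": 3, "owns_cattle": "no"},
    ])
    print(load_json(conn, "maintable", SURVEY, submissions), "rows loaded")
```

Deriving the schema from the form definition means every survey built from the same form lands in tables with identical column names, which is what makes central cleaning and later sharing possible.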
6. ILRI's data portal (CKAN) – http://data.ilri.org/portal/
   • What is CKAN?
     • Developed by the Open Knowledge Foundation
     • The most widely deployed data portal software, used by:
       • USA data portal
       • UK data portal
       • EU data portal
       • Open Africa
   • What do you get out of the box?
     • Create datasets with minimal metadata: name, abstract, author, date
     • Tags drawn from a controlled vocabulary
     • Powerful search engine
     • Public/private access to datasets
     • Attach resources (files) to a dataset
     • Data interoperability through a powerful API and RDF
     • Arrange datasets into organizations and topics
   • What can you do by creating extensions?
     • Add new vocabularies (e.g., language, countries)
     • Add new metadata fields
     • Visualize different kinds of data (e.g., maps)
     • Change the theme (colors, logos, fonts, etc.)
     • Create data hubs by harvesting other CKAN instances
     • Whatever else you want

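CKAN's API-based interoperability rests largely on its Action API, where each operation is an HTTP call to /api/3/action/ followed by the action name. A minimal sketch of a dataset search with the package_search action follows. It assumes the ILRI portal exposes the standard API under the portal URL shown on this slide, which has not been verified here; only search() touches the network, while the URL builder and response parser work offline.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the CKAN API sits under the portal URL shown on the slide.
BASE_URL = "http://data.ilri.org/portal"


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API URL for the package_search action."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response: dict[str, Any]) -> list[str]:
    """Pull dataset titles (falling back to the URL name) from a response body."""
    if not response.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [pkg.get("title") or pkg["name"] for pkg in response["result"]["results"]]


def search(query: str, rows: int = 10, base_url: str = BASE_URL) -> list[str]:
    """Run a live search; requires network access to the portal."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return dataset_titles(json.load(resp))
```

For example, search("livestock", rows=5) would return up to five dataset titles, provided the portal's API is reachable at that address.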
7. Key decisions made
   • Use open source for all research data management (RDM)
     Pros: bigger pool of tools; flexible; innovation
     Cons: complex skill set; learning curve
   • Relational Database Management System (RDBMS)
     Pros: central place; auditing
     Cons: DB management skill set; scientists have no idea how to work with an RDBMS
   • CKAN
     Pros: there is nothing better out there; flexible and extensible
     Cons: programming in several languages is required; learning curve

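One of the RDBMS pros listed is auditing. A common way to obtain an audit trail in a relational database is a trigger that writes old and new values to a log table whenever a row changes. The sketch below shows that technique with Python's sqlite3 standing in for MySQL, whose trigger syntax differs slightly. The table and column names are hypothetical, and this is not presented as how META or ILRI's databases implement auditing.

```python
import sqlite3

# sqlite3 stands in for MySQL here; MySQL trigger syntax differs slightly
# (for example, FOR EACH ROW is required there and WHEN is not supported).
SCHEMA = """
CREATE TABLE maintable (
    hh_id TEXT PRIMARY KEY,
    household_size INTEGER
);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    table_name TEXT,
    row_key TEXT,
    column_name TEXT,
    old_value TEXT,
    new_value TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Log every real change to household_size, e.g. during central cleaning.
CREATE TRIGGER audit_household_size
AFTER UPDATE OF household_size ON maintable
FOR EACH ROW
WHEN OLD.household_size IS NOT NEW.household_size
BEGIN
    INSERT INTO audit_log (table_name, row_key, column_name, old_value, new_value)
    VALUES ('maintable', OLD.hh_id, 'household_size',
            OLD.household_size, NEW.household_size);
END;
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the data table, log table and trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = open_db()
    conn.execute("INSERT INTO maintable VALUES ('HH001', 5)")
    conn.execute("UPDATE maintable SET household_size = 6 WHERE hh_id = 'HH001'")
    print(conn.execute("SELECT row_key, old_value, new_value FROM audit_log").fetchall())
```

Because the database itself records the change, every correction made during central cleaning is traceable regardless of which client tool (for instance, MySQL for Excel) was used to make it.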
8. Technology and skills required (skills in brackets)
   • Server
     • Linux (Ubuntu Server) [Linux administration]
       http://www.ubuntu.com/download/server
   • Database server
     • MySQL – an open source database system [DB administration, SQL]
       http://www.mysql.com/
   • Data processing software [Linux, C++, Python]
     • ODK – a toolset for collecting data on mobile devices
       https://opendatakit.org/
     • CSPro – software for creating data entry applications
       https://www.census.gov/population/international/software/cspro/
     • Formhub – a software tool that collects ODK data
       https://github.com/SEL-Columbia/formhub
     • ODK Tools – a toolbox for processing ODK survey data into MySQL databases
       https://github.com/ilri/odktools
     • META – a toolbox for managing research data in MySQL databases
       https://github.com/ilri/meta
     • CSProTools – a toolbox for processing CSPro survey data into MySQL databases
       https://github.com/ilri/csprotools
   • Data sharing and interoperability
     • CKAN – the open source data portal software [Linux, Python, WebDev]
       http://ckan.org/
       http://docs.ckan.org/en/latest/maintaining/installing/index.html
       http://docs.ckan.org/en/latest/extensions/index.html
     • OData – allows the creation and consumption of queryable and interoperable data resources in a simple, standard way [Linux, Java, WebDev]
       http://www.odata.org/

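OData resources are queried through URL conventions, chiefly the $select, $filter, $orderby and $top system query options. The helper below sketches how a client could compose such a URL. The service root, entity set and field names in the usage (https://example.org/odata, Households, household_size) are hypothetical placeholders, not ILRI's actual OData service.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def odata_query_url(service_root: str, entity_set: str, *,
                    select: Optional[list[str]] = None,
                    filter_expr: Optional[str] = None,
                    orderby: Optional[str] = None,
                    top: Optional[int] = None) -> str:
    """Compose an OData query URL from standard system query options."""
    options: dict[str, str] = {}
    if select:
        options["$select"] = ",".join(select)
    if filter_expr:
        options["$filter"] = filter_expr
    if orderby:
        options["$orderby"] = orderby
    if top is not None:
        options["$top"] = str(top)
    url = f"{service_root.rstrip('/')}/{entity_set}"
    if options:
        # Percent-encode spaces as %20 (not "+"), keeping "$" and "," readable.
        url += "?" + urlencode(options, safe="$,", quote_via=quote)
    return url
```

For example, odata_query_url("https://example.org/odata", "Households", select=["hh_id", "household_size"], filter_expr="household_size gt 4", top=10) asks the (hypothetical) service for the ID and size of at most ten households with more than four members, without the client needing to know anything about the underlying MySQL tables.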
9. Thank you
   Visit us at http://data.ilri.org/
