Bad Data is No Better Than No Data! -
Impact of Automation in Data Stewardship
Workflows in Plant Agriculture Industry
Karnam Vasudeva Rao
Senior Scientist, Data Science Team
Monsanto
1
Innovations at Monsanto
R & D
Discovering innovative solutions to
challenges big and small, helping
farmers grow more sustainably.
Agricultural Biologicals
Using naturally-occurring microbes to
benefit the soil and seed.
Modern Agriculture
Evolving the approach to agricultural
innovations and farming practices that
help farmers increase efficiency.
Crop Protection
Guarding plants from disease, weeds,
and pests.
Data Science
Measuring the health of plants,
available natural resources, and the
efficiency of a farm.
Biotechnology
Introducing greater tolerance and
adaptability to a seed product.
Plant Breeding
Merging plant genetics for improved
yield, water efficiency, and more.
• Headquarters: St. Louis, Missouri, United States
• Fortune 500 company
• Over 20,000 employees globally
• Facilities in 69 countries
2
Monsanto
Data Stewardship phases to enable Data2Decisions
01 Acquisition: getting data, compilation
02 Normalization: curation and ontology
03 Integration: data processing and DB management
04 Analytics: data analysis, app development, and visual analytics
3
Tracing entities in R & D pipeline is difficult
Pipeline stage: entities (databases)
Registration: in-house IDs, Gene, Protein IDs (DB 1 & 3)
Cloning: Gene, Monsanto vector name (DB 1, 2, & 4)
Gene transfer: Monsanto vector name, Plant barcode, sample barcode (DB 5 & 6)
Greenhouse: Monsanto vector name, Plant barcode, sample barcode (DB 7)
Field studies: Monsanto vector name, Plant barcode, sample barcode, Field IDs (DB 8-10)
4
D3 data from in-house research studies
Identifiers: Gene/in-house IDs, Monsanto vector ID, Plant/Seed IDs
Data storage: DB1, DB2, DB3, DB4, DB5, DB6, DB7, DB8, DB9, DB10
Example (Sample name / Insect Name) from field studies, greenhouse experiments, and lab experiments: NCR, Corn rootworm, CRW, SCR
5
From D3 to C3
6
Common name (Acronym) | Scientific name | Colloquial term
Northern corn rootworm (NCR) | Diabrotica barberi | Corn rootworm (CRW)
Southern corn rootworm (SCR) | Diabrotica undecimpunctata howardi | Corn rootworm (CRW)
Implementing controlled vocabularies (CV) and ontologies removes the ambiguity caused by
colloquial terms and makes the data Clean, Consistent, and Connected.
Corn/corn = Maize/maize = Zea mays
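The idea of replacing colloquial names with a controlled vocabulary can be sketched in a few lines of Python. This is only an illustration built from the names listed above; the dictionary layout, the function names, and the "mapped / ambiguous / unknown" statuses are assumptions, not the tooling used in the talk.

```python
# Hypothetical controlled vocabulary built from the names on this slide.
# Keys are lower-cased variants; values are the canonical scientific names.
INSECT_CV = {
    "ncr": "Diabrotica barberi",
    "northern corn rootworm": "Diabrotica barberi",
    "scr": "Diabrotica undecimpunctata howardi",
    "southern corn rootworm": "Diabrotica undecimpunctata howardi",
}

# Colloquial terms that cover more than one species, so they cannot be
# resolved automatically and need a curator's decision.
AMBIGUOUS_INSECT_TERMS = {"crw", "corn rootworm"}

# Corn/corn = Maize/maize = Zea mays
CROP_CV = {"corn": "Zea mays", "maize": "Zea mays", "zea mays": "Zea mays"}


def _key(raw_name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(raw_name.strip().lower().split())


def normalize_insect(raw_name):
    """Return (canonical_name, status) for a free-text insect name."""
    key = _key(raw_name)
    if key in INSECT_CV:
        return INSECT_CV[key], "mapped"
    if key in AMBIGUOUS_INSECT_TERMS:
        return None, "ambiguous"
    return None, "unknown"


def normalize_crop(raw_name):
    """Return the canonical crop name, or None if it is not in the vocabulary."""
    return CROP_CV.get(_key(raw_name))
```

Flagging "CRW" as ambiguous rather than guessing a species is a design choice of this sketch: the colloquial term covers both NCR and SCR, so an automatic mapping would hide exactly the ambiguity the slide warns about.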
Data Stewardship to Achieve Data Integrity
• Ensures data reusability, accessibility, and quality
• Has consistent data definitions, data aliases
• Metadata (data about data) enables organized information retrieval
• Integrated, enterprise-wide view of the data provides the foundation for the shared data
7
Standardizing metadata is important for data integrity, reproducibility and accessibility
dataCuratoR: Automated data standardization
of real-time insect assay data to enable decisions
[Figure: CRON-scheduled flow from raw (dirty) data to curated data and on to dashboards and decisions. Labels: DB1, DB2, DB 3 (Oracle), PostgreSQL, API (twice), Metadata (Crop, Insect, Plant stage and Gen), Clean and consistent data, Dashboards - analytics, Decisions.]
8
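A scheduled curation step of the kind this slide describes might look roughly like the sketch below. It is not dataCuratoR itself: the table names, column names, and lookup dictionaries are hypothetical, and an in-memory SQLite database stands in for the Oracle and PostgreSQL systems named on the slide so that the example runs on its own.

```python
import sqlite3

# Hypothetical lookup tables; a real deployment would load these from a
# curated controlled vocabulary rather than hard-code them.
CROP_CV = {"corn": "Zea mays", "maize": "Zea mays"}
INSECT_CV = {
    "ncr": "Diabrotica barberi",
    "scr": "Diabrotica undecimpunctata howardi",
}


def curate_batch(conn):
    """Copy not-yet-curated raw rows into the curated table.

    Rows whose crop or insect is missing from the vocabulary are skipped
    and stay uncurated so a curator can review them.
    Returns (loaded, skipped).
    """
    rows = conn.execute(
        "SELECT id, crop, insect FROM raw_assay "
        "WHERE id NOT IN (SELECT raw_id FROM curated_assay)"
    ).fetchall()
    loaded = skipped = 0
    for raw_id, crop, insect in rows:
        crop_std = CROP_CV.get(crop.strip().lower())
        insect_std = INSECT_CV.get(insect.strip().lower())
        if crop_std is None or insect_std is None:
            skipped += 1
            continue
        conn.execute(
            "INSERT INTO curated_assay (raw_id, crop, insect) VALUES (?, ?, ?)",
            (raw_id, crop_std, insect_std),
        )
        loaded += 1
    conn.commit()
    return loaded, skipped


def demo():
    """Run the curation step twice on a tiny in-memory data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_assay (id INTEGER PRIMARY KEY, crop TEXT, insect TEXT)"
    )
    conn.execute(
        "CREATE TABLE curated_assay (raw_id INTEGER PRIMARY KEY, crop TEXT, insect TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_assay (crop, insect) VALUES (?, ?)",
        [("Corn", "NCR"), ("maize ", "scr"), ("corn", "CRW")],
    )
    first = curate_batch(conn)
    second = curate_batch(conn)  # a repeated run loads nothing new
    return first, second
```

Because each run only picks up rows that have no curated counterpart yet, the function can be called repeatedly by a scheduler such as cron without duplicating data; the unresolved "CRW" row is simply reported as skipped on every run until someone resolves it.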
Automation increased accuracy and
minimized resource usage
9
• Data access
• Requirement gathering
• Patterns, missing data and inconsistencies
• Source for answers
• Manual curation
• Programming
• More patterns, gaps and inconsistencies?
• Maintenance & enhancements
• Minimal coding
• Patterns, gaps and inconsistencies
• Coding & APIs
• More patterns, inconsistencies?
• Maintenance & enhancements
• Minimal coding
FY16: 2.2 resource hours
FY17: 0.9 resource hours
FY18: 0.4 resource hours
FY19: 0.3 resource hours
Increased data accessibility
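To put the four figures in proportion, the relative reduction can be worked out directly from the slide's numbers. The snippet below is only arithmetic on those values; the slide does not state the unit basis for "resource hours", so none is assumed here, and the variable and function names are illustrative.

```python
# Resource hours per fiscal year, as reported on the slide.
resource_hours = {"FY16": 2.2, "FY17": 0.9, "FY18": 0.4, "FY19": 0.3}


def percent_reduction(before, after):
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


years = list(resource_hours)

# Year-over-year reductions, rounded to one decimal place.
year_over_year = {
    f"{a}->{b}": round(percent_reduction(resource_hours[a], resource_hours[b]), 1)
    for a, b in zip(years, years[1:])
}

# Overall reduction from the first to the last reported year.
overall = round(percent_reduction(resource_hours["FY16"], resource_hours["FY19"]), 1)

print(year_over_year)  # {'FY16->FY17': 59.1, 'FY17->FY18': 55.6, 'FY18->FY19': 25.0}
print(overall)         # 86.4
```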
Software best practices were followed to
ensure reusability of code
• Documentation
• Version control
• Code review
• Unit testing
10
Minimizing sampling points by predicting
protein expression saved resources and time
[Figure: two timelines of Stage 1 through Stage 6 with "+" markers, connected by an ML step.]
11
Thanks
12


Editor's Notes

  1. Steps: Collect data from relevant sources. Organize the data and upload it to a centralized repository for normalization. Define the metadata and capture relevant information using controlled vocabularies (CV). Annotate, map, and enrich data relations for consistency. Develop and maintain databases for structured and unstructured data. Format and load the data into the database. Apply exploratory and inferential statistics and predictive analytics. Develop web services, web apps, and cloud-based solutions. Build a user-friendly interface to access and analyze the data.
  2. How easy is it for us to track entities across the pipeline? An “entity” is defined as anything that is tracked in the pipeline at a relatively high frequency and has a physical presence. “Relationships” connect entities using a specific property. “Metadata” is defined as data that supports entity discovery in the research pipeline. The research pipeline has numerous entities and relationships, and linking the different data systems and the lineage to enable faster decisions is always a challenge. For example, genes to be tested are identified by different IDs during different phases of the pipeline. Creating one data system for all by linking the different data is possible only if the metadata for these entities is uniform. That means construct names, gene names, and protein names must be consistent, and so must the links between them, for example a consistent mapping of protein names to construct names. Stewardship work is therefore important for bringing consistency to the data for these entities.
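Once identifiers are mapped consistently, following an entity through the pipeline becomes a chain of lookups. The sketch below illustrates that idea only; every ID, table name, and the three-hop gene, vector, plant, field chain are hypothetical stand-ins, not the actual Monsanto data model.

```python
# Hypothetical link tables, one per hand-off between pipeline stages.
# In practice each mapping would come from a different database.
GENE_TO_VECTOR = {"GENE-001": ["VEC-A"]}
VECTOR_TO_PLANT = {"VEC-A": ["PLANT-0001", "PLANT-0002"]}
PLANT_TO_FIELD = {"PLANT-0001": ["FIELD-17"], "PLANT-0002": []}


def trace_gene(gene_id):
    """Return every (vector, plant, field) lineage reachable from a gene ID.

    A plant that never reached a field study is reported with field=None,
    so gaps in the lineage stay visible instead of being dropped.
    """
    lineages = []
    for vector in GENE_TO_VECTOR.get(gene_id, []):
        for plant in VECTOR_TO_PLANT.get(vector, []):
            fields = PLANT_TO_FIELD.get(plant, []) or [None]
            for field in fields:
                lineages.append((vector, plant, field))
    return lineages
```

The lookups only work if each stage uses the same spelling of the shared identifier; a vector recorded as "VEC-A" in one system and "vec_a" in another would silently break the chain, which is the consistency problem the note describes.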