Bad data can be as problematic as no data at all. This document discusses how Monsanto has improved data stewardship workflows in plant agriculture through automation. Key points:
- Monsanto uses various technologies like biotechnology, plant breeding, and data science to help farmers grow crops more sustainably.
- The company has implemented processes to standardize data from acquisition to analytics to ensure integrity and enable decisions. This includes curating data, developing ontologies, and integrating databases.
- Automating data curation through a program called dataCuratoR has increased accuracy, minimized resource usage, and improved data accessibility over time. It helps standardize real-time insect assay data.
1. Bad Data is No Better Than No Data! - Impact of Automation in Data Stewardship Workflows in the Plant Agriculture Industry
Karnam Vasudeva Rao
Senior Scientist, Data Science Team
Monsanto
2. Innovations at Monsanto
R & D: Discovering innovative solutions to challenges big and small, helping farmers grow more sustainably.
Agricultural Biologicals: Using naturally occurring microbes to benefit the soil and seed.
Modern Agriculture: Evolving the approach to agricultural innovations and farming practices to help farmers increase efficiency.
Crop Protection: Guarding plants from disease, weeds, and pests.
Data Science: Measuring the health of plants, available natural resources, and the efficiency of a farm.
Biotechnology: Introducing greater tolerance and adaptability to a seed product.
Plant Breeding: Merging plant genetics for improved yield, water efficiency, and more.
Monsanto at a glance:
• Headquarters: St. Louis, Missouri, United States
• Fortune 500 company
• Over 20,000 employees globally
• Facilities in 69 countries
3. Data Stewardship phases to enable Data2Decisions
01 – Acquisition: getting data, compilation
02 – Normalization: curation and ontology
03 – Integration: data processing and DB management
04 – Analytics: data analysis, app development, and visual analytics
4. Tracing entities in the R & D pipeline is difficult
Pipeline stage | Entities | Databases
Registration | in-house IDs, Gene and Protein IDs | DB 1 & 3
Cloning | Gene, Monsanto vector name | DB 1, 2, & 4
Gene transfer | Monsanto vector name, Plant barcode, sample barcode | DB 5 & 6
Green-house | Monsanto vector name, Plant barcode, sample barcode | DB 7
Field studies | Monsanto vector name, Plant barcode, sample barcode, Field IDs | DB 8-10
5. D3 data from in-house research studies
Study type | Identifiers | Data storage (example)
Lab experiments | Gene/in-house IDs | DB1, DB2, DB3, DB4, DB5
Green-house experiments | Monsanto vector ID | DB6, DB7
Field studies | Plant/Seed IDs | DB8, DB9, DB10

Sample name | Insect name
NCR | Corn rootworm (CRW)
SCR | Corn rootworm (CRW)
6. From D3 to C3
Common name (Acronym) | Scientific name | Colloquial term
Northern corn rootworm (NCR) | Diabrotica barberi | Corn rootworm (CRW)
Southern corn rootworm (SCR) | Diabrotica undecimpunctata howardi | Corn rootworm (CRW)
Implementing CV and ontologies removes the ambiguity caused by using colloquial terms and makes the data Clean, Consistent, and Connected.
Corn/corn = Maize/maize = Zea mays
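The controlled-vocabulary (CV) idea above can be sketched as a simple lookup that maps acronyms and colloquial terms to canonical scientific names. The dictionary here is hand-built for illustration; in practice the CV would come from a curated ontology database.

```python
# Minimal CV sketch: colloquial terms and acronyms map to canonical names.
# The entries are taken from the table above; the structure is illustrative.
CV = {
    "ncr": "Diabrotica barberi",                  # Northern corn rootworm
    "northern corn rootworm": "Diabrotica barberi",
    "scr": "Diabrotica undecimpunctata howardi",  # Southern corn rootworm
    "southern corn rootworm": "Diabrotica undecimpunctata howardi",
    "corn": "Zea mays",
    "maize": "Zea mays",
}

def normalize(term: str) -> str:
    """Map a colloquial term or acronym to its canonical name; pass unknowns through."""
    return CV.get(term.strip().lower(), term)

print(normalize("NCR"))    # Diabrotica barberi
print(normalize("Maize"))  # Zea mays
```

Passing unknown terms through unchanged (rather than failing) lets a curation job flag them later for human review and CV expansion.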
7. Data Stewardship to Achieve Data Integrity
• Ensures data reusability, accessibility, and quality
• Maintains consistent data definitions and data aliases
• Metadata (data about data) enables organized information retrieval
• An integrated, enterprise-wide view of the data provides the foundation for shared data
Standardizing metadata is important for data integrity, reproducibility, and accessibility.
8. dataCuratoR: Automated data standardization of real-time insect assay data to enable decisions
Raw (dirty) data (DB1, DB2) → API → curation against metadata (Crop, Insect, Plant stage, and Gen) → curated data (DB 3: Oracle, PostgreSQL) → API → clean and consistent data → dashboards and analytics → Decisions
Curation jobs are scheduled via CRON.
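The flow above can be sketched as a small curation job of the kind a CRON schedule would trigger: pull raw assay records, apply the CV to the metadata fields, and hand the curated rows to a loader. All function names, field names, and the sample record are invented for illustration; this is not the actual dataCuratoR implementation.

```python
# Hypothetical sketch of an automated curation job (names are illustrative).
def fetch_raw_records():
    """Stand-in for the API pulls from the source databases (DB1, DB2)."""
    return [{"insect": "NCR", "crop": "corn", "score": 7}]

def curate(record, cv):
    """Standardize one record by applying the controlled vocabulary to key fields."""
    out = dict(record)
    out["insect"] = cv.get(out["insect"].strip().lower(), out["insect"])
    out["crop"] = cv.get(out["crop"].strip().lower(), out["crop"])
    return out

def run_curation(cv):
    """The unit a CRON schedule would invoke: fetch, curate, then load."""
    curated = [curate(r, cv) for r in fetch_raw_records()]
    # load_to_curated_db(curated)  # stand-in for the write to the curated store
    return curated

cv = {"ncr": "Diabrotica barberi", "corn": "Zea mays"}
print(run_curation(cv))
```

Keeping curation a pure function of (record, CV) makes the job idempotent, so a re-run of the same CRON window cannot corrupt already-clean data.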
9. Automation increased accuracy and minimized resource usage
FY16 (2.2 resource hours): data access; requirement gathering; patterns, missing data, and inconsistencies; source for answers; manual curation
FY17 (0.9 resource hours): programming; more patterns, gaps, and inconsistencies?; maintenance & enhancements
FY18 (0.4 resource hours): minimal coding; patterns, gaps, and inconsistencies; coding & APIs
FY19 (0.3 resource hours): more patterns and inconsistencies?; maintenance & enhancements; minimal coding
Result: increased data accessibility.
Steps:
Collect data from relevant sources.
Organize data and upload it to a centralized repository for normalization.
Define the metadata and capture relevant information using the CV.
Annotate, map, and enrich data relations for consistency.
Develop and maintain databases for structured and unstructured data.
Format and load data into the DB.
Perform exploratory and inferential statistics and predictive analytics.
Develop web services, web apps, and cloud-based solutions.
Build a user-friendly interface to access and analyze data.
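The early steps above (collect, normalize, load) can be chained as a toy pipeline. Each stage here is a deliberately thin placeholder for the corresponding stewardship activity; the field names and the lowercasing rule are assumptions for the example.

```python
# Toy sketch of the stewardship steps chained as a pipeline (illustrative names).
def collect(sources):
    """Step 1: gather rows from all relevant sources into one list."""
    return [row for src in sources for row in src]

def standardize(rows):
    """Steps 2-4 stand-in: normalize a metadata field (here, just lowercase crop)."""
    return [{**r, "crop": r.get("crop", "").lower()} for r in rows]

def load(rows, db):
    """Steps 5-6 stand-in: append formatted rows to the target store."""
    db.extend(rows)
    return db

db = []
rows = collect([[{"crop": "Corn"}], [{"crop": "Maize"}]])
load(standardize(rows), db)
print(db)  # [{'crop': 'corn'}, {'crop': 'maize'}]
```

Separating the stages this way mirrors the Acquisition → Normalization → Integration phases on slide 3: each stage can be tested, replaced, or automated independently.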
How easy is it for us to track entities across the pipeline?
An "entity" is defined as anything that is tracked in the pipeline at a relatively high frequency and has a physical presence.
"Relationships" connect entities through a specific property.
"Metadata" is defined as data that supports entity discovery in the research pipeline.
The research pipeline has numerous entities and relationships, and linking the different data systems and their lineage to enable faster decisions is a persistent challenge. For example, the genes to be tested are identified by different IDs during different phases of the pipeline. Creating one data system that links these sources is possible only if the metadata for the entities is uniform: construct names, gene names, and protein names must be consistent, and so must the linkage between them, i.e., the mapping of protein names to construct names.
Stewardship work is therefore essential to bring consistency to the data for these entities.
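The lineage problem described above can be illustrated with a toy trace: once the metadata is uniform, following a plant barcode back to its vector, gene, and protein is a chain of lookups across systems. Every ID, table shape, and field name below is invented for the example, not an actual Monsanto schema.

```python
# Illustrative entity trace across three mock data systems (invented IDs/schemas).
db_genes = {"GENE-001": {"protein": "ProtA"}}         # in-house gene IDs → protein
db_vectors = {"MON-VEC-17": {"gene_id": "GENE-001"}}  # vector name → gene ID
db_plants = {"PLANT-9f3": {"vector": "MON-VEC-17"}}   # plant barcode → vector

def trace(plant_barcode):
    """Follow a plant barcode back through the pipeline to its gene and protein."""
    vector = db_plants[plant_barcode]["vector"]
    gene = db_vectors[vector]["gene_id"]
    return {"vector": vector, "gene": gene, "protein": db_genes[gene]["protein"]}

print(trace("PLANT-9f3"))
```

The trace only works because each system records the upstream entity's ID consistently; if one database stored the vector under a colloquial alias, the chain would break, which is exactly the failure mode the stewardship work prevents.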