Graphs are Feeding the World

Tim Williamson (@TimWilliate)

Data Scientist
Monsanto
Our Growing Planet Faces Difficult Challenges
Sources: http://esa.un.org/unpd/wpp/; UN FAO Food Balance Sheet, “World Health Organization
Global and regional food consumption patterns and trends”; The World Bank, Food and Agriculture
Organization of the United Nations (FAO-STAT), Monsanto Internal Calculations; @TimWilliate #MonDataScience
Rising
Population
Growing enough for
a growing world
Global Population: 4.4B (1980), 7.1B (today), 9.6B+ (2050)
Limited
Farmland
Farmers will need to
produce enough food
with fewer resources
to support our
world population
Acres per Person: 1 (1961), <1/3 (2050)
Changing
Economies
and Diets
A growing global middle
class is choosing animal
protein – meat, eggs,
and dairy – as a larger
part of their diet
Dietary Percentage of Protein: 9% (1965), 14% (2030)
Changing
Climate
Farmers are impacted by climate change in many ways:
• WATER AVAILABILITY ISSUES
• INCREASINGLY UNPREDICTABLE WEATHER
• INSECT RANGE EXPANSION
• WEED PRESSURE CHANGES
• CROP DISEASE INCREASES
• PLANTING ZONE SHIFTS
Improved Genetic Gain is One of Several Tools Humanity has to Address These Challenges
Sources: http://www.ers.usda.gov/data-products/feed-grains-database/feed-grains-yearbook-tables.aspx
• 8 commodity crops and 18 vegetable crop families, sold in 160 countries
[Figure: Average US Corn Yield, 1866-2014. Y-axis: yield (bushels/acre), 0 to 180; x-axis: year, 1865 to 2015.]
10,000 Years
Genetic Gain is Created Through Breeding Cycles
[Diagram: breeding cycle. Two parents are crossed (X); their progeny pass through Screening and Field Trials stages, annotated with Lab Data (Genotypes) and Field Data (Phenotypes).]
Select the Best, Discard the Rest
• All progeny of two parents enter
• Best one leaves to become a future parent
• 1000's of crosses/year
• Dozens of progeny/cross
• 5-10 locations/progeny
• $3-5 million/year
Every Breeding Cycle Extends a Tree of Genetic Ancestry
[Diagram: ancestry tree in which progeny C descends from parents A and B]
A single parent
Forcing Genetic Ancestry Data into Rows and Columns
• In our relational store, genetic ancestry data was spread across a hierarchy of ~11 tables representing a total of ~895 million rows
• Every read became an unpleasant exercise in CONNECT BY PRIOR
Plant: plant id, attributes…
Plant:Plant Relationship: plant id, parent plant id, parental role
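A rough sketch of what the relational read looks like, assuming the two-table layout above. SQLite's WITH RECURSIVE stands in here for Oracle's CONNECT BY PRIOR, and the table names, column names, and six-plant pedigree are illustrative assumptions, not the production schema.

```python
import sqlite3


def build_example():
    """Create an in-memory database holding a toy pedigree (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plant (plant_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE plant_relationship ("
        "plant_id INTEGER, parent_plant_id INTEGER, parental_role TEXT)"
    )
    conn.executemany("INSERT INTO plant VALUES (?)", [(i,) for i in range(1, 7)])
    conn.executemany(
        "INSERT INTO plant_relationship VALUES (?, ?, ?)",
        [(1, 2, "female"), (2, 3, "female"), (2, 4, "male"),
         (3, 5, "female"), (4, 6, "female")],
    )
    return conn


def all_ancestors(conn, start_ids):
    """Return {ancestor_id: depth} for every ancestor of the starting plants."""
    placeholders = ",".join("?" for _ in start_ids)
    sql = f"""
        WITH RECURSIVE ancestry(plant_id, depth) AS (
            -- depth 1: direct parents of the starting population
            SELECT parent_plant_id, 1
            FROM plant_relationship
            WHERE plant_id IN ({placeholders})
            UNION
            -- each further level is one more self-join on the relationship table
            SELECT r.parent_plant_id, a.depth + 1
            FROM plant_relationship AS r
            JOIN ancestry AS a ON r.plant_id = a.plant_id
        )
        SELECT plant_id, MIN(depth) FROM ancestry GROUP BY plant_id
    """
    return dict(conn.execute(sql, list(start_ids)).fetchall())


if __name__ == "__main__":
    print(all_ancestors(build_example(), [1]))
```

Every extra generation adds another pass over the relationship table, which is one plausible reason deep reads get slow on a store of this size.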
Given a Starting Population, Return All Ancestors
[Figure: response time (s, axis 0 to 30) versus ancestor depth (1 to 15). Series: SQL on Oracle Exadata.]
Genetic Ancestry is a Naturally Occurring Graph
• ~700 million nodes
• ~1.2 billion relationships
• ~1.7 billion properties
[Diagram: graph data model. Node labels: :Plant, :Plant Inventory, :Planting, :Selection. Relationship types: :PARENT, :PLANTED, :SELECTED, :HARVESTED, :INVENTORY.]
Given a Starting Population, Return All Ancestors
[Figure: response time (s, axis 0 to 30) versus ancestor depth (1 to 15). Series: SQL on Oracle Exadata; Traversal Framework on Neo4j.]
~90x Difference
Retrieving Genetic Ancestry in a ‘RESTful’ Style
[Diagram: example ancestry graph with :PARENT relationships. Edges as (child, parent, parental_role): (1, 2, female), (2, 3, female), (2, 4, male), (3, 5, female), (4, 6, female).]
/population/1/ancestors
RESTful Resource
{"nodes": [
{"id": 1},
{"id": 2},
{"id": 3},
{"id": 4},
{"id": 5},
{"id": 6}
],
"relationships": [
{"from": 1, "to": 2, "relation": "PARENT"},
{"from": 2, "to": 3, "relation": "PARENT"},
{"from": 2, "to": 4, "relation": "PARENT"},
{"from": 3, "to": 5, "relation": "PARENT"},
{"from": 4, "to": 6, "relation": "PARENT"}
]}
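As a sketch of how a response of this shape could be assembled, the following walks the example graph breadth-first and emits the same nodes/relationships structure. The in-memory PARENTS map and the function name are hypothetical stand-ins; the service's real implementation is not shown in the slides.

```python
import json
from collections import deque

# Hypothetical in-memory copy of the example graph:
# child id -> list of (parent id, parental_role)
PARENTS = {
    1: [(2, "female")],
    2: [(3, "female"), (4, "male")],
    3: [(5, "female")],
    4: [(6, "female")],
}


def ancestors_resource(start_id, parents=PARENTS):
    """Build a {"nodes": [...], "relationships": [...]} dict for all ancestors."""
    nodes = [start_id]
    seen = {start_id}
    relationships = []
    queue = deque([start_id])
    while queue:
        child = queue.popleft()
        for parent, _role in parents.get(child, []):
            relationships.append(
                {"from": child, "to": parent, "relation": "PARENT"}
            )
            if parent not in seen:  # guard against revisiting shared ancestors
                seen.add(parent)
                nodes.append(parent)
                queue.append(parent)
    return {"nodes": [{"id": n} for n in nodes], "relationships": relationships}


if __name__ == "__main__":
    print(json.dumps(ancestors_resource(1), indent=2))
```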
Building a Grammar for Ancestral Milestones
/population/1/binary-cross
RESTful Resource
{
"male": {"id": 4},
"female": {"id": 3}
}
[Diagram: example ancestry graph with :PARENT relationships. Edges as (child, parent, parental_role): (1, 2, female), (2, 3, female), (2, 4, male), (3, 5, female), (4, 6, female).]
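One way to read the binary-cross milestone, inferred from the example response rather than stated in the slides: follow single-parent links upward until reaching a plant that has both a male and a female parent. The sketch below encodes that assumption; the PARENTS map and function name are hypothetical.

```python
# Hypothetical in-memory copy of the example graph:
# child id -> list of (parent id, parental_role)
PARENTS = {
    1: [(2, "female")],
    2: [(3, "female"), (4, "male")],
    3: [(5, "female")],
    4: [(6, "female")],
}


def first_binary_cross(start_id, parents=PARENTS):
    """Return the male/female parents of the nearest two-parent cross, or None."""
    current = start_id
    visited = set()
    while current not in visited:
        visited.add(current)
        roles = {role: pid for pid, role in parents.get(current, [])}
        if "male" in roles and "female" in roles:
            return {
                "male": {"id": roles["male"]},
                "female": {"id": roles["female"]},
            }
        if len(roles) != 1:
            return None  # no recorded parents: the lineage ends without a cross
        current = next(iter(roles.values()))  # follow the single parent upward
    return None  # cycle guard; a well-formed pedigree should never get here


if __name__ == "__main__":
    print(first_binary_cross(1))
```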
Pruning Genetic Ancestry Trees ‘On the Fly’
/population/1/ancestors?until-first=binary-cross
RESTful Resource
{"nodes": [
{"id": 1},
{"id": 2},
{"id": 3},
{"id": 4}
],
"relationships": [
{"from": 1, "to": 2, "relation": "PARENT"},
{"from": 2, "to": 3, "relation": "PARENT"},
{"from": 2, "to": 4, "relation": "PARENT"}
]}
[Diagram: example ancestry graph with :PARENT relationships. Edges as (child, parent, parental_role): (1, 2, female), (2, 3, female), (2, 4, male), (3, 5, female), (4, 6, female).]
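A minimal sketch of the pruning behavior, assuming that until-first=binary-cross means "include the parents of the first two-parent cross, then stop climbing". That reading is inferred from the example response; the PARENTS map and function name are hypothetical.

```python
from collections import deque

# Hypothetical in-memory copy of the example graph:
# child id -> list of (parent id, parental_role)
PARENTS = {
    1: [(2, "female")],
    2: [(3, "female"), (4, "male")],
    3: [(5, "female")],
    4: [(6, "female")],
}


def ancestors_until_first_binary_cross(start_id, parents=PARENTS):
    """Traverse upward, but do not expand past the parents of a binary cross."""
    nodes = [start_id]
    seen = {start_id}
    relationships = []
    queue = deque([start_id])
    while queue:
        child = queue.popleft()
        links = parents.get(child, [])
        # A binary cross is a plant with both a male and a female parent.
        is_cross = {role for _pid, role in links} >= {"male", "female"}
        for parent, _role in links:
            relationships.append(
                {"from": child, "to": parent, "relation": "PARENT"}
            )
            if parent not in seen:
                seen.add(parent)
                nodes.append(parent)
                if not is_cross:  # prune: parents of a cross are leaves
                    queue.append(parent)
    return {"nodes": [{"id": n} for n in nodes], "relationships": relationships}


if __name__ == "__main__":
    print(ancestors_until_first_binary_cross(1))
```

Pruning during the traversal, rather than filtering a full result afterwards, keeps the work proportional to the subtree that is actually returned.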
Ancestry-as-a-Service is Released September 2014
REST API (Ancestry-as-a-Service)
Data Scientists
Application Developers
• >30 elements of RESTful grammar
• ~120 applications and data scientists
• >600 million REST requests
• 10x performance boost
• 1 month analysis now takes 3 hours
Real-Time Reads Require Real-Time Data
• Ingestion volume is ~10 million writes/day (not a write-heavy flow)
• https://github.com/MonsantoCo/goldengate-kafka-adapter
Field + Lab Applications
{
"table": "foo",
"type": "INSERT",
"columns": [
{
"name": "bar",
"before": "fizz",
"after": "buzz"
}
]
}
REST API
REST API (Ancestry-as-a-Service)
POST /population
PUT /population/1234
PUT /population/parents
DELETE /population
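To make the change message concrete, here is a small sketch that unpacks a message of the shape shown above into before and after row images. Only the table, type, and columns fields come from the slide; the function name is hypothetical, and how the real pipeline maps such a message to one of the REST calls is not specified in the talk.

```python
import json

# Sample message copied from the slide's example payload.
MESSAGE = """
{
  "table": "foo",
  "type": "INSERT",
  "columns": [
    {"name": "bar", "before": "fizz", "after": "buzz"}
  ]
}
"""


def row_images(message):
    """Split a change message into (table, type, before, after, changed)."""
    event = json.loads(message)
    before = {c["name"]: c.get("before") for c in event["columns"]}
    after = {c["name"]: c.get("after") for c in event["columns"]}
    # Columns whose value differs between the two row images.
    changed = sorted(name for name in after if before[name] != after[name])
    return event["table"], event["type"], before, after, changed


if __name__ == "__main__":
    print(row_images(MESSAGE))
```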
We’ve Got Ancestry Figured Out… What’s Next?
Genotype Phenotype
Environment
Ancestry
Layering Genotype Data Over Ancestry Trees
Genotype nodes act as simple pointers to remote systems which store the raw data
[Diagram: graph data model extended with genotypes. Node labels: :Plant, :Plant Inventory, :Planting, :Selection, :Genotype. Relationship types: :PARENT, :PLANTED, :SELECTED, :HARVESTED, :INVENTORY, :HAS_GENOTYPE.]
Retrieving Ancestry Trees Annotated with Genotypes
{"nodes": [
{"id": 1, "genotypes": [{"id": 123}]},
{"id": 2},
{"id": 3},
{"id": 4, "genotypes": [{"id": 456}]},
{"id": 5, "genotypes": [{"id": 789}]}
],
"relationships": [
{"from": 1, "to": 2, "relation": "PARENT"},
{"from": 2, "to": 3, "relation": "PARENT"},
{"from": 3, "to": 4, "relation": "PARENT"},
{"from": 3, "to": 5, "relation": "PARENT"}
]}
[Diagram: ancestry graph of nodes 1 to 5 with three attached :Genotype nodes, one with {marker_count: 300} and two with {marker_count: 60,000}.]
/population/1/ancestors?until=genotyped-ancestor&props=genotypes
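A sketch of one plausible reading of until=genotyped-ancestor with props=genotypes: climb from the starting population, stop expanding a branch as soon as an ancestor with a genotype is reached, and annotate nodes with their genotype ids. The semantics are inferred from the example response; the maps, the extra plant 6, and the function name are hypothetical.

```python
from collections import deque

# Hypothetical data mirroring the example response, plus plant 6 (parent of 4)
# to show that the traversal does not continue past a genotyped ancestor.
PARENTS = {1: [2], 2: [3], 3: [4, 5], 4: [6]}
GENOTYPES = {1: [123], 4: [456], 5: [789]}  # plant id -> genotype ids


def ancestors_until_genotyped(start_id, parents=PARENTS, genotypes=GENOTYPES):
    """Ancestor tree pruned at the first genotyped ancestor on each branch."""

    def node(plant_id):
        entry = {"id": plant_id}
        if plant_id in genotypes:
            entry["genotypes"] = [{"id": g} for g in genotypes[plant_id]]
        return entry

    nodes = [node(start_id)]
    seen = {start_id}
    relationships = []
    queue = deque([start_id])  # the start node is always expanded
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, []):
            relationships.append(
                {"from": child, "to": parent, "relation": "PARENT"}
            )
            if parent in seen:
                continue
            seen.add(parent)
            nodes.append(node(parent))
            if parent not in genotypes:  # keep climbing only through gaps
                queue.append(parent)
    return {"nodes": nodes, "relationships": relationships}


if __name__ == "__main__":
    print(ancestors_until_genotyped(1))
```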
Estimate the Genotype of Every Seed Produced
[Diagram: data flow. Components: Genotypes, Field + Lab Applications, REST API, REST API (Ancestry-as-a-Service), Genotype Estimation Engine. Labeled flows: Genotype Annotated Ancestry Trees, Required Genotype DataSets, Estimated Genotypes, New Estimated Genotypes Messages.]
Let’s Revisit the Flow of a Breeding Cycle
[Diagram: breeding cycle revisited. Two parents are crossed (X); the pipeline now includes an Estimate Hi-Res Genotypes step and a Genome-Wide Selection stage, annotated with Lab Data (Genotypes) and Field Data (Phenotypes).]
Select the Best, Discard the Rest
• All progeny of two parents enter
• Best one leaves to become a future parent
• 1000's of crosses/year
• Dozens of progeny/cross
• 1 genotype/progeny
• < $1 million/year
• Width of pipeline increases to accommodate more crosses
A Glimpse Inside Our Active ‘Graphy’ Work
Sources: http://biodiversitylibrary.org/page/27066167#page/125/mode/1up
Constructing Coancestry Matrices
[Diagram: pedigree tree. A has parents B and C; B has parents D and E; C has parents F and G.]
Coancestry(A):
     A     B     C     D     E     F     G
A    1     0.5   0.5   0.25  0.25  0.25  0.25
B          1     0     0.5   0.5   0     0
C                1     0     0     0.5   0.5
D                      1     0     0     0
E                            1     0     0
F                                  1     0
G                                        1
• Consider a reduced ancestor tree only between crosses
• A progeny inherits 50% of its genetics from each parent
• Key input for a large class of predictive genetic analysis algorithms
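The matrix on this slide can be reproduced from the pedigree with a short recurrence built on the 50%-per-parent rule. The sketch below uses the textbook tabular method for an additive relationship matrix, which yields the slide's values for this tree; whether the team computes it this way is not stated, and the function name and PEDIGREE layout are assumptions.

```python
# (individual, parent 1, parent 2), ordered so parents precede progeny.
# Founders D, E, F, G have no recorded parents and are assumed unrelated.
PEDIGREE = [
    ("D", None, None), ("E", None, None),
    ("F", None, None), ("G", None, None),
    ("B", "D", "E"), ("C", "F", "G"),
    ("A", "B", "C"),
]


def relationship_matrix(pedigree):
    """Return a nested dict m[i][j] of pairwise relationship values."""
    m = {ind: {} for ind, _p1, _p2 in pedigree}
    done = []  # individuals already processed (all potential ancestors)
    for ind, p1, p2 in pedigree:
        for other in done:
            # Relationship to an earlier individual is the average of that
            # individual's relationship to each of this individual's parents.
            value = 0.0
            if p1 is not None:
                value += 0.5 * m[other][p1]
            if p2 is not None:
                value += 0.5 * m[other][p2]
            m[ind][other] = m[other][ind] = value
        # Diagonal: 1 plus half the relationship between the two parents.
        related_parents = m[p1][p2] if p1 is not None and p2 is not None else 0.0
        m[ind][ind] = 1.0 + 0.5 * related_parents
        done.append(ind)
    return m


if __name__ == "__main__":
    matrix = relationship_matrix(PEDIGREE)
    order = ["A", "B", "C", "D", "E", "F", "G"]
    for row in order:
        print(row, [matrix[row][col] for col in order])
```

Because each row only needs the rows of the two parents, the computation fits naturally on top of an ancestor tree that has already been reduced to the crosses.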
Thank You All
@TimWilliate
http://engineering.monsanto.com/
Special thanks to my teammates
• Jason Clark
• Marshall Marietta
