Interpreting yield variation in commercial production of crops
Transcript

  • 1. Interpreting yield variation in commercial production of crops. DAPA (Decision and Policy Analysis Program). www.ciat.cgiar.org, Eco-Efficient Agriculture to Reduce Poverty
  • 2. Interpreting yield variation in commercial production of crops. What we do: combine farmers' production experiences (commercial production of crops), principles of operational research, and modern information technology. Observations made by farmers according to their particular circumstances (for example kg/tree, temperature, age) are paired with an environmental characterization of the production system, and the observations are analysed to optimize the system.
  • 3. Introduction. The challenges: parametric or non-parametric? Distribution of yield: the reality!
  • 4. Introduction. The challenges: parametric or non-parametric? It depends on the distribution of the residuals. PARAMETRIC: models rely on assumptions of normality, homogeneity of variance, and independence, and are mostly based on linear relationships. NON-PARAMETRIC: models do not rely on these assumptions and can capture both linear and non-linear relationships.
  • 5. Introduction. The challenges: parametric or non-parametric? As Sharon quoted from “la sabiduría del internet” (the wisdom of the internet): “I have never come across a situation where a normal test is the right thing to do. When the sample size is small, even big departures from normality are not detected, and when your sample size is large, even the smallest deviation from normality will lead to a rejected null.” Source: http://stackoverflow.com/questions/7781798/seeing-if-data-is-normally-distributed-in-r
  • 6. Introduction. The challenges: parametric or non-parametric? The wisdom of Nassim Nicholas Taleb, a “superhero of the mind” (The Black Swan, Fooled by Randomness, Antifragile). The statistical regress argument: “We need the data to tell us what the probability distribution is, and a probability distribution to tell us how much data we need.”
  • 7. Introduction. The challenges: parametric or non-parametric? In terms of Big Data: approaching “N=All”. Collect and use a lot of data rather than settle for small amounts or samples, as researchers have done for well over a century. We can learn from a large body of information things that we could not comprehend when we used only smaller amounts. Sometimes to inform is better than to explain: looking for patterns. Doctors save lives in Canada by knowing that something is likely to occur; this can be far more important than understanding exactly why. Source: Big Data (Foreign Affairs magazine / McKinsey's High Tech)
  • 8. Introduction. The challenges: parametric or non-parametric? Yield does not always follow a normal distribution! [Figure: “What people think it is” versus “What it actually is”, a contrast that was clear to Antoine de Saint-Exupéry (The Little Prince); a second pair of panels shows some of our findings.]
  • 9. Analytical approaches. Supervised models, parametric and non-parametric: the independent variables (inputs/predictors) V1 … V60 and L1 … L5 are related to a known dependent variable (output/response), here kg/plot.
              V1    V2  V3  V4   V5   …  V60  L1  L2  L3  L4  L5  …  Kg/plot
    Obs 1     0.1   18  3   312  0.3  …  89   0   1   0   1   0   …  2.39
    Obs 2     0.2   15  4   526  0.1  …  52   1   0   0   0   1   …  30.35
    Obs 3     0.6   14  1   489  0.2  …  64   0   1   1   1   1   …  42.25
    Obs 4     0.05  19  2   523  0.5  …  13   0   0   0   0   1   …  52.50
    Obs 5     0.4   13  3   214  0.6  …  57   1   1   1   1   1   …
    Obs 6     0.8   12  4   265  0.4  …  24   1   1   0   1   0   …  82.25
    Obs 7     0.2   15  1   236  0.8  …  26   0   0   1   0   0   …  89.28
    Obs 8     0.1   17  3   541  0.1  …  35   0   1   1   1   0   …  125.0
    Obs 9     0.6   16  2   845  0.3  …  51   0   0   1   1   0   …  142.8
    Obs 10    0.1   18  1   126  0.1  …  43   1   1   0   0   1   …  150.0
    …         …     …   …   …    …    …  …    …   …   …   …   …   …  …
    Obs 3000  0.04  15  3   235  0.6  …  85   1   1   1   1   0   …  18070.52
  • 10. Analytical approaches, parametric and non-parametric. Unsupervised models: the same predictors are used without a response column.
              V1    V2  V3  V4   V5   …  V60  L1  L2  L3  L4  L5
    Obs 1     0.1   18  3   312  0.3  …  89   0   1   0   1   0
    Obs 2     0.2   15  4   526  0.1  …  52   1   0   0   0   1
    Obs 3     0.6   14  1   489  0.2  …  64   0   1   1   1   1
    Obs 4     0.05  19  2   523  0.5  …  13   0   0   0   0   1
    Obs 5     0.4   13  3   214  0.6  …  57   1   1   1   1   1
    Obs 6     0.8   12  4   265  0.4  …  24   1   1   0   1   0
    Obs 7     0.2   15  1   236  0.8  …  26   0   0   1   0   0
    Obs 8     0.1   17  3   541  0.1  …  35   0   1   1   1   0
    Obs 9     0.6   16  2   845  0.3  …  51   0   0   1   1   0
    Obs 10    0.1   18  1   126  0.1  …  43   1   1   0   0   1
    …         …     …   …   …    …    …  …    …   …   …   …   …
    Obs 3000  0.04  15  3   235  0.6  …  85   1   1   1   1   0
    Self-organizing maps (SOM): observations that are similar end up close to each other in the visualization space. [Figure: observations plotted in the visualization space, Axis 1 versus Axis 2.]
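To make the SOM idea on this slide concrete, here is a minimal sketch of the training loop in NumPy. It is not the implementation used in the studies: the map size, decay schedules, function names and the two synthetic groups of observations are all assumptions chosen for illustration.

```python
import numpy as np

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM and return its prototype vectors."""
    rng = np.random.default_rng(seed)
    n_obs = data.shape[0]
    # Grid position of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise prototypes with randomly drawn observations.
    weights = data[rng.choice(n_obs, rows * cols)].astype(float)
    total_steps = epochs * n_obs
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(n_obs):
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            # Best matching unit: the prototype closest to this observation.
            bmu = np.argmin(((weights - data[i]) ** 2).sum(axis=1))
            grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards the observation.
            weights += lr * influence[:, None] * (data[i] - weights)
            step += 1
    return weights

def best_matching_units(data, weights):
    """Index of the closest prototype for every observation."""
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)

# Hypothetical data: two groups of observations with three predictors each.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.3, size=(50, 3))
group_b = rng.normal(5.0, 0.3, size=(50, 3))
observations = np.vstack([group_a, group_b])

prototypes = train_som(observations)
units_a = set(best_matching_units(group_a, prototypes))
units_b = set(best_matching_units(group_b, prototypes))
```

In this toy case the two groups should land on different map units, which is the property the component-plane and cluster visualizations rely on.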
  • 11. 1st case study: Andean blackberry, based on ANNs. Supervised models: non-linear regression. [Figure: scatter plot of MLP-predicted yield versus real Andean blackberry yield (kg/plant/week), using only the validation dataset; R² = 0.892, i.e. a coefficient of determination of 0.89.] [Figure: histogram of the yield distribution of Andean blackberry (kg/plant/week); y-axis: number of observations.]
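As a reminder of what the reported coefficient of determination measures, the sketch below computes R² as one minus the ratio of residual to total sum of squares. This is one common definition; the slide does not say which variant was used. The yield values are invented for illustration and are not taken from the study.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical validation-set yields (kg/plant/week).
real = [0.2, 0.5, 0.9, 1.3, 1.6]
predicted = [0.3, 0.4, 1.0, 1.2, 1.7]
print(round(r_squared(real, predicted), 3))  # prints 0.962
```

Computing the score only on observations held out from training, as the slide does with its validation dataset, keeps the figure from being inflated by overfitting.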
  • 12. Results: Andean blackberry. Sensitivity matrix. [Figure: sensitivity distribution of the model with respect to the 28 inputs/predictors; y-axis: % sensitivity, 0 to 0.08; the x-axis lists the inputs, beginning with EffDepth, TempAvg_1, Na_un_chical and Na_un_cusba. Highlighted groups: effective soil depth, temperature averages, geographic location.] Jiménez, D., Cock, J., Satizábal, F., Barreto, M., Pérez-Uribe, A., Jarvis, A. and Van Damme, P., 2009. Computers and Electronics in Agriculture. 69 (2): 198–208
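The slide does not spell out how the sensitivity values were obtained. One generic way to get a comparable ranking is to perturb each input slightly and record how much the model output moves; the sketch below does that. The function names, the perturbation size and the toy model are assumptions and should not be read as the method of the cited paper.

```python
import numpy as np

def perturbation_sensitivity(predict, X, delta=0.05):
    """Relative share of output change caused by nudging each input."""
    X = np.asarray(X, dtype=float)
    base = predict(X)
    scale = X.std(axis=0)
    effects = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_shift = X.copy()
        X_shift[:, j] += delta * scale[j]  # nudge one input at a time
        effects[j] = np.mean(np.abs(predict(X_shift) - base))
    return effects / effects.sum()         # shares sum to 1

# Hypothetical model: strong effect of input 0, weak non-linear effect of
# input 1, no effect of input 2.
def toy_model(X):
    return 3.0 * X[:, 0] + np.sin(X[:, 1]) + 0.0 * X[:, 2]

rng = np.random.default_rng(0)
inputs = rng.normal(size=(200, 3))
sensitivity = perturbation_sensitivity(toy_model, inputs)
```

Any fitted predictor with a `predict(X)`-style callable, such as a trained MLP, could be passed in place of `toy_model`.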
  • 13. Results: Andean blackberry. Unsupervised model: visualization with SOM component planes. (a) Kohonen map displaying the resulting 6 clusters and their labels according to yield values. (b) Component plane of Andean blackberry yield; the scale bar (right) indicates the range of productivity in kg/plant/week. The upper end of the scale shows high yield values and the lower end shows low values.
  • 14. Results: Andean blackberry. Unsupervised model: visualization with SOM component planes. Component plane of effective soil depth. The scale bar (right) indicates the range of soil depth in cm: the upper end of the scale shows high values and the lower end shows low values.
  • 15. Results: Andean blackberry. Unsupervised model: visualization with SOM component planes. Component planes of the temperature averages. In all figures, the scale bar (right) indicates the range of temperature in °C. The upper end shows high values and the lower end shows low values.
  • 16. Results: Andean blackberry. Visualization with SOM component planes. Component planes of the specific geographic areas Nariño–La Union–Chical alto (left) and Nariño–La Union–Cusillo bajo (right). Because these are categorical variables, the highest values indicate presence and the lowest values indicate absence.
  • 17. Drawbacks: crop management factors not included (only variety); only non-parametric approaches (based on ANNs); limited spatial variation (two locations, two departments). Advantages: predictor-predictor and predictor-response dependencies revealed through Kohonen maps; combination of factors; non-linear approach.
  • 18. 2nd case study: Lulo. Supervised modelling. Both models explained more than 60% of the variability in lulo production.
    Distribution of R² obtained with each model:
    Regression          R² (mean)   Confidence interval (95%)
    Robust (linear)     0.65        0.63 - 0.66
    MLP (non-linear)    0.69        0.67 - 0.70
    [Figure: histogram of the yield distribution of lulo (g/plant/week); y-axis: number of observations.]
    [Figure: histograms of the R² provided by each approach (MLP and robust regression); x-axis: R² from 0.2877 to 0.8227; y-axis: number of observations.]
  • 19. Results: Lulo. The sensitivity matrix. [Figure: sensitivity distribution of the model with respect to the inputs/predictors; y-axis: % sensitivity, 0 to 0.18. Highlighted inputs: effective soil depth, temperature averages, slope.] Jiménez, D., Cock, J., Jarvis, A., Garcia, J., Satizábal, H.F., Van Damme, P., Pérez-Uribe, A., and Barreto, M., 2010. Interpretation of Commercial Production Information: A case study of lulo, an under-researched Andean fruit. Agricultural Systems. 104 (3): 258-270
  • 20. Results: Lulo. Unsupervised model: clustering with SOM. The three most relevant variables were used to train a Kohonen map and identify clusters of Homogeneous Environmental Conditions (HECs). (a) U-matrix displaying the distance among prototypes; the scale bar (right) indicates the distance values, with high distances at the upper end and low distances at the lower end. (b) Kohonen map displaying the 3 clusters obtained with the K-means algorithm and the Davies–Bouldin index.
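The combination of K-means and the Davies–Bouldin index mentioned on this slide can be sketched as follows: run K-means for several candidate numbers of clusters and keep the one with the lowest index. In the study the clustering was applied to the SOM prototypes; here it is applied to synthetic 2-D points, and the farthest-first initialisation is an assumption made only to keep the example deterministic.

```python
import numpy as np

def assign(X, centers):
    """Label each point with the index of its nearest center."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)

def kmeans(X, k, n_iter=100):
    # Farthest-first initialisation (an assumption, not taken from the study).
    centers = [X[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(nearest)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centers)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(X, centers), centers

def davies_bouldin(X, labels, centers):
    """Mean over clusters of the worst (scatter_i + scatter_j) / distance_ij."""
    used = [c for c in range(len(centers)) if np.any(labels == c)]
    scatter = {c: np.mean(np.linalg.norm(X[labels == c] - centers[c], axis=1))
               for c in used}
    worst = []
    for i in used:
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                  for j in used if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Hypothetical data: three well-separated groups of points.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in blob_centers])

scores = {}
for k in range(2, 7):
    labels_k, centers_k = kmeans(points, k)
    scores[k] = davies_bouldin(points, labels_k, centers_k)
best_k = min(scores, key=scores.get)  # lower index means better separation
```

A lower Davies–Bouldin value indicates compact, well-separated clusters, so the selected number of clusters is the one that minimises it.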
  • 21. Results: Lulo. Clustering with SOM. A mixed model with the categorical variables of three HECs, location and farm explained more than 80% of the variation in lulo yield.
    Variance components of the mixed model estimations (model including categorical variables of 3 HECs, location and farm):
    Parameter    Estimate (g/plant/week)   Standard error   % of total variance
    HEC          1.85                      2.01             61.2%
    Location     0.07                      0.20             2.5%
    Site-Farm    0.57                      0.21             19.0%
    Error        0.52                      0.04             17.3%
    Total                                                   100.0%
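The "% of total variance" column can be approximately reproduced from the estimates, assuming they are variance components: each component is divided by their sum. The short sketch below does this with the figures from the slide. The recomputed shares come out close to, but not identical with, the slide's percentages, presumably because the displayed estimates are rounded.

```python
# Variance component estimates as shown on the slide.
components = {"HEC": 1.85, "Location": 0.07, "Site-Farm": 0.57, "Error": 0.52}

total = sum(components.values())
# Share of each component in the total variance, in percent.
shares = {name: 100.0 * value / total for name, value in components.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")

# Everything except the residual error is attributed to HEC, location and farm.
explained = 100.0 - shares["Error"]
print(f"Explained by the model terms: {explained:.1f}%")
```

The explained share obtained this way is consistent with the slide's statement that the model accounts for more than 80% of the variation in lulo yield.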
  • 22. Results: Lulo. Effects of clusters of environmental conditions. HEC 3 yielded 41 g/plant/week more fruit than average.
    Variable ranges by HEC:
    HEC   Slope (degrees)   EffDepth (cm)   TempAvg_0 (°C)
    1     5-14              21-40           15-16.5
    2     8-15              32-69           15-18.9
    3     13-24             40-67           15.8-19
    [Figure: effect of each HEC (1, 2, 3) on lulo yield (g/plant/week); y-axis from -30 to 50.]
  • 23. Results: Lulo. Effects of farms across clusters of environmental conditions. Farms 7 and 9 are both in HEC 3: farm 7 produced 68 g/plant/week less than average, whilst farm 9 produced 51 g/plant/week more than average. [Figure: effect of each farm on lulo yield (g/plant/week), with farms grouped by HEC 1, 2 and 3; y-axis from -80 to 60.] Jiménez, D., Cock, J., Jarvis, A., Garcia, J., Satizábal, H.F., Van Damme, P., Pérez-Uribe, A., and Barreto, M., 2010. Interpretation of Commercial Production Information: A case study of lulo, an under-researched Andean fruit. Agricultural Systems. 104 (3): 258-270
  • 24. Drawbacks: crop management factors not included (only variety); compared with the Andean blackberry study, even more limited spatial variation (locations within one department). Advantages: iterative procedure (combination of parametric and non-parametric, linear and non-linear); combination of factors; the first formal research study to show the yield gap between farmers under similar climatic conditions in Colombia, which provided the basis for the site-specific analytical approaches; successfully identified farms that have superior management practices for given environmental conditions.
  • 25. 3rd case study: Plantain. FactoClass (climate clusters). [Figure: variables factor map (PCA) of bio_1 to bio_19; Dim 1 (44.64%), Dim 2 (27.62%).] [Figure: observations plotted on Dim 1 (43.43%) and Dim 2 (29.83%), grouped into clusters 1 to 8.]
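For readers unfamiliar with the quantities on a variables factor map, the sketch below runs a PCA on standardized variables via the singular value decomposition. It returns the share of variance per dimension (the percentages on the axes) and the correlation of each variable with each dimension (the arrow coordinates). It uses plain NumPy rather than the FactoClass tooling named on the slide, and the three synthetic variables are assumptions standing in for the bio_* climate variables.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA on standardized variables using the SVD."""
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # variance share per dimension
    scores = U[:, :n_components] * s[:n_components]   # observation coordinates
    # Correlation between each variable and each retained dimension.
    loadings = Vt[:n_components].T * s[:n_components] / np.sqrt(n_obs - 1)
    return scores, loadings, explained[:n_components]

# Hypothetical data: two strongly related variables and one unrelated variable.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
climate = np.column_stack([
    t,
    2.0 * t + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

scores, loadings, explained = pca(climate)
print(np.round(100 * explained, 1))  # percentage of variance on Dim 1 and Dim 2
```

The observation scores are what a subsequent clustering step would operate on to form groups such as the climate clusters shown on the slide.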
  • 26. 3rd case study: Plantain. PCA and CATPCA (soil clusters).
  • 27. 3rd case study: Plantain. C4S5: climate cluster 4, soil cluster 5.
  • 28. 3rd case study: Plantain. Generalized linear model (GLM), C4S5. The model captures dependencies between the predictors and the response variable. General form: $\log(Y) = \beta_0 + \beta_1 X + \varepsilon$. Fitted model: $\log(\text{Yield}) = 1.22 + 0.0008 \cdot \text{planting density} + \varepsilon$. Significance level: 5%.
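  • 29. 3rd case study: Plantain. Generalized linear model (GLM), C5S5. General form: $\log(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$. Fitted model: $\log(\text{Yield}) = 0.80 + 0.00101 \cdot \text{planting density} + 0.324154 \cdot \text{MezcVar} + \varepsilon$, where MezcVar indicates a mixture of varieties. Significance level: 5%.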
  • 30. 3rd case study: Plantain. Generalized linear model (GLM). Interpretation of the parameters:
    $\log(\text{Yield}) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon$
    $e^{\log(\text{Yield})} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon}$ (non-linear)
    $\text{Yield} = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \varepsilon}$ (back to the original unit, tons/ha)
    $\text{Yield} = e^{\beta_0}\, e^{\beta_1 X_1}\, e^{\beta_2 X_2} \cdots e^{\varepsilon}$ (dependencies between predictors and tons/ha)
    With the model it is possible to calculate by what factor yield increases or decreases when a specific practice is changed.
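The "by what factor" statement follows directly from the multiplicative form. As a short worked step, assume one predictor $X_1$ is changed by an amount $\Delta$ (a symbol introduced here for illustration) while everything else is held fixed:

```latex
\frac{\text{Yield}(X_1 + \Delta)}{\text{Yield}(X_1)}
  = \frac{e^{\beta_0}\, e^{\beta_1 (X_1 + \Delta)}\, e^{\beta_2 X_2} \cdots e^{\varepsilon}}
         {e^{\beta_0}\, e^{\beta_1 X_1}\, e^{\beta_2 X_2} \cdots e^{\varepsilon}}
  = e^{\beta_1 \Delta}
\qquad\Longrightarrow\qquad
\text{percentage change} = 100\,\bigl(e^{\beta_1 \Delta} - 1\bigr)\,\%
```

All other factors cancel, so the effect of a practice depends only on its own coefficient and on the size of the change. For a 0/1 indicator variable, $\Delta = 1$ and the factor is simply $e^{\beta}$.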
  • 31. 3rd case study: Plantain. Generalized linear model (GLM). Interpretation of the parameters, C4S5 (planting density):
    $\log(\text{Yield}) = 1.22 + 0.0008 \cdot \text{planting density} + \varepsilon$
    $\text{Yield} = e^{1.22}\, e^{0.0008 \cdot \text{planting density}}\, e^{\varepsilon}$
    Planting density = 100 → $e^{100 \cdot 0.0008}$
    With 90% confidence, annual yield in tons/ha can be expected to increase by 3.2% to 14.2% for every additional 100 trees/ha.
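The point estimate behind this interval can be checked with a few lines of Python, using the coefficient from the slide. The helper name `percent_change` is an assumption introduced for illustration. The 90% bounds themselves cannot be recomputed here, because the standard error of the coefficient is not shown.

```python
import math

def percent_change(beta, delta):
    """Percentage change in yield when a predictor with coefficient beta
    in a log-yield model changes by delta units."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

# Planting density coefficient for C4S5, applied to 100 additional trees/ha.
point_estimate = percent_change(0.0008, 100)
print(f"{point_estimate:.1f}%")  # prints 8.3%
```

The resulting increase of about 8.3% lies inside the 3.2% to 14.2% range quoted on the slide.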
  • 32. 3rd case study: Plantain. Generalized linear model (GLM). Interpretation of the parameters, C5S5 (mixture of varieties):
    $\log(\text{Yield}) = 0.80 + 0.00101 \cdot \text{planting density} + 0.324154 \cdot \text{MezcVar} + \varepsilon$
    $\text{Yield} = e^{0.80}\, e^{0.00101 \cdot \text{planting density}}\, e^{0.324154 \cdot \text{MezcVar}}\, e^{\varepsilon}$
    MezcVar present → $e^{0.324154 \cdot \text{presence}}$
    With 90% confidence, planting mixed varieties can be expected to increase production by more than 10.46%.
  • 33. 4th case study: Avocado. Generalized linear model (GLM). Interpretation of the parameters, C4S5 (planting density):
    $\text{Yield} = e^{-2.078}\, e^{0.0077 \cdot \text{planting density}}\, e^{0.2079 \cdot \text{planting layout}}\, e^{\varepsilon}$
    Planting density = 10 → $e^{10 \cdot 0.0077}$
    With 90% confidence, annual yield in tons/ha can be expected to increase by 2.3% to 13.2% for every 10 trees/ha added to the planting density.
  • 34. 4th case study: Avocado. Generalized linear model (GLM). Interpretation of the parameters, C2S4 (planting layout):
    $\text{Yield} = e^{3.6}\, e^{-0.006 \cdot \text{planting density}}\, e^{0.434 \cdot \text{variety}}\, e^{0.7946 \cdot \text{planting layout}}\, e^{\varepsilon}$
    Planting layout present → $e^{0.7946 \cdot \text{presence}}$
    With 90% confidence, a grower in this zone who plants in a triangular (tresbolillo) layout instead of a square one can be expected to increase production by more than 30.21%.
  • 35. Drawbacks: not enough crop management factors to apply a hierarchical approach such as mixed models; limited temporal variation. Advantages: iterative procedure (combination of parametric and semi-parametric); crop management factors included (farmers can control them); predictor-response dependencies through GLM; large spatial variation; soil information included; linear and non-linear approach.
  • 36. Thank you!
  • 37. Analytical approaches: data-driven. We adapt a range of methodologies to the analysis of real data, rather than adapting the data to particular methodologies. Parametric methods: ordinary least squares regression (OLS); principal component analysis (PCA); robust linear regressions; mixed models; best linear unbiased prediction (BLUP); FactoClass (factor analysis, Ward's method, K-means); categorical principal components analysis (CATPCA). Semi- or non-parametric methods: generalized linear model (GLM); self-organizing maps (SOM); multilayer perceptron (MLP); fuzzy logic. [Figure: observations plotted on Factor 1 (48%) and Factor 2 (31.5%), with the variables bio_4, bio_6, bio_7, bio_12, bio_13, bio_14, bio_15 and cons_mths and the clusters cl1, cl2 and cl3.]