
CMSY workshop - Gianpaolo Coro (ISTI-CNR)

This presentation introduces the CMSY model and the use of the BlueBRIDGE e-Infrastructure services to conduct experiments on data-limited stocks.

Published in: Technology

  1. 1. BlueBRIDGE receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 675680 www.bluebridge-vres.eu CMSY Workshop Gianpaolo Coro ISTI-CNR gianpaolo.coro@isti.cnr.it
  2. 2. Verhulst (1844) Model of Population Growth
  3. 3. The Schaefer Model (1954): Fmsy = ½ rmax; Bmsy = ½ k
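The two reference points can be derived in a few lines. This is a sketch that assumes the usual Schaefer surplus production form $P(B) = r B (1 - B/k)$, with $r$ standing for the slide's rmax; the symbol $P$ is introduced here for illustration only.

```latex
\begin{aligned}
P(B) &= r B \left(1 - \frac{B}{k}\right), \\
\frac{dP}{dB} &= r \left(1 - \frac{2B}{k}\right) = 0
  \;\Rightarrow\; B_{msy} = \tfrac{1}{2} k, \\
MSY &= P(B_{msy}) = \frac{r k}{4}, \\
F_{msy} &= \frac{MSY}{B_{msy}} = \tfrac{1}{2} r .
\end{aligned}
```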
  4. 4. CMSY: An open-source software for data-limited stock assessment http://onlinelibrary.wiley.com/doi/10.1111/faf.12190/full https://github.com/SISTA16/cmsy
  5. 5. From Catch-MSY to CMSY • Catch-MSY gave robust estimates of MSY, but biased estimates of r (too low) and k (too high). • Catch-MSY could not reliably predict biomass. • CMSY overcomes the bias and gives reasonable estimates of Fmsy and Bmsy. • CMSY gives reasonable estimates of biomass.
  6. 6. Input https://github.com/SISTA16/cmsy https://github.com/SISTA16/cmsy/blob/master/CMSY_UserGuide_24Oct16.docx
     Resilience and prior r range:
       Resilience | prior r range
       High       | 0.6 – 1.5
       Medium     | 0.2 – 0.8
       Low        | 0.05 – 0.5
       Very low   | 0.015 – 0.1
     ID File (includes the estimated status of the biomass at the beginning and the end of the time series):
       stock: her-47d3
       Name: Herring in Sub-area IV, Divisions VIId & IIIa (autumn-spawners)
       English Name: Atlantic herring
       Scientific Name: Clupea harengus
       Source: www.ices.dk
       Resilience: Medium
       StartYear: 1947
       EndYear: 2013
       Biomass status beginning: Good/Bad
       Biomass status end: Good/Bad
       Type: Biomass/CPUE/none
       Possible Crash: No
     Time Series File:
       Stock ID | Year | Catch  | Biomass/CPUE
       her-47d3 | 1947 | 581760 | 7053257
       her-47d3 | 1948 | 502100 | 6362933
       her-47d3 | 1949 | 508500 | 6070794
       her-47d3 | 1950 | 491700 | 6119555
       her-47d3 | 1951 | 600400 | 6199629
       her-47d3 | 1952 | 664400 | 6058665
       her-47d3 | 1953 | 698500 | 5950584
       her-47d3 | 1954 | 762900 | 5809471
       …        | …    | …      | …
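The resilience class in the ID file selects the prior range for r. A minimal Python sketch of that lookup follows; the table values are copied from the slide, while the function name `r_prior_range` and the label normalisation are assumptions made for illustration, not part of the CMSY code.

```python
# Prior r ranges per resilience class, copied from the slide's table.
R_PRIOR_RANGES = {
    "High": (0.6, 1.5),
    "Medium": (0.2, 0.8),
    "Low": (0.05, 0.5),
    "Very low": (0.015, 0.1),
}


def r_prior_range(resilience):
    """Return the (low, high) prior range of r for a resilience label.

    Hypothetical helper: labels are matched case-insensitively.
    """
    key = resilience.strip().capitalize()
    if key not in R_PRIOR_RANGES:
        raise ValueError(f"unknown resilience class: {resilience!r}")
    return R_PRIOR_RANGES[key]


# The herring stock on the slide is listed with Medium resilience.
low, high = r_prior_range("Medium")
```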
  7. 7. Output (example: Illex coindetii, broadtail shortfin squid): analysis charts and management charts
  8. 8. CMSY - Approach • Given a catch trend, estimate the best pair of values for the intrinsic rate of increase (r) and the carrying capacity (k) that generated the trend. • Goal: estimate r and k. Constraint: the Schaefer function $b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$ • CMSY has a double approach: Monte Carlo Analysis and Bayesian Schaefer Model.
  9. 9. Monte Carlo Analysis. $b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$ Step 1: sample all possible r and k pairs compliant with the Schaefer function and the priors. Step 2: resample in the lower tip. We search for the mean of maximum viable r-values. Step 3: divide the tip in 25 ranges. Step 4: take the median of the non-empty ranges. $MSY = \frac{r_{best}\, k_{best}}{4}$ (Chart legend: result by CMSY analysis; true value.)
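Step 1 can be illustrated with a small Python sketch. It is not the CMSY implementation: it drops the $v_s$ term, samples r and k log-uniformly (an assumption), and uses hypothetical values for the k range, the starting biomass fraction `b0_frac` and the accepted end-of-series range `end_range`. Only the first eight herring catches come from the slides.

```python
import math
import random


def schaefer_trajectory(r, k, catches, b0_frac):
    """Biomass series from b[t+1] = b[t] - c[t] + r*b[t]*(1 - b[t]/k).

    Returns None if the biomass drops to zero or below (stock collapse).
    """
    b = b0_frac * k
    series = [b]
    for c in catches:
        b = b - c + r * b * (1.0 - b / k)
        if b <= 0:
            return None
        series.append(b)
    return series


def viable_rk_pairs(catches, r_range, k_range, b0_frac, end_range, n=5000, seed=1):
    """Keep sampled (r, k) pairs whose trajectory survives and ends in end_range."""
    rng = random.Random(seed)
    viable = []
    for _ in range(n):
        # Log-uniform sampling within the prior ranges (illustrative choice).
        r = math.exp(rng.uniform(math.log(r_range[0]), math.log(r_range[1])))
        k = math.exp(rng.uniform(math.log(k_range[0]), math.log(k_range[1])))
        series = schaefer_trajectory(r, k, catches, b0_frac)
        if series is None:
            continue
        if end_range[0] <= series[-1] / k <= end_range[1]:
            viable.append((r, k))
    return viable


# First eight herring catches from the time series file on the Input slide.
herring_catches = [581760, 502100, 508500, 491700, 600400, 664400, 698500, 762900]
# Medium resilience prior for r; the other arguments are hypothetical.
pairs = viable_rk_pairs(herring_catches, (0.2, 0.8), (5e6, 3e7), 0.6, (0.2, 0.8))
print(len(pairs), "viable r-k pairs")
```

The viable pairs form the r-k cloud in which the later steps search the tip; those steps are not reproduced here.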
  10. 10. Bayesian Schaefer Analysis • If the Biomass or CPUE trends are available, CMSY increases the precision of the estimation. • Goal: estimate r and k. Constraint: the Schaefer function $b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$
  11. 11. Issues: Simple curve fitting does not work. (Chart: estimate after curve fitting.)
  12. 12. Other unpromising approaches: 1. Clustering Analysis (DBScan) 2. Trapezoidal density over the best fit r-k line 3. Gaussian Mixtures 4. Viable pairs densities. (Chart labels: Gm of the largest cluster; simulation of r density; search in the tip of the r-k triangle.)
  13. 13. Difficulty of the problem. At each step of the sampling process: • The biomass values are strongly correlated with each other • An iterative fitting model should: approximate the complete biomass curve using better and better r and k values; produce a new biomass curve correlated to the previous biomass curve; account for time dependency between the samples of one curve
  14. 14. Promising approach: Markov Chain Monte Carlo methods. (Slide labels: brain signals; robotics; biology; statistics; speech processing; mathematics.)
  15. 15. MCMC and the Schaefer function. Hierarchical model for the variables: $\theta = \{\alpha, k, r, b_0, b_1, b_2, \ldots, b_T\}$, where T is the maximum time of the biomass trend, with $b_0 = \alpha k$ and $b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$ • The Schaefer formula is used as likelihood(s) • Priors are required for $\alpha$, k and r. At each step, the MCMC produces samples for these parameters: $\theta_0 = \{\alpha_0, k_0, r_0, b_0^0, b_1^0, b_2^0, \ldots, b_T^0\}$, $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$, … After M steps: $\theta_M = \{\alpha_M, k_M, r_M, b_0^M, b_1^M, b_2^M, \ldots, b_T^M\}$. Details in Coro G. Gibbs Sampling with JAGS: Behind the Scenes. Technical report, 2017, CNR PUMA, cnr.isti/2017-B5-001 http://puma.isti.cnr.it/dfdownload.php?ident=/cnr.isti/2017-B5-001&langver=it&scelta=Metadata https://www.researchgate.net/publication/313905185_Gibbs_Sampling_with_JAGS_Behind_the_Scenes
  16. 16. MCMC and the Schaefer function • Simulating a biomass trend by means of an MCMC requires the model to produce, at each step of the sampling process, a new biomass time series by means of new values assigned to model variables • At each step the MCMC tries to simulate the whole biomass time series using new values for r and k • The newly picked values are constrained by the Schaefer function and by the prior probability distributions that we assume for the r and k variables • MCMC accounts for these constraints during the fitting phase. After several sampling and adjustment steps, the model finds the variable values that produce the best approximation of the target biomass trend. $\theta_0 = \{\alpha_0, k_0, r_0, b_0^0, b_1^0, b_2^0, \ldots, b_T^0\}$, $\theta_1 = \{\alpha_1, k_1, r_1, b_0^1, b_1^1, b_2^1, \ldots, b_T^1\}$, …, $\theta_M = \{\alpha_M, k_M, r_M, b_0^M, b_1^M, b_2^M, \ldots, b_T^M\}$
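The idea of re-simulating the whole biomass series for each new (r, k) can be sketched in Python. This sketch uses a random-walk Metropolis sampler, not the Gibbs sampling with JAGS that the slides refer to. It also assumes uniform priors on r and k, a fixed log-scale observation error `sigma`, no $v_s$ term, and $b_0$ fixed to the first observed biomass; all function names are hypothetical.

```python
import math
import random


def simulate(r, k, alpha, catches):
    """Schaefer biomass b_0..b_{T-1} starting from b_0 = alpha * k; None on collapse."""
    b = alpha * k
    out = [b]
    for c in catches[:-1]:
        b = b - c + r * b * (1.0 - b / k)
        if b <= 0:
            return None
        out.append(b)
    return out


def log_posterior(theta, catches, observed, r_range, k_range, sigma=0.1):
    """Uniform priors on r and k plus a Gaussian likelihood on log biomass."""
    r, k = theta
    if not (r_range[0] <= r <= r_range[1] and k_range[0] <= k <= k_range[1]):
        return -math.inf
    sim = simulate(r, k, observed[0] / k, catches)  # assumption: b_0 is observed
    if sim is None:
        return -math.inf
    sq = sum((math.log(o) - math.log(s)) ** 2 for o, s in zip(observed, sim))
    return -sq / (2.0 * sigma ** 2)


def metropolis(catches, observed, r_range, k_range, steps=5000, seed=0):
    """Random-walk Metropolis over (r, k) with symmetric Gaussian proposals."""
    rng = random.Random(seed)
    theta = (sum(r_range) / 2.0, sum(k_range) / 2.0)  # start in the middle of the priors
    lp = log_posterior(theta, catches, observed, r_range, k_range)
    step_r = 0.05 * (r_range[1] - r_range[0])
    step_k = 0.05 * (k_range[1] - k_range[0])
    chain = []
    for _ in range(steps):
        proposal = (theta[0] + rng.gauss(0.0, step_r), theta[1] + rng.gauss(0.0, step_k))
        lp_new = log_posterior(proposal, catches, observed, r_range, k_range)
        # Accept with probability min(1, posterior ratio).
        if lp_new > -math.inf and rng.random() < math.exp(min(0.0, lp_new - lp)):
            theta, lp = proposal, lp_new
        chain.append(theta)
    return chain
```

Each chain element plays the role of one $\theta_i$ restricted to r and k; the biomass series is recomputed from them at every step rather than sampled as separate variables.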
  17. 17. MCMC using Gibbs Sampling • The user takes model variables and designs a graph of the constraints between the variables • The system writes a posterior probability density in terms of priors, likelihoods and conditionals • The model samples variable values from each factor, using approximate or analytical forms of these factors • At each variable sampling step, the model fixes the values of the other variables • After several steps the values are likely to converge to the best estimate. (Diagram: Markov chain leading to the best estimate set $\theta^*$.) Details in Coro G. Gibbs Sampling with JAGS: Behind the Scenes. Technical report, 2017, CNR PUMA, cnr.isti/2017-B5-001 http://puma.isti.cnr.it/dfdownload.php?ident=/cnr.isti/2017-B5-001&langver=it&scelta=Metadata https://www.researchgate.net/publication/313905185_Gibbs_Sampling_with_JAGS_Behind_the_Scenes
  18. 18. Bayesian Schaefer Model (BSM). $b_{t+1} = b_t - c_t + r\, b_t \left(1 - \frac{b_t}{k}\right) v_s$ Step 1: consider the complete r, k space. Use the CMSY points as background reference only. Step 2: iteratively produce points that are compliant with the observed Schaefer function and the priors. Step 3: concentrate the search in the accumulation area. Step 4: take the geometric mean in the accumulation area. (Chart labels: estimate; proxies.)
  19. 19. Key aspects of CMSY 1. Defining the form of the distributions of the priors was crucial! This was done using 50 simulated stocks for which r and k were known. 2. Defining the initial ranges of the parameters is important. This is done by the stock “expert” when indicating the prior knowledge in the ID file. 3. A good balance was found between prior knowledge and knowledge from the data. This was done by testing the model for several years in workshops and in focus groups.
  20. 20. CMSY on simulated data • CMSY was tested against 50 simulated stocks where true r, k, MSY and biomass were known • Monte Carlo analysis included the true r-k in 100% of the cases. BSM was used as a coherence check.
  21. 21. CMSY applications. ICES: WKLife IV meeting (27-31 Oct. 2014): CMSY was applied to all the data-limited stocks proposed by ICES. http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2014/WKLIFE4/wklifeIV_2014.pdf WKLife V meeting (5-9 Oct. 2015): CMSY was applied to all the data-limited stocks proposed by ICES. http://ices.dk/sites/pub/Publication%20Reports/Expert%20Group%20Report/acom/2015/WKLIFEV/wklifeV_2015.pdf FAO: assessed CMSY as among the best-performing data-limited stock models http://www.fao.org/docrep/019/i3491e/i3491e.pdf and is building a Web interface to produce fisheries management reports using CMSY http://data.d4science.org/UHZhM2pVWW1IOXRjZk9qTytQTndqaUpjamJScDg0VVVHbWJQNStIS0N6Yz0 Oceana: based on CMSY, an Oceana study (on 400 stocks) found that fish catches in European waters could increase by 57% if stocks were managed sustainably http://oceana.org/press-center/press-releases/oceana-study-finds-fish-catches-european-waters-could-increase-57-if
  22. 22. Results on European stocks
  23. 23. Full Oceana report and status of EU stocks: R. Froese, C. Garilao, H. Winker, G. Coro, N. Demirel, A. Tsikliras, D. Dimarchopoulou, G. Scarcella, A. Sampang-Reyes (2016) http://eu.oceana.org/sites/default/files/stockstatusreport_newversion_0.pdf
  24. 24. European Stocks in 2013-2015. Analysis of 397 stocks in European Seas and adjacent waters. Froese et al. 2016. (Chart axes: Management Decision; F & Reproduction & Growth.)
  25. 25. Exploitation of 397 stocks in European Seas in 2013-2015. Note that different types of overexploitation overlap, so the numbers do not add up to 100%. Froese et al. 2016
  26. 26. Status of 397 stocks in European Seas 2013-2015. Froese et al. 2016
  27. 27. Compliance with the Common Fisheries Policy of the European Union (CFP 2013) by ecoregion, 2013-2015. Froese et al. 2016
  28. 28. Producing multi-species future fisheries scenarios. 1. Take the estimated biomass of the stocks in a certain region. 2. Evolve the relative biomasses in time starting from values in the neighbourhoods of B/Bmsy, F and Fmsy, considering different F scenarios. 3. For each evolution, cluster the B/Bmsy values and then average the values. 4. Average the averages of each evolved variable, and estimate the confidence intervals. 5. Plot the averaged evolutions. $\frac{B_{t+1}}{B_{msy}} = \frac{B_t}{B_{msy}} + 2 F_{msy} \frac{B_t}{B_{msy}} \left(1 - \frac{B_t}{2 B_{msy}}\right) - \frac{B_t}{B_{msy}} F_t$
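The projection equation on this slide can be iterated directly. A minimal Python sketch for a single stock follows; it leaves out the clustering and averaging steps. The function name and the values `f_msy = 0.2` and starting `B/Bmsy = 0.5` are hypothetical; the 0.5, 0.8 and 0.95 Fmsy scenarios are the ones named in the later slides.

```python
def project_relative_biomass(b_rel0, f_msy, f_t, years):
    """Iterate x[t+1] = x + 2*Fmsy*x*(1 - x/2) - x*F_t, with x = B[t]/Bmsy."""
    x = b_rel0
    path = [x]
    for _ in range(years):
        x = x + 2.0 * f_msy * x * (1.0 - x / 2.0) - x * f_t
        path.append(x)
    return path


f_msy = 0.2  # hypothetical value for illustration
for fraction in (0.5, 0.8, 0.95):
    path = project_relative_biomass(0.5, f_msy, fraction * f_msy, years=30)
    print(f"F = {fraction} Fmsy: B/Bmsy after 30 years = {path[-1]:.2f}")
```

Setting the right-hand side equal to x shows that, for constant F below 2 Fmsy, the relative biomass has the equilibrium B/Bmsy = 2 - F/Fmsy, which is why lower F scenarios settle at higher biomass.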
  29. 29. Percentage of Stocks at or above Bmsy. Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario. Rainer Froese – Presentation at the EU Parliament 27/02/2017
  30. 30. Percentage of Depleted Stocks. Best rebuilding under the 0.5 Fmsy scenario, worst under the 0.95 Fmsy scenario. Rainer Froese – Presentation at the EU Parliament 27/02/2017
  31. 31. Profitability. Good profits for the 0.5 – 0.8 Fmsy scenarios. Low profit for the 0.95 Fmsy scenario. Rainer Froese – Presentation at the EU Parliament 27/02/2017. $\pi_t = \frac{F_t}{F_{msy}} \left[ \frac{B_t}{B_{msy}} - \left(1 - \frac{\mu_{mean}}{100}\right) \frac{(C/MSY)_{mean}}{(F/F_{msy})_{mean}} \right]$
  32. 32. Analysis of current (2013-2015) and potential catches for 397 stocks in European Seas. Because of trophic interactions, not all stocks can support maximum yields simultaneously. Froese et al. 2016.
  33. 33. Comments on the multi-species application of CMSY (1/2) Species interactions and environmental impact are implicitly considered in surplus production models by the rate of net productivity (r), which summarizes natural mortality such as caused by predation by other species, somatic growth such as modulated by available food sources, and recruitment such as impacted by environmental conditions and by parental egg production. CMSY accounts explicitly for reduced recruitment at small stock sizes*. *R. Froese, N. Demirel, G. Coro, K. Kleisner, H. Winker, Estimating fisheries reference points from catch and resilience. Fish Fish. (in press), 10.1111/faf.12190; J.T. Schnute, L.J. Richards, “Surplus production models” in Handbook of Fish Biology and Fisheries, P.J.B. Hart, J.D. Reynolds, Eds. (Blackwell, 2002), vol. 2, pp. 105–126; T.J. Quinn, R.B. Deriso, Quantitative fish dynamics (Oxford University Press, NY, 1999)
  34. 34. Comments on the multi-species application of CMSY (2/2) Compared with age-structured models where exploitation is typically reported for a narrow range of fully selected age classes, surplus production models estimate exploitation as total catch to biomass ratio. This is similar to using the mean exploitation rate across all age classes weighted by their respective contribution to the catch. If the catch consists to a large part of juveniles that are only partly selected by the gear, then the overall rate of fishing mortality strongly underestimates the fishing mortality of the fully selected older year classes. • In order to address the problem of underestimation of fishing mortality in fully selected age classes, CMSY reduces the estimate of Fmsy as a linear function of biomass below 0.5 Bmsy: $F_{reduced} = 2\, \frac{B_t}{B_{msy}}\, F \quad \text{if } \frac{B_t}{B_{msy}} < 0.5$
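The linear reduction can be written as a small Python function. This is a sketch of the rule as stated on the slide; the name `reduced_f_msy` is hypothetical, and it assumes the slide's F denotes the unreduced Fmsy estimate.

```python
def reduced_f_msy(f_msy, b_rel):
    """Scale Fmsy by 2 * B/Bmsy when B/Bmsy < 0.5; leave it unchanged otherwise."""
    if b_rel < 0.5:
        return 2.0 * b_rel * f_msy
    return f_msy


# At B/Bmsy = 0.25 the reference point is halved; at 0.5 and above it is unchanged.
halved = reduced_f_msy(0.3, 0.25)
unchanged = reduced_f_msy(0.3, 0.8)
```

The rule is continuous at B/Bmsy = 0.5, where the scaling factor equals 1.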
  35. 35. A collaborative approach to CMSY
  36. 36. Big Data 1. Large volume 2. High generation velocity 3. Large variety 4. Untrustworthiness (veracity) 5. High complexity (variability) 6. Understandable value. Big Data: a dataset with large volume, variety and generation velocity, containing complex and untrustworthy information, that requires nonconventional methods to extract, manage and process information within a reasonable time.
  37. 37. New Science Paradigms • Open Science: make scientific research, data and dissemination accessible to all levels of an inquiring society, amateur or professional. Keywords: Open Access, Open research, Open Notebook Science • E-Science: computationally intensive science is carried out in highly distributed network environments that use large data sets and require distributed computing and collaborative tools. Keywords: Provenance of the scientific process, Scientific workflows • Science 2.0: process and publish large data sets using a collaborative approach. Share from raw data to experimental results and processes. Support collaborative experiments and Reproducibility-Repeatability-Reusability (R-R-R) of Science. Keywords: collaborative and repeatable Science
  38. 38. Requirements for IT systems • Support collaborative research and experimentation • Implement Reproducibility-Repeatability-Reusability of Science • Allow sharing data, processes and findings • Grant free access to the produced scientific knowledge • Tackle Big Data challenges • Sustainability: low operational costs, low maintenance prices • Manage heterogeneous data/processes access policies • Meet industrial processes requirements
  39. 39. Distributed e-Infrastructures e-Infrastructures enable researchers at different locations across the world to collaborate in the context of their home institutions or in national or multinational scientific initiatives. • People can work together having shared access to unique or distributed scientific facilities (including data, instruments, computing and communications). Examples: Belief, http://www.beliefproject.org/ OpenAire, http://www.openaire.eu/ i-Marine, http://www.i-marine.eu/ EU-Brazil OpenBio, http://www.eubrazilopenbio.eu/
  40. 40. D4Science.org – Hybrid Data Infrastructure: a system of systems. 1. External Systems: • Storage • Computations • Data services 2. Integration services: • Manage external systems • Harmonise data • Host data and processes • Support adaptability 3. Infrastructure resources: • Manage security • Expose Integration services • Support information exchange between services. (Diagram labels: Unified Resource Space; Powered by gCube; Enables; Integrates; D4Science.org Infrastructure; WPS; Variety/Veracity; Volume; Velocity/Variability; Data; Computational Infrastructures; Computational Services.)
  41. 41. Virtual Research Environments • Define sub-communities • Allow temporary dedicated assignment of computational, storage, and data resources • Manage policies • Support data and information sharing. (Diagram labels: D4Science.org Infrastructure; Unified Resource Space; Powered by gCube; Integrates; Enables; VRE; WPS.)
  42. 42. Virtual Research Environments Innovative, web-based, community-oriented, comprehensive, flexible, and secure working environments. • Communities are provided with applications to interact with the VRE services • Client services are provided both with APIs (Java, R) and simple HTTP-REST interfaces
  43. 43. D4Science.org Services. (Diagram labels: Mediators / Adapters; Data Analytics Services; Data Space Services; Infrastructures and Service Providers; Collaborative Services; Core Services; Resources Mgr; Catalogue; HN; AAA; VRE Mgr; Social Networking; Workspace; Users Mgmt; Standard based (e.g. CWS); Ad-hoc mediators; Search; Access; Storage; Dashboard; Algorithms; Workflows; Browse; Publish; Curation.)
  44. 44. Researchers: D4Science supports scientists in several domains. 1. More than 25 000 taxonomic studies per month www.i-marine.eu 2. More than 60 000 species distribution maps produced and hosted www.d4science.eu 3. Used to build a pan-European geothermal energy map www.egip.d4science.org 4. Processing and management of heterogeneous environmental and Earth system data www.envriplus.eu 5. Enhances communication and exchange in Linguistic Studies, Humanities, Cultural Heritage, History and Archaeology www.parthenos-project.eu
  45. 45. Society and citizens 1. CNR Smart Campus - PISA: a Smart City experiment to optimise the use of resources and reduce the environmental impact, whilst increasing the quality of life and work. www.smart-applications.area.pi.cnr.it 2. SoBigData EU project: create the Social Mining & Big Data Ecosystem, a research infrastructure for ethic-sensitive scientific discoveries and advanced applications of social data mining. www.sobigdata.eu (Slide captions: data storage and mining of the large data information flow on parking, buildings and mobility; computational platform and cloud storage to integrate data mining processes and host data and results, VA enabler.)
  46. 46. Policy Makers 1. D4Science hosts and runs the CMSY model to assess the health status of fisheries stocks http://www.cnr.it/news/index/news/id/5987 2. D4Science supports the identification of Marine Protected Areas to reduce adverse impact of human activities (e.g. fishing, aquaculture, tourism) on ecosystems, and to ensure these activities are properly embedded in policy frameworks. http://www.bluebridge-vres.eu/services/protected-area-impact-maps
  47. 47. Companies 1. Predict aquaculture revenue and business development www.bluebridge-vres.eu 2. Host and process satellite data from Copernicus 3. Collect logs from experts and centralize the network of information 4. Self-service integration of algorithms to enable Cloud computation services.d4science.org
  48. 48. Education. Lecture-style: the emphasis on course topics differs depending on the audience. Interactive: after each explained topic, students do experiments. Experimental: students reproduce the experiment shown by the teacher and possibly repeat it on their own data. Social: students communicate via messaging or the VRE discussion panel. • 1 course/year in Pisa • 1 course/year in Paris • 12 courses in Copenhagen (International Council for the Exploration of the Sea) • 38 courses all over the world, +1000 attendees. www.bluebridge-vres.eu
  49. 49. Numbers • +2000 scientists in 44 countries, • integrating +50 heterogeneous data providers, • executing +25,000 processes/month, • providing access to over a billion quality records in repositories worldwide, • 99.7% service availability, • +50 VREs hosted
  50. 50. Statistical Manager. (Diagram labels: D4Science Computational Facilities; Sharing; Setup and execution; Computing Platform.) Coro, G., Candela, L., Pagano, P., Italiano, A., & Liccardo, L. (2015). Parallelizing the execution of native data mining algorithms for computational biology. Concurrency and Computation: Practice and Experience, 27(17), 4630-4644.
  51. 51. Collaborative experiments. (Diagram labels: WS; Shared online folders; Inputs; Outputs; Results; Computational system; In the e-Infrastructure; Through third party software.)
  52. 52. Interfaces: Web Processing Service, Web Interfaces, QGIS. Process description: http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?Request=DescribeProcess&Service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY Process execution: http://dataminer-d-d4s.d4science.org/wps/WebProcessingService?request=Execute&service=WPS&Version=1.0.0&gcube-token=d7a4076c-e8c1-42fe-81e0-bdecb1e8074a&lang=en-US&Identifier=org.gcube.dataanalysis.wps.statisticalmanager.synchserver.mappedclasses.generators.CMSY&DataInputs=IDsFile=http://goo.gl/9rg3qK;StocksFile=http://goo.gl/Mp2ZLY;SelectedStock=HLH_M07 R/JAVA Client Guide: https://wiki.gcube-system.org/gcube/How_to_Interact_with_the_Statistical_Manager_by_client#WPS_Client
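The execution URL above is an ordinary key-value WPS request, so it can be assembled programmatically. The Python sketch below only builds the URL; it does not contact the service. The endpoint, process identifier and parameter names are taken from the slide, while the function name and the placeholder token are hypothetical. Unlike the slide's URL, `urlencode` percent-encodes the `DataInputs` value; the sketch assumes the server decodes it.

```python
from urllib.parse import urlencode

# Endpoint and process identifier as they appear on the slide.
WPS_ENDPOINT = "http://dataminer-d-d4s.d4science.org/wps/WebProcessingService"
CMSY_ID = (
    "org.gcube.dataanalysis.wps.statisticalmanager."
    "synchserver.mappedclasses.generators.CMSY"
)


def cmsy_execute_url(token, ids_file, stocks_file, stock):
    """Build a WPS 1.0.0 Execute URL for the CMSY process (hypothetical helper)."""
    data_inputs = f"IDsFile={ids_file};StocksFile={stocks_file};SelectedStock={stock}"
    query = {
        "request": "Execute",
        "service": "WPS",
        "Version": "1.0.0",
        "gcube-token": token,
        "lang": "en-US",
        "Identifier": CMSY_ID,
        "DataInputs": data_inputs,
    }
    return WPS_ENDPOINT + "?" + urlencode(query)


# Placeholder token; the input file links and stock code are those on the slide.
url = cmsy_execute_url(
    "YOUR-GCUBE-TOKEN", "http://goo.gl/9rg3qK", "http://goo.gl/Mp2ZLY", "HLH_M07"
)
print(url)
```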
  53. 53. (Diagram labels: WPS; REST; I.S.; Infrastructure; Infrastructure resources; Geospatial data; External infra.)
  54. 54. Advantages of integrations • The process is available as-a-Service • Invoked via communication standards • Higher computational capabilities • Automatic creation of a Web interface • Provenance management • Storage of results on a high-availability system • Collaboration and sharing • Re-usability, e.g. from other software (e.g. QGIS)
  55. 55. Innovation through integration. Vision: integration, sharing, and remote hosting help to inform people and to take decisions
  56. 56. Using CMSY https://i-marine.d4science.org/group/drumfish/drumfish
  57. 57. Thank you!
