PyRate for fun and research
Brianna McHorse Spring 2019
What does PyRate do?
“PyRate is a program to estimate speciation, extinction, and preservation rates from
fossil occurrence data using a Bayesian framework.”
https://github.com/dsilvestro/PyRate
With a basic PyRate analysis, we can ask questions like:
- How do speciation and extinction rates of the Canidae vary through time?
- When are there changes in speciation/extinction rates in the Crocodylia?
Occurrence data → PyRate → speciation rates, extinction rates, preservation rates (through time)
Occurrence data, how do they work?
Requirements to run PyRate
- R
- Python 2 (I usually use 2.7)
- PyRate
    - Download the PyRate repository from https://github.com/dsilvestro/PyRate (click ‘clone or download’ and download as a .zip, then unzip)
- Occurrence data
    - Check out MioMap for mammals (https://ucmp.berkeley.edu/miomap/)
    - Paleobiodb or fossilworks for most things (http://paleobiodb.org, http://fossilworks.org/)
- Optional: powerful computer and/or cluster access
    - This makes life easier when you’re doing lots of replicates, which we’ll talk about later
Optional follow-along step: download data
● fossilworks.org → Download → Collection, occurrence, or specimen data
● Fill in Taxon or taxa to include with a group of your choice (I suggest a well-populated family like Canidae, Felidae, Equidae, etc)
● Collection fields tab: tick boxes for maximum age (Ma) and minimum age (Ma)
● Click Create data set (at the bottom)
● Clean and modify your data as necessary :)
● Decide on a file structure (see next slide for what I use)
○ ./Data/datafile.csv refers to datafile.csv in the Data folder, which is inside the PyRate folder
○ ../Data/datafile.csv refers to the same file when your Data folder is next to the PyRate folder instead
○ One dot refers to the folder you’re in; two dots go up to the parent folder
○ Examples will proceed as if your Data folder is OUTSIDE of (next to) your PyRate folder
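If the one-dot / two-dot rule feels abstract, here is a minimal Python sketch of how the two spellings resolve. The working directory /home/me/project_name/PyRate-master is a made-up example, not a path from this deck, and the sketch only manipulates path strings (it never touches your disk).

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical working directory: the PyRate folder inside a project folder.
working_dir = PurePosixPath("/home/me/project_name/PyRate-master")


def resolve(relative_path):
    """Join a relative path onto working_dir and collapse '.' and '..' parts."""
    return posixpath.normpath(str(working_dir / relative_path))


# One dot: look for Data INSIDE the folder we are in.
inside = resolve("./Data/datafile.csv")
# Two dots: go up to the parent first, so Data sits NEXT TO the PyRate folder.
beside = resolve("../Data/datafile.csv")

print(inside)
print(beside)
```

The first path stays inside PyRate-master; the second climbs to project_name before descending into Data.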
Suggested folder structure
- project_name
    - R
        - PyRate-setup.R
    - data
        - Crocodylidae.csv
        - Canis_pbdb_data.csv
    - PyRate-master
        - all the folders/files that come with your download
    - manuscript
    - etc
In an ideal world, we would set this up as an R project.
But we’ll try not to add too many new things at a time right now.
A test run: BDS model
Birth-death with rate shifts
(birth = speciation aka origination, death = extinction, shifts = those rates can change)
This is your basic “what are rates doing through time and when do they change”
analysis.
1. Process your data in R: PyRate-setup.R
- Data cleaning (PBDB data is great, but it always has errors)
You might need to rename columns!
fossilworks.org and paleobiodb.org aren’t always
consistent with column names.
PyRate expects:
Species
min_age
max_age
Names such as min_age (Ma), or even max_ma, should work, because they begin with the same word as the expected name.
But ma_max or min_ma (which are in our fossilworks datasets) must be renamed; left as they are, they will give you a cryptic error.
It’s all part of data cleaning!
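The deck does this cleaning in R, but the renaming step itself is language-agnostic. As a rough sketch in Python: the mapping below is an assumption for illustration (ma_max and min_ma come from the slide above; ma_min is a guessed counterpart), and the rows are toy placeholders, not real occurrence data.

```python
import csv
import io

# Assumed mapping from fossilworks-style headers to the names PyRate expects.
# "ma_min" is a guessed counterpart of "ma_max"; adjust to your own download.
RENAME = {"ma_max": "max_age", "ma_min": "min_age", "min_ma": "min_age"}


def rename_columns(csv_text, rename=RENAME):
    """Return csv_text with header names replaced according to `rename`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = [rename.get(name.strip(), name.strip()) for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)      # corrected header
    writer.writerows(rows[1:])   # data rows are left untouched
    return out.getvalue()


# Toy table with placeholder ages, only to show the header change.
toy = "Species,ma_max,ma_min\nTaxon a,10.0,5.0\nTaxon b,3.0,0.0\n"
cleaned = rename_columns(toy)
print(cleaned)
```

Only the header row changes; anything not listed in the mapping passes through as-is.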
- Define extant taxa
- extant = c("Canis rufus", "Canis lupus", "Canis aureus", "Canis latrans", "Canis mesomelas", "Canis anthus", "Pseudalopex gymnocercus", "Canis adustus", "Canis familiaris")
OR:
- extant = c("Crocodylus acutus", "Crocodylus intermedius", "Crocodylus johnsoni", "Crocodylus mindorensis", "Crocodylus moreletii", "Crocodylus niloticus", "Crocodylus novaeguineae", "Crocodylus palustris", "Crocodylus porosus", "Crocodylus rhombifer", "Crocodylus siamensis", "Crocodylus suchus", "Osteolaemus tetraspis", "Mecistops cataphractus", "Mecistops leptorhynchus")
- Source the utilities file: source("../PyRate-master/pyrate_utilities.r")
- Parse your data: extract.ages.pbdb(file= "../data/[data-file].csv",extant_species=extant)
Remember our file structure?
We’re in the R folder, and our R Project thinks that’s home.
‘..’ tells the program that it needs to go up one folder first, before looking for the PyRate-master or the data folders.
2. Open up the command prompt
If you’re on Windows, enclose file paths in double quotes ("), not single quotes (')
1. Check working directory
> chdir [Windows]
> pwd [Mac terminal]
2. Set working directory to the folder project_name
> cd C:/Users/Bri/Desktop/awesome_project [Windows & Mac]
3. Check out data info
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -data_info
Both PyRate-master and Data are folders in our current
folder, or working directory: project_name. So, we use a
single . to access them.
Use Python and the stuff in the PyRate folder on this data file to give me the data info.
3. Run your BDS analysis
Now we run the analysis, specifying a few parameters with flags (those are the things that come after a dash).
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -A 4 -mG -n 200000 -s 5000
Same as before: tell Python where PyRate is and which data file to use. Everything after that is a flag:
-A 4         use algorithm 4 (RJMCMC)
-mG          allow heterogeneity in preservation rate across lineages
-n 200000    do 200k iterations
-s 5000      record values every 5k iterations
https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_1.md#defining-the-preservation-model
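To see how the pieces of that command fit together, here is a small Python sketch that assembles the same call as a list of arguments and prints it, without running anything. The data file name example_PyRate.py is a hypothetical stand-in for your own [data_file]_PyRate.py; the flags and values are the ones from the slide.

```python
import shlex

pyrate_script = "./PyRate-master/PyRate.py"
data_file = "./data/example_PyRate.py"  # hypothetical name; use your own file

iterations = 200000    # value passed to -n
sampling_freq = 5000   # value passed to -s

# Each flag and its value is a separate item, exactly as the shell would split them.
command = [
    "python", pyrate_script, data_file,
    "-A", "4",                    # algorithm 4 (RJMCMC)
    "-mG",                        # preservation-rate heterogeneity across lineages
    "-n", str(iterations),
    "-s", str(sampling_freq),
]

print(shlex.join(command))

# Recording every 5,000th of 200,000 iterations gives about this many samples.
recorded_samples = iterations // sampling_freq
print(recorded_samples)
```

The sample count is worth keeping in mind for later: a burn-in is counted in recorded samples, not in iterations. Whether the starting state is also written to the log is not something this deck states, so treat the count as approximate.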
A wild folder appeared!
- project_name
    - R
        - PyRate-setup.R
    - data
        - pyrate_mcmc_logs
        - Crocodylidae.csv
        - Canis_pbdb_data.csv
    - PyRate-master
        - all the folders/files that come with your download
    - manuscript
    - etc
4. Look at your results
This folder has results files in it.
[data_file]_1_Grj_ex_rates.log
[data_file]_1_Grj_mcmc.log
[data_file]_1_Grj_sp_rates.log
[data_file]_1_Grj_sum.log
Summarize model probabilities: how many rate shifts happened?
> python "./PyRate-master/PyRate.py" -mProb "./data/pyrate_mcmc_logs/[data_file]_1_Grj_mcmc.log" -b 10
Give me the rate shift probabilities from the MCMC logs in our results folder, with a burn-in of 10 samples (aka: drop the first 10 because the parameters were wandering around).
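Burn-in is just "throw away the start of the chain". A toy Python sketch, assuming a chain of 40 recorded samples (what -n 200000 with -s 5000 would give, roughly); the numbers stand in for logged values and are not PyRate output, and this is not how PyRate implements it internally.

```python
def drop_burn_in(samples, burn_in):
    """Return the samples left after discarding the first `burn_in` of them."""
    if burn_in < 0:
        raise ValueError("burn_in must be zero or positive")
    return samples[burn_in:]


# Stand-ins for 40 recorded samples; sample i is simply labelled i.
toy_chain = list(range(40))

kept = drop_burn_in(toy_chain, 10)
print(len(kept))   # how many samples are summarized
print(kept[0])     # the first sample that survives
```

With 40 samples, a burn-in of 10 discards the first quarter of the chain and summarizes the remaining 30.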
Plot your results: what does it look like??
> python "./PyRate-master/PyRate.py" -plotRJ "./data/pyrate_mcmc_logs/" -b 10
> Rscript "./data/pyrate_mcmc_logs/RTT_plots.r"
What else can we do with PyRate?
1. Trait-correlated diversification models
2. Multivariate birth-death models
With these further analyses, we can ask questions like:
- Do larger-bodied canids go extinct more often?
- Do any/all of global temperature, the genus-level diversity of mammals, and
global proportion of swampland relative to other habitats correlate with
speciation or extinction in crocodylids?
Further analysis: Covar model
A trait covariation model lets speciation and extinction vary, per lineage, as a
function of an estimated correlation with a continuous trait.
Does larger body mass correlate with higher extinction rates in canids?
1. Provide a trait data file
We want a tab-separated text file of just two columns: Species and Trait.
This is usually easiest to make in R. We’ll put it in the data folder.
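The deck builds this file in R; for comparison, here is a minimal Python sketch of the same two-column, tab-separated layout. The trait values are placeholders, NOT real body masses, and the function name trait_table is made up for this example.

```python
import csv
import io

# Placeholder trait values for illustration only (not real measurements).
traits = {"Canis lupus": 1.0, "Canis latrans": 2.0}


def trait_table(traits):
    """Return a tab-separated table with a Species column and a Trait column."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Species", "Trait"])
    for species, value in sorted(traits.items()):
        writer.writerow([species, value])
    return out.getvalue()


table_text = trait_table(traits)
print(table_text)
```

To save it where the later command expects it, you could write table_text to ./data/trait_data.txt with a normal open(..., "w") call.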
2. Run your Covar analysis
Again, we run the analysis, specifying a few parameters with flags.
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -trait_file "./data/trait_data.txt" -mCov 2 -logT 2
Flags we’re using:
-trait_file    says where the trait data file can be found
-mCov 2        mCov specifies which rates are tested for correlation with the trait; 2 tests correlation with extinction rates only
-logT 2        specifies to transform the data with log10 (0 would specify to not transform, and 1 is log base e)
See more at:
https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_4.md#trait-correlated-diversification
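What the three -logT settings do to a single trait value can be sketched in a few lines of Python. This mirrors the description on the slide (0 = none, 1 = log base e, 2 = log10); it is an illustration, not PyRate's own code, and transform_trait is a made-up name.

```python
import math


def transform_trait(value, log_t):
    """Apply the transformation the slide describes for -logT 0, 1, or 2."""
    if log_t == 0:
        return value              # no transformation
    if log_t == 1:
        return math.log(value)    # natural log (base e)
    if log_t == 2:
        return math.log10(value)  # log base 10
    raise ValueError("log_t must be 0, 1, or 2")


# A placeholder trait value of 100 under each setting.
for setting in (0, 1, 2):
    print(setting, transform_trait(100.0, setting))
```

Log-transforming is the usual choice for something like body mass, where values span orders of magnitude.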
-mCov 1 correlated speciation
-mCov 2 correlated extinction
-mCov 3 correlated speciation and extinction
-mCov 4 correlated preservation
-mCov 5 correlated speciation, extinction, preservation
Covar fix
Instead of using extract.ages.pbdb(), we need to use extract.ages() on a data
frame that already has the Trait data in it.
You can take Canis_pbdb_data.txt (in our data folder) and add a Trait column to it in
the same way we just did for the trait_data.txt file.
Then, call extract.ages() on it and it should work.
[Diagram relating the three models. Labels: Occurrences; BDS Model; preservation rates; origination/extinction times; origination/extinction rates; Covar Model; continuous trait; MBD Model; other clade origination/extinction times; environmental variables.]
Covar Model: Do rates correlate with a continuous trait (on a lineage-specific basis)?
MBD Model: Do rates correlate with other clade diversity or environmental factors like global temperature?
What else can we do with PyRate?
1. Trait-correlated diversification models
2. Multivariate birth-death models
3. MULTIPLE REPLICATES (discuss)

More Related Content

What's hot

MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptx
MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptxMORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptx
MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptxKuki Boruah
 
Meteorite Classification and Trajectory Modeling
Meteorite Classification and Trajectory ModelingMeteorite Classification and Trajectory Modeling
Meteorite Classification and Trajectory ModelingJessie Miller
 
Introduction to diagenesis
Introduction to diagenesisIntroduction to diagenesis
Introduction to diagenesisWajid09
 
Submarine Exhalative Deposits.pptx
Submarine Exhalative Deposits.pptxSubmarine Exhalative Deposits.pptx
Submarine Exhalative Deposits.pptxImposter7
 
Chap. 7 Radiolaria.pptx
Chap. 7 Radiolaria.pptxChap. 7 Radiolaria.pptx
Chap. 7 Radiolaria.pptxMuuminCabdulle
 
Cycles of climatic changes
Cycles of climatic changesCycles of climatic changes
Cycles of climatic changesAbdelrhim Eltijani
 
Presentation on igneous texture.pptx
Presentation on igneous texture.pptxPresentation on igneous texture.pptx
Presentation on igneous texture.pptxNareshDash4
 
Nanno planktons
Nanno planktonsNanno planktons
Nanno planktonsPramoda Raj
 
Coral reef presentation
Coral reef presentationCoral reef presentation
Coral reef presentationalmudena casado
 
trilobites
trilobitestrilobites
trilobitesPasupathi S
 
Marine environment 2015
Marine environment 2015Marine environment 2015
Marine environment 2015Sadiqul Amin
 
Gemmology notes
Gemmology notesGemmology notes
Gemmology notesPramoda Raj
 
chapt. 9 nannofossil coccolithophores (1).pptx
chapt. 9 nannofossil coccolithophores (1).pptxchapt. 9 nannofossil coccolithophores (1).pptx
chapt. 9 nannofossil coccolithophores (1).pptxMuuminCabdulle
 
Economic geology - Magmatic ore deposits_1
Economic geology - Magmatic ore deposits_1Economic geology - Magmatic ore deposits_1
Economic geology - Magmatic ore deposits_1AbdelMonem Soltan
 
An Introduction to Gemology
An Introduction to GemologyAn Introduction to Gemology
An Introduction to GemologySarina Mennasemay
 

What's hot (20)

MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptx
MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptxMORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptx
MORPHOLOGY AND SIGNIFICANCE OF CONODONT (group-D).pptx
 
Aquatic macrophytes
Aquatic macrophytesAquatic macrophytes
Aquatic macrophytes
 
Meteorite Classification and Trajectory Modeling
Meteorite Classification and Trajectory ModelingMeteorite Classification and Trajectory Modeling
Meteorite Classification and Trajectory Modeling
 
Introduction to diagenesis
Introduction to diagenesisIntroduction to diagenesis
Introduction to diagenesis
 
Submarine Exhalative Deposits.pptx
Submarine Exhalative Deposits.pptxSubmarine Exhalative Deposits.pptx
Submarine Exhalative Deposits.pptx
 
Chap. 7 Radiolaria.pptx
Chap. 7 Radiolaria.pptxChap. 7 Radiolaria.pptx
Chap. 7 Radiolaria.pptx
 
Cycles of climatic changes
Cycles of climatic changesCycles of climatic changes
Cycles of climatic changes
 
Presentation on igneous texture.pptx
Presentation on igneous texture.pptxPresentation on igneous texture.pptx
Presentation on igneous texture.pptx
 
Gabbro
GabbroGabbro
Gabbro
 
Heavy minerals; IMSF, CU
Heavy minerals; IMSF, CUHeavy minerals; IMSF, CU
Heavy minerals; IMSF, CU
 
Nanno planktons
Nanno planktonsNanno planktons
Nanno planktons
 
Calcareous microfossils by Rathinavel
Calcareous microfossils by RathinavelCalcareous microfossils by Rathinavel
Calcareous microfossils by Rathinavel
 
Coral reef presentation
Coral reef presentationCoral reef presentation
Coral reef presentation
 
trilobites
trilobitestrilobites
trilobites
 
Marine environment 2015
Marine environment 2015Marine environment 2015
Marine environment 2015
 
Gemmology notes
Gemmology notesGemmology notes
Gemmology notes
 
chapt. 9 nannofossil coccolithophores (1).pptx
chapt. 9 nannofossil coccolithophores (1).pptxchapt. 9 nannofossil coccolithophores (1).pptx
chapt. 9 nannofossil coccolithophores (1).pptx
 
Mollusc
MolluscMollusc
Mollusc
 
Economic geology - Magmatic ore deposits_1
Economic geology - Magmatic ore deposits_1Economic geology - Magmatic ore deposits_1
Economic geology - Magmatic ore deposits_1
 
An Introduction to Gemology
An Introduction to GemologyAn Introduction to Gemology
An Introduction to Gemology
 

Similar to PyRate for fun and research

Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypetPyData
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmersKevin Lee
 
Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Dag Endresen
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkEamonn Maguire
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRANRevolution Analytics
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the CloudDataMine Lab
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseBrendan Tierney
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskVĂ­ctor Zabalza
 
Easy R
Easy REasy R
Easy RAjay Ohri
 
R stata
R stataR stata
R stataAjay Ohri
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009Ian Foster
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and OutTravis Oliphant
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and MiningDaniel JACOB
 
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Databricks
 

Similar to PyRate for fun and research (20)

Robert Meyer- pypet
Robert Meyer- pypetRobert Meyer- pypet
Robert Meyer- pypet
 
Big data for SAS programmers
Big data for SAS programmersBig data for SAS programmers
Big data for SAS programmers
 
Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)Data exchange alternatives, SBIS conference in Stockholm (2008)
Data exchange alternatives, SBIS conference in Stockholm (2008)
 
Stata tutorial university of princeton
Stata tutorial university of princetonStata tutorial university of princeton
Stata tutorial university of princeton
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
HEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 TalkHEPData Open Repositories 2016 Talk
HEPData Open Repositories 2016 Talk
 
Through the firewall with miniCRAN
Through the firewall with miniCRANThrough the firewall with miniCRAN
Through the firewall with miniCRAN
 
R Analytics in the Cloud
R Analytics in the CloudR Analytics in the Cloud
R Analytics in the Cloud
 
Overview of running R in the Oracle Database
Overview of running R in the Oracle DatabaseOverview of running R in the Oracle Database
Overview of running R in the Oracle Database
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with DaskAUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
AUTOMATED DATA EXPLORATION - Building efficient analysis pipelines with Dask
 
Easy R
Easy REasy R
Easy R
 
R stata
R stataR stata
R stata
 
Extreme Scripting July 2009
Extreme Scripting July 2009Extreme Scripting July 2009
Extreme Scripting July 2009
 
Scaling PyData Up and Out
Scaling PyData Up and OutScaling PyData Up and Out
Scaling PyData Up and Out
 
Bids talk 9.18
Bids talk 9.18Bids talk 9.18
Bids talk 9.18
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
User biglm
User biglmUser biglm
User biglm
 
Odam: Open Data, Access and Mining
Odam: Open Data, Access and MiningOdam: Open Data, Access and Mining
Odam: Open Data, Access and Mining
 
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
 

Recently uploaded

Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.k64182334
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSĂ©rgio Sacani
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Work, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE PhysicsWork, Energy and Power for class 10 ICSE Physics
Work, Energy and Power for class 10 ICSE Physicsvishikhakeshava1
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
The Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravityThe Black hole shadow in Modified Gravity
The Black hole shadow in Modified GravitySubhadipsau21168
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...SĂ©rgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 

Recently uploaded (20)

Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.Genomic DNA And Complementary DNA Libraries construction.
Genomic DNA And Complementary DNA Libraries construction.
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 

PyRate for fun and research

  • 1. PyRate for fun and profit research Brianna McHorse Spring 2019
  • 2. What does PyRate do? “PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework.” https://github.com/dsilvestro/PyRate
  • 3. What does PyRate do? “PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework.”
  • 4. What does PyRate do? “PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework.” With a basic PyRate analysis, we can ask questions like: - How do speciation and extinction rates of the Canidae vary through time? - When are there changes in speciation/extinction rates in the Crocodylia? Diagram: Occurrence data → PyRate → speciation rates, extinction rates, preservation rates (through time)
  • 5. Occurrence data, how do they work?
  • 6. Requirements to run PyRate - R - Python 2 (I usually use 2.7) - PyRate - Download the PyRate repository from https://github.com/dsilvestro/PyRate (click ‘clone or download’ and download as a .zip, then unzip) - Occurrence data - Check out MioMap for mammals (https://ucmp.berkeley.edu/miomap/) - Paleobiodb or fossilworks for most things (http://paleobiodb.org, http://fossilworks.org/) - Optional: powerful computer and/or cluster access - This makes life easier when you’re doing lots of replicates, which we’ll talk about later
  • 7. Optional follow-along step: download data ● fossilworks.org → Download → Collection, occurrence, or specimen data ● Fill in Taxon or taxa to include with a group of your choice (I suggest a well-populated family like Canidae, Felidae, Equidae, etc) ● Collection fields tab: tick boxes for maximum age (Ma) and minimum age (Ma) ● Click Create data set (at the bottom) ● Clean and modify your data as necessary :) ● Decide on a file structure (see next slide for what I use) ○ ./Data/datafile.csv refers to datafile.csv in the Data folder, which is inside the PyRate folder ○ ../Data/datafile.csv does the same thing, but for when your Data folder is next to the PyRate folder ○ one dot refers to the same folder you’re in, two dots go up to the parent folder ○ Examples will proceed as if your Data folder is OUTSIDE of (next to) your PyRate folder
  • 8. Suggested folder structure - project_name - R - PyRate-setup.R - data - Crocodylidae.csv - Canis_pbdb_data.csv - PyRate-master - all the folders/files that come with your download - manuscript - etc In an ideal world, we would set this up as an R project. But we’ll try not to add too many new things at a time right now.
  • 9. What does PyRate do? “PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework.” With a basic PyRate analysis, we can ask questions like: - How do speciation and extinction rates of the Felidae vary through time? - When are there changes in speciation/extinction rates in the Crocodylia? Diagram: Occurrence data → PyRate → speciation rates, extinction rates, preservation rates (through time)
  • 10. A test run: BDS model Birth-death with rate shifts (birth = speciation aka origination, death = extinction, shifts = those rates can change) This is your basic “what are rates doing through time and when do they change” analysis.
  • 11. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) BDS Model
  • 12. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) You might need to rename columns! fossilworks.org and paleobiodb.org aren’t always consistent with column names. PyRate expects: Species, min_age, max_age. This should work with min_age (Ma), and even max_ma, because they begin with the same word. But ma_max or ma_min (which are in our fossilworks datasets) will give you a cryptic error and need to be renamed. It’s all part of data cleaning! BDS Model
  • 13. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) You might need to rename columns! fossilworks.org and paleobiodb.org aren’t always consistent with column names. PyRate expects: Species, min_age, max_age. This should work with min_age (Ma), and even max_ma, because they begin the same way. But ma_max or ma_min (which are in our fossilworks datasets) will give you a cryptic error and need to be renamed. It’s all part of data cleaning! BDS Model
  • 14. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) - Define extant taxa - extant = c("Canis rufus","Canis lupus","Canis aureus","Canis latrans","Canis mesomelas","Canis anthus","Pseudalopex gymnocercus","Canis adustus","Canis familiaris") OR: - extant = c("Crocodylus acutus", "Crocodylus intermedius", "Crocodylus johnsoni", "Crocodylus mindorensis", "Crocodylus moreletii", "Crocodylus niloticus", "Crocodylus novaeguineae", "Crocodylus palustris", "Crocodylus porosus", "Crocodylus rhombifer", "Crocodylus siamensis", "Crocodylus suchus", "Osteolaemus tetraspis", "Mecistops cataphractus", "Mecistops leptorhynchus") BDS Model
  • 15. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) - Define extant taxa - extant = c("Canis rufus","Canis lupus","Canis aureus","Canis latrans","Canis mesomelas","Canis anthus","Pseudalopex gymnocercus","Canis adustus","Canis familiaris") OR: - extant = c("Crocodylus acutus", "Crocodylus intermedius", "Crocodylus johnsoni", "Crocodylus mindorensis", "Crocodylus moreletii", "Crocodylus niloticus", "Crocodylus novaeguineae", "Crocodylus palustris", "Crocodylus porosus", "Crocodylus rhombifer", "Crocodylus siamensis", "Crocodylus suchus", "Osteolaemus tetraspis", "Mecistops cataphractus", "Mecistops leptorhynchus") - Source the utilities file: source("../PyRate-master/pyrate_utilities.r") - Parse your data: extract.ages.pbdb(file= "../data/[data-file].csv",extant_species=extant) BDS Model
  • 16. 1. Process your data in R: PyRate-setup.R - Data cleaning (PBDB data is great, but it always has errors) - Define extant taxa - extant = c("Canis rufus","Canis lupus","Canis aureus","Canis latrans","Canis mesomelas","Canis anthus","Pseudalopex gymnocercus","Canis adustus","Canis familiaris") OR: - extant = c("Crocodylus acutus", "Crocodylus intermedius", "Crocodylus johnsoni", "Crocodylus mindorensis", "Crocodylus moreletii", "Crocodylus niloticus", "Crocodylus novaeguineae", "Crocodylus palustris", "Crocodylus porosus", "Crocodylus rhombifer", "Crocodylus siamensis", "Crocodylus suchus", "Osteolaemus tetraspis", "Mecistops cataphractus", "Mecistops leptorhynchus") - Source the utilities file: source("../PyRate-master/pyrate_utilities.r") - Parse your data: extract.ages.pbdb(file= "../data/[data-file].csv",extant_species=extant) Remember our file structure? We’re in the R folder, and our R Project thinks that’s home. ‘..’ tells the program that it needs to go up one folder first, before looking for the PyRate-master or the data folders. BDS Model
  • 17. 2. Open up the command prompt If you’re on Windows, enclose file paths with " not with ' 1. Check working directory > chdir [Windows] > pwd [Mac terminal] 2. Set working directory to the folder project_name > cd C:/Users/Bri/Desktop/awesome_project [Windows & Mac] 3. Check out data info > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -data_info BDS Model
  • 18. 2. Open up the command prompt If you’re on Windows, enclose file paths with " not with ' 1. Check working directory > dir [Windows] > pwd [Mac terminal] 2. Set working directory to the folder project_name > cd C:/Users/Bri/Desktop/awesome_project [Windows & Mac] 3. Check out data info > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -data_info BDS Model Both PyRate-master and data are folders in our current folder, or working directory: project_name. So, we use a single . to access them.
  • 19. 2. Open up the command prompt If you’re on Windows, enclose file paths with " not with ' 1. Check working directory > dir [Windows] > pwd [Mac terminal] 2. Set working directory to the folder project_name > cd C:/Users/Bri/Desktop/awesome_project [Windows & Mac] 3. Check out data info > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -data_info BDS Model Translation: use Python and the stuff in the PyRate folder on this data file to give me the data info
  • 20. 3. Run your BDS analysis Now we run the analysis, specifying a few parameters with flags (those are the things that come after a dash). > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -A 4 -mG -n 200000 -s 5000 BDS Model Same as before: tell Python where PyRate is and which data file to use. These are flags! -A 4: use algorithm 4 (RJMCMC); -mG: allow heterogeneity in preservation rate across lineages; -n 200000: do 200k iterations; -s 5000: record values every 5k iterations. https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_1.md#defining-the-preservation-model
  • 21. 3. Run your BDS analysis BDS Model
  • 22. A wild folder appeared! - project_name - R - PyRate-setup.R - data - pyrate_mcmc_logs - Crocodylidae.csv - Canis_pbdb_data.csv - PyRate-master - all the folders/files that come with your download - manuscript - etc 4. Look at your results BDS Model
  • 23. A wild folder appeared! - project_name - R - PyRate-setup.R - data - pyrate_mcmc_logs - Crocodylidae.csv - Canis_pbdb_data.csv - PyRate-master - all the folders/files that come with your download - manuscript - etc 4. Look at your results BDS Model This folder has results files in it. [data_file]_1_Grj_ex_rates.log [data_file]_1_Grj_mcmc.log [data_file]_1_Grj_sp_rates.log [data_file]_1_Grj_sum.log
  • 24. Summarize model probabilities: how many rate shifts happened? > python "./PyRate-master/PyRate.py" -mProb "./data/pyrate_mcmc_logs/[data_file]_1_Grj_mcmc.log" -b 10 4. Look at your results BDS Model Translation: give me the rate shift probabilities from the MCMC log in our results folder, with a burn-in of 10 samples (aka: drop the first 10 because the parameters were wandering around)
  • 25. Summarize model probabilities: how many rate shifts happened? 4. Look at your results BDS Model
  • 26. Summarize model probabilities: how many rate shifts happened? Plot your results: what does it look like?? > python "./PyRate-master/PyRate.py" -plotRJ "./data/pyrate_mcmc_logs/" -b 10 4. Look at your results BDS Model
  • 27. Summarize model probabilities: how many rate shifts happened? Plot your results: what does it look like?? > python "./PyRate-master/PyRate.py" -plotRJ "./data/pyrate_mcmc_logs/" -b 10 > Rscript "./data/pyrate_mcmc_logs/RTT_plots.r" 4. Look at your results BDS Model
  • 29. What else can we do with PyRate? 1. Trait-correlated diversification models 2. Multivariate birth-death models With these further analyses, we can ask questions like: - Do larger-bodied canids go extinct more often? - Do any/all of global temperature, the genus-level diversity of mammals, and global proportion of swampland relative to other habitats correlate with speciation or extinction in crocodylids?
  • 30. Further analysis: Covar model A trait covariation model lets speciation and extinction vary, per lineage, as a function of an estimated correlation with a continuous trait. Does larger body mass correlate with higher extinction rates in canids? Covar Model
  • 31. 1. Provide a trait data file We want a tab-separated text file of just two columns: Species and Trait. This is usually easiest to make in R. We’ll put it in the data folder. Covar Model
  • 32. 2. Run your Covar analysis Again, we run the analysis, specifying a few parameters with flags. > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -trait_file "./data/trait_data.txt" -mCov 2 -logT 2 Covar Model Flags we’re using: -trait_file says where the trait data file can be found; -mCov 2: mCov specifies which rates are tested for correlation with the trait, and 2 tests correlation with extinction rates only; -logT 2 specifies to transform the data with log10 (0 would specify to not transform, and 1 is log base e). See more at: https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_4.md#trait-correlated-diversification
  • 33. 2. Run your Covar analysis Again, we run the analysis, specifying a few parameters with flags. > python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -trait_file "./data/trait_data.txt" -mCov 2 -logT 2 Covar Model Flags we’re using: -trait_file says where the trait data file can be found; -mCov 2: mCov specifies which rates are tested for correlation with the trait, and 2 tests correlation with extinction rates only; -logT 2 specifies to transform the data with log10 (0 would specify to not transform, and 1 is log base e). See more at: https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_4.md#trait-correlated-diversification Options: -mCov 1 correlated speciation; -mCov 2 correlated extinction; -mCov 3 correlated speciation and extinction; -mCov 4 correlated preservation; -mCov 5 correlated speciation, extinction, preservation
  • 34.
  • 35. Covar fix Instead of using extract.ages.pbdb(), we need to use extract.ages() on a data frame that already has the Trait data in it. You can take Canis_pbdb_data.csv (in our data folder) and add a Trait column to it in the same way we just did for the trait_data.txt file. Then, call extract.ages() on it and it should work.
  • 36. What else can we do with PyRate? 1. Trait-correlated diversification models 2. Multivariate birth-death models With these further analyses, we can ask questions like: - Do larger-bodied canids go extinct more often? - Do any/all of global temperature, the genus-level diversity of mammals, and global proportion of swampland relative to other habitats correlate with speciation or extinction in crocodylids?
  • 37. What else can we do with PyRate? 1. Trait-correlated diversification models 2. Multivariate birth-death models BDS Model Preservation rates Origination/ extinction times Origination/ extinction rates Other clade origination/ extinction times Environmental variables MBD Model Covar Model Continuous trait Occurrences Do rates correlate with other clade diversity or environmental factors like global temperature? Do rates correlate with a continuous trait (on a lineage-specific basis)?
  • 38. What else can we do with PyRate? 1. Trait-correlated diversification models 2. Multivariate birth-death models 3. MULTIPLE REPLICATES (discuss)
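
The column-naming advice from the data-processing slides can be sketched as a quick pre-flight check. The slides do this step in R; this is only a minimal Python sketch of the same idea. The helper names (rename_age_columns, check_header) and the rename table are hypothetical, and the only rule encoded is the one stated on the slides: a Species column plus age columns that start with min_ and max_.

```python
# Hypothetical rename table: fossilworks-style names mapped to the names
# the slides say PyRate expects. Extend it for your own download.
RENAMES = {"ma_max": "max_age", "ma_min": "min_age"}


def rename_age_columns(header):
    """Return a copy of the header with known age columns renamed."""
    return [RENAMES.get(name, name) for name in header]


def check_header(header):
    """Return the required columns that appear to be missing (empty list if fine)."""
    missing = []
    if "Species" not in header:
        missing.append("Species")
    # Per the slides, age columns only need to START with min_ / max_.
    for prefix in ("min_", "max_"):
        if not any(name.startswith(prefix) for name in header):
            missing.append(prefix + "age")
    return missing


header = ["Species", "ma_max", "ma_min"]
print(check_header(header))                      # age columns flagged as missing
print(check_header(rename_age_columns(header)))  # nothing flagged after renaming
```

Running a check like this before extract.ages.pbdb() turns the cryptic error into a readable list of columns to fix.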

Editor's Notes

  1. https://github.com/dsilvestro/PyRate See tutorials at: https://github.com/dsilvestro/PyRate/tree/master/tutorials They’re fairly regularly updated, but sometimes have slightly out-of-date syntax.
  2. Diversification rates are an entire field, because fundamentally, it’s interesting to know why clades are shaped the way they are. Did lots of things go extinct really fast? Did speciation drop off a cliff? It’s Bayesian because we start with data and ask: given these data (and our priors), how probable are these parameter values? (As opposed to ‘standard’ or frequentist statistics, where you start with the parameters and figure out the probability of getting your data.) Then we make some small tweaks to the proposed parameters and test again, which is the Markov Chain Monte Carlo bit. We do all of this using occurrence data.
  3. An occurrence is literally just that: someone found a fossil of a thing, decided what it was, and put it in a database. We’ll work with the Paleobiology Database today, a very common source. The PBDB only works from published occurrences, so it’s smaller but more curated than some others. You can access the data from fossilworks.org or paleobiodb.org. They are run by different people but should technically have the same content.
  4. Install R: https://www.r-project.org/ I recommend RStudio as a GUI for working with R: https://www.rstudio.com/. Although many of the basic tasks for setting up a PyRate analysis can be done by opening R at the command line, it’s not as comfortable as using RStudio for most people. How to check if you have Python 2 installed (also has download instructions): https://edu.google.com/openonline/course-builder/docs/1.10/set-up-course-builder/check-for-python.html If you already have Python 3 but not Python 2, here are helpful pointers for Windows and for Mac. You may need to do some searching about setting up virtual environments. For scientific purposes, look into the Anaconda distribution, which comes pre-packaged with a bunch of scientific computing packages like numpy, scipy, and pandas. https://www.anaconda.com/distribution/ This will also make sure that you have a couple of PyRate’s required packages installed (numpy and scipy).
  5. Data cleaning: it’s a thing. Carefully check through your data. This might include doing things like listing unique(dataframe$Species) to see a list of all the taxa. Check for typos, old names that need to be updated, etc. You might need to do things like filter out cetaceans from an artiodactyl analysis. You might need to clean or assign trait data for a later analysis. Have fun, be thorough, and save it in a script so you can see what you did and repeat it later!
  6. Here’s one option for file structure, and it’s usually how I set up my analyses. The directory project_name will be our working directory for the rest of these analyses. That means we’ll always refer to the other folders relative to it. On why R projects in RStudio are awesome: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/ R projects also enable using the here package, which is AMAZING for working with file paths across computers (i.e., in any code that you ever want to share, or if you work on multiple devices): https://github.com/jennybc/here_here
  7. As a reminder, these are the questions we’re starting with. You can basically shove your occurrence data into PyRate and get these results out. How exciting!
  8. BDS is the basic/first model of a PyRate analysis, and it’s what was diagrammed on the first slide.
  9. The processing step can be done at the command line, which is how the official tutorial shows it...or in an actual R script, which I recommend. We won’t bother with data cleaning for today’s stuff, but here is an example from my ungulates PyRate project - this is just typo fixes, dropping some unwanted genera, and then the very start of updating taxonomy.
  10. One step that can trip you up: column naming.
  11. This is the cryptic error and it’s very not-helpful if you haven’t run into this problem before. It means you need to rename your age columns to start with min_ and max_ !
  12. OK, define your vector of extant taxa. Feel free to copy and paste. One for dogs, one for crocs.
  13. Now run the command to process what you need.
  14. Since in an ideal world we would be using an R project, I’m assuming that we are working in the R subfolder - hence the two dots ../ used to go up one level, to the project_name folder, before going back into the subfolders for PyRate-master or data.
  15. You can list the files in your current working directory using ls (Mac, Linux/Unix) or dir (Windows)
  16. Note: -data_info is a REALLY good place to make sure FastPyRateC is loaded successfully. Otherwise, everything will be really slow. (It will show up as ‘Module FastPyRateC was loaded successfully.’ if it worked.) If FastPyRateC didn’t load successfully, follow the instructions here: https://github.com/dsilvestro/PyRate/blob/master/pyrate_lib/fastPyRateC/README.md
  17. A translation of what we’re telling the command prompt.
  18. There are lots of different options for a basic BDS analysis, and we set them with flags. Happily, the official tutorial is pretty clear about ~best practices, so you can get more info there. In a real analysis, you’d probably want closer to 10-50 million iterations, ish.
  19. If it’s working, your analysis will print stuff out. These are the parameters as they update with each round of Markov Chain Monte Carlo and you can mostly ignore it, but it’s fun to watch the progress go by. If you get errors here about required libraries not being installed, you need the numpy and scipy packages. If you have an Anaconda Prompt from installing Anaconda, or you’re on a Linux/Unix system (or maybe Mac?), just try pip install numpy and pip install scipy at the command line. On Windows, try python -m pip install numpy (and then the same for scipy) If those don’t work, get thee to a web search for “How to install [numpy/scipy] on [your operating system]”
  20. You don’t need to directly interact with these files unless you’re getting exact numbers and posterior means to report in a paper. All the interactions will happen from more stuff at the command line.
  21. Burn-in refers to dropping the first little bit of sampling while your parameters are jumping around like mad - so, waiting until your chain has converged. The amount to drop is going to vary by dataset. You can look at the traces of your *_mcmc.log file in a program like Tracer to determine how much to cut off. Note also that dropping the first 10 samples really means dropping 10 x the sampling interval, which is 5000 iterations. So we’re dropping 50,000 of 200k.
  22. So the 1-rate model is most probable: there is one speciation rate and one extinction rate. (We won’t put much stock in this because it’s a small dataset and we didn’t use too many iterations and we also didn’t clean the data at all, so please infer absolutely no biological relevance from these results.)
  23. It’ll print some more stuff. There may be an error about not finding the file, which is annoying but easily fixable (see next slide)
  24. ...basically, it generates an R file that will make your figures, and the R file generation works even if running it does not. So, we can manually run the R file if we got the error by using this Rscript command.
  25. Speciation up top, extinction below. Rates through time on the left, frequency of rate shift on the right. Because it’s Bayesian, the frequency means: out of all the times we sampled using MCMC and got a result, what proportion showed a rate shift in this 1-million-year bin? (You can change the bin size as one of the options when you call the plotting command.) If the frequency goes above the dotted line, it’s a significant shift. (Here we don’t have that)
  26. This isn’t real trait data! rnorm() draws from a normal distribution with a mean and standard deviation that you give it. But you can do it with real data, this is just for an example :)
  27. Here are your various model options.
  28. :( It looks like the option to include your trait file as a separate file is broken.
  29. Here’s the fix I have used in the past. Currently left as an exercise for the reader, sorry.
  30. A reminder of the other kinds of analysis we can do.
  31. A flowchart of how these analyses are related to BDS.
  32. Multiple replicates let you integrate uncertainty in the date ranges of your fossils. It’s accomplished by adding replicates = n to the extract.ages() or extract.ages.pbdb() function we used to prepare our dataset earlier, in order to get n replicates. You’ll then want to run a BDS analysis on every replicate. This is where we start to get into needing to script things, because you don’t want to manually enter 100 datasets of BDS analysis into the terminal. This is also where cluster access comes in handy (or patience and a decently-powered desktop). I am happy to provide advice and examples on these steps but it’s outside the scope of this lab meeting!
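
As a rough illustration of the scripting mentioned in the last note, here is a minimal Python sketch that builds one BDS command per replicate instead of typing each into the terminal. The function name bds_commands is hypothetical. The -j flag for selecting a replicate is an assumption that should be checked against the official PyRate tutorial; the other flags are the ones used in the BDS slides. The sketch only prints the commands and does not run them.

```python
def bds_commands(data_file, n_replicates, iterations=200000, sampling=5000):
    """Build one PyRate BDS command (as an argument list) per replicate."""
    commands = []
    for rep in range(1, n_replicates + 1):
        commands.append([
            "python", "./PyRate-master/PyRate.py", data_file,
            "-A", "4",              # RJMCMC, as in the BDS slides
            "-mG",                  # preservation heterogeneity across lineages
            "-n", str(iterations),  # number of MCMC iterations
            "-s", str(sampling),    # record values every `sampling` iterations
            "-j", str(rep),         # ASSUMED flag for choosing the replicate
        ])
    return commands


# Placeholder path from the slides; swap in your own *_PyRate.py file.
for cmd in bds_commands("./data/[data_file]_PyRate.py", 3):
    print(" ".join(cmd))
```

Each argument list could then be handed to subprocess.run() from the project_name folder, or written out as one line per job for a cluster scheduler.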