A brief introduction to the Bayesian analysis program PyRate for paleobiology colleagues. Given at a lab meeting, so the format is casual and a good chunk of prior knowledge is assumed.
1. PyRate for fun and profit research
Brianna McHorse Spring 2019
2. What does PyRate do?
"PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework."
https://github.com/dsilvestro/PyRate
With a basic PyRate analysis, we can ask questions like:
- How do speciation and extinction rates of the Canidae vary through time?
- When are there changes in speciation/extinction rates in the Crocodylia?
[Diagram: Occurrence Data → PyRate → speciation, extinction, and preservation rates through time]
6. Requirements to run PyRate
- R
- Python 2 (I usually use 2.7)
- PyRate
- Download the PyRate repository from https://github.com/dsilvestro/PyRate (click "Clone or download" and download as a .zip, then unzip)
- Occurrence data
- Check out MioMap for mammals (https://ucmp.berkeley.edu/miomap/)
- Paleobiodb or fossilworks for most things (http://paleobiodb.org, http://fossilworks.org/)
- Optional: powerful computer and/or cluster access
- This makes life easier when you're doing lots of replicates, which we'll talk about later
7. Optional follow-along step: download data
- fossilworks.org → Download → Collection, occurrence, or specimen data
- Fill in "Taxon or taxa to include" with a group of your choice (I suggest a well-populated family like Canidae, Felidae, Equidae, etc.)
- Collection fields tab: tick boxes for maximum age (Ma) and minimum age (Ma)
- Click "Create data set" (at the bottom)
- Clean and modify your data as necessary :)
- Decide on a file structure (see next slide for what I use)
- ./Data/datafile.csv refers to datafile.csv in the Data folder, which is inside the PyRate folder
- ../Data/datafile.csv does the same thing, but if your Data folder is next to the PyRate folder
- One dot refers to the same folder you're in; two dots go up to the parent folder
- Examples will proceed as if your Data folder is OUTSIDE of (next to) your PyRate folder
8. Suggested folder structure
- project_name
  - R
    - PyRate-setup.R
  - data
    - Crocodylidae.csv
    - Canis_pbdb_data.csv
  - PyRate-master
    - all the folders/files that come with your download
  - manuscript
  - etc.
In an ideal world, we would set this up as an R project.
But we'll try not to add too many new things at a time right now.
9. What does PyRate do?
"PyRate is a program to estimate speciation, extinction, and preservation rates from fossil occurrence data using a Bayesian framework."
With a basic PyRate analysis, we can ask questions like:
- How do speciation and extinction rates of the Felidae vary through time?
- When are there changes in speciation/extinction rates in the Crocodylia?
[Diagram: Occurrence Data → PyRate → speciation, extinction, and preservation rates through time]
10. A test run: BDS model
Birth-death with rate shifts
(birth = speciation aka origination, death = extinction, shifts = those rates can change)
This is your basic "what are rates doing through time and when do they change" analysis.
11. 1. Process your data in R: PyRate-setup.R
- Data cleaning (PBDB data is great, but it always has errors)
You might need to rename columns!
fossilworks.org and paleobiodb.org aren't always consistent with column names.
PyRate expects:
Species
min_age
max_age
This should work with "min_age (Ma)", and even max_ma, because they begin with the same word. But ma_max or min_ma (which are in our fossilworks datasets) would need to be renamed and will give you a cryptic error.
It's all part of data cleaning!
BDS Model
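The deck does this renaming in R, but if you prefer Python for the cleaning step, a minimal pandas sketch looks like the following. The species names and the fossilworks-style column names here are illustrative stand-ins, not part of the real dataset:

```python
import pandas as pd

# Toy stand-in for a fossilworks export (real files have many more columns)
df = pd.DataFrame({
    "species_name": ["Canis lepophagus", "Canis dirus"],
    "min_ma": [1.8, 0.011],
    "max_ma": [4.9, 0.25],
})

# PyRate keys on column names beginning with Species, min_age, max_age,
# so rename the fossilworks-style columns to what it expects.
df = df.rename(columns={"species_name": "Species",
                        "min_ma": "min_age",
                        "max_ma": "max_age"})

# Then write the cleaned file back out for the R/PyRate steps, e.g.:
# df.to_csv("../data/Canis_pbdb_data_clean.csv", index=False)
```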
14. 1. Process your data in R: PyRate-setup.R (continued)
- Define extant taxa
- extant = c("Canis rufus", "Canis lupus", "Canis aureus", "Canis latrans", "Canis mesomelas", "Canis anthus", "Pseudalopex gymnocercus", "Canis adustus", "Canis familiaris")
OR:
- extant = c("Crocodylus acutus", "Crocodylus intermedius", "Crocodylus johnsoni", "Crocodylus mindorensis", "Crocodylus moreletii", "Crocodylus niloticus", "Crocodylus novaeguineae", "Crocodylus palustris", "Crocodylus porosus", "Crocodylus rhombifer", "Crocodylus siamensis", "Crocodylus suchus", "Osteolaemus tetraspis", "Mecistops cataphractus", "Mecistops leptorhynchus")
- Source the utilities file: source("../PyRate-master/pyrate_utilities.r")
- Parse your data: extract.ages.pbdb(file="../data/[data-file].csv", extant_species=extant)
Remember our file structure?
We're in the R folder, and our R Project thinks that's home.
".." tells the program that it needs to go up one folder first, before looking for the PyRate-master or the data folders.
BDS Model
17. 2. Open up the command prompt
If you're on Windows, enclose file paths with " not with '
1. Check working directory
> dir [Windows]
> pwd [Mac terminal]
2. Set working directory to the folder project_name
> cd C:/Users/Bri/Desktop/awesome_project [Windows & Mac]
3. Check out data info
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -data_info
BDS Model
Both PyRate-master and data are folders in our current folder, or working directory: project_name. So, we use a single . to access them.
Translation: use Python and the stuff in the PyRate folder, on this data file, to give me the data info.
20. 3. Run your BDS analysis
Now we run the analysis, specifying a few parameters with flags (those are the things that come after a dash).
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -A 4 -mG -n 200000 -s 5000
BDS Model
Same as before: tell Python where PyRate is and which data file to use. These are flags!
-A 4 use algorithm 4 (RJMCMC)
-mG allow heterogeneity in preservation rate across lineages
-n 200000 do 200k iterations
-s 5000 record values every 5k iterations
https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_1.md#defining-the-preservation-model
22. A wild folder appeared!
- project_name
  - R
    - PyRate-setup.R
  - data
    - pyrate_mcmc_logs
    - Crocodylidae.csv
    - Canis_pbdb_data.csv
  - PyRate-master
    - all the folders/files that come with your download
  - manuscript
  - etc.
4. Look at your results
BDS Model
This folder has results files in it:
[data_file]_1_Grj_ex_rates.log
[data_file]_1_Grj_mcmc.log
[data_file]_1_Grj_sp_rates.log
[data_file]_1_Grj_sum.log
24. Summarize model probabilities: how many rate shifts happened?
> python "./PyRate-master/PyRate.py" -mProb "./data/pyrate_mcmc_logs/[data_file]_1_Grj_mcmc.log" -b 10
4. Look at your results
BDS Model
Translation: give me the rate shift probabilities from the MCMC log in our results folder, with a burn-in of 10 samples (aka: drop the first 10 because the parameters were wandering around).
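If you want to eyeball an MCMC log yourself (say, to pick a burn-in before trusting a summary), a minimal pandas sketch, assuming the tab-separated layout PyRate's .log files generally use. The rows below are made-up toy values, not real output:

```python
import io
import pandas as pd

# Toy stand-in for a PyRate *_mcmc.log (real logs are tab-separated, many columns)
log_text = (
    "it\tposterior\tlikelihood\n"
    "0\t-510.2\t-498.7\n"
    "5000\t-402.1\t-395.3\n"
    "10000\t-398.8\t-391.0\n"
)
log = pd.read_csv(io.StringIO(log_text), sep="\t")

burnin = 1               # drop the first n sampled rows (here: 1 of 3 toy rows)
post = log.iloc[burnin:] # samples kept for summarizing
print(post["posterior"].mean())
```

Remember that burn-in here is counted in sampled rows, so with -s 5000 each dropped row is 5,000 iterations.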
26. Plot your results: what does it look like??
> python "./PyRate-master/PyRate.py" -plotRJ "./data/pyrate_mcmc_logs/" -b 10
> Rscript "./data/pyrate_mcmc_logs/RTT_plots.r"
4. Look at your results
BDS Model
29. What else can we do with PyRate?
1. Trait-correlated diversification models
2. Multivariate birth-death models
With these further analyses, we can ask questions like:
- Do larger-bodied canids go extinct more often?
- Do any/all of global temperature, the genus-level diversity of mammals, and
global proportion of swampland relative to other habitats correlate with
speciation or extinction in crocodylids?
30. Further analysis: Covar model
A trait covariation model lets speciation and extinction vary, per lineage, as a
function of an estimated correlation with a continuous trait.
Does larger body mass correlate with higher extinction rates in canids?
Covar Model
31. 1. Provide a trait data file
We want a tab-separated text file of just two columns: Species and Trait.
This is usually easiest to make in R. We'll put it in the data folder.
Covar Model
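The deck builds this file in R, but here is an equivalent Python sketch. Like the rnorm() example in the notes, the trait values are fake draws from a normal distribution, and the species names and output path are placeholders; swap in real measurements for an actual analysis:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
species = ["Canis lepophagus", "Canis dirus", "Canis latrans"]  # placeholder taxa

trait = pd.DataFrame({
    "Species": species,
    # Fake body-mass values from a normal distribution (like R's rnorm) --
    # NOT real data, for demonstration only
    "Trait": rng.normal(loc=20, scale=5, size=len(species)),
})

# PyRate wants a tab-separated file with just Species and Trait columns
out_path = "trait_data.txt"  # in the project layout: "./data/trait_data.txt"
trait.to_csv(out_path, sep="\t", index=False)
```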
32. 2. Run your Covar analysis
Again, we run the analysis, specifying a few parameters with flags.
> python "./PyRate-master/PyRate.py" "./data/[data_file]_PyRate.py" -trait_file "./data/trait_data.txt" -mCov 2 -logT 2
Covar Model
Flags we're using:
-trait_file says where the trait data file can be found
-mCov 2 mCov specifies which algorithm to use; 2 tests correlation with extinction rates only
-logT 2 specifies to transform the trait data with log10 (0 would specify no transform, and 1 is log base e)
See more at:
https://github.com/dsilvestro/PyRate/blob/master/tutorials/pyrate_tutorial_4.md#trait-correlated-diversification
All the mCov options:
-mCov 1 correlated speciation
-mCov 2 correlated extinction
-mCov 3 correlated speciation and extinction
-mCov 4 correlated preservation
-mCov 5 correlated speciation, extinction, and preservation
34.
35. Covar fix
Instead of using extract.ages.pbdb(), we need to use extract.ages() on a data
frame that already has the Trait data in it.
You can take Canis_pbdb_data.txt (in our data folder) and add a Trait column to it in
the same way we just did for the trait_data.txt file.
Then, call extract.ages() on it and it should work.
36. What else can we do with PyRate?
1. Trait-correlated diversification models
2. Multivariate birth-death models
[Flowchart: Occurrences feed the BDS Model, which estimates preservation rates, origination/extinction times, and origination/extinction rates. The MBD Model adds other clades' origination/extinction times and environmental variables, asking: do rates correlate with other clade diversity or environmental factors like global temperature? The Covar Model adds a continuous trait, asking: do rates correlate with a continuous trait (on a lineage-specific basis)?]
38. What else can we do with PyRate?
1. Trait-correlated diversification models
2. Multivariate birth-death models
3. MULTIPLE REPLICATES (discuss)
Editor's Notes
https://github.com/dsilvestro/PyRate
See tutorials at: https://github.com/dsilvestro/PyRate/tree/master/tutorials
They're fairly regularly updated, but sometimes have slightly out-of-date syntax.
Diversification rates are an entire field, because fundamentally, it's interesting to know why clades are shaped the way they are. Did lots of things go extinct really fast? Did speciation drop off a cliff?
It's Bayesian because we start with data and say: given these data, what's the likelihood of these parameters? (As opposed to "standard" or frequentist statistics, where you start with the parameters and figure out the probability of getting your data.) Then we make some small tweaks to the proposed parameters and test again, which is the Markov chain Monte Carlo bit.
We do all of this using occurrence data.
An occurrence is literally just that: someone found a fossil of a thing, decided what it was, and put it in a database. We'll work with the Paleobiology Database today, a very common source. The PBDB only works from published occurrences, so it's smaller but more curated than some others.
You can access the data from fossilworks.org or paleobiodb.org. They are run by different people but should technically have the same content.
Install R: https://www.r-project.org/
I recommend RStudio as a GUI for working with R: https://www.rstudio.com/. Although many of the basic tasks for setting up a PyRate analysis can be done by opening R at the command line, it's not as comfortable as using RStudio for most people.
How to check if you have Python 2 installed (also has download instructions): https://edu.google.com/openonline/course-builder/docs/1.10/set-up-course-builder/check-for-python.html
If you already have Python 3 but not Python 2, here's helpful pointers for Windows and for Mac. You may need to do some searching about setting up virtual environments.
For scientific purposes, look into the Anaconda distribution, which comes pre-packaged with a bunch of scientific computing packages like numpy, scipy, and pandas. https://www.anaconda.com/distribution/
This will also make sure that you have a couple of PyRate's required packages installed (numpy and scipy).
Data cleaning: it's a thing.
Carefully check through your data. This might include doing things like listing unique(dataframe$Species) to see a list of all the taxa. Check for typos, old names that need to be updated, etc.
You might need to do things like filter out cetaceans from an artiodactyl analysis.
You might need to clean or assign trait data for a later analysis.
Have fun, be thorough, and save it in a script so you can see what you did and repeat it later!
Here's one option for file structure, and it's usually how I set up my analyses.
The directory project_name will be our working directory for the rest of these analyses. That means we'll always refer to the other folders relative to it.
On why R projects in RStudio are awesome: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
R projects also enable using the here package, which is AMAZING for working with file paths across computers (i.e., in any code that you ever want to share, or if you work on multiple devices): https://github.com/jennybc/here_here
As a reminder, these are the questions we're starting with. You can basically shove your occurrence data into PyRate and get these results out. How exciting!
BDS is the basic/first model of a PyRate analysis, and it's what was diagrammed on the first slide.
The processing step can be done at the command line, which is how the official tutorial shows it... or in an actual R script, which I recommend.
We won't bother with data cleaning for today's stuff, but here is an example from my ungulates PyRate project - this is just typo fixes, dropping some unwanted genera, and then the very start of updating taxonomy.
One step that can trip you up: column naming.
This is the cryptic error, and it's very not-helpful if you haven't run into this problem before. It means you need to rename your age columns to start with min_ and max_!
OK, define your vector of extant taxa. Feel free to copy and paste. One for dogs, one for crocs.
Now run the command to process what you need.
Since in an ideal world we would be using an R project, I'm assuming that we are working in the R subfolder - hence the two dots ../ used to go up one level, to the project_name folder, before going back into the subfolders for PyRate-master or data.
You can list the files in your current working directory using ls (Mac, Linux/Unix) or dir (Windows)
Note: -data_info is a REALLY good place to make sure FastPyRateC is loaded successfully. Otherwise, everything will be really slow. (It will show up as "Module FastPyRateC was loaded successfully." if it worked.)
If FastPyRateC didn't load successfully, follow the instructions here: https://github.com/dsilvestro/PyRate/blob/master/pyrate_lib/fastPyRateC/README.md
A translation of what we're telling the command prompt.
There are lots of different options for a basic BDS analysis, and we set them with flags. Happily, the official tutorial is pretty clear about ~best practices, so you can get more info there.
In a real analysis, you'd probably want closer to 10-50 million iterations, ish.
If it's working, your analysis will print stuff out. These are the parameters as they update with each round of Markov chain Monte Carlo; you can mostly ignore it, but it's fun to watch the progress go by.
If you get errors here about required libraries not being installed, you need the numpy and scipy packages.
If you have an Anaconda Prompt from installing Anaconda, or you're on a Linux/Unix system (or maybe Mac?), just try pip install numpy and pip install scipy at the command line.
On Windows, try python -m pip install numpy (and then the same for scipy).
If those don't work, get thee to a web search for "How to install [numpy/scipy] on [your operating system]".
You don't need to directly interact with these files unless you're getting exact numbers and posterior means to report in a paper. All the interactions will happen from more stuff at the command line.
Burn-in refers to dropping the first little bit of sampling while your parameters are jumping around like mad - so, waiting until your chain has converged.
The amount to drop is going to vary by dataset. You can look at the traces of your *_mcmc.log file in a program like tracer to determine how much to cut off.
Note also that the first 10 is really 10 x the sampling rate, which is 5000. So we're dropping 50,000 of 200k.
So the 1-rate model is most probable: there is one speciation rate and one extinction rate. (We won't put much stock in this, because it's a small dataset, we didn't use many iterations, and we also didn't clean the data at all, so please infer absolutely no biological relevance from these results.)
It'll print some more stuff. There may be an error about not finding the file, which is annoying but easily fixable (see next slide).
...basically, it generates an R file that will make your figures, and the R file generation works even if running it does not. So, we can manually run the R file if we got the error by using this Rscript command.
Speciation up top, extinction below. Rates through time on the left, frequency of rate shift on the right.
Because it's Bayesian, the frequency means: out of all the times we sampled using MCMC and got a result, what proportion showed a rate shift in this 1-million-year bin? (You can change the bin size as one of the options when you call the plotting command.) If the frequency goes above the dotted line, it's a significant shift. (Here we don't have that.)
This isn't real trait data! rnorm() draws from a normal distribution with a mean and standard deviation that you give it.
But you can do it with real data, this is just for an example :)
Here are your various model options.
:(
It looks like the option to include your trait file as a separate file is broken.
Here's the fix I have used in the past. Currently left as an exercise for the reader, sorry.
A reminder of the other kinds of analysis we can do.
A flowchart of how these analyses are related to BDS.
Multiple replicates let you integrate uncertainty in the date ranges of your fossils.
It's accomplished by adding replicates = n to the extract.ages() or extract.ages.pbdb() function we used to prepare our dataset earlier, in order to get n replicates.
You'll then want to run a BDS analysis on every replicate. This is where we start to get into needing to script things, because you don't want to manually enter 100 datasets of BDS analysis into the terminal.
This is also where cluster access comes in handy (or patience and a decently-powered desktop).
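A sketch of what that scripting might look like in Python. The paths, flag settings, and replicate count are placeholders carried over from the slides, and the use of -j to pick a replicate follows the PyRate tutorials; adapt all of it to your own setup (or translate it into a cluster job script):

```python
import subprocess  # used if you actually launch the runs

# Hypothetical loop over 10 replicates of a prepared PyRate dataset.
# Per the PyRate tutorials, -j selects which replicate to analyze.
commands = []
for j in range(1, 11):
    cmd = ["python", "./PyRate-master/PyRate.py",
           "./data/[data_file]_PyRate.py",
           "-j", str(j),
           "-A", "4", "-mG", "-n", "200000", "-s", "5000"]
    commands.append(cmd)
    # subprocess.run(cmd, check=True)  # uncomment to actually launch each run

print(len(commands))  # number of replicate runs queued
```

On a cluster you would typically submit each command as its own job rather than running them sequentially.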
I am happy to provide advice and examples on these steps, but it's outside the scope of this lab meeting!