SDBP: An R package for assessing statistical reliability of phylogenetic trees
- Speedy double bootstrap method
- R package SDBP
- Data files
- How to obtain an input log likelihood file for this package
Speedy double bootstrap method
Evaluating the reliability of estimated phylogenetic trees is of critical importance in molecular phylogenetics and in other endeavors that depend on accurate phylogenetic reconstruction. The bootstrap method is a well-known computational approach to phylogenetic tree assessment and, more generally, to assessing the reliability of statistical models. However, it is known to be biased under certain circumstances, which calls its accuracy into question. Therefore, several advanced bootstrap methods have been developed to achieve higher accuracy, one of which is the speedy double bootstrap approach. In the phylogenetic tree selection problem, the speedy double bootstrap approach has been shown to have accuracy comparable to that of the double bootstrap approach while being much more computationally efficient.
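To make the basic idea concrete, the ordinary bootstrap probability of a tree can be sketched in a few lines of R. This is a generic resampling sketch (often called RELL), not the code used inside SDBP. The function name rell_bp, the toy data, and the assumption that the matrix has one row per site and one column per tree are illustrative only.

```r
# Hypothetical helper, not part of SDBP: bootstrap probabilities obtained by
# resampling site-wise log-likelihoods (RELL-style).
# Assumption: `lnl` has one row per site and one column per candidate tree.
rell_bp <- function(lnl, nboot = 1000) {
  n_sites <- nrow(lnl)
  wins <- integer(ncol(lnl))
  for (b in seq_len(nboot)) {
    idx <- sample.int(n_sites, n_sites, replace = TRUE)  # resample sites
    total <- colSums(lnl[idx, , drop = FALSE])           # log-likelihood per tree
    best <- which.max(total)                             # best tree in this replicate
    wins[best] <- wins[best] + 1L
  }
  bp <- wins / nboot                                     # share of replicates won
  names(bp) <- colnames(lnl)
  bp
}

# Toy data (simulated, not a real alignment): 200 sites, 3 trees.
set.seed(1)
toy <- cbind(t1 = rnorm(200, -5.0),
             t2 = rnorm(200, -5.1),
             t3 = rnorm(200, -5.2))
bp <- rell_bp(toy, nboot = 500)
print(bp)
```

The double bootstrap and speedy double bootstrap refine this quantity to reduce its bias; those corrections are not reproduced in this sketch.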
R package SDBP
SDBP is an R package for assessing the statistical reliability of phylogenetic trees.
It is distributed for academic use free of charge by Aizhen Ren.
The package was written in the S language using the S3 object system.
For each phylogenetic tree in a given candidate tree set, quantities called p-values are calculated
via the speedy double bootstrap method. The p-value of a tree indicates how strongly the tree
is supported by the data.
SDBP provides three types of p-value: sDBP (speedy double bootstrap probability), DBP (double bootstrap probability), and BP (bootstrap probability).
The source code should be available from the CRAN website.
SDBP and its supporting document are also available from this web site:
The SDBP package was built under R version 3.0.0, so this version (or later) is needed to install it. On Windows, start R, choose Packages in the upper toolbar, select the Install package(s) from zip files option, and then choose the SDBP_1.0.zip file downloaded from CRAN. On a UNIX machine, download the source package SDBP_1.0.tar.gz and run the following command in the directory (for example, your home directory) where you put the source file:
R CMD INSTALL SDBP_1.0.tar.gz
Then start R from the command line.
Next, type the following at the R console to load the package (the command works on both Unix and Windows machines):
library("SDBP") # load our package
If no error has appeared up to this step, the package is installed. For details on installing R packages, see Ligges (2008).
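The same installation can also be attempted from inside R. The following is a minimal sketch, assuming the source file SDBP_1.0.tar.gz sits in the current working directory; it only uses base R functions and does nothing if the file is missing.

```r
# Sketch only: check the R version, then install SDBP from a local source file.
min_version <- "3.0.0"          # minimum R version stated above
pkg_file <- "SDBP_1.0.tar.gz"   # source package file name used above

r_ok <- getRversion() >= min_version
if (!r_ok) {
  message("R ", min_version, " or later is needed; this is R ",
          as.character(getRversion()))
} else if (!file.exists(pkg_file)) {
  message(pkg_file, " not found in ", getwd())
} else {
  # repos = NULL tells install.packages() to install from the local file
  install.packages(pkg_file, repos = NULL, type = "source")
}
```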
Data files
The data files are available as supplementary material.
How to obtain an input log likelihood file for this tool
- Use the software package PAML to calculate the site-wise log-likelihoods for each tree. The output is a .lnf file, for example mam20-conc.lnf.
- Convert the format using CONSEL by executing the command "seqmt --paml .lnf", for example "seqmt --paml mam20-conc.lnf". This step is needed because our program cannot read the PAML .lnf format. It produces the site-wise log-likelihood matrix for each tree, saved in a .mt file, for example mam20-conc.mt.
- Place the .mt file obtained by CONSEL in the R working directory.
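The conversion step can also be driven from R. The sketch below assumes that the CONSEL program seqmt is on the system PATH and that mam20-conc.lnf is in the working directory; if either is missing, it only reports whether the .mt file is present.

```r
# Sketch only: run the CONSEL conversion from R when seqmt is available.
lnf_file <- "mam20-conc.lnf"                 # example PAML output named above
mt_file <- sub("\\.lnf$", ".mt", lnf_file)   # file expected from seqmt

seqmt_path <- unname(Sys.which("seqmt"))     # "" when seqmt is not on the PATH
if (nzchar(seqmt_path) && file.exists(lnf_file)) {
  system2(seqmt_path, c("--paml", lnf_file)) # same command as in the list above
}

if (file.exists(mt_file)) {
  message(mt_file, " is ready in ", getwd())
} else {
  message(mt_file, " not found in ", getwd())
}
```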
Requirement
The R package "scaleboot" is required to read .mt files. "scaleboot" is available from CRAN.
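Before reading a .mt file, it can help to confirm that both the package and the file are in place. The following is a small sketch using the example file name mam20-conc.mt; it reads the file only when "scaleboot" is installed and the file exists.

```r
# Sketch only: check for scaleboot and the .mt file before reading it.
mt_file <- "mam20-conc.mt"   # example file name used in this document

have_scaleboot <- requireNamespace("scaleboot", quietly = TRUE)
have_file <- file.exists(mt_file)

if (!have_scaleboot) {
  message('Package "scaleboot" not found; try install.packages("scaleboot").')
} else if (!have_file) {
  message(mt_file, " not found in ", getwd())
} else {
  dat <- scaleboot::read.mt(mt_file)   # site-wise log-likelihood matrix
  print(dim(dat))
}
```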
Usage
The following is an example of typical usage of "SDBP", first with the data set mam20 stored in the SDBP data file mam20.rda, and then with the mam20-conc.mt file.
> data(mam20) # load the data set named mam20
> dim(mam20) # dimensions of the mam20 matrix
[1] 5879   15
For mam20-conc.mt:
> library(scaleboot) # load the scaleboot library
> dat <- read.mt("mam20-conc.mt") # read the mam20-conc.mt file
> dim(dat) # dimensions of the dat matrix
Calculating the sDBP value for each tree takes only the following one line.
> result <- sdbp.default(mam20)
> result
The result is listed in decreasing order of log-likelihood.
Call:
SDBP.default(dat = dat)
SDBP double bootstrap probabilities:
    t1     t4     t3     t7     t2     t5
0.7503 0.4281 0.3794 0.3338 0.3054    ...
> summary(result)
The output is:
$Call:
sdbp.default(dat = mam20)
$coefficients
   stdErr p.value
t1 0.0043  0.7503
t4 0.0049  0.4281
t3 0.0048  0.3794
t7 0.0047  0.3338
...
attr(,"class")
[1] "summary.sdbp"
To calculate the reliability of a single tree, for example tree 2, use the command sdbpk, shown below with its output.
> result1 <- sdbpk(mam20, 2)
> result1
The output is:
Call:
sdbpk(dat = mam20, k = 2)
    t2
0.3018
The bootstrap probabilities can be calculated with the command bp, again shown with its output.
> result2 <- bp(mam20)
> result2
The output is:
Call:
bp(dat = mam20)
Bootstrap probabilities:
    t1     t4     t3     t7     t2
0.4887 0.1978 0.1128 0.0882 0.0270 ...
Similarly, the bootstrap probability for a single tree can be calculated with the command bpk(mam20), and the double bootstrap probability for a single tree with the command dbpk(mam20).
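To compare the two kinds of p-value, the values printed above can be placed side by side. The sketch below copies the sDBP and BP values for the five trees shown in both outputs; it does not call SDBP, and the 0.05 threshold is an assumed significance level chosen only for illustration.

```r
# Sketch only: values copied from the printed sdbp.default() and bp() outputs.
cmp <- data.frame(
  tree = c("t1", "t4", "t3", "t7", "t2"),
  sDBP = c(0.7503, 0.4281, 0.3794, 0.3338, 0.3054),
  BP   = c(0.4887, 0.1978, 0.1128, 0.0882, 0.0270),
  stringsAsFactors = FALSE
)

alpha <- 0.05   # assumed significance level, not prescribed by SDBP
cmp$rejected_by_sDBP <- cmp$sDBP < alpha
cmp$rejected_by_BP <- cmp$BP < alpha
print(cmp)
```

With this assumed threshold, tree t2 would be rejected on the basis of BP but not on the basis of sDBP, which shows how the choice of p-value can change the conclusion for a tree.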
Reference
Ligges, U., 2008. Programmieren mit R. Springer.
Ren, A., Ishida, T. and Akiyama, Y., 2013. Assessing statistical reliability of phylogenetic trees via a speedy double bootstrap method. Molecular Phylogenetics and Evolution, 67(2), 429-435. doi:10.1016/j.ympev.2013.02.011