# SDBP: An R package for assessing statistical reliability of phylogenetic trees

## Quick Links:

- Speedy double bootstrap method
- R package SDBP
- Download
- Installation
- Data files
- How to obtain an input log likelihood file for this package
- Requirement
- Usage
- Reference

## Speedy double bootstrap method

Evaluating the reliability of estimated phylogenetic trees is of critical importance in molecular phylogenetics and in other fields that depend on accurate phylogenetic reconstruction. The bootstrap method is a well-known computational approach to assessing phylogenetic trees and, more generally, the reliability of statistical models. However, it is known to be biased under certain circumstances, which calls the accuracy of the method into question. Several advanced bootstrap methods have therefore been developed to achieve higher accuracy, one of which is the speedy double bootstrap approach. For the phylogenetic tree selection problem, the speedy double bootstrap approach has been shown to have accuracy comparable to the double bootstrap approach while being much more computationally efficient.

## R package SDBP

SDBP is an R package for assessing the statistical reliability of phylogenetic trees.
It is distributed free of charge for academic use by Aizhen Ren.
The package is written in R, a dialect of the S language, using the S3 object system.
For each phylogenetic tree in a given candidate tree set, p-values are calculated
via the speedy double bootstrap method. The p-value of a tree indicates how strongly
the tree is supported by the data.

SDBP provides three types of p-value:

- sDBP (speedy double bootstrap probability)
- DBP (double bootstrap probability)
- BP (bootstrap probability)
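
To make the simplest of these quantities concrete, the following is a minimal sketch of a bootstrap probability computed from a site-wise log-likelihood matrix by resampling sites (the RELL idea). It is an illustration under stated assumptions, not the code used inside SDBP: the function `rell_bp` and the toy matrix `toy` are hypothetical names introduced here, and the toy data are random numbers rather than real log-likelihoods.

```r
# Illustrative sketch only; not SDBP's implementation.
# `loglik` is assumed to be a sites x trees matrix of site-wise log-likelihoods.
rell_bp <- function(loglik, n_boot = 1000) {
  n_sites <- nrow(loglik)
  wins <- integer(ncol(loglik))
  for (b in seq_len(n_boot)) {
    idx <- sample.int(n_sites, n_sites, replace = TRUE) # resample sites with replacement
    totals <- colSums(loglik[idx, , drop = FALSE])      # log-likelihood of each tree
    best <- which.max(totals)                           # tree that wins this replicate
    wins[best] <- wins[best] + 1L
  }
  bp <- wins / n_boot                                   # share of replicates won by each tree
  names(bp) <- colnames(loglik)
  bp
}

# Hypothetical toy data: 200 sites, 3 trees, with t1 slightly favoured on average.
set.seed(1)
toy <- matrix(rnorm(200 * 3, mean = -5), nrow = 200,
              dimnames = list(NULL, c("t1", "t2", "t3")))
toy[, "t1"] <- toy[, "t1"] + 0.2
bp_toy <- rell_bp(toy, n_boot = 500)
print(bp_toy)
```

The values sum to one because exactly one tree wins each replicate. The double bootstrap and speedy double bootstrap probabilities are corrections on top of this basic quantity and are not reproduced in this sketch.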

### Download

##### UNIX/Windows

The source code is available from the CRAN web site.

**SDBP** and its supporting document are also available from this web site:

On Windows, you can put the SDBP_1.0.zip file downloaded from CRAN anywhere on your computer, for example on the desktop. On a UNIX machine, you can put SDBP_1.0.tar.gz in your home directory.

### Installation

The SDBP package was built under R version 3.0.0, so this version (or later) is
needed to install it. On Windows, start R, choose **Packages** in
the upper toolbar, select the **Install Package(s) from zip files** option,
and then choose the **SDBP_1.0.zip** file downloaded from CRAN. On a UNIX machine,
install the source package **SDBP_1.0.tar.gz** by running the following
command in the directory where you put the source file (for example, your home directory):

    R CMD INSTALL SDBP_1.0.tar.gz

Then start R from the command line:

    R

Next, type the following at the **R console** to load the package (this command works on both UNIX and Windows machines):

    library("SDBP") # load our package

If no error has appeared up to this step, the package is installed. For details on installing R packages, see Ligges (2008).
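
As an alternative to the menu and the shell command, the same installation can usually be done from inside R. The sketch below assumes the archive SDBP_1.0.tar.gz sits in the current working directory (the variable name `pkg_file` is introduced here for illustration); on Windows the .zip file name would be used instead, without `type = "source"`.

```r
# Check the R version requirement stated above.
r_ok <- getRversion() >= "3.0.0"
if (!r_ok) stop("SDBP needs R 3.0.0 or later")

# Assumed file name and location; adjust the path to where the archive was saved.
pkg_file <- "SDBP_1.0.tar.gz"
if (file.exists(pkg_file)) {
  # repos = NULL tells install.packages() to install from a local file
  install.packages(pkg_file, repos = NULL, type = "source")
} else {
  message("Archive ", pkg_file, " not found in ", getwd())
}
```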

### Data files

The data files are available as supplementary material.

`Compressed file: mam20files.tgz (unix), mam20files.zip (win)`

### How to obtain an input log likelihood file for this tool

- Use the software package PAML to calculate the site-wise log-likelihoods for each tree. The output is a .lnf file, for example mam20-conc.lnf.
- Convert the format using CONSEL by executing the command "seqmt --paml" on the .lnf file, for example "seqmt --paml mam20-conc.lnf". This step is needed because our program cannot read the PAML .lnf format. The result is the site-wise log-likelihood matrix for each tree, saved in a .mt file, for example mam20-conc.mt.
- Place the .mt file obtained from CONSEL in the R working directory.
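
The steps above can be checked from within R before running the package. The following is a small sketch that assumes the example file names used above and that CONSEL's `seqmt` is on the search path; it only runs the conversion when both the tool and the .lnf file are present, and the variable names are introduced here for illustration.

```r
lnf_file <- "mam20-conc.lnf"  # PAML output (example name from the steps above)
mt_file  <- "mam20-conc.mt"   # CONSEL output expected by the package

# Run the CONSEL conversion only if seqmt and the .lnf file are both available.
have_seqmt <- nzchar(Sys.which("seqmt"))
if (have_seqmt && file.exists(lnf_file) && !file.exists(mt_file)) {
  system2("seqmt", c("--paml", lnf_file))
}

# Confirm that the .mt file is in the R working directory.
in_wd <- file.exists(file.path(getwd(), mt_file))
if (in_wd) {
  message("Found ", mt_file, " in ", getwd())
} else {
  message(mt_file, " is not in ", getwd(), "; copy it there or call setwd()")
}
```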

### Requirement

The R package "scaleboot" is required for reading .mt files. "scaleboot" is available via CRAN.

### Usage

The following is an example of typical usage of "SDBP", using the data set mam20 saved in the SDBP data file mam20.rda, and also the mam20-conc.mt file.

    > data(mam20) # load the data named mam20
    > dim(mam20) # dimensions of the mam20 matrix
    [1] 5879 15

For mam20-conc.mt:

    > library(scaleboot) # load the scaleboot library
    > dat <- read.mt("mam20-conc.mt") # load the mam20-conc.mt file
    > dim(dat) # dimensions of the dat matrix

Calculating the sDBP-value for each tree takes only one line:

    > result <- sdbp.default(mam20)
    > result

The results are listed in decreasing order of log-likelihood.

    Call:
    SDBP.default(dat = dat)

    SDBP double bootstrap probabilities:
    t1     t4     t3     t7     t2     t5
    0.7503 0.4281 0.3794 0.3338 0.3054 ...

    > summary(result)

The output is

    $Call:
    sdbp.default(dat = mam20)

    $coefficients
       stdErr p.value
    t1 0.0043  0.7503
    t4 0.0049  0.4281
    t3 0.0048  0.3794
    t7 0.0047  0.3338
    ...
    attr(,"class")
    [1] "summary.sdbp"

To calculate the reliability of a single tree, for example tree 2, use the command **sdbpk**, with the output shown below.

    > result1 <- sdbpk(mam20,2)
    > result1

The output is

    Call:
    sdbpk(dat = mam20, k = 2)
    t2
    0.3018

To calculate the bootstrap probability, use the command **bp**, again shown with the output.

    > result2 <- bp(mam20)

The output is as follows:

    Call:
    bp(dat = mam20)
    Bootstrap probabilities:
    t1     t4     t3     t7     t2
    0.4887 0.1978 0.1128 0.0882 0.0270 ...

To calculate the bootstrap probability for one tree, use the command **bpk(mam20)**; to calculate the double bootstrap probability for one tree, use the command **dbpk(mam20)**.

### Reference

Ligges, U., 2008. Programmieren mit R. *Springer*.

Ren, A., Ishida, T. and Akiyama, Y., 2013. Assessing statistical reliability of phylogenetic trees via a speedy double bootstrap method. *Molecular Phylogenetics and Evolution*, **67**(2), 429-435. doi:10.1016/j.ympev.2013.02.011