GHOST-MP | Manual

Requirements

C++ compiler that supports OpenMP
Message Passing Interface library
Boost C++ Libraries
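
Before building, it can help to confirm that the command-line tools this manual relies on are on your PATH. The sketch below is only an illustration: check_tools is a hypothetical helper, not part of GHOST-MP, and it checks only for make and mpiexec (the two commands used later in this manual). It does not verify OpenMP support or the Boost installation.

```shell
# Hypothetical helper: report which of the given tools are on the PATH.
# Returns 0 if all were found, 1 if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# make and mpiexec are the commands used in this manual.
check_tools make mpiexec || echo "Install the missing tools before building."
```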

How to Build

Download and extract the archive.
Run ‘make’ in the src directory of the extracted archive.

The following commands build the ghostmp_makedb and ghostmp_search commands. If you use the Fujitsu C/C++ compiler, run make with ‘-f Makefile.fcc’ (that is, ‘make -f Makefile.fcc’).

$ tar zxf ghostmp-version.tar.gz
$ cd ghostmp-version/src
$ make

Usage

Before searching, GHOST-MP must construct a formatted database from a database sequence file in FASTA format. The ghostmp_makedb command constructs this formatted database. You can then run a sequence similarity search with the ghostmp_search command. The search results are written in BLAST tabular format.

Example

# Construct the formatted database from a FASTA file:
$ ghostmp_makedb -i db.fasta -o db

# Run a sequence similarity search:
$ mpiexec -n NUM_PROCESS ghostmp_search -i query.fasta -d db -o result
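
Because the results are in BLAST tabular format, they can be post-processed with standard text tools. The sketch below assumes the common 12-column, tab-separated layout (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). This manual does not list the columns, so check them against your own output first. filter_hits is a hypothetical helper, and the sample rows are made up so that the sketch runs without GHOST-MP installed.

```shell
# Hypothetical helper: keep hits whose E-value (column 11) is at most a threshold.
# Assumes the 12-column BLAST tabular layout; shorter lines are skipped.
filter_hits() {
  # $1: result file written by ghostmp_search -o, $2: maximum E-value
  awk -F '\t' -v max="$2" 'NF >= 12 && ($11 + 0) <= (max + 0)' "$1"
}

# Made-up sample rows standing in for a real result file.
sample=$(mktemp)
printf 'q1\ts1\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t380\n' >  "$sample"
printf 'q1\ts2\t35.0\t80\t52\t2\t10\t89\t5\t84\t0.5\t30.1\n'   >> "$sample"

# Prints only the q1/s1 row, whose E-value is below 1e-5.
filter_hits "$sample" 1e-5
rm -f "$sample"
```

With a real run, pass the file given to -o instead of the sample, for example: filter_hits result 1e-5.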

Commands and Options

Build Sequence Database

ghostmp_makedb - convert a FASTA file to GHOSTX format database files.

  ghostmp_makedb [-i dbFastaFile] [-o dbName] [-l chunkSize] [-t databaseType]

  Options:
  (Required)
    -i STR    Protein sequences in FASTA format for a database
    -o STR    Name of the database

  (Optional)
    -l INT    Chunk size of the database (bytes) [1073741824 (=1GB)]
    -t STR    Database sequence type, p (protein) or d (dna) [p]
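
The -l option takes the chunk size in bytes, and the default 1073741824 is 1024^3. The sketch below shows one way to compute other values with shell arithmetic. gib_to_bytes is a hypothetical helper, and the command is only printed, not executed, so the sketch runs without GHOST-MP installed. The 2 GB value is an arbitrary illustration, not a recommendation.

```shell
# Hypothetical helper: convert a size in units of 1024^3 bytes to bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

chunk=$(gib_to_bytes 2)

# Print the command rather than running it.
echo "ghostmp_makedb -i db.fasta -o db -l $chunk"
# prints: ghostmp_makedb -i db.fasta -o db -l 2147483648
```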

Search for similar sequences

ghostmp_search - parallel homology search tool.

  ghostmp_search [-i queries] [-o output] [-d database] [-v maxNumAliSub]
                 [-b maxNumAliQue] [-M scoreMatrix] [-G openGap] [-E extendGap]
                 [-l candidatesSize] [-s lowerCutoff] [-T upperCutoff]
                 [-S searchLength] [-q queryType] [-t databaseType]
                 [-a numThreads] [-L maxNumHits] [-w maxAliLen]

  Options:
  (Required)
    -i STR    Input query name (must be formatted)
    -o STR    Output file
    -d STR    Database name (must be formatted)

  (Optional)
    -v INT    Maximum number of alignments for each subject [1]
    -b INT    Maximum number of the output for a query [10]

    -M STR    Score matrix file [BLOSUM62]
    -G INT    Open gap penalty [11]
    -E INT    Extend gap penalty [1]

    -l INT    Maximum size of the candidates (bytes) [134217728 (=128MB)]
    -s INT    Lower limit cutoff score for seed search [4]
    -T INT    Upper limit cutoff score for seed search [30]
    -S INT    Maximum length of alignments in seed search [10]
    -q STR    Query sequence type, p (protein) or d (dna) [p]
    -t STR    Database sequence type, p (protein) or d (dna) [p]
    -F STR    Filter query sequence, T (enable) or F (disable) [T]
    -a INT    The number of threads [1]
    -L INT    Maximum number of hits [67108864]
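
The options above can be combined with the mpiexec invocation from the Example section. The sketch below prints (but does not run) a command line for DNA queries against a protein database using several MPI processes and threads. search_cmd is a hypothetical helper, and the file names are the placeholders used earlier in this manual. It assumes that -a sets the thread count of each MPI process, which this manual does not state explicitly.

```shell
# Hypothetical helper: print a ghostmp_search command line.
search_cmd() {
  # $1: number of MPI processes (mpiexec -n)
  # $2: number of threads (-a); assumed here to apply per process
  # $3: query sequence type for -q, p (protein) or d (dna)
  echo "mpiexec -n $1 ghostmp_search -i query.fasta -d db -o result -q $3 -t p -a $2"
}

search_cmd 4 8 d
# prints: mpiexec -n 4 ghostmp_search -i query.fasta -d db -o result -q d -t p -a 8
```

Under that assumption, this example would occupy 4 x 8 = 32 threads in total, so choose the two numbers to match the cores available on your nodes.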