The following commands build the ghostmp_makedb and ghostmp_search commands. If you use the Fujitsu C/C++ compiler, run make with '-f Makefile.fcc'.
$ tar zxf ghostmp-version.tar.gz
$ cd ghostmp-version/src
$ make
GHOST-MP requires a formatted database, which must be constructed in advance from a database sequence file in FASTA format. The ghostmp_makedb command constructs the formatted database. You can then run a sequence similarity search with the ghostmp_search command. The search results are written in BLAST tabular format.
# Construct the formatted database from a FASTA format file:
$ ghostmp_makedb -i db.fasta -o db
# Run a sequence similarity search:
$ mpiexec -n NUM_PROCESS ghostmp_search -i query.fasta -d db -o result
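The two steps above can be chained in a small wrapper so that the search only starts if the database was built successfully. The following is a minimal sketch, not part of GHOST-MP: the variable names, the `run` helper, and the `DRY_RUN` switch are illustrative assumptions, and only the options shown in the example above are used. With `DRY_RUN=1` (the default here) the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch of the two-step workflow. All names below (DB_FASTA, DRY_RUN, run, ...)
# are hypothetical and chosen for illustration; they are not part of GHOST-MP.
DB_FASTA=${DB_FASTA:-db.fasta}     # database sequences in FASTA format
DB_NAME=${DB_NAME:-db}             # name of the formatted database
QUERY=${QUERY:-query.fasta}        # query sequences
RESULT=${RESULT:-result}           # output file
NUM_PROCESS=${NUM_PROCESS:-4}      # number of MPI processes (assumed value)
DRY_RUN=${DRY_RUN:-1}              # 1: only print the commands, 0: execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# The search is started only if the database construction succeeded.
run ghostmp_makedb -i "$DB_FASTA" -o "$DB_NAME" &&
run mpiexec -n "$NUM_PROCESS" ghostmp_search -i "$QUERY" -d "$DB_NAME" -o "$RESULT"
```

Saved to a file of your choice, the script can be executed for real by setting `DRY_RUN=0` in the environment, and the other variables can be overridden the same way.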
ghostmp_makedb - convert a FASTA file to GHOSTX format database files.
ghostmp_makedb [-i dbFastaFile] [-o dbName] [-l chunkSize] [-t databaseType]
Options:
(Required)
-i STR Database sequences in FASTA format
-o STR The name of database
(Optional)
-l INT Chunk size of the database (bytes) [1073741824 (=1GB)]
-t STR Database sequence type, p (protein) or d (dna) [p]
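Because -l expects the chunk size in bytes, a size given in gigabytes has to be converted first. A minimal sketch using shell arithmetic, assuming the 1 GB = 1024^3 bytes convention implied by the default value above; the helper name `gb_to_bytes` is hypothetical and not part of GHOST-MP:

```shell
# Hypothetical helper (not part of GHOST-MP): convert whole gigabytes to bytes,
# using 1 GB = 1024 * 1024 * 1024 bytes as in the documented default of -l.
gb_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gb_to_bytes 1   # prints 1073741824, the default chunk size
gb_to_bytes 2   # prints 2147483648
```

The result can be passed directly on the command line, for example `ghostmp_makedb -i db.fasta -o db -l $(gb_to_bytes 2)`.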
ghostmp_search - parallel homology search tool.
ghostmp_search [-i queries] [-o output] [-d database] [-v maxNumAliSub]
[-b maxNumAliQue] [-M scoreMatrix] [-G openGap] [-E extendGap]
[-l CandidatesSize] [-s lowerCutoff] [-T UpperCutoff]
[-S searchLength] [-q queryType] [-t databaseType]
[-a numThreads] [-L maxNumHits] [-w maxAliLen]
Options:
(Required)
-i STR Input query name (must be formatted)
-o STR Output file
-d STR Database name (must be formatted)
(Optional)
-v INT Maximum number of alignments for each subject [1]
-b INT Maximum number of the output for a query [10]
-M STR Score matrix file [BLOSUM62]
-G INT Open gap penalty [11]
-E INT Extend gap penalty [1]
-l INT Maximum size of the candidates (bytes) [134217728 (=128MB)]
-s INT Lower limit cutoff score for seed search [4]
-T INT Upper limit cutoff score for seed search [30]
-S INT Maximum length of alignments in seed search [10]
-q STR Query sequence type, p (protein) or d (dna) [p]
-t STR Database sequence type, p (protein) or d (dna) [p]
-F STR Filter query sequence, T (enable) or F (disable) [T]
-a INT The number of threads [1]
-L INT Maximum number of hits [67108864]
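Once a search has finished, the result file can be post-processed with standard text tools, since it is written in BLAST tabular format. The sketch below keeps only hits above a bit-score threshold. It assumes the common 12-column tab-separated BLAST layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); whether GHOST-MP writes exactly these columns is an assumption to check against your own output. The helper name `filter_hits` and the sample lines are made up for illustration.

```shell
# Hypothetical helper (not part of GHOST-MP): print query, subject, and bit
# score for hits whose bit score (assumed to be column 12) is at least the
# threshold given as the first argument. Remaining arguments are result
# files; without any file, standard input is read.
filter_hits() {
    min=$1
    shift
    awk -F '\t' -v min="$min" '$12 + 0 >= min + 0 { print $1 "\t" $2 "\t" $12 }' "$@"
}

# Made-up sample lines, for illustration only (not real GHOST-MP output).
{
    printf 'q1\ts1\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-20\t95.1\n'
    printf 'q1\ts2\t40.0\t30\t18\t0\t1\t30\t5\t34\t0.5\t28.3\n'
} | filter_hits 50
```

On an actual result file the call would look like `filter_hits 50 result`, where `result` is the file given to -o.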