The following commands build the ghostmp_makedb and ghostmp_search commands. If you use the Fujitsu C/C++ compiler, run make with the '-f Makefile.fcc' option.
    $ tar zxf ghostmp-version.tar.gz
    $ cd ghostmp-version/src
    $ make
GHOST-MP requires a formatted database, built in advance from a database sequence file in FASTA format. The ghostmp_makedb command constructs this formatted database. You can then run a sequence similarity search with the ghostmp_search command. The search results are written in BLAST tabular format.
    # Construct the formatted database from FASTA format file:
    $ ghostmp_makedb -i db.fasta -o db

    # Run a sequence similarity search:
    $ mpiexec -n NUM_PROCESS ghostmp_search -i query.fasta -d db -o result
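Because the results are in BLAST tabular format, they can be post-processed with standard text tools. The following is a minimal sketch, assuming the usual 12-column layout of that format (e-value in column 11) and whitespace-separated fields. filter_hits is a hypothetical helper and is not part of GHOST-MP.

```shell
# Hypothetical helper: keep only hits whose e-value is at or below a threshold.
# Assumes the common 12-column BLAST tabular layout, where column 11 is the e-value.
# Usage: filter_hits MAX_EVALUE RESULT_FILE
filter_hits() {
    max_evalue="$1"
    result_file="$2"
    # Adding 0 forces a numeric comparison, so values such as 1e-30 are handled.
    awk -v max="$max_evalue" '($11 + 0) <= (max + 0)' "$result_file"
}
```

For example, `filter_hits 1e-5 result > result.filtered` would keep the hits with an e-value of 1e-5 or smaller, where result is the file given to -o above.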
ghostmp_makedb - convert a FASTA file to GHOSTX format database files.

    ghostmp_makedb [-i dbFastaFile] [-o dbName] [-l chunkSize]

Options:
(Required)
    -i STR    Protein sequences in FASTA format for a database
    -o STR    The name of the database
(Optional)
    -l INT    Chunk size of the database (bytes) [1073741824 (=1GB)]
    -t STR    Database sequence type, p (protein) or d (dna) [p]
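The -l option takes its value in bytes, and the default of 1073741824 is 1024 x 1024 x 1024. As a small sketch, a helper such as the following can do the conversion. mib_to_bytes is a hypothetical name used only for illustration.

```shell
# Hypothetical helper: convert a size in MiB to bytes for the -l option.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 1024   # prints 1073741824, the ghostmp_makedb default chunk size
mib_to_bytes 512    # prints 536870912
```

It could then be used as `ghostmp_makedb -i db.fasta -o db -l "$(mib_to_bytes 512)"` to request a 512 MiB chunk size.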
ghostmp_search - parallel homology search tool.

    ghostmp_search [-i queries] [-o output] [-d database] [-v maxNumAliSub]
                   [-b maxNumAliQue] [-M scoreMatrix] [-G openGap] [-E extendGap]
                   [-l CandidatesSize] [-s lowerCutoff] [-T UpperCutoff]
                   [-S searchLength] [-q queryType] [-t databaseType]
                   [-a numThreads] [-L maxNumHits] [-w maxAliLen]

Options:
(Required)
    -i STR    Input query name (must be formatted)
    -o STR    Output file
    -d STR    Database name (must be formatted)
(Optional)
    -v INT    Maximum number of alignments for each subject [1]
    -b INT    Maximum number of the output for a query [10]
    -M STR    Score matrix file [BLOSUM62]
    -G INT    Open gap penalty [11]
    -E INT    Extend gap penalty [1]
    -l INT    Maximum size of the candidates (bytes) [134217728 (=128MB)]
    -s INT    Lower limit cutoff score for seed search [4]
    -T INT    Upper limit cutoff score for seed search [30]
    -S INT    Maximum length of alignments in seed search [10]
    -q STR    Query sequence type, p (protein) or d (dna) [p]
    -t STR    Database sequence type, p (protein) or d (dna) [p]
    -F STR    Filter query sequence, T (enable) or F (disable) [T]
    -a INT    The number of threads [1]
    -L INT    Maximum number of hits [67108864]
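To show how these options combine on one command line, the sketch below prints a ghostmp_search invocation without running it. ghostmp_search_cmd is a hypothetical helper. Pairing several MPI processes with several threads per process via -a is an assumption about how a run might be set up, not a recommendation from GHOST-MP.

```shell
# Hypothetical helper: print (do not run) a ghostmp_search command line.
# Usage: ghostmp_search_cmd PROCS THREADS QUERY DB OUTPUT [QUERY_TYPE]
# QUERY_TYPE is p (protein) or d (dna) and defaults to p, as in the option list.
ghostmp_search_cmd() {
    procs="$1"
    threads="$2"
    query="$3"
    db="$4"
    out="$5"
    query_type="${6:-p}"
    echo "mpiexec -n $procs ghostmp_search -i $query -d $db -o $out -a $threads -q $query_type"
}

# Example: 4 processes, 8 threads each, DNA queries against the database "db".
ghostmp_search_cmd 4 8 query.fasta db result d
```

Printing the command first makes it easy to review the options before submitting the job, for instance from a batch script.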