LCS-HIT is a sequence clustering tool very similar to CD-HIT. It is more than twice as efficient as CD-HIT because it uses a filter based on the longest common subsequence (LCS) of two sequences.
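The reason an LCS is a useful filter can be sketched in a few lines. The identical bases of any alignment, read in order, form a common subsequence of the two sequences. The LCS length is therefore an upper bound on the number of identical bases, and LCS / (length of the shorter sequence) bounds the identity used by the -c option. A pair whose bound already falls below the threshold can be rejected without computing an alignment. The Python below is a minimal sketch of that principle using a plain dynamic-programming LCS. The names `lcs_length` and `may_pass_identity` are hypothetical, and LCS-HIT's actual filter may compute or apply the LCS differently.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming, keeping one row at a time.
    (Illustrative only; not LCS-HIT's implementation.)
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a char of a or b
        prev = curr
    return prev[len(b)]


def may_pass_identity(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical filter: False means the pair cannot reach `threshold`.

    Identical bases in any alignment form a common subsequence, so
    identity <= LCS / len(shorter). True only means "still possible";
    a real tool would go on to align the pair.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return False
    return lcs_length(a, b) / shorter >= threshold
```

For example, "ACGTACGT" and "ACGAACGT" differ at one position, so their LCS length is 7 and the identity bound is 7/8 = 0.875: the pair is rejected at the default threshold of 0.9 but kept at 0.8.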
Download the archive of LCS-HIT from the above link and extract it. Then cd into the extracted directory and run make.
$ tar zxvf lcs_hit-0.5.3.tar.gz
$ cd lcs_hit-0.5.3
$ make
If successful, the make process will produce an executable file "lcs_hit" in the current directory.
You can also run the tests as follows to check whether the compiled "lcs_hit" runs correctly.
$ make test
Usage: lcs_hit [Options]

Options
   -i   input filename in fasta format, required
   -o   output filename, required
   -c   sequence identity threshold, default 0.9
        this is the default lcs_hit's "global sequence identity" calculated as :
        number of identical bases in alignment
        divided by the full length of the shorter sequence
   -n   word_length, default 8
   -s   length difference cutoff, default 0.0
        if set to 0.9, the shorter sequences need to be
        at least 90% length of the representative of the cluster
   -g   1 or 0, default 0
        by cd-hit's default algorithm, a sequence is clustered to the first
        cluster that meet the threshold (fast cluster). If set to 1, the program
        will cluster it into the most similar cluster that meet the threshold
        (accurate but slow mode)
        but either 1 or 0 won't change the representatives of final clusters
   -h   print this help
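To make the -c, -s and -g semantics above concrete, here is a minimal Python sketch of greedy clustering under those rules. It assumes a CD-HIT-style greedy incremental scheme: sequences are processed longest first, and a sequence that joins no existing cluster becomes a new representative. The help text does not state this processing order. `greedy_cluster` and `toy_identity` are hypothetical names, not part of LCS-HIT. `toy_identity` is a toy stand-in for a real alignment (ungapped, left-anchored), used only so the example runs; the sketch also omits the word filter (-n) and the LCS filter.

```python
from typing import Callable, Dict, List, Optional


def toy_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for alignment: identical bases in an ungapped,
    left-anchored alignment divided by the length of the shorter sequence
    (the -c definition, minus real gapped alignment)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / shorter


def greedy_cluster(
    seqs: Dict[str, str],
    c: float = 0.9,  # -c: sequence identity threshold
    s: float = 0.0,  # -s: length difference cutoff
    g: int = 0,      # -g: 0 = first qualifying cluster, 1 = most similar one
    identity: Callable[[str, str], float] = toy_identity,
) -> Dict[str, List[str]]:
    """Return {representative_id: [member ids]} (assumed greedy scheme)."""
    # Assumption: longest sequences first, so each is compared against
    # representatives at least as long as itself.
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)
    clusters: Dict[str, List[str]] = {}  # insertion order = creation order
    for sid in order:
        seq = seqs[sid]
        best_rep: Optional[str] = None
        best_ident = -1.0
        for rep in clusters:
            rep_seq = seqs[rep]
            # -s: the shorter sequence must be >= s * len(representative)
            if len(seq) < s * len(rep_seq):
                continue
            ident = identity(seq, rep_seq)
            if ident >= c and ident > best_ident:
                best_rep, best_ident = rep, ident
                if g == 0:
                    break  # fast mode: stop at the first qualifying cluster
        if best_rep is None:
            clusters[sid] = [sid]  # no cluster qualifies: new representative
        else:
            clusters[best_rep].append(sid)
    return clusters
```

In this sketch, whether a sequence becomes a new representative depends only on whether *any* cluster qualifies, not on which one it joins, which matches the help text's note that -g 0 and -g 1 yield the same representatives and differ only in cluster membership.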