Traveling into REmote hoMOLOgy with HCA

TREMOLO-HCA is a tool for analysing distant relationships between protein sequences. It makes PSI-BLAST results easier to interpret by adding information on the domain architecture (deduced from the CDD database) of the proteins with which the query sequence is aligned, as well as on the conservation of hydrophobic amino acids that may be critical for the fold. HCA plots of the aligned sequences can also be displayed, allowing their comparison at the 2D level.

The method is described in "Identification of hidden relationships from the coupling of Hydrophobic Cluster Analysis and Domain Architecture information", G. Faure & I. Callebaut, Bioinformatics, 2013, 29(14):1726-33.

This page gives instructions for installing the software and guidelines for interpreting the results. It also serves as a quickstart readme for using the TREMOLO-HCA scripts. TREMOLO-HCA runs only on UNIX operating systems.

The Python scripts are available on request from: isabelle.callebaut at impmc.upmc.fr

TREMOLO-HCA installation and use

TREMOLO-HCA requires:

First, fill in the PATH.ini file with the local addresses.

 

simple command:
python TremoloHCA.py -h 
 
WARNING: the program has to be launched from the directory that contains the psiblast results and the sequence file!
 
python ../TremoloHCA.py -p 975-1063_32967603.it4 -s 975-1063_32967603.tfa -o tremolohca_res
[possible but not recommended: python ../TremoloHCA.py -x blastoutput.xml -s 975-1063_32967603.tfa -o tremolohca_res]
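
The launch rule above (run from the directory holding the psiblast results and the sequence file) can be sketched as a small Python wrapper. This is only an illustration: the function names build_command and run_tremolo are hypothetical and not part of TREMOLO-HCA; only the -p, -x, -s and -o options come from the commands shown above.

```python
import subprocess
from pathlib import Path


def build_command(script, seq_file, out_prefix, psiblast_file=None, xml_file=None):
    """Build the TremoloHCA.py argument list (hypothetical helper).

    Exactly one of psiblast_file (-p) or xml_file (-x) must be given.
    """
    if (psiblast_file is None) == (xml_file is None):
        raise ValueError("give exactly one of psiblast_file or xml_file")
    cmd = ["python", str(script)]
    if psiblast_file is not None:
        cmd += ["-p", psiblast_file]
    else:
        cmd += ["-x", xml_file]
    cmd += ["-s", seq_file, "-o", out_prefix]
    return cmd


def run_tremolo(workdir, **kwargs):
    """Run the command with workdir as the current directory.

    workdir is assumed to contain the psiblast results and the sequence file.
    """
    cmd = build_command(**kwargs)
    return subprocess.run(cmd, cwd=Path(workdir), check=True)
```

For the example above, build_command("../TremoloHCA.py", "975-1063_32967603.tfa", "tremolohca_res", psiblast_file="975-1063_32967603.it4") reproduces the recommended command line; run_tremolo then executes it with the working directory set explicitly instead of relying on a prior cd.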
 
ADDED for revision April 23, 2013:
 
-e : E-value threshold used to define the positions at which the hydrophobic character is conserved in the significant alignments (default E-value: 0.005)
-r : applies a sequence similarity filter; the option -i (identity_percentage) sets the level of redundancy (default 70). By default, -r is not applied!
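
To illustrate what an E-value threshold such as -e does, here is a minimal sketch that splits hits into significant and non-significant groups. It is not the TREMOLO-HCA implementation: the (identifier, E-value) tuple layout, the function name and the choice of "less than or equal" as the comparison are all assumptions, and the hit names in any usage are made up.

```python
DEFAULT_EVALUE = 0.005  # default value of -e given above


def split_by_evalue(hits, threshold=DEFAULT_EVALUE):
    """Split (identifier, evalue) pairs into significant and other hits.

    Assumption: a hit is significant when its E-value is <= threshold.
    """
    significant = [h for h in hits if h[1] <= threshold]
    others = [h for h in hits if h[1] > threshold]
    return significant, others
```

With this convention, only the hits in the first list would contribute to the definition of conserved hydrophobic positions.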
 
input: BLAST results in the default BLAST format (-p option) or in XML format (-x option), and the corresponding FASTA sequence (-s)


blastpgp -a 3 -e 50 -b 5000 -v 5000 -j 3 -I -d DATABASES/nr -i 975-1063_32967603.tfa -C psitmp.chk -o 975-1063_32967603.tfa
The "-I" option is essential for showing GIs in deflines

output:

FASTA : fasta storage
SVG : svg storage (pictures)
975-1063_32967603.it4.size : contains sizes of sequence
975-1063_32967603.it4.inference : annotation by inference for significant and no significant hits
975-1063_32967603.it4.topo : positions from which hydrophobicity is conserved in significant hits
975-1063_32967603.it4.rescdd : conserved domains
975-1063_32967603.it4.rescdd.arch : conserved domain architectures
975-1063_32967603.it4.rescdd.group : domain color code
975-1063_32967603.it4.rescdd.sameArch : proteins with the same domain architecture
975-1063_32967603.it4.respsi : special psiblast format generated by TremoloHCA
tremolohca_res.html : complete TREMOLO-HCA output, which can be opened with a web navigator
tremolohca_res_NR_ARCH.html : non-redundant TREMOLO-HCA output (the redundancy has been considered at the level of domain architectures) (added for revision April 23, 2013)
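
After a run, the presence of the output files listed above can be checked with a short Python sketch. It assumes, based only on the example names, that the suffixes are appended to the psiblast file name and that the HTML files are named after the -o value; the function names are hypothetical.

```python
from pathlib import Path

# Suffixes taken from the example output names above.
SUFFIXES = [
    ".size", ".inference", ".topo", ".rescdd", ".rescdd.arch",
    ".rescdd.group", ".rescdd.sameArch", ".respsi",
]


def expected_outputs(psiblast_name, out_prefix):
    """List the expected output names (naming scheme is an assumption)."""
    names = ["FASTA", "SVG"]  # storage directories
    names += [psiblast_name + suffix for suffix in SUFFIXES]
    names += [out_prefix + ".html", out_prefix + "_NR_ARCH.html"]
    return names


def missing_outputs(directory, psiblast_name, out_prefix):
    """Return the expected names that do not exist in directory."""
    directory = Path(directory)
    return [name for name in expected_outputs(psiblast_name, out_prefix)
            if not (directory / name).exists()]
```

Calling missing_outputs(".", "975-1063_32967603.it4", "tremolohca_res") from the results directory returns an empty list when every expected file and directory is present.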