Kosloff and Kolodny [1] recently investigated the occurrence of protein pairs
with similar sequences but significantly dissimilar structures.
Using sequence-based structure analysis, they "have found
numerous protein pairs, of 50-100% sequence identity, that have
dissimilar structures, as measured by RMSDs greater than 3A or 6A."
They have also compiled a database [2] of sequence-similar,
structurally dissimilar protein pairs.
"The RMSD between two
superimposed structures is usually measured only over those residues
that are considered as aligned. ...
In contrast, the RMSD
obtained from the structural alignment is much smaller since this RMSD
is measured only over residues that occupy similar position in space.
...
In almost all cases,
geometry-based structure alignments yield a lower RMSD than
sequence-based RMSDs." (Kosloff and Kolodny, 2008)
In particular, they have identified 158 pairs with 100% sequence identity
and a sequence-based structure-superposition RMSD ≥ 6 Å, which are
partitioned into 60 clusters.
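The difference between the two RMSD measures quoted above can be illustrated with a minimal Python sketch. It assumes the two structures are already superimposed and represented by C-alpha coordinates. The function names, the toy coordinates, and the 3 Å distance cutoff are hypothetical choices made for illustration and are not taken from [1]. A real structural aligner also re-optimizes the superposition, which this sketch does not attempt.

```python
import math

def rmsd(coords_a, coords_b, pairs):
    """RMSD over residue pairs (i, j) of two already-superimposed structures."""
    if not pairs:
        raise ValueError("at least one aligned residue pair is required")
    total = 0.0
    for i, j in pairs:
        # Squared Euclidean distance between the paired atoms.
        total += sum((p - q) ** 2 for p, q in zip(coords_a[i], coords_b[j]))
    return math.sqrt(total / len(pairs))

def close_pairs(coords_a, coords_b, pairs, cutoff):
    """Keep only the pairs whose atoms lie within `cutoff` (in Å) of each other."""
    return [(i, j) for i, j in pairs
            if math.dist(coords_a[i], coords_b[j]) <= cutoff]

# Toy C-alpha coordinates in Å (hypothetical): the first two residues
# coincide, the last two have moved apart.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 6.0, 0.0), (11.4, 12.0, 0.0)]

# With 100% sequence identity, residue k of one chain pairs with residue k
# of the other, so the sequence-based pairing is the identity mapping.
identity_pairs = [(k, k) for k in range(len(a))]

rmsd_all = rmsd(a, b, identity_pairs)
rmsd_close = rmsd(a, b, close_pairs(a, b, identity_pairs, cutoff=3.0))

print(f"RMSD over all sequence-aligned residues: {rmsd_all:.2f} Å")
print(f"RMSD over spatially close residues only: {rmsd_close:.2f} Å")
```

Restricting the calculation to residues that occupy similar positions discards exactly the residues that moved, which is why the second value cannot exceed the first for a fixed superposition.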
Shown below are alignments of the pairs obtained by ComSubstruct (with manual inspection)
and by the pairwise DaliLite server [4, 5].
(0) Gold standard of 60 representative pairs (one pair from each cluster)
(i) Assignments of the D2 code, DSSP state, PB code, and (phi, psi) angles,
(ii) structural analysis by the FATCAT, FlexProt, RAPIDO, and DynDom servers, and
(iii) corresponding entries in the MolMovDB database:
- Cluster 1 to cluster 5 (raw data, edited data)
- Cluster 6 to cluster 10 (raw data, edited data)
- Cluster 11 to cluster 15 (raw data, edited data)
- Cluster 16 to cluster 20 (raw data, edited data)
- Cluster 21 to cluster 25 (raw data, edited data)
- Cluster 26 to cluster 30 (raw data, edited data)
- Cluster 31 to cluster 35 (raw data, edited data)
- Cluster 36 to cluster 40 (raw data, edited data)
- Cluster 41 to cluster 45 (raw data, edited data)
- Cluster 46 to cluster 50 (raw data, edited data)
- Cluster 51 to cluster 55 (raw data, edited data)
- Cluster 56 to cluster 60 (raw data, edited data)
References
- Computation of the DSSP state and (phi, psi) angles: http://cubic.bioc.columbia.edu/services/DSSPcont/
- Computation of the PB code: http://bioinformatics.univ-reunion.fr/PBE/
- The FATCAT server: http://fatcat.burnham.org/
- The FlexProt server: http://bioinfo3d.cs.tau.ac.il/FlexProt/
- The RAPIDO server: http://webapps.embl-hamburg.de/rapido/
- The DynDom server: http://www.sys.uea.ac.uk/dyndom/
- The MolMovDB database: http://molmovdb.org/
(1) Local structure analysis of structure-dissimilar pairs (158 pairs with sequence identity 100% and RMSD ≥ 6 Å)
* "Chameleon sequences" are amino-acid fragments that can adopt different
regular secondary structures (i.e., alpha-helix and beta-strand) in proteins.
(Corrected on 2008-08-12)
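As a rough illustration of the chameleon-sequence definition above, the following Python sketch scans the DSSP state strings of two chains with identical sequences and reports fragments that are helical (`H`) in one chain and strand (`E`) in the other. The function name, the `min_len` threshold, and the example strings are hypothetical. The page does not state how such fragments were actually detected, so this is only one plausible reading of the definition.

```python
from itertools import groupby

def chameleon_fragments(dssp_a, dssp_b, min_len=4):
    """Find helix/strand swaps between two equal-length DSSP state strings.

    Returns a list of (start, end, label) tuples with half-open, 0-based
    ranges, where label is "HE" (helix in a, strand in b) or "EH".
    """
    if len(dssp_a) != len(dssp_b):
        raise ValueError("DSSP strings must have equal length")

    # Label each position "HE" or "EH" when the states are swapped, else None.
    labels = [sa + sb if {sa, sb} == {"H", "E"} else None
              for sa, sb in zip(dssp_a, dssp_b)]

    fragments = []
    pos = 0
    # groupby yields maximal runs of consecutive identical labels.
    for label, run in groupby(labels):
        length = len(list(run))
        if label is not None and length >= min_len:
            fragments.append((pos, pos + length, label))
        pos += length
    return fragments

# Hypothetical DSSP strings for two chains with the same sequence.
dssp_a = "--HHHHHHHH--EEEE--"
dssp_b = "--EEEEEE-----EEEE-"

print(chameleon_fragments(dssp_a, dssp_b))
```

A minimum length is used because a swap of a single residue is unlikely to represent a genuinely different secondary-structure element, although the appropriate threshold is a judgment call.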
(2) Alignment of structure-dissimilar pairs (158 pairs with sequence identity 100% and RMSD ≥ 6 Å)
References
- Kosloff, M. and Kolodny, R. (2008) Sequence-similar, structure-dissimilar protein pairs in the PDB. Proteins: Structure, Function, and Bioinformatics, 71(2):891-902.
- Database of Sequence-Similar, Structure-Dissimilar Protein Pairs in the PDB: http://mt.cs.haifa.ac.il/seqsimstrdiff/seqsimstrdiff_local.htm
- Tuinstra, R.L., Peterson, F.C., Kutlesa, S., Elgin, E.S., Kron, M.A., and Volkman, B.F. (2008) Interconversion between two unrelated protein folds in the lymphotactin native state. Proc. Natl. Acad. Sci. USA, 105:5057-5062.
- Holm, L. and Park, J. (2000) DaliLite workbench for protein structure comparison. Bioinformatics, 16:566-567.
- The pairwise DaliLite server: http://www.ebi.ac.uk/DaliLite/