PURPOSE: Because the orthogonal coordinates in PDB files have only three decimal places, programs using floating-point arithmetic may produce inconsistent results after a coordinate transformation because of precision error. For example, the superimposed RMSD between a PDB file and a coordinate file of its rotated image is no longer 0.0 Angstrom. Here we examine the robustness of the ProteinEncoder program.
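To illustrate the precision problem, the sketch below rotates a few hypothetical coordinates, rounds them to three decimal places as a PDB-format writer would, and then applies the exact inverse rotation. The inverse rotation stands in for the DaliLite superimposition used in the METHOD, and the rotation about the z-axis, the coordinates, and the helper names are all assumptions for illustration only.

```python
import math

def rotate_z(coords, angle_deg):
    """Rotate 3-D points about the z-axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def round_like_pdb(coords, places=3):
    """Round every coordinate to three decimals, as in PDB ATOM records."""
    return [tuple(round(v, places) for v in p) for p in coords]

def rmsd(a, b):
    """Root-mean-square deviation between two equally long lists of points."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

# Hypothetical coordinates (Angstrom), already at PDB precision.
original = [(12.345, -3.210, 7.891), (1.000, 2.000, 3.000), (-4.567, 8.123, -0.456)]

# "Write" the rotated image: rotate, then round to three decimals.
rotated_file = round_like_pdb(rotate_z(original, 37.0))

# Undo the known rotation and compare with the original coordinates.
recovered = rotate_z(rotated_file, -37.0)
print(f"RMSD = {rmsd(original, recovered):.6f} Angstrom")  # tiny, but not 0.0
```

Because rounding moves each coordinate by at most 0.0005 Angstrom, the residual RMSD stays below about 0.001 Angstrom, yet it is not exactly zero; any code computed from such coordinates must tolerate this noise.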
METHOD:
First, an HIV-1 PR monomer (2nph, chain A) is rotated arbitrarily and saved as a new coordinate file. Next, the other HIV-1 PR monomers are superimposed on the rotated monomer (2nphA) with the DaliLite server, and their coordinate files are created. Then, the D^{2} code of each chain is computed from both its original PDB file and its transformed coordinate file, and the two results are compared.
RESULT:
The
D^{2} code
is 99.9% consistent as shown below.
Class | #residues (%)
Residues with different D^{2} codes | 4 (0.1%)^{*}
Residues with the same D^{2} code | 7530 (99.9%)
*) PDB files: 1hpsA Res35 (D^{2} code: "R"/"3"), 1a8gA Res24 ("0"/"3"), 1zlfB Res24 ("0"/"R"), 2nphA Res69 ("G"/"0")
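Comparing the codes computed from the two files amounts to a residue-by-residue string comparison. The sketch below assumes each chain's D^{2} code is available as a string with one character per coded residue; the example strings and the helper names `code_conflicts` and `consistency_percent` are hypothetical. It lists the differing positions and reproduces the 99.9% figure from the table counts.

```python
def code_conflicts(codes_a, codes_b):
    """Return the 0-based positions at which two per-residue code strings differ."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both code strings must cover the same residues")
    return [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]

def consistency_percent(n_same, n_diff):
    """Share of residues whose codes agree, as a percentage rounded to one decimal."""
    return round(100.0 * n_same / (n_same + n_diff), 1)

# Hypothetical D^2 code strings of one chain, from the PDB file and from the rotated file.
from_pdb_file = "RRR3GG0RR"
from_rotated_file = "RRR3GG3RR"
print(code_conflicts(from_pdb_file, from_rotated_file))  # [6]

# Counts from the table above: 7530 residues agree, 4 differ.
print(consistency_percent(7530, 4))  # 99.9
```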
PURPOSE: We examine how well the D^{2} code can detect subtle structural differences between two almost identical amino acid chains.
METHOD:
HIV-1 protease (PR) is a homodimeric molecule consisting of two identical 99-residue polypeptide chains. The structures of the two monomers are almost identical: the superimposed RMSDs between them are 0.1-0.6 Angstrom for P6_{1} crystals and 0.5-1.2 Angstrom for NMR models. We compare the D^{2} codes of the two monomers of each HIV-1 PR with each other. We also compare the DSSP state sequences of the two monomers with each other.
RESULT: The D^{2} code is as sensitive as the DSSP state, and by comparing D^{2} codes we successfully identified the structural differences between the two monomers of the same HIV-1 PR molecule. A total of 284 D^{2} code assignment conflicts were detected, seven of which involve pairs of visually indistinguishable local structures (false positives). There appears to be a linear correlation between the number of D^{2} code assignment conflicts and the Z-score. For the DSSP state, a total of 323 DSSP state assignment conflicts were observed, but their number shows no clear relationship with the Z-score.
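One common way to quantify an apparent linear relationship such as the one reported above is Pearson's correlation coefficient. The sketch below is not claimed to be the analysis behind the figures: the function name is hypothetical, and the (conflict count, Z-score) values are invented, with their sign chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (number of D^2 code conflicts, Z-score) values for five monomer pairs.
conflicts = [0, 1, 2, 4, 6]
z_scores = [33.0, 32.5, 31.8, 30.9, 29.7]
print(round(pearson_r(conflicts, z_scores), 3))
```

A value of r close to +1 or -1 supports a linear relationship; a value near 0, as one might expect for the DSSP conflicts, does not.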
Frequency distributions of the (#{D^{2} code conflicts}, Z-score), (#{DSSP state conflicts}, Z-score), and (#{PB block conflicts}, Z-score) values of 94 pairs of HIV-1 PR monomers:
[Figure: #{D^{2} code conflicts} vs Z-score | #{DSSP state conflicts} vs Z-score | #{PB block conflicts} vs Z-score (added 2009-04-19)]
The Z-score is a measure of the quality of the alignment. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means they are probably homologous, between 2 and 8 is a grey area, and below 2 is not significant.
Seven of the D^{2} code conflicts involve pairs of visually indistinguishable local structures. The seven spheres circled in black in the left figure mark the positions of the HIV-1 PR dimers that contain these conflicts (the robustness problem).
Crystal: structures from 66 P6_{1} crystals; Solution: structures of 28 NMR models (see the figures).
PB block assignments are computed with the PBET server (http://bioinformatics.univreunion.fr/PBE/PBT.htm). PB blocks are a set of 16 short structural motifs, each five residues long. See Protein Blocks Expert Home (http://bioinformatics.univreunion.fr/PBE) for more information.
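The Z-score rule of thumb in the note above can be written as a small lookup. This is a sketch: the function name `dali_z_category` is hypothetical, and because the note does not say which side the boundary values 2, 8, and 20 fall on, the comparisons below are an assumption.

```python
def dali_z_category(z):
    """Rule-of-thumb reading of a DaliLite Z-score.

    Assumption: exactly 20 counts as 'probably', exactly 8 as 'probably',
    and exactly 2 as 'grey area'; the original note gives only the ranges.
    """
    if z > 20:
        return "definitely homologous"
    if z >= 8:
        return "probably homologous"
    if z >= 2:
        return "grey area"
    return "not significant"

for z in (25.0, 12.0, 5.0, 1.5):
    print(z, dali_z_category(z))
```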
See also EXAMPLES (Prediction/Alignment/Others) > Structural alignment > HIV-1 PR variants.
PURPOSE: We examine how well the D^{2} code-based search can isolate structurally similar amino acid fragments from a large dataset.
METHOD:
Using the ComSubstruct program, we searched the ASTRAL (1.73, 95%) dataset for amino acid fragments structurally similar to three query chains: 2nphA (alpha+beta type, HIV-1 PR monomer), d2hkja1 (mainly alpha type), and d1j7ma_ (mainly beta type). d2hkja1 and d1j7ma_ are themselves members of the ASTRAL (1.73, 95%) dataset. The top 200 D^{2} code-similar fragments of the same length as the query chain are obtained with the following command^{*}:
% ComSubstruct -l -o1 -s -w1.0 -b200 query_chain.code target_ASTRAL173.code
Because some of the top 200 fragments overlap one another, we manually chose one fragment for each chain contained in the top 200 (or top 150) list, obtaining 50 fragments for 2nphA (top 200), 42 fragments for d2hkja1 (top 200), and 55 fragments for d1j7ma_ (top 150).
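The manual step of picking one fragment per chain from overlapping hits could be approximated automatically by keeping the best-scoring hit for each chain. This is not the procedure used here (the fragments were chosen by hand); the hit tuple format `(chain_id, start, score)`, the chain names, and the function name are hypothetical.

```python
def best_fragment_per_chain(hits):
    """Keep the highest-scoring hit for each chain, ordered by score (best first).

    hits: list of (chain_id, start_residue, score) tuples, a hypothetical format.
    Overlapping fragments from the same chain collapse to a single entry.
    """
    best = {}
    for chain_id, start, score in hits:
        if chain_id not in best or score > best[chain_id][2]:
            best[chain_id] = (chain_id, start, score)
    return sorted(best.values(), key=lambda hit: hit[2], reverse=True)

# Hypothetical ranked hits; chain1 appears three times at shifted start positions.
top_hits = [("chain1", 10, 0.91), ("chain1", 11, 0.90), ("chain2", 3, 0.88), ("chain1", 12, 0.87)]
print(best_fragment_per_chain(top_hits))
# [('chain1', 10, 0.91), ('chain2', 3, 0.88)]
```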
Then we used the DaliLite server to compute the rigid structural alignment of each query chain with each of its 50, 42, or 55 fragments. We also computed flexible structural alignments of the same pairs with the FATCAT server.
*) The "target_ASTRAL173.code" file (6.2 MB) is available from PROGRAM > ComSubstruct > DOWNLOADS.
RESULT: We successfully isolated structurally similar fragments within a few minutes on a notebook computer (2 GHz Intel Core 2 Duo, 1 GB 667 MHz DDR2 SDRAM). If the D^{2} code-LCS ratio of a pair of amino acid fragments is greater than 85%, the pair is structurally similar; if it is greater than 80%, the pair may be structurally similar.
D^{2} code-LCS ratio := (length of D^{2} code-LCS) / (length of the shorter chain - 4),
where LCS stands for the Longest Common Subsequence between the D^{2} codes of the query chain and of a chain in the dataset. Because the two residues at each terminus of a chain are not assigned a D^{2} code, they are excluded from the computation. For example, 19 chains in the ASTRAL dataset have a D^{2} code-LCS ratio of 0.9 or more with respect to the D^{2} code of 2nphA.
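The ratio can be sketched with the textbook dynamic-programming LCS. This is a minimal illustration, not ComSubstruct's implementation: it assumes D^{2} codes are strings with one character per coded residue, the function names and example strings are hypothetical, and the 85%/80% cut-offs are taken from the RESULT paragraph.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two strings (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def d2_lcs_ratio(codes_a, codes_b, n_res_a, n_res_b):
    """LCS length of two D^2 code strings divided by (shorter chain length - 4).

    n_res_a and n_res_b are chain lengths in residues; the four terminal
    residues (two at each end) carry no D^2 code and are excluded.
    """
    return lcs_length(codes_a, codes_b) / (min(n_res_a, n_res_b) - 4)

def similarity_call(ratio):
    """Apply the 85% / 80% thresholds quoted in the RESULT paragraph."""
    if ratio > 0.85:
        return "structurally similar"
    if ratio > 0.80:
        return "may be structurally similar"
    return "not above 80%"

# Hypothetical 14-residue chains, so each carries 10 D^2 code characters.
ratio = d2_lcs_ratio("RRR3GG0RR3", "RRR3GG3RR3", 14, 14)
print(ratio, similarity_call(ratio))  # 0.9 structurally similar
```

The row-by-row table keeps memory at O(len(b)), which matters little for single chains but helps when scanning millions of fragments.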
(a) ASTRAL dataset search by ComSubstruct
Frequency distributions of the D^{2} code-LCS ratio between a query chain and a fragment of the same length contained in the dataset:
D^{2} code-LCS ratio | 2nphA (alpha+beta, 99 residues) | d2hkja1 (mainly alpha, 78 residues) | d1j7ma_ (mainly beta, 60 residues)
[0.9, 1.0] | 19 | 1 | 1
[0.8, 0.9) | 8707 | 28 | 1713
[0.7, 0.8) | 142703 | 52979 | 200570
[0.6, 0.7) | 202720 | 542337 | 337648
[0.5, 0.6) | 330742 | 527645 | 404777
[0.4, 0.5) | 345682 | 239150 | 336195
[0.3, 0.4) | 138008 | 121770 | 237459
[0.2, 0.3) | 107799 | 19119 | 123953
[0.1, 0.2) | 43223 | 14836 | 77832
[0.0, 0.1) | 13462 | 927 | 8028
Total | 1333065 | 1518792 | 1728176
 2nphA,
d2hkja1, and d1j7ma_ are aligned with all the fragments of 99, 78, and
60 residues contained in the dataset, respectively.
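The table uses ten half-open bins, with the top bin closed so that a ratio of exactly 1.0 is counted. The sketch below shows one way to map a ratio to those bins (the function name is hypothetical) and checks that each column of the table sums to its Total row.

```python
def bin_label(ratio):
    """Map a ratio in [0, 1] to the table's bins: [0.0, 0.1), ..., [0.8, 0.9), [0.9, 1.0]."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = min(int(ratio * 10), 9)  # a ratio of exactly 1.0 falls into the closed top bin
    return f"[{k / 10:.1f}, {(k + 1) / 10:.1f}{']' if k == 9 else ')'}"

# Column counts copied from the table, from the [0.9, 1.0] row down to [0.0, 0.1).
table = {
    "2nphA": ([19, 8707, 142703, 202720, 330742, 345682, 138008, 107799, 43223, 13462], 1333065),
    "d2hkja1": ([1, 28, 52979, 542337, 527645, 239150, 121770, 19119, 14836, 927], 1518792),
    "d1j7ma_": ([1, 1713, 200570, 337648, 404777, 336195, 237459, 123953, 77832, 8028], 1728176),
}
for query, (counts, total) in table.items():
    print(query, sum(counts) == total)  # True for all three columns
```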
(b) Length of D^{2} code-LCS vs Z-score (rigid structural alignment)
Frequency distributions of the (length of D^{2} code-LCS, Z-score) values of the 50, 42, and 55 pairs of amino acid fragments:
[Figure: 2nphA vs 50 fragments of 99 res. | d2hkja1 vs 42 fragments of 78 res. | d1j7ma_ vs 55 fragments of 60 res.]
The Z-score is a measure of the quality of the alignment. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means they are probably homologous, between 2 and 8 is a grey area, and below 2 is not significant.
(c) Length of D^{2} code-LCS vs P-value (flexible structural alignment)
Frequency distributions of the (length of D^{2} code-LCS, P-value) values of the 50, 42, and 55 pairs of amino acid fragments:
[Figure: 2nphA vs 50 fragments of 99 res. | d2hkja1 vs 42 fragments of 78 res. | d1j7ma_ vs 55 fragments of 60 res.]
FATCAT uses the P-value to evaluate the significance of the structural similarity it detects; it is the probability of observing a greater score by chance. Structure pairs with a P-value < 0.05 are significantly similar.
(d) [FOR REFERENCE] PBE-ALIGN global alignment (GA) score vs Z-score (added 2009-04-19)
Frequency distributions of the (PBE-ALIGN GA score, Z-score) values of the 50, 42, and 55 pairs of amino acid fragments:
[Figure: 2nphA vs 50 fragments of 99 res. | d2hkja1 vs 42 fragments of 78 res. | d1j7ma_ vs 55 fragments of 60 res.]
PURPOSE: It is widely accepted that knowledge of protein flexibility is fundamental to understanding the mechanisms of protein function. We identify the regions where conformational changes take place in 60 multiple-structure proteins identified by Kosloff and Kolodny.
METHOD:
For each protein, we compare the D^{2} codes of its two structures with each other, and we also compare the DSSP state sequences of the two structures. Moreover, we compute rigid and flexible structural alignments of the two structures using the DaliLite and FATCAT servers.
RESULT: Because the D^{2} code is sensitive to twisting of the protein backbone, comparing D^{2} codes successfully pinpoints the sources of structural differences. All but one of the pairs with a DaliLite Z-score above 8 have a D^{2} code-LCS ratio above 80%, although there is no clear correspondence between the Z-score and the length of the D^{2} code-LCS. The plot of these two values (see (a), left) shows that large structural differences are often caused by deformation of small regions of a protein, such as hinge motions.
(a) Frequency distributions
Frequency distributions of the (length of D^{2} code-LCS, Z-score) values and the (length of D^{2} code-LCS, P-value) values of 60 structure pairs of multiple-structure proteins:
[Figure: Length of D^{2} LCS vs Z-score | Length of D^{2} LCS vs P-value]
The Z-score is a measure of the quality of the alignment. As a general rule, a Z-score above 20 means the two structures are definitely homologous, between 8 and 20 means they are probably homologous, between 2 and 8 is a grey area, and below 2 is not significant.
FATCAT uses the P-value to evaluate the significance of the structural similarity it detects; it is the probability of observing a greater score by chance. Structure pairs with a P-value < 0.05 are significantly similar.
All pairs except 1sfcD/1xtgB have a P-value less than 0.05.
(b) Average lengths of D^{2} code-/DSSP state-variable regions
D^{2} code-variable regions
Structures | #{residues with variable D^{2} code} | Ave. rigid span length | Ave. variable span length
HIV-1 PR monomers (66 P6_{1} crystals) | 4.5% | 25.1 res. | 1.2 res.
HIV-1 PR monomers (28 NMR models) | 10.6% | 11.3 res. | 1.4 res.
60 structure-dissimilar pairs | 12.1% | 14.4 res. | 2.0 res.
Multiple-structure proteins can be thermodynamically identified with a sequence of D^{2}-rigid subdomains with an average length of 14.4 residues, connected by D^{2}-variable regions with an average length of 2.0 residues.
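The rigid and variable spans can be read off a per-residue mask as runs of agreeing and disagreeing codes. The sketch below assumes that definition of a span (the text does not spell it out), uses a hypothetical mask encoding ('v' = code differs, '.' = code agrees), and checks that alternating spans of the reported average lengths imply roughly the reported variable fraction.

```python
from itertools import groupby

def span_lengths(mask):
    """Split a per-residue variability mask into rigid and variable run lengths.

    mask: string with 'v' where the code differs between the two structures
    and '.' where it agrees (hypothetical encoding).
    """
    rigid, variable = [], []
    for key, run in groupby(mask):
        length = len(list(run))
        if key == "v":
            variable.append(length)
        else:
            rigid.append(length)
    return rigid, variable

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical mask: rigid runs of 6, 8, 9, 6 residues; variable runs of 2, 1, 3.
mask = "." * 6 + "vv" + "." * 8 + "v" + "." * 9 + "vvv" + "." * 6
rigid, variable = span_lengths(mask)
print(mean(rigid), mean(variable))  # 7.25 2.0

# If spans alternate, the variable fraction is about mean(variable) / (mean(rigid) + mean(variable)).
print(round(100 * 2.0 / (14.4 + 2.0), 1))  # 12.2, close to the 12.1% in the table
```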
DSSP state-variable regions
Structures | #{residues with variable DSSP state} | Ave. rigid span length | Ave. variable span length
HIV-1 PR monomers (66 P6_{1} crystals) | 4.9% | 33.2 res. | 1.7 res.
HIV-1 PR monomers (28 NMR models) | 15.2% | 12.6 res. | 2.3 res.
60 structure-dissimilar pairs | 13.0% | 16.3 res. | 2.5 res.
Concerning the DSSP state, the solution structures (15.2%) are more flexible than the multiple-structure proteins (13.0%).
(c) Average alignment lengths of 60 structure-dissimilar pairs
Programs | Number of aligned residues (%)
DaliLite (rigid structural alignment) | 170.82 (80.6%)
ComSubstruct (D^{2} code alignment) | 187.31 (90.1%^{*})
FATCAT (flexible structural alignment) | 205.72 (97.1%)
Ave. length of the proteins | 211.85 (100%)
*) The two residues at each terminus of a protein are excluded from the computation because they are not assigned a D^{2} code: 0.901 = 187.31/(211.85 - 4).
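The percentages in this table can be reproduced directly from the averages. The sketch below (the helper name is hypothetical) also shows what the ComSubstruct figure would be without the four-residue correction described in the footnote.

```python
def coverage_percent(aligned, length, excluded=0):
    """Aligned residues as a percentage of (length - excluded), rounded to one decimal."""
    return round(100.0 * aligned / (length - excluded), 1)

avg_len = 211.85  # average protein length from the table
print("DaliLite", coverage_percent(170.82, avg_len))                  # 80.6
print("FATCAT", coverage_percent(205.72, avg_len))                    # 97.1
print("ComSubstruct", coverage_percent(187.31, avg_len, excluded=4))  # 90.1
print("ComSubstruct, uncorrected", coverage_percent(187.31, avg_len))  # 88.4
```

The correction matters because D^{2} code alignment can never cover the four uncoded terminal residues, so dividing by the full length would understate its coverage.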
Frequency distributions in the ASTRAL dataset
Frequency of occurrence of the DSSP states / D^{2} codes / PB fragments among 9 superfolds (1thbA, 256bA, 1aps, 1ubq, 2fox, 7timA, 1ilb, 2buk, and 2rhe)
*See also the page "Superfolds2."
[Figure: Frequency of occurrence of the DSSP states / D^{2} codes / PB fragments among 9 superfolds]
*There are seven DSSP states, 16 D^{2} codes, and 16 PB fragments.
*Although there are 16 D^{2} codes, seven of them already cover 98.3% of all the residues of the 9 superfolds.
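A statement such as "seven of the 16 codes cover 98.3% of the residues" can be obtained by sorting symbol frequencies and accumulating them until a target share is reached. The sketch below shows that computation with hypothetical counts (the real frequencies are in the figure, which is not reproduced here); the function name is also hypothetical.

```python
def codes_needed(freqs, target):
    """Smallest number of most frequent symbols whose combined share reaches target (0-1)."""
    total = sum(freqs.values())
    covered = 0
    for k, count in enumerate(sorted(freqs.values(), reverse=True), start=1):
        covered += count
        if covered / total >= target:
            return k
    return len(freqs)

# Hypothetical per-residue counts of four D^2 code symbols.
hypothetical_counts = {"R": 50, "3": 30, "0": 15, "G": 5}
print(codes_needed(hypothetical_counts, 0.9))  # 3
```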