
Statistical analysis of 60 representative pairs: METHOD


METHOD:
The D2 coding method was compared to three existing methods: the PB coding [1], the coding method used by Kuznetsov (and its variation) [2], and the DSSP coding [3].

[1] de Brevern A.G., Etchebest C., Hazout S., 2000, Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Proteins, 41:271-287.
[2] Kuznetsov I.B., 2008, Ordered conformational change in the protein backbone: prediction of conformationally variable positions from sequence and low-resolution structural data. Proteins, 72(1):74-87.
[3] Kabsch W., Sander C., 1983, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22(12):2577-2637.

(0) Coding methods to be compared

Method                      Definition of conformationally variable residue positions
The D2 coding               Positions where different D2 codes are assigned
The PB coding               Positions where different Protein Blocks are assigned
The ANGL (or KUZ1) coding   Positions where the change in either dihedral angle, PHI or PSI, is greater than or equal to 30°
The KUZ2 coding             Positions where the changes in both PHI and PSI are greater than or equal to 30°
The DSSP coding             Positions where different DSSP states are assigned
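
The definitions in the table above can be sketched in Python. This is a minimal illustration, not the code used in the study. It assumes that the two conformations are already aligned position by position, that the D2, PB, and DSSP codes are given as equal-length strings, and that angle differences wrap around the ±180° boundary. The function names (angle_diff, variable_by_code, variable_by_angles) are hypothetical.

```python
from typing import List, Sequence, Tuple

Angles = Tuple[float, float]  # (PHI, PSI) in degrees


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180).

    Assumption: differences wrap around the +/-180 degree boundary.
    """
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def variable_by_code(codes_a: str, codes_b: str) -> List[int]:
    """Positions where two aligned code strings differ (D2, PB, or DSSP)."""
    if len(codes_a) != len(codes_b):
        raise ValueError("code strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(codes_a, codes_b)) if x != y]


def variable_by_angles(
    conf_a: Sequence[Angles],
    conf_b: Sequence[Angles],
    threshold: float = 30.0,
    require_both: bool = False,
) -> List[int]:
    """Positions whose PHI/PSI change reaches the threshold.

    require_both=False: PHI or PSI changes (the ANGL / KUZ1 definition).
    require_both=True:  PHI and PSI both change (the KUZ2 definition).
    """
    if len(conf_a) != len(conf_b):
        raise ValueError("conformations must have the same length")
    positions = []
    for i, ((phi_a, psi_a), (phi_b, psi_b)) in enumerate(zip(conf_a, conf_b)):
        phi_changed = angle_diff(phi_a, phi_b) >= threshold
        psi_changed = angle_diff(psi_a, psi_b) >= threshold
        if require_both:
            changed = phi_changed and psi_changed
        else:
            changed = phi_changed or psi_changed
        if changed:
            positions.append(i)
    return positions
```

Calling variable_by_angles with require_both=True can only return a subset of the positions returned with require_both=False, which mirrors the relation between the KUZ2 and KUZ1 definitions.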


(1) Definition of conformationally variable residue positions (Gold Standards)

Name                         Definition of conformationally variable residue positions
Dihedral Angle Patterns-2    Residue positions that satisfy either (1) or (2), where
(ANGL_CORE or ANG2)          (1) Difference of PHI angle >= 100° or Difference of PSI angle >= 100°
                             (2) Difference of PHI angle >= 30° and Difference of PSI angle >= 30°
Dihedral Angle Patterns-1    Residue positions that satisfy the following condition:
(ANGL or ANG1)               Difference of PHI angle >= 30° or Difference of PSI angle >= 30°
Dihedral Angle Patterns-3    Residue positions that satisfy either (Difference of PHI angle is not equal to 0°)
(ANGL_SUPP or CNST)          or (Difference of PSI angle is not equal to 0°)
Flexible Regions             Flexible residue positions detected by any of FATCAT, FlexProt, RAPIDO, or DynDom
(FLEX)
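
The three dihedral-angle gold standards can be written as simple predicates on the absolute PHI and PSI differences at one residue position. The sketch below is illustrative only. It assumes the differences have already been computed as non-negative values in degrees, and the function names are hypothetical. FLEX is not covered because it relies on the output of external programs.

```python
from typing import Dict


def is_ang2(dphi: float, dpsi: float) -> bool:
    """ANGL_CORE / ANG2: condition (1) or condition (2)."""
    large_single_change = dphi >= 100.0 or dpsi >= 100.0  # condition (1)
    moderate_joint_change = dphi >= 30.0 and dpsi >= 30.0  # condition (2)
    return large_single_change or moderate_joint_change


def is_ang1(dphi: float, dpsi: float) -> bool:
    """ANGL / ANG1: PHI or PSI changes by at least 30 degrees."""
    return dphi >= 30.0 or dpsi >= 30.0


def is_cnst(dphi: float, dpsi: float) -> bool:
    """ANGL_SUPP / CNST: any non-zero change in PHI or PSI."""
    return dphi != 0.0 or dpsi != 0.0


def classify(dphi: float, dpsi: float) -> Dict[str, bool]:
    """Evaluate all three dihedral-angle gold standards for one position."""
    return {
        "ANG2": is_ang2(dphi, dpsi),
        "ANG1": is_ang1(dphi, dpsi),
        "CNST": is_cnst(dphi, dpsi),
    }
```

Under these definitions the criteria are nested: every ANG2 position is also an ANG1 position, and every ANG1 position is also a CNST position. For example, classify(40.0, 5.0) marks the position as variable under ANG1 and CNST but not under ANG2.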


(2) Statistical metrics used for performance assessment

TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.

Metric: Accuracy (ACC)
Definition: (TP + TN) / (TP + FP + TN + FN)
Description: The proportion of true results in the population.
100% means that the test identifies all positive and negative results correctly.

Metric: Sensitivity (SN)
Definition: TP / (TP + FN)
Description: The proportion of actual positives that are correctly identified as positive.
It measures the ability of a test to correctly identify the presence of a positive sample.
A negative result from a highly sensitive test helps rule out the presence of positive samples.

Metric: Specificity (SP)
Definition: TN / (TN + FP)
Description: The proportion of actual negatives that are correctly identified as negative.
It measures the ability of a test to correctly identify the absence of positive samples.
A positive result from a highly specific test can be used to confirm the presence of positive samples.

Metric: Matthews Correlation Coefficient (CC)
Definition: (TP·TN - FP·FN) / sqrt{(TP+FN)(TP+FP)(TN+FP)(TN+FN)}
Description: A measure of the quality of binary classifications.
It returns a value between -1 and +1: +1 represents a perfect prediction (always right),
0 a random guess, and -1 an inverse prediction (always wrong).

Metric: Selectivity (SL)
Definition: Sensitivity / (1 - Specificity)  (= true positive rate / false positive rate)
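
The metrics above can be computed directly from the four confusion counts. The following Python sketch is a hedged illustration rather than the code used for the analysis. The function names (confusion_counts, binary_metrics) are hypothetical, and the handling of zero denominators is an assumption that the definitions above do not specify: undefined ratios are returned as NaN, CC is set to 0 when its denominator is 0, and SL is set to infinity when the false positive rate is 0.

```python
import math
from typing import Dict, Set, Tuple


def confusion_counts(
    predicted: Set[int], gold: Set[int], n_positions: int
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for predicted vs. gold-standard variable positions."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = n_positions - tp - fp - fn
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    # Assumption: an undefined ratio (zero denominator) is reported as NaN.
    return num / den if den else math.nan


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute ACC, SN, SP, CC, and SL from the confusion counts."""
    acc = _ratio(tp + tn, tp + fp + tn + fn)
    sn = _ratio(tp, tp + fn)  # true positive rate
    sp = _ratio(tn, tn + fp)  # true negative rate
    fpr = _ratio(fp, fp + tn)  # false positive rate, equal to 1 - SP

    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fp) * (tn + fn))
    # Assumption: CC is reported as 0 when its denominator vanishes.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0

    if math.isnan(sn) or math.isnan(fpr):
        sl = math.nan
    elif fpr == 0.0:
        # Assumption: no false positives gives an unbounded selectivity.
        sl = math.inf
    else:
        sl = sn / fpr

    return {"ACC": acc, "SN": sn, "SP": sp, "CC": cc, "SL": sl}
```

As a usage example, binary_metrics(*confusion_counts(predicted, gold, n_positions)) evaluates one coding method against one gold standard, where predicted and gold are sets of variable residue positions and n_positions is the number of compared positions.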