NAME
NtileCodePredictor (ver. 0.4) - Predict the D2 code sequences of given
amino acid sequences (local structure prediction). It also predicts the
amino acid sequences of given D2 code sequences (protein design).
(About ver. 0.4: Supports different alternative location indicators for
different chains in the PDB files.)
(About ver. 0.3: It outputs not only a ".pred" file but also a ".code" file.)
2008-04-20 : Fragment-to-5-tile-code conversion table of ASTRAL SCOP 1.73 95% is available. [sample]
2009-02-28 : "frag_code1_full173.tbl" (fragments of length 1) is available.
2009-02-28 : "frag_code5_full173_ID.tbl" (fragments with SCOP ID) is available. [sample]
2009-02-28 : "frag_code5_HIV1PR_ID.tbl" (fragments of HIV-1 protease variants) is available. [sample]
2009-02-28 : "frag_code5_full_HIV1PR_ID.tbl" (fragments of HIV-1 protease variants) is available. [sample]
* 322 crystal structures and 120 NMR models of HIV-1 protease variants are
  used to compile the "HIV1PR" tables. [PDB ID list]
DOWNLOADS (Free for academic use)
SYNOPSIS
NtileCodePredictor [-f][-5][-7][-9][-1][-2][-3] [-v] [-d][-c] [-t table_name] [-h] filename
If you type "NtileCodePredictor filename", you obtain the prediction based
on the "frag_code5_full.tbl" table.
Examples:
- Prediction based on "frag_code5.tbl" and creation of the ".udcode" file:
    % NtileCodePredictor -5 -v filename
- Prediction based on "frag_code5_full.tbl" and "frag_code5.tbl" and
  creation of the ".udcode" file:
    % NtileCodePredictor -f -5 -v filename
- Protein design based on "frag_code15_full.tbl" and creation of the
  "_dsgn.udcode" file:
    % NtileCodePredictor -1 -v -d filename
- New structure check based on "frag_code21_full.tbl" and creation of the
  "_chck.udcode" file:
    % NtileCodePredictor -2 -v -c filename
- Prediction based on "frag_code5_HIV1PR_ID.tbl" (fragments of HIV-1
  protease variants):
    % NtileCodePredictor -t frag_code5_HIV1PR_ID.tbl filename
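The invocations above can also be assembled from a script. The following is a minimal Python sketch, not part of NtileCodePredictor itself: the helper name "build_command" and its parameters are hypothetical, and it assumes the executable is reachable on the PATH under the name used in the SYNOPSIS.

```python
# Hypothetical helper: builds the argument list for one NtileCodePredictor run.
# Only the flags listed in the SYNOPSIS are used.
TABLE_FLAGS = {"-f", "-5", "-7", "-9", "-1", "-2", "-3"}


def build_command(filename, table_flags=(), view=False, design=False,
                  check=False, table=None):
    """Return the argv list for the given options (nothing is executed)."""
    argv = ["NtileCodePredictor"]
    for flag in table_flags:
        if flag not in TABLE_FLAGS:
            raise ValueError(f"unknown table flag: {flag}")
        argv.append(flag)
    if view:
        argv.append("-v")   # also create the ".udcode" file
    if design:
        argv.append("-d")   # protein design
    if check:
        argv.append("-c")   # new structure check
    if table is not None:
        argv.extend(["-t", table])  # user-specified conversion table
    argv.append(filename)
    return argv
```

The resulting list could be passed to "subprocess.run(argv, check=True)" from a directory that contains the conversion tables.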
INPUT
(1a) A file of amino acid sequences (for local structure prediction):
    ".code" file created by the "ProteinEncoder" program (example), or
    PDB ".ent" or ".pdb" file (example), or
    FASTA type amino acid sequence ".seq" file (example).
(1b) A file of 5-tile code sequences (for protein design):
    ".code" file created by the "ProteinEncoder" program (example), or
    ".code" file without "[Res.]" entries (example).
(2) Fragment-to-5-tile-code conversion tables:
[NOTE] "frag_code5_full.tbl" is recommended for local structure prediction.
       "frag_code15_full.tbl" and "frag_code21_full.tbl" are recommended
       for protein design.
[NOTE] Conversion tables should be placed in the current directory.
[NOTE] The tables list all fragments of the ASTRAL SCOP 1.71-95 and their
       5-tile codes:
       - "frag_codeN.tbl" gives the 5-tile code determined by the fragment
         of length N,
       - "frag_code5_full.tbl" also gives the 5-tile code of the two amino
         acids at both ends.
[NOTE] If the "-v" option is specified, the corresponding PDB file should
       be placed in the directory where the input protein file exists.
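Because the conversion tables are expected in the current directory, it may help to check for them before starting a run. The following Python sketch is only an illustration under that assumption; the function name "missing_tables" is hypothetical and the program itself may report missing tables differently.

```python
from pathlib import Path


def missing_tables(table_names, directory="."):
    """Return the table names that are not present as files in `directory`."""
    base = Path(directory)
    return [name for name in table_names if not (base / name).is_file()]
```

For example, "missing_tables(["frag_code5_full.tbl"])" returns an empty list when that table is present in the current directory.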
OUTPUT
(1) A ".pred" file
    It contains the predicted 5-tile code sequences and the corresponding
    statistics. The rate of successful prediction is also computed if the
    input file is a ".code" file which contains both "[Res.]" and "[Code]"
    entries (example).
    - A "xxx_dsgn.pred" file is created if the "-d" option is specified
      (example),
    - A "xxx_chck.pred" file is created if the "-c" option is specified
      (example),
    - A "xxx.pred" file is created otherwise (example1, example2).
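The naming rule for the ".pred" file can be summarized in a few lines of Python. This is a sketch under one assumption that the text does not state explicitly: "xxx" is taken to be the input file name without its extension (for example "pdb1rkl" for "pdb1rkl.ent"). The function name "pred_name" is hypothetical, and the combination of "-d" and "-c" is rejected because the text does not describe it.

```python
from pathlib import Path


def pred_name(input_file, design=False, check=False):
    """Return the expected ".pred" file name for the given options."""
    if design and check:
        raise ValueError("behavior with both -d and -c is not documented")
    stem = Path(input_file).stem  # assumption: "xxx" is the name without extension
    if design:
        return stem + "_dsgn.pred"
    if check:
        return stem + "_chck.pred"
    return stem + ".pred"
```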
(2) A ".code" file (example, example2)
(3) A ".udcode" file
    A ".udcode" file is also created if the "-v" option is specified, so
    that one can view the result using the "ProteinViewer" program.
    - A "xxx_dsgn.udcode" file is created if the "-d" option is specified
      (example),
    - A "xxx_chck.udcode" file is created if the "-c" option is specified
      (example),
    - A "xxx.udcode" file is created otherwise (example1, example2).
[NOTE] Use "ProteinViewer" version 0.3 or later to view the ".udcode" files.
[NOTE] About "xxx_comp.udcode" (when viewed with "ProteinViewer")
       - Amino acids of failed prediction ("F") are denoted by large red
         spheres,
       - Amino acids whose code is contained in the prediction ("c") by
         large yellow spheres,
       - Amino acids without prediction by small blue spheres.
       (Example: pdb1rkl_comp)
[NOTE] About "xxx_accu.udcode" (when viewed with "ProteinViewer")
       - Amino acids with prediction accuracy less than 0.25 are denoted
         by large red spheres,
       - Amino acids with prediction accuracy less than 0.5 by large
         yellow spheres,
       - Amino acids without prediction by small blue spheres.
       (Example: pdb1rkl_accu)
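The accuracy thresholds of "xxx_accu.udcode" can be read as a simple classification. The Python sketch below only restates the three rules listed in the note; the function name "accu_marker" is hypothetical, a missing prediction is represented by None as an assumption, and accuracies of 0.5 or more return None because the note does not say how they are drawn.

```python
def accu_marker(accuracy):
    """Return (size, color) of the sphere, or None if the note gives no rule."""
    if accuracy is None:          # assumption: None stands for "no prediction"
        return ("small", "blue")
    if accuracy < 0.25:
        return ("large", "red")
    if accuracy < 0.5:
        return ("large", "yellow")
    return None                   # not specified in the note
```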
[NOTE] About "xxx_comp_yyyy.udcode" (when viewed with "ProteinViewer")
       - Amino acids of successful prediction ("-") are denoted by large
         blue spheres,
       - Amino acids whose code is contained in the prediction ("c") by
         large yellow spheres,
       - Amino acids with no hit (the corresponding code fragments are not
         included in the conversion table) are denoted by small red
         spheres (i.e., "new structure"),
       - Amino acids with hit count less than 5 by small orange spheres
         (i.e., "rare structure"),
       - Amino acids with hit count more than 999 by small blue spheres
         (i.e., "popular structure").
       (Example1: pdb1rkl_comp_dsgn, Example2: pdb1rkl_comp_chck)
[NOTE] About "xxx_accu_yyyy.udcode" (when viewed with "ProteinViewer")
       - Amino acids with no hit (the corresponding code fragments are not
         included in the conversion table) are denoted by small red
         spheres (i.e., "new structure"),
       - Amino acids with hit count less than 5 by small orange spheres
         (i.e., "rare structure"),
       - Amino acids with hit count more than 999 by small blue spheres
         (i.e., "popular structure").
       (Example1: pdb1rkl_accu_dsgn, Example2: pdb1rkl_accu_chck)
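The hit-count rules shared by the "xxx_comp_yyyy.udcode" and "xxx_accu_yyyy.udcode" notes can be sketched as follows. The function name "hit_marker" is hypothetical; hit counts from 5 to 999 return None because the notes do not describe a small-sphere marker for that range.

```python
def hit_marker(hit_count):
    """Return (color, label) of the small sphere, or None if unspecified."""
    if hit_count == 0:
        return ("red", "new structure")       # code fragment not in the table
    if hit_count < 5:
        return ("orange", "rare structure")
    if hit_count > 999:
        return ("blue", "popular structure")
    return None                               # 5..999: not covered by the notes
```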
DESCRIPTION
NtileCodePredictor reads a file of amino acid sequences and predicts the
5-tile code of the sequences (local structure prediction). It also predicts
the amino acid sequences of given 5-tile code sequences (protein design).
The following options are available:
-f, -5, -7, -9, -1, -2, or -3
    By default, the 5-tile codes of amino acid sequences are predicted
    based on "frag_code5_full.tbl". If "-f" is specified, the prediction
    is based on "frag_code5_full.tbl". If "-N" is specified (N = 5, 7, 9),
    the prediction is based on "frag_codeN.tbl" and the design is based on
    "frag_codeN_full.tbl". If "-1" is specified, the prediction is based
    on "frag_code15_full.tbl". If "-2" is specified, the prediction is
    based on "frag_code21_full.tbl". If "-3" is specified, the prediction
    is based on "frag_code31_full.tbl".
    If more than one option is specified, the prediction is made using all
    specified tables. Thus one can use up to four conversion tables (or
    five tables if one also uses the "-t" option). To use
    "frag_code5_full.tbl" together with other tables, one must specify
    "-f" explicitly.
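The mapping from table options to conversion tables described above can be written out as a small lookup. This Python sketch is an interpretation of the text, not the program's actual logic: the function name "tables_for" and its parameters are hypothetical, and it assumes that the default table is used only when no table option (including "-t") is given.

```python
FIXED_TABLES = {
    "-f": "frag_code5_full.tbl",
    "-1": "frag_code15_full.tbl",
    "-2": "frag_code21_full.tbl",
    "-3": "frag_code31_full.tbl",
}


def tables_for(flags, design=False, extra_table=None):
    """Return the conversion tables implied by the options, without duplicates."""
    tables = []
    for flag in flags:
        if flag in FIXED_TABLES:
            name = FIXED_TABLES[flag]
        elif flag in ("-5", "-7", "-9"):
            n = flag[1:]
            # "-N": frag_codeN.tbl for prediction, frag_codeN_full.tbl for design
            name = f"frag_code{n}_full.tbl" if design else f"frag_code{n}.tbl"
        else:
            raise ValueError(f"unknown table flag: {flag}")
        if name not in tables:
            tables.append(name)
    if extra_table is not None and extra_table not in tables:
        tables.append(extra_table)    # table given with "-t"
    if not tables:
        tables.append("frag_code5_full.tbl")  # default when nothing is specified
    return tables
```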
-v
    By default, the corresponding ".udcode" file is not created. If "-v"
    is specified, the ".udcode" file is created and one can check the
    result visually using the "ProteinViewer" program. If the input is a
    ".code" file with both "[Res.]" and "[Code]" entries, the file is
    named "xxx_comp.udcode" or "xxx_comp_yyyy.udcode". In other cases, the
    file is named "xxx_accu.udcode" or "xxx_accu_yyyy.udcode". In all
    cases, the corresponding PDB file should be in the directory where the
    input protein file exists.
-t fragment_to_code_conversion_table
    If "-t" is specified, the prediction is based on the specified table.
    The option can be used together with the "-f", "-5", "-7", and "-9"
    options.
-d
    By default, 5-tile codes are predicted from amino acid sequences using
    the conversion tables (local structure prediction). If "-d" is
    specified, amino acid sequences are predicted from sequences of 5-tile
    codes (protein design). All amino acids of an (amino acid fragment,
    5-tile code fragment) pair in the conversion table are used for
    prediction.
-c
    If "-c" is specified, amino acid sequences are predicted from
    sequences of 5-tile codes. Unlike the case of "-d", only the middle
    point amino acid of an (amino acid fragment, 5-tile code fragment)
    pair in the conversion table is used for protein design. As a result,
    the hit count shows the frequency of occurrence of the corresponding
    5-tile code fragment in the conversion table. (That is, an amino acid
    with no hit implies a new local structure.)
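The meaning of the hit count under "-c" can be illustrated with a toy example. The Python sketch below is not the program's implementation and does not read the real ".tbl" format: the table is a hypothetical in-memory list of (amino acid fragment, code fragment) pairs, and the code strings are placeholders rather than real 5-tile codes.

```python
from collections import Counter


def code_fragment_hits(table_pairs):
    """Count how often each code fragment occurs in the (toy) conversion table."""
    return Counter(code for _amino_acids, code in table_pairs)


def is_new_structure(hits, code_fragment):
    """A code fragment with no hit is treated as a new local structure."""
    return hits[code_fragment] == 0


# Hypothetical table entries; "codeA" and "codeB" are placeholder code fragments.
toy_table = [("ACDEF", "codeA"), ("GHIKL", "codeA"), ("MNPQR", "codeB")]
hits = code_fragment_hits(toy_table)
```

With this toy table, "codeA" has a hit count of 2, and a fragment that never occurs in the table, such as "codeZ", has no hit.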
-h
    If "-h" is specified, the synopsis is shown.