NAME
NtileCodePredictor (ver. 0.4) - Predict the D2 code sequences of given amino acid sequences (local structure prediction). It also predicts the amino acid sequences of given D2 code sequences (protein design).
(About ver. 0.4 : Supports different alternate location indicators for different chains in PDB files.)
(About ver. 0.3 : It outputs not only a ".pred" file but also a ".code" file.)

2008-04-20 : A fragment-to-5-tile-code conversion table for ASTRAL SCOP 1.73 95% is available. [sample]
2009-02-28 : "frag_code1_full173.tbl" (fragments of length 1) is available.
2009-02-28 : "frag_code5_full173_ID.tbl" (fragments with SCOP IDs) is available. [sample]
2009-02-28 : "frag_code5_HIV1PR_ID.tbl" (fragments of HIV-1 protease variants) is available. [sample]
2009-02-28 : "frag_code5_full_HIV1PR_ID.tbl" (fragments of HIV-1 protease variants) is available. [sample]
    * 322 crystal structures and 120 NMR models of HIV-1 protease variants are used to compile the "HIV1PR" tables. [PDB ID list]


DOWNLOADS (Free for academic use)


SYNOPSIS
NtileCodePredictor [-f][-5][-7][-9][-1][-2][-3] [-v] [-d][-c] [-t table_name] [-h]  filename

If you type "NtileCodePredictor filename", you obtain the prediction based
on the "frag_code5_full.tbl" table.

Examples:

INPUT
(1a) A file of amino acid sequences (for local structure prediction):
            ".code" file created by the "ProteinEncoder" program (example),
             or    PDB ".ent" or ".pdb" file (example),
             or    FASTA type amino acid sequence ".seq" file (example).

(1b) A file of 5-tile code sequences (for protein design):
            ".code" file created by the "ProteinEncoder" program (example),
             or    ".code" file without "[Res.]" entries (example).

(2) Fragment-to-5-tile-code conversion tables:
- "frag_code5_full.tbl" if the "-f" option is specified (default),
- "frag_code5.tbl" for prediction and "frag_code5_full.tbl" for design
    if the "-5" option is specified,
- "frag_code7.tbl" for prediction and "frag_code7_full.tbl" for design
    if the "-7" option is specified, 
- "frag_code9.tbl" for prediction and "frag_code9_full.tbl" for design
    if the "-9" option is specified,
- "frag_code15_full.tbl" if the "-1" option is specified,
- "frag_code21_full.tbl" if the "-2" option is specified,
- "frag_code31_full.tbl" if the "-3" option is specified,
-  The specified conversion table file if the "-t" option is specified.

[NOTE] "frag_code5_full.tbl" is recommended for local structure prediction.
               "frag_code15_full.tbl" and "frag_code21_full.tbl" are
                 recommended for protein design.

[NOTE] Conversion tables should be placed in the current directory.

[NOTE] The tables list all fragments of the ASTRAL SCOP
               1.71-95 and their 5-tile codes:
                   - "frag_codeN.tbl" gives the 5-tile code determined by the
                      fragment of length N,
                   - "frag_code5_full.tbl" also gives the 5-tile code of the two
                      amino acids at both ends.
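The following Python sketch illustrates, under stated assumptions, how a fragment table of this kind could drive local structure prediction; it is not the program's actual code. The in-memory table format (a list of (amino acid fragment, 5-tile code fragment) string pairs of equal length), the one-character-per-residue code alphabet, and the function name `predict_middle_codes` are all hypothetical. For each window of length N, the sketch collects the table entries whose amino acid fragment matches exactly and tallies the code of the middle residue, which is our reading of the "frag_codeN.tbl" note above.

```python
from collections import Counter

# Hypothetical in-memory table: (amino acid fragment, 5-tile code fragment) pairs.
# The real .tbl file format is not described here; this is an assumption.
Table = list[tuple[str, str]]


def predict_middle_codes(seq: str, table: Table, n: int) -> list[Counter]:
    """For each residue, tally the middle-residue codes of matching length-n fragments."""
    half = n // 2
    tallies = [Counter() for _ in seq]
    # Index table entries by amino acid fragment for exact-match lookup.
    index: dict[str, list[str]] = {}
    for aa_frag, code_frag in table:
        if len(aa_frag) == n:
            index.setdefault(aa_frag, []).append(code_frag)
    for start in range(len(seq) - n + 1):
        window = seq[start:start + n]
        for code_frag in index.get(window, []):
            # Only the code at the middle position of the fragment is counted.
            tallies[start + half][code_frag[half]] += 1
    return tallies


table = [("ACDEF", "11234"), ("ACDEF", "11534"), ("CDEFG", "22222")]
tallies = predict_middle_codes("ACDEFG", table, 5)
print([dict(t) for t in tallies])  # [{}, {}, {'2': 1, '5': 1}, {'2': 1}, {}, {}]
```

In this sketch the residues within N // 2 of either end receive no tally, which is presumably the gap the "full" tables close by also giving codes for the amino acids at both ends.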

[NOTE] If the "-v" option is specified, the corresponding PDB file should
               be placed in the directory where the input protein file exists.


OUTPUT
(1) A ".pred" file
    It contains the predicted 5-tile code sequences and corresponding
    statistics. The rate of successful prediction is also computed if the
    input file is a ".code" file that contains both "[Res.]" and "[Code]"
    entries (example).

    - A "xxx_dsgn.pred" file is created if the "-d" option is specified (example),
    - A "xxx_chck.pred" file is created if the "-c" option is specified (example),
    - A "xxx.pred" file is created otherwise (example1, example2).

(2) A ".code" file (example, example2)

(3) A ".udcode" file
    A ".udcode" file is also created if the "-v" option is specified so that
       one can view the result using the "ProteinViewer" program.

    - A "xxx_dsgn.udcode" file is created if the "-d" option is specified (example),
    - A "xxx_chck.udcode" file is created if the "-c" option is specified (example),
    - A "xxx.udcode" file is created otherwise (example1, example2).
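The suffix rule for the ".pred" and ".udcode" names above can be restated as a small helper. This is a sketch, not the program's code: the name `output_filename` is hypothetical, the output directory is not specified above (so only a file name is returned), and giving "-d" precedence when both "-d" and "-c" are set is an assumption. The "_comp"/"_accu" variants described under "-v" are not modeled.

```python
from pathlib import Path


def output_filename(input_file: str, ext: str, design: bool = False, check: bool = False) -> str:
    """Build "xxx_dsgn<ext>", "xxx_chck<ext>", or "xxx<ext>" from the input path.

    Hypothetical helper: "-d" (design) is assumed to win if both flags are set.
    """
    stem = Path(input_file).stem  # "xxx" in the documentation
    if design:
        return f"{stem}_dsgn{ext}"
    if check:
        return f"{stem}_chck{ext}"
    return f"{stem}{ext}"


print(output_filename("pdb1rkl.code", ".pred"))                # pdb1rkl.pred
print(output_filename("pdb1rkl.ent", ".udcode", design=True))  # pdb1rkl_dsgn.udcode
```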


[NOTE] Use "ProteinViewer" version 0.3 or later to view the ".udcode" files.

[NOTE] About "xxx_comp.udcode" (when viewed with "ProteinViewer")
         - Amino acids of failed prediction ("F") are denoted by large red spheres,
         - Amino acids whose code is contained in the prediction ("c") by large yellow spheres,
         - Amino acids without prediction by small blue spheres.
        (Example: pdb1rkl_comp)

[NOTE] About "xxx_accu.udcode" (when viewed with "ProteinViewer")
         - Amino acids with prediction accuracy less than 0.25 are denoted by large red spheres,
         - Amino acids with prediction accuracy less than 0.5 by large yellow spheres,
         - Amino acids without prediction by small blue spheres.
        (Example: pdb1rkl_accu)
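The color rules for "xxx_accu.udcode" amount to two thresholds checked in order. In this minimal sketch the function name `accu_sphere` is hypothetical; how the accuracy value itself is computed, and how residues with accuracy of 0.5 or more are drawn, are not stated above, so the sketch returns None for the latter.

```python
from typing import Optional


def accu_sphere(accuracy: Optional[float]) -> Optional[str]:
    """Map a per-residue prediction accuracy to the documented sphere style."""
    if accuracy is None:
        return "small blue"    # amino acid without prediction
    if accuracy < 0.25:
        return "large red"
    if accuracy < 0.5:
        return "large yellow"  # 0.25 <= accuracy < 0.5
    return None                # style for accuracy >= 0.5 is not documented


print(accu_sphere(0.1), accu_sphere(0.3), accu_sphere(None))
```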

[NOTE] About "xxx_comp_yyyy.udcode" (when viewed with "ProteinViewer")
         - Amino acids of successful prediction ("-") are denoted by large blue spheres,
         - Amino acids whose code is contained in the prediction ("c") by large yellow spheres,
       
         - Amino acids with no hit (the corresponding code fragments are not
           included in the conversion table) are denoted by small red spheres (i.e., "new structure"),
         - Amino acids with hit count less than 5 by small orange spheres (i.e., "rare structure"),
         - Amino acids with hit count more than 999 by small blue spheres (i.e., "popular structure").
        (Example1: pdb1rkl_comp_dsgn, Example2: pdb1rkl_comp_chck)

[NOTE] About "xxx_accu_yyyy.udcode" (when viewed with "ProteinViewer")
         - Amino acids with no hit (the corresponding code fragments are not
           included in the conversion table) are denoted by small red spheres (i.e., "new structure"),
         - Amino acids with hit count less than 5 by small orange spheres (i.e., "rare structure"),
         - Amino acids with hit count more than 999 by small blue spheres (i.e., "popular structure").
        (Example1: pdb1rkl_accu_dsgn, Example2: pdb1rkl_accu_chck)
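The hit-count rules shared by the two "_yyyy" notes can be expressed as an ordered check. In this hypothetical sketch (`hit_sphere` is not part of the program), zero hits is tested before the "less than 5" rule, since a residue with no hit is described as a "new structure" rather than a "rare" one. Counts from 5 to 999 get no special marker in the notes, so the sketch returns None for them. The prediction-outcome markers ("-", "c") of "xxx_comp_yyyy.udcode" are not modeled, and how they combine with the hit-count markers is not specified above.

```python
from typing import Optional


def hit_sphere(hits: int) -> Optional[str]:
    """Map a hit count to the documented small-sphere marker, if any."""
    if hits == 0:
        return "small red"     # no hit: "new structure" (checked before the < 5 rule)
    if hits < 5:
        return "small orange"  # "rare structure"
    if hits > 999:
        return "small blue"    # "popular structure"
    return None                # 5..999: no marker described


print([hit_sphere(h) for h in (0, 3, 50, 1000)])
```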

DESCRIPTION
NtileCodePredictor reads a file of amino acid sequences and predicts the
5-tile code of the sequences (local structure prediction). It also predicts the
amino acid sequences of given 5-tile code sequences (protein design).

The following options are available:

-f, -5, -7, -9, -1, -2, or -3
By default, the 5-tile codes of amino acid sequences are predicted
based on "frag_code5_full.tbl". If "-f" is specified, the prediction
is likewise based on "frag_code5_full.tbl". If "-N" is specified (N = 5, 7, 9),
the prediction is made based on "frag_codeN.tbl" and the design is
made based on "frag_codeN_full.tbl". If "-1" is specified, the
prediction and the design are based on "frag_code15_full.tbl". If "-2" is
specified, they are based on "frag_code21_full.tbl". If "-3" is
specified, they are based on "frag_code31_full.tbl".
If more than one option is specified, the prediction is made using
all specified tables. Thus one can use up to four conversion
tables (or five if one also uses the "-t" option). To
use "frag_code5_full.tbl" together with other tables, "-f" must be
specified explicitly.

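The option-to-table rules above can be summarized as a lookup. This is a sketch under assumptions, not the program's option parser: the function name `select_tables` is hypothetical, `design` stands for either "-d" or "-c", `custom` stands for the "-t" argument, treating a "-t" table as one of the "other tables" that suppress the default is our reading, and the ordering of the returned list is arbitrary. The "up to four tables" limit is not enforced.

```python
from typing import Optional

# Option-to-table mapping as described in the text; the value is the fragment length.
_WINDOWED = {"-5": "5", "-7": "7", "-9": "9"}      # separate prediction/design tables
_FULL_ONLY = {"-1": "15", "-2": "21", "-3": "31"}  # one "full" table for both


def select_tables(flags: set[str], design: bool = False, custom: Optional[str] = None) -> list[str]:
    """Return the conversion tables implied by the table options."""
    tables: list[str] = []
    if "-f" in flags:
        tables.append("frag_code5_full.tbl")
    for flag, n in _WINDOWED.items():
        if flag in flags:
            tables.append(f"frag_code{n}_full.tbl" if design else f"frag_code{n}.tbl")
    for flag, n in _FULL_ONLY.items():
        if flag in flags:
            tables.append(f"frag_code{n}_full.tbl")
    if custom is not None:
        tables.append(custom)
    if not tables:
        # Documented default when no table option is given.
        tables.append("frag_code5_full.tbl")
    # "-f" plus "-5" in design mode name the same file; keep it only once.
    return list(dict.fromkeys(tables))


print(select_tables(set()))               # ['frag_code5_full.tbl']
print(select_tables({"-f", "-7", "-1"}))  # ['frag_code5_full.tbl', 'frag_code7.tbl', 'frag_code15_full.tbl']
print(select_tables({"-7"}, design=True)) # ['frag_code7_full.tbl']
```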
-v
By default, the corresponding ".udcode" file is not created. If
"-v" is specified, the ".udcode" file is created and one could
check the result visually using the "ProteinViewer" program.
If the input is a ".code" file with both "[Res.]" and "[Code]"
entries, the file is named such as "xxx_comp.udcode" or
"xxx_comp_yyyy.udcode". In other cases, the file is named such
as "xxx_accu.udcode" or "xxx_accu_yyyy.udcode". In all cases,
the corresponding PDB file must be placed in the directory
where the input protein file exists.

-t  fragment_to_code_conversion_table
If "-t" is specified, prediction is made based on the specified
table. One could use the option with the "-f", "-5", "-7", and
"-9" options.

-d
By default, 5-tile codes are predicted from amino acid sequences
using the conversion tables (local structure prediction).
If "-d" is specified, amino acid sequences are predicted from
sequences of 5-tile codes (protein design). All amino acids of
a (amino acid fragment, 5-tile code fragment) pair in the
conversion table are used for prediction.

-c
If "-c" is specified, amino acid sequences are predicted from
sequences of 5-tile codes. Unlike the case of "-d", only the
middle point amino acid of a (amino acid fragment, 5-tile code
fragment) pair in the conversion table is used for protein design.
As a result, the hit count shows the frequency of occurrence of
the corresponding 5-tile code fragment in the conversion table.
(That is, an amino acid with no hit implies a new local structure.)
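The difference between "-d" and "-c" can be illustrated with a small Python sketch. It is not the program's implementation: the in-memory table of (amino acid fragment, 5-tile code fragment) pairs of equal length, the one-character code alphabet, and the function name `design_tallies` are hypothetical. For each window of the input code sequence, every table entry with an identical code fragment votes; with `all_positions=True` every residue of the fragment votes at its own position (as described for "-d"), otherwise only the middle residue votes (as described for "-c").

```python
from collections import Counter


def design_tallies(codes: str, table: list[tuple[str, str]], n: int, all_positions: bool) -> list[Counter]:
    """Tally candidate amino acids per position of a 5-tile code sequence.

    all_positions=True sketches "-d"; all_positions=False sketches "-c".
    """
    half = n // 2
    tallies = [Counter() for _ in codes]
    # Index the hypothetical table by its 5-tile code fragment.
    index: dict[str, list[str]] = {}
    for aa_frag, code_frag in table:
        if len(code_frag) == n:
            index.setdefault(code_frag, []).append(aa_frag)
    for start in range(len(codes) - n + 1):
        for aa_frag in index.get(codes[start:start + n], []):
            if all_positions:
                for offset, aa in enumerate(aa_frag):
                    tallies[start + offset][aa] += 1
            else:
                tallies[start + half][aa_frag[half]] += 1
    return tallies


table = [("ACDEF", "12345"), ("GCDEF", "12345"), ("AAAAA", "55555")]
c_mode = design_tallies("12345", table, 5, all_positions=False)
print(c_mode[2])  # Counter({'D': 2})
```

With `all_positions=False`, the total of a position's tally equals the number of table entries whose code fragment is centred there, which matches the description of the "-c" hit count above; a total of zero corresponds to "no hit".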

-h
If "-h" is specified, the synopsis is shown.