Two files are created in the current directory. One, whose suffix is
".udcode", contains the detailed information and is to be fed into the
"ProteinViewer" program. The other, whose suffix is ".code", contains
the obtained binary sequence of "U" and "D" (or "N-tile codes" if the
-l option is specified, where "0" stands for DDDDD, "1" for DDDDU, "2"
for DDDUD, ... , "A" for DUDUD, "B" for DUDUU, ... , and "R" for UUDUU).
In the case of the 5-tile code,
- amino acids of code DUDUD (="A") are denoted by large red spheres
- amino acids of code DDDDD (="0") by small blue spheres
- amino acids of code UUDUU (="R") by large yellow spheres
- amino acids of code UUDUD (="Q") and DUDUU (="B") by large orange spheres
- amino acids of code UDDDD (="G") by large green spheres
when viewed with ProteinViewer version 0.3 or later (Example: udcode
file and its image).
(Example: udcode file and code file. Example of the 5-tile coding:
udcode file, code file, and its image)
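
The code characters listed above are all consistent with reading each U/D window as a binary number (D = 0, U = 1) and writing its value as one digit, "0" to "9" followed by "A", "B", and so on. The following is a minimal sketch under that assumption; the rule is inferred from the listed examples and is not stated by ProteinEncoder itself. The names `tile_code`, `DIGITS`, and `SPHERE_STYLE` are hypothetical.

```python
# Assumption: a tile code is the binary value of the window (D = 0, U = 1)
# written as a single digit: "0"-"9", then "A", "B", ... in alphabetical order.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Sphere styles for 5-tile codes, copied from the list above.
SPHERE_STYLE = {
    "A": "large red",
    "0": "small blue",
    "R": "large yellow",
    "Q": "large orange",
    "B": "large orange",
    "G": "large green",
}


def tile_code(window: str) -> str:
    """Return the code character for a window of 'U'/'D' letters."""
    if not window or set(window) - {"U", "D"}:
        raise ValueError("window must be a non-empty string of 'U' and 'D'")
    value = int(window.replace("D", "0").replace("U", "1"), 2)
    if value >= len(DIGITS):
        raise ValueError("window too long for a single-character code in this sketch")
    return DIGITS[value]


# Print the code and the listed sphere style (if any) for a few windows.
for window in ("DDDDD", "DDDDU", "DDDUD", "DUDUD", "DUDUU", "UUDUD", "UUDUU", "UDDDD"):
    code = tile_code(window)
    print(window, code, SPHERE_STYLE.get(code, "(style not listed)"))
```

Only the 5-tile examples given above were used to check the rule; how codes are written for other values of N is not covered here.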
ProteinEncoder reads a PDB file (xxx.ent or xxx.pdb) and encodes a
protein or RNA/DNA structure into a {U,D}-valued binary sequence based
on the second derivative of tetrahedron tiles. If the name of the input
file ends with any other suffix, change the suffix to ".ent" or ".pdb"
first. To view N-tile codes, you should use ProteinViewer ver. 0.2 or
later.
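
The suffix requirement above can be handled before running the program. The following is a small sketch, not part of ProteinEncoder: the helper name `with_pdb_suffix` is hypothetical, it copies the file rather than renaming it, and it assumes the suffix comparison is case-sensitive.

```python
from pathlib import Path
import shutil

# Suffixes named in the description above.
ACCEPTED_SUFFIXES = {".ent", ".pdb"}


def with_pdb_suffix(path, suffix=".pdb"):
    """Return a path with an accepted suffix, copying the file if needed."""
    src = Path(path)
    if src.suffix in ACCEPTED_SUFFIXES:
        return src  # already acceptable, nothing to do
    dst = src.with_suffix(suffix)  # e.g. 1abc.txt -> 1abc.pdb
    shutil.copyfile(src, dst)      # the original file is left untouched
    return dst
```

For example, `with_pdb_suffix("1abc.txt")` would create `1abc.pdb` next to the original and return its path.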
The following options are available:
-f, -r, or -t
By default, protein structures are encoded by folding a tetrahedron
sequence with rotation and translation. If -f is specified, they are
encoded by folding only. If -r is specified, they are encoded by
folding with rotation. If -t is specified, they are encoded by folding
with rotation and translation, where only CA atoms are considered
(default behaviour). In the case of -t, the type of the initial tile is
always XYZW and its direction is normally D. (In the case of erroneous
data, the direction could be U.)
(Examples: folding only, folding with rotation, and folding with
rotation and translation)
-i index
By default, a protein structure is encoded several times starting from
different initial atoms until a reasonable result is obtained (in the
case of -f and -r), or encoding is started from the midpoint CA atom
(in the case of -t). If the -i option is specified, encoding is started
from the "index"-th atom (or a nearby CA atom in the case of -f and -t).
-l length
If the -l option is specified, a protein structure is encoded with
"length"-tile code. By default, each amino-acid fragment of length
"length" is encoded by folding a tetrahedron sequence with rotation and
translation. If the -f, -r, or -t option is specified, fragments are
encoded accordingly. By default, length is 5.
-h
If the -h option is specified, a synopsis is shown.
-R
If -R is specified, RNAs and DNAs are encoded (instead of proteins). In
the ".udcode" file, CA stands for the C1' atom and C stands for the P
atom. When viewed using ProteinViewer, the backbone is rendered as a
broken line obtained by connecting the P and C1' atoms, the 5' terminus
is rendered as a small red ball, and the 3' terminus as a small blue
ball.
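
The atom selections mentioned in the options above (CA atoms for proteins, P and C1' atoms with -R) can be illustrated with a short sketch that picks those atoms out of the ATOM records of a PDB file. This is not ProteinEncoder's own code: the function name `select_atoms` is hypothetical, only ATOM records are read, alternate locations and multiple models are not handled, and the fixed column positions are those of the standard PDB ATOM record layout.

```python
from typing import Iterable, List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def select_atoms(pdb_lines: Iterable[str],
                 names: Tuple[str, ...] = ("CA",)) -> List[Atom]:
    """Return (atom name, (x, y, z)) for ATOM records whose name is in `names`."""
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, TER, REMARK, and other record types
        name = line[12:16].strip()  # atom name, columns 13-16
        if name in names:
            xyz = (
                float(line[30:38]),  # x, columns 31-38
                float(line[38:46]),  # y, columns 39-46
                float(line[46:54]),  # z, columns 47-54
            )
            selected.append((name, xyz))
    return selected
```

With the lines of a PDB file in `lines`, `select_atoms(lines)` would return the CA atoms of a protein, and `select_atoms(lines, ("P", "C1'"))` the atoms used for RNA and DNA. Some older PDB files write the sugar atom as `C1*`, in which case that name would have to be passed instead.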