NAME
ProteinEncoder (ver. 0.8) - Encode a protein or RNA/DNA structure into a {U, D}-valued binary sequence based on the second derivative of tetrahedron tiles.
(About ver. 0.8: Different alternative location indicators are supported for different chains.)
(About ver. 0.7: Any alternative location indicator of the PDB ATOM format is supported. Previously, only ' ' and 'A' were supported and other locations were ignored.)


DOWNLOADS (Free for academic use)
Go to the download page

SYNOPSIS
ProteinEncoder [-f, -r, or -t] [-i index] [-l length] [-h] [-R] filename

        If you type "ProteinEncoder filename", you obtain the 5-tile code of the protein.
        Examples:
            % ProteinEncoder filename            (The 5-tile code of the protein)
            % ProteinEncoder -R filename         (The 5-tile code of the RNA/DNA)
            % ProteinEncoder -l 3 filename       (The 3-tile code of the protein)
            % ProteinEncoder -f -l 1 filename    (Approximation of the structure of the protein by
                                                  a "folded" sequence of tetrahedron tiles)
            % ProteinEncoder -r -l 1 filename    (Approximation of the structure of the protein by
                                                  a "folded with rotation" sequence of
                                                  tetrahedron tiles)
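
        When ProteinEncoder is driven from a script, the argument list can be assembled from the options in the synopsis. The following Python sketch is only an illustration: the helper name build_command, its parameter names, and the file name "example.pdb" are hypothetical and not part of ProteinEncoder.

```python
from typing import List, Optional


def build_command(filename: str,
                  mode: Optional[str] = None,
                  index: Optional[int] = None,
                  length: Optional[int] = None,
                  nucleic: bool = False) -> List[str]:
    """Assemble an argument list that follows the SYNOPSIS.

    mode is 'f', 'r', 't', or None (hypothetical parameter names);
    at most one of -f, -r, -t is passed.
    """
    if mode not in (None, "f", "r", "t"):
        raise ValueError("mode must be 'f', 'r', 't', or None")
    args = ["ProteinEncoder"]
    if mode is not None:
        args.append("-" + mode)
    if index is not None:
        args += ["-i", str(index)]
    if length is not None:
        args += ["-l", str(length)]
    if nucleic:
        args.append("-R")
    args.append(filename)  # the input file always comes last
    return args


# Same options as the "-f -l 1" example in the synopsis.
print(" ".join(build_command("example.pdb", mode="f", length=1)))
```

        The returned list can be handed to subprocess.run(args, check=True), assuming the ProteinEncoder executable is on the search path.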

INPUT
A PDB file with suffix ".ent" or ".pdb". (Example)
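
Because ver. 0.8 handles alternative location indicators separately for each chain, it can help to see which indicators an input file actually contains. The Python sketch below is an independent illustration, not ProteinEncoder's own code. It relies on the fixed columns of the PDB ATOM record (alternative location indicator in column 17, chain identifier in column 22); the helper names atom_line and altloc_by_chain are hypothetical, and the sample records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def atom_line(serial: int, name: str, altloc: str,
              resname: str, chain: str, resseq: int) -> str:
    """Build the first 26 columns of a PDB ATOM record (demo data only)."""
    return f"ATOM  {serial:>5} {name:<4}{altloc}{resname:>3} {chain}{resseq:>4}"


def altloc_by_chain(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect the alternative location indicators seen in each chain."""
    found: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        # Only ATOM records are inspected; shorter or other records are skipped.
        if line[:6] == "ATOM  " and len(line) >= 22:
            altloc = line[16]  # column 17
            chain = line[21]   # column 22
            found[chain].add(altloc)
    return dict(found)


demo = [
    atom_line(1, " CA ", " ", "ALA", "A", 1),
    atom_line(2, " CA ", "A", "SER", "A", 2),
    atom_line(3, " CA ", "B", "SER", "A", 2),
    atom_line(4, " CA ", "C", "GLY", "B", 1),
]
for chain, indicators in sorted(altloc_by_chain(demo).items()):
    print(chain, sorted(indicators))
```

For a real file, the lines of the opened ".pdb" or ".ent" file can be passed in place of the demo list.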

OUTPUT
Two files are created in the current directory. One, with the suffix ".udcode", contains the detailed information and is to be fed into the "ProteinViewer" program. The other, with the suffix ".code", contains the resulting binary sequence of "U" and "D" (or "N-tile codes" if the -l option is specified, where "0" stands for DDDDD, "1" for DDDDU, "2" for DDDUD, ... , "A" for DUDUD, "B" for DUDUU, ... , and "R" for UUDUU). The 5-tile code can be viewed with ProteinViewer version 0.3 or later (Example: udcode_file and its image).

(Example: udcode file and code file,  Example of the 5-tile coding: udcode file, code file, and its image)
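
The symbols listed above are consistent with reading each tile pattern as a binary number (D = 0, U = 1) and writing its value as a single digit from 0-9 followed by A, B, C, and so on: DUDUD is 10 ("A") and UUDUU is 27 ("R"). The Python sketch below converts in both directions under that reading. It is an assumption that the symbols not listed follow the same rule (up to "V" for UUUUU) and that shorter codes use it too; the function names are hypothetical.

```python
# Digits for values 0..31: 0-9, then A-V (assumed to continue the listed pattern).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"


def tile_symbol(ud: str) -> str:
    """Map a U/D pattern of 1 to 5 tiles to its one-character symbol."""
    if not 1 <= len(ud) <= 5:
        raise ValueError("pattern must have 1 to 5 tiles")
    value = 0
    for ch in ud:
        if ch not in ("D", "U"):
            raise ValueError("pattern may contain only 'U' and 'D'")
        value = value * 2 + (1 if ch == "U" else 0)  # D = 0, U = 1
    return DIGITS[value]


def tile_pattern(symbol: str, length: int = 5) -> str:
    """Map a symbol back to its U/D pattern of the given length."""
    value = DIGITS.index(symbol)  # raises ValueError for unknown symbols
    bits = format(value, "0{}b".format(length))
    if len(bits) > length:
        raise ValueError("symbol does not fit in the given length")
    return bits.replace("0", "D").replace("1", "U")


for pattern in ("DDDDD", "DDDDU", "DDDUD", "DUDUD", "DUDUU", "UUDUU"):
    print(pattern, tile_symbol(pattern))
```

The loop prints the six pairs quoted in the OUTPUT section, which is a quick check that the assumed rule agrees with the documented examples.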

DESCRIPTION
ProteinEncoder reads a PDB file (xxx.ent or xxx.pdb) and encodes a protein or RNA/DNA structure into a {U,D}-valued binary sequence based on the second derivative of tetrahedron tiles. If the input file has a different suffix, change it to ".ent" or ".pdb". To view N-tile codes, use ProteinViewer ver. 0.2 or later.

The following options are available:

-f, -r, or -t
By default, protein structures are encoded by folding a tetrahedron sequence with rotation and translation. If -f is specified, they are encoded by folding only. If -r is specified, they are encoded by folding with rotation. If -t is specified, they are encoded by folding with rotation and translation, where only CA atoms are considered (the default behaviour). In the case of -t, the type of the initial tile is always XYZW and its direction is normally D. (In the case of erroneous data, the direction could be U.)
(Examples: folding only, folding with rotation, and folding with rotation and translation)

-i  index
By default, a protein structure is encoded several times, starting from different initial atoms, until a reasonable result is obtained (in the case of -f and -r), or encoding starts from the midpoint CA atom (in the case of -t). If the -i option is specified, encoding starts from the "index"-th atom (or a nearby CA atom in the case of -f and -t).

-l  length
If the -l option is specified, a protein structure is encoded with a "length"-tile code. By default, each amino-acid fragment of length "length" is encoded by folding a tetrahedron sequence with rotation and translation. If the -f, -r, or -t option is specified, fragments are encoded accordingly. By default, length is 5.

-h
If the -h option is specified, the synopsis is shown.

-R
If "-R" is specified, RNAs and DNAs are encoded (instead of proteins). In the ".udcode" file, CA stands for the C1' atom and C stands for the P atom. When viewed using ProteinViewer, the backbone is rendered as a broken line obtained by connecting the P and C1' atoms. ProteinViewer renders the 5' terminus as a small red ball and the 3' terminus as a small blue ball.