ProteinEncoder

Creates two files in the current directory. The first, with the suffix ".udcode", contains the detailed information and is meant to be fed into the "ProteinViewer" program. The second, with the suffix ".code", contains the resulting binary sequence of "U" and "D" (or "N-tile codes" if the -l option is specified, where "0" stands for DDDDD, "1" for DDDDU, "2" for DDDUD, ... , "A" for DUDUD, "B" for DUDUU, ... , and "R" for UUDUU). In the case of the 5-tile code,

amino-acids of code DUDUD (="A") are denoted by large red spheres
amino-acids of code DDDDD (="0") by small blue spheres
amino-acids of code UUDUU (="R") by large yellow spheres
amino-acids of code UUDUD (="Q") and DUDUU (="B") by large orange spheres
amino-acids of code UDDDD (="G") by large green spheres

when viewed with ProteinViewer version 0.3 or later (Example: udcode_file and its image)

(Example: udcode file and code file, Example of the 5-tile coding: udcode file, code file, and its image)
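The 5-tile symbols listed above are all consistent with reading each U/D pattern as a 5-bit binary number (D = 0, U = 1, leftmost tile most significant) and writing it as a single digit from 0-9 followed by A-V. The Python sketch below illustrates that reading. It is an inference from the listed examples, not a documented rule of ProteinEncoder, and the function names are hypothetical. Symbols not listed above are assumed to follow the same pattern.

```python
# Assumption: D = 0, U = 1, leftmost tile is the most significant bit,
# and values 0..31 are written with the digits 0-9 followed by A-V.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

# Highlighted 5-tile codes and their spheres, as listed in this document.
HIGHLIGHTS = {
    "DUDUD": ("large", "red"),
    "DDDDD": ("small", "blue"),
    "UUDUU": ("large", "yellow"),
    "UUDUD": ("large", "orange"),
    "DUDUU": ("large", "orange"),
    "UDDDD": ("large", "green"),
}


def pattern_to_symbol(pattern: str) -> str:
    """Convert a 5-letter U/D pattern to its one-character symbol."""
    if len(pattern) != 5 or set(pattern) - {"U", "D"}:
        raise ValueError("expected five characters, each U or D")
    value = int(pattern.replace("D", "0").replace("U", "1"), 2)
    return DIGITS[value]


def symbol_to_pattern(symbol: str) -> str:
    """Convert a one-character symbol back to its 5-letter U/D pattern."""
    if len(symbol) != 1 or symbol not in DIGITS:
        raise ValueError("expected one character from 0-9 or A-V")
    bits = format(DIGITS.index(symbol), "05b")
    return bits.replace("0", "D").replace("1", "U")


if __name__ == "__main__":
    for pattern, (size, colour) in HIGHLIGHTS.items():
        print(pattern, pattern_to_symbol(pattern), size, colour)
```

Every pattern-to-symbol pair given in this document (0, 1, 2, A, B, G, Q, R) is reproduced by this conversion.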

ProteinEncoder reads a PDB file (xxx.ent or xxx.pdb) and encodes a protein or RNA/DNA structure into a {U,D}-valued binary sequence based on the second derivative of tetrahedron tiles. If the input file has a different suffix, rename it so that it ends in ".ent" or ".pdb". To view N-tile codes, you should use ProteinViewer version 0.2 or later.
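One way to satisfy the suffix requirement without touching the original file is to make a copy with the expected suffix. The following Python sketch does this. The helper name is hypothetical, and it assumes that replacing the existing suffix (rather than appending to it) is acceptable and that the suffix check is case-sensitive.

```python
import shutil
from pathlib import Path

ACCEPTED_SUFFIXES = (".ent", ".pdb")


def ensure_pdb_suffix(path):
    """Return a path ending in .ent or .pdb, copying the file if needed."""
    src = Path(path)
    if src.suffix in ACCEPTED_SUFFIXES:
        return src  # already usable as input
    # Assumption: replacing the old suffix with ".pdb" is fine for this file.
    dst = src.with_suffix(".pdb")
    shutil.copyfile(src, dst)
    return dst
```

The returned path can then be passed to ProteinEncoder as the input file.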

The following options are available:

-f, -r, or -t

By default, protein structures are encoded by folding a tetrahedron sequence with rotation and translation. If -f is specified, structures are encoded by folding only. If -r is specified, they are encoded by folding with rotation. If -t is specified, they are encoded by folding with rotation and translation, where only CA atoms are considered (the default behaviour). In the case of -t, the type of the initial tile is always XYZW and its direction is normally D. (In the case of erroneous data, the direction could be U.)
(Examples: folding only, folding with rotation, and folding with rotation and translation)

-i index

By default, the starting point depends on the encoding mode. In the case of -f and -r, a protein structure is encoded several times, starting from different initial atoms, until a reasonable result is obtained. In the case of -t, encoding starts from the midpoint CA atom. If the -i option is specified, encoding starts from the "index"-th atom (or a nearby CA atom in the case of -f and -t).
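To get a feel for which atom the midpoint default might select, the Python sketch below collects the CA atoms from the ATOM records of a PDB file using the fixed column layout of the PDB format, then picks the middle one. It is only an illustration: the function names are hypothetical, "midpoint" is assumed to mean the middle CA atom in file order, and how ProteinEncoder treats multiple chains, models, or alternate locations is not covered here.

```python
def ca_atoms(pdb_lines):
    """Return (serial, (x, y, z)) for every CA atom in ATOM records."""
    atoms = []
    for line in pdb_lines:
        # Columns 1-6: record name; columns 13-16: atom name.
        # HETATM records (for example calcium ions) are skipped.
        if line.startswith("ATOM  ") and line[12:16].strip() == "CA":
            serial = int(line[6:11])       # columns 7-11
            x = float(line[30:38])         # columns 31-38
            y = float(line[38:46])         # columns 39-46
            z = float(line[46:54])         # columns 47-54
            atoms.append((serial, (x, y, z)))
    return atoms


def midpoint_ca(atoms):
    """Pick the middle CA atom in file order (an assumed reading of 'midpoint')."""
    if not atoms:
        raise ValueError("no CA atoms found")
    return atoms[len(atoms) // 2]


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        serial, xyz = midpoint_ca(ca_atoms(handle))
    print("middle CA atom: serial", serial, "at", xyz)
```

Whether the "index" given to -i corresponds to the PDB serial number or to an ordinal position is not stated in this document, so the serial printed here should be treated as a hint only.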

-l length

If the -l option is specified, a protein structure is encoded with a "length"-tile code. By default, each amino-acid fragment of length "length" is encoded by folding a tetrahedron sequence with rotation and translation. If the -f, -r, or -t option is specified, fragments are encoded accordingly. The default length is 5.

-h

If the -h option is specified, a synopsis is shown.

-R

If the -R option is specified, RNAs and DNAs are encoded (instead of proteins). In the ".udcode" file, CA stands for the C1' atom and C stands for the P atom. When viewed with ProteinViewer, the backbone is rendered as a broken line connecting the P and C1' atoms. ProteinViewer renders the 5' terminus as a small red ball and the 3' terminus as a small blue ball.