GIT:
Gauss Integrals Tuned - a tool for 3D protein structure description,
comparison and classification
Description: GIT reads through a directory
and calculates generalized Gauss integrals of orders one, two, and three for all chains
in the protein pdb-files it finds. For some of the
Gauss integrals, estimates based on the number of amino acids and on the
values of lower order Gauss integrals are subtracted. These estimates are
based on sampling the contributions to the Gauss-integrals as a function of
domain-size and distance along the backbone for more than 24000 CATH2.4
domains. Each Gauss integral, or Gauss integral minus its estimate,
is then divided by the number of amino acids raised to the power that makes it
size independent. The resulting measure is then non-linearly rescaled such that,
on average, a perturbation of 0.01 RMSD of the protein gives a 0.01 change of the final structural descriptor.
In short, the resulting structural measures
are as independent of each other and of the size of the protein as
possible. Furthermore, they scale as linearly with RMSD as possible for
small deformations of protein structures. Cf. P. Røgen, Evaluating
protein structure descriptors and tuning Gauss integral based descriptors,
Journal of Physics: Condensed Matter, vol. 17,
pages 1523-1538, 2005.
Download: As of March 11, 2011, the GIT.c software may be downloaded and used under the GNU General Public License version 3.
Choice of flag: Set the flag "SmoothenBackbone"
at the top of GIT.c to 1 if the Gauss integrals
should be calculated for the smoothened backbone (preferable) or to 0 if the
Gauss integrals should be calculated for the carbon alpha curve.
Compile: GIT is compiled by entering >gcc GIT.c -lm -O3 -o GIT
Run:
To run GIT, enter >GIT /path/to/pdb_file_directory/ Average-gauss_integral_file output.file error.file
The Average-gauss_integral_file: must be AverageGaussTableSmoothRepresentation if the flag "SmoothenBackbone" is set to 1 and AverageGaussTableCalphaRepresentation if "SmoothenBackbone"
is set to 0.
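Example (sketch): The pairing between the flag and the table file can be expressed as a small C helper, for instance in a wrapper that launches GIT. The function name average_table_for_flag is hypothetical and is not part of GIT.c. Only the two file names come from the description above.

```c
/* Hypothetical helper, not part of GIT.c.
   Returns the name of the average Gauss integral table that matches
   the value the "SmoothenBackbone" flag was compiled with. */
const char *average_table_for_flag(int smoothen_backbone)
{
    return smoothen_backbone ? "AverageGaussTableSmoothRepresentation"
                             : "AverageGaussTableCalphaRepresentation";
}
```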
Output if "SmoothenBackbone" is 1: The columns of output.file are
pdb.file chainID #C-alphas_missing #C-alphas '19.11 times the third root of the #C-alphas'
followed by 30 structural measures ordered by decreasing signal-to-noise ratio
(29 are Gauss integral based and the 10th is based on the length of the
smoothened curve), for example
1JXD.pdb A 0 98 8.8105e+01 -5.2472e+00 7.3504e+00...
TGM: The Tuned Gauss (Pseudo) Metric is given by the usual metric on the
last 30 columns for the smoothened backbone or on the last 29 columns for
the carbon alpha curve. In both cases, one further column (31 or 30 columns
in total) is included if the size of the protein should be taken into
account.
Output if "SmoothenBackbone" is 0: The columns are the same as when
"SmoothenBackbone" is 1, except that there is one structural measure fewer
(as the length of the smoothened curve is not calculated).
Note:
Backbones with more than 3 missing C-alpha atoms are not considered.
This is because GIT connects the C-alpha atoms it finds, and big gaps in
the backbone may thus give a "backbone" that is very different
from what the true backbone was supposed to be. To compute the
number, #C-alphas_missing, GIT just counts the
number of C-alpha atoms and compares this with the starting and ending
residue number. In the case of pdb-files with
non-consecutive numbering, this may give strange results.
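Example (sketch): The count described in the note can be written as below. This is an assumption about the arithmetic and is not copied from GIT.c. The function names calphas_missing and backbone_is_considered are hypothetical.

```c
/* Hypothetical helper, not part of GIT.c.
   Residues expected from the numbering alone, minus the C-alpha atoms
   actually found. */
int calphas_missing(int first_resnum, int last_resnum, int n_calphas_found)
{
    return (last_resnum - first_resnum + 1) - n_calphas_found;
}

/* A backbone is considered only if at most 3 C-alpha atoms are missing. */
int backbone_is_considered(int n_missing)
{
    return n_missing <= 3;
}
```

With non-consecutive numbering, this count can be misleading or even negative, which is the kind of strange result mentioned above.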
Citing the use of this resource:
P. Røgen, Evaluating protein structure descriptors and tuning Gauss
integral based descriptors, Journal of Physics: Condensed Matter, vol. 17, pages 1523-1538, 2005.
Contacting the author: Peter Røgen, Peter.Roegen@mat.dtu.dk
Bibliography: Gauss integrals applied to protein structure description.