GIT: Gauss Integrals Tuned - a tool for 3D protein structure description, comparison and classification

Description: GIT reads through a directory and calculates generalized Gauss integrals of orders one, two, and three for all chains of the protein PDB files it finds. For some of the Gauss integrals, estimates based on the number of amino acids and on the values of lower-order Gauss integrals are subtracted. These estimates are based on sampling the contributions to the Gauss integrals as a function of domain size and distance along the backbone for more than 24000 CATH2.4 domains. Each Gauss integral, or Gauss integral minus its estimate, is then divided by the number of amino acids raised to the power that makes it size independent. The resulting measure is finally non-linearly rescaled such that, on average, a perturbation of 0.01 RMSD of the protein gives a 0.01 change in the final structural descriptor.

In short, the resulting structural measures are as independent of each other and of the size of the protein as possible. Furthermore, they scale as linearly with RMSD as possible for small deformations of protein structures. Cf. P. Røgen, Evaluating protein structure descriptors and tuning Gauss integral based descriptors, Journal of Physics: Condensed Matter, vol. 17, pages 1523-1538, 2005.
Download: As of March 11th, 2011, the GIT.c software may be downloaded and used under the GNU General Public License version 3.
Choice of flag: Set the flag "SmoothenBackbone" at the top of GIT.c to 1 if the Gauss integrals should be calculated for the smoothened backbone (preferable), or to 0 if they should be calculated for the carbon alpha curve.
Compile: GIT is compiled by entering >gcc GIT.c -lm -O3 -o GIT
Run: To run GIT, enter >GIT /path/to/pdb_file_directory/ Average-gauss_integral_file output.file error.file
The Average-gauss_integral_file: This has to be AverageGaussTableSmoothRepresentation if the flag "SmoothenBackbone" is set to 1, and AverageGaussTableCalphaRepresentation if "SmoothenBackbone" is set to 0.
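
The pairing between the flag and the table file is easy to get wrong when GIT is run from a script. The following is a minimal Python sketch that only assembles the argument list for one run. The helper name git_command, the default executable path ./GIT, and the assumption that the table file sits in the working directory are illustrative and not part of GIT itself.

```python
# Hypothetical helper: the names below are illustrative, not part of GIT.
# Table file required for each value of the compile-time flag SmoothenBackbone.
TABLE_FOR_FLAG = {
    1: "AverageGaussTableSmoothRepresentation",  # smoothened backbone
    0: "AverageGaussTableCalphaRepresentation",  # carbon alpha curve
}


def git_command(pdb_dir, smoothen_backbone, output_file="output.file",
                error_file="error.file", executable="./GIT"):
    """Return the argument list for one GIT run (nothing is executed here)."""
    if smoothen_backbone not in TABLE_FOR_FLAG:
        raise ValueError("SmoothenBackbone must be 0 or 1")
    return [executable, pdb_dir, TABLE_FOR_FLAG[smoothen_backbone],
            output_file, error_file]


cmd = git_command("/path/to/pdb_file_directory/", smoothen_backbone=1)
print(" ".join(cmd))
```

The value passed as smoothen_backbone must match the flag GIT.c was compiled with; the sketch cannot check this. The resulting list can be handed to subprocess.run(cmd, check=True).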
Output if "SmoothenBackbone" is 1: The columns of output.file are

pdb.file   chainID   #C-alphas_missing   #C-alphas   '19.11 times the cube root of #C-alphas', followed by 30 structural measures in decreasing order of signal-to-noise ratio (29 are Gauss integral based; the 10th is based on the length of the smoothened curve). For example:
1JXD.pdb A 0 98 8.8105e+01 -5.2472e+00 7.3504e+00...
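
As an illustration of the column layout, the following Python sketch parses one line of output.file and checks the fifth column against 19.11 times the cube root of the number of C-alphas. The names GitRecord, parse_git_line, and size_term are hypothetical, and the sketch assumes whitespace-separated columns with a non-blank chain ID. Only the two structural measures visible in the truncated example line are used.

```python
from typing import List, NamedTuple


class GitRecord(NamedTuple):
    """One line of output.file (hypothetical container, not part of GIT)."""
    pdb_file: str
    chain_id: str
    n_missing: int         # #C-alphas_missing
    n_calpha: int          # #C-alphas
    size_term: float       # 19.11 times the cube root of #C-alphas
    measures: List[float]  # structural measures, in file order


def parse_git_line(line):
    """Split a whitespace-separated output line into typed fields."""
    fields = line.split()
    return GitRecord(fields[0], fields[1], int(fields[2]), int(fields[3]),
                     float(fields[4]), [float(x) for x in fields[5:]])


def size_term(n_calpha):
    """Recompute the fifth column from the number of C-alphas."""
    return 19.11 * n_calpha ** (1.0 / 3.0)


# The example line from the text, truncated after the first two measures.
rec = parse_git_line("1JXD.pdb A 0 98 8.8105e+01 -5.2472e+00 7.3504e+00")
print(rec.chain_id, rec.n_calpha, round(size_term(rec.n_calpha), 3))
```

For the example line, the recomputed size term agrees with the printed value 8.8105e+01 to the precision shown.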

TGM: The Tuned Gauss (Pseudo) Metric is given by the usual metric on the last 29 to 31 columns: 30 columns for the smoothened backbone or 29 for the carbon alpha curve, plus, in either case, one more column (the size term preceding the structural measures) if the size of the protein should be taken into account.
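
A minimal Python sketch of such a distance is given here, assuming that "the usual metric" means the Euclidean distance. The function name tgm_distance and its include_size parameter are hypothetical. Each input is the list of numeric columns of one output line, starting with the size term and followed by the structural measures.

```python
import math


def tgm_distance(cols_a, cols_b, include_size=False):
    """Euclidean distance between two rows of GIT descriptors.

    cols_a and cols_b hold [size_term, measure_1, ..., measure_k].
    The size term is skipped unless include_size is True.
    Assumption: 'the usual metric' is read as the Euclidean metric.
    """
    if len(cols_a) != len(cols_b):
        raise ValueError("rows must have the same number of columns")
    start = 0 if include_size else 1
    return math.sqrt(sum((x - y) ** 2
                         for x, y in zip(cols_a[start:], cols_b[start:])))


# Toy rows with two measures each, for illustration only (not real GIT output).
a = [10.0, 0.0, 0.0]
b = [13.0, 3.0, 4.0]
print(tgm_distance(a, b))                     # size term ignored
print(tgm_distance(a, b, include_size=True))  # size term included
```

Rows from runs with different flag settings have different numbers of measures and should not be compared with each other; the length check catches this case.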
Output if "SmoothenBackbone" is 0: The columns are the same as when "SmoothenBackbone" is 1, except that there is one fewer structural measure (as the length of the smoothened curve is not calculated).

Note: We have not considered backbones if more than 3 C-alpha atoms are missing. This is because GIT connects the C-alpha atoms it finds, and big gaps in the backbone may thus give a "backbone" that is very different from the true backbone. To compute the number #C-alphas_missing, GIT simply counts the C-alpha atoms and compares this count with the starting and ending residue numbers. For PDB files with non-consecutive numbering, this may give strange results.
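
The counting rule in the note can be sketched in Python as follows. This is one reading of the description, not GIT's actual code: the function names and the exact formula (last residue number minus first residue number plus one, minus the number of C-alphas found) are assumptions.

```python
MAX_MISSING = 3  # backbones with more than 3 missing C-alphas are not considered


def count_missing_calphas(residue_numbers):
    """Estimate missing C-alphas from the residue numbers of those found.

    residue_numbers: residue sequence numbers of the C-alpha atoms of one
    chain, in file order. Assumed formula: (last - first + 1) - count.
    """
    if not residue_numbers:
        return 0
    expected = residue_numbers[-1] - residue_numbers[0] + 1
    return expected - len(residue_numbers)


def is_considered(residue_numbers):
    """True if the chain passes the 'at most 3 missing' rule."""
    return count_missing_calphas(residue_numbers) <= MAX_MISSING


print(count_missing_calphas([1, 2, 3, 6, 7]))             # a real gap: 4 and 5 absent
print(count_missing_calphas([1, 2, 3, 103, 104]))         # jump in numbering only
print(count_missing_calphas([50, 51, 52, 52, 52, 53]))    # repeated residue numbers
```

The second and third calls show how non-consecutive numbering distorts the count: a jump in the numbering is reported as a large gap, and repeated residue numbers (as can arise with insertion codes) yield a negative value.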

Citing the use of this resource: P. Røgen, Evaluating protein structure descriptors and tuning Gauss integral based descriptors, Journal of Physics: Condensed Matter, vol. 17, pages 1523-1538, 2005.
Contacting the author: Peter Røgen  Peter.Roegen@mat.dtu.dk

Bibliography: Gauss integrals applied to protein structure description.