Sieve - a tool for 3D protein structure description, comparison and classification
Description: Sieve reads through a directory and calculates the average
occurrences of patterns of one, two and three crossings of the carbon alpha
path for all chains of the protein PDB files it finds.
               
   
        
Download: The Sieve software may be downloaded under these terms.
               
   
           
Compile: Sieve is compiled by entering
   >cc Sieve.c -lm -O3 -o Sieve
               
   
           
Run: To run Sieve, enter
   >Sieve 20 100 25 /path/to/pdb_file_directory/ output.file 0.000000001
    The program takes five or six arguments.
    The first two arguments are limits on the length of the proteins it
should process. For example, 20 100 indicates that the program should only
treat proteins whose number of carbon alphas is between 20 and 100.
    The third argument, 25, is a triangulation parameter.
    The fourth argument is the name of the directory (which must end with a
"/") in which the program looks for protein data files.
    The fifth argument is the name of the file to which output data is
written. Note that this file is only appended to, never overwritten.
    The optional sixth argument, 0.000000001, is the amplitude of random
noise to be added to the atomic coordinates. If it is omitted, this
amplitude is set to zero.
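    The argument conventions above can be sketched in C. This is a hypothetical illustration, not code taken from Sieve.c. The names struct sieve_args, parse_sieve_args, length_ok and perturb_coordinates are invented. The sketch makes three further assumptions:
    - the triangulation parameter is read as a double;
    - the length limits are inclusive;
    - the noise is uniform in [-amplitude, amplitude], since the text only states its amplitude.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the command-line arguments; the field
   names are illustrative, not taken from Sieve.c. */
struct sieve_args {
    int min_len;           /* lower limit on the number of C-alphas */
    int max_len;           /* upper limit on the number of C-alphas */
    double triangulation;  /* triangulation parameter, e.g. 25 (type assumed) */
    const char *pdb_dir;   /* directory to scan; must end with '/' */
    const char *out_file;  /* output file, only ever appended to */
    double noise_amp;      /* optional noise amplitude, 0 if omitted */
};

/* Parse five required arguments and an optional sixth.
   Returns 0 on success, -1 on a usage error. */
int parse_sieve_args(int argc, char **argv, struct sieve_args *a)
{
    if (argc != 6 && argc != 7)
        return -1;
    a->min_len = atoi(argv[1]);
    a->max_len = atoi(argv[2]);
    a->triangulation = strtod(argv[3], NULL);
    a->pdb_dir = argv[4];
    a->out_file = argv[5];
    a->noise_amp = (argc == 7) ? strtod(argv[6], NULL) : 0.0;

    size_t n = strlen(a->pdb_dir);
    if (n == 0 || a->pdb_dir[n - 1] != '/')
        return -1;  /* the directory name must end with "/" */
    return 0;
}

/* Length filter: treat only proteins whose C-alpha count lies in
   [min_len, max_len] (inclusive bounds are an assumption). */
int length_ok(const struct sieve_args *a, int n_calpha)
{
    return n_calpha >= a->min_len && n_calpha <= a->max_len;
}

/* Add uniform noise in [-amp, amp] to each coordinate value. The
   distribution is an assumption; call srand() first for varied runs. */
void perturb_coordinates(double *xyz, size_t n_values, double amp)
{
    for (size_t i = 0; i < n_values; i++) {
        double u = (double)rand() / RAND_MAX;  /* u in [0, 1] */
        xyz[i] += amp * (2.0 * u - 1.0);
    }
}
```

    With an amplitude of zero, perturb_coordinates leaves the coordinates unchanged. That matches the behaviour described for an omitted sixth argument.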
               
   
    The program: Sieve searches the given directory for files ending in
".pdb". For each such file, it reads through its output file (which is never
overwritten, only appended to) to see whether there already is an entry for
that protein. If so, it passes on to the next file. If not, it computes the
measures for this new protein if it can, and appends a line to the output
file if it succeeds.
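    The directory scan and the skip-if-already-present check can be sketched in C with the POSIX opendir/readdir interface. This is a hypothetical sketch, not Sieve's own code. The names has_suffix, already_processed and list_pending are invented. The sketch assumes that the first column of each output line is the PDB file name, as in the output format given on this page.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name ends with suffix (e.g. ".pdb"), else 0. */
int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Return 1 if some line of the output file already has pdb_name as its
   first column, else 0. A missing output file means nothing is done yet. */
int already_processed(const char *out_path, const char *pdb_name)
{
    FILE *f = fopen(out_path, "r");
    if (f == NULL)
        return 0;
    char line[4096], first[1024];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%1023s", first) == 1 && strcmp(first, pdb_name) == 0)
            found = 1;
    }
    fclose(f);
    return found;
}

/* Walk the directory and report the .pdb files that still need work.
   Real processing would replace the printf call. Returns the number of
   pending files, or -1 if the directory cannot be opened. */
int list_pending(const char *dir, const char *out_path)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;
    int pending = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (!has_suffix(e->d_name, ".pdb"))
            continue;
        if (already_processed(out_path, e->d_name))
            continue;  /* entry exists: pass on to the next file */
        printf("to do: %s%s\n", dir, e->d_name);  /* dir ends with '/' */
        pending++;
    }
    closedir(d);
    return pending;
}
```

    The output file is re-read for every candidate file. This keeps no state in memory between files, so an interrupted run can simply be restarted.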
    The output file is opened only for reading and for writing, never during
a computation. As soon as a line has been appended to the output file, the
output stream is flushed (any buffered but unwritten data is written). This
means that the program can be aborted and restarted without losing more than
the computation in progress (i.e. one single protein). It also means that one
can first run the program on a set of proteins without any perturbation of
the atomic coordinates (i.e. no sixth argument). It will compute the measures
for those it can, but produce no output line for those that caused numerical
problems. One can then restart it with a small perturbation to treat the
remainder.
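    The append-and-flush pattern described above might look like this in C. append_result is a hypothetical name, not a function from Sieve.c. The sketch assumes the result line is fully formatted before the file is opened, so the file is never held open during a computation.

```c
#include <stdio.h>

/* Append one finished result line and flush it at once, so an abort can
   lose at most the protein currently being computed. The file is opened
   only around the write. Returns 0 on success, -1 on error. */
int append_result(const char *out_path, const char *line)
{
    FILE *f = fopen(out_path, "a");  /* "a": append, never truncate */
    if (f == NULL)
        return -1;
    int ok = fputs(line, f) >= 0 && fputc('\n', f) != EOF;
    ok = (fflush(f) == 0) && ok;     /* push buffered data to the OS */
    ok = (fclose(f) == 0) && ok;
    return ok ? 0 : -1;
}
```

    The two-pass workflow then amounts to running
   >Sieve 20 100 25 /path/to/pdb_file_directory/ output.file
and afterwards
   >Sieve 20 100 25 /path/to/pdb_file_directory/ output.file 0.000000001
Proteins that already have a line in output.file are skipped on the second run.

    Note that fflush only hands the data to the operating system. Surviving a power failure, as opposed to an aborted program, would additionally require fsync.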
    
           
Output: The columns of output.file are

   pdb.file   chainID   #C-alphas_missing   #C-alphas

followed by 29 structural measures, ordered as in Table 3 of our paper below.
For example:

   1cd1C2.pdb  C 0 95   -2.2006067934   23.21.....
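    A program that consumes output.file might read each line as follows. This is a hypothetical reader, not part of Sieve. The names struct sieve_row and parse_sieve_row are invented. The reader assumes whitespace-separated columns and a chain ID that is a single non-blank token.

```c
#include <stdio.h>
#include <stdlib.h>

#define SIEVE_N_MEASURES 29  /* structural measures per line */

/* One parsed row of the output file; field names are illustrative. */
struct sieve_row {
    char pdb_file[256];
    char chain_id[8];
    int n_missing;                     /* #C-alphas_missing */
    int n_calpha;                      /* #C-alphas */
    double measure[SIEVE_N_MEASURES];  /* ordered as in Table 3 */
};

/* Parse one output line. Returns the number of measures read (29 for a
   complete line), or -1 if the four leading columns are malformed. */
int parse_sieve_row(const char *line, struct sieve_row *r)
{
    int used = 0;  /* characters consumed by the leading columns */
    if (sscanf(line, "%255s %7s %d %d%n", r->pdb_file, r->chain_id,
               &r->n_missing, &r->n_calpha, &used) != 4)
        return -1;
    const char *p = line + used;
    int k = 0;
    while (k < SIEVE_N_MEASURES) {
        char *end;
        double v = strtod(p, &end);  /* skips leading whitespace */
        if (end == p)
            break;                   /* no further number on the line */
        r->measure[k++] = v;
        p = end;
    }
    return k;
}
```

    Returning the count of measures, rather than failing outright, lets a caller spot truncated lines, for example when the count is less than 29.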
           
           
            
Note: We have not considered backbones with more than 3 missing C-alpha
atoms. This is because Sieve connects the carbon alpha atoms it finds, so big
gaps in the backbone may give a "backbone" that is very different from the
true one. To compute #C-alphas_missing, Sieve simply counts the carbon alpha
atoms and compares this count with the starting and ending residue numbers.
For PDB files with non-consecutive residue numbering, this may give strange
results.
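    The missing-count rule can be sketched in C. The column positions are those of standard PDB ATOM records. Everything else is an assumption of this hypothetical sketch, not a description of what Sieve.c does:
    - the function names scan_calphas, calpha_missing and chain_usable;
    - reading only the first model;
    - keeping only blank or 'A' alternate locations;
    - the "+ 1" in the residue span.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan ATOM records of a PDB file for one chain, counting C-alpha atoms
   and recording the first and last residue numbers seen. PDB columns:
   atom name 13-16, altLoc 17, chain ID 22, residue number 23-26.
   Returns the C-alpha count, or -1 if the file cannot be opened. */
int scan_calphas(const char *path, char chain, int *first, int *last)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char line[128];
    int n = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "ENDMDL", 6) == 0)
            break;                              /* first model only */
        if (strncmp(line, "ATOM  ", 6) != 0 || strlen(line) < 26)
            continue;
        if (strncmp(line + 12, " CA ", 4) != 0 || line[21] != chain)
            continue;
        if (line[16] != ' ' && line[16] != 'A')
            continue;                           /* skip other altLocs */
        char num[5];
        memcpy(num, line + 22, 4);              /* residue number field */
        num[4] = '\0';
        int resnum = atoi(num);
        if (n == 0)
            *first = resnum;
        *last = resnum;
        n++;
    }
    fclose(f);
    return n;
}

/* Span of residue numbers minus the C-alphas found. Insertion codes or
   numbering jumps make this wrong, possibly negative. */
int calpha_missing(int first_resnum, int last_resnum, int n_calpha)
{
    return (last_resnum - first_resnum + 1) - n_calpha;
}

/* Backbones with more than 3 missing C-alphas are not considered. */
int chain_usable(int first_resnum, int last_resnum, int n_calpha)
{
    return calpha_missing(first_resnum, last_resnum, n_calpha) <= 3;
}
```

    For example, a chain with C-alphas at residues 1, 2 and 5 spans 5 residue numbers but has 3 atoms. That gives 2 missing, so the chain would still be used.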