The deconSTRUCT server pre-filters (culls) a database of protein structures to prioritize the candidates for a structural match with the query. In other words, it removes the poor candidates and ranks the good ones without actually performing a structural alignment on the atomic level. In the final step, a rough alignment on the backbone level is produced for a set number of top candidates (200 by default). This rough alignment is used to estimate the number of alignable residues and to produce visualizations of the match.

What deconSTRUCT is not. deconSTRUCT is not (yet) a tool for precise pairwise alignment of protein structures on the backbone level; CE is our favorite program for that purpose. Rather than an alignment program, deconSTRUCT is a search and comparison server.

 
INPUT

Input consists of a PDB identifier and a peptide chain identifier. Alternatively, a structure (or a piece of one) in PDB format can be uploaded. You can also specify a range on both the uploaded chain and the chain present on the server (retrieved by its PDB identifier).
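For users who prefer to trim a structure before uploading it, the following is a minimal sketch of selecting one chain and a range from PDB-format text. It assumes that the range is given in residue numbers, and it relies only on the fixed-column layout of ATOM/HETATM records in the PDB format. The function name select_range is ours, chosen for illustration; this is not the server's own code.

```python
def select_range(pdb_text, chain_id, first=None, last=None):
    """Return ATOM/HETATM lines of one chain, optionally limited to a residue range.

    Illustrative helper (hypothetical name). Insertion codes are ignored.
    """
    selected = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue                      # skip headers, TER, END, etc.
        if len(line) < 26:
            continue                      # too short to hold chain and residue number
        if line[21] != chain_id:          # column 22: chain identifier
            continue
        resnum = int(line[22:26])         # columns 23-26: residue sequence number
        if first is not None and resnum < first:
            continue
        if last is not None and resnum > last:
            continue
        selected.append(line)
    return selected
```

Calling select_range(text, "A", 8, 15) would keep only the chain A records with residue numbers 8 through 15, inclusive.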

 
OPTIONS

To understand the available options fully, you will have to consult the accompanying paper. What follows is a brief and somewhat simplified description.

In the default mode a rigid search is performed, scanning for a directional match of the secondary structure elements (SSEs). Length mismatch is not penalized and, to speed up the search, SSEs that point in similar directions in space are grouped. The Gaussian width δ (see the equation for T below), which measures how loose the match between SSE directions in the two structures may be, is set to 0.3, a good tradeoff between sensitivity and speed.

In the "expert" mode, users can narrow the search to the targets of interest by specifying the minimal number of SSEs to attempt a match on and the size of the allowable mismatch in SSE direction. The direction mismatch is regulated by the parameter called δ on the input page. It determines the strictness of the structural match through a metric we call T, whose optimization drives the underlying search:

In this equation, x and y are the directions of the SSEs in the two structures, i runs over the sequential indices of the SSEs in x, and M(i) is the index that SSE i is mapped to in y. s is equal to +1 if the corresponding SSE is a helix, and -1 if it is a strand.

The value of δ can be chosen from five values for which certain integrals were tabulated. δ = 0.5 results in the most sensitive search, but keep in mind that it can make the search several times slower than one with δ = 0.2.

The user may also make the search a bit more sensitive by not grouping SSEs with similar directions during the search.

The user can regulate the size of the output by picking one or more of the following cutoffs: the maximum number of top hits (to be aligned on the backbone level and displayed in the output table), the maximal acceptable RMSD of the aligned Cαs, the maximal mismatch in the length of SSEs, and the maximal geometric z-score (z). The geometric z-score measures how likely it is that the directions in the two given structures match by chance: the more negative the value, the less likely it is that the match is accidental.
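The effect of combining several cutoffs can be sketched as below. This is an illustration only: the Hit record, its field names, and the function apply_cutoffs are hypothetical, the hits are assumed to arrive already ranked, and the order in which the server itself applies the cutoffs is not specified here. The sketch applies the threshold cutoffs first and the limit on the number of hits last.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One ranked hit (hypothetical record, for illustration)."""
    target: str     # PDB identifier followed by chain identifier
    rmsd: float     # RMSD of the aligned C-alphas, in Angstroms
    avg_dl: float   # average length mismatch of the matched SSEs
    geom_z: float   # geometric z-score (more negative = less likely by chance)


def apply_cutoffs(hits: List[Hit],
                  max_hits: Optional[int] = None,
                  max_rmsd: Optional[float] = None,
                  max_dl: Optional[float] = None,
                  max_z: Optional[float] = None) -> List[Hit]:
    """Keep hits that pass every cutoff that was set; None means 'no cutoff'."""
    kept = [h for h in hits
            if (max_rmsd is None or h.rmsd <= max_rmsd)
            and (max_dl is None or h.avg_dl <= max_dl)
            and (max_z is None or h.geom_z <= max_z)]
    # Assumption: input order is the ranking, so truncation keeps the top hits.
    return kept if max_hits is None else kept[:max_hits]
```

Because every cutoff defaults to None, any subset of them can be set, mirroring the "one or more" choice offered on the input page.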

After the postprocessing step (the approximate alignment on the backbone level), the hits are sorted according to the following metric:

Here the sum runs over the aligned pairs of Cαs, d is the distance between the two atoms of a pair, and d0 is a parameter set to 5 Å. We currently do not allow the user to set a cutoff on this quantity; let us know if we should.

 
Options for the One Structure Against Database Comparison
 
 
OUTPUT

In a one-to-many search the output consists of a table with one line per target, with the following columns:

  • target -- the name of the target structure in the provided database: a concatenation of the PDB and chain identifiers
  • aln score A -- alignment score (see above)
  • aln length -- alignment length (in terms of the number of aligned Cαs)
  • rmsd -- RMSD for the aligned Cαs
  • <dL> -- average difference in length of the matched SSEs
  • geom z -- z-score for the geometric (i.e. prior to the alignment) match
  • target name -- the target molecule name, according to the original PDB entry
  • more -- transformation, download of superimposed coordinates, and visualization (Jmol applet, PyMOL and Chimera sessions)
     
The table is sorted according to the alignment score.
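For reference, the rmsd column follows the usual definition of root-mean-square deviation over paired atoms. The sketch below computes it for two equally long lists of Cα coordinates that are assumed to be already superimposed; it does not perform the superposition itself, and the function name rmsd is ours.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between paired 3D points, e.g. aligned C-alphas after superposition."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # squared distance between the two atoms of an aligned pair
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

The number of coordinate pairs passed in corresponds to the aln length column.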

In a one-to-one search the output consists of several maps between the SSEs of the query and those of the target (with the residue span quoted for each structure), sorted by the total score assigned to the matched SSEs, together with the same visualization options that "more" offers in the one-to-many case (see above).

     
TEST SET

A test set that we use to assess the general performance of our algorithm is available for download. The test set is based on PDB Select 25% and is further filtered to include only multi-domain chains that contain domains matching at least three other members of the test set according to the CATH fold-family ('T' level) classification.
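One possible reading of this filter is sketched below. It is not the script that was used to build the test set, and it rests on several assumptions: two domains "match" when their CATH codes agree in the first three fields (C.A.T), a chain qualifies if at least one of its domains has matches in three or more other chains, and the filter is repeated until the set stops changing so that partners are themselves members. All names are hypothetical.

```python
from collections import defaultdict


def fold_of(cath_code):
    """Truncate a CATH code 'C.A.T.H' to its 'C.A.T' part."""
    return ".".join(cath_code.split(".")[:3])


def select_test_chains(domains_by_chain, min_partners=3):
    """domains_by_chain maps a chain name to the CATH codes of its domains."""
    # Start from the multi-domain chains only.
    members = {c for c, codes in domains_by_chain.items() if len(codes) >= 2}
    while True:
        # Which current members contain a domain of each fold?
        chains_with_fold = defaultdict(set)
        for chain in members:
            for code in domains_by_chain[chain]:
                chains_with_fold[fold_of(code)].add(chain)
        # Keep chains with a domain whose fold occurs in enough other members.
        kept = {chain for chain in members
                if any(len(chains_with_fold[fold_of(code)] - {chain}) >= min_partners
                       for code in domains_by_chain[chain])}
        if kept == members:
            return sorted(kept)
        members = kept
```

If a stricter reading is intended (for example, every domain of a chain must have three partners), replacing any with all in the selection step would express it.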