Spreadsheet output produced by Cube
   
The conservation module of the Cube server produces a spreadsheet (xls format) with the conservation score mapped onto the reference sequence.
The column contents are the following:
  • Position number in the alignment (if you did not provide the alignment yourself, you can find the alignment produced by Cube in the 'work directory' available for download on the results page).
  • Percentage of gaps at this alignment position. A large number indicates that many sequences in the alignment lack this position altogether. Note that the missing residues might be an artifact of protein sequencing or of prediction from genomics studies. Ideally, you should replace such sequences with more reliable ones. Otherwise, just treat this column as a potential warning sign.
  • Conservation score. The scoring method can be changed on the input page. By default it is real-valued trace. See here for the original references. All scores are converted into "fractional coverage": the top fraction to which each residue belongs according to the score used. Thus "0.14" means "this residue belongs to the top 14% of conserved residues".

    One very important thing to keep in mind: this estimate should always end with the caveat "... based on the set of sequences and their alignment provided for the analysis." For example, a residue may belong to the top 14% of conserved residues when all metazoan sequences are analyzed, while the same residue may only be placed in the top 43% when the analysis is restricted to vertebrate sequences, simply because those 43% of positions are identical in all vertebrate versions of the protein.

    The fractional coverage score is further binned into 20 bins and colored according to the colorbar shown in the last column of the table.

  • Residue type in the representative (reference) sequence.
  • Sequential number in the representative (reference) sequence.
  • If the structure is provided, the following three columns appear:
    • Residue type in the structure
    • PDB residue identifier (mostly corresponding to the sequential number in the structure).
    • Surface. The column contains the word "surface" if the solvent accessibility of this residue is > 10 Å², according to the DSSP program.
  • If an annotation is provided on the input page, it will appear in the last contents column (the very last column is reserved for the colorbar).
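The conversion of raw scores into fractional coverage and the subsequent binning into 20 color bins can be illustrated with a minimal Python sketch. This is not Cube's actual code: the function names, the convention that a higher raw score means more conserved, the treatment of ties (tied positions share the same coverage), and the bin boundaries are all assumptions made for illustration.

```python
import math
from typing import List


def fractional_coverage(scores: List[float]) -> List[float]:
    """Return, for each position, the fraction of positions that score
    at least as conserved as it does.

    Assumption: a higher raw score means a more conserved position.
    Tied positions receive the same coverage value.
    """
    n = len(scores)
    return [sum(1 for other in scores if other >= s) / n for s in scores]


def coverage_bin(coverage: float, n_bins: int = 20) -> int:
    """Map a coverage value in (0, 1] to a bin index 0..n_bins-1.

    Bin 0 holds the most conserved positions (top 5% for 20 bins).
    The exact bin boundaries are an assumption of this sketch.
    """
    return min(max(math.ceil(coverage * n_bins) - 1, 0), n_bins - 1)


# Hypothetical raw scores for four alignment positions.
raw_scores = [5.0, 1.0, 3.0, 3.0]
coverage = fractional_coverage(raw_scores)     # [0.25, 1.0, 0.75, 0.75]
bins = [coverage_bin(c) for c in coverage]     # [4, 19, 14, 14]
print(coverage, bins)
```

Note how the two tied positions both end up at 0.75: when many positions are equally conserved, they all share a large coverage value, which is the effect described in the caveat above.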
The specialization module of the Cube server produces a spreadsheet (xls format) with both conservation and specialization scores mapped onto the reference sequence.
The number of columns depends on the number of groups in the analysis and on whether a structure and/or annotation was provided, but each column will always be one of the following:
  • Position number in the alignment (if you did not provide the alignment yourself, you can find the alignment produced by Cube in the 'work directory' available for download at the results page).
  • Percentage of gaps at this alignment position. A large number indicates that many sequences in the alignment lack this position altogether. Note that the missing residues might be an artifact of protein sequencing or of prediction from genomics studies. Ideally, you should replace such sequences with more reliable ones. Otherwise, just treat this column as a potential warning sign.
  • Conservation: in the specialization module it is always Shannon entropy. Both conservation and specificity scores are converted into fractional coverage (see above).
  • Specificity:
    • 'Discriminant' positions - positions that are well conserved within each group under consideration but different across different groups (labeled as 'specificity' in the Pymol session).
    • 'Determinants' for each group you provided. These are positions that are relatively well conserved in the reference group and differ from those in the remaining groups. Note that here we are not saying anything about the conservation status in the other groups: the position may be conserved or variable there, as long as the overlap with the amino acid types in the reference group is small.
  • Representative sequences: residue type and sequential number in each.
  • If the structure is provided, the following three columns appear:
    • Residue type in the structure
    • PDB residue identifier (mostly corresponding to the sequential number in the structure).
    • Surface. The column contains the word "surface" if the solvent accessibility of this residue is > 10 Å², according to the DSSP program.
  • If an annotation is provided on the input page, it will appear in the last contents column (the very last columns are reserved for the two colorbars, one for conservation and one for specialization).
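To make the gap percentage and Shannon entropy columns more concrete, here is a minimal Python sketch that computes both for each column of a toy alignment. It is an illustration only, not Cube's implementation: the function names are hypothetical, and the choices to recognize "-" and "." as gap characters and to exclude gaps from the entropy calculation are assumptions.

```python
import math
from collections import Counter
from typing import List

GAP_CHARS = "-."  # assumption: characters treated as gaps


def alignment_columns(alignment: List[str]) -> List[str]:
    """Transpose equal-length aligned sequences into column strings."""
    return ["".join(col) for col in zip(*alignment)]


def gap_percentage(column: str) -> float:
    """Percentage of sequences that have a gap at this position."""
    gaps = sum(1 for c in column if c in GAP_CHARS)
    return 100.0 * gaps / len(column)


def shannon_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residue types in one column.

    Gaps are ignored here (an assumption). Lower entropy means a more
    conserved position; 0 means a single residue type throughout.
    """
    residues = [c for c in column if c not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical four-sequence alignment with three positions.
toy_alignment = ["ACD", "AC-", "AE-", "AEF"]
for position, column in enumerate(alignment_columns(toy_alignment), start=1):
    print(position, gap_percentage(column), shannon_entropy(column))
```

In this toy case the first column is fully conserved (entropy 0), while the third column has 50% gaps, which is the kind of position the gap column is meant to flag.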