Information to collect when analyzing a protein:
- Start with the common knowledge about the protein: what do UniProt and Wikipedia have to say about it?
- Does the protein have a structure? (Check out PDB and PDBsum.)
  - Is the structure known for the full length of the peptide? If not, are the remaining pieces predicted to be disordered (see, for example, DISOPRED) or predicted to have secondary structure (PSIPRED)? When in doubt, you can always consult other servers of your own choice.
  - What information is available from the structures: interactants, conformational states, metal binding sites, electrostatic picture (see the APBS plug-in for PyMOL), ...?
  - What is the biologically relevant oligomerization state? (PISA)
- No structure:
  - Can the structure be modeled by homology? (Yes, if the structure has been determined for proteins with similar sequence.) Do we need to model some smaller pieces of the structure? Websites to check out: ModBase and Annotator.
  - No related structure has been solved: which part of the sequence is expected to be ordered? (Consult, for example, DISOPRED.) Take that piece of the sequence and run it through Robetta, and perhaps through I-TASSER, to see whether the predictions agree (and to what degree). Run the predicted structure through a few tens of nanoseconds of MD simulation (see, for example, GROMACS; you will probably need some help if you are a novice) to see whether it holds together.
- Does the protein contain any known functional domains? (Check out SMART, Pfam or PROSITE.) Do these domains have known target motifs? (This information can be found in the PROSITE domain profile, for example.)
- How unique is your protein in the genome?
  - The protein belongs to a family of similar proteins found in the same organism: check out the Cube server for conservation and specialization scoring.
  - The protein is unique in its organism: look at conservation across different organisms.
Cube and Specs will provide a spreadsheet in Excel format as well as a PyMOL session that you can extend with your own annotations. Once you have it all in one place, you can start building models of the protein's behavior that are consistent with all the prior information you have assembled. Keep track of your progress on your wiki page for later reference.
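The question of whether the structure covers the full length of the peptide can be made concrete with a little bookkeeping. The following is a minimal Python sketch, not a prescribed tool: it assumes you have already copied down the residue ranges seen in the available structures and a per-residue disorder score from a predictor of your choice. The function names (`uncovered_segments`, `classify_gap`), the 0.5 cutoff, and all numbers in the usage part are assumptions made up for illustration.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]  # residue range, 1-based and inclusive


def uncovered_segments(seq_length: int, covered: Sequence[Range]) -> List[Range]:
    """Return the residue ranges that no structure covers."""
    gaps: List[Range] = []
    next_uncovered = 1  # first residue not yet known to be covered
    for start, end in sorted(covered):
        # Clamp each range to the sequence; skip ranges lying outside it.
        start, end = max(start, 1), min(end, seq_length)
        if start > end:
            continue
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= seq_length:
        gaps.append((next_uncovered, seq_length))
    return gaps


def classify_gap(gap: Range, disorder: Sequence[float],
                 threshold: float = 0.5) -> str:
    """Label a gap by its mean disorder score.

    `disorder` holds one score per residue (index 0 is residue 1).
    The 0.5 threshold is an assumed cutoff; adjust it to your predictor.
    """
    start, end = gap
    scores = disorder[start - 1:end]  # 1-based inclusive -> Python slice
    if len(scores) != end - start + 1:
        raise ValueError("disorder scores do not span the whole gap")
    mean = sum(scores) / len(scores)
    return "disordered" if mean >= threshold else "ordered"


if __name__ == "__main__":
    # Made-up example: a 300-residue protein with three structure fragments.
    structures = [(25, 120), (100, 180), (220, 260)]
    # Made-up scores: a disordered N-terminus, everything else ordered.
    scores = [0.9] * 24 + [0.1] * 276
    for gap in uncovered_segments(300, structures):
        print(gap, classify_gap(gap, scores))
```

Gaps labeled "ordered" are the ones worth sending on to homology modeling or structure prediction. Gaps labeled "disordered" are probably best noted on your wiki page and left unmodeled.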