Homology Modeling: A Brief Summary
Abstract
Homology models can help locate the alpha carbons of key residues in the folded protein. They can be used to direct mutagenesis experiments or to test hypotheses about structure-function relationships. Homology models are unreliable in predicting the conformations of insertions or deletions, that is, portions of the sequence that do not align with the template sequence, and in predicting the details of sidechain positions. Unless the sequence identity with the template is greater than 70%, homology models are unlikely to be useful in modelling ligand docking (drug design), and even then, they are less reliable than an empirical crystallographic or NMR structure.
Introduction
Assume you want to know the 3D structure of a target protein that has not been solved empirically through X-ray crystallography or NMR. You only have the sequence. If an empirically determined 3D structure for a sufficiently similar protein is available (50% or better sequence identity is preferable), you can use software that arranges the backbone of your sequence to match that of this template. This is known as "homology modelling." At best, it is moderately accurate for the positions of alpha carbons in regions where the sequence identity is high. It is unreliable for the details of sidechain positions and for inserted loops that have no matching sequence in the template.
Homology Modeling
A homology modelling routine requires three inputs:
- The "target sequence" of the protein with an unknown 3D structure.
- A 3D template, chosen as the solved structure with the highest sequence identity to the target sequence. The template's 3D structure must have been determined using reliable empirical methods such as crystallography or NMR, and it is typically a published atomic coordinate file ("PDB" file) from the Protein Data Bank.
- An alignment between the target and template sequences.
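Template choice depends on the sequence identity between target and template. The following is a minimal sketch of how percent identity might be computed from a pairwise alignment. The function name, the example sequences, and the convention of dividing by the number of columns in which both sequences have a residue are assumptions made for illustration; other conventions for the denominator exist and give different numbers.

```python
def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both strings are rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # gap column: not counted in this convention
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Hypothetical aligned sequences, for illustration only.
target = "MKT-AYIAKQR"
template = "MKSLAYLA-QR"
print(round(percent_identity(target, template), 1))
```

In this example, 7 of the 9 columns without a gap are identical.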
First, the homology modelling routine duplicates the backbone of the template. This means that not only are the alpha carbon positions identical to the template but so are the phi and psi angles and secondary structure. Following that, more sophisticated homology modelling packages adjust sidechain positions to reduce collisions and may offer additional energy minimization or molecular dynamics to improve the model.
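The backbone-copying step can be sketched as follows. This is a simplified illustration, not the procedure of any particular package: it copies only alpha carbon coordinates, and the function name, the coordinate values, and the use of `None` for target residues with no template counterpart (insertions) are assumptions.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def copy_backbone(
    aligned_target: str,
    aligned_template: str,
    template_ca: List[Coord],
) -> List[Optional[Coord]]:
    """Assign each target residue the CA position of its aligned template residue.

    Returns one entry per target residue. Target residues aligned to a gap
    (insertions relative to the template) get None, because the template
    provides no coordinates for them.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    if sum(1 for c in aligned_template if c != "-") != len(template_ca):
        raise ValueError("one CA coordinate is needed per template residue")

    model: List[Optional[Coord]] = []
    t_idx = 0  # index of the next template residue
    for a, b in zip(aligned_target, aligned_template):
        coord: Optional[Coord] = None
        if b != "-":
            coord = template_ca[t_idx]
            t_idx += 1
        if a != "-":
            model.append(coord)  # None when the target residue is an insertion
    return model


# Hypothetical alignment and made-up coordinates (angstroms).
template_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
               (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
model = copy_backbone("ACDE-F", "A-DEGF", template_ca)
print(model)
```

The second target residue has no template counterpart and is left without coordinates, which is why inserted loops are the least reliable part of a model.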
How Good Can Homology Modelling Be?
Even when determined under comparable conditions, two proteins with a high level of sequence identity and very similar secondary and tertiary structures (identical "folds") will not have exactly identical backbone conformations. A homology model can therefore be expected to differ from the real structure by at least this much. The root mean square deviation (rmsd) of the positions of alpha carbons is used to quantify overall differences in protein backbone structures. "A model is considered 'accurate enough' or 'as accurate as you can get' when its rmsd is within the range of deviations observed for
How Big is This Spread?
The SWISS-MODEL routines were used by the 3DCrunch project to homology model all sequences in the Swiss-Prot database that had appropriate templates. In the same project, 1,200 models for previously solved structures were created to test the accuracy of homology modelling (see Reliability of models generated by SWISS-MODEL). This allowed for the comparison of homology models with empirical structures for the same sequence, where the homology model was created using a template containing the most similar sequence available other than the target sequence itself.
To provide context for rmsd values, consider that independent determinations of the same protein can differ by up to 0.5 Å rmsd of alpha carbons. Proteins with 50% sequence identity differ by about 1 Å rmsd on average. These values are for X-ray crystallographic determinations; rmsds for NMR determinations are somewhat higher.
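For reference, the alpha carbon rmsd between two structures can be computed as in the minimal sketch below. It assumes that the two coordinate lists contain equivalent residues in the same order, that the structures have already been optimally superimposed (the superposition step itself is not shown), and that coordinates are in ångströms. The function name and the coordinates are made up for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root mean square deviation between paired CA positions.

    Assumes the two structures are already superimposed and that
    coords_a[i] and coords_b[i] belong to equivalent residues.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates: every CA in the model is displaced by 1 angstrom.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, reference))
```

Because the deviations are squared before averaging, a few badly placed residues, such as a misbuilt loop, can dominate the rmsd of an otherwise good model.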
If we define a "highly successful homology model" as one having an rmsd of ≤2 Å from the empirical structure, then the template must have ≥60% sequence identity with the target in order to have a success rate of >70%. Even at high sequence identities (60%-95%), as many as one in ten homology models has an rmsd greater than 5 Å compared to the empirical structure. Serious errors become more common when the sequence identity falls below 40%.
The Importance of the Sequence Alignment
The homology modelling routine arranges the target sequence's backbone to follow that of the template, using the sequence alignment to determine where to position each residue. As a result, the quality of the sequence alignment is critical. Misplaced indels (gaps representing insertions or deletions) cause residues to be misplaced in space. Although many routines perform alignments automatically, careful inspection and adjustment by someone with specialised training may improve the quality of the alignment and thus the homology model.
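The effect of a misplaced gap can be illustrated with a small sketch that compares which template residue each target residue is placed on under two alternative alignments. The sequences, the function names, and the idea of counting changed assignments are assumptions made for illustration only.

```python
from typing import List, Optional


def template_positions(aligned_target: str, aligned_template: str) -> List[Optional[int]]:
    """For each target residue, the 0-based index of the template residue it is
    placed on, or None if it is aligned to a gap in the template."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    positions: List[Optional[int]] = []
    t_idx = 0  # index of the next template residue
    for a, b in zip(aligned_target, aligned_template):
        pos: Optional[int] = None
        if b != "-":
            pos = t_idx
            t_idx += 1
        if a != "-":
            positions.append(pos)
    return positions


def count_misplaced(pos_a: List[Optional[int]], pos_b: List[Optional[int]]) -> int:
    """Number of target residues assigned to different template residues."""
    return sum(1 for p, q in zip(pos_a, pos_b) if p != q)


# Hypothetical template and two alignments of the same target, which lacks one residue.
template = "MKVLAAGE"
good = template_positions("MKV-AAGE", template)     # gap opposite the missing residue
shifted = template_positions("M-KVAAGE", template)  # same gap, placed two columns early
print(good)
print(shifted)
print(count_misplaced(good, shifted))
```

Moving the gap by two columns places two residues on the wrong template positions, so their modelled coordinates would be wrong even though the template backbone itself is unchanged.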
Conclusion
The homology modelling method is based on the observation that the tertiary structure of proteins is more conserved than the amino acid sequence. As a result, even proteins with significant sequence divergence but detectable similarity will share common structural properties, particularly the overall fold.