PyMOL is a great tool for visualising 3D structures of molecules and proteins. It is open-source and can be easily extended using Python code. And as it is open-source you can base your code on existing functionality. This small tutorial will extend the existing half-sphere exposure (HSE) algorithm from a simple discrete count base to a continuous measurement.
The HSE measure is an indicator for how buried an amino acid residue is within the protein. It simply counts the amino acid neighbours within a half sphere of a given radius. The sphere is divided vertically to the amino acid residue, the $C\alpha$-$C\beta$ vector.
The HSE measure is a very simple discrete measure and will be the subject of our small plugin. We will extend the code to a continuous HSExposure measure which takes distance effects and the search radius into account.
Counting neighbours within a half-sphere is a highly simplified approach and there are features of interest which are not taken into account e.g. the sphere radius or the distance of the neighbour residue. There are also disadvantages which come with the discrete space of a simple count algorithm because a continuous variable is much easier translatable into a probabilistic model. The original HSE code already does most of the work and we will simply extend it for our own needs. This is clearly an advantage of open-source.
The scoring function we will use is defined as follows: \[ s(\text{neighbour}) = \frac{1}{\text{radius}} * w(\text{dist}) \] with the following weights \[ w(\text{dist}) = \frac{4}{1+\exp{\sqrt{10d}-6}} \] This weight function describes a sigmoid function with a flatter gradient for higher values due to the non-linear root. This function was simply chosen because it seemed to perform well for a search radius of 12A.
The result of the weight-function are high values for very close residues and low values for distant residues which means that an amino-acid is more buried by close residues than by distant ones. On the other hand, a residue in a very big sphere counts less than in a smaller sphere by dividing the weight by the sphere radius. This can be interpreted as a residue per volume approach because the radius is the only independent variable of the sphere volume. It also scales the sum of all weights within the sphere so that the algorithm performs well in a high range of different sphere radii.
The main code of the extension is based on the Bio.PDB.HSExposure
module. To reuse the classes in the Bio.PDB module small changes are necessary to make the original code more object orientated by defining a additional function called _add_neighbor()
and making the AbstractHSExposure class globally available. The final hse_extended class does the continuous space calculation by inheriting from the imported HSExposure.py
.
class hse_extended(AbstractHSExposure,HSExposureCA,HSExposureCB): """ this class extends the Bio.PDB.HSExposure.HSExposure class """ def __init__(self, model, radius, offset): """ the hse_extended class is based on HSExposure """ AbstractHSExposure.__init__(self,model,radius,offset,'EXP_HSE_B_U','EXP_HSE_B_D') def _add_neighbor(self,count,d): """ measures the residue per radius instead of a integer quantity of neighbors """ from math import exp return count + ((1.0/radius) * (4.0/(1.0+exp(sqrt(10*d.norm())-6)))) def _get_cb(self, r1, r2, r3): if vector_method == 'CA' : return HSExposureCA._get_cb(self,r1,r2,r3) elif vector_method == 'CB' : return HSExposureCB._get_cb(self,r1,r2,r3)
The hse-extended class performing the HSE calculation in continuous space.
There is a typical problem with the two half-spheres approach: The upper sphere requires a normalisation because the side chain spans into the upper half-sphere. Hence, the upper sphere usually has less neighbour residues which are in general also more distant than neighbour residues in the lower sphere. But when we think of the residues per volume approach the accessible volume of the upper sphere is smaller which means that the amino acid is also with less and more distant neighbour residues quite buried.</p>
The method shows a overall linear correlation between the number of residues and the continuous score but it also shows that there are great variations within a group of the same amount of neighbour residues.
The later visualisation in Pymol makes is necessary to renumber the residues in the PDB file because the CA vector method is not applyable to the first and the last residue. Thus, the PDB files are pre-processed in which residues are renumbered and heteroatoms are detached.
Visualisation of 3G21, 12A, CA-method.
Visualisation of 3CA7, 12A, CA-method
#!/usr/bin/python # -*- coding: utf-8 -*- # hse.py by Jan Teichmann # date: 15/01/2010 """ Calculates the half-spher exposure in Continuous space based on a radius weighted distance weight. Command Line Arguments: 1)PDB-ID 2)radius 3)CA-CB-vector method Example: python hse.py 2PTC 12 CA @PDB-ID: 4 letter PDB-database ID @radius: float @CA-CB-vector method: either CA or CB """ from Bio.PDB import PDBList, PDBParser, PDBIO import sys from HSExposure import * from pylab import * class protein(PDBList,PDBParser): """ creats a new protein-structure object """ def __init__(self,ID): """ initializes the protein class by fetching the pdb file and parsing the structure """ self.pdb = PDBList().retrieve_pdb_file(ID) self.structure = PDBParser().get_structure(ID, self.pdb) class hse_extended(AbstractHSExposure,HSExposureCA,HSExposureCB): """ this class extends the Bio.PDB.HSExposure.HSExposureCB class """ def __init__(self, model, radius, offset): """ the hse_extended class is based on HSExposureCB """ AbstractHSExposure.__init__(self, model, radius, offset, 'EXP_HSE_B_U', 'EXP_HSE_B_D') def _add_neighbor(self,count,d): """ measures the residue per radius instead of a integer quantity of neighbors """ from math import exp return count + ( ( 1.0 / radius ) * ( 4.0/(1.0 + exp( sqrt(10*d.norm() ) - 6)) )) def _get_cb(self, r1, r2, r3): if vector_method == 'CA' : return HSExposureCA._get_cb(self,r1,r2,r3) elif vector_method == 'CB' : return HSExposureCB._get_cb(self,r1,r2,r3) def evaluation(hse, hse_2): """ performs a normalization of the upper spher values and writes the results into a text-file which is read by the Pymol visualization script. """ from scipy import stats result_u = [] result_d = [] result_u_2 = [] result_d_2 = [] count = 0 # read the HSExposure values and rearrange them to a list for i in hse: count += 1 result_u.append( i[1][0] ) result_d.append( i[1][1] ) for ii in hse_2: result_u_2.append( ii[1][0] ) result_d_2.append( ii[1][1] ) # linear regression gradient_u, intercept_u, r_value_u, p_value_u, std_err_u = \ stats.linregress(result_u_2, result_u) gradient_d, intercept_d, r_value_d, p_value_d, std_err_d = \ stats.linregress(result_d_2, result_d) #normalization corr = 0 corr = (intercept_d -intercept_u ) - 0.8*max(result_d_2)*abs(gradient_u - gradient_d) result_u = result_u + corr f1 = open('upper.txt', 'w') f2 = open('lower.txt', 'w') # write values for i in range(1,count+1): f1.write(str(i) + " " + str(result_u[i-1]) + "\n") f2.write(str(i) + " " + str(result_d[i-1]) + "\n") f1.close() f2.close() #drawing results figure() ylim( 0, 1 ) scatter(result_u_2, result_u, color='blue', s=10) scatter(result_d_2, result_d, color='red', s=10) xmin, xmax = xlim() ymin, ymax = ylim() x = arange(0,(xmax+xmin),0.05) plot(x, gradient_u * x + intercept_u + corr, color='blue') text( xmax - 0.2*xmax, ymin+0.05, 'R2: ' + str(round(r_value_u**2,2)), color='blue') plot(x, x*gradient_d + intercept_d, color='red') text( xmax-0.2*xmax, ymin + 0.1, 'R2: ' + str(round(r_value_d**2,2)), color='red') legend( ('upper spher','lower spher'), loc=2) savefig(ID + '_' + str(radius) + 'A_' + vector_method + '.png') close() if __name__ == "__main__": try: ID = sys.argv[1] except: print "no PDB-database ID has been defined!" try: radius = float(sys.argv[2]) except: print "no radius has been defined!" try: vector_method = sys.argv[3] except: print "wrong vector method has been defined! possible are CA or CB" if vector_method != 'CA' and vector_method != 'CB': raise IOError('unknown vector calculation method! \ only CA and CB are possible arguments') pdb = protein(ID) if vector_method == 'CA': i = 0 elif vector_method == 'CB': i = 1 for model in pdb.structure: for chain in model: for residue in chain: id = residue.id if id[0] != ' ': chain.detach_child(id) continue if len(chain) == 0: model.detach_child(chain.id) continue residue.id = (' ', i, ' ') i += 1 IO = PDBIO() IO.set_structure(pdb.structure) IO.save("PDB/"+ID+".pdb") hse = hse_extended(pdb.structure[0],radius,0) if vector_method == 'CA' : hse_2 = HSExposureCA(pdb.structure[0],radius, 0) elif vector_method == 'CB' : hse_2 = HSExposureCB(pdb.structure[0],radius, 0) evaluation(hse,hse_2)
from pymol import cmd import colorsys def make_gradient(nbins,sat=1.0,value=1.0): """ creats a color ramp from red (exposed) over yellow to blue (buried) in nbins color steps based on color ramp by Mark Wall """ r1, g1, b1 = 255, 10, 10 # low red r2, g2, b2 = 255, 255, 10 # mid yellow r3, g3, b3 = 10, 10, 255 # high med blue r1, g1, b1 = float(r1)/255.0, float(g1)/255.0, float(b1)/255.0 r2, g2, b2 = float(r2)/255.0, float(g2)/255.0, float(b2)/255.0 r3, g3, b3 = float(r3)/255.0, float(g3)/255.0, float(b3)/255.0 col=[] for j in range(nbins/2): # create colors in a gradient from low to mid rgb = [r1*((float(nbins)-float(j)*2.0)/float(nbins))+r2*(float(j)*2.0/float(nbins)), \ g1*((float(nbins)-float(j)*2.0)/float(nbins))+g2*(float(j)*2.0/float(nbins)), \ b1*((float(nbins)-float(j)*2.0)/float(nbins))+b2*(float(j)*2.0/float(nbins))] # convert rgb to hsv, modify saturation and value and convert back to rgb hsv = list(colorsys.rgb_to_hsv(rgb[0],rgb[1],rgb[2])) hsv[1] = hsv[1]*sat hsv[2] = hsv[2]*value rgb = colorsys.hsv_to_rgb(hsv[0],hsv[1],hsv[2]) #make color available in Pymol col.append(rgb) cmd.set_color("col" + str(j),col[j]) for j in range(nbins/2,nbins): # create colors in a gradient from mid to high rgb = [r2*((float(nbins)-((float(j+1)-float(nbins)/2.0)*2.0))/ \ float(nbins))+r3*(((float(j+1)-float(nbins)/2.0)*2.0)/float(nbins)), \ g2*((float(nbins)-((float(j+1)-float(nbins)/2.0)*2.0))/ \ float(nbins))+g3*(((float(j+1)-float(nbins)/2.0)*2.0)/float(nbins)), \ b2*((float(nbins)-((float(j+1)-float(nbins)/2.0)*2.0))/ \ float(nbins))+b3*(((float(j+1)-float(nbins)/2.0)*2.0)/float(nbins))] # convert rgb to hsv, modify saturation and value and convert back to rgb hsv = list(colorsys.rgb_to_hsv(rgb[0],rgb[1],rgb[2])) hsv[1] = hsv[1]*sat hsv[2] = hsv[2]*value rgb = colorsys.hsv_to_rgb(hsv[0],hsv[1],hsv[2]) #make color available in Pymol col.append(rgb) cmd.set_color("col" + str(j),col[j]) def color_upper(): """ visualise the values of the upper half sphere Pymol call: color_u """ f1 = open('upper.txt', 'r') for line in f1 : upper = line.split() cmd.select('sele', 'resi ' + str(int(upper[0]))) cmd.color('col'+ str(int(float(upper[1])*100)), 'sele') cmd.deselect() def color_lower(): """ visualise the lower half sphere Pymol call: color_d """ f1 = open('lower.txt', 'r') for line in f1 : lower = line.split() cmd.select('sele', 'resi ' + str(int(lower[0]))) cmd.color('col'+ str(int(float(lower[1])*100)), 'sele') cmd.deselect() # creat a gradient with 100 color steps colors = make_gradient(100) cmd.extend("color_u", color_upper) cmd.extend("color_d", color_lower)