This project aims to visualize protein conformations by optimizing Irbäck's off-lattice energy equation. An example of such an implementation can be found here.
A hybrid optimization technique known as genetic annealing is employed to reduce computation times. All data from 10/6/2024 onwards was collected using High Performance Computing.
The conformation of a protein is generally expressed using bond and torsion angles. These serve as the independent variables of the annealing problem. The output is a tuple of amino acid residue coordinates. A protein is most likely to adopt the conformation with minimum energy. Hence, finding the bond and torsion angles for which the energy function is minimal gives the optimal conformation.
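To make the mapping from angles to coordinates concrete, here is a minimal sketch that places residues one at a time from bond and torsion angles. It is an illustration only, not the project's code: the function name `angles_to_coords`, the unit bond length, and the angle conventions (bond angle measured at the middle residue, torsion sign chosen by the local frame) are assumptions, and the sketch does not handle perfectly straight chains (bond angle of exactly pi).

```python
import numpy as np

def angles_to_coords(bond_angles, torsion_angles, bond_length=1.0):
    """Place N >= 3 residues from N-2 bond angles and N-3 torsion angles (radians).

    Hypothetical helper; the conventions used here are assumptions.
    """
    bond_angles = np.asarray(bond_angles, dtype=float)
    torsion_angles = np.asarray(torsion_angles, dtype=float)
    n = len(bond_angles) + 2
    if n < 3 or len(torsion_angles) != n - 3:
        raise ValueError("need N-2 bond angles and N-3 torsion angles, N >= 3")

    coords = np.zeros((n, 3))
    coords[1] = [bond_length, 0.0, 0.0]
    # The third residue lies in the xy-plane; only a bond angle is needed.
    t = np.pi - bond_angles[0]
    coords[2] = coords[1] + bond_length * np.array([np.cos(t), np.sin(t), 0.0])

    for i in range(3, n):
        a, b, c = coords[i - 3], coords[i - 2], coords[i - 1]
        theta = np.pi - bond_angles[i - 2]
        phi = torsion_angles[i - 3]
        # New bond vector expressed in the local frame of the last three residues.
        local = bond_length * np.array([
            np.cos(theta),
            np.sin(theta) * np.cos(phi),
            np.sin(theta) * np.sin(phi),
        ])
        bc = (c - b) / np.linalg.norm(c - b)
        normal = np.cross(b - a, bc)
        normal /= np.linalg.norm(normal)
        frame = np.column_stack((bc, np.cross(normal, bc), normal))
        coords[i] = c + frame @ local
    return coords
```

An optimizer can then treat the two angle vectors as its search variables and evaluate the energy on the returned coordinates.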
Simulated annealing is an optimization technique that searches the solution space for the global minimum through random sampling. In this implementation, the annealing is carried out using the Metropolis condition and Boltzmann factors.
In my code, this is implemented as
where
A random component of either the bond or torsion angle vector is chosen and altered by a small value. The variation is controlled by a heterogeneous degree parameter
The condition to determine the favorability of a neighbor is the Metropolis condition. The starting and ending temperatures are 1.0 and 1e-12, respectively. The cooling coefficient
The AnnealingOutput class serves as a container for the algorithm output. run.py uses the algorithm results and diagnostic data provided by objects of this class.
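The neighbor move described above (pick one random component of either angle vector and nudge it) might look roughly like the following. This is a sketch under assumptions: `propose_neighbor` and the `step` argument are hypothetical names, and `step` is only a stand-in for however the heterogeneous degree parameter scales the perturbation in the actual code.

```python
import numpy as np

def propose_neighbor(bond_angles, torsion_angles, step, rng):
    """Return copies of both angle vectors with exactly one component perturbed.

    `step` (hypothetical) bounds the size of the perturbation.
    """
    new_bond = np.array(bond_angles, dtype=float)
    new_torsion = np.array(torsion_angles, dtype=float)
    # Choose uniformly among all components of both vectors.
    k = rng.integers(len(new_bond) + len(new_torsion))
    delta = rng.uniform(-step, step)
    if k < len(new_bond):
        new_bond[k] += delta
    else:
        new_torsion[k - len(new_bond)] += delta
    return new_bond, new_torsion
```

Returning copies rather than modifying in place keeps the current state intact in case the move is rejected.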
This version is primitive and produces inaccurate results. One issue to fix in the next version is that the energy difference sometimes causes an OverflowError. The annealing schedule must also be optimized.
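An overflow of this kind typically appears when `math.exp` receives a large positive argument, which can happen at very low temperatures if the exponential is evaluated for downhill moves as well. One common guard, shown here as a sketch and not necessarily the fix adopted in this project, is to accept downhill moves before any exponential is computed. The function name `metropolis_accept` is an assumption.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random):
    """Metropolis condition with a guard against overflow.

    delta_e is (candidate energy - current energy).
    """
    if delta_e <= 0.0:
        # Downhill or equal: always accept, and never call exp().
        return True
    # Uphill: the exponent is negative, so exp() can only underflow to 0.0.
    return rng.random() < math.exp(-delta_e / temperature)
```

With this ordering, the exponential is only ever evaluated for non-positive arguments.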
This version of the algorithm solves the OverflowError mentioned above. The algorithm is now O(ml*n), where ml is the Markov chain length and n is the number of iterations. For artificial proteins, the Markov chain length is set to 50000. For real proteins, it is set to 10000. The number of iterations depends on the initial and final temperatures, as well as the cooling coefficient.
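To illustrate how the number of iterations follows from the temperatures and the cooling coefficient, the sketch below counts temperature steps under an assumed geometric schedule (T multiplied by the coefficient after every Markov chain). The schedule form, the function name, and the coefficient value 0.99 are assumptions made for the example; the temperatures and chain length are the ones quoted above.

```python
def count_temperature_steps(t_start, t_end, cooling):
    """Number of Markov chains run while cooling geometrically from t_start to t_end."""
    if not (0.0 < cooling < 1.0) or not (0.0 < t_end < t_start):
        raise ValueError("need 0 < cooling < 1 and 0 < t_end < t_start")
    steps = 0
    t = t_start
    while t > t_end:
        t *= cooling  # one temperature step per Markov chain
        steps += 1
    return steps

# Example with a hypothetical cooling coefficient of 0.99.
n = count_temperature_steps(1.0, 1e-12, 0.99)
total_moves = 50000 * n  # ml * n proposed moves for an artificial protein
```

Under this assumption n grows roughly as ln(t_end / t_start) / ln(cooling), so moving the coefficient closer to 1 increases the run time sharply.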
The file src_np.py, which uses NumPy, delivers results within ~0.5 energy units of the values in the research paper for Fibonacci artificial proteins of size 13, 21, and 55, although not consistently. Future versions will aim to improve the frequency of accurate modeling.
A PyTorch version was written to take advantage of GPU acceleration in the future. The NumPy version was tested on the protein 4RXN and yielded an energy value of -172.059076, close to the optimal value of -174.612. The annealing took just over 15 hours.
Access to high performance computing allows for parallel annealing runs. The annealer was run 10 times independently for artificial proteins [ml=50000].
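Independent runs need independent random streams, otherwise every run repeats the same trajectory. The sketch below shows one way to arrange that with NumPy's `SeedSequence.spawn`. Here `toy_run` is a placeholder (a simple greedy search on a quadratic, not the project's annealer), and `run_independent` and its `mapper` argument are hypothetical names.

```python
import numpy as np

def toy_run(seed_seq):
    """Placeholder for a single annealing run; returns the best energy found."""
    rng = np.random.default_rng(seed_seq)
    x = rng.uniform(-5.0, 5.0, size=4)
    best = float(np.sum(x ** 2))
    for _ in range(2000):
        cand = x + rng.normal(scale=0.1, size=4)
        e = float(np.sum(cand ** 2))
        if e < best:
            x, best = cand, e
    return best

def run_independent(n_runs, master_seed, mapper=map):
    """Run n_runs copies of toy_run, each with its own child seed."""
    children = np.random.SeedSequence(master_seed).spawn(n_runs)
    return list(mapper(toy_run, children))

# Serial by default. On a multi-core node the same call can be parallelized:
#   from concurrent.futures import ProcessPoolExecutor
#   with ProcessPoolExecutor() as pool:
#       energies = run_independent(10, 0, mapper=pool.map)
```

Keeping the best of the independent results is then a matter of taking the minimum of the returned energies.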
From now on, conformation data will be stored in LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator) dump files. This allows conformations to be visualized in software such as OVITO.
The implemented hybrid optimization algorithm aims to maximize the efficiency of simulated annealing by patching some of its limitations, namely the high time cost. The algorithm runs a population of solutions in parallel. The relative fitness is determined after each Markov chain step. Fit individuals are chosen to be parents of the next generation. The annealer continues until a fixed number of iterations is reached.
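The loop described above can be sketched on a toy problem as follows. Everything here is illustrative: the Rastrigin test function stands in for the protein energy, the population is processed serially, fitness is evaluated once per completed Markov chain, and the selection rule (the fitter half become parents, children are copies of randomly chosen parents) is an assumption, not the project's actual scheme.

```python
import math
import numpy as np

def toy_energy(x):
    """Rastrigin function, a stand-in for the protein energy."""
    return float(np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0))

def genetic_anneal(energy, dim, pop_size=20, chain_len=50, n_iter=40,
                   t_start=1.0, cooling=0.9, step=0.3, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.0, 5.0, size=(pop_size, dim))
    energies = np.array([energy(x) for x in pop])
    i_best = int(np.argmin(energies))
    best_x, best_e = pop[i_best].copy(), float(energies[i_best])
    initial_best = best_e
    t = t_start
    for _ in range(n_iter):
        # Annealing phase: one Markov chain per individual at temperature t.
        for i in range(pop_size):
            for _ in range(chain_len):
                cand = pop[i].copy()
                cand[rng.integers(dim)] += rng.uniform(-step, step)
                e = energy(cand)
                d = e - energies[i]
                if d <= 0.0 or rng.random() < math.exp(-d / t):
                    pop[i], energies[i] = cand, e
                    if e < best_e:
                        best_x, best_e = cand.copy(), e
        # Genetic phase: the fitter half are parents of the next generation.
        parents = np.argsort(energies)[: pop_size // 2]
        picks = rng.choice(parents, size=pop_size)
        pop, energies = pop[picks].copy(), energies[picks].copy()
        t *= cooling
    return best_x, best_e, initial_best
```

Calling `genetic_anneal(toy_energy, dim=3)` returns the best point seen, its energy, and the best energy of the initial population for comparison.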
The first working version was implemented in PyTorch. It converged faster than serial simulated annealing but always converged to suboptimal local minima. Future versions will focus on adjusting the cooling schedule and fitness determination.
A NumPy version was implemented to take advantage of np.savetxt(), which makes LAMMPS dump file creation easier.
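As an illustration of why `np.savetxt()` is convenient here, the sketch below writes a single-frame dump file by passing the LAMMPS header through the `header` argument. The function name, the non-periodic `ff ff ff` box flags, the box padding, and the integer type codes for residues are assumptions made for the example, not details taken from this project.

```python
import numpy as np

def write_lammps_dump(path, coords, types, timestep=0, padding=1.0):
    """Write one frame in LAMMPS dump format (columns: id type x y z)."""
    coords = np.asarray(coords, dtype=float)
    types = np.asarray(types, dtype=int)
    n = len(coords)
    lo = coords.min(axis=0) - padding
    hi = coords.max(axis=0) + padding
    header = "\n".join([
        "ITEM: TIMESTEP",
        str(timestep),
        "ITEM: NUMBER OF ATOMS",
        str(n),
        "ITEM: BOX BOUNDS ff ff ff",
        *(f"{l:.6f} {h:.6f}" for l, h in zip(lo, hi)),
        "ITEM: ATOMS id type x y z",
    ])
    # Atom ids start at 1; comments="" stops savetxt from prefixing "# ".
    table = np.column_stack((np.arange(1, n + 1), types, coords))
    np.savetxt(path, table, fmt="%d %d %.6f %.6f %.6f",
               header=header, comments="")
```

For example, `write_lammps_dump("conf.dump", coords, types)` with an (N, 3) coordinate array and N type codes produces a file that visualization tools reading the LAMMPS dump format can load.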
The final version of GA demonstrated faster convergence and higher precision. It outperformed serial simulated annealing, even for longer proteins.