Biomolecules can only fulfil their diverse functions in the cell when they fold into a characteristic native three-dimensional structure. Knowing this structure is of paramount importance not only for basic research, but also for medicine and pharmacology. Scientists from the Karlsruhe Institute of Technology (KIT) have therefore developed a simple method to predict the three-dimensional structure of biomolecules from the analysis of readily available experimental data. The researchers recently received the “Google Faculty Research Award” for their work.
The various physiological functions of living organisms are performed and maintained by a broad range of different biomolecules. Proteins and structured ribonucleic acids have many different roles, including, for example, gene regulation, oxygen transport, muscle activity and transferring information inside and outside the cell. The characteristic task that every single biomolecule fulfils in the body is directly related to its structure: the nucleotide or amino acid sequence determines the three-dimensional structure a biomolecule assumes and hence the activity it performs.
Knowing the spatial structure of biomolecules plays a major role in many scientific disciplines: basic research can only study the function and course of metabolic processes when the molecules involved and their structures are known. The same is true for medicine and pharmacology, as biomolecule malfunctions can lead to diseases: effective therapies can often only be developed if the target molecules and their three-dimensional organisations are known. There are well-established structure determination methods that represent molecules at atomic resolution and elucidate their three-dimensional structure. However, techniques such as X-ray crystallography and nuclear magnetic resonance spectroscopy are rather complex and cannot be performed rapidly with a large number of molecules.
Statistics to predict the 3D structure
The research team led by Dr. Alexander Schug at the Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology (KIT) has developed an alternative approach for predicting the three-dimensional structure of biomolecules based on the statistical analysis of sequence information stored in free public databases. “At our institute, we use high-performance supercomputers for computationally intensive tasks,” says Schug. “We also use these systems for scientific projects. One such project involved looking at how we could use existing experimental data for predicting three-dimensional structures of proteins. We based our calculations on DNA and protein sequences, available in huge numbers in the aftermath of the Human Genome Project.” Over the past ten years, the KIT researchers have used nucleic acid and protein data of different organisms stored in free public sequence databases to develop algorithms for detecting specific mutation patterns.
The method is based on the assumption that proteins accumulate mutations as they evolve, but that these mutations only slightly change the sequence. When such mutations occur in pairs, the researchers take this as an indication that the sequence positions affected are located in close spatial vicinity to each other. “Screening the databases for pairwise mutations will help us to find out how these mutations are positioned relative to each other,” says Schug. “Information we get from screening a thousand sequences for the presence of all possible mutations at the second codon positions, for example, will help us determine the co-evolution of sequence pairs. These are the data we can use for predicting three-dimensional functional configurations.”
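The idea of screening aligned sequences for paired mutations can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and uses mutual information between alignment columns as a simple stand-in for a co-evolution score; the article does not specify which statistic the KIT team uses, and the function names and the toy alignment here are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    i_counts = Counter(seq[i] for seq in alignment)
    j_counts = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = p_ab * n * n / (i_counts[a] * j_counts[b])
        mi += p_ab * log2(ratio)
    return mi


def rank_coevolving_pairs(alignment):
    """Score every pair of columns and return them sorted, best first."""
    length = len(alignment[0])
    scores = {
        (i, j): column_mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: positions 0 and 4 always mutate together
# (G with C, C with G), while position 2 varies independently of both.
toy_alignment = [
    "GACUC",
    "GAGUC",
    "CAGUG",
    "CACUG",
]

ranked = rank_coevolving_pairs(toy_alignment)
top_pair, top_score = ranked[0]
print(top_pair, top_score)
```

In this toy case the pair of positions that always changes together receives the highest score, which is the kind of signal that is read as a hint of spatial proximity. Real alignments contain thousands of sequences and need corrections (for example for phylogenetic bias and indirect correlations) that this sketch leaves out.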
The scientists carry out statistical analyses of large amounts of sequence data using their algorithms and then feed the results into scientific computer programmes with modelling capabilities. “We then use open source software programmes to predict the spatial structure from the mutation patterns,” says Schug. “However, these results are often hugely flawed. We therefore restrict the search to those parts of the three-dimensional protein structure that are located side by side. We know this from the statistical analyses performed. This considerably improves the calculations. After a few days of computing, we receive suggestions on the most likely structures.” It goes without saying that these computing processes generate huge amounts of data, or big data, as Schug calls them. The physicist is convinced that, if cleverly analysed, the data will yield other useful results as well.
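How predicted neighbour pairs can narrow down a structure search is sketched below in a deliberately simplified form. It assumes that each candidate structure is given as one 3D coordinate per residue and that a predicted pair counts as satisfied when the two residues lie within a distance cutoff. The 8 Å cutoff, the function names and the toy coordinates are illustrative assumptions; the modelling software mentioned in the article works differently in detail.

```python
from math import dist


def contact_satisfaction(coords, contacts, cutoff=8.0):
    """Fraction of predicted contacts (i, j) closer than `cutoff` in `coords`."""
    if not contacts:
        return 0.0
    satisfied = sum(
        1 for i, j in contacts if dist(coords[i], coords[j]) <= cutoff
    )
    return satisfied / len(contacts)


def best_candidate(candidates, contacts, cutoff=8.0):
    """Name of the candidate structure that satisfies the most contacts."""
    return max(
        candidates,
        key=lambda name: contact_satisfaction(candidates[name], contacts, cutoff),
    )


# Hypothetical toy data: six residues, either stretched out or folded back.
candidates = {
    "extended": [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0), (16, 0, 0), (20, 0, 0)],
    "hairpin": [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 5, 0), (4, 5, 0), (0, 5, 0)],
}

# Pairs assumed to come out of the statistical analysis as likely neighbours.
predicted_contacts = [(0, 5), (1, 4)]

print(best_candidate(candidates, predicted_contacts))
```

Here the folded-back candidate satisfies both predicted contacts and the stretched one satisfies none, so the search can discard the latter early. In practice such restraints would typically guide the simulation itself rather than only filter finished candidates.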
The Karlsruhe researchers have come up with their first concrete result: they have determined the three-dimensional structure of riboswitches, RNA elements that bind to other molecules by forming a three-dimensional structure. These riboswitches can regulate gene expression. Although the spatial configuration of some riboswitches is already known, it is nevertheless quite difficult to obtain these molecules in the crystalline form necessary for experimental analyses. In a proof-of-principle test, the researchers used their method to predict the structures of six RNA riboswitch families. “We have not yet achieved the experimental resolution, but our results are already pretty good,” says Schug.
Schug and his team have recently been awarded the “Google Faculty Research Award” for this approach that enables them to predict the 3D structure of RNA. The prize is awarded to selected academic research projects in the fields of computer science, the engineering sciences and related areas. The Google grant supports the SCC project for a period of one year with funds totalling 50,000 US dollars.
Schug plans to use the funding to refine the methods. However, he also wants to continue his work with ribonucleic acids. “Although analysing ribonucleic acids is far more difficult than analysing proteins, we have come up with some fairly spectacular results,” says the scientist. “And Google is likely to provide us with further support, notably for our work with databases.” The KIT statisticians also plan to involve life scientists in their experiments and are currently working on establishing the first collaborative projects with experimental scientists who “know biological systems far better than us statisticians,” as the physicist says. The long-term goal is to carry out genome-wide analyses and identify the 3D structure of all kinds of proteins and nucleic acids.