Novel method for predicting the spatial structure of biomolecules

Biomolecules can only fulfil their diverse functions in the cell when they fold into a characteristic native three-dimensional structure. Knowing this structure is not only of paramount importance for basic research, but also for medicine and pharmacology. Scientists from the Karlsruhe Institute of Technology (KIT) have therefore developed a simple method to predict the three-dimensional structure of biomolecules from the analysis of readily available experimental data. The researchers recently received the “Google Faculty Research Award” for their work.

The various physiological functions of living organisms are performed and maintained by a broad range of different biomolecules. Proteins and structured ribonucleic acids have many different roles, including, for example, gene regulation, oxygen transport, muscle activity and the transfer of information inside and outside the cell. The characteristic task that every single biomolecule fulfils in the body is directly related to its structure: the nucleotide or amino acid sequence determines the three-dimensional structure a biomolecule assumes and hence the activity it performs.


  • Deoxyribonucleic acid (DNA) is a double-stranded, helical macromolecule encoding the genetic information of an organism.
  • A gene is a hereditary unit which has effects on the traits and thus on the phenotype of an organism. It is a part of the DNA which contains genetic information for the synthesis of a protein or functional RNA (e.g. tRNA).
  • The genome is the entire genetic material of an organism. Each cell of an organism contains the entire genetic material in its nucleus.
  • Being lytic is the feature of a bacteriophage leading to the destruction (lysis) of the host cell upon infection.
  • There are two definitions for the term organism: a) Any biological unit which is capable of reproduction and which is autonomous, i.e. that is able to exist without foreign help (microorganisms, fungi, plants, animals including humans). b) Definition from the Gentechnikgesetz (German Genetic Engineering Law): “Any biological unit which is capable of reproducing or transferring genetic material.” This definition also includes viruses and viroids. In consequence, any genetic engineering work involving these kinds of particles is regulated by the Genetic Engineering Law.
  • A protein is a high-molecular complex made up of amino acids. The proteins perform a wide variety of activities in the cells and represent more than 50% of organic mass.
  • Ribonucleic acid (abbr. RNA) is a normally single-stranded nucleic acid, which is very similar to DNA. It also consists of a sugar-phosphate backbone and a sequence of four bases. However, the sugar is a ribose and instead of thymine, RNA contains uracil. RNA has various forms and functions; e.g. it serves as a template during protein synthesis and it also constitutes the genome of RNA viruses.
  • Screening is a systematic test procedure that is used to identify certain characteristics within an array of samples or persons. In molecular biology, screening is used to filter a designated clone out of a gene bank, for example.
  • Selection in a biological context means the assortment of organisms due to their characteristics. On the one hand, this could be natural selection ("survival of the fittest") as in evolutionary processes. On the other hand, selection by man, e.g. breeding, is called artificial selection. Artificial selection is also used in genetic engineering to identify a genetically modified organism due to its new characteristics (e.g. resistance to antibiotics).
  • Genetic sequences are successions of the bases adenine, thymine, guanine, and cytosine on the DNA (or uracil instead of thymine in the case of RNA).
  • Expression means the biosynthesis of a gene product. Usually, DNA is transcribed into mRNA and subsequently translated into proteins.
  • Physiology is the study of the biochemical and physical processes in the cells, tissues and organs of living organisms.
  • The term metabolism includes the uptake, transport, biochemical conversion and excretion of substances within an organism. These processes are necessary to build up the body mass and to meet the energy demand of the body. The opposing processes of metabolism are called anabolism and catabolism. Several enzymes can act both catabolically and anabolically, but within one biochemical pathway they cannot work in both directions at the same time.
  • Biomolecules which can bind active agents are called targets. They can be receptors, enzymes or ion channels. If agent and target interact with each other the term agent-target-specific effect is used. The identification of targets is very important in biomedical and pharmaceutical research because a specific interaction can help to understand basic biomolecular processes. This is essential to identify new points of application.
Dr. Alexander Schug is a physicist. He has developed a method to predict the three-dimensional structure of biomolecules. © private

Knowing the spatial structure of biomolecules plays a major role in many scientific disciplines: basic research can only study the function and course of metabolic processes when the molecules involved and their structures are known. The same is also true for medicine and pharmacology, as biomolecule malfunctions can lead to diseases: effective therapies can often only be developed if the target molecules and their three-dimensional organisations are known. There are well-established structure determination methods that represent molecules at atomic resolution and elucidate their three-dimensional structure. However, techniques such as X-ray crystallography and nuclear magnetic resonance spectroscopy are rather complex and cannot be applied rapidly to a large number of molecules.

Statistics to predict the 3D structure

The research team led by Dr. Alexander Schug at the Steinbuch Centre for Computing (SCC) at the Karlsruhe Institute of Technology (KIT) has developed an alternative approach for predicting the three-dimensional structure of biomolecules based on the statistical analysis of sequence information stored in free public databases. “At our institute, we use high-performance supercomputers for computationally intensive tasks,” says Schug. “We also use these systems for scientific projects. One such project involved looking at how we could use existing experimental data for predicting three-dimensional structures of proteins. We based our calculations on DNA and protein sequences, available in huge numbers in the aftermath of the Human Genome Project.” Over the past ten years, the KIT researchers have used nucleic acid and protein data of different organisms stored in free public sequence databases to develop algorithms for detecting specific mutation patterns.

The method is based on the assumption that proteins accumulate mutations as they evolve, but that these mutations only slightly change the sequence. When such mutations occur in pairs, the researchers take this as an indication that the sequence positions affected are located in close spatial vicinity to each other. “Screening the databases for pairwise mutations will help us to find out how these mutations are positioned relative to each other,” says Schug. “Information we get from screening a thousand sequences for the presence of all possible mutations at the second codon positions, for example, will help us determine the co-evolution of sequence pairs. These are the data we can use for predicting three-dimensional functional configurations.”
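The idea of screening an alignment for positions that mutate together can be illustrated with a small sketch. The article does not name the statistical model the KIT team uses, so the example below swaps in plain mutual information between alignment columns as a simple stand-in; the function names and the toy alignment are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    count_i = Counter(seq[i] for seq in alignment)
    count_j = Counter(seq[j] for seq in alignment)
    count_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_pairs(alignment):
    """Score every pair of columns and sort by decreasing co-variation."""
    length = len(alignment[0])
    scores = [
        ((i, j), mutual_information(alignment, i, j))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment of related RNA sequences (not real data).
# Columns 0 and 4 always change together (G..C <-> C..G).
toy_alignment = [
    "GACUC",
    "GACUC",
    "CACUG",
    "CACUG",
    "GAUUC",
    "CAUUG",
]

ranking = rank_pairs(toy_alignment)
top_pair, top_score = ranking[0]
print(top_pair, round(top_score, 3))
```

In this toy alignment, positions 0 and 4 change in lockstep and therefore receive the highest score, while the other pairs vary independently and score zero. Real analyses work with thousands of sequences and need corrections (for example for phylogenetic bias and indirect correlations) that this sketch deliberately leaves out.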

Big data: huge amounts of data accumulate

The scientists carry out statistical analyses of large amounts of sequence data using their algorithms and then feed the results into scientific computer programmes with modelling capabilities. “We then use open source software programmes to predict the spatial structure from the mutation patterns,” says Schug. “However, these results are often hugely flawed. We therefore restrict the search to those parts of the three-dimensional protein structure that are located side by side. We know this from the statistical analyses performed. This considerably improves the calculations. After a few days of computing, we receive suggestions on the most likely structures.” It goes without saying that these computing processes lead to huge amounts of data, or big data, as Schug calls them. The physicist is convinced that, if the data are cleverly analysed, they will yield other useful results as well.
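How predicted contacts can narrow a structure search can be sketched as a simple restraint penalty: candidate structures in which co-evolving positions lie far apart are penalised, so the search favours folds that bring them together. This is only a minimal illustration under stated assumptions. The modelling software used by the researchers is not named in the article, and the function names, the 8 Å distance cutoff and the toy coordinates below are all hypothetical.

```python
from math import dist


def restraint_penalty(coords, contacts, cutoff=8.0):
    """Sum of squared distance violations for predicted contact pairs.

    coords   -- list of (x, y, z) positions, one per residue or nucleotide
    contacts -- list of (i, j) index pairs predicted to be close in space
    cutoff   -- assumed distance (in angstroms) below which a pair counts
                as being in contact
    """
    penalty = 0.0
    for i, j in contacts:
        excess = dist(coords[i], coords[j]) - cutoff
        if excess > 0:
            penalty += excess ** 2
    return penalty


def best_candidate(candidates, contacts, cutoff=8.0):
    """Return the name of the candidate structure with the lowest penalty."""
    return min(
        candidates,
        key=lambda name: restraint_penalty(candidates[name], contacts, cutoff),
    )


# Hypothetical toy structures of a five-unit chain (not real data).
candidates = {
    "extended": [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0), (16, 0, 0)],
    "hairpin": [(0, 0, 0), (4, 0, 0), (6, 3, 0), (4, 6, 0), (0, 6, 0)],
}

# Assume the statistical analysis flagged positions 0 and 4 as co-evolving.
predicted_contacts = [(0, 4)]

print(best_candidate(candidates, predicted_contacts))
```

Here the extended chain keeps positions 0 and 4 far apart and is penalised, whereas the hairpin satisfies the predicted contact. In practice such a term would be combined with a physical energy function rather than used on its own.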

3D structure of a riboswitch with the corresponding RNA sequences, such as the one predicted by the researchers from Karlsruhe. © Alexander Schug

The Karlsruhe researchers have come up with their first concrete result: they have determined the three-dimensional structure of riboswitches, RNA elements that bind to other molecules by forming a three-dimensional structure. These riboswitches can regulate gene expression. Although the spatial configuration of some riboswitches is already known, it is nevertheless quite difficult to obtain these molecules in the crystalline form necessary for experimental analyses. In a proof-of-principle test, the researchers used their method to predict the structures of six RNA riboswitch families. “We have not yet achieved the experimental resolution, but our results are already pretty good,” says Schug.

Google Faculty Award for alternative method

Schug and his team have recently been awarded the “Google Faculty Research Award” for this approach that enables them to predict the 3D structure of RNA. The prize is awarded to selected academic research projects in the fields of computer science, the engineering sciences and related areas. The Google grant supports the SCC project for a period of one year with funds totalling 50,000 US dollars.

Schug plans to use the funding to refine the methods. However, he also wants to continue his work with ribonucleic acids. “Although analysing ribonucleic acids is far more difficult than analysing proteins, we have come up with some fairly spectacular results,” says the scientist. “And Google is likely to provide us with further support, notably for our work with databases.” The KIT statisticians also plan to involve life scientists in their experiments and are currently working on establishing the first collaborative projects with experimental scientists who “know biological systems far better than us statisticians,” as the physicist says. The long-term goal is to carry out genome-wide analyses and identify the 3D structure of all kinds of proteins and nucleic acids.

Website address: https://www.gesundheitsindustrie-bw.de/en/article/news/novel-method-for-predicting-the-spatial-structure-of-biomolecules/