Bioinformatics methods are important tools for the classification of protein sequences. Prof. Dr. Tancred Frickey, Professor of Applied Bioinformatics at the University of Konstanz, has developed a programme, CLANS, that enabled him to classify the AAA ATPase protein family. The software can also be used to visualise the similarities between film actors who have played roles in the same genre category.
Prof. Tancred Frickey, who has returned to the University of Konstanz after a five-year stay at the Australian National University (ANU) College of Science, aims to use computer-based methods to solve problems in the life sciences. His main research priority is the analysis of protein and gene expression data. "The analysis of kinship degrees between proteins is of such major importance because knowledge of kinship provides us with an idea as to which known protein properties can be transferred to previously undescribed or unknown proteins," explained Prof. Frickey, who has developed several software applications, including CLANS, which is used for the three-dimensional visualisation of protein families based on pairwise sequence similarity.
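The idea of turning pairwise similarities into a three-dimensional picture can be illustrated with a generic force-directed layout. The following is only a minimal sketch and not CLANS itself: the article does not describe the program's algorithm, so the function name `layout_3d`, the force formula, and the parameters `rate` and `max_move` are all assumptions made for illustration. Similar pairs attract each other, every pair repels weakly, and related sequences therefore end up close together.

```python
import math
import random

def layout_3d(n, attract, steps=300, rate=0.05, max_move=0.1, seed=0):
    """Place n points in 3D so that similar pairs end up close together.

    attract maps a pair (i, j) with i < j to a similarity weight.
    Illustrative only: the force law is an assumption, not CLANS's own.
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        disp = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                delta = [pos[i][k] - pos[j][k] for k in range(3)]
                dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
                # Positive values push the pair apart, negative pull it together.
                force = 0.01 / (dist * dist) - attract.get((i, j), 0.0) * dist
                for k in range(3):
                    push = force * delta[k] / dist
                    disp[i][k] += push
                    disp[j][k] -= push
        for i in range(n):
            length = math.sqrt(sum(d * d for d in disp[i])) or 1e-9
            # Scale the move by rate and clip it so points cannot jump wildly.
            scale = min(rate * length, max_move) / length
            for k in range(3):
                pos[i][k] += disp[i][k] * scale
    return pos

# Hypothetical toy input: sequences 0-2 and 3-5 form two unrelated families.
attract = {(i, j): 1.0
           for family in ((0, 1, 2), (3, 4, 5))
           for i in family for j in family if i < j}
coords = layout_3d(6, attract)
```

In a real analysis the weights would be derived from sequence comparison scores rather than set by hand, and the resulting coordinates would be passed to a 3D viewer.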
The AAA ATPases, which Prof. Frickey has been able to classify for the first time thanks to the CLANS software, are a very special protein family that is found in all organisms and is usually involved in protein unfolding. "Up until a few years ago, we were unable to define which sequences belonged to the AAA ATPase family and which did not. This was because the public protein databases had around 5000 AAA and AAA-related sequences stored," said the bioinformatician.
In addition, mutation saturation of the calculated multiple sequence alignments made it impossible to classify the superfamily. "Using CLANS, we have for the first time ever been able to objectively describe potential AAA ATPases and classify them by identifying the AAA sequences on the basis of their orientation in the sequence space," said the bioinformatician. In combination with phylogenetic methods, the researchers have also been able to decipher the relationship within the AAA subfamilies.
Many of the technologies used in bioinformatics, for example programmes that were developed to interpret plant sequence data, do not, at least at first sight, appear to have uses beyond their original purpose. However, with minor alterations these applications can also analyse the transcriptome or the human genome. Prof. Frickey's CLANS software can likewise be used for a broad range of applications: for example, it can compare several proteomes in order to find out which proteins are present in one species, but not in others. Data from all proteins of the species under investigation are fed into the programme, which subsequently visualises pairwise sequence similarities. "If one species is represented in green and the other in red, then we can identify protein groups and families that only occur in one species," said Frickey, explaining how CLANS works.
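The proteome comparison described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CLANS implementation: the function `species_specific_groups`, the protein identifiers and the list of similar pairs are hypothetical, and proteins are grouped by simple single-linkage (any chain of similar pairs puts proteins into the same group). Groups whose members all come from one species correspond to the clusters that would appear in a single colour.

```python
def species_specific_groups(species_of, similar_pairs):
    """Return, per species, the protein groups found only in that species.

    species_of maps a protein id to its species label;
    similar_pairs lists pairs of protein ids judged to be similar.
    """
    parent = {protein: protein for protein in species_of}

    def find(protein):
        # Union-find lookup with path halving.
        while parent[protein] != protein:
            parent[protein] = parent[parent[protein]]
            protein = parent[protein]
        return protein

    # Merge the groups of every pair of similar proteins.
    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for protein in species_of:
        groups.setdefault(find(protein), []).append(protein)

    result = {}
    for members in groups.values():
        species = {species_of[m] for m in members}
        if len(species) == 1:  # every member comes from the same species
            result.setdefault(species.pop(), []).append(sorted(members))
    return result

# Hypothetical toy data: three proteins per species.
species_of = {"g1": "green", "g2": "green", "g3": "green",
              "r1": "red", "r2": "red", "r3": "red"}
similar_pairs = [("g1", "r1"), ("g2", "g3"), ("r2", "r3")]
specific = species_specific_groups(species_of, similar_pairs)
```

Here `g1` and `r1` form a mixed group shared by both species, so only the groups `g2`/`g3` and `r2`/`r3` are reported as species-specific.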
However, not even Prof. Frickey foresaw that it would be possible to use CLANS in areas other than the life sciences: the software enables the visualisation of the similarities between film actors according to the genres in which they have played. "Assuming that films are of great general interest and famous actors are human beings then the software is not only of interest to life scientists, but to people in general and can be used for general, human-oriented applications," said Frickey with a smile.
"Bioinformatics is fast; one can work in a large and varied thematic field and use any computer as a laboratory. In addition, there is only one error source that needs to be taken into account when optimising the programmes and using them for analyses - yourself," said Tancred Frickey, describing his passion for this research area. He is convinced that bioinformatics will continue to grow, particularly in the area of data acquisition. He sees a major problem in the fact that new technologies for integrating data from different databases are being developed too slowly. "The superordinate relationships play a very important role in the field of bioinformatics. The composition of genomes on the basis of sequence raw data, for example, enables us to make predictions on the proteins and their function in a developing organism," said Frickey.
Many sequence data are stored in public databases, but not all of them can be placed in a superordinate context, nor are they all reliable enough: "Many of the protein sequence annotations are erroneous because their functions and annotations have been adopted from similar sequences without carrying out specific biological experiments. An erroneously annotated protein sequence can thus have a huge influence on other sequences if the properties of this particular sequence are transferred to new proteins," explained Frickey. Conclusions about superordinate relationships, for example the interactome of proteins in a cell, are thus rather difficult to draw. "However, the generation of data that reveal certain relationships is a very time-consuming process. A technological breakthrough is required to produce data with the quality and quantity that will allow it to be used for a broad field of application," said Prof. Frickey, who already has many other projects in mind that cannot yet be turned into reality due to quality issues with the data.
Further information:
Prof. Dr. Tancred Frickey
Faculty of Biology
University of Konstanz
Universitätsstraße 10
78457 Konstanz
Tel.: +49 (0)7531/88-2343
E-mail: tancred.frickey(at)uni-konstanz.de