Biomarkers play a key role in testing the efficacy of a new drug and in detecting adverse reactions to it. For reliable and quick results, modern laboratory methods are used to generate huge amounts of highly complex molecular data. The data then need to be analysed efficiently. The company Genedata, with headquarters in Basel (Switzerland) and offices throughout Europe, including a base in Konstanz (Germany), has developed the software system Genedata Expressionist®, which uses biostatistical methods and algorithms to analyse the interaction of genes, proteins and metabolites and find out whether these molecules can be used as effective biomarkers. Leading pharmaceutical companies around the world rely on this software.
The major problem in the search for suitable biomarkers is identifying, from tens of thousands of potential molecules, a sufficient number of genes, proteins and metabolites that are suitable for assessing the efficacy of a certain drug. In order to quantify the huge number of biochemical molecules in blood or tissue samples, specific measurement technologies are required for specific types of molecules. State-of-the-art technologies are able to analyse tens of thousands of biological molecules per sample and process thousands of samples in high-throughput processes in a very short time. This gives scientific projects rapid access to terabytes of experimental raw data to be analysed and interpreted.
Genedata Expressionist® is designed to process data reliably and quickly. "Terabytes of raw data can be read and processed using innovative mathematical and statistical methods," explained Dr. Timo Wittenberger, Head of Business Operations of the Konstanz-based Genedata subsidiary. The software generates a list of genes, proteins and metabolites that have the potential to be used as biomarkers for improving the prediction of the effect of drugs or for identifying patients who are likely to respond effectively to a certain drug.
"The software initially integrates data from different experimental processes into a combined dataset, checks the quality of the data, identifies experimental mistakes and delivers a quality report," explained Dr. Timo Wittenberger. It is important to ensure that a high-quality dataset is free from any experimental artefacts (e.g., instrument errors) before it is used in the subsequent analysis phase. Visual methods enable scientists to quickly and intuitively recognise potential errors in several hundred gigabytes of data.
The modular composition of the software enables scientists to use technology-specific modules in order to deal with certain research issues. The Expressionist "Refiner MS" module performs automated quality assessment and pre-processing of mass spectrometry-based proteomics and metabolomics data. "Profiling studies aimed at discovering unknown biomarkers generate thousands of mass spectra in a single experiment," said Dr. Wittenberger. When the constituents of a sample have been separated using chromatographic methods and analysed using ionisation and fragmentation, the "Refiner MS" module generates three-dimensional colour-coded plots from thousands of mass spectra, with run time, mass-to-charge ratio and intensity as the three dimensions. "The software uses sophisticated algorithms for intensity and alignment correction of the spectra, thereby enabling the data to be compared with data from other experiments," explained Dr. Wittenberger. In the MS spectra, each protein or metabolite creates a number of signals that can be identified using the "Refiner MS" module and MS spectrum repositories that are available in online databases. In addition to data analysis tools, the software also comes with a database for storing, managing and searching biomarker data.
The "Genedata Analyst" module, a comprehensive platform for integrating and interpreting experimental data, is used to identify biomarkers from a large number of quantified molecules. "The majority of data is nothing more than natural 'noise' which needs to be differentiated from significant signals," said Dr. Wittenberger. Differentiating and interpreting such data is very challenging, as even a relatively small number of experiments leads to enormous amounts of data. Statistical tools and flexible interactive data mining capabilities are required to determine the signal-to-noise ratio and the significance of individual events. The "Analyst" platform offers classification and grouping methods for doing just this.
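To make the idea of separating signal from noise concrete, the following minimal sketch ranks molecules by a simple signal-to-noise score between two sample groups (difference of group means divided by the sum of group standard deviations). This is a generic textbook statistic chosen for illustration, not the method used by the "Analyst" module; the function name, the molecule names and the intensity values are all hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(group_a, group_b):
    """Illustrative score: (mean_a - mean_b) / (sd_a + sd_b).

    A large absolute value means the group difference stands out
    from the within-group variation; a value near 0 looks like noise.
    """
    spread = stdev(group_a) + stdev(group_b)
    if spread == 0:
        raise ValueError("both groups have zero variation; score is undefined")
    return (mean(group_a) - mean(group_b)) / spread


# Hypothetical intensities: (untreated samples, treated samples) per molecule.
profiles = {
    "molecule_1": ([10.0, 11.0, 9.0], [20.0, 21.0, 19.0]),   # clear shift
    "molecule_2": ([10.0, 14.0, 6.0], [11.0, 7.0, 15.0]),    # mostly noise
}

scores = {name: signal_to_noise(a, b) for name, (a, b) in profiles.items()}

# Rank candidates by how strongly they separate the two groups.
ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
```

In a real study such a score would only be a first filter; assessing significance across tens of thousands of molecules additionally requires statistical testing with correction for multiple comparisons.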
"The great advantage of the "Analyst" module is that it can be used for huge quantities of data with more than a billion data points," said Wittenberger. In addition, the tool helps to integrate data from different experimental domains. For example, it is possible to directly compare metabolite data and protein expression data and reconstruct metabolic cycles.
The search for biomarkers is supported by other elements of the Genedata technology: "The 'Refiner Array' module enables the simultaneous normalisation of thousands of microarrays," said Dr. Wittenberger. Normalisation methods are used to standardise microarray data, for example, to enable differentiation between biological variations and measurement-process variations. "A simple way of normalising data is scaling, in which all the measurement values of an experiment are multiplied by a certain factor, so that the mean value of all measurement values is 1000, for example," said Dr. Wittenberger. The analysis of a transcriptome, in other words, the sum of all RNA molecules produced in a cell, involves DNA chips that determine the activity of known genes by way of hybridisation. The "Refiner Array" module evaluates raw data from microarray experiments for quality issues (signal background, technology-specific parameters, signal intensity), and normalises and condenses the data into selected features before they are analysed with yet another module.
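The scaling approach described in the quote can be sketched in a few lines. The example below is an illustration under stated assumptions, not the "Refiner Array" implementation: the function name `scale_normalise` and the intensity values are hypothetical, and the target mean of 1000 is taken from the example in the quote.

```python
from statistics import mean


def scale_normalise(values, target_mean=1000.0):
    """Multiply every value of one experiment by a single factor
    so that the mean of the scaled values equals target_mean."""
    current_mean = mean(values)
    if current_mean == 0:
        raise ValueError("cannot scale an experiment whose mean intensity is zero")
    factor = target_mean / current_mean
    return [v * factor for v in values]


# Hypothetical raw intensities: the same three genes measured on two arrays,
# where array_b was scanned at half the overall brightness of array_a.
array_a = [200.0, 400.0, 600.0]
array_b = [100.0, 200.0, 300.0]

norm_a = scale_normalise(array_a)
norm_b = scale_normalise(array_b)
```

After scaling, both arrays have a mean of 1000 and the same values gene by gene, so the brightness difference, a measurement-process variation, no longer masks the biological comparison. Scaling only removes a global multiplicative effect; intensity-dependent distortions need more elaborate normalisation methods.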
Data generated with modern high-throughput sequencing technologies can be analysed with the "Refiner Genome" module, since these data are usually analysed in the context of genomic positions. "The module can process huge quantities of data generated by high-throughput sequencers as well as visualise data obtained in microarray experiments, thus enabling the direct representation of these two types of data in a genomic context," said Dr. Wittenberger. He explained that the module supports the analysis of RNA expression data (on condition that the chromosomal location of the respective genes is known), single nucleotide polymorphisms (SNPs), DNA methylation and copy number variations (CNVs), all of which can be generated with high-density DNA microarrays.
The Genedata software can also be used to analyse the kinetics of a compound's effect. In order to analyse the kinetics, experimental time-series data are acquired and aggregated to derive meaningful results from time-dependent responses. "Statistical methods are of great importance in studies that precede kinetic analyses because they provide information about the optimal design of the main study," said Dr. Wittenberger. Since kinetics experiments generate vast quantities of data, high-throughput software such as Genedata Expressionist® is needed to process and analyse them efficiently, which makes drug experiments faster and more efficient, a clear benefit for the end user.
In addition to pharmaceutical research, the high-tech software is also used in plant biotechnology, which focuses on the analysis of thousands of plant varieties according to a comprehensive set of criteria. "A flexible and scalable system enabling the rapid analysis of huge amounts of molecular profiling data is crucial for the identification and characterisation of suitable development and breeding candidates," explained Dr. Wittenberger.
With the continuous development of its software technologies such as Expressionist®, Genedata has become a world leader for data analysis solutions that streamline R&D workflows and improve productivity in research and industry. "Our solutions are currently used by 22 out of the 25 leading pharmaceutical companies in the world," said Dr. Timo Wittenberger.
Dr. Timo Wittenberger
Head of Business Operations Konstanz
Genedata GmbH