The International Cancer Genome Project analyzes the complete genomes of thousands of cancer patients, generating enormous amounts of data. Finding the gene segments that are pivotal for cancer development and treatment requires intelligent IT systems. The German Cancer Research Center (Deutsches Krebsforschungszentrum, DKFZ) in Heidelberg and IBM signed a strategic partnering agreement at CeBIT 2011. The aim of the agreement is to turn sequencing data into useful information for cancer medicine.
"In the coming years, sequencing of cancer genomes will produce considerable amounts of data. This will fundamentally improve diagnosis and treatment of cancer patients," says Professor Otmar D. Wiestler, DKFZ's Scientific Director. "However, in order to actually use the findings from the enormous flood of data, we need intelligent information technology. It will help us to recognize and ultimately use crucial segments. IBM is an ideal partner for us in this great challenge."
Cancer is a disease of the genes. In every cancer cell, many genes are altered compared to healthy cells. However, the mutated genes are by no means always the same: they differ from one type of cancer to another and even from one patient to another. In many cases it is still unknown which mutations are truly pivotal for the development of cancer and which are more likely to be random. Finding this out is the aim of the International Cancer Genome Project: for each of 50 types of cancer, the genomes of 500 patients will be deciphered letter by letter. Each genome generates 2.4 terabytes of data, which is equal to 2,400 gigabytes. For comparison: a standard laptop currently has a storage capacity of about 100 gigabytes.
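As a rough back-of-the-envelope check, the project-wide data volume implied by these figures can be sketched as follows. The calculation takes the quoted numbers at face value and assumes one 2.4-terabyte data set per patient, decimal units (1 petabyte = 1,000 terabytes), and no compression; the function and variable names are illustrative only.

```python
from decimal import Decimal

def total_volume_tb(cancer_types, patients_per_type, tb_per_genome):
    """Return the total raw data volume in terabytes.

    Assumes one genome data set per patient (an assumption of this
    sketch, not a statement about the project's sequencing protocol).
    """
    genomes = cancer_types * patients_per_type
    return genomes * Decimal(tb_per_genome)

# Figures quoted in the text: 50 cancer types, 500 patients each,
# 2.4 terabytes per genome.
total_tb = total_volume_tb(50, 500, "2.4")

# Decimal (SI) conversion: 1 petabyte = 1,000 terabytes.
total_pb = total_tb / 1000

print(f"{total_tb} TB = {total_pb} PB")
```

Under these assumptions, 25,000 genomes add up to 60,000 terabytes, or 60 petabytes, across the entire international project.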
DKFZ is involved in three projects of the International Cancer Genome Consortium (ICGC): it coordinates the PedBrain consortium, which analyzes the genomes of childhood brain tumors, as well as the study of the genetic causes of prostate cancer in younger patients, and it is a partner in the German ICGC network for the analysis of malignant lymphomas.
The partnering agreement between DKFZ and IBM covers three aspects of handling the gigantic masses of data:
1) First, data will be compressed, similar to MP3 files in the music industry. To this end, new strategies tailored to genome data will be developed.
2) Second, the partners will look for ways to transfer the gigantic amounts of data from storage to the computers and to compare the data in order to identify, for example, frequently altered genes.
3) Last, the aim of genome analysis is to use the results for tailored cancer treatments. To this end, methods will be developed for aligning genome data with clinical parameters such as disease progression or response to targeted drugs. Efficient data processing on the basis of Watson technology will ensure that cancer physicians can use initially unstructured research data for treatment decisions.
All data from the German ICGC projects will be brought together by Professor Roland Eils, head of the Theoretical Bioinformatics Division of DKFZ. To this end, Eils is building one of the world’s largest data storage units for life sciences at the BioQuant Center of Heidelberg University. The collaborators expect to operate data storage systems with a capacity of six petabytes (1 petabyte = 10^15 bytes) for storing the genome sequences.