Taking on the flood of data with Bison
Prof. Michael Berthold has an ideal colleague in mind for the new EU project: someone who has been with the company for decades, knows everybody, has read every important document and article, has seen all the experiments and has talked to everybody. Above all, this person has an excellent memory and is able to point out interesting relationships.
Prof. Michael Berthold has held the Nycomed Chair for Applied Computer Science, Bioinformatics and Information Mining since 2003. (Photo: University of Constance)
This ideal colleague can immediately lay hands on the documents that will provide the solution to a problem. He not only has the solution, but is also a mine of information about the thought processes that led to it. The ideal colleague is in fact a planned software system that will be developed by a consortium of eight European research teams in a project due to begin in June. The new system is known as Bison, short for Bisociation Networks for Creative Information Discovery.
The information network is designed to support creative projects that are held back by the need to trawl through an overwhelming amount of information. At present, people looking for information in a database start from a specific question. The databases, like the majority of semantic networks, operate within a single domain: for example, they associate a fourth gene with three known genes. Berthold, head of the Department of Bioinformatics and Information Mining at the University of Constance and coordinator of the project, and his partners hope that their system will bisociate, i.e. associate across domain borders. The three genes will then be associated not only with a fourth one, but also with something completely different which, at first sight, might have nothing to do with the original problem.
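The difference between associating within a domain and bisociating across domains can be illustrated with a minimal sketch. It is not the Bison design, which the article does not describe in detail; it simply assumes that every item carries a domain label and a set of descriptive terms, and it reports only those pairs of items that share a term and come from different domains. The function name, the item names and the terms are all hypothetical.

```python
from collections import defaultdict


def find_bisociations(items):
    """Return cross-domain item pairs linked by at least one shared term.

    items: iterable of (name, domain, terms), where terms is a set of strings.
    The result maps a sorted (name, name) pair to the set of shared terms.
    """
    # Index every item under each of its terms.
    by_term = defaultdict(list)
    for name, domain, terms in items:
        for term in terms:
            by_term[term].append((name, domain))

    links = defaultdict(set)
    for term, members in by_term.items():
        for i, (name_a, domain_a) in enumerate(members):
            for name_b, domain_b in members[i + 1:]:
                # Links inside one domain are ordinary associations; skip them.
                if domain_a != domain_b:
                    links[tuple(sorted((name_a, name_b)))].add(term)
    return dict(links)


# Hypothetical example data: two genes and one report from another field.
example_items = [
    ("gene_1", "genetics", {"stress", "membrane"}),
    ("gene_2", "genetics", {"stress"}),
    ("report_7", "materials", {"membrane", "polymer"}),
]
example_links = find_bisociations(example_items)
```

In this toy data the two genes share a term but belong to the same domain, so only the link between gene_1 and report_7 is kept. A real system would need a far richer notion of "domain" and of what counts as a link.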
New questions can be addressed
“We plan to combine different information sources in a loose network,” said Michael Berthold. The networks will be built by means of automatic data analyses. The system does not necessarily have to “understand” a text; instead, it carries out statistical analyses of how often a certain word or word combination occurs. The more information sources are combined, and the more the stored items have in common, the more probable it is that two of them are related. This holds for texts, experiments, images, etc. In this way users will discover interesting information that may in turn raise new questions.
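The kind of statistical analysis described above can be sketched in a few lines. The example assumes one simple measure, pointwise mutual information (PMI) over document frequencies, to decide how strongly two words are linked. The article does not say which statistics the project will actually use, so the measure, the function names and the sample texts are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase a text and return the set of distinct words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def cooccurrence_scores(documents):
    """Score every co-occurring word pair by PMI over the documents.

    A positive score means the two words appear together more often than
    their individual frequencies alone would suggest.
    """
    n = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for text in documents:
        words = tokenize(text)
        word_counts.update(words)
        # Sorting gives each pair one canonical (alphabetical) order.
        pair_counts.update(combinations(sorted(words), 2))

    scores = {}
    for (a, b), joint in pair_counts.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with P estimated per document.
        scores[(a, b)] = math.log((joint * n) / (word_counts[a] * word_counts[b]))
    return scores


# Hypothetical sample texts.
sample_documents = [
    "gene alpha protein",
    "gene alpha",
    "protein weather",
    "weather cloud",
]
sample_scores = cooccurrence_scores(sample_documents)
```

Nothing here "understands" the texts: the scores come purely from counting which words appear in the same documents. Pairs that never co-occur receive no score at all.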
The researchers hope to have a prototype ready by the time funding from the EU’s “Future and Emerging Technologies” programme ends in three years’ time. It represents a paradigm change in knowledge and information management research. The new approach aims to create systems that do not just give specific answers but also try to find stimulating and interesting relationships, said Berthold. “We are drowning in data and are often unable to establish well-maintained information networks. However, nowadays we are able to make connections based on correlations and statistical relationships,” he added. Given the huge and continually increasing amount of data, cataloguing knowledge is difficult to achieve.
Instead of providing long lists of similar documents (or images), future information management systems will help users sift rapidly and efficiently through the huge amount of potentially related information. In this respect they will resemble the human brain: rather than returning a long list of “hits”, they will separate interesting from uninteresting information and concentrate on the essentials.
Source: Uni'kon - issue 30/08