Jaideep Srivastava, Piet de Groen, George Karypis, Vipin Kumar

Mining Structured & Unstructured Life-Sciences Data

In modern clinics, increasing numbers of dictations, laboratory tests and imaging procedures are being added in digitized format to electronic medical records. In research laboratories, the development of automated, robotic analysis facilities has allowed the elucidation of the genomes of many species, as well as analysis of mRNA, protein composition and 3D structure within these species and tissue types; all of this work generates large sets of data. The availability of these complementary datasets provides an unprecedented opportunity to obtain a comprehensive view of diseases, and offers the hope of breakthroughs in our understanding of disease processes.

However, the developments outlined above have produced so much data that its analysis is rapidly becoming the rate-limiting step in applying genome-related discoveries to clinical practice. Moreover, the volume and complexity of the data require analysis methods that were not possible before the development of high-performance computing platforms and the digital availability of data. Thus, there is a need to develop a suite of informatics tools that will provide a uniform, integrated view of all data, free the medical researcher from labor-intensive analysis work, and allow the research effort to be spent on questions that enhance the understanding of diseases and disease processes. Our team, consisting of University of Minnesota and Mayo investigators, is initiating a project to fulfill this need.

While the overall goal of our UMN-Mayo collaboration is broad, the proposed project will focus on the specific tasks of (i) mining structured and unstructured patient data, and (ii) post-analysis of the results from the two types of analyses to gain further disease understanding that would not be possible from either of them alone. Details of this specific project, the proposed approach, and the project tasks are described in the following section. In a sense, we are asking for seed money to complete an important initial phase of our collaboration, which will lead to future external funding from sources such as the NIH and the NSF.