University of Minnesota

DTC Seminar Series

Mining Scientific Data Sets: Challenges & Opportunities


George Karypis
Computer Science & Engineering

Thursday, April 18, 2002
1:00 pm

402 Walter Library

Slide presentation (pdf 636 KB)

Data mining is the process of automatically extracting useful information hidden in large data sets. This emerging discipline is becoming increasingly important as advances in data collection have led to explosive growth in the amount of available data. This project aims to develop a wide range of novel data mining algorithms suited to the characteristics of scientific data sets arising in genomics and fluid dynamics. The research focuses on developing algorithms both for sequential data sets and for data sets that can be represented by directed labeled (geometric and/or topological) graphs. Graph-based modeling makes it possible to capture, in a single unified framework, many of the spatial, topological, geometric, and other relational characteristics present in scientific data sets.

The specific research tasks currently being addressed are:
(i) Development of scalable algorithms for finding frequently occurring patterns in graph data sets, and of algorithms for finding patterns whose frequency decreases as a function of the pattern length.
(ii) Development of scalable, high-quality clustering algorithms for sequence and graph data sets that operate directly in the native feature space.
(iii) Development of scalable and accurate classification algorithms based on automated sequential or relational feature extraction.

In this talk Dr. Karypis will focus on the problem of finding frequently occurring patterns in topological and geometric graphs. In this context, finding frequent patterns means discovering subgraphs that occur frequently over the entire set of graphs. Dr. Karypis will present computationally efficient algorithms for finding all frequent subgraphs in large topological and geometric graph databases. These algorithms rely on efficient methods for canonical labeling and geometric hashing, and are capable of scaling to very large databases.
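
To make the notion of a frequent subgraph concrete, the following is a minimal sketch that handles only the simplest case: single-edge patterns in a small set of directed labeled graphs. It is not the algorithm presented in the talk. The function name `frequent_edges`, the graph representation (a dictionary of vertex labels plus a list of labeled edges), and the toy data are all assumptions made for illustration. A pattern's support is taken to be the number of graphs that contain it at least once, and for a single directed edge the label triple itself serves as a trivial canonical label.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return {edge_pattern: support} for single-edge patterns that
    appear in at least `min_support` graphs.

    Each graph is a pair (vertex_labels, edges), where vertex_labels maps
    a vertex id to its label and edges is a list of (src, edge_label, dst).
    This representation is a hypothetical one chosen for the sketch.
    """
    support = Counter()
    for vertex_labels, edges in graphs:
        # Build a set so a pattern is counted once per graph,
        # no matter how many times it occurs inside that graph.
        patterns = {
            (vertex_labels[u], label, vertex_labels[v])
            for u, label, v in edges
        }
        support.update(patterns)
    return {p: s for p, s in support.items() if s >= min_support}


# Toy database of three small directed labeled graphs (illustrative only).
graphs = [
    ({0: "C", 1: "O", 2: "N"}, [(0, "single", 1), (0, "single", 2)]),
    ({0: "C", 1: "O"}, [(0, "single", 1), (0, "single", 1)]),
    ({0: "C", 1: "N"}, [(0, "double", 1)]),
]

result = frequent_edges(graphs, min_support=2)
print(result)
```

Larger patterns are where the difficulty lies: two subgraphs with differently numbered vertices may still be the same pattern, which is why a canonical labeling step is needed before occurrences can be counted.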


Dr. Karypis's research interests span the areas of data mining, parallel algorithm design, information retrieval, collaborative filtering, applications of parallel processing in scientific computing and optimization, sparse matrix computations, parallel preconditioners, and parallel programming languages and libraries. His research has resulted in the development of software libraries for serial and parallel graph partitioning (METIS and ParMETIS), hypergraph partitioning (hMETIS), parallel Cholesky factorization (PSPASES), collaborative filtering (SUGGEST), and clustering of high-dimensional data sets (CLUTO). He has coauthored several journal articles and conference papers on these topics, as well as a book titled "Introduction to Parallel Computing" (Publ. Benjamin Cummings/Addison Wesley, 1994). He is a member of ACM, IEEE, and SIAM.