University of Minnesota

DTC Science and Technology Innovators Lecture Series

Nonnegative tensor factorizations for sparse count data

by

Tamara G. Kolda
Sandia National Labs

Tuesday, October 25, 2011
4:00 p.m. reception
4:30 p.m. lecture

401/402 Walter Library

In data mining and analysis, we often encounter data indexed by three or more indices. For example, we may consider a group of Twitter posts and index them by poster, hash tag, and date. Tensor factorizations can be used for such "multidimensional" data and have already found application in a variety of fields, including high-dimensional function approximation, signal processing, chemometrics, multi-attribute graph analysis, image recognition, and more.

We consider the problem of tensor factorizations for sparse count data. For example, we might have a collection of email messages arranged by sender, receiver, and date; the data is sparse because not all potential senders and receivers communicate, and even those that do communicate do not necessarily do so every day. Typically, we would fit a tensor factorization model to data by minimizing the least-squares difference between the data and the model. Statistically speaking, this means we have assumed that the random variation is best described by a Gaussian distribution. We propose, however, that the random variation may be better described by a Poisson distribution, because the count observations (especially the zeros) are more naturally explained. Under this Poisson assumption, we can fit a model to observed data using the negative log-likelihood score, also known as Kullback-Leibler divergence.

We discuss an alternating optimization approach and propose solving each subproblem using a majorization-minimization scheme that yields multiplicative updates. In fact, the method we propose is a generalization of the Lee-Seung multiplicative updates but has the advantage of being faster and having provable convergence even for solutions on the boundary (i.e., with zeros in the factorization). Moreover, we show how to prevent the algorithm from converging to non-stationary points (i.e., fixing the problem of getting stuck at undesirable zero values). We will show several examples to demonstrate the utility of the approach. This is joint work with Eric C. Chi (UCLA).
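For readers wanting a concrete point of reference: the matrix (two-way) special case of the KL-divergence fitting problem described in the abstract is solved by the classic Lee-Seung multiplicative updates, which the talk's method generalizes. The following is an illustrative NumPy sketch of those matrix updates, not the speaker's tensor algorithm; the function names and the small epsilon safeguard against division by zero are our own choices.

```python
import numpy as np

def kl_div(V, WH, eps=1e-10):
    """Generalized Kullback-Leibler divergence D(V || WH).

    Equals the Poisson negative log-likelihood of V under mean WH,
    up to an additive constant independent of the factors.
    """
    WH = np.maximum(WH, eps)
    return float(np.sum(WH - V + V * np.log(np.maximum(V, eps) / WH)))

def nmf_kl(V, r, iters=200, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates for KL-divergence NMF.

    V : (m, n) nonnegative data matrix (e.g., sparse counts)
    r : target rank; returns W (m, r), H (r, n) with V ~ W @ H.
    Each step multiplies the current factor elementwise by a
    nonnegative ratio, so nonnegativity is preserved automatically.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # W_ij <- W_ij * sum_k H_jk V_ik/(WH)_ik / sum_k H_jk
        WH = np.maximum(W @ H, eps)
        W *= (V / WH) @ H.T / np.maximum(H.sum(axis=1), eps)
        # H_jk <- H_jk * sum_i W_ij V_ik/(WH)_ik / sum_i W_ij
        WH = np.maximum(W @ H, eps)
        H *= W.T @ (V / WH) / np.maximum(W.sum(axis=0), eps)[:, None]
    return W, H
```

Each update here is derived by majorization-minimization, so the KL divergence is non-increasing from one iteration to the next; that monotonicity is the property the talk's generalization carries over to tensor factorizations of count data.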


Tamara G. Kolda is a Distinguished Member of Technical Staff in the Informatics and Systems Assessments department at Sandia National Laboratories in Livermore, California. Her research interests include multilinear algebra and tensor decompositions, graph models and algorithms, data mining, optimization, nonlinear solvers, parallel computing, and the design of scientific software. In multilinear algebra and tensor decompositions, she is best known for her work on the MATLAB Tensor Toolbox and a SIAM Review article on tensor decompositions and applications. She received a 2003 Presidential Early Career Award for Scientists and Engineers (PECASE). She currently serves as Section Editor for the Software and High Performance Computing Section of the SIAM Journal on Scientific Computing and as Associate Editor for the SIAM Journal on Matrix Analysis and Applications.