University of Minnesota




DTC Leading Edge Seminar Series

Large scale machine learning at Thomson Reuters

by

Frank Schilder, Thomson Reuters
Ravi Kondadadi, Nuance

Wednesday, May 15, 2013
3:30 p.m. reception
4:00 p.m. seminar

401/402 Walter Library

An overview of recent large-scale experiments with Thomson Reuters news articles. First, we will discuss a few typical applications we developed to support professionals such as scientists, journalists and traders (e.g., table extraction and email signature detection). Using hand-annotated data, we developed clustering, multi-class classification and sequence-tagging algorithms to support various products. However, the growing volume of available data, together with the need to improve the accuracy of current algorithms by using multi-dimensional data, calls for large-scale machine learning approaches. Recent technological advances have made processing and storing large amounts of data more efficient, and MapReduce frameworks such as Hadoop excel at processing such large data sets. However, they lack support for the iterative computation that most machine learning algorithms require. In our talk, we will discuss Spark, a cluster computing framework from UC Berkeley that supports iterative MapReduce-style computation by providing immutable, distributed, in-memory collections. Working at large scale also calls for more sophisticated, automated and data-driven feature engineering. We will conclude with a discussion of unsupervised feature learning, also called deep learning, which focuses on learning hierarchical feature representations from unlabeled input data.


Frank Schilder

Frank Schilder obtained his Ph.D. in Cognitive Science from the University of Edinburgh, Scotland. His research interests include discourse analysis, summarization and information extraction. His summarization work has been implemented as the snippet generator for WestlawNext search results, and he is currently involved in various large-scale machine learning projects. Frank has successfully participated in several research competitions on automatic summarization, such as the Text Analysis Conference (TAC) run by the National Institute of Standards and Technology (NIST). Before joining Thomson Reuters, he was an assistant professor in the Department of Informatics at the University of Hamburg, Germany.



Ravi Kondadadi

Ravi Kondadadi holds a master's degree in Computer Science from the University of Memphis. His research focuses primarily on information extraction, text summarization and machine learning. His current interests include applying semi-supervised learning approaches to information extraction problems.