Events
Spring 2017

Tuesday ML Seminar

Tuesday, February 28th, 2017, 10:00 am to 12:00 pm


Speaker: 
Location: Calvin Lab Room 116

Topic Modeling: From Proof to Practice

Topic models posit a stochastic process that generates a document corpus, and researchers devise algorithms to learn the model from real data. Currently, these algorithms are validated in one of two ways: by demonstrating improved efficiency on benchmark corpora of up to billions of words, or by proving mathematical error and running-time bounds that are then tested on smaller instances. I will present our recent effort where the two meet. The main new algorithmic ingredient is an importance sampling procedure inspired by randomized linear algebra. Whereas known topic models posit a nearly low-rank data matrix, we start with a new high-rank model that allows for realistic noise. Empirically, the algorithm performs better at scale than the state of the art.
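The abstract does not describe the importance sampling procedure itself. As a generic illustration of the kind of tool randomized linear algebra provides, the sketch below shows textbook length-squared (norm-squared) column sampling. It is not the speaker's algorithm. Column j of a matrix A is drawn with probability p_j = ||A[:, j]||^2 / ||A||_F^2 and rescaled by 1 / sqrt(s * p_j), which makes C C^T an unbiased estimator of A A^T. The function names `column_probabilities`, `sample_columns`, and `gram` are hypothetical, and the code uses only the Python standard library.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: a[i][j] is row i, column j


def column_probabilities(a: Matrix) -> List[float]:
    """Length-squared probabilities p_j = ||A[:, j]||^2 / ||A||_F^2."""
    n_cols = len(a[0])
    col_sq = [sum(row[j] ** 2 for row in a) for j in range(n_cols)]
    total = sum(col_sq)
    if total == 0:
        raise ValueError("matrix must have at least one nonzero entry")
    return [c / total for c in col_sq]


def sample_columns(a: Matrix, s: int, rng: random.Random) -> Tuple[Matrix, List[int]]:
    """Draw s column indices i.i.d. with length-squared probabilities.

    Each drawn column j is scaled by 1 / sqrt(s * p_j), so that C C^T is an
    unbiased estimator of A A^T. Zero-norm columns have weight 0 and are
    never drawn. Returns C (rows x s) and the sampled column indices.
    """
    p = column_probabilities(a)
    idx = rng.choices(range(len(p)), weights=p, k=s)
    c = [[row[j] / (s * p[j]) ** 0.5 for j in idx] for row in a]
    return c, idx


def gram(m: Matrix) -> Matrix:
    """Return M M^T for a row-major matrix M."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in m] for r1 in m]
```

A typical use compares the sketch against the exact product, for example `c, idx = sample_columns(a, s=2, rng=random.Random(0))` followed by `gram(c)` versus `gram(a)`. A single draw is noisy; the point of the scaling is that the estimate is correct in expectation, and its variance shrinks as s grows.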