Events
Spring 2017
Tuesday ML Seminar
Tuesday, February 28th, 2017, 10:00 am–12:00 pm
Parent Program:
Speaker:
Location: Calvin Lab Room 116
Topic Modeling: From Proof to Practice
Topic models posit a stochastic generative process for document corpora, and algorithms are then devised to learn the model from real data. Currently, such algorithms are validated in one of two ways: by improved efficiency on benchmark corpora of up to billions of words, or by mathematically proven error and time bounds that are tested only on smaller cases. I will present our recent effort in which the two meet. The main new algorithmic ingredient is an importance sampling procedure inspired by randomized linear algebra. Whereas known topic models posit a nearly low-rank data matrix, we start with a new high-rank model that allows for realistic noise. Empirically, the algorithm performs better at scale than the state of the art.
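
The abstract does not say which importance sampling procedure the talk uses. As a rough illustration of the general idea from randomized linear algebra, the sketch below uses length-squared row sampling, a standard technique chosen here as an assumption and not taken from the talk. Rows of a matrix A are drawn with probability proportional to their squared norm and rescaled, so that the small sample S gives an unbiased estimate of A^T A. The function name, the toy Poisson count matrix, and the sample size are all hypothetical.

```python
import numpy as np

def length_squared_sample(A, s, rng):
    """Draw s rows of A with probability proportional to squared row norm.

    Rows are rescaled by 1/sqrt(s * p_i) so that S.T @ S is an unbiased
    estimate of A.T @ A. Illustrative only; not the speaker's algorithm.
    """
    row_norms_sq = np.einsum("ij,ij->i", A, A)   # squared norm of each row
    probs = row_norms_sq / row_norms_sq.sum()    # sampling distribution
    idx = rng.choice(A.shape[0], size=s, replace=True, p=probs)
    scale = 1.0 / np.sqrt(s * probs[idx])        # importance weights
    return A[idx] * scale[:, None]

rng = np.random.default_rng(0)
# Hypothetical toy document-word count matrix: 2000 documents, 50 words.
A = rng.poisson(lam=1.0, size=(2000, 50)).astype(float)
S = length_squared_sample(A, s=400, rng=rng)

# Compare the word co-occurrence matrix with its sampled estimate.
rel_err = np.linalg.norm(A.T @ A - S.T @ S) / np.linalg.norm(A.T @ A)
print(f"relative Frobenius error: {rel_err:.3f}")
```

Sampling by squared norm keeps the rows that contribute most to A^T A, which is why a sample of a few hundred rows can stand in for the full matrix. In this toy setting the error shrinks roughly like one over the square root of the sample size.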