Data Mining Seminar : Sampling
Instructor : Jeff Phillips
Fall 2012 | Wednesday 1:25 pm - 2:45 pm
Location : MEB 3147 (the LCR)
Catalog number: CS 7931 01


Description:
One of the most obvious ways to deal with the "big data" problem is to simply sample the "big" data set to create a smaller one. Then we can run our algorithms/analysis on the smaller data set, which will complete much more quickly. This raises three obvious questions. This seminar will answer these questions, covering classic results as well as fascinating recent approaches.
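As a rough illustration of the basic idea (not taken from the course materials), the sketch below draws a uniform random sample from a synthetic "big" data set and compares the sample mean with the exact mean. The data set, the sample size `k`, and the function name `sample_mean` are all assumptions made for this example.

```python
import random

def sample_mean(data, k, seed=0):
    """Estimate the mean of `data` from a uniform random sample of size k."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    sample = rng.sample(data, k)   # k items drawn without replacement
    return sum(sample) / k

# Hypothetical "big" data set: one million integers.
big = list(range(1_000_000))
exact = sum(big) / len(big)
estimate = sample_mean(big, k=10_000)
print("exact mean:", exact)
print("estimate from 1% sample:", estimate)
```

The estimate only touches 1% of the data; how close it lands to the exact value, and how that depends on `k`, is the kind of accuracy question the seminar takes up.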

This 1-credit seminar will meet once a week and be student-driven; each student will be responsible for giving one lecture in class (or less, depending on class size). Topics will include:
Schedule:
Date Topic Speaker
Wed 8.22 Overview (JEFF-NOTES) Jeff Phillips
Wed 8.29 --- Travel Day ---
Wed 9.05 Accuracy: Properties (e.g. eps-samples and eps-nets, and more details on proof) (JEFF-NOTES) Jeff Phillips
Wed 9.12 Accuracy: Importance ( WP ) and Rejection ( WP ) Sampling ( notes ) Poonam Ekhelikar
Wed 9.19 Markov Chains: Definitions and Introduction ( WP, Chapter 1 ) Yan Zheng / Chancey Ding
Wed 9.26 Markov Chains: Rapidly Mixing and Convergence Analysis ( Conductance opt2 opt3 | Chapter 4 ) Haibo Ding / Chad Miller
Wed 10.03 Markov Chains: Metropolis ( WP, for ML, Chapter 10 ) and Gibbs ( WP, Chapter 6 ) Sampling ( notes, BUGS ) Parasaran Raman / Nazmus Saquib
Wed 10.10 (Fall Break - No Class)
Wed 10.17 Markov Chains: Advanced Sampling ( WP, tempered analysis ) John Moeller
Wed 10.24 Markov Chains: Coupling from the Past ( WP, Java App, Chapter 22, more ) Prasana Muralidharan
Wed 10.31 Efficiency: Reservoir ( WP, Vitter, blog ) and Distributed ( distributed, frequency ) Sampling Gholamreza Esfandani / Mina Ghashami
Wed 11.07 Efficiency: Variance-Optimal Sampling ( Priority Sampling, VarOpt Sampling ) Miaomiao Zhang / Xing Lin
Wed 11.14 Efficiency: L_p Sampling ( L_p | Precision sampling | improved | McGregor slides ) Suresh Venkatasubramanian
Wed 11.21 Beyond Random: Discrepancy and How to Use it ( WP, Chazelle, Matousek ) Supraja Jayakumar / Abishek Trivedi
Wed 11.28 Beyond Random: Hyperbolic Cosine ( Chazelle ) + other discrepancy-coloring methods ( edge walking ) Samira Daruki / Prafulla Surve
Wed 12.05 Beyond Random: Structure-aware VarOpt Sampling ( range, stream ) Amir Abdullah
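
To give a flavor of one of the efficiency topics in the schedule above, here is a minimal sketch of reservoir sampling in its classic one-pass form (often called Algorithm R). It keeps a uniform sample of `k` items from a stream whose length is not known in advance. This is an illustration only, not the presentation used in the seminar; the function name `reservoir_sample` and its parameters are chosen for this example.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable, in one pass."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# Usage: sample 10 items from a stream of 1000 without storing the stream.
sample = reservoir_sample(iter(range(1000)), 10, random.Random(1))
print(sample)
```

If the stream has fewer than `k` items, the function simply returns all of them.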



Resources:
Most resources will be linked directly from each week's row in the above schedule. However, some great references are available on the more general topic of Markov chains.
  • A First Course in Bayesian Statistical Methods by Peter Hoff.
    Provides a nice view of practical MCMC sampling for Bayesian statistics.
  • Markov Chains and Mixing Times by David A. Levin, Yuval Peres, and Elizabeth L. Wilmer.
    Provides a nice overview of the theory behind Markov chains.