App Stats: Reshef on "Detecting Novel Bivariate Associations in Large Data Sets"

We hope you can join us this Wednesday, March 21, 2012, for the Applied Statistics Workshop. David Reshef, an MD/PhD student at the Harvard-MIT Division of Health Sciences and Technology (HST), will give a presentation entitled "Detecting Novel Bivariate Associations in Large Data Sets". A light lunch will be served at 12:00 pm and the talk will begin at 12:15 pm.

"Detecting Novel Bivariate Associations in Large Data Sets"
David Reshef
Harvard-MIT Division of Health Sciences and Technology
CGIS K354 (1737 Cambridge St.)
Wednesday, March 21, 2012, 12:00 pm

Abstract:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. One way of doing so is to search such data sets for pairs of variables that are closely associated. This can be done by calculating some measure of dependence for each pair, ranking the pairs by their scores, and examining the top-scoring pairs. We outline two heuristic properties--generality and equitability--that the statistic we use to measure dependence should have in order for such a strategy to be effective. We present a measure of dependence for two-variable relationships, the maximal information coefficient (MIC), that has these properties. MIC captures a wide range of associations, both functional and not (generality), and assigns similar scores to relationships with similar noise levels, regardless of relationship type (equitability). Finally, we show that MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships.
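
For readers who want a concrete picture of the score-and-rank strategy the abstract describes, here is a minimal sketch in Python. It is not the speaker's code and it does not compute MIC: the score used, absolute Pearson correlation, is only a stand-in, and the function names (`abs_pearson`, `rank_pairs`) and the toy data are hypothetical. The stand-in was chosen because it lacks generality, so the sketch also hints at why the choice of statistic matters.

```python
from itertools import combinations
from math import sqrt


def abs_pearson(x, y):
    """Stand-in dependence score: absolute Pearson correlation (NOT MIC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear association
    return abs(sxy) / sqrt(sxx * syy)


def rank_pairs(columns, score=abs_pearson):
    """Score every pair of variables, then sort from highest to lowest score."""
    scored = [
        (score(columns[a], columns[b]), a, b)
        for a, b in combinations(sorted(columns), 2)
    ]
    return sorted(scored, reverse=True)


# Hypothetical toy data: one linear and one parabolic function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
data = {
    "x": x,
    "linear": [2 * v + 1 for v in x],
    "parabola": [v * v for v in x],
}

ranked = rank_pairs(data)
for value, a, b in ranked:
    print(f"{a} vs {b}: {value:.3f}")
```

In this toy data the parabola is an exact, noiseless function of x, yet the stand-in score places that pair at the bottom of the ranking because the relationship is not linear. Any other statistic can be passed through the `score` argument without changing the ranking logic.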

Posted by Konstantin Kashin at March 19, 2012 12:35 AM