19 March 2012

App Stats: Reshef on "Detecting Novel Bivariate Associations in Large Data Sets"

We hope you can join us this Wednesday, March 21, 2012, for the Applied Statistics Workshop. David Reshef, an MD/PhD student at the Harvard-MIT Division of Health Sciences and Technology (HST), will give a presentation entitled "Detecting Novel Bivariate Associations in Large Data Sets". A light lunch will be served at 12:00 pm, and the talk will begin at 12:15 pm.

"Detecting Novel Bivariate Associations in Large Data Sets"
David Reshef
Harvard-MIT Division of Health Sciences and Technology
CGIS K354 (1737 Cambridge St.)
Wednesday, March 21, 2012, 12:00 pm


Identifying interesting relationships between pairs of variables in large data sets is increasingly important. One way of doing so is to search such data sets for pairs of variables that are closely associated: calculate some measure of dependence for each pair, rank the pairs by their scores, and examine the top-scoring pairs. We outline two heuristic properties, generality and equitability, that a dependence statistic should have for such a strategy to be effective. We present a measure of dependence for two-variable relationships, the maximal information coefficient (MIC), that has these properties. MIC captures a wide range of associations, both functional and non-functional (generality), and assigns similar scores to relationships with similar noise levels, regardless of relationship type (equitability). Finally, we show that MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships.
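The screening strategy in the abstract (score every pair, rank the pairs, inspect the top) can be sketched in Python. This is a minimal sketch under stated assumptions, not the speaker's MINE software: the function names `rank_bins`, `mutual_information`, `approx_mic`, and `rank_pairs` are hypothetical, and the grid-size limit `n ** 0.6` is an assumed default. A MIC-style score takes the mutual information of the data binned on a kx-by-ky grid, divides it by log(min(kx, ky)), and maximizes over grid sizes. The real statistic also searches over where the grid lines go; this sketch only tries equal-frequency grids, so it can only underestimate that maximum.

```python
import math
import random
from itertools import combinations

# Hypothetical helper names; this is not the MINE software's API.


def rank_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank (ties broken by position)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(a_bins, b_bins):
    """Plug-in mutual information, in nats, between two sequences of bin labels."""
    n = len(a_bins)
    joint, count_a, count_b = {}, {}, {}
    for a, b in zip(a_bins, b_bins):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        count_a[a] = count_a.get(a, 0) + 1
        count_b[b] = count_b.get(b, 0) + 1
    # Sum over cells of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (count_a[a] * count_b[b]))
               for (a, b), c in joint.items())


def approx_mic(xs, ys, alpha=0.6):
    """Max of MI / log(min(kx, ky)) over kx-by-ky grids with kx * ky < n ** alpha.

    Simplification (assumption): only equal-frequency grids are tried.
    """
    n = len(xs)
    bound = n ** alpha  # assumed grid-size limit B(n) = n ** 0.6
    best = 0.0
    for kx in range(2, int(bound) + 1):
        if kx * 2 >= bound:  # no ky >= 2 fits under the bound any more
            break
        x_bins = rank_bins(xs, kx)
        for ky in range(2, int(bound) + 1):
            if kx * ky >= bound:
                break
            mi = mutual_information(x_bins, rank_bins(ys, ky))
            best = max(best, mi / math.log(min(kx, ky)))
    return best


def rank_pairs(columns, score=approx_mic, top=5):
    """Score every pair of named columns and return the `top` highest as (score, a, b)."""
    scored = [(score(columns[a], columns[b]), a, b)
              for a, b in combinations(sorted(columns), 2)]
    scored.sort(reverse=True)
    return scored[:top]


# Usage on synthetic data: a noisy parabola in x, plus an unrelated noise column.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(200)]
columns = {
    "x": x,
    "parabola": [v * v + rng.gauss(0, 0.05) for v in x],
    "noise": [rng.gauss(0, 1) for _ in range(200)],
}
for s, a, b in rank_pairs(columns, top=3):
    print(f"{a} ~ {b}: {s:.2f}")
```

The `parabola ~ x` pair should rank first even though it is not monotone, which is the kind of non-linear association the generality property is about. The sketch does not attempt to demonstrate equitability.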

Posted by Konstantin Kashin at 12:35 AM