7 March 2011
A well-known social scientist once confessed to me that, after decades of doing social research, he still couldn't remember the difference between Type I and Type II errors. Since I suspect that many others have the same problem, I thought I would pass along a mnemonic I learned from a statistics professor. Recall that a Type I error occurs when the null hypothesis is rejected although it is in fact true, while a Type II error occurs when the null hypothesis is not rejected although it is actually false. Many people, of course, find this distinction difficult to remember.
So here's the mnemonic: first, a Type I error can be viewed as a "false alarm" and a Type II error as a "missed detection"; second, note that the phrase "false alarm" has fewer letters than "missed detection," and analogously the numeral 1 (for Type I error) is smaller than 2 (for Type II error). Since learning this mnemonic, I have not forgotten the difference between Type I and Type II errors!
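For readers who like to see the two error types as numbers, here is a minimal simulation sketch. It is only an illustration under assumptions of my own choosing, none of which come from the discussion above: a two-sided z-test of a zero mean with known standard deviation 1, samples of size 25, a 5% significance level, and a true mean of 0.5 for the case where the null hypothesis is false. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def rejects_null(sample, alpha=0.05, sigma=1.0):
    """Two-sided z-test of H0: mean = 0, assuming sigma is known."""
    n = len(sample)
    z = fmean(sample) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def rejection_rate(true_mean, n=25, trials=5000, seed=0):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        rejects_null([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials


# H0 is true (mean really is 0): every rejection is a false alarm (Type I).
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false (mean is 0.5): every non-rejection is a missed detection (Type II).
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I rate (false alarms):       {type_1_rate:.3f}")
print(f"Type II rate (missed detections): {type_2_rate:.3f}")
```

Under these assumptions the false-alarm rate should land close to the chosen 5% level, while the missed-detection rate depends on the sample size and on how far the true mean is from zero.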
We hope you can join us at the Applied Statistics Workshop this Wednesday, March 9th, when we are excited to have Don Rubin, the John L. Loeb Professor of Statistics here at Harvard University, who will be presenting recent work on job-training programs. You will find an abstract below. As usual, we will begin with a light lunch at 12 noon, with the presentation starting at 12:15pm and wrapping up by 1:30pm.
“Are Job-Training Programs Effective?”
Don Rubin, John L. Loeb Professor of Statistics, Harvard University
Wednesday, March 9th 12:00pm - 1:30pm
CGIS Knafel K354 (1737 Cambridge St)
In recent years, job-training programs have become more important in many developed countries with rising unemployment. It is widely accepted that the best way to evaluate such programs is to conduct randomized experiments. With these, among a group of people who indicate that they want job-training, some are randomly assigned to be offered the training and the others are denied such offers, at least initially. Then, according to a well-defined protocol, outcomes, such as employment statuses or wages for those who are employed, are measured for those who were offered the training and compared to the same outcomes for those who were not offered the training. Despite the high cost of these experiments, their results can be difficult to interpret because of inevitable complications when doing experiments with humans. In particular, some people do not comply with their assigned treatment, others drop out of the experiment before outcomes can be measured, and others who stay in the experiment are not employed, and thus their wages are not cleanly defined. Statistical analyses of such data can lead to important policy decisions, and yet the analyses typically deal with only one or two of these complications, which may obfuscate subtle effects. An analysis that simultaneously deals with all three complications generally provides more accurate conclusions, which may affect policy decisions. A specific example will be used to illustrate essential ideas that need to be considered when examining such data. Mathematical details will not be pursued.
The following links point to a set of tutorials on many aspects of statistical data mining, including the foundations of probability, the foundations of statistical data analysis, and most of the classic machine learning and data mining algorithms.
These include classification algorithms such as decision trees, neural nets, Bayesian classifiers, Support Vector Machines and case-based (aka non-parametric) learning. They include regression algorithms such as multivariate polynomial regression, MARS, Locally Weighted Regression, GMDH and neural nets. And they include other data mining operations such as clustering (mixture models, k-means and hierarchical), Bayesian networks and Reinforcement Learning.
There is a little modesty in the description here. The slides that I have looked at do a great job motivating the methods using intuition, which is often hugely lacking.