June 8, 2009
The idea of the Hawthorne effect is that individuals may change their behavior because they are being studied, in addition to any real effects of the intervention. Steven Levitt and John List have revisited the illumination experiments at the Hawthorne plant that gave the effect its name, and argue that many of the original conclusions do not hold up to scrutiny. There's an Economist article on the paper here, but its subtitle "Being watched may not affect behavior, after all" is misleading: even if the earlier research was sloppy by today's standards, its contribution was to point out the possibility of such effects. A better subtitle would have commended replication as an important scientific method.
Levitt and List (2009) "Was there Really a Hawthorne Effect at the Hawthorne Plant? An Analysis of the Original Illumination Experiments" NBER Working Paper #15016, http://www.nber.org/papers/w15016.pdf
The Economist (June 4, 2009) "Light work: Questioning the Hawthorne effect", http://www.economist.com/finance/displayStory.cfm?story_id=13788427
February 5, 2009
A new NBER paper by Angus Deaton takes on the trendiness of randomized trials, instrumental variables and natural experiments in development economics. One of the main points: well-designed experiments are most useful when they help uncover general mechanisms (i.e. inform theory) and can support real-life policy-making outside their narrow context. A good if lengthy read.
Deaton, A (2009) "Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development" NBER Working Paper 14690. http://papers.nber.org/papers/w14690
There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues, or of development agencies to learn from their own experience. In response, there is movement in development economics towards the use of randomized controlled trials (RCTs) to accumulate credible knowledge of what works, without over-reliance on questionable theory or statistical methods. When RCTs are not possible, this movement advocates quasi-randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are useful for policy or understanding: two key issues are the misunderstanding of exogeneity, and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as quasi-randomization, notwithstanding rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority. I illustrate using prominent experiments in development. As with IV methods, RCT-based evaluation of projects is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and towards the evaluation of theoretical mechanisms.
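To make the heterogeneity point concrete, here is a toy simulation in Python (my own illustration, not from Deaton's paper; the two subgroups and their effect sizes are made up). The randomized difference in means recovers the average effect of about 0.5, even though the treatment helps one group and hurts the other, so the average alone would be a poor guide for a policy aimed at either group.

```python
import random
from statistics import mean

def simulate_rct(n=20_000, seed=1):
    """Toy RCT with two hypothetical subgroups whose true effects differ in sign."""
    rng = random.Random(seed)
    true_effect = {"A": 2.0, "B": -1.0}  # made-up subgroup effects
    rows = []
    for _ in range(n):
        group = rng.choice("AB")
        treated = rng.random() < 0.5  # coin-flip assignment
        outcome = rng.gauss(0.0, 1.0) + (true_effect[group] if treated else 0.0)
        rows.append((group, treated, outcome))
    return rows

def diff_in_means(rows):
    """Mean outcome of treated units minus mean outcome of controls."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

rows = simulate_rct()
average_effect = diff_in_means(rows)
by_group = {g: diff_in_means([r for r in rows if r[0] == g]) for g in "AB"}
print(f"average effect: {average_effect:.2f}")
print({g: round(v, 2) for g, v in by_group.items()})
```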
January 6, 2009
Today's New York Times has an article about the increasing popularity of R and what it means for commercial packages. See here for ``Data Analysts Captivated by Power of R''.
November 18, 2008
In today's paper, the NYT reports on an interesting debate between two groups of researchers regarding studies on unconscious racial bias (``In Bias Test, Shades of Gray''). The discussion centers on the usefulness of an online test, the Implicit Association Test, which measures how quickly respondents associate ``good'' or ``bad'' words with blacks or whites. How useful are such tests? The test does seem crude as a metric for racial bias (try it yourself here). But I suspect that such tests have raised awareness and deserve credit for involving a wide audience. Yet despite its timid recommendations and the disclaimers shown with the results, the test could also be misleading: what if you're characterized as racially biased (but are not)? What if you're characterized as unbiased (but are, and should be told)?
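What does "how quickly" mean in practice? A common way to score the IAT compares response latencies between the two pairing blocks and scales the gap by the spread of all latencies. The sketch below is a simplified version of that idea (it ignores error trials and other steps of the published scoring procedures), with made-up latencies; the 10,000 ms cutoff and which block counts as "compatible" are assumptions of the example. A single number like this is exactly where the misclassification worries above come in.

```python
from statistics import mean, stdev

def simplified_d_score(compatible_ms, incompatible_ms, max_ms=10_000):
    """Latency gap between the two blocks divided by the SD of all retained trials."""
    comp = [t for t in compatible_ms if t < max_ms]  # drop very slow trials (assumed cutoff)
    incomp = [t for t in incompatible_ms if t < max_ms]
    pooled_sd = stdev(comp + incomp)
    return (mean(incomp) - mean(comp)) / pooled_sd

# Hypothetical latencies in milliseconds
compatible = [600, 650, 700, 620]
incompatible = [800, 850, 780, 900]
d = simplified_d_score(compatible, incompatible)
print(round(d, 2))  # 1.71
```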
November 6, 2008
It seems that people who have a ``lifetime history of candy cigarette use'' may be more likely to have ever smoked (Klein et al). Some countries like Canada, the UK, Ireland, Norway, Finland and Australia apparently believe that there is a causal link and already ban this type of candy. I think there are good reasons to believe candy cigarettes may have an influence on children and there are qualitative studies that suggest mechanisms like attitudes towards smoking. They certainly look like the real deal and might even build brand recognition. Check out this sample of German candies (DKFZ: 41). The middle one says "filter tipped", "king size" and features camels. Makes you wonder how they present sugar content in place of tar.
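An association like the one Klein et al report is typically summarized as an odds ratio from a two-by-two table. A minimal Python sketch with hypothetical counts (not the study's data) shows how such a number and its Wald confidence interval are computed; a ratio well above 1 with a tight interval still says nothing about causality, which is the problem discussed below.

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: candy users, ever smoked       b: candy users, never smoked
    c: non-users, ever smoked         d: non-users, never smoked
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, (lower, upper)

# Hypothetical counts, not Klein et al's data
ratio, (lower, upper) = odds_ratio(300, 200, 150, 250)
print(f"OR = {ratio:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```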
Anyway, this makes me wonder what standards we must meet to make a plausible case for regulation. At least in the US there are strong barriers to regulating anything that may be construed as limiting ``commercial speech''. Sure enough, institutions like the Federal Trade Commission have historically had a hard time getting such policies past the courts, and providing sufficient evidence of causal links is a critical factor. For example, in the case of regulating TV ads for high-sugar foods aimed at children, establishing causality was one of the main barriers to implementation (the others were political and practical, as Mello et al write).
What are the hopes (fears) for a ban on candy cigarettes? To me it seems difficult to credibly argue that candy cigarette use leads to smoking later in life. If there is a causal link it will be hard to establish empirically (random draw of candy sticks, anyone?) and even harder to meet the high legal standard. I wonder how courts weigh quantitative versus qualitative arguments on such issues. Or will we only have regulations for issues where we can identify causal relations?
PS: Wikipedia says that North Dakota banned candy cigarettes from 1953 to 1967. Maybe we will see an empirical evaluation soon?
Klein et al (2007) "History of childhood candy cigarette use is associated with tobacco smoking by adults" Preventive Medicine 45(1): 26-30
Mello, M et al (2006) "Obesity -- The New Frontier of Public Health Law" N Engl J Med 354(24): 2601-2610
DKFZ (2008) "Rauchende Kinder und Jugendliche in Deutschland - leichter Einstieg, schwerer Ausstieg" [Smoking children and adolescents in Germany: easy to start, hard to quit. In German only; lists the countries that ban candy cigarettes. The candy cigarettes picture above appears on page 41.]
October 23, 2008
Students here are often interested in how to efficiently collect information from the web. Here's a basic tool: iMacros is a plugin for the Firefox browser that lets you create macros to automate tasks or collect information. It exploits the fact that all elements in HTML pages can be identified and hence targeted. For example, iMacros can find a form field by its ID and fill it with a value of your choice, or click a specified button for you. Two nice features are that you can record your own macros without scripting, and that you can use the plugin to collect text information off the web. The capabilities are not what you would get from a customized Python script, but it's easy to use and edit, and gets the basics done.
(The basic plugin is free but they also sell other editions with more capabilities.)
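iMacros records clicks rather than code, but the underlying idea, locating page elements by their IDs, is easy to see in a few lines of Python using only the standard library's `html.parser`. This is a minimal sketch on a hypothetical page; it only lists the form fields and does not fetch or submit anything.

```python
from html.parser import HTMLParser

class FormFieldFinder(HTMLParser):
    """Collect the id and type of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
            self.fields.append((attrs.get("id"), attrs.get("type", "text")))

# Hypothetical page with a search form
page = """
<form action="/search">
  <input id="query" type="text" name="q">
  <input id="go" type="submit" value="Search">
</form>
"""
finder = FormFieldFinder()
finder.feed(page)
print(finder.fields)  # [('query', 'text'), ('go', 'submit')]
```

A real scraper would first download the page (for example with `urllib.request`) and then look up the IDs it needs in `finder.fields`.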
October 7, 2008
With many of my friends preparing for the annual job market song and dance, one question they will soon face is what salary expectations are appropriate for which position and institution.
It seems hard to know. Fortunately (and somewhat incredibly), the Department of Labor's Foreign Labor Certification Data Center not only collects employer petitions for H-1B visas for foreign professionals, but also posts them online. The data goes back to 2001; information for other visa types is sometimes available for earlier years. Overall this seems like a great source for studies in labor economics or of the effects of visa restrictions. (Let us know if you use it!)
For example, searching for "assistant professor economics harvard" will reveal two visa petitions from the university, with a proposed salary of $115,000 in 2005. Stanford proposed to pay $120,000 in early 2006. The data is not limited to academic jobs, of course. You can also see that Morgan Stanley proposed to pay $85,000 for an analyst in New York in 2006, or that a taxi company in Maryland proposed $11.41 per hour.
Naturally the data is limited since it only covers a specific group of job applicants. Maybe they'll take a lower salary in exchange for help with the visa, or they get paid more to leave their home countries. But the relative scales across institutions could be similar and it's better than no idea at all. Good luck on your job hunts and negotiations!
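To put such numbers side by side, a small Python sketch like the one below converts hourly wages to annual figures and expresses everything relative to one baseline. The 2,080-hour full-time year (40 hours times 52 weeks) is my assumption, and the figures come from different years without any inflation adjustment.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: full-time year of 2,080 hours

def annualize(hourly_wage, hours=HOURS_PER_YEAR):
    """Convert an hourly wage into an approximate annual salary."""
    return hourly_wage * hours

# Proposed salaries from the petitions mentioned above (not inflation-adjusted)
proposed = {
    "Harvard, asst. prof. economics (2005)": 115_000,
    "Stanford, asst. prof. economics (2006)": 120_000,
    "Morgan Stanley, analyst, New York (2006)": 85_000,
    "Maryland taxi company (hourly wage annualized)": annualize(11.41),
}
baseline = proposed["Harvard, asst. prof. economics (2005)"]
for job, salary in proposed.items():
    print(f"{job}: ${salary:,.0f} ({salary / baseline:.2f} x baseline)")
```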
September 25, 2008
The NBER just posted a new working paper by Steven Levitt and John List, ``Field Experiments in Economics: The Past, The Present, and The Future.'' I have only had a first glance, but the paper looks like an easy-to-read history of field experiments in economics with a (short) summary of their limitations. Levitt and List also suggest that partnerships with private institutions could be the future of this field. It seems like a natural conclusion. Collaborating with the private sector should create more opportunities for good research, and the money and infrastructure will be attractive to researchers. And anyway, what other sector is left to be conquered? But maybe such partnerships are only useful for certain areas of research (Levitt and List suggest the setting could be a useful laboratory for the field of industrial organization). And firms, like any institution, must have an interest in participating. This might be fine for learning about fundamental economic behavior, but will we see more declarations of interest on experiments related to policy?
Levitt, S and List, J (2008) ``Field Experiments in Economics: The Past, The Present, and The Future.'' NBER Working Paper 14356, http://papers.nber.org/papers/w14356
This study presents an overview of modern field experiments and their usage in economics. Our discussion focuses on three distinct periods of field experimentation that have influenced the economics literature. The first might well be thought of as the dawn of "field" experimentation: the work of Neyman and Fisher, who laid the experimental foundation in the 1920s and 1930s by conceptualizing randomization as an instrument to achieve identification via experimentation with agricultural plots. The second, the large-scale social experiments conducted by government agencies in the mid-twentieth century, moved the exploration from plots of land to groups of individuals. More recently, the nature and range of field experiments has expanded, with a diverse set of controlled experiments being completed outside of the typical laboratory environment. With this growth, the number and types of questions that can be explored using field experiments has grown tremendously. After discussing these three distinct phases, we speculate on the future of field experimental methods, a future that we envision including a strong collaborative effort with outside parties, most importantly private entities.
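The Fisher idea mentioned in the abstract, that the randomization itself provides the basis for inference, can be illustrated with an exact permutation test. Under the sharp null hypothesis of no effect on any plot, every reassignment of the treatment labels was equally likely, so the p-value is the share of reassignments that produce a difference at least as large as the observed one. The yields for eight plots below are hypothetical, and the test is one-sided.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, control):
    """One-sided exact randomization test of the sharp null of no effect on any unit."""
    pooled = treated + control
    n_treated = len(treated)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    # Enumerate every way the treatment could have been assigned
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        t = [pooled[i] for i in chosen]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(t) - mean(c) >= observed:
            extreme += 1
        total += 1
    return observed, extreme / total

# Hypothetical yields from eight plots, four randomly assigned to a new fertilizer
fertilized = [12, 14, 15, 13]
unfertilized = [10, 11, 9, 12]
observed, p = permutation_p_value(fertilized, unfertilized)
print(observed, round(p, 3))  # 3.0 0.029
```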
September 2, 2008
The British Medical Journal just published a great piece by Michael Law* and co-authors on the (in)effectiveness of direct-to-consumer advertising (DTCA) for pharmaceuticals. This issue continues to be politically controversial and expensive for companies, and good studies are rare. Mike makes use of the linguistic divide in his home country Canada to evaluate the effectiveness of the ads. Canadian TV stations are not allowed to broadcast pharma ads. French-speaking Canadians effectively have to go without, but English-speaking Canada gets to watch ads for pharmaceuticals on US TV stations. The results suggest that for the three drugs under study, the effects of DTCA may be very small and short-lived.
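The logic of using French-speaking Canada as a comparison group can be sketched as a simple two-period difference-in-differences. This is a simplification of the study's controlled longitudinal design, and the prescription rates below are invented for illustration: the change in English Canada minus the change in French Canada nets out trends common to both.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Change in the exposed group minus change in the comparison group."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical prescriptions per 1,000 people, before and after an ad campaign
english_pre, english_post = 10.0, 10.6  # exposed to US broadcasts
french_pre, french_post = 9.0, 9.4      # largely unexposed
effect = diff_in_diff(english_pre, english_post, french_pre, french_post)
print(f"estimated ad effect: {effect:.1f} per 1,000")  # estimated ad effect: 0.2 per 1,000
```

A naive before/after comparison in English Canada alone would attribute the full 0.6 increase to the ads.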
An interesting fallout of this work is a wave of media attention to causal inference and identifying counterfactuals. For example, the WSJ writes:
[...] the new study will draw some attention because it is among the first to compare the behavior of people exposed to drug ads with people who weren't.
And the New Scientist says:
However, consumer advertising is usually accompanied by other marketing efforts directly to doctors, making it difficult to tease out the effect of the ads alone.
See here for a longer list of articles at Google News.
I think it's great that the study creates so much interest (meaning it's relevant in real life) and that the media gets interested in research design. I'm curious to see the wider repercussions on both issues.
Law, Michael, Majumdar, Sumit and Soumerai, Stephen (2008) "Effect of illicit direct to consumer advertising on use of etanercept, mometasone, and tegaserod in Canada: controlled longitudinal study" BMJ 2008;337:a1055
* Disclosure: Mike is a recent graduate of the PhD in Health Policy, and a classmate and friend of mine.
May 15, 2008
I just finished reading an interesting paper on placebo effects in drug trials by Anup Malani. Malani noticed that participants in high-probability trials know that they are more likely to get active treatment (because of informed consent prior to the trial). They have higher expectations and hence should have larger placebo effects than patients in low-probability trials. Malani compares outcomes across trials with different assignment probabilities and finds evidence for placebo effects. A related finding is that the control group in high-probability trials reports more side effects.
The paper discusses some potential implications of placebo effects, e.g. that patients who are optimistic about the outcome might change their behavior and hence get better even without the active drug. It makes me wonder how this might translate into non-medical settings and whether there are studies of placebo effects in the social sciences. Also, if placebo drugs can improve health outcomes, maybe ineffective social programs would still work as long as participants don’t know whether the program works or doesn’t? Maybe this is the role of politics. But what about the side-effects?
Malani, A (2006) “Identifying Placebo Effects with Data from Clinical Trials” Journal of Political Economy, Vol. 114, pp. 236-256. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=901838
A medical treatment is said to have placebo effects if patients who are optimistic about the treatment respond better to the treatment. This paper proposes a simple test for placebo effects. Instead of comparing the treatment and control arms of a single trial, one should compare the treatment arms of two trials with different probabilities of assignment to treatment. If there are placebo effects, patients in the higher-probability trial will experience better outcomes simply because they believe that there is a greater chance of receiving treatment. This paper finds evidence of placebo effects in trials of antiulcer and cholesterol-lowering drugs.
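The test in the abstract is easy to mimic in a simulation. In the hypothetical Python model below (my own toy setup, not Malani's estimation), outcomes in the treatment arm rise with the announced probability of treatment. Comparing the treatment arms of a 50% and an 80% trial then isolates the placebo component, here 2.0 times 0.3 = 0.6, because the drug effect is the same in both arms.

```python
import random
from statistics import mean

def treatment_arm(p_assign, n=5_000, drug_effect=1.0, placebo_per_prob=2.0, seed=0):
    """Simulated outcomes for treated patients in a trial with P(treatment) = p_assign.

    Hypothetical model: the placebo component grows linearly with the
    announced probability of treatment, because expectations rise with it.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + drug_effect + placebo_per_prob * p_assign for _ in range(n)]

low = treatment_arm(0.5, seed=1)
high = treatment_arm(0.8, seed=2)
placebo_gap = mean(high) - mean(low)
print(f"treatment-arm gap: {placebo_gap:.2f}")  # close to 2.0 * (0.8 - 0.5) = 0.6
```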
May 1, 2008
James Heckman has a new NBER working paper, ``Econometric Causality’’, which some of you might find interesting. To give you a flavor, Heckman writes:
``Unlike the Neyman–Rubin model, these [selection] models do not start with the experiment as an ideal but they start with well-posed, clearly articulated models for outcomes and treatment choice where the unobservables that underlie the selection and evaluation problem are made explicit. The hypothetical manipulations define the causal parameters of the model. Randomization is a metaphor and not an ideal or “gold standard".’’ (page 37)
Heckman, J (2008) ``Econometric Causality’’ NBER working paper #13934. http://papers.nber.org/papers/W13934
Abstract: This paper presents the econometric approach to causal modeling. It is motivated by policy problems. New causal parameters are defined and identified to address specific policy problems. Economists embrace a scientific approach to causality and model the preferences and choices of agents to infer subjective (agent) evaluations as well as objective outcomes. Anticipated and realized subjective and objective outcomes are distinguished. Models for simultaneous causality are developed. The paper contrasts the Neyman-Rubin model of causality with the econometric approach.
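One way to see what it means to make the unobservables behind selection explicit is the classic Roy model, in which people choose treatment based on their own (unobserved) gains. The Python simulation below uses made-up parameters and is not an example from Heckman's paper: the average treatment effect is 1, but because the people who select into treatment are those who gain from it, the effect on the treated is larger and the naive treated-versus-untreated comparison is badly off.

```python
import random
from statistics import mean

def roy_model(n=50_000, seed=42):
    """Simulate self-selection on unobserved gains (hypothetical parameters)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)        # potential outcome without treatment
        y1 = 1.0 + rng.gauss(0.0, 1.0)  # potential outcome with treatment
        chooses = y1 > y0               # choice depends on the (unobserved) gain
        people.append((y0, y1, chooses))
    ate = mean(y1 - y0 for y0, y1, _ in people)
    att = mean(y1 - y0 for y0, y1, d in people if d)
    naive = (mean(y1 for _, y1, d in people if d)
             - mean(y0 for y0, _, d in people if not d))
    return ate, att, naive

ate, att, naive = roy_model()
print(f"ATE {ate:.2f}, ATT {att:.2f}, naive comparison {naive:.2f}")
```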
April 16, 2008
The Journal of the American Medical Association published a piece today on ghostwriting of medical research. Thanks to the Vioxx lawsuits, the authors say that they found documents ``describing Merck employees working either independently or in collaboration with medical publishing companies to prepare manuscripts and subsequently recruiting external, academically affiliated investigators to be authors. Recruited authors were frequently placed in the first and second positions of the authorship list.’’ One of the exhibits uses a placeholder ``External author?’’ for the expert to be named. Obviously the idea that a pharmaceutical company is pre-writing clinical studies is as controversial as doctors possibly signing off on them without really being involved. A NYT article has some comments, and Merck has released a press statement.
April 3, 2008
The Economist recently had an interesting article on anti-terrorist spending ("Feel safer now?", March 6 print edition). The piece reports on research by Todd Sandler and Daniel Arce on the costs and benefits of different responses to terrorism (paper here). Terrorism creates a lot of anxiety but (so the authors say) actually costs few lives, and many counter-measures might be ineffective, e.g. if terrorists simply shift attacks to easier targets in response. Sandler and Arce suggest most of their spending scenarios are not cost-effective, but that political cooperation could be worthwhile.
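To see why displacement matters so much for such calculations, here is a stylized benefit-cost ratio in Python. All numbers are hypothetical and the functional form is my simplification, not Sandler and Arce's model: a counter-measure only gets credit for attacks that are actually averted rather than shifted elsewhere, so a displacement share of 50% halves the ratio.

```python
def benefit_cost_ratio(cost, attacks_prevented, loss_per_attack, displacement=0.0):
    """Avoided losses per dollar spent when a share of 'prevented' attacks is merely displaced.

    displacement: share of prevented attacks that shift to softer targets (0 to 1).
    """
    averted = attacks_prevented * (1.0 - displacement)
    return averted * loss_per_attack / cost

# Hypothetical inputs (billions of dollars), not figures from the paper
for share in (0.0, 0.5, 0.8):
    ratio = benefit_cost_ratio(cost=10.0, attacks_prevented=5, loss_per_attack=3.0, displacement=share)
    print(f"displacement {share:.0%}: benefit-cost ratio {ratio:.2f}")
```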
Not being an expert in this area, I suspect that the counterfactuals involved must be extremely hard to defend given the scope of transnational terrorism. Similarly, the reported bounds are huge and the underlying numbers should be up for debate. For example, while skimming through, I didn't see any accounting for the psychological stress of those not directly involved in an attack (e.g. the general population), nor that of the military personnel and families who implement some of the counter-measures. Any views?
March 26, 2008
A joint project by Andy Eggers and Jens Hainmueller, two long-time contributors to this blog, is the basis of a piece in The Guardian this Monday. Check out the article "How election paid off for postwar Tory MPs" and the paper "MPs For Sale? Estimating Returns to Office in Post-War British Politics". Congrats to Andy and Jens!
February 22, 2008
A major item of interest in applied health economics is the impact of health shocks on household income, investments and consumption. This relation is particularly important in developing countries that don’t have programs like universal health insurance or social insurance like Medicaid. Alas, it’s also a major challenge to establish causal effects and the mechanisms through which the shocks might operate. A main culprit is endogeneity, since health affects wealth and vice versa. As a result there is a huge and truly interdisciplinary literature on the topic, much of it with suspicious identification strategies.
The main struggle is to find plausibly exogenous exposure to health shocks that have real-life relevance. A new paper by Manoj Mohanan takes this challenge seriously: it looks at the effect of health shocks from bus accidents on household consumption, and examines what mechanisms households rely on to smooth consumption. (Full disclosure: Manoj is a classmate of mine, and I really like his work!)
To address the endogeneity problem, the paper focuses on people who have been in bus accidents as recorded by the state-run bus company in Karnataka, India. Clearly, finding a good control group is critical: people who travel on public buses may be different from those who don’t. For starters, they actually took the risk of getting on a bus – if you have ever been on the road in a developing country you’ll know what this means. Manoj’s approach is to select unexposed individuals among travelers on the same bus route, after matching on age, sex and geographic area of residence. Hence, conditional on these factors, the bus accident can be treated as exogenous.
He then compares the two groups on various dimensions. He finds that households reduce educational and festival spending by a large amount, but appear to be able to smooth food and housing consumption. He is unable to find effects on assets or labor supply. The principal coping mechanism is debt accumulation. Overall this suggests that not all is well: debt traps aside, reducing investments in education could be very costly in the long run (on this point see also Chetty and Looney, 2006).
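The matching step can be sketched as exact matching: each exposed household is compared with unexposed households that share the same covariates, and the within-cell differences are averaged. This Python sketch uses a handful of hypothetical households with education spending as the outcome; it is a simplified illustration rather than the paper's estimator, and a bus-route identifier could be added to the matching keys.

```python
from collections import defaultdict
from statistics import mean

KEYS = ("age_group", "sex", "area")  # matching variables named in the post

def matched_difference(exposed, controls, outcome="edu_spend", keys=KEYS):
    """Average gap between each exposed household and unexposed households in the same cell."""
    cells = defaultdict(list)
    for c in controls:
        cells[tuple(c[k] for k in keys)].append(c[outcome])
    gaps = []
    for e in exposed:
        matches = cells.get(tuple(e[k] for k in keys))
        if matches:  # exposed households without a match are dropped
            gaps.append(e[outcome] - mean(matches))
    return mean(gaps)

# Hypothetical households (monthly education spending, in rupees)
exposed = [
    {"age_group": "30-39", "sex": "M", "area": "urban", "edu_spend": 400},
    {"age_group": "40-49", "sex": "F", "area": "rural", "edu_spend": 150},
]
controls = [
    {"age_group": "30-39", "sex": "M", "area": "urban", "edu_spend": 600},
    {"age_group": "30-39", "sex": "M", "area": "urban", "edu_spend": 500},
    {"age_group": "40-49", "sex": "F", "area": "rural", "edu_spend": 250},
    {"age_group": "50-59", "sex": "M", "area": "rural", "edu_spend": 900},
]
print(matched_difference(exposed, controls))
```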
February 2, 2008
This year's Spring Conference of the Harvard Program on Survey Research is on ``New Technologies and Survey Research.'' It will be held on May 9, 2008, 9:00 am to 5:00 pm at IQSS, and is open to the public.
See here for details.
December 3, 2007
You might recall explanations of a gender bias at birth due to simple and sophisticated discrimination, or even infectious diseases like hepatitis B. Last week’s Economist reports that in industrialized countries, the probability of having a boy is slightly higher than that of having a girl. More surprisingly, that extra chance of having a boy has been decreasing.
One newly proposed cause is maternal stress, acute or chronic, and there seems to be evidence that stressed mothers are more likely to give birth to girls. The explanation could be pathological or adaptive: the article suggests that in hard and stressful times, it makes evolutionary sense to have more girls.
I suspect that there are many omitted variables related to stress and other health behaviors. Apparently some studies find similar effects of stress using variation from natural disasters or terrorist attacks. Still, this won’t explain a pro-boy bias in developing countries (since stress should generally be higher there, we would expect more girls to be born). But it’s an interesting aspect of a growing literature that takes psychological and environmental stress seriously.
November 6, 2007
This semester I am taking a hands-on (gasp!) class on the ``Design and Analysis of Sample Surveys’’ with Alan Zaslavsky. The design part of the course includes the basics of writing surveys, and the background reading includes a text by Fowler* which might be interesting to applied-minded readers. Here are some thoughts on the book; I’d be curious to hear about alternative views or materials.
Fowler provides a quick and informative read on how to ask about objective and subjective states, and how to pre-test and validate survey questions and answer categories. The book also discusses the design implications of different survey modes. Most items are particularly informative to novices in this area, and often they provoke a ``d’oh, obviously’’ reaction. But Fowler does a good job of alerting the reader to problematic examples that might otherwise have slipped by. He also offers advice on how to fix problems, and provides practical tips for implementing pre-tests, which he strongly advocates. Each chapter ends with a useful summary of the key points, which can serve as a reference to items in the chapter.
The book has a few shortcomings, though, notably its somewhat confusing organization within chapters and its lengthy wording. Since some issues cut across chapters, the index would need to list more than 50 keywords to be useful as a reference. And, having been published in 1995, the book provides no background on web-based or email surveys.
I found that the book offers basic insights and is a useful introduction. It certainly draws attention to issues in survey design that users should know about. Designers of surveys might find the treatment too basic and general. For a more detailed treatment, Krosnick and Fabrigar’s forthcoming ``Handbook of Questionnaire Design’’ looks promising (see here for a post on its presentation at IQSS in 2006).
October 17, 2007
Continuing on the topic of self-reported health data and how to correct for reporting (and other) biases, here is an interesting paper on height and weight in the US. These two measures have received a lot of interest in recent years, not least as components of the body-mass index (BMI), which is used to estimate the prevalence of obesity. BMI itself is not a great measure (more on that another day), but at least it’s relatively easy to collect via telephone and in-person interviews. Of course some people make mistakes when reporting their own vital measures, and some might do so systematically: a height of 6 feet sounds like a good height to have even to me, and I tend to think in the metric system!
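For reference, BMI is weight in kilograms divided by the square of height in meters, with 30 as the usual cutoff for obesity. The short Python example below uses a hypothetical person to show how modest misreporting in both directions, a few kilograms less and a few centimeters more, can move someone below the cutoff.

```python
def bmi(weight_kg, height_m):
    """Body-mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2

def is_obese(weight_kg, height_m, cutoff=30.0):
    return bmi(weight_kg, height_m) >= cutoff

measured = (93.0, 1.75)  # hypothetical examination values
reported = (90.0, 1.78)  # same person: 3 kg lighter, 3 cm taller on the phone
print(round(bmi(*measured), 1), is_obese(*measured))  # 30.4 True
print(round(bmi(*reported), 1), is_obese(*reported))  # 28.4 False
```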
Anyway, the paper by Ezzati et al examines the issue of systematic misreporting. They note that existing smaller-scale studies on this issue might in fact underestimate the bias because of their design. People might limit their misreporting if they are measured before or after reporting their vitals, which is a challenge for validation studies. And participation might differ systematically between the interview modes of validation studies and general health surveys (e.g. in-person versus telephone interviews), so that the studies are not directly comparable to population-level surveys.
The idea of the paper is to use two nationally representative surveys to compare three different kinds of measurement for height and weight, by age group and gender. The first survey is the National Health and Nutrition Examination Survey (NHANES), which collects self-reported information through in-person interviews and also through medical examination. The second is the Behavioral Risk Factor Surveillance System (BRFSS), an annual cross-sectional telephone survey that is representative at the state level and features widely in policy discussions.
The comparisons between the surveys might confirm your priors on misreporting. On average, women under-report their weight and men under 65 tend to over-report their height. The authors find that state-level obesity measures based on the BRFSS are too low: they recalculate that a number of states in fact had obesity prevalences above 30% in 2000. Of course this is not a perfectly clean assessment, because NHANES participants might have anticipated the clinical examination a few weeks after the in-person interview. But at the least this study is a good reminder that people do misreport systematically, whatever the reason, and that analysts should treat self-reported BMI with care.
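A stylized version of such a correction: estimate the average gap between measured and self-reported values in a survey that has both (as NHANES does), then shift the self-reports in the telephone survey by that gap before computing prevalence. The Python sketch below applies a single made-up gap to five hypothetical respondents; the paper works by age group and sex, and its actual method may differ from this simple shift.

```python
def obesity_prevalence(people, weight_gap=0.0, height_gap=0.0, cutoff=30.0):
    """Share of respondents with BMI >= cutoff after shifting self-reports.

    weight_gap / height_gap: average measured-minus-reported differences,
    e.g. estimated from a survey that has both (values here are hypothetical).
    """
    obese = [
        (w + weight_gap) / (h + height_gap) ** 2 >= cutoff
        for w, h in people
    ]
    return sum(obese) / len(obese)

# Hypothetical self-reports: (weight in kg, height in m)
self_reports = [(85, 1.70), (88, 1.72), (70, 1.80), (95, 1.80), (60, 1.65)]
print(obesity_prevalence(self_reports))                                    # 0.0
print(obesity_prevalence(self_reports, weight_gap=2.0, height_gap=-0.01))  # 0.6
```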
September 28, 2007
I have earlier written about using anchoring vignettes to correct for biases in self-reported measures such as health outcomes (here and here). One issue with self-reports is that respondents may interpret identical questions in different ways. The idea of vignettes is to use controlled scenarios to measure this bias and adjust the self-reports accordingly, so that they are informative about actual health status.
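A simple version of the vignette correction is the nonparametric recode described by King et al (2004): instead of using the raw answer, place the respondent's self-assessment relative to his or her own ratings of the vignettes. The Python sketch below assumes the vignettes are rated in a strict order and skips the handling of ties and inconsistent orderings; the ratings in the example are hypothetical.

```python
def vignette_position(self_rating, vignette_ratings):
    """Nonparametric anchoring-vignette recode in the spirit of King et al (2004).

    vignette_ratings: the respondent's own ratings of J vignettes, in the order
    the researcher expects. Assumes these ratings are strictly increasing
    (no ties or order violations); returns C in 1..2J+1.
    """
    c = 1
    for z in vignette_ratings:
        if self_rating < z:
            return c      # below this vignette
        if self_rating == z:
            return c + 1  # tied with this vignette
        c += 2            # above this vignette, move on
    return c              # above all vignettes

# Two hypothetical respondents give the same raw answer (3 on a 1-5 scale)
# but use the scale differently when rating the same two vignettes.
print(vignette_position(3, [2, 4]))  # 3: between the two vignettes
print(vignette_position(3, [3, 5]))  # 2: tied with the first vignette
```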
An interesting application of this method is a paper by D'Uva et al (2006 working paper here), who use vignettes from the World Health Surveys to identify and correct reporting heterogeneities in Indonesia, China and India. Their objective is to establish whether the reporting differences affect measures of within-country inequality in several health domains (mobility, self-care etc). They find evidence for reporting heterogeneity but also suggest that the bias is not large in their data.
The paper also discusses in more detail two assumptions underlying the vignette method, ``response consistency'' and ``vignette equivalence'' (also discussed in King et al 2004).
``Response consistency'' requires that respondents assess their own health in the same way that they assess other people's health (i.e. the vignette scenarios). This may fail if there is strategic reporting, for example when one's reported health status could provide access to entitlement programs for which other people's health status is irrelevant. ``Vignette equivalence'' essentially requires that the scenarios in the vignettes are perceived similarly across respondents; no systematic differences are allowed. The authors suggest that a failure of this latter assumption may underlie findings with respect to age that are in contrast with other studies. Elderly people might interpret the vignette scenarios differently since they are more likely to have had their own experience with the described health problems.
I am curious whether these assumptions have been tested in detail. This might also stimulate some thinking about what elements of self-reports we want to correct for, and whether the determinants of reporting biases are of interest in their own right.
June 5, 2007
Yesterday, StataCorp announced that Stata 10 will be available from June 25. Apart from a bunch of new routines, a main attraction will be the new graph editor, which might well put an end to major nightmares for users. It also appears that there is now a way to copy and paste results into other applications without losing the formatting. Overall the new version looks great, if you're so inclined.
May 30, 2007
The New York Times has an interesting article today on airlines overbooking flights. Apparently the number of people bumped off flights (voluntarily and involuntarily) has risen over the past years despite efforts to model no-shows.
The article mentions US Airways' team of ``math nerds'' who are trying to figure out how many seats the airline can sell without bumping off too many people. One interesting aspect is that they seem unable to do a great job at predicting the number of no-shows for a given flight, which leads to excessive overbooking. I wonder why that's so hard to get right? With all their historical data, airlines should be able to do a reasonable job. The article's charts show that the number of seats sold on the average flight has increased over the last years, which leads to more bumping. I imagine that this behavior is driven more by profit motives, and that airlines risk overselling more frequently. Unless model accuracy increased alongside, it's pretty much a given that they end up bumping more people.
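For intuition on the modelling problem, here is the textbook version: treat each ticket holder as showing up independently with some probability, and sell as many tickets as possible while keeping the chance of bumping anyone below a target. This Python sketch uses hypothetical numbers and is not any airline's actual model; the independence assumption is exactly what the behavioral responses described below would break.

```python
from math import comb

def prob_bumping(tickets, capacity, show_prob):
    """P(more ticket holders show up than there are seats), assuming independent show-ups."""
    return sum(
        comb(tickets, k) * show_prob ** k * (1 - show_prob) ** (tickets - k)
        for k in range(capacity + 1, tickets + 1)
    )

def max_tickets(capacity, show_prob, max_risk=0.05):
    """Largest number of tickets whose bumping probability stays at or below max_risk."""
    tickets = capacity
    while prob_bumping(tickets + 1, capacity, show_prob) <= max_risk:
        tickets += 1
    return tickets

# Hypothetical 100-seat flight: small errors in the show-up rate change the answer
for p in (0.88, 0.90, 0.92):
    n = max_tickets(capacity=100, show_prob=p)
    print(f"show-up rate {p:.2f}: sell {n} tickets, P(bump) = {prob_bumping(n, 100, p):.3f}")
```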
But the most interesting insight is how people respond to the increased bumping. Staff book fake passengers to prevent headquarters from overbooking flights (apparently Mickey Mouse is a favorite placeholder). Airlines bump more in the morning, because they can move passengers to flights later in the day. Passengers increasingly refuse to be bumped because they anticipate being stuck if they agree to wait. Whatever the reason for the currently poor predictive accuracy, people's responses have to be reckoned with and ought to be part of the model.
May 8, 2007
We have blogged a fair bit about reproducibility standards and data-sharing for replication (see here and here). Some journals have required authors to make datasets and code available for a while now, and these policies are starting to show effects. For example, the American Economic Review has required authors to submit their data since 2004, and this information is now available on its website. For an increasing number of articles since late 2002, the AER provides a basic readme document and files with the variables used; some authors also provide their program code. There's a list of articles with available data here.
The 2006 Report of the Editor suggests that most authors now comply with the data-posting requirements and that only a few exceptions are made. At this point the AER is pretty much alone among the top economics journals in offering this information. I wonder if authors substitute between the AER and other journals. Since the AER is still a very desirable place to publish, maybe this improves the quality of AER submissions if only confident authors submit? At least for now the submission statistics in the editor’s report don't suggest that they are losing authors. Meanwhile hundreds of grad students can rejoice in a wealth of interesting papers to replicate.
April 17, 2007
The Economist and Time Magazine recently published interesting articles on a new type of twins. Apparently some twins are neither identical nor fraternal, but are `semi-identical'. That is, one twin is male and the other `intersex' (both male and female). You can read a short discussion on the biology in the articles, which also note that it’s unknown how common this type of twins is. More to worry about for believers in twin studies (for other problems, see this earlier post).
April 3, 2007
Here is some inspiration on how to present data to a non-expert audience: www.gapminder.org. The goal of that site is ``to make sense of the world by having fun with statistics'', by making publicly available but highly complex data understandable to the general public. Their reasoning is that the best data won’t make any difference unless you can communicate it well to a large audience. And they do a fantastic job at just that.
There are two neat things on this site. First is the Trendalyzer, an interactive tool for visualizing data. The software takes boring statistical tables and juices them up in an interactive fashion. For example you can watch the world income distribution evolve over time, and single out particular regions and countries to get a better sense of what’s driving the trends. It also shows how aggregates can be deceiving within regions and countries. Many of the pre-designed presentations are on human development, but you can pick your own indicators. I saw this in a lecture on income inequalities, and it was a major hit. The software has been acquired by Google which apparently wants to add features and make it freely available.
The second interesting item is a presentation by Hans Rosling, the founder of Gapminder, at the TED 2007 conference (Technology Entertainment Design, which aims to gather inspiring minds). He debunks ``myths about the developing world'' using the Trendalyzer and plenty of personal animation. He does such a great job of engaging his audience that many a workshop presenter could learn from watching him. He’s more like a sports presenter than an academic, jumping up and down in front of the screen and still getting his message across.
March 21, 2007
With the ice melting and the birds chirping, it’s time again to plan the summer. Here are a few worthwhile reasons not to be stuck behind your desk all summer. Maybe these are not the most exotic events and locations, but at least they are ‘productive’ and you won’t feel guilty for being away.
The Michigan Summer Institute in Survey Research Techniques runs several sessions over a total of eight weeks from June 4 to July 27. The courses are mainly about designing, writing and testing surveys, and analyzing survey data. The level of the courses differs but they have some advanced courses on sampling and analysis. Because of a modular setup, it's possible to pick and choose broadly. I've heard good things about this institute, particularly from people who want to collect their own data.
Also in Michigan is the Summer Program in Quantitative Methods of Social Research, which runs two sessions from June 25 to August 17. This program focuses on analytics and also caters to different levels of sophistication. I only know a few people who attended this program, with mixed reviews. Much seems to depend on which courses you actually take: some are great and others so-so.
The University of Chicago hosts this year’s Institute on Computational Economics from July 30 to August 9. The topics are quite advanced and focus on programming approaches to economic problems. This seems to be quite worthwhile, if it's your interest.
Further afield is the Mannheim Empirical Research Summer School from July 8 – 20. This event focuses on analysis of household data but also features sessions on experiment design and behavioral economics. I haven't heard anything about previous years' schools but would be curious to find out.
There are other summer schools that don’t have a strong methods focus. Harvard, LSE and a host of other universities offer a number of courses that might provide a quick dip into some of the substantive topics.
March 6, 2007
It’s been a while since Jens and I summarized some useful tools for research. Since then more productivity tools have appeared that make life easier for researchers. Some of the following might only work for Harvard affiliates, but maybe your outfit offers something similar.
First, Harvard offers a table of contents service. After signing up, you can request the tables of contents of most journals that Harvard Libraries carries. The handy part is a “Find it @ Harvard” button next to each article; clicking it takes you to the article through the library's account so that you have full access. This service also allows you to manage all journal subscriptions through a single account. (It's best to have the service email you the TOC as an attachment, as in-text tables occasionally get cut off. Also, your spam filter might intercept those emails, so check there if you don’t receive anything.)
Second, Harvard provides a new toolbar for the Firefox browser called LibX (see here). It provides quick links to Harvard’s e-tools (citation index, e-resources etc.), lets you search the Hollis catalog and provides a drag&drop field for Google Scholar. If you’re on a journal website without having gone through Harvard libraries, LibX allows you to reload the restricted webpage via Harvard to access the full-text sources. Another nice feature is that LibX embeds cues in webpages. For example, if you have installed the tool and are looking at a book on Amazon, you will notice a little Harvard shield on the page. Clicking it takes you straight to the book’s entry in Hollis. LibX also provides automatic links to print and e-resources for ISBNs, DOIs and other identifiers.
There are other useful tools for Firefox. I recently discovered the ScrapBook add-on, which essentially works like bookmarks but allows you to store only the part of a web page you’re interested in. Simply select the part and store it in your scrapbook. You can then access it offline and also comment on or highlight it. You can sort and import/export items too. Another useful built-in Firefox feature is search keywords, which let you access a search box on any website through a user-defined keyword. For example, you can define ``gs'' as the keyword for the search box on the Google Scholar website. Then entering ``gs'' and a search term in the Firefox location bar takes you straight to the search results for that term. If you use Google Scholar through your library you'll even get full access to the articles straight away.
February 20, 2007
If you’ve seen it or paid some attention to what’s going on in the popular media in the past six months, you will not have missed the movie ``Borat: Cultural Learnings of America for Make Benefit Glorious Nation of Kazakhstan’’ by Sacha Baron Cohen. The movie went from huge hype to packed movie theatres, and is due out on March 6 on DVD. Some described the movie as ``brilliant’’, for others it was 15 minutes of mediocre jokes drawn out into 82 minutes of film.
Whatever you may think, the government of Kazakhstan certainly took issue. They felt that their country was portrayed in a particularly unfair light, and started an image campaign with advertisements in the New York Times and other news media (see here for an article on that matter by the NYT). But what was the movie's actual impact on Kazakhstan’s image? Fifteen minutes on Google Trends are suggestive (or frivolous, as Amy suggested).
Here is the timeline of events from Wikipedia: Borat was first screened at some film festivals from July 2006 onwards. It was officially released at the Toronto Film Festival on September 7, 2006 which started the hype. The movie opened in early November in the US, Canada and most European countries. It was number 1 at the US box office for two weeks and only left the top 10 in mid-December.
Here’s a graph of search terms and their associated search volume from Google Trends until November 2006 (you can get this live here and modify as you please). The blue line is the term ``borat movie’’; the red line is ``kazakhstan’’ and the orange line is ``uzbekistan’’ which will serve as (admittedly imperfect) control country. The news reference volume refers to the number of times each topic appeared in Google News.
As you can see, searches for ``borat movie'' take off in September 2006, which coincides with the official release. They spike in late October before the movie opens at the box office and go down afterwards. The event marked B is the announcement of the movie as picked up by Google News. All as expected, even if the blips before July are a little strange.
Interestingly the search volume for ``uzbekistan’’ follows that of ``kazakhstan’’ quite well before the movie appears in the spotlight in September. From September onwards the volume for ``kazakhstan’’ somewhat tracks the volume for the movie instead. If you were to look at monthly data you would see that the relationship is not as clear but there does seem to be a trend. So maybe the movie generated some interest in the country.
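One rough way to formalize the comparison with Uzbekistan as a control is a difference-in-differences: the before/after change in ``kazakhstan'' searches minus the before/after change in ``uzbekistan'' searches. The sketch below uses made-up index values, not actual Google Trends data, and the estimate only means something under the (strong) assumption that both search terms would have moved in parallel without the movie.

```python
from statistics import mean

def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four lists of search-volume values.

    Valid only under the parallel-trends assumption: absent the movie, searches
    for the treated and control terms would have changed by the same amount."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly index values, NOT actual Google Trends data.
kaz_pre, kaz_post = [1.0, 1.1, 0.9], [2.5, 3.0, 2.8]
uzb_pre, uzb_post = [0.8, 0.9, 0.7], [0.9, 0.8, 1.0]
estimate = diff_in_diff(kaz_pre, kaz_post, uzb_pre, uzb_post)
```

With real data one would also want to check that the two series tracked each other before September 2006, which is roughly what the chart suggests.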
Here’s another chart for September 2006 (from here). The blue and red lines are as before, but now the orange line is for ``kazakstan’’. It turns out that you can write the name correctly with or without the ``h’’. Maybe people who spell it for the first time would use this version. This search term appears in the search volume just before the movie hits the theaters.
Google Trends gives another hint. If you look at the cities of origin for the searches, you will notice a mix of US/European countries and cities in the second half of 2006. And ``kazakstan’’ is mostly searched by British users. In the first half of the year however almost all searches come from Almaty, the largest city in Kazakhstan.
Now, obviously nothing here is causal or proven, but it does look interesting. Not only did the search volume on Google shoot up around the time the movie was introduced, but the geographic composition of the searches also shifted toward places where the movie was very popular and the country was not well known before Fall 2006.
What does all this mean for Kazakhstan? Is this good or bad publicity? It seems that people became interested in the country beyond the movie (see a USA Today story here). A poll of users of a UK travel website put Kazakhstan in the Top 3 places to visit (right after Italy and the UK if you believe the results), and the Lonely Planet already has an article on the real Kazakhstan ``beyond Borat''. We'll see if those people are really going in the end, and if the trend persists over time as Google supplies more information. But all in all the movie might have generated some useful publicity for the country. Estimating the impact on tourism and world opinion, anyone?
February 6, 2007
An article by Jane Miller in the current issue of Health Services Research explains strategies for preparing conference posters. As she writes, posters are a "hybrid of a published paper and an oral presentation" and people often fail to recognize this in preparing a poster. The article reviews existing literature on research communication and provides some guidelines on how to present statistical methods and results appropriately. It's all common-sense stuff, but it might come in handy for first-time presenters looking for guidance.
It also goes nicely with Gary's "Publication, Publication" guide for writing research papers which you can find here.
January 23, 2007
So it's finally getting cold in Boston after some days that resembled Spring more than anything. Outside the buildings, smokers in T-shirts and flip-flops? The first flowers blooming?? But it's not all lost: I was just reading that an early Spring or a short interval of warm temperatures doesn't really matter for plants and animals. Plants just grow new buds or skip a year. Animals adjust their sleep patterns. But maybe Mother Nature is also smart about predicting when it's the right time to wake up. Are plants and animals Bayesians who have learned to give more weight to a signal that predicts changes in seasons better than temperature does?
Apparently plants and animals have an internal clock that measures the length of day and night, using the length of sunlight exposure as a proxy. Having been around for millions of years, they might know that relying on the length of day is a safer bet than relying much on temperature. I wonder whether there is evidence of Mother Nature changing those weights over time, as one of the signals becomes more reliable? Maybe temperature was a better predictor when the Little Ice Age began? It wouldn't be so great to wake up when it's well below zero in late May. This would be a good example for Amy's post on Bayesian inference and natural selection (see here).
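The ``weights'' idea has a simple textbook form: with independent normally distributed signals and a flat prior, the Bayesian posterior mean weights each signal by its precision (one over its variance). Here is a deliberately stylized sketch of that idea, not a model of actual plant or animal biology; the signal values and variances are invented for illustration.

```python
def precision_weighted_estimate(signals):
    """Combine independent noisy signals of the same quantity.

    signals: list of (value, variance) pairs. Under a normal model with a flat
    prior, the posterior mean weights each signal by 1 / variance."""
    precisions = [1.0 / var for _, var in signals]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    estimate = sum(w * value for w, (value, _) in zip(weights, signals))
    return estimate, weights

# Hypothetical: day length says spring is 30 days away (low noise);
# an unusually warm week says 5 days away (high noise).
estimate, weights = precision_weighted_estimate([(30.0, 4.0), (5.0, 100.0)])
```

If temperature became a more reliable signal (smaller variance), its weight would rise automatically, which is the kind of shift over time the paragraph speculates about.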
Here in the computer lab of an unnamed basement in Cambridge, MA, yours faithful won't be fooled by the temperatures either. I'll take a nap now.
December 4, 2006
Wednesday's New York Times reports on recommendations by an independent panel on how the journal Science could improve its review process (see here). The panel was instituted after Science had to retract papers by Dr. Hwang Woo-suk that were based on fabricated results. The panel recommended four changes:
(1) Flag high-visibility papers for extra scrutiny in the review process
(2) Require authors to specify their individual contributions to a paper
(3) Make more raw data available online for replication
(4) Work with other journals to establish a common standard for the review process.
Recommendations (3) and (4) have previously been featured on this blog here and here. (2) should produce interesting results for joint publications. Maybe a logical extension would be to assess academic output using the contributions as weights?
December 1, 2006
As a requirement for my doctoral program, I am taking a basic epidemiology class this semester. It's been interesting to see how the basic analytics in epi are the same as in, say, econometrics, but how much the language and preferences differ across the fields.
One striking difference is the preference for confidence intervals over coefficients and standard errors. Epidemiologists dislike p-values for the same reasons that economists dislike them when they are reported without additional information. But epidemiologists seem to be in love with confidence intervals. Obviously a confidence interval is a handy statistic, but to me it seems to create a misleading emphasis on the popular 5 percent level. It pre-empts any thinking about the process that generated the interval. Yet most epidemiological or medical publications report little else.
On the other hand, maybe other social sciences could benefit from what epidemiologists call "positive criteria for causality." These include the existence of plausible (gasp!) mechanisms of cause and effect, and dose-response relations (the dose of exposure is related to the level of disease). Other fields often rely too heavily on the strength of association alone, and it would be a good idea to take other positive criteria more seriously.
Other items are pure lingo. For example, epidemiologists seem to call misclassification what economists call measurement error. At any rate, the differences in terminology and preferences are surprising. When did the academic tribes separate? Also, techniques accepted in one field often look like innovations in another. Why is there not more communication between the fields? It seems like all could benefit from wider discussion and application, and it's an easy way to publish, so the incentives are right too.
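Under a normal approximation, a coefficient with its standard error, a confidence interval, and a p-value are three packagings of the same information. The CI simply fixes the level (1.96 standard errors for 95%). A minimal sketch, with hypothetical input numbers:

```python
from statistics import NormalDist

def summarize(coef: float, se: float, level: float = 0.95):
    """Return the confidence interval and two-sided p-value implied by a
    coefficient and its standard error, using a normal approximation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    ci = (coef - z_crit * se, coef + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(coef / se)))
    return ci, p_value

# Hypothetical estimate: coefficient 0.4 with standard error 0.2.
ci, p = summarize(0.4, 0.2)
```

Changing `level` to 0.90 or 0.99 gives different intervals from the same two numbers, which is one way to see that the 95% boundary is a convention rather than a property of the data.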
October 31, 2006
Jacob Eisenstein at MIT has developed a smart election predictor for the US Senate elections using a Kalman filter. The filter helps decide how much extra weight to attach to more recent polls. Check it out here; he also has some details on the method here.
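For readers curious how a Kalman filter weights polls, here is a minimal one-dimensional sketch. It is not Eisenstein's actual model. It assumes, hypothetically, that true support follows a random walk with variance `q` between polls, and that each poll's sampling variance is share × (1 − share) / n. The Kalman gain is the weight the newest poll receives.

```python
def kalman_poll_tracker(polls, q=0.0004, initial_mean=0.5, initial_var=0.01):
    """Track a candidate's support through a sequence of polls.

    polls: list of (share, sample_size) in chronological order.
    q: assumed variance of the drift in true support between polls.
    Returns a list of (filtered mean, filtered variance) after each poll."""
    mean, var = initial_mean, initial_var
    estimates = []
    for share, n in polls:
        var += q                        # predict: true support may have drifted
        r = share * (1 - share) / n     # sampling variance of this poll
        gain = var / (var + r)          # weight given to the new poll
        mean += gain * (share - mean)   # update toward the poll result
        var *= 1 - gain                 # uncertainty shrinks after the update
        estimates.append((mean, var))
    return estimates

# Hypothetical polls: (share for the candidate, sample size).
track = kalman_poll_tracker([(0.48, 600), (0.52, 800), (0.55, 1000)])
```

A larger `q` tells the filter that opinion moves quickly, so older polls are discounted faster and recent polls get more weight.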
October 24, 2006
Here’s an interesting piece that should help you keep your New Semester resolutions by understanding procrastination better. Sendhil Mullainathan recently used research by Dan Ariely and Klaus Wertenbroch as motivation for his undergraduate psychology and economics class. Though it’s not exactly statistics, it seems the insights could be useful for grad students and their advisors.
Ariely and Wertenbroch did several experiments to see how deadlines might help overcome procrastination. They examine whether deadlines might be effective pre-commitment devices, and whether they can enhance performance. In one of their experiments, they asked participants to proofread three meaningless synthetic texts. Participants received financial rewards for finding errors and submitting on time (just like in a problem set…). They randomized participants into three categories: three evenly-spaced deadlines every 7 days; an end-deadline after 21 days; or a self-imposed schedule of deadlines within a three week period.
Which one would you select if you could? Maybe the end-deadline because it gives you the most flexibility in arranging the work (similar to a final exam or submitting your dissertation all at once)? Ariely and Wertenbroch found that the end-deadline does the worst both in terms of finding errors and submitting on time. Participants with evenly spaced deadlines did best. But that group also liked the task the least, maybe because they had several unpleasant episodes of reading silly texts, or because they spent more time than the other groups.
So when you start your semester with good intentions, consider setting some reasonable and regular deadlines that bind, and get a calendar. Or just wait for the New Year for another chance to become resolute and have another drink in the meantime.
October 11, 2006
Today's papers were full of reports of a new study in the Lancet (here) on counting the excess deaths in Iraq since the US invasion in 2003. The article by Johns Hopkins researchers is an update on a study published in 2004 which generated a huge debate about the political as well as statistical significance of the estimates. This time the media's attention is again on the magnitude of the estimate (655,000 excess deaths, most of them due to violence), which is again vastly higher than other available numbers. The large uncertainty (95% CI 390,000 - 940,000) gets fewer comments this time, maybe because the interval is further away from 0 than in the 2004 study.
Just to point you to some interesting articles, here is a good summary in today’s Wall Street Journal. Wikipedia has a broad overview of the two studies and criticisms here. Brad deLong responded to criticisms of the 2004 study here; he also covers problems with the cluster sampling approach. And check this and this for some related posts on this blog.
By the way, the WSJ article has a correction for misinterpreting the meaning of 95% confidence. Maybe you can use it to convince your stats students that they should pay attention.
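The 95% describes the procedure, not any single interval. Across repeated samples, about 95% of intervals built this way contain the true value; any one interval either does or does not. A small simulation sketch makes this concrete. It uses a simple known-sigma interval for a mean with hypothetical parameters, not the Lancet study's cluster-sample method.

```python
import random
from statistics import NormalDist

def coverage(true_mean=10.0, sigma=2.0, n=50, reps=10000, seed=1):
    """Share of simulated 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / reps

share_covering = coverage()
```

The returned share should come out close to 0.95; each individual interval in the loop is simply a hit or a miss.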
October 2, 2006
The New York Times recently published an obituary for David Lykken, who was a pioneer of twin studies. His “Minnesota Twin Studies” suggested the importance of genetic factors in life outcomes. But his work with twins also spurred empirical research in many fields, not just genetics – and for good reason.
The idea of using twins for social science studies is very appealing: some twins are genetically identical, and also grow up in the same family and environment. So from a statistical perspective, comparing outcomes such as earnings between pairs of twins is like having a “perfect match." This idea made the rounds in many fields, such as labor economics. By using the argument that all unobserved characteristics (e.g. “genetic ability”) should be equal and can thus be differenced away, twin studies were used to estimate the returns to education – the effect of education on wages.
Alas, there are potential problems with using twin data. For example, measurement error in a difference estimation can lead to severe attenuation bias precisely because twins are so similar. If there is little variation in educational attainment, even small measurement errors can strongly affect the estimate. Researchers have been ingenious about this (e.g. by instrumenting one person’s education with the level that her twin reported, as in Ashenfelter and Krueger). While this may reduce the attenuation bias, it can magnify the omitted variables bias which motivated the use of twins in the first place. Because there are only small differences in schooling, small unobserved differences in ability can lead to a large bias. The culprits can be details such as differences in birth weight (Rosenzweig and Wolpin have a great discussion of such factors). In addition, twins who participate in such studies are a selected group: they are getting along well enough to participate, and many of them get recruited at “twin events.” But not all twins party in Twinsburg, Ohio.
Of course none of this is to belittle the contribution of Dr Lykken, who besides helping to start this flurry of work was also a major contributor to happiness research.
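The attenuation argument can be checked with a small simulation. In the sketch below, both twins share a family component of schooling, reported schooling adds classical measurement error, and log wages depend only on true schooling. Ability bias is deliberately switched off to isolate the measurement-error problem. All parameter values are hypothetical. Differencing removes the shared component, so the noise makes up a larger share of the differenced regressor and the slope shrinks more.

```python
import random

def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

def twin_slopes(pairs=20000, true_return=0.10, sd_family=1.0,
                sd_within=0.5, sd_error=0.5, seed=7):
    """Simulate hypothetical twin pairs; return (levels slope, differenced slope)."""
    rng = random.Random(seed)
    x_levels, w_levels, x_diff, w_diff = [], [], [], []
    for _ in range(pairs):
        family = rng.gauss(0, sd_family)  # schooling component shared by both twins
        school = [12 + family + rng.gauss(0, sd_within) for _ in range(2)]
        reported = [s + rng.gauss(0, sd_error) for s in school]  # classical error
        wage = [true_return * s + rng.gauss(0, 0.3) for s in school]  # log wage
        x_levels.extend(reported)
        w_levels.extend(wage)
        x_diff.append(reported[0] - reported[1])
        w_diff.append(wage[0] - wage[1])
    return ols_slope(x_levels, w_levels), ols_slope(x_diff, w_diff)

levels, differenced = twin_slopes()
# Theoretical reliability ratios under these parameters:
# levels 1.25 / 1.50 (slope near 0.083), differences 0.50 / 1.00 (slope near 0.05).
```

Adding an ability term to wages would bias the levels slope upward, which is what differencing is meant to fix. The sketch shows the price paid in attenuation.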
September 27, 2006
Here's something new to pick at, in addition to methods problems: coding issues. A recent Science article (August 18, 2006, pages 979-982) by Bruce Dohrenwend and colleagues reported revised estimates of post-traumatic stress disorder among Vietnam veterans. See here for an NYT article. The new study indicates that some 18.7% of Vietnam veterans developed diagnosable post-traumatic stress, compared with earlier estimates of 30.9%. The difference comes mainly from using revised measures of diagnosis and combat exposure for a subset of the individuals covered in the original data source, the 1988 National Vietnam Veterans' Readjustment Study (NVVRS). The authors added military records to construct the new measures.
Given the political and financial importance (the military has a budget for mental health), this is quite a difference. One critical issue pointed out by the Science article is that the original study did not adequately control for veterans who had been diagnosed for mental health problems before being sent to combat. Just looking at the overall rates after combat is not a great study design. But this also makes me wonder about how the data was collected in the first place. Maybe the most disabled veterans didn’t reply to the survey, or were in such state of illness that they couldn’t (or had died of related illnesses). The NVVRS is supposedly representative but this would be an interesting point to examine.
This article also illustrates how important the data, measures and codings are in social science research these days. It seems that taking these issues more seriously should be part of the academic and policy process, just as replication should be (see here and here for some discussion of this issue). While study and sample design are under much scrutiny these days, there is still little discussion of how sensitive results are to coding and data. Given the difference they can make, this should change.
May 12, 2006
This blog has frequently written about testing for discrimination (see for example here, here, and here). This is also a hot issue in health care, where there is a case for 'rational' discrimination: physicians respond to clinical uncertainty by relying on priors about the prevalence of diseases across racial groups, for example.
A 2005 paper by Balsa, McGuire and Meredith lays out a very nice application of Bayes Rule to this question. The Institute of Medicine suggests that there are three types of discrimination: simple prejudice, stereotyping, and statistical discrimination, where doctors use probability theory to overcome uncertainty. The latter occurs when uncertainty about a patient's condition leads the physician to treat her differently from similar patients of a different race.
The paper uses Bayes Rule to conceptualize the decision a doctor faces when she hears a patient report symptoms and has to decide whether the patient really has the disease:
Pr(Disease | Symptom) = Pr(Symptom | Disease) * Pr(Disease) / Pr(Symptom)
A doc would decide differently if she believed that disease prevalence differs across racial groups (which affects Pr(Disease)), or if diagnostic signals are noisier for some groups (which affects Pr(Symptom | Disease) and Pr(Symptom)), perhaps because the quality of doctor-patient communication differs across races.
The authors test their model on diagnosis data from family physicians and internists, and find that sensible priors about disease prevalence could explain racial differences in the diagnosis of hypertension and diabetes. For the diagnosis of depression there is evidence that differences in doctors' decisions may be driven by different communication patterns between white docs and their white vs. minority patients.
Obviously prejudice and stereotyping are different from statistical discrimination, and have quite different policy implications. This is a really nice paper that makes these distinctions clear as well as nicely using Bayes Rule to conceptualize the issues. The general idea might also apply to other policy issues, including police stop and search.
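A short numerical illustration of both channels, expanding Pr(Symptom) with the law of total probability. All numbers below are hypothetical and are not taken from the Balsa, McGuire and Meredith paper.

```python
def posterior_disease(prevalence, p_symptom_if_disease, p_symptom_if_healthy):
    """Pr(Disease | Symptom) by Bayes' rule, with
    Pr(Symptom) = Pr(S | D) Pr(D) + Pr(S | not D) (1 - Pr(D))."""
    p_symptom = (p_symptom_if_disease * prevalence
                 + p_symptom_if_healthy * (1 - prevalence))
    return p_symptom_if_disease * prevalence / p_symptom

# Same symptom report, different assumed prevalence (the prior channel).
low_prior = posterior_disease(0.10, 0.8, 0.2)
high_prior = posterior_disease(0.20, 0.8, 0.2)
# Same prevalence, less informative symptom report (the communication channel).
noisy_signal = posterior_disease(0.10, 0.7, 0.3)
```

Here the identical symptom report yields a posterior of about 0.31 under the lower prior and exactly 0.5 under the higher one, while the noisier report pulls the posterior back toward the prior, to about 0.21.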
April 26, 2006
A group at the Indiana School of Informatics has developed software to detect whether a document is "human written and authentic or not." The idea was inspired by the successful attempt by MIT students in 2004 to place a computer-generated paper at a conference (see here). Their program collated random fragments of computer science speak into a short paper that was accepted at a major conference without revision. (That program is online and you can generate your own paper, though unfortunately it only writes computer science articles.)
The new tool lets users paste in pieces of text and then assesses whether the content is likely to be authentic or just gibberish. The program tries to identify human-style writing, which is characterized by certain repetition patterns, and apparently does rather well. It is not clear whether this works for social science articles. The first paragraphs of a recent health economics article (to remain unnamed) have only a 35.5% chance of being authentic. Hmm...
So is this just a joke or useful programming? The authors say it could be used to differentiate whether a website is authentic or bogus, or to identify different types of texts (articles vs blogs, for example). I wonder what the algorithms behind such technology are, and whether this will lead to an arms race between fakers and detectors? If one of them can recognize a human-written text could this be used by the faking software?
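We don't know what algorithm the Indiana tool actually uses. As a purely illustrative stand-in, one crude way to quantify repetition patterns is a compression ratio: text with more repeated structure compresses better. This sketch only shows that general idea and would not by itself distinguish human from generated text.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower values mean more repetition."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text")
    return len(zlib.compress(raw, 9)) / len(raw)

repetitive = compression_ratio("the cat sat on the mat. " * 20)
varied = compression_ratio(
    "Airlines bump more in the morning, because they can move passengers "
    "to flights later in the day. Passengers increasingly refuse to be "
    "bumped because they anticipate being stuck."
)
```

A real detector would presumably compare such statistics against reference distributions for human and generated text, which is also where the arms race the post wonders about would play out.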
If further tweaked, could this have an application in the social sciences? Maybe we could use the faking software to search existing papers, collate them smartly and use the result to identify patterns and get new ideas? Maybe everyone should run their papers through detector software before submitting them to a journal or presenting at a workshop? And students watch out! No more random collating at 3am to meet a next-day deadline!
PS: this blog entry has been classified as "inauthentic with a 26.3% chance of being an authentic text"...
In the last entry I wrote that China is the new exciting trend for researchers interested in development issues. There are now a number of surveys available, and it is getting easier to obtain data. (For a short list, see here.) However there are two key issues that are still pervasive: language difficulties and little sharing of experiences.
While some Chinese surveys are available in English translation, it is still difficult to fully understand their context. China is a very interesting yet peculiar place. It clearly helps to work with someone who speaks (and reads!) the language, though you might still miss some unexpected information -- and there are many things that can be surprising.
More annoying however is the lack of sharing of information and data. This problem has two associated parts. For the existing data, people seem to struggle with similar problems but don't provide their solutions to others. In the case of the China Health and Nutrition Survey for example, numerous papers have been written on different aspects and the key variables are being cleaned over and over. Apart from the time that goes into that, this can lead to different results.
Another lack of sharing concerns existing data and ongoing surveys. There are now a lot of people who either have data or are currently collecting it in China. But it is rather difficult even to find out about existing sources. If you're lucky, you've found an article that uses one. If you're not, you might only find one after you've put in your funding application.
To really start exploring the exciting opportunities that China may have to offer for research, these problems need to get fixed. I can understand that people don't necessarily want to hand over their data, but it seems that there is too little known about existing surveys, even to researchers who have been working on China for longer. And as for the cleaning of existing data and reporting problems, it just seems like a waste not to share. I wonder if there are similar experiences from other countries?
April 13, 2006
While the media keeps preaching that this century is Chinese, many researchers are getting excited about new opportunities for data collection and access to data. For the past decades, many development researchers have focused on India because of the regional variation and good infrastructure for surveys. It seems that now China holds a similar promise, and could provide an interesting comparison to India.
I recently started collecting information on China (here); below are some highlights. If you know of more surveys, do let me know.
Probably the best-known micro-survey at this point is the China Health and Nutrition Survey (CHNS), which is a panel with rounds in 1989, 1991, 1993, 1997, 2000, and 2004 (the 2006 wave is funded) and covers more than 4,000 households in 9 provinces. Though this is an amazing dataset, using it is not always easy. For example, there are problems with linking individuals over time. New longitudinal master files are continuously released, but the fixes are sometimes hard to integrate into ongoing projects (the IDs are mixed up). There also seem to be some inconsistencies in the recording, especially in earlier rounds and for some key variables such as education. The best waves seem to be those of 1997 and 2000.
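The ID problems described above suggest a simple sanity check before merging two waves. List who drops out or appears, and flag people whose supposedly time-invariant attributes disagree across waves, which hints at mixed-up IDs. This is a minimal sketch. The dict layout and the field names `sex` and `birth_year` are hypothetical, not actual CHNS variable names.

```python
def check_linkage(wave_a, wave_b, stable_fields=("sex", "birth_year")):
    """Compare two survey waves keyed by person ID.

    wave_a, wave_b: dicts mapping a person ID to a dict of attributes.
    Returns IDs only in wave_a, IDs only in wave_b, and IDs whose
    time-invariant attributes disagree (a hint that IDs may be mixed up)."""
    only_a = sorted(set(wave_a) - set(wave_b))
    only_b = sorted(set(wave_b) - set(wave_a))
    conflicts = sorted(
        pid for pid in set(wave_a) & set(wave_b)
        if any(wave_a[pid].get(f) != wave_b[pid].get(f) for f in stable_fields)
    )
    return only_a, only_b, conflicts

# Hypothetical toy waves.
wave_1997 = {"p1": {"sex": "F", "birth_year": 1960},
             "p2": {"sex": "M", "birth_year": 1955},
             "p3": {"sex": "F", "birth_year": 1980}}
wave_2000 = {"p1": {"sex": "F", "birth_year": 1960},
             "p2": {"sex": "F", "birth_year": 1971},
             "p4": {"sex": "M", "birth_year": 1990}}
report = check_linkage(wave_1997, wave_2000)
```

Sharing the output of checks like this, rather than having every team rediscover the same conflicts, is one way to address the duplicated cleaning effort mentioned earlier.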
There is also a World Bank Living Standards Measurement Study (LSMS) for China. That survey used standardized (internationally comparable?) questionnaires and was conducted in 780 households and 31 villages in 1996/7. For those interested in the earlier periods, there is commercial data at the China Population Information and Research Center which has mainly census-based data starting from 1982. The census itself is also available electronically now (and with GIS maps) but there is a lively debate as to how reliable the figures are, and whether key measures changed over time. But it should still be good for basic cross-sectional analysis.
March 21, 2006
Good data on health-related issues in developing countries is hard to find, especially if you need large samples and cross-country comparability. The latest round of the World Health Surveys (WHS) should become available to researchers in the coming months and might be one of the best surveys out there, in addition to the Demographic and Health Surveys (DHS).
The current WHS was conducted in 70 countries in 2000-2001. The survey is standardized and comes with several modules, including measures of the health states of populations; risk factors; responsiveness of health systems; coverage, access and utilization of key health services; and health care expenditures. The instruments use several innovative features, including anchoring vignettes and geocoding, and seem to collect more information on income/expenditure than the DHS does.
By the looks of it, the WHS could easily become the new standard dataset for cross-country comparisons of health indicators, though for some applications it might complement rather than substitute for the DHS. For now, the questionnaires and some country reports are online, and the micro-data is supposed to be available by the middle of the year at the latest.
March 7, 2006
Currently most students in Gov 2001 are preparing for the final assignment of the course: replicating and then improving on a published article. While scouting for a suitable piece myself, I came across the debate about whether (and how) data should be made available.
It is somewhat surprising that nowadays one can get all sorts of scholarly research off the web, except for the data that produced the results. Given that methods already exist to keep data proprietary and confidential, omitting the data from publication seems rather antiquated, unnecessary, and counter-productive to scientific progress. Some health datasets, such as AddHealth, which arguably contains some of the most sensitive information, have successfully been public for a few years already. There is, of course, an intriguing debate about this, which Gary's website partly documents.
It seems that we are slowly coming within reach of universal data publication. Apart from projects like ICPSR, several major journals have recently started to ask authors to submit data and code. The JPE told me that they expect to have data for some articles from April 2006, and that 'only the rare article will not include the relevant datasets' from early 2007.
Since debating the robustness of existing results seems like good research, making data and code available could spur quite a lot of articles. I wonder what the effects on journal content will be. Rather than printing the various replications, maybe journals will post them only online? Or will specialized journals emerge to keep the major publications from being jammed?
February 15, 2006
In this week's Gov 2001 class, Gary was showing how to get around difficult statistical problems by simulation rather than using complex analytics. That got me thinking about the trade-offs between the two approaches.
One class example was the Monty Hall game, which you can probably recite backwards in your sleep: a contestant is asked to choose among 3 doors, 1 of which has a car behind it. Once the choice is made, the game show host, who knows where the car is, opens one of the two remaining doors to reveal a goat. The contestant is then offered the chance to switch from her initial choice to the other closed door, and the question is whether switching is a good strategy.
One can solve this analytically by thinking hard about the problem. Alternatively, one can simulate the probabilities of winning the prize when switching and when not switching, and use these to build intuition for the result.
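A minimal simulation sketch in Python (not the code used in class). It assumes the standard rules: the host knows where the car is, always opens a goat door other than the contestant's pick, and a switching contestant moves to one of the other still-closed doors at random. The `n_doors` parameter is an illustrative addition that lets you play a "changed" version of the game with more doors.

```python
import random


def play(n_doors, switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    if n_doors < 3:
        raise ValueError("the game needs at least 3 doors")
    doors = range(n_doors)
    car = rng.randrange(n_doors)
    choice = rng.randrange(n_doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    goat_doors = [d for d in doors if d != choice and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Move to one of the other doors that are still closed.
        remaining = [d for d in doors if d != choice and d != opened]
        choice = rng.choice(remaining)
    return choice == car


def win_rate(n_doors=3, switch=True, trials=100_000, seed=2006):
    """Share of simulated rounds won under the given strategy."""
    rng = random.Random(seed)
    wins = sum(play(n_doors, switch, rng) for _ in range(trials))
    return wins / trials


print(f"stay:   {win_rate(3, switch=False):.3f}")  # expect about 1/3
print(f"switch: {win_rate(3, switch=True):.3f}")   # expect about 2/3
```

The estimates fluctuate slightly with the seed and the number of trials, which is exactly the point: the simulation gives intuition, not a proof.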
During the debate in class I wondered whether simulations are really such a good thing. Sure, they solve the particular problem at hand, and they may be the only way to handle very complex problems quickly. But they don't help with even closely related problems, whereas the analytic approach can yield insights that carry over.
Maybe simulation is still useful, since writing code structures one's thoughts. But it also seems likely to erode critical skills (apart from the very real possibility of making a mistake in the code and convincing oneself of the wrong result). Imagine you show up at Monty's show and they have changed the game without telling you. Knowing how to implement a new simulation won't help if you can't actually run it there. Solid practice in the analytical approach might be more useful.
I don't want to suggest that simulations are generally evil, but maybe they come at a cost. Oh, and the answer is yes, switch.
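For the analytic route, the argument is short, assuming the standard rules (the host knows where the car is and always opens a goat door the contestant did not pick). Staying wins only if the first pick hid the car; switching wins whenever the first pick hid a goat, because the host has then revealed the only other goat. The second line generalizes this to a hypothetical game with n doors in which the host opens one goat door and the contestant switches to one of the n - 2 other closed doors at random:

```latex
\begin{aligned}
P(\text{win} \mid \text{stay})   &= P(\text{first pick hides the car})  = \tfrac{1}{3},\\
P(\text{win} \mid \text{switch}) &= P(\text{first pick hides a goat}) = 1 - \tfrac{1}{3} = \tfrac{2}{3},\\[4pt]
\text{with } n \text{ doors:}\quad
P(\text{win} \mid \text{stay}) &= \frac{1}{n}, \qquad
P(\text{win} \mid \text{switch}) = \frac{n-1}{n}\cdot\frac{1}{n-2} > \frac{1}{n} \quad (n \ge 3).
\end{aligned}
```

Since (n - 1)/(n - 2) > 1, switching stays the better strategy even in that changed game, which is the kind of transferable insight the analytic approach delivers.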
February 3, 2006
A common excuse among wine lovers is that "a few glasses of wine are good for the heart". Well, maybe for warming your heart, but possibly not for preventing heart attacks.
A recent note in The Lancet (Vol. 366, December 3, 2005, pages 1911-1912) suggests that earlier reports that light to moderate alcohol consumption can lower the risk of ischaemic heart disease were severely affected by confounding in non-randomized studies.
Some researchers believed that the early results were due to misclassifying former drinkers with cardiovascular disease (CVD) as never-drinkers, which raised the CVD rate in the non-drinker group. Another possible story is that the studies didn't properly control for confounders: apparently some risk factors for CVD are more prevalent among non-drinkers, and the non-randomized studies didn't adjust well enough for them. But as the note points out, confounding could bias results either in favor of or against a protective effect. Even if heavy drinking offered really good protection, heavy drinkers tend not to live healthy lives, so the benefits would be obscured.
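To see how the second story can make alcohol look protective, here is a minimal Python simulation under made-up assumptions. A single hypothetical confounder (called `frail` here, standing in for any CVD risk factor that is more common among non-drinkers) raises CVD risk and makes drinking less likely, while drinking itself has no effect at all. The naive comparison of CVD rates then suggests protection, whereas comparing drinkers and non-drinkers within levels of the confounder (simple stratification, one of several possible adjustments) recovers the null. All probabilities are invented for illustration and do not come from the Lancet note.

```python
import random


def simulate_confounding(n=200_000, seed=1):
    """Return (naive, adjusted) CVD risk differences, drinkers minus non-drinkers."""
    rng = random.Random(seed)
    # counts[frail][drinks] = [number of people, number with CVD]
    counts = {z: {d: [0, 0] for d in (0, 1)} for z in (0, 1)}
    for _ in range(n):
        frail = rng.random() < 0.3                       # hypothetical confounder
        drinks = rng.random() < (0.3 if frail else 0.7)  # frail people drink less
        cvd = rng.random() < (0.20 if frail else 0.05)   # CVD depends only on frailty
        cell = counts[int(frail)][int(drinks)]
        cell[0] += 1
        cell[1] += int(cvd)

    def rate(cells):
        people = sum(c[0] for c in cells)
        cases = sum(c[1] for c in cells)
        return cases / people

    drinkers = [counts[z][1] for z in (0, 1)]
    non_drinkers = [counts[z][0] for z in (0, 1)]
    naive = rate(drinkers) - rate(non_drinkers)

    # Stratify on the confounder, then weight each stratum by its population share.
    adjusted = 0.0
    for z in (0, 1):
        share = (counts[z][0][0] + counts[z][1][0]) / n
        adjusted += share * (rate([counts[z][1]]) - rate([counts[z][0]]))
    return naive, adjusted


naive, adjusted = simulate_confounding()
print(f"naive difference:    {naive:+.3f}")     # negative: drinking looks protective
print(f"adjusted difference: {adjusted:+.3f}")  # close to zero: no real effect
```

Of course, stratification only works because the simulation hands us the confounder; in the observational studies the problem is precisely that some risk factors are unmeasured or poorly measured.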
But fear not: the British Heart Foundation says that low to moderate alcohol consumption probably doesn't do your heart any harm. For protection against CVD you should really quit smoking, exercise, and eat a balanced diet. Not quite as appealing as a good glass of wine, of course.
In any case, food for thought and a great 2-page piece for your next causal inference class. Cheers to that.
January 24, 2006
With the end of the Fall semester comes the happy time of shopping for (applied) quantitative methods courses for the Spring. Here's a partial list of currently planned offerings around Cambridge, and their descriptions.
An introduction to R in five 3-hour sessions combining demonstration, lecture, and laboratory components. It will be graded pass/fail on the basis of homework assignments. Taught in the Winter session at HSPH.
Introduces theories of inference underlying most statistical methods and how new approaches are developed. Examples include discrete choice, event counts, durations, missing data, ecological inference, time-series cross-sectional analysis, compositional data, causal inference, and others. The main assignment is a research paper to be written alongside the class.
Introduction to methods employed in applied econometrics, including linear regression, instrumental variables, panel data techniques, generalized method of moments, and maximum likelihood. Includes detailed discussion of papers in applied econometrics and computer exercises using standard econometric packages. Note: enrollment is limited to certain PhD candidates; check the website.
MIT 14.387 Topics in Applied Econometrics (Angrist and Chernozhukov)
Click here for 2004 website
Covers topics in econometrics and empirical modeling that are likely to be useful to applied researchers working on cross-section and panel data applications.
[It's not clear whether this class will be offered in Spring 06. Check the MIT class pages for updates.]
KSG API-208 Program Evaluation: Estimating Program Effectiveness with Empirical Analysis (Abadie)
Accessible from here (click on Spring Schedule)
Deals with a variety of evaluation designs (from random assignment to quasi-experimental evaluation methods) and teaches analysis of data from actual evaluations, such as the national Job Training Partnership Act Study. The course evaluates the strengths and weaknesses of alternative evaluation methods.
KSG PED-327 The Economic Analysis of Poverty in Poor Countries (Jensen)
Accessible from here (click on Spring Schedule)
Emphasizes modeling behavior, testing economic theories, and evaluating the success of policy. Topic areas include: conceptualizing and measuring poverty, inequality, and well-being; models of the household and intra-household allocation; risk, savings, credit, and insurance; gender and gender inequality; fertility; health and nutrition; and education and child labor.
Advanced methods of fitting frequentist and Bayesian models. Generation of random numbers, Monte Carlo methods, optimization methods, numerical integration, and advanced Bayesian computational tools such as the Gibbs sampler, Metropolis-Hastings, the method of auxiliary variables, marginal and conditional data augmentation, slice sampling, exact sampling, and reversible jump MCMC.
Methods for handling incomplete data sets with general patterns of missing data, emphasizing the likelihood-based and Bayesian approaches. Focus on the application and theory of iterative maximization methods, iterative simulation methods, and multiple imputation.
Explores the relationship between quantitative methods and the law via simulation of litigation and a short joint (law student and quantitative student) research project. Cross-listed with Harvard Law School.
Methods for analyzing categorical data. Visualizing categorical data, analysis of contingency tables, odds ratios, log-linear models, generalized linear models, logistic regression, and model diagnostics.
January 20, 2006
In a 3-day conference at IQSS, Jon Krosnick is currently presenting chapters of a forthcoming 'Handbook of Questionnaire Design: Insights from Social and Cognitive Psychology'. Applied social scientists have put a lot of effort into improving the methods used once the data is collected. However, some of the evidence that Krosnick discusses shows that those efforts may be frustrated: collecting the data may be a rather weak link in the chain of research.
Anyone who has collected data will know about these issues. The Handbook might be a good way to get a structured review and encourage more thorough thinking.
PS: The conference is this year's Eric M. Mindich 'Encounters with Authors' symposium. An outline is here.
January 19, 2006
The Economist recently featured an interesting article on forthcoming research by Griffiths and Tenenbaum on how the brain works ("Bayes Rules", January 7, 2006).
Their research reportedly analyses how the brain makes judgements using prior distributions. Griffiths and Tenenbaum gave individuals a piece of information and asked them to draw general conclusions. Apparently, the answers to most questions correspond well to a Bayesian approach to reasoning: people generally make accurate predictions and pick the right probability distribution. And it seems that if you don't know the distribution, you can just run experiments and find out.
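What does "a Bayesian approach" to such a prediction look like? A minimal Python sketch of one version of the task, predicting a total quantity (here, a lifespan) from a single observation (a current age). It assumes (i) the observed value is a uniformly random point within the total, so the likelihood of any total T at least as large as the observation is 1/T, and (ii) a made-up Gaussian-shaped prior for lifespans with mean 75 and standard deviation 16 years. The prediction is the posterior median on a grid. The prior, its parameters, and the grid are illustrative assumptions, not details from the article.

```python
import math


def lifespan_prior(total, mean=75.0, sd=16.0):
    """Unnormalized Gaussian-shaped prior over total lifespan (hypothetical parameters)."""
    return math.exp(-0.5 * ((total - mean) / sd) ** 2)


def posterior_median(observed, prior, grid):
    """Posterior median of the total T given one observation t <= T.

    Assumes t is a uniformly random point in [0, T], so the likelihood is 1/T for T >= t.
    """
    weights = [prior(T) / T if T >= observed else 0.0 for T in grid]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("prior puts no mass on totals >= the observed value")
    cumulative = 0.0
    for T, w in zip(grid, weights):
        cumulative += w
        if cumulative >= total_weight / 2:
            return T
    return grid[-1]  # guard against floating-point round-off


grid = [k / 10 for k in range(1, 1501)]  # 0.1 to 150.0 years

for age in (30, 50, 90):
    print(age, "->", posterior_median(age, lifespan_prior, grid))
```

For a young observation the prediction stays close to the prior's center, while for a 90-year-old it lands somewhat above 90: the prior does most of the work until the data rule part of it out.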
The interesting question of course is, where does the brain get this information from? Trial and error experience? Learning from your parents or others?
At any rate the results suggest what many readers of this blog already know: real humans are Bayesians. Tell a frequentist next time you meet one.
PS: Andrew Gelman also posted about this article on his blog. See here.
January 9, 2006
Sebastian Bauhoff and Jens Hainmueller
A perfect method for adding drama to life is to wait until a paper deadline looms large. So you find yourself on the eve of a deadline, "about to finish" for the last 4 hours, and you haven't formatted any of the tables yet? Still copying STATA tables into MS Word? Or have you just received the departmental brainwash regarding statistical software and research best practice? Here are some tools that could make your life easier and your research more effective. At the big-picture level, which tools to use is as much a question of philosophy as of your needs: open-source or commercial package? At Harvard, students often use one of two combos: MS Word and Stata (low tech) or LaTeX and R (high tech). What type are you?
If you're doing a lot of data-based research, need to type formulas, and often change your tables, you might want to consider learning LaTeX. Basically, LaTeX is a highly versatile typesetting environment for producing technical and scientific documents to the highest standards of typesetting quality. It's free, and LaTeX implementations are available for all platforms (Linux, Mac, Windows, etc.). Bibliographies are easily managed with BibTeX, and you can also produce cool slides using ppower4. At the Government Department, LaTeX is taught to all incoming graduate students, and many of them hate it at first (it's a bit tricky to learn), but after a while many become true LaTeX fetishists (in the metaphorical sense, of course).
Ever wondered why some papers look nicer than Word files? They're done in LaTeX. A drawback, of course, is that they all look the same. But then, some say having your papers in the LaTeX look signals that you're part of the academic community...
LaTeX goes well with R, an open-source statistical package modeled on S. R is both a language and an environment for statistical computing. It's very powerful and flexible; some say its graphical capabilities are unparalleled. The nice thing is that R can output LaTeX tables which you can paste directly into your document. There are many ways to do this; one easy way is to use the latex function in the Design library. A mouse-click later, your paper shines in PDF format, with all tables looking professional. As with LaTeX, many incoming graduate students at the Government Department struggle to learn it, but eventually most of them never go back to their previous statistical software.
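The idea behind such table exporters is simple: turn a table of estimates into LaTeX tabular markup that you paste (or \input) into the paper, so that re-running the analysis regenerates the table instead of you retyping it. The following is a minimal sketch of that idea in Python, not R, and it does not reproduce the interface of the R function above; the coefficient values are hypothetical.

```python
def to_latex_table(rows, caption):
    """Format (name, estimate, std_error) triples as a LaTeX table environment."""
    lines = [
        r"\begin{table}[ht]",
        r"\centering",
        r"\begin{tabular}{lrr}",
        r"\hline",
        r"Variable & Estimate & Std. Error \\",
        r"\hline",
    ]
    for name, estimate, std_error in rows:
        safe_name = name.replace("_", r"\_")  # underscores are special in LaTeX
        lines.append(f"{safe_name} & {estimate:.3f} & {std_error:.3f} \\\\")
    lines += [
        r"\hline",
        r"\end{tabular}",
        rf"\caption{{{caption}}}",
        r"\end{table}",
    ]
    return "\n".join(lines)


# Hypothetical regression output, for illustration only.
coefficients = [("(Intercept)", 1.234, 0.456), ("log_income", 0.25, 0.1)]
print(to_latex_table(coefficients, "OLS estimates (hypothetical)"))
```

Writing the output to a .tex file and including it with \input keeps the paper in sync with the latest estimates.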
But are you actually looking for a more user-friendly modus vivendi? Don't feel like wasting your nights writing code and chasing bugs like a stats addict? Do you prefer canned functions and an easy-to-use working environment? Then consider the MS Word and STATA combo. Getting STATA output to look nice in Word is rather painful unless you use a little tool called outreg, or alternatively estout (the latter also produces copy-and-paste-ready LaTeX tables). Outreg is an ado-file that produces a table in Word format, to which you can simply apply Word's normal formatting functions. The problem is that outreg handles only some of the tables that STATA produces, so you're stuck formatting the rest by hand. But of course Word offers plenty of formatting tools.
So you make your choice depending on how user-friendly and/or flexible you want your tools to be. But whether you're using Word/STATA or LaTeX/R, one tool comes in handy anyway: WinEdt, a shareware editor for plain text, HTML, LaTeX, etc. (WinEdt is a front end, so you will still need a TeX distribution such as MiKTeX to compile LaTeX documents.) The software can also serve as an editor for STATA do-files and R scripts. You can download configuration files that highlight your commands in WinEdt, auto-save as often as you like (ever lost your STATA do-file??), and send your code to STATA or R just as the built-in editors do. Alternatives include other powerful text editors such as Emacs.
Confused? Can't decide? Well, you're certainly not the only one. On the web, people fight fervent LaTeX vs. Word wars (google it!). We (the authors) recommend using LaTeX and R. This is the way we work, because, as Gary likes to say, "if we knew a better way of working we would use it" -- is that what's called a tautology?! :-)
December 1, 2005
"How's it going?" If you have ever compared the answers to this question from the average American ("great") and the average European ("so-so", followed by a list of minor complaints), you have hit directly on a big problem in measuring self-reported variables.
Essentially, responses to questions on self-reported health, political voice, and so on are determined not only by differences in actual experience but also by differences in expectations and norms. For a European, "so-so" is a rather acceptable state of wellbeing, whereas for an American it might be cause for serious worry. Similarly, people's expectations about health may change with age, so responses can be incomparable even within a population (see this hilarious video on Gary King's website for an example).
One way to address this problem in surveys is to use "anchoring vignettes": ask people to rate themselves on some scale, and then also ask them to rate hypothetical people on the same scale. The idea is that respondents' ratings of the hypothetical persons reflect their norms and expectations in the same way as the rating of their own situation does. Since the hypothetical scenarios are fixed across respondents, any difference in the vignette responses is due to interpersonal incomparability.
Using vignettes is better than asking people to rank themselves on a scale from "best" to "worst" health because it makes the context explicit and puts it under the control of the researcher. Gary and colleagues have done work on this issue showing that using vignettes can lead to very different results than plain self-reports (check out their site). I will write more on this in the next entry.
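One simple way to put the vignette answers to work is a nonparametric recoding along the lines proposed by King and colleagues: each respondent's self-rating is re-expressed relative to that same respondent's ratings of the vignettes. A minimal Python sketch, assuming the vignettes are listed from the worst to the best situation and that the respondent rated them in that same strict order (ties and order violations, which real data do contain, are simply rejected here). With J vignettes the recoded value runs from 1 to 2J + 1; the function name is hypothetical.

```python
def vignette_relative_rank(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own vignette ratings.

    Returns 1 if below the first vignette, 2 if equal to it, 3 if between the
    first and second, and so on, up to 2J + 1 if above all J vignettes.
    """
    if any(a >= b for a, b in zip(vignette_ratings, vignette_ratings[1:])):
        raise ValueError("vignette ratings must be strictly increasing")
    category = 1
    for z in vignette_ratings:
        if self_rating < z:
            return category
        if self_rating == z:
            return category + 1
        category += 2
    return category


# Same self-rating, different personal yardsticks (hypothetical 1-5 scale).
print(vignette_relative_rank(3, [2, 4]))  # 3: between the two vignettes
print(vignette_relative_rank(3, [1, 2]))  # 5: above both vignettes
```

Two respondents who both call their own health a 3 thus end up in different categories once their own use of the scale is taken into account.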
November 8, 2005
In a recent presentation at Harvard, Caroline Hoxby outlined a paper-in-process on estimating the causal impact of higher education on economic growth in the US states (Aghion, Boustan, Hoxby and Vandenbussche (ABHV) "Exploiting States' Mistakes to Identify the Causal Impact of Higher Education on Growth", draft paper August 6, 2005).
ABHV's paper is interesting for the model and results, and you should read it to get the full story. But the paper is also interesting because of the instrument used to get around the endogeneity of education spending (richer states spend more on higher education).
The basic idea is as follows: politicians are motivated to channel pork to their constituents in return for support. They do so through appropriations committees that can disburse "earmarked" funds for research-type education. Observing that membership of these committees is to a large extent random, ABHV obtain an instrument for research spending (and further instruments for spending on other types of education) and proceed to estimate the causal effect of education on growth. So this paper employs what could be called a political instrument. Of course there are plenty of other classes of IVs, such as natural events (rainfall or natural disasters). But an instrument is only partly distinguished by its ability to fulfill the formal requirements; there's also plenty of scope for creativity.
The IQSS Social Science Statistics blog is soliciting suggestions and requests for instruments: send your favorite IV and its application. Or tell our readers what you have always wanted to instrument and see if someone comes up with a suggestion.
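The logic of such an instrument can be sketched with simulated data. In this minimal Python example, an unobserved variable (`wealth`) raises both education spending and growth, so OLS overstates the effect of spending. A hypothetical binary `on_committee` indicator, assigned at random, shifts spending but affects growth only through spending; the just-identified IV (Wald) estimator cov(z, y) / cov(z, x) then recovers the true coefficient, set to 0.5 here. All variable names and coefficients are invented for illustration and are not taken from ABHV.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate_iv(n=100_000, beta=0.5, seed=7):
    """Return (OLS, IV) estimates of the effect of spending on growth."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        wealth = rng.gauss(0, 1)                  # unobserved confounder
        on_committee = float(rng.random() < 0.5)  # randomly assigned instrument
        spending = 2.0 * on_committee + wealth + rng.gauss(0, 1)
        growth = beta * spending + 2.0 * wealth + rng.gauss(0, 1)
        z.append(on_committee)
        x.append(spending)
        y.append(growth)
    ols = cov(x, y) / cov(x, x)  # biased upward: wealth drives both x and y
    iv = cov(z, y) / cov(z, x)   # uses only the variation in x induced by z
    return ols, iv


ols, iv = simulate_iv()
print(f"OLS estimate: {ols:.3f}")
print(f"IV estimate:  {iv:.3f}  (true value 0.5)")
```

The IV estimate is only as good as the exclusion restriction: if committee membership affected growth through some other channel, the ratio would be biased too, and no simulation of this kind can check that for real data.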