20 December 2005

BUCLD: Statistical Learning in Language Development

Amy Perfors

The annual Boston University Conference on Language Development (BUCLD), held this year on November 4-6, consistently offers a glimpse into the state of the art in language development. The highlight for me this year was a lunchtime symposium titled "Statistical learning in language development: what is it, what is its potential, and what are its limitations?" It featured a dialogue among three of the biggest names in the area: Jeff Elman at UCSD, who studies connectionist models of many aspects of language development; Mark Johnson at Brown, a computational linguist who applies insights from machine learning and Bayesian reasoning to human language understanding; and Lou-Ann Gerken at the University of Arizona, who studies infants' sensitivity to statistical aspects of linguistic structure.

I was most interested in the dialogue between Elman and Johnson. Elman focused on a number of phenomena in language acquisition that connectionist models capture. One is "the importance of starting small": the argument that beginning with limited memory and perceptual capacities may actually help in learning something ultimately very complex, because those limits "force" the learning mechanism to pick up the broad, consistent generalizations first rather than being led astray too soon by local ambiguities and complications. Johnson seconded that argument, pointing out that models trained with Expectation Maximization embody it just as well as neural networks do. Another key insight of Johnson's was that statistical models implicitly extract more information from the input than purely logical or rule-based models do. Because a statistical model generally assumes some underlying distributional form, the absence of data predicted by that distribution is itself a valuable form of negative evidence. Since there are a number of areas in which people appear to receive little explicit negative evidence, learners must either rely on statistical assumptions of this kind or be innately biased toward the "right" answer.
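
To make the negative-evidence point concrete, here is a minimal Python sketch (my own illustration, not anything presented at the symposium) of the "size principle": two toy grammars, a broad one licensing 20 forms and a narrow one licensing 5 of them, each assumed to generate forms uniformly at random. The sizes, the prior, and the function name are all invented for the example.

    def posterior_narrow(n, narrow_size=5, broad_size=20, prior_narrow=0.5):
        """Posterior probability of the narrow grammar after observing
        n forms that both grammars license."""
        like_narrow = (1.0 / narrow_size) ** n  # each form is likelier
        like_broad = (1.0 / broad_size) ** n    # under the narrow grammar
        num = like_narrow * prior_narrow
        return num / (num + like_broad * (1.0 - prior_narrow))

    for n in [0, 1, 5, 10]:
        print(f"after {n:2d} consistent forms: P(narrow) = {posterior_narrow(n):.4f}")

Even though the learner never receives an explicit correction, every observation drawn from the narrow set shifts belief toward the narrow grammar, because the broad grammar keeps "wasting" probability mass on forms that never appear.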

The most valuable aspect of the symposium, however, was its clarification of many of the questions, in statistical learning and in cognitive science generally, that statistical learning can help to answer. Among them: In any given problem, what are the units of generalization (e.g., sentences, words, bigram frequencies, part-of-speech frequencies, phoneme transitions) that human learners, and hence our models, should and do use? What range of computations is the human brain capable of, possibly changing at different stages of development, and which statistical and computational models capture them? And what is the nature of the input (the data) that human learners see: to what extent does it depend on factors external to the learner (the world), and to what extent on internal factors (attentional biases, mental capacities, etc.)?
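
As a concrete illustration of one candidate unit of generalization, the sketch below (again my own toy example, not from the talks) computes transitional probabilities between syllables in an unsegmented stream, the statistic infants are known to track in word-segmentation experiments. The stream and its three "words" are invented.

    from collections import Counter

    # Toy unsegmented stream built from three invented "words":
    # pabiku, tibudo, golatu (every syllable is two characters here).
    stream = "pabikutibudogolatutibudopabikugolatu"
    syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])

    # P(next | current) is 1.0 inside a word (e.g., pa -> bi) but drops at
    # word boundaries (e.g., ku -> ti vs. ku -> go), marking the edges.
    for (a, b), count in sorted(bigrams.items()):
        print(f"P({b} | {a}) = {count / unigrams[a]:.2f}")

Whether the right statistic is syllable bigrams, phoneme transitions, or something at the level of words or sentences is exactly the sort of open question raised above.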

If we can answer these questions, we will have answered a great many of the difficult questions in cognitive science. If we can't, I'd be very surprised if we made much real progress on them.

Posted by Amy Perfors at December 20, 2005 2:09 AM