7 December 2006
I've just spent this week at the annual NIPS conference; though its main focus seems to be machine learning, there are always interesting papers on the intersection of computational/mathematical methods in cognitive science and neuroscience. I thought it might be interesting to mention the highlights of the conference for me - which obviously tends to focus heavily on the cognitive science end of things. (Be aware that links (pdf) are to the paper pre-proceedings, not final versions, which haven't been released yet).
From Daniel Navarro and Tom Griffiths, we have A Nonparametric Bayesian Method for Inferring Features from Similarity Judgments. The problem, in a nutshell, is that if you're given a set of similarity ratings about a group of objects, you'd like to be able to infer the features of the objects from that. Additive clustering assumes that similarity is well-approximated by a weighted linear combination of common features. However, the actual inference problem -- actually finding the features -- has always been difficult. This paper presents a method for inferring the features (as well as figuring out how many features their are) that handles the empirical data well, and might even be useful for figuring out what sorts of information (i.e., what sorts of features) we humans represent and use.
From Mozer et. al. comes Context Effects in Category Learning: An Investigation of Four Probabilistic Models. Some interesting phenomena in human categorization are the so-called push and pull effects: when shown an example from a target category, the prototype gets "pulled" closer to that example, and the prototypes of other related categories get pushed away. It's proven difficult to explain this computationally, and this paper considers four obvious candidate models. The best one uses a distributed representation and a maximum likelihood learning rule (and thus tries to find the prototypes that maximize the probability of being able to identify the category given the example); it's interesting to speculate about what this might imply about humans. The main shortcoming of this paper, to my mind, is that they use very idealized categories; but it's probably a necessary simplification to begin with, and future work can extend it to categories with a richer representation.
The next is work from my own lab (though not me): Kemp et. al. present an account of Combining causal and similarity-based reasoning. The central point is that people have developed accounts of reasoning about causal relationships between properties (say, having wings causes one to be able to fly) and accounts of reasoning about objects on the basis of similarity (say, if a monkey has some gene, an ape is more likely to have it than a duck is). But many real-world inferences rely on both: if a duck has gene X, and gene X causes enzyme Y to be expressed, it is likely that a goose has enzyme Y. This paper presents a model that intelligently combines causal- and similarity-based reasoning, and is thus able to predict human judgments more accurately than either of them alone.
Roger Levy and T. Florian Jaeger have a paper called Speakers optimize information density through syntactic reduction. They explore the (intuitively sensible, but hard to study) idea that people -- if they are rational -- should try to communicate in the information-theoretically optimal way: they should try to give more information at highly ambiguous points in a sentence, but not bother doing so at less ambiguous points (since adding information has the undesirable side-effect of making utterances longer). They examine the use of reduced relative clauses (saying, e.g., "How big is the family you cook for" rather than "How big is the family THAT you look for" - the word "that" is extra information which reduces the ambiguity of the subsequent word "you"). The finding is that speakers choose to reduce the relative clause -- to say the first type of sentence -- when the subsequent word is relatively unambiguous; in other words, their choices are correlated with information density. One of the reasons this is interesting to me is because it motivates the question of why exactly speakers do this: is it a conscious adaptation to try to make things easier for the listener, or a more automatic/unconscious strategy of some sort?
There are a number of other papers that I found interesting -- Chemudugunta et. al. on Modeling General and Specific Aspects of Documents with a Probabilistic Topic Model; Roy et. al. on Learning Annotated Hierarchies from Relational Data, and Greedy Layer-wise Training of Deep Networks by Bengio et. al., to name a few -- so if this sort of thing interests you, I suggest checking out the NIPS proceedings when they come out. And if any of you went to NIPS also, I'd be curious what you really liked and think I should have included on this list!