25 September 2007

How many data are enough?

I was reminded again the other day that the word “data” is plural, since it means more than one “datum”, and thus “data” requires a plural verb. The Economist style guide says so, as does the European Union translation manual. The Oxford English Dictionary doesn’t even have an entry for “data,” subsuming it under “datum,” and it identifies sentences with singular constructions as “irregular or confused usage.”

End of story, right? Maybe, maybe not. There are a couple of problems with the “data is the plural of datum” story. (These have been discussed widely on the web, and I’m drawing freely on those discussions.) First, it is not quite right even in Latin to say that “data” is the plural of the singular count noun “datum”; both are forms of the perfect passive participle of the verb dare, to give. Second, in English, we hardly ever refer to one piece of data as a datum; at least in political science it is an observation, a case, or perhaps a data point. When the word datum is used, it usually has a specialized meaning and takes the plural form “datums.”

The bigger problem, from my perspective, is that fully adhering to “data” as a plural count noun forces you into constructions like

How many data are enough?

instead of

How much data is enough?

The first of these, “How many data are…”, is correct for a plural count noun, while the second, “How much data is…”, is appropriate for a mass noun such as “gold” or “water.” The second sentence sounds much better to me. It also wins on a Google Scholar search by a margin of roughly 10 to 1 (2120 to 198). There are also about 400 hits for “How much data are…”, no doubt from those who want to treat “data” as a mass noun but have been reminded that “data is plural.” It seems to me that data has come to be like the mass nouns described in this post from Language Log:

A great many M nouns denote collectivities of things, but small things, especially small things whose individual identities are not usually important to us: CORN, RICE, BARLEY, CHAFF, CONFETTI, etc. Some of these contrast minimally with C nouns of similar denotations, like BEAN, PEA, LENTIL. In any case, it would be easy to think of barley in "The barley was almost cooked" as "meaning more than one" in much the same way as lentils in "The lentils were almost cooked" does -- and in fact, every so often someone misidentifies little-thing M nouns as "plural".

I kind of like the idea of data as a collection of small things that aren’t that important to us as individual objects but that are meaningful when taken together.

So, in the end, is “data” a plural count noun or a mass noun? I would certainly prefer the latter, but at least on this side of the Atlantic it looks like it will be both. Here are some usage notes to ponder:

Merriam-Webster Online


Data leads a life of its own quite independent of datum, of which it was originally the plural. It occurs in two constructions: as a plural noun (like earnings), taking a plural verb and plural modifiers (as these, many, a few) but not cardinal numbers, and serving as a referent for plural pronouns (as they, them); and as an abstract mass noun (like information), taking a singular verb and singular modifiers (as this, much, little), and being referred to by a singular pronoun (it). Both constructions are standard. The plural construction is more common in print, evidently because the house style of several publishers mandates it.

American Heritage Dictionary, Fourth Edition


The word data is the plural of Latin datum, “something given,” but it is not always treated as a plural noun in English. The plural usage is still common, as this headline from the New York Times attests: “Data Are Elusive on the Homeless.” Sometimes scientists think of data as plural, as in These data do not support the conclusions. But more often scientists and researchers think of data as a singular mass entity like information, and most people now follow this in general usage. Sixty percent of the Usage Panel accepts the use of data with a singular verb and pronoun in the sentence Once the data is in, we can begin to analyze it. A still larger number, 77 percent, accepts the sentence We have very little data on the efficacy of such programs, where the quantifier very little, which is not used with similar plural nouns such as facts and results, implies that data here is indeed singular.

Posted by Mike Kellermann at September 25, 2007 2:43 PM