19 May 2010
Measuring the extent to which our peers influence our behavior is hard for many reasons: one of the most basic is the difficulty of measuring who is a peer.
Manski catalyzed an econometric literature on how to identify three types of peer effects: "(a) endogenous effects, wherein the propensity of an individual to behave in some way varies with the behavior of the group; (b) exogenous (contextual) effects, wherein the propensity of an individual to behave in some way varies with the exogenous characteristics of the group and (c) correlated effects, wherein individuals in the same group tend to behave similarly because they have similar individual characteristics or face similar institutional environments." Bramoullé, Djebbari, and Fortin have a nice paper in which they show that when there are no unobserved correlated effects, you can use directed social network data to identify endogenous and exogenous effects (as long as the population is not partitioned into groups in which everyone in a given group is influenced by everyone else in that group and by no one outside it). The intuition is that we can instrument for our friends' actions with the actions of our friends' friends who are not our own friends. Identification thus relies on the presence of intransitive triads, in which A is friends with B and B is friends with C but A is not friends with C. Personally I think this is a really neat idea. Such triads are certainly plausible, and they are commonly observed in empirical networks. However, we also know that transitive triads, in which A is also friends with C, occur frequently.
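To make the intransitive-triad idea concrete, here is a minimal sketch that lists, for each person, the friends' friends who are not their own friends, that is, the candidate instruments. It is only an illustration and not code from the paper: the function name `friends_of_friends_instruments` and the convention that an edge `(i, j)` means "i names j as a friend" are assumptions made for this example.

```python
def friends_of_friends_instruments(edges):
    """For each node i, return the nodes k with i -> j -> k for some j,
    but with no direct link i -> k (and k != i).

    Assumed convention: an edge (i, j) means i names j as a friend.
    """
    out = {}
    for i, j in edges:
        out.setdefault(i, set()).add(j)
        out.setdefault(j, set())  # make sure every node has an entry

    instruments = {}
    for i, friends in out.items():
        two_step = set()
        for j in friends:
            two_step |= out[j]  # everyone named by one of i's friends
        # Keep only friends' friends who are not i and not i's own friends.
        instruments[i] = two_step - friends - {i}
    return instruments


# Toy network: A names B and D; B names C and D.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("A", "D")]
instruments = friends_of_friends_instruments(edges)
print(instruments["A"])  # C closes an intransitive triad; D does not, since A names D directly
```

In this toy network the triad A, B, D is transitive, so D cannot serve as an instrument for A, while C can.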
In all studies of social network effects, we rely on the network being accurately measured. In reality, there is a lot of room for measurement error in network data (Marsden has an overview). If you actually are friends with your friends' friends even though the observed network data contain no direct links indicating these friendships, then the identification strategy suggested by Bramoullé, Djebbari, and Fortin is problematic: those friends' friends would influence you directly, so their actions would no longer be valid instruments. In observed network data, we usually don't know for sure whether the absence of a link indicates that no relationship exists between two people or that a relationship exists but we did not observe it. However, it is possible to simulate network data and then ask, given that we observe an indirect link between two people, how likely is it that they have a direct link?
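One way to see the concern is to take a "true" network, hide some of its links, and count how many apparently intransitive triads in the observed data are actually transitive in the true network. The sketch below does this under a deliberately simple assumption that each true link goes unrecorded independently with the same probability. The names `observe`, `share_of_false_intransitive`, and `miss_prob` are hypothetical, and this error model is an illustration rather than anything estimated from real data.

```python
import random


def observe(true_edges, miss_prob, rng):
    """Keep each true edge unless it is (independently) missed with
    probability miss_prob. This error model is an assumption."""
    return {e for e in true_edges if rng.random() >= miss_prob}


def share_of_false_intransitive(true_edges, observed_edges):
    """Among observed two-paths i -> j -> k with no observed link i -> k,
    return the share for which i -> k does exist in the true network.

    A pair (i, k) is counted once per intermediate node j.
    Returns NaN if the observed data contain no such two-paths.
    """
    out = {}
    for i, j in observed_edges:
        out.setdefault(i, set()).add(j)

    total = hidden = 0
    for i, friends in out.items():
        for j in friends:
            for k in out.get(j, ()):
                if k != i and k not in friends:
                    total += 1
                    hidden += (i, k) in true_edges
    return hidden / total if total else float("nan")


# Toy example: the true triad is transitive, but the link 0 -> 2 was missed.
true_edges = {(0, 1), (1, 2), (0, 2)}
observed_edges = {(0, 1), (1, 2)}
print(share_of_false_intransitive(true_edges, observed_edges))
```

Here the only observed intransitive triad is spurious, so node 2 would wrongly be treated as a valid instrument for node 0.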
I used igraph to generate a series of random graphs to estimate, at least in a few cases, the probability that two nodes have a direct link given that they have a path length of three or less. I used two network models: the (directed) Erdős-Rényi model, in which the connection probability between any two nodes is constant, and the (directed) Barabási-Albert model, in which the connection probability is proportional to the number of links a node already has. The amount of "preferential attachment" is tuned by the power parameter. I examined graphs of different sizes and different link probabilities (for Erdős-Rényi) or powers (for Barabási-Albert). The results are shown in the above figure. Each point represents the average probability of a direct link between two nodes in a graph given a path length of three or less between the two nodes, where the average is taken over 1,000 simulations of the graph.
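For readers who want to try something similar, here is a rough sketch of the Erdős-Rényi half of the exercise in plain Python. It is not the igraph code behind the figure. It assumes that "path length of three or less" means the shortest directed path between an ordered pair of nodes has length one, two, or three, so directly linked pairs are included in the conditioning set. All function names are made up for this example, and the Barabási-Albert case is not covered.

```python
import random
from collections import deque


def directed_er_graph(n, p, rng):
    """Adjacency lists of a directed Erdős-Rényi graph without self-loops:
    each ordered pair (i, j), i != j, is linked with probability p."""
    return [[j for j in range(n) if j != i and rng.random() < p]
            for i in range(n)]


def distances_within(adj, source, max_depth):
    """Breadth-first search from source; returns {node: shortest distance}
    for all nodes reachable in at most max_depth steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_depth:
            continue  # do not expand beyond the depth limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def p_direct_given_close(adj, max_depth=3):
    """Share of ordered pairs at shortest distance 1 among all ordered
    pairs at shortest distance between 1 and max_depth."""
    direct = close = 0
    for s in range(len(adj)):
        for d in distances_within(adj, s, max_depth).values():
            if d >= 1:
                close += 1
                direct += (d == 1)
    return direct / close if close else float("nan")


def simulate(n, p, reps, seed=0):
    """Average the conditional probability over reps random graphs,
    skipping graphs that happen to have no edges at all."""
    rng = random.Random(seed)
    vals = [p_direct_given_close(directed_er_graph(n, p, rng))
            for _ in range(reps)]
    vals = [v for v in vals if v == v]  # drop NaN values
    return sum(vals) / len(vals) if vals else float("nan")


print(simulate(n=50, p=0.05, reps=20))
```

Changing `n` and `p` in the last line gives a quick feel for how the conditional probability moves with network size and sparsity under these assumptions.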
While these results may be sensitive to the particular parameters I've chosen, a few patterns stand out. The size of the network seems to matter less than its sparsity. Unsurprisingly, as tie probabilities increase, the probability of a direct link given that we observe an indirect link (path length three or less) also increases. Perhaps most notable is the "baseline": even when link probabilities are quite low, there is a non-negligible chance that an observed indirect link is actually a direct link (and this is particularly true in the somewhat more realistic Barabási-Albert model).
What do you think: should we worry about measurement error in networks? Do you know of good ways of handling such error?