Why 75% might be an overcount and 1 an undercount, but maybe not

August 25, 2014

Wow, it’s dusty around here. I couldn’t figure out where else to make this point, so came back here to share a quick thought.

Image Source: Wonkblog, http://wapo.st/1qI4oxJ Data Source: Public Religion Research Institute, http://bit.ly/1paM2tR

This story’s been circulating on social media today. The basic punchline is how few non-white friends most whites have (the title comes from the estimated 75% of whites’ friendship networks that include no non-whites and the estimated average of 1 black friend in whites’ networks). It then draws out a number of potential implications of this conclusion for recent reactions to, and interpretations of, events in Ferguson, MO. It’s not those implications that I want to take issue with here. In fact, I have few qualms with that part of the story. That’s in no small part because decades of homophily research wouldn’t question the general thrust of their finding.

However, the method used here is overly simplistic, and shouldn’t be used to estimate these sorts of quantities. Basically what they did is take the “important matters” network name generator and elicit the first 7 people respondents nominated. There’s been a lot of important methodological ink spilled on that data collection strategy, but that’s actually not the issue I have here either. (Let’s assume they’ve dealt with the data collection aspects well, which is potentially a problematic assumption itself, but not, I think, the main limitation of the report on which these estimates are based.) With those responses in hand, what the researchers appear to have done is compute the racial composition of those truncated personal networks, then extrapolate those proportions up to presumed actual network size, or at least to 100-person projections thereof (i.e., percentages).
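To make the concern concrete, here is a minimal Python sketch of that kind of direct extrapolation as I understand it. This is not the report's actual code; the function names and the example respondent are hypothetical. The second function rests on a loud simplifying assumption: that the elicited names are independent random draws from the full network, which real "important matters" nominations are not.

```python
from collections import Counter


def extrapolate_composition(alter_races, scale=100):
    """Scale the racial makeup of a truncated name list up to `scale` people.

    `alter_races` is the list of races of the (at most 7) people a
    respondent named. Hypothetical helper, for illustration only.
    """
    named = len(alter_races)
    if named == 0:
        return {}
    counts = Counter(alter_races)
    return {race: scale * n / named for race, n in counts.items()}


def chance_group_is_missed(true_share, names_elicited=7):
    """Probability that a group making up `true_share` of the full network
    never shows up among the elicited names.

    ASSUMPTION: names are independent random draws from the network.
    """
    return (1 - true_share) ** names_elicited


# Hypothetical respondent who named six white alters and one black alter.
example = extrapolate_composition(["white"] * 6 + ["black"])
print(example)

# A group that is 5% of someone's real network is absent from a
# 7-name list most of the time under the random-draw assumption.
print(chance_group_is_missed(0.05))
```

With only 7 slots, a single nomination moves the projected count by about 14 people out of 100, and under the random-draw assumption a group that makes up 5% of the real network is missed entirely roughly 70% of the time. Both push the projection toward either zero or a large number, with little in between.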

Here’s the thing: truncated friendship lists like that (i.e., just eliciting the first 7 “important matters” partners) have severe problems in estimating actual proportions of events that have highly skewed distributions. This is why a series of strategies collectively known as the “Network Scale-Up Method” (NSUM) were developed. In practice, this isn’t the most common use of the NSUM (which is more often used to estimate the size of hard-to-enumerate populations). But this is something the approach is able to handle quite nicely. What the NSUM basically does is recognize that various dimensions of overdispersed traits can be elicited at once. The estimation requires that you then compare respondents’ reports against traits that have known distributions in the population (e.g., how many people there are of particular races, ages, etc. in the population, not among the elicited names). This allows one to “scale up” from the elicitations on these numerous dimensions to estimate the “size” of someone’s personal network. These corrections could then be used (instead of the direct extrapolation of proportions) to estimate the number of friends of particular characteristics within particular folks’ personal networks of estimated (rather than arbitrarily fixed) size.
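For readers who haven’t seen the scale-up logic before, here is a rough Python sketch of the simplest version, the basic known-population estimator of personal network size. It is one member of the NSUM family, not necessarily the variant anyone would use for this question. All group names, counts, and population sizes below are made up for illustration.

```python
def estimate_network_size(known_counts, known_sizes, total_population):
    """Basic scale-up estimate of one respondent's personal network size.

    known_counts: how many people the respondent reports knowing in each
        reference group (hypothetical survey answers).
    known_sizes: how many people belong to each reference group in the
        whole population (from a census or similar source).
    total_population: size of the whole population.

    The respondent's reported contacts are assumed to make up the same
    fraction of their network as the groups do of the population.
    """
    reported = sum(known_counts[group] for group in known_sizes)
    covered = sum(known_sizes.values())
    return total_population * reported / covered


def share_of_network(reported_in_group, network_size):
    """Share of the estimated network that falls in a group of interest."""
    return reported_in_group / network_size


# Made-up numbers, for illustration only.
total_population = 1_000_000
known_sizes = {"named_michael": 10_000, "postal_workers": 5_000, "twins": 15_000}
known_counts = {"named_michael": 3, "postal_workers": 1, "twins": 2}

size = estimate_network_size(known_counts, known_sizes, total_population)
print(size)                        # estimated personal network size
print(share_of_network(4, size))   # e.g. 4 reported friends in some group
```

The point of the sketch is that network size is estimated per respondent from many reference groups at once rather than fixed at 7 or 100, so the share of friends with a given characteristic is computed against that estimated denominator.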

I don’t know enough about current homophily statistics (paging Matt Brashears, David Schaefer, or Matt Salganik) to suggest whether this approach would give substantially different point estimates than those arrived at in the report above. But I can tell you with certainty that it would give you different error estimates (particularly in their shape) than the direct extrapolation used. Ok, I’ve soap-boxed enough, so I’ll end with the YouTube clip of the Chris Rock bit that the Wonkblog version of this story kicked off with.