Estimating Diversity on Facebook
December 17, 2009
Since the earliest days of the internet, along with all of the near-utopian promises of openness, freedom and all that, came questions about how the technology would reproduce social inequality. Styled the “digital divide,” the separation of the public into those who had access to the internet and those who lacked access would advantage the already advantaged, and further isolate the disadvantaged from the social and informational resources needed to participate fully in society.
Eszter Hargittai has done some of the most comprehensive work on this topic, and has tracked with longitudinal surveys a diverse undergraduate population, showing how racial and socioeconomic participation in the internet has changed over time. Even recently, she has shown how racial and socioeconomic differences persist in usage of online social networking sites.
Given her work, I, like many others, was very interested to see a release from Facebook today of a study in which their data science team estimated the racial diversity of their population by statistical analysis of members’ surnames. What has always set Facebook apart from other online services is the use of real names. While I’m “redlog” most everywhere else (Twitter, and so on), I’m “Scott Golder” on Facebook. It’s long been noted that this kind of “real” or “honest” data are immensely useful, but I think this is among the first times it’s been shown exactly why.
In short, Facebook used statistical data from the Census on race-surname mappings to estimate the racial makeup of their user base. For example, if 73% of people named Smith are white, then multiply the number of Smiths by .73 and add that to the number of white people. In the blog post, they describe how this method assumes Facebook users are randomly sampled from the population, and they used a mixture model to correct for this error (though more details on the modeling would’ve been great).
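The naive version of the surname method described above can be sketched in a few lines. This is only an illustration, not Facebook's code: the probability table is made up (apart from the 73% figure for Smith, which echoes the example above), the function name `estimate_counts` is hypothetical, and the sketch leaves out the mixture-model correction entirely.

```python
# Hypothetical surname-to-race probabilities. Real values would come from
# the Census surname tables; these are invented for illustration, except
# that 0.73 for Smith mirrors the example in the post.
SURNAME_PROBS = {
    "Smith": {"white": 0.73, "black": 0.22, "other": 0.05},
    "Garcia": {"white": 0.05, "hispanic": 0.92, "other": 0.03},
    "Nguyen": {"asian": 0.96, "other": 0.04},
}


def estimate_counts(surname_counts, surname_probs):
    """Spread each surname's user count across groups by its probabilities."""
    totals = {}
    for surname, n in surname_counts.items():
        probs = surname_probs.get(surname)
        if probs is None:
            continue  # surnames missing from the table are skipped
        for group, p in probs.items():
            totals[group] = totals.get(group, 0.0) + n * p
    return totals


# Made-up user counts per surname.
users = {"Smith": 1000, "Garcia": 500, "Nguyen": 200}
estimates = estimate_counts(users, SURNAME_PROBS)
total = sum(estimates.values())
for group, count in sorted(estimates.items()):
    print(f"{group:>9}: {count:7.1f} ({count / total:.1%})")
```

Each Smith contributes 0.73 of a person to the white total, 0.22 to the black total, and so on. The estimate is only as good as the assumption that users with a given surname resemble the general population with that surname.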
They present several findings, but I was most interested in the “saturation” plot. Though whites were slightly overrepresented initially, over time this has disappeared. Asians have consistently been overrepresented. Most strikingly, blacks and Hispanics were significantly underrepresented until recently and are only now approaching proportionate representation. This proportionate representation is good news, but I’d caution against thinking of it as evidence that the digital divide is over. Indeed, I’d argue that racial diversity online is less interesting than socioeconomic diversity, which this study doesn’t address.
Originally I was going to talk in this post about how this is an example of why using internet data to do social science is awesome. But it’s actually the opposite — it’s the use of social science data to do internet research. Facebook has cleverly used data generated at great expense by the American public in order to make their own data more valuable. In addition to using Census data, they could use Social Security Administration data on first names to improve the analysis, as well as perhaps the Census’ race atlas data and zip codes.
I’m very happy that Facebook is doing many things that are of interest to social scientists and to the public interest in general. But now that they’ve leveraged public data for their own use, I’d ask them to think about what they can do with their data to help the public in return.
The Gendered “Last-Naming-Gap”
December 16, 2009
Are men more likely to be given the privilege of being identified by last name only? Thorkelson writes another thoughtful post with a literary quality I dare not try to emulate. You should really read his post because I’m not summarizing it. I’m just replying to one little piece of it, and then explaining how one would approach this question using regression. The rest of my post also appears as a comment at Eli’s blog and I encourage readers to make comments on both sites:
Thorkelson, having read your post, I find the basic claim about men being privileged by the practice of identifying them using solely their last name to be very plausible, e.g. “Sartre” vs. “Simone de Beauvoir.” This is not something I had thought about much before either, and I agree that we should prefer that we had already observed and considered this issue without someone else bringing it to our attention.
That said, I don’t think you should chastise yourself for some skepticism. When someone makes a claim, we should ALWAYS be skeptical. Yes, we should be on the lookout for motivated skepticism. But I’d prefer that we increase skepticism of claims that justify our high status rather than decrease skepticism of claims that we benefit from unearned privilege.
As a quantitative sociologist, I think we could learn a great deal with a statistical approach. I can think of a lot of things I’d want to consider, but one of them would be to account for the prominence of the professor. More prominent professors are probably more likely to be referred to by last name only. More prominent professors are more likely to be men. What happens to the pattern when google hits and citations are accounted for? Perhaps naming is also related to behaviors/personality and behaviors/personality differ across gender.
At the risk of being pedantic I’ll give a very simple explanation of how we’d approach this problem using regression. First we identify our outcome of interest, e.g. the percentage of the time someone is referred to by last name only, first name only, or full name. Then we identify things that might predict this outcome, e.g. gender, citations, field, popularity of name, personality, etc. We run our data through the machinery of regression (important details omitted) and this gives us an equation for predicting the outcome. The “Last-Naming-Gap” is equal to the coefficient on the gender predictor. This will vary depending on which other predictors we include. If we include the right variables, the coefficient may go to zero, in which case we might say we “explained” the “last-naming-gap.” Note this would NOT disprove discrimination. For example, it is quite possible that the gender differences in citations that we are using to “explain” the last-naming-gap are themselves partially the result of gender bias. And a bias that results in citation differences would be more obviously consequential than a bias that results in a different use of language. Anyway, thanks for the interesting post Eli.
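To make the regression logic concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data are synthetic, the outcome is a standardized stand-in for the last-name-only rate, and the simulation is rigged so that gender affects the outcome only through citations. The helper `ols` is a hypothetical hand-rolled least-squares solver, not a library call.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Synthetic world: men have more citations on average, and the outcome
# depends on citations only, with no direct gender effect.
male = [rng.randint(0, 1) for _ in range(n)]
citations = [1.0 * m + rng.gauss(0, 1) for m in male]
last_name_rate = [0.5 * c + rng.gauss(0, 1) for c in citations]

# Model 1: outcome ~ intercept + gender.
raw = ols([[1.0, m] for m in male], last_name_rate)
# Model 2: outcome ~ intercept + gender + citations.
controlled = ols([[1.0, m, c] for m, c in zip(male, citations)], last_name_rate)

gap_raw = raw[1]                # gender coefficient without controls
gap_controlled = controlled[1]  # gender coefficient holding citations fixed
print(f"gap without controls:  {gap_raw:.3f}")
print(f"gap with citations in: {gap_controlled:.3f}")
```

In this rigged simulation the gender coefficient shrinks toward zero once citations enter the model. As noted above, that would not rule out discrimination: the citation gap built into the simulation could itself be the product of bias.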
Free Textbooks on Networks
December 10, 2009
David Easley and Jon Kleinberg have a textbook coming out called Networks, Crowds, and Markets: Reasoning About a Highly Connected World. It looks great, and for now you can download a preprint of the whole thing for free. Cornell, home of founding co-blogger Matthew Brashears, seems like a great place to do work on networks.
Robert Hanneman (with coauthors Riddle and Izquierdo) also has a free textbook or three for you to download. I won’t try to summarize any of these books since you are just a click away from viewing them, but I will point out that they aren’t competitors… they each have a lot of unique material.
See more discussion of social network curriculum/pedagogy at Jimi’s post here.
Sethi on Insights from Ecology
December 8, 2009
Economist Rajiv Sethi has a great blog. In this post, Sethi, and Thoma, whom he quotes, seem to acknowledge that the financial crisis should lead them to consider new ideas for economic models. Later on in the post, Sethi points out that behavioral economics has mined psychology for insights, but that economists would do well to look beyond the level of the individual:
If one is to look beyond economics for metaphors and models, why stop at psychology? For financial market behavior, a more appropriate discipline might be evolutionary ecology. This is not a new idea. Consider, for instance, this recent article in Nature. Or take a look at the chapter on “The Ecology of Markets” in Victor Niederhoffer’s extraordinary memoir. Or study Hyman Minsky’s financial instability hypothesis (discussed at some length in an earlier post), which depends explicitly on the assumption that aggressive financial practices are rapidly replicated during periods of stable growth, eventually becoming so widespread that systemic stability is put at risk. To my mind this reflects an ecological rather than psychological understanding of financial market behavior.
Reading people like Sethi, I’m confident economics will come around. Sociologists have never overemphasized rational actors, but we too can learn from approaches in other disciplines like ecology.
couldn’t they at least have faked it?
December 7, 2009
Does P=NP?
December 5, 2009
If P=NP, then the world would be a profoundly different place than we usually assume it to be. There would be no special value in “creative leaps,” no fundamental gap between solving a problem and recognizing the solution once it’s found. Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett. It’s possible to put the point in Darwinian terms: if this is the sort of universe we inhabited, why wouldn’t we already have evolved to take advantage of it? – Scott Aaronson (reason #9)
When you have some free time, watch this amazing lecture by Avi Wigderson about one of the great open problems in all of mathematics.
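The gap between finding a solution and recognizing one, which the quote turns on, can be made concrete with subset sum, a standard NP-complete problem. The sketch below is my own illustration, not taken from Aaronson or Wigderson; the function names and the example numbers are made up. Checking a proposed subset takes one quick pass, while the search shown here simply tries every subset, which takes exponential time in the worst case. Whether some fundamentally faster search must exist is exactly what P=NP asks.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed answer: is it drawn from numbers and does it hit target?"""
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)  # each available number may be used only once
    return sum(candidate) == target


def search(numbers, target):
    """Brute-force search: try every non-empty subset, smallest first."""
    for size in range(1, len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None  # no subset sums to target


numbers = [3, 34, 4, 12, 5, 2]
solution = search(numbers, 9)          # may examine up to 2**6 - 1 subsets
print(solution, verify(numbers, 9, solution))
```

With six numbers the search is instant, but each added number doubles the subsets it may have to try, while the cost of `verify` barely changes.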
Preparing the next generation.
December 4, 2009
If you’re a regular reader or contributor to this blog, you probably agree that mathematics has an indispensable role in the social sciences. Lately, however, I’ve been thinking a lot about something: what kind of mathematical tools do sociologists of the future require?
The reason I’ve been thinking about this is a talented undergraduate who is strongly considering going to grad school in sociology. Interestingly, part of what has moved him in that direction seems to have been two classes he’s taken with me. The first was the required undergraduate statistics class and the second a substantive class that includes a lot of network analysis, structural theory, and associated material. As it turns out, it’s entirely possible to teach Mayhew & Levinger to undergraduates. Who knew? In any case, in this second class he’s gotten a strong sense that sociology involves a lot of math and this excites him. I’m all for it, since this student is very smart and we need more mathematically gifted grad students. Yesterday during a conversation, though, he asked me what sorts of math he should be thinking about taking as he prepares for graduate school. I gave him my answers, and a regression analysis textbook to work through during winter break, but I wonder what others think.
If you could somehow start over again in the field, what areas of mathematics would you make sure you learned right from the start?
What Protestant Ethic?
December 2, 2009
Davide Cantoni of Harvard Economics offers this job market paper:
The Economic Effects of the Protestant Reformation: Testing the Weber Hypothesis in the German Lands [pdf]
Many theories, most famously Max Weber’s essay on the ‘Protestant ethic,’ have hypothesized that Protestantism should have favored economic development. With their considerable religious heterogeneity and stability of denominational affiliations until the 19th century, the German Lands of the Holy Roman Empire present an ideal testing ground for this hypothesis. Using population figures in a dataset comprising 276 cities in the years 1300-1900, I find no effects of Protestantism on economic growth. The finding is robust to the inclusion of a variety of controls, and does not appear to depend on data selection or small sample size. In addition, Protestantism has no effect when interacted with other likely determinants of economic development. I also analyze the endogeneity of religious choice; instrumental variables estimates of the effects of Protestantism are similar to the OLS results.
Hat Tip to Tyler Cowen.