late blooming sociology?

November 16, 2014

You may have seen Kieran Healy’s post about the most cited papers in sociology by decade. Pretty interesting to noodle over, but it’s hard to come up with a clear account of what drives the primary trends observed there (as Kieran summarizes – increasing diversity of sources and the declining dominance of AJS/ASR). His conjecture is that the rise of methodological contributions in a broad field can potentially account for both. Interesting possibility, but my immediate reaction was something different.

Since Kieran doesn’t allow comments on his personal blog, and he hasn’t cross-posted this one at orgtheory or elsewhere, I thought I’d once again dust off my login here to offer an alternative hypothesis.

In addition to the primary trends he notes, I’d add one more, which isn’t at all surprising: the number of citations received by those at the top is much higher in older periods than in more recent ones. I say not at all surprising because that’s something we’d (likely) mostly expect by age. But what it led me to also notice is that the older “stars,” in addition to being substantially more cited, are in generalist journals and are more “substantively” or theoretically oriented. So this leads to my hypothesis – general contributions of theory and/or substance simply diffuse more slowly than methodological contributions do, but once they take hold, they plateau at (much?) higher levels.

So, to sum up – perhaps these noted trends are the result of two simultaneous trajectories:

  1. Methodological papers gain momentum more quickly, but ultimately top out at lower levels of general prominence than substantive/theoretical papers.
  2. Primarily theoretical papers take longer to catch hold, but once they do, their potential prominence is wider reaching than that of methodological papers.

This also might suggest that AJS/ASR papers from recent decades could eventually overtake the top of the list; we just haven’t settled collectively yet on which one(s) those will be. It’s probably also worth noting that this isn’t necessarily in contradiction to Kieran’s speculation, but this was the first story I came up with when looking at the trends he presented.

Maybe nothing, but that’s my $0.015 on those data (not sure it’s quite two whole cents).


doing something about publication bias

September 15, 2014

An interesting idea came up in an article I was reading for a class, and I thought I’d post about it here to see if this is a more standard approach than I’d realized. The article in question is a meta-analysis of smoking cessation effects on mental health outcomes published in BMJ. In one part of the article they acknowledge the potential for publication bias to shape the results in the published literature. So they came up with a plan:

“In some studies, data on mental health were presented incidentally and the aim was to report on other data. In others, the aim of the report was to present data on change in mental health, therefore the decision to publish might have been contingent on the results. We compared effect estimates between studies in which mental health was the primary outcome and those in which it was not to assess if there was evidence of publication bias.” (Taylor et al. 2014, p. 3)

This seems a potentially intriguing way to deal with publication bias, but it’s not one I’ve seen before. So my question is a relatively simple one – is it a common approach? And one with many evaluated strengths/benefits?
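
For readers who want to see what such a comparison could look like in practice, here is a minimal sketch. It assumes fixed-effect (inverse-variance) pooling within each subgroup and a simple z-test on the difference between the two pooled estimates; that is my assumption, not necessarily what Taylor et al. did. The effect sizes and standard errors are made up for illustration and are not taken from the paper.

```python
import math


def pooled_estimate(effects, std_errors):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


def subgroup_difference(group_a, group_b):
    """Difference between two pooled estimates, with z statistic and two-sided p-value."""
    est_a, se_a = pooled_estimate(*group_a)
    est_b, se_b = pooled_estimate(*group_b)
    diff = est_a - est_b
    z = diff / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, z, p


# Hypothetical (effects, standard errors); NOT data from the article.
primary_outcome = ([-0.40, -0.35, -0.50], [0.10, 0.12, 0.15])
incidental_outcome = ([-0.30, -0.25, -0.38], [0.11, 0.09, 0.14])

diff, z, p = subgroup_difference(primary_outcome, incidental_outcome)
print(f"difference={diff:.3f}, z={z:.2f}, p={p:.3f}")
```

A noticeably larger pooled effect among studies where mental health was the primary outcome would be consistent with (though not proof of) publication bias.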

Why 75% might be an overcount and 1 an undercount, but maybe not

August 25, 2014

Wow, it’s dusty around here. I couldn’t figure out where else to make this point, so came back here to share a quick thought.

Image Source: Wonkblog; Data Source: Public Religion Research Institute

This story’s been circulating on social media today. The basic punchline is how few non-white friends most whites have (the title comes from the estimated 75% of whites’ friendship networks that have no non-whites and the estimated 1 average black friend in whites’ networks). It then interprets a lot of the potential implications of this conclusion for recent reactions to/interpretations of events in Ferguson, MO. It’s not those implications that I want to take issue with here. In fact, I have few qualms with that part of the story. That’s in no small part because decades of homophily research wouldn’t question the general thrust of their finding.

However, the method used here is overly simplistic, and shouldn’t be used to estimate these sorts of questions. Basically what they did is take the “important matters” network name generator and elicit the first 7 people respondents nominated. There’s been a lot of important methodological ink spilled on that data collection strategy, but that’s actually not the issue I have here either. (Let’s assume they’ve dealt with the data collection aspects well, which is potentially a problematic assumption itself, but I don’t think it’s the main limitation of the report on which these estimates are based.) With those responses in hand, what the researchers appear to have done is basically compute the racial composition of those truncated personal networks, then extrapolate those proportions up to presumed actual network size, or at least 100-person projections thereof (i.e., percentages).

Here’s the thing: truncated friendship lists like that (i.e., just eliciting the first 7 “important matters” partners) have severe problems in estimating actual proportions of events that have highly skewed distributions. This is why a series of strategies collectively known as the “Network Scale-Up Method” (NSUM) were developed. In practice, this isn’t the most common use of the NSUM (which is more often used to estimate the size of hard-to-enumerate populations). But this is something the approach is able to handle quite nicely. What the NSUM basically does is recognize that various dimensions of overdispersed traits can be elicited at once. The estimation then requires comparing respondents’ reports for groups whose sizes are known in the population (e.g., how many people there are of particular races, ages, etc. in the population – not among the elicited names). This allows one to “scale up” from the elicitations on these numerous dimensions to estimate the “size” of someone’s personal network. These corrections could then be used (instead of the direct extrapolation of proportions) to estimate the number of friends of particular characteristics within particular folks’ personal networks of estimated (rather than arbitrarily fixed) size.
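
To make the “scale up” step concrete, here is a minimal sketch of the basic ratio-of-sums scale-up estimator of personal network size. The function names, group sizes, and respondent counts are hypothetical, chosen only for illustration; real NSUM applications use more careful models and corrections than this.

```python
def estimate_network_size(known_counts, known_sizes, population_size):
    """Basic scale-up estimate of one respondent's personal network size.

    known_counts[k]: how many people the respondent reports knowing in group k
    known_sizes[k]:  size of group k in the whole population (known externally)
    """
    return population_size * sum(known_counts) / sum(known_sizes)


def estimated_share(count_in_group, network_size):
    """Share of the estimated personal network that falls in a given group."""
    return count_in_group / network_size


# Hypothetical numbers: a population of 1,000,000 and three groups of known size.
population = 1_000_000
group_sizes = [10_000, 20_000, 20_000]
respondent_counts = [2, 5, 3]  # people this respondent knows in each group

size = estimate_network_size(respondent_counts, group_sizes, population)
print(size)                      # estimated personal network size
print(estimated_share(4, size))  # e.g., 4 reported alters in some group of interest
```

The point of the sketch is only that the denominator is an estimated network size rather than a fixed list of 7 names.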

I don’t know enough about current homophily statistics (paging Matt Brashears, David Schaefer, or Matt Salganik) to suggest whether this approach would give substantially different point estimates than those arrived at in the report above. But I can tell you with certainty that it would give you different error estimates (particularly the shape of them) than would the direct extrapolation used. Ok, I’ve soap-boxed enough, so I’ll end with the YouTube clip of the Chris Rock bit that the Wonkblog version of this story kicked off with.

On the relationship between social cohesion and structural holes

October 28, 2013

In a continuing series “Highlights of SOCNET” I offer you Vincenzo Nicosia’s email summarizing his cool recently published work: 

In a recent work appeared in Journal of Statistical Physics:
V. Latora, V. Nicosia, P. Panzarasa “Social cohesion, structural
holes, and a tale of two measures”, J. Stat. Phys. 151 (3-4), 745
(2013). (Arxiv version)

We have proved that node degree (k_i), effective size (S_i) and
clustering (C_i) are indeed connected by the simple functional

S_i = k_i - (k_i - 1)C_i

This means that effective size and clustering indeed provide similar
information (even if not exactly the same kind of information), and
they should not be used together in multivariate regression models,
since they tend to be collinear.

In that paper we also build on this relationship to define a measure
of Simmelian brokerage, aiming at quantifying the extent to which a
node acts as a broker among two or more cohesive groups which would
otherwise be disconnected.
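
The identity is easy to check numerically. Here is a minimal sketch, assuming binary undirected ties, where effective size reduces to k - 2t/k (t being the number of ties among ego’s neighbours) and local clustering is 2t/(k(k-1)). The toy graph and the function name are mine, not from the paper.

```python
from itertools import combinations


def degree_clustering_effective_size(adj, node):
    """Return (k, C, S) for `node` in an undirected, unweighted graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    nbrs = adj[node]
    k = len(nbrs)
    # t = number of ties among ego's neighbours (ties to ego excluded)
    t = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    clustering = 2 * t / (k * (k - 1)) if k > 1 else 0.0
    # Effective size for binary undirected ties: k - 2t/k
    effective = k - 2 * t / k if k > 0 else 0.0
    return k, clustering, effective


# Toy graph: ego "a" knows b, c, d; b and c also know each other.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

k, C, S = degree_clustering_effective_size(adj, "a")
print(k, C, S)
print(abs(S - (k - (k - 1) * C)) < 1e-12)  # does S_i = k_i - (k_i - 1)C_i hold?
```

Because S is a deterministic function of k and C, putting all three in the same regression invites exactly the collinearity problem the email warns about.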

Which R packages are good for what social network analysis?

October 8, 2013

Newbies to social network analysis in R should check out this great concise description from Michal Bojanowski on the SOCNET email list.  He writes:

There are two main R packages that provide facilities to store, manipulate and visualize network data. These are “network” and “igraph”. Technically speaking each package provides a specialized class of R data objects for storing network data plus additional functions to manipulate and visualize them. Each package has its relative strengths and weaknesses, but by and large you can do most basic network data operations and visualizations in both packages equally easily. Moreover, you can convert network data objects from “network” to “igraph” or vice versa with functions from the “intergraph” package.

Calculating basic network statistics (degree, centrality, etc.) is possible for both types of objects. For “igraph” objects, functions for these purposes are contained in “igraph” itself. For “network” objects, most of the classical SNA routines are contained in the “sna” package.

Community detection algorithms (e.g. Newman-Girvan) are available only in the “igraph” package.

“Fancier things”, especially statistical models for networks (ERGMs etc.) are available in various packages that were built around the “network” package and jointly constitute the “statnet” suite. There is also the “tnet” package with some more routines for, among other things, two-mode networks, which borrows from both the “network” and “igraph” worlds. And of course there is RSiena for estimating actor-oriented models of network dynamics, which is not related to either “network” or “igraph”.

As for matrix algebra, it is obviously available within R itself.

My recommendation would be to have a look at both “igraph” and “network” and pick the one which seems easier to you as far as manipulating and visualizing networks is concerned. Have a look at the documentation of these packages and at tutorials, e.g.:

- statnet website
- igraph homepage
- R labs by McFarland et al
- Slides and scripts to my Sunbelt workshop

It does not really matter whether you pick “igraph” or “network” as you can always convert your network to the other class with the ‘asIgraph’ or ‘asNetwork’ functions from the “intergraph” package and take advantage of the functions available in the “other world”.

Check out more of Michal’s helpful contributions at his blog:

forecasting poorly

March 23, 2013

(moderately tweaked excerpt from here)

How hard would it be to get ALL of the first round games in the NCAA men’s basketball tournament wrong? I mean, that would be pretty tough, right? Given that among the multiple millions of brackets submitted to ESPN this year, none got all the first round games right, it would seem hard to do the inverse too, right? So i’m thinking that next year i organize the “anti-confidence” NCAA pool. Instead of gaining points for every game you correctly predict, it’ll consist of losing points for every game you get right. I.e., your aim will be to incorrectly pick as many games as possible. It would seem easy to incorrectly pick the champ, final-four and even the elite 8. But my hunch is that people would even struggle to get all Sweet 16 teams wrong (see e.g., this year’s Kansas State, Wisconsin, La Salle, Ole Miss “pod”), and missing every team making the round of 32 would be almost impossible.

I think we’re going to have to put this to the test. Something like -1 point for every first round game right, -2 for round 2, -4 for sweet 16, -8 for elite 8, -16 for final 4 picks, -32 for final 4 winners and -64 for getting the champ right. Highest score (closest to zero) wins. How poorly do you think you could do?
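
For anyone who wants to tally a bracket under this scheme, here is a minimal sketch. The level labels and the input format (number of correct picks per level) are my own assumptions for illustration; only the penalty values come from the rules above.

```python
# Penalty per correct pick at each level, as proposed above.
PENALTIES = {
    "first round": 1,
    "round 2": 2,
    "sweet 16": 4,
    "elite 8": 8,
    "final 4 picks": 16,
    "final 4 winners": 32,
    "champ": 64,
}


def anti_confidence_score(correct_picks):
    """Total score given a dict of {level: number of correct picks}.

    Every correct pick costs points, so the best possible score is 0.
    """
    return -sum(PENALTIES[level] * n for level, n in correct_picks.items())


# Hypothetical bracket: 10 first round games right, 3 in round 2, plus the champ.
print(anti_confidence_score({"first round": 10, "round 2": 3, "champ": 1}))  # -80
```

Note how a single correct champion pick costs more than getting dozens of early games right, so the safest strategy is to send a long shot to the title.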

a case for single-blind review

January 23, 2013

(Cross posted from here)
When i was in grad school, at one of the academic meetings i regularly participate in, it became regular fare for 2 particular folks in my circles to engage in a prolonged debate about how we should overhaul the academic publishing system. This was so regular (i recall them having portions of this debate for 3 consecutive years over dinner) that the grad students in the bunch thought of this as a grenade in our back pockets we could toss into the fray if ever conversations took an unwelcome turn to the boring. I bring this up because there are lots of aspects of this process that i have quite a few thoughts on, but have never really formalized them too much more than is required for such elongated dinner conversations. And one particular aspect of that was raised on Facebook yesterday by a colleague – asking about the merits of single blind review. I started my answer there, but wanted to engage this a little more fully. So, i’m going to start a series of posts (not sure how many there will be at this point) on the publication/review process here, that i think could be interesting discussions. I hope others will chime in with opinions, questions, etc. These posts will likely be slightly longer than typical fare around here. I expect that some of my thoughts on these will be much more formulated than others.

So, let’s start with a case for single blind review. I think there are quite a few merits to single blind review (for a few other takes, see here and here). I won’t presume to cover them all here, but i will get a start. Feel free to add others, or tell me i’m completely off my rocker in the comments.

Neal Caren is on github, replication in social science!

December 11, 2012

I’m passionate about open-source science, so I had to give Big Ups to Neal Caren, who I just learned is sharing code on github. His latest offering essentially replicates the Mark Regnerus study of children whose parents had same-sex relationships. The writeup of this exercise is at Scatterplot.

My previous posts on github and sharing code are here and here.  If you’re on github, follow me.

Statistical Teaching (bleg)

November 20, 2012

Ok, in my research methods class, we are hitting an overview of statistics in the closing weeks of the semester. As such, i would prefer to include some empirical examples to visualize the things we’re going to talk about that are fun / outside my typical wheelhouse. So, do you have any favorite (read: typical, atypical, surprising, bizarre, differentially distributed, etc.) examples of univariate distributions and/or bivariate associations that may “stick” in their memories when they see them presented visually? I have plenty of “standard” examples i could draw from, but they’re likely bored with the ones i think of first by this point in the term. So, what are yours? It’s fine if you just have the numbers, i can convert them to visualizations, but if you have visual pointers, all the better.

(cross posted)

how many, indeed?

October 23, 2012

From class to news to research question. So, this morning in class I taught an article using the network scale-up method. It’s a great technique that’s been used to explore a number of interesting questions (e.g., war casualties, and HIV/AIDS).

I came back from that class to this article pointing to a debate on voter ID laws, and I couldn’t help but think that there has to be a meaningful way to throw this method at this question to estimate plausible bounds for the actual potential impact of these laws. And furthermore, it seems especially important because people without IDs are likely quite hard to accurately enumerate on their own (as are those who’ve engaged in voter fraud).

So, has this study already been published and i just missed it? Else, does someone have the data we’d need for that? I’m hoping it’s a solved question, as i assume it’s something it would be better to have known a few months ago than a few weeks from now. Anywho, just puzzling over a salient question that linked together some events from my day.