Share your code! (Here is some for Add Health)

March 26, 2012

The National Longitudinal Study of Adolescent Health, aka Add Health, has been in use for more than a decade.  Thousands of researchers have used it.  This is fantastic.  There are great economies of scale in the data collection.

Sadly, we researchers have wasted years doing things that others have already done. Anyone beginning a new project must first clean their data.  Add Health doesn’t require as much cleaning as some other, messier sources of data, thanks to people like Joyce Tabor, James Moody, Ken Frank, and many others.  Still, I think research would be sped up quite a lot, and communication greatly enhanced, if people shared their code more widely.  Therefore I’ve created my first GitHub code repository, which prepares the variables from the widely used in-school questionnaire portion of Add Health.

This will be of most use to people using R, but the data could be exported.  The script also includes cross tabulations and fairly detailed comments which I hope will help people think about the data.  Some time soon I’ll upload more code.

I recommend Jeremy Freese on reproducibility in sociological research here and here.  Andy Abbott’s best objections don’t apply to a widely used data source like Add Health.

p.s.  Do share links to other code repositories in the comments!


Discussions of Math Soc

March 25, 2012

Math soc discussions happening elsewhere in the blogosphere.

Pam starts a discussion of “controversies” in mathematical sociology at Scatterplot.

Fabio is prompted to write about “Transparency vs. Truth.”  For the record, I think Fabio’s reference to “accurate” models may be a bit misleading for some.  I’m not sure any math models deserve to be called “accurate” ;p  But I guess I lean towards favoring the transparent side of that debate.

And even goes on to call for a sort of open-science collaboration to develop a Schelling model of segregation which considers the influence of topography.  I think it’s a great idea!