how many, indeed?

October 23, 2012

From class to news to research question. So, this morning in class I taught an article using the network scale-up method. It’s a great technique that’s been used to explore a number of interesting questions (e.g., war casualties and HIV/AIDS).

I came back from that class to this article pointing to a debate on voter ID laws, and I couldn’t help but think that there has to be a meaningful way to throw this method at this question to estimate plausible bounds for the actual potential impact of these laws. And furthermore, it seems especially important because people without IDs are likely quite hard to accurately enumerate on their own (as are those who’ve engaged in voter fraud).
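For anyone who hasn’t seen the method, here is a minimal sketch of the basic scale-up estimator in Python. It assumes each respondent reports how many people they know in the hard-to-count group, along with an estimate of their total personal network size. The function name and every number below are made up for illustration. None of it comes from a real survey, and published applications add adjustments that this sketch ignores.

```python
def scale_up_estimate(reported_contacts, network_sizes, population_size):
    """Basic network scale-up estimate of a hidden group's size.

    reported_contacts[i]: how many members of the hidden group respondent i knows
    network_sizes[i]:     estimated size of respondent i's personal network
    population_size:      size of the total population

    The hidden group's share of the population is estimated as the share of
    all reported ties that point to members of the hidden group.
    """
    if len(reported_contacts) != len(network_sizes):
        raise ValueError("need one network size per respondent")
    total_reported = sum(reported_contacts)
    total_network = sum(network_sizes)
    if total_network == 0:
        raise ValueError("total network size must be positive")
    return total_reported * population_size / total_network


# Hypothetical data: three respondents in a population of one million.
reported = [1, 0, 2]      # people each respondent knows without an ID (invented)
sizes = [200, 150, 250]   # each respondent's personal network size (invented)
estimate = scale_up_estimate(reported, sizes, 1_000_000)
print(estimate)
```

The appeal for a question like this one is that nobody has to find or survey the hard-to-count people directly. You only ask a general sample about the people they know.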

So, has this study already been published and i just missed it? Else, does someone have the data we’d need for that? I’m hoping it’s a solved question, as i assume it’s something it would be better to have known a few months ago than a few weeks from now. Anywho, just puzzling over a salient question that linked together some events from my day.

(Cross-posted)

Help / Discussion lists for R packages

May 17, 2011

If you want to learn a methodology, there may be an email list you should be on. The two big network analysis packages in R, Statnet and igraph, each have one (sign up: Statnet, igraph, Mixed Models). If you join them, you can ask questions when you get stuck. But you may end up learning even more from other people’s questions. Jorge M Rocha stimulated Carter Butts to write a mini-essay on exponential random graph models which I received permission to repost. Dave Hunter also adds some thoughts at the bottom.


a (more positive?) nod to Christakis & Fowler

October 7, 2010

Yes, i am aware of my elongated absence from this blog. And i have to plead…well i don’t know what my excuse is, so i’ll just say “howdy all” instead.

A recent article from PLoS One by Christakis and Fowler seems to be getting much less publicity than did their series of papers from the Framingham Heart Study. We’ve talked briefly about some contentions with that work here before.* The thing is that, by my reading, this newest paper is much more compelling and interesting than even the sum of their previous networks-based research.

The new paper is an elegant finding: in essence, we would be better equipped to predict flu epidemics if our estimates were based on surveillance of the nominated friends of a random sample than if they were based on tracking the random sample itself. It is firmly rooted in previous social networks research and a core idea/finding therein, Feld’s 1991 (gated) “Why your friends have more friends than you do.” And perhaps more importantly, it is very clearly and simply potentially useful.
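Feld’s idea is easy to check by hand. Below is a minimal sketch in Python on a made-up five-person friendship network. The edge list and function names are my own illustration, not anything from the paper. It compares the average number of friends per person with the average number of friends held by the people they would nominate.

```python
def build_adjacency(edges):
    """Turn a list of undirected friendship ties into {person: set_of_friends}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def mean_degree(adjacency):
    """Average number of friends per person (what a random sample gives you)."""
    return sum(len(friends) for friends in adjacency.values()) / len(adjacency)


def mean_nominated_friend_degree(adjacency):
    """Expected number of friends held by a randomly nominated friend.

    For each person, average the friend counts of their friends, then
    average that figure over everyone. Assumes nobody is isolated.
    """
    per_person = []
    for friends in adjacency.values():
        friend_degrees = [len(adjacency[f]) for f in friends]
        per_person.append(sum(friend_degrees) / len(friend_degrees))
    return sum(per_person) / len(per_person)


# Hypothetical network: person 0 is popular, and persons 1 and 2 also know each other.
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
adjacency = build_adjacency(EDGES)

print(mean_degree(adjacency))                   # random sample
print(mean_nominated_friend_degree(adjacency))  # their nominated friends
```

In this toy network the nominated friends average more ties than the people nominating them, because the popular person turns up on almost everyone’s list. That over-representation of well-connected people is the property the surveillance idea leans on.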

_________
*Incidentally, while i have been somewhat critical of their FHS work, i ended up trying out their Connected book for my current Intro Sociology class. i’ll have to get back to you on how effective a book it is for those purposes.


Social Network Packages Poll

May 6, 2010

Gabriel Rossman is running it here.


Scaling Social Science

April 6, 2010

A friend at Cloudera recently invited me to write a post for their corporate blog about how social scientists are using large-scale computation.
I’ve been using Hadoop and MapReduce to study some really large datasets this year. I think it’s going to become more and more important and open the world of scientific computing to social scientists. I’m happy to evangelize for it.

One of the ideas that didn’t make its way into the final version is that even though the tools and data are becoming more widely available to laypeople, asking good social science questions, and answering them correctly, is still hard. It’s comparatively easy to ask the wrong question, use the wrong data, draw the wrong inference, and so on, especially if the wrongness is subtle. As an example, I think the OkCupid blog is interesting, but it’s not social science.

Social science has long been concerned with sampling methods precisely because it’s dangerously easy to incorrectly extrapolate findings from a non-representative sample to an entire population. Drawing conclusions from internet-based interactions can be problematic because the sample frame doesn’t match the population of interest. Even though I learned to make a cigar box guitar from Make Magazine, I don’t assume I know that much about acoustic engineering. Likewise, recreational data analysis is fun, illuminating and perhaps suggestive of how our social world works, but one ought not conclude that correlations or trends tell the whole, correct story. However, if exploring and experimenting with data can spark an interest in quantitative analysis of our social world, then I think it’s all for the better.

Link: http://www.cloudera.com/blog/2010/04/scaling-social-science-with-hadoop


Network Analysis Bleg for Help

March 16, 2010

So I’ve been working with the National Longitudinal Study of Adolescent Health (Add Health) for a while but I’ve only recently begun looking at the raw friendship nomination data. I’m hoping that someone can give me some practical advice.

My first question is this: would you recommend using the network or igraph package?

I’m working in R, and I want to create some measures of centrality. I wasn’t planning on doing ERG models or anything else complicated at the moment, just simple stuff. If you want to recommend a different programming environment I’m happy to hear you make your case.


facebook has a soul?*

February 10, 2010

i genuinely hope to get back to more (semi-)regular blogging here soon. But, in the meantime, in case you haven’t seen this one yet – here’s a wild potential data release that may interest some of you. (ht BW)

____
*it’s highly possible i saw this same title in someone else’s mention of this elsewhere today. but if so, i can’t for the life of me recall where.