On the relationship between social cohesion and structural holes

October 28, 2013

In a continuing series “Highlights of SOCNET” I offer you Vincenzo Nicosia’s email summarizing his cool recently published work: 

In a recent work that appeared in the Journal of Statistical Physics:
V. Latora, V. Nicosia, P. Panzarasa “Social cohesion, structural
holes, and a tale of two measures”, J. Stat. Phys. 151 (3-4), 745
(2013). (Arxiv version)

We have proved that node degree (k_i), effective size (S_i) and
clustering (C_i) are indeed connected by the simple functional relation

S_i = k_i - (k_i - 1)C_i

This means that effective size and clustering indeed provide similar
information (even if not exactly the same kind of information), and
they should not be used together in multivariate regression models,
since they tend to be collinear.

In that paper we also build on this relationship to define a measure
of Simmelian brokerage, aiming at quantifying the extent to which a
node acts as a broker among two or more cohesive groups which would
otherwise be disconnected.
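The identity is easy to check numerically. Here is a minimal pure-Python sketch on a toy graph of my own (not from the paper), computing degree, local clustering, and Burt's effective size for an unweighted, undirected network, then verifying S_i = k_i - (k_i - 1)C_i at every node:

```python
from itertools import combinations

# Toy undirected graph as an adjacency dict (hypothetical example).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def degree(i):
    return len(adj[i])

def clustering(i):
    """Local clustering coefficient C_i."""
    k = degree(i)
    if k < 2:
        return 0.0
    # Number of ties among i's neighbors.
    ties = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * ties / (k * (k - 1))

def effective_size(i):
    """Burt's effective size S_i; for an unweighted, undirected graph it
    reduces to k_i minus the mean number of ties each alter has to the
    other alters, i.e. k_i - 2t_i/k_i."""
    k = degree(i)
    if k == 0:
        return 0.0
    ties = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return k - 2 * ties / k

for i in adj:
    k, C, S = degree(i), clustering(i), effective_size(i)
    assert abs(S - (k - (k - 1) * C)) < 1e-12  # the identity holds
    print(i, k, round(C, 3), round(S, 3))
```

The collinearity warning follows directly: once k_i is in the model, S_i is an exact linear transformation of C_i.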

Which R packages are good for what social network analysis?

October 8, 2013

Newbies to social network analysis in R should check out this great concise description from Michal Bojanowski on the SOCNET email list.  He writes:

There are two main R packages that provide facilities to store, manipulate and visualize network data. These are "network" and "igraph". Technically speaking, each package provides a specialized class of R data objects for storing network data plus additional functions to manipulate and visualize them. Each package has its relative strengths and weaknesses, but by and large you can do most basic network data operations and visualizations in both packages equally easily. Moreover, you can convert network data objects from "network" to "igraph" or vice versa with functions from the "intergraph" package.

Calculating basic network statistics (degree, centrality, etc.) is possible for both types of objects. For "igraph" objects, functions for these purposes are contained in "igraph" itself. For "network" objects, most of the classical SNA routines are contained in the "sna" package.

Community detection algorithms (e.g. Newman-Girvan) are available only in the "igraph" package.

"Fancier things", especially statistical models for networks (ERGMs etc.), are available in various packages that were built around the "network" package and jointly constitute the "statnet" suite (http://www.statnet.org/). There is also the "tnet" package, with some more routines for, among other things, two-mode networks, which borrows from both the "network" and "igraph" worlds. And of course there is RSiena for estimating actor-oriented models of network dynamics, which is not related to either "network" or "igraph".

As for matrix algebra, it is obviously available within R itself.

My recommendation would be to have a look at both "igraph" and "network" and pick the one which seems easier to you as far as manipulating and visualizing networks is concerned. Have a look at the documentation of these packages (e.g. on http://www.rdocumentation.org/) and at tutorials, e.g.:

- statnet website (http://www.statnet.org/)
- igraph homepage (http://igraph.sourceforge.net/)
- R labs by McFarland et al (http://sna.stanford.edu/rlabs.php)
- Slides and scripts from my Sunbelt workshop (http://www.bojanorama.pl/snar:start)

It does not really matter whether you pick "igraph" or "network", as you can always convert your network to the other class with the 'asIgraph' or 'asNetwork' functions from the "intergraph" package and take advantage of the functions available in the "other world".

Check out more of Michal’s helpful contributions at his blog: http://bc.bojanorama.pl/

forecasting poorly

March 23, 2013

(moderately tweaked excerpt from here)

How hard would it be to get ALL of the first round games in the NCAA men’s basketball tournament wrong? I mean, that would be pretty tough, right? Given that among the multiple millions of brackets submitted to ESPN this year, none got all the first round games right, it would seem hard to do the inverse too, right? So i’m thinking that next year i organize the “anti-confidence” NCAA pool. Instead of gaining points for every game you correctly predict, it’ll consist of losing points for every game you get right. I.e., your aim will be to incorrectly pick as many games as possible. It would seem easy to incorrectly pick the champ, final-four and even the elite 8. But my hunch is that people would even struggle to get all Sweet 16 teams wrong (see e.g., this year’s Kansas State, Wisconsin, La Salle, Ole Miss “pod”), and missing every team making the round of 32 would be almost impossible.

I think we’re going to have to put this to the test. Something like -1 point for every first round game right, -2 for round 2, -4 for sweet 16, -8 for elite 8, -16 for final 4 picks, -32 for final 4 winners and -64 for getting the champ right. Highest score (closest to zero) wins. How poorly do you think you could do?

a case for single-blind review

January 23, 2013

(Cross posted from here)
When i was in grad school, at one of the academic meetings i regularly participate in, it became regular fare for 2 particular folks in my circles to engage in a prolonged debate about how we should overhaul the academic publishing system. This was so regular (i recall them having portions of this debate for 3 consecutive years over dinner) that the grad students in the bunch thought of this as a grenade in our back pockets we could toss into the fray if ever conversations took an unwelcome turn to the boring. I bring this up because there are lots of aspects of this process that i have quite a few thoughts on, but have never really formalized them too much more than is required for such elongated dinner conversations. And one particular aspect of that was raised on Facebook yesterday by a colleague – asking about the merits of single blind review. I started my answer there, but wanted to engage this a little more fully. So, i’m going to start a series of posts (not sure how many there will be at this point) on the publication/review process here, that i think could be interesting discussions. I hope others will chime in with opinions, questions, etc. These posts will likely be slightly longer than typical fare around here. I expect that some of my thoughts on these will be much more formulated than others.

So, let’s start with a case for single blind review. I think there are quite a few merits to single blind review (for a few other takes, see here and here). I won’t presume to cover them all here, but i will get a start. Feel free to add others, or tell me i’m completely off my rocker in the comments.

Neal Caren is on github, replication in social science!

December 11, 2012

I’m passionate about open-source science, so I had to give Big Ups to Neal Caren, who I just learned is sharing code on github. His latest offering essentially replicates the Mark Regnerus study of children whose parents had same-sex relationships. The writeup of this exercise is at Scatterplot.

My previous posts on github and sharing code are here and here.  If you’re on github, follow me.

Statistical Teaching (bleg)

November 20, 2012

Ok, in my research methods class, we are hitting an overview of statistics in the closing weeks of the semester. As such, i would prefer to include some empirical examples to visualize the things we’re going to talk about that are fun / outside my typical wheelhouse. So, do you have any favorite (read: typical, atypical, surprising, bizarre, differentially distributed, etc.) examples of univariate distributions and/or bivariate associations that may “stick” in their memories when they see them presented visually? I have plenty of “standard” examples i could draw from, but they’re likely bored with the ones i think of first by this point in the term. So, what are yours? It’s fine if you just have the numbers, i can convert them to visualizations, but if you have visual pointers, all the better.
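One standby for the bivariate case is Anscombe's quartet: four x-y datasets with essentially identical means, variances, and correlations, but wildly different scatterplots. A quick Python check of the summary stats (the plots are where the pedagogical payoff is):

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
xs = [x123, x123, x123, x4]
ys = [
    [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
]

def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

for i, (x, y) in enumerate(zip(xs, ys), 1):
    print(f"set {i}: mean(x)={mean(x):.2f} mean(y)={mean(y):.2f} "
          f"var(x)={variance(x):.2f} r={corr(x, y):.3f}")
```

All four sets print mean(x) = 9.00, mean(y) = 7.50, var(x) = 11.00, and r of roughly 0.816, yet when plotted one is linear, one is curved, and two are dominated by a single outlier.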

(cross posted)

how many, indeed?

October 23, 2012

From class to news to research question. So, this morning in class I taught an article using the network scale-up method. It’s a great technique that’s been used to explore a number of interesting questions (e.g., war casualties and HIV/AIDS).

I came back from that class to this article pointing to a debate on voter ID laws, and I couldn’t help but think that there has to be a meaningful way to throw this method at this question to estimate plausible bounds for the actual potential impact of these laws. And furthermore, it seems especially important because people without IDs are likely quite hard to accurately enumerate on their own (as are those who’ve engaged in voter fraud).
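For reference, the core scale-up estimator is simple: if each respondent reports knowing y_i members of the hidden population out of a personal network of (estimated) size d_i, the hidden population size is roughly N times the ratio of the sums. A hedged sketch with entirely made-up numbers (the real method also needs the known-populations step to estimate the d_i in the first place):

```python
def scale_up_estimate(y, d, N):
    """Basic network scale-up estimator.
    y: reported counts of hidden-population members known by each respondent
    d: estimated personal network size for each respondent
    N: total population size
    """
    return N * sum(y) / sum(d)

# Hypothetical survey of 5 respondents in a population of 1,000,000,
# asking e.g. "how many people do you know who lack a photo ID?"
y = [2, 0, 1, 3, 0]
d = [300, 250, 400, 350, 200]
print(round(scale_up_estimate(y, d, 1_000_000)))  # → 4000
```

Plugging in real survey data (and honest uncertainty on the d_i) is exactly what the bounds exercise would require.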

So, has this study already been published and i just missed it? If not, does someone have the data we’d need for that? I’m hoping it’s a solved question, as i assume it’s something it would be better to have known a few months ago than a few weeks from now. Anywho, just puzzling over a salient question that linked together some events from my day.


