Which R packages are good for what social network analysis?

October 8, 2013

Newbies to social network analysis in R should check out this great, concise description from Michal Bojanowski on the SOCNET email list. He writes:

There are two main R packages that provide facilities to store, manipulate and visualize network data. These are “network” and “igraph”. Technically speaking, each package provides a specialized class of R data objects for storing network data plus additional functions to manipulate and visualize them. Each package has its relative strengths and weaknesses, but by and large you can do most basic network data operations and visualizations in both packages equally easily. Moreover, you can convert network data objects from “network” to “igraph” or vice versa with functions from the “intergraph” package.

Calculating basic network statistics (degree, centrality, etc.) is possible for both types of objects. For “igraph” objects, functions for these purposes are contained in “igraph” itself. For “network” objects, most of the classical SNA routines are contained in the “sna” package.

Community detection algorithms (e.g. Newman-Girvan) are available only in the “igraph” package.

“Fancier things”, especially statistical models for networks (ERGMs etc.), are available in various packages that were built around the “network” package and jointly constitute the ‘statnet’ suite (http://www.statnet.org/). There is also the “tnet” package with some more routines for, among other things, two-mode networks, which borrows from both the “network” and “igraph” worlds. And of course there is RSiena for estimating actor-oriented models of network dynamics, which is not related to either “network” or “igraph”.

As for matrix algebra, it is obviously available within R itself.

My recommendation would be to have a look at both “igraph” and “network” and pick the one which seems easier to you as far as manipulating and visualizing networks is concerned. Have a look at the documentation of these packages (e.g. on http://www.rdocumentation.org/) and at tutorials on e.g.:

- statnet website (http://www.statnet.org/)
- igraph homepage (http://igraph.sourceforge.net/)
- R labs by McFarland et al (http://sna.stanford.edu/rlabs.php)
- Slides and scripts to my Sunbelt workshop (http://www.bojanorama.pl/snar:start)

It does not really matter whether you pick “igraph” or “network”, as you can always convert your network to the other class with the ‘asIgraph’ or ‘asNetwork’ functions from the “intergraph” package and take advantage of the functions available in the “other world”.

Check out more of Michal’s helpful contributions at his blog: http://bc.bojanorama.pl/

The Promising Future of Mathematical Sociology

May 11, 2012

I strongly believe sociology, especially mathematical sociology, has an extremely promising future. The current trends in information technology clearly indicate a growth in quantitative modeling. Among other things, we are currently witnessing a tsunami of data from a globally-connected world (in fact, big data is the techno-geek buzzword), exponentially faster computing power (Markov Chain Monte Carlo simulations of complex models are now increasingly commonplace), and a rapid uptick in the volume and range of high-quality statistical programs (a great deal of which are open-source).

But why would I think quantitatively-oriented sociologists are especially well-placed to gain from these structural developments?

The primary reason is that the underlying epistemology of modern quantitative sociology — grounded in complex predictive models, relational and nested data structures, and a folk-Bayesian approach to research design — represents the cutting edge and future direction of modeling in a shockingly vast array of fields. For example, multidimensional scaling, social network analysis, log-linear modeling, and finite mixture models (i.e., latent class analysis) are now at the forefront of disciplines ranging from machine learning to computational genetics (for example, see here, here, here, and here). However, most promising is the growing popularity of Bayesian multilevel models, which sociologists have in effect been using for several decades now. For instance, Bayesian multilevel models are now used by physicists to measure the mysterious properties of dark energy, geneticists to unlock the basic patterns of genomic population differentiation, and neuroscientists to describe the deepest structures of the brain. It is no exaggeration to claim that a human-level form of artificial intelligence, if it is ever developed, will probably be based on multilevel models of the type currently familiar to most quantitatively-oriented sociologists.

A secondary reason why the future looks so promising for mathematical sociology is that a vacuum has been created in the social sciences due to the rise of an alternative approach to quantitative modeling, frequently promoted by mainstream economists. According to this approach, the main goal of quantitative research is to estimate population-averaged causal effects, either by setting up a randomized (controlled) experiment or applying a small suite of techniques to observational data, such as instrumental variables regression, so-called “fixed” effects (rather than “random” effects) regression, difference-in-differences design, and so forth.

This approach is appealing because it promises the extraction of causal estimates with minimal theoretical insight, but it comes at enormous costs. For example, the assumptions of causality are rarely, if ever, satisfied for any particular model fit to observational data (as painfully but clearly outlined by the counterfactual model of causality, and evinced by the growing ranks of not-really-exogenous-but-we’ll-use-it-anyway instrumental variables). Furthermore, although it’s well known that randomized experiments are inferior to controlled experiments, the latter require strong theory that is often absent (and even then, experiments in the social sciences often lack generalizability to other populations). Finally, an enormous amount of substantively rich information is usually discarded when observational data are used primarily for extracting causal estimates, so if we don’t believe our causal estimates, we’re left with a rather meager description of the data at hand (the worst offender is the so-called “fixed” effects technique, which can be viewed as a special case of a Bayesian multilevel model in which the groups are assumed, rather unreasonably, to have infinite variance between them).
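The “infinite variance” remark can be made concrete with the textbook normal varying-intercept model, y_ij ~ N(α_j, σ_y²) with α_j ~ N(μ_α, σ_α²). This illustration is my addition, not a derivation from the post. Treating σ_y, μ_α and σ_α as known, the estimate of group j’s intercept is a precision-weighted average of the group mean and the overall mean: (n_j·ȳ_j/σ_y² + μ_α/σ_α²) / (n_j/σ_y² + 1/σ_α²). As σ_α grows without bound, the weight on μ_α goes to zero and the estimate collapses to the raw group mean ȳ_j, which is what a group-dummy (“fixed” effects) regression with no other predictors uses. A small Python sketch of that limit, with made-up numbers:

```python
import math


def pooled_intercept(group_values, sigma_y, mu_alpha, sigma_alpha):
    """Estimate of one group's intercept alpha_j in the model
    y_ij ~ N(alpha_j, sigma_y^2), alpha_j ~ N(mu_alpha, sigma_alpha^2),
    treating sigma_y, mu_alpha and sigma_alpha as known constants."""
    n = len(group_values)
    ybar = sum(group_values) / n
    w_data = n / sigma_y ** 2        # precision of the group mean
    w_prior = 1 / sigma_alpha ** 2   # becomes 0.0 when sigma_alpha is math.inf
    return (w_data * ybar + w_prior * mu_alpha) / (w_data + w_prior)


group = [4.0, 6.0]  # hypothetical observations from one small group
for sigma_alpha in (0.5, 1.0, 10.0, math.inf):
    est = pooled_intercept(group, sigma_y=1.0, mu_alpha=0.0, sigma_alpha=sigma_alpha)
    print(sigma_alpha, est)
```

A small between-group standard deviation pulls the estimate toward μ_α; with σ_α = math.inf the function returns exactly ȳ_j = 5.0, i.e. no pooling at all.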

Of course, both economics and sociology are large fields, encompassing a wide range of viewpoints, so I caution that my comments embody ideal types. Yet the dominant trends of a globally-networked world, combined with the rise of a distinctive approach to quantitative modeling popularized by economists, have created the conditions for a promising future for sociology more generally, and mathematical sociology in particular.

Eigenvector Measures of Centrality

April 25, 2011

I’m working with the social networks in the Longitudinal Study of Adolescent Health.  Students are asked to name their five best male friends and five best female friends.  I’m interested in something like a measure of popularity.  In-degree, the number of times others nominate you as their friend, is a simple measure, but I think I can do a bit better if I can capture the intuition that people with popular friends are themselves more popular.  This is one potential use of eigenvector measures of centrality.  In working with such measures, I’m learning a thing or two.  For example, the weight parameter can matter a lot.
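In-degree and out-degree are simple to compute from raw nomination data. Here is a minimal Python sketch, assuming the nominations are available as (nominator, nominee) pairs. That layout is hypothetical, not the actual Add Health file format, and collapsing duplicate nominations and dropping self-nominations are my own cleaning choices:

```python
from collections import Counter


def degrees(nominations):
    """Return (in_degree, out_degree) Counters from (nominator, nominee) pairs.

    Duplicate pairs are counted once and self-nominations are dropped;
    both are assumed cleaning rules, not Add Health conventions.
    """
    ties = {(i, j) for i, j in nominations if i != j}
    in_degree = Counter(j for _, j in ties)   # times each student is named
    out_degree = Counter(i for i, _ in ties)  # friends each student named
    return in_degree, out_degree


nominations = [("ann", "bo"), ("cy", "bo"), ("bo", "ann"), ("dee", "bo"), ("cy", "bo")]
in_deg, out_deg = degrees(nominations)
print(in_deg["bo"], in_deg["ann"], in_deg["cy"])  # prints 3 1 0
```

A Counter returns 0 for students nobody named, so unpopular students still get an in-degree. Because of the five-plus-five design, out-degree in this survey is capped at 10, while in-degree is not.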

I used the igraph package in R (see p. 8) to calculate Bonacich’s alpha measure of centrality for directed networks. The default weight (variable: acent) is 1. I compared this measure to in-degree (idgx2) and out-degree (odgx2) with a matrix of scatterplots and was a bit surprised not to see a clear positive relationship with in-degree.
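The igraph documentation defines alpha centrality as the solution of x = αA^T x + e, where A is the adjacency matrix (A[i][j] = 1 when i nominates j), α is the weight, and e is a vector of exogenous scores, so x = (I − αA^T)^(-1) e. The dependency-free Python sketch below solves that linear system directly. It is meant to show the mechanics, not to reproduce igraph’s output exactly, since igraph has its own handling of loops, edge weights and numerics. The sketch also shows one thing that may matter for the surprise above: reading x as a weighted count of incoming walks only works when α is smaller than 1/λ, where λ is the largest eigenvalue of A. In a network with many mutual nominations, λ can exceed 1, so the default α = 1 may fall outside that range. That is one possible explanation, not a diagnosis of the Add Health results.

```python
def solve_linear(m, b):
    """Solve m x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("I - alpha*A^T is singular for this alpha")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def alpha_centrality(edges, n, alpha=1.0, exo=1.0):
    """Solve x = alpha * A^T x + e for nodes 0..n-1.

    edges: (i, j) pairs meaning i nominates j (no self-loops assumed).
    Each node's score is its exogenous value plus alpha times the
    scores of the nodes that nominate it.
    """
    m = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i, j in edges:
        m[j][i] -= alpha  # row j of (I - alpha*A^T) picks up nominator i
    return solve_linear(m, [float(exo)] * n)


chain = [(0, 1), (1, 2)]  # 0 names 1, 1 names 2
print(alpha_centrality(chain, 3, alpha=0.5))  # [1.0, 1.5, 1.75]

mutual = [(0, 1), (1, 0)]  # largest eigenvalue of A is 1
print(alpha_centrality(mutual, 2, alpha=2.0))  # both scores come out negative
```

In the mutual pair, α = 2 exceeds 1/λ = 1 and both scores turn negative, while α = 1.0 makes I − αA^T singular, so the function raises ValueError. Solving the system directly, rather than iterating x ← αA^T x + e, keeps the sketch usable even when that iteration would diverge.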

When I tried alpha weights of 0.2 and 0.4, I found fairly strong non-linear relationships due to a handful of outliers. While I think different alpha weights are worth exploring empirically, I’m inclined to emphasize ones that show a positive monotonic relationship with in-degree. The reason is that, to me, in-degree itself seems like a fairly good measure of popularity or social prominence. I feel that moving to a measure quite different from in-degree requires justification in the form of strong theory or empirics. I lack both.
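One way to make “a positive monotonic relationship with in-degree” operational is Spearman’s rank correlation. It equals 1 whenever one measure is a strictly increasing function of the other, however non-linear, so a few large outliers do not drag it down the way they drag down a Pearson correlation. This check is my suggestion, not part of the original analysis. A dependency-free Python sketch, with tied values given their average rank:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


in_degree = [0, 1, 1, 3, 5]
alpha_scores = [1.0, 1.2, 1.2, 2.9, 40.0]  # hypothetical scores with one outlier
print(spearman(in_degree, alpha_scores))   # 1.0
```

Computing this for each candidate alpha weight would give a single number to compare alongside the scatterplots; values near 1 flag the weights whose rankings agree with in-degree.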

In other contexts though, higher or lower (including negative) alpha weights might be justified.  For more on applying these measures to social networks, I recommend the work of Phillip Bonacich.

what’s in a tie?

January 12, 2010

i recognize that my last few posts haven’t really hit the theme of this blog. So, with that, i give you Stephen Colbert. No, seriously, it’s relevant, i promise. James Fowler is on, pitching his book with Nicholas Christakis, Connected.* In the interview he makes the shocking revelation that social networks predate social network sites like Facebook.** Gasp! He then starts to get into a little bit of the “what can we learn about real-world social connections from Facebook?” question.