If you want to learn a methodology, there may be an email list you should be on. The two big network analysis packages in R, Statnet and igraph, each have one (sign up: Statnet, igraph, Mixed Models). If you join them, you can ask questions when you get stuck. But you may end up learning even more from other people’s questions. A question from Jorge M Rocha prompted Carter Butts to write a mini-essay on exponential random graph models (ERGMs), which I have received permission to repost. Dave Hunter adds some thoughts at the bottom.

Hi, Jorge –

Your question raises some important issues that are too deep to address well in email. But I will take a stab at some of them….

On 05/02/2011 10:08 AM, Jorge M Rocha wrote:

From what I understand, the modeling framework of an ERGM treats the observed network as the _dependent variable_ and the specified structural configurations and covariate information about nodes/edges as _independent variables_ whose presence/absence increases the log-likelihood for the model.

I think it is important to make some comment on this conceptual issue, because it lies at the core of one’s thinking about the modeling process. (In so doing I will speak ex cathedra — I trust that my colleagues will not be shy in administering correctives, should they find me to be in error.) In a standard ERGM without covariate effects, there are no independent variables (in a regression-like sense). The model is one large stochastic system, and is treated as fully endogenous. (Exogenous effects come about from the use of covariates, such as vertex attributes. Of course, these can also be endogenized, but that’s a broader model class.) More to the point, the act of writing down an ERGM is the act of writing down a model for the joint probability distribution of a graph (i.e., the system state). The exponential family form provides the language in which we express that model; the content and the interpretation must come from elsewhere.
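For readers who have not seen it written out, the exponential family form referred to above is conventionally expressed as follows. The symbols are the usual textbook notation and are an addition here, not part of the original email: $y$ is a graph, $g(y)$ a vector of graph statistics, $\theta$ the parameter vector, and $\mathcal{Y}$ the set of graphs the model allows.

```latex
\Pr(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```

The whole graph $y$ appears on the left-hand side, which is the sense in which the model describes the joint distribution of the system state rather than a response given predictors.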

Within this context, the role of statistics based on structural configurations is to parametrize the joint distribution of the edge structure. There are several formally equivalent ways to understand this. One natural one is to view the parameter associated with a given structural statistic as measuring the tendency of the expectation of that statistic to be higher or lower than what would be observed if the parameter in question were zero (equivalently, if it were not in the model). From this point of view, all ERGM parameters reflect different sorts of structural “biases” or “deviations” from the behavior of the uniform Bernoulli graph (i.e., the homogeneous Bernoulli graph in which every edge occurs with equal probability).
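As a rough illustration of this "deviation from the Bernoulli baseline" reading, the sketch below enumerates every undirected graph on four nodes and computes exact expected statistics under a toy ERGM with an edge term and a triangle term. The graph size, the choice of statistics, and all names (`stats`, `expected_stats`, `theta_edges`, `theta_tri`) are assumptions made for this example; none of it is statnet code, and brute-force enumeration only works for very small graphs.

```python
from itertools import combinations, product
from math import exp

N = 4                                      # toy graph size (assumption)
DYADS = list(combinations(range(N), 2))    # the 6 possible undirected edges


def stats(edges):
    """Return (edge count, triangle count) for a set of dyads (i, j) with i < j."""
    n_tri = sum(
        1
        for a, b, c in combinations(range(N), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )
    return len(edges), n_tri


def expected_stats(theta_edges, theta_tri):
    """Exact expected (edges, triangles) under the toy ERGM, by full enumeration."""
    normalizer = 0.0
    weighted = [0.0, 0.0]
    for mask in product([0, 1], repeat=len(DYADS)):
        edges = {d for d, present in zip(DYADS, mask) if present}
        n_edges, n_tri = stats(edges)
        weight = exp(theta_edges * n_edges + theta_tri * n_tri)
        normalizer += weight
        weighted[0] += weight * n_edges
        weighted[1] += weight * n_tri
    return weighted[0] / normalizer, weighted[1] / normalizer


# All parameters zero: every graph is equally likely (uniform Bernoulli, p = 0.5).
baseline = expected_stats(0.0, 0.0)   # (3.0, 0.5)
# Positive triangle parameter: expected triangle count rises above the baseline.
biased = expected_stats(0.0, 1.0)
```

With both parameters at zero the expected counts are 3 edges and 0.5 triangles. Setting `theta_tri` above zero raises the expected triangle count, and because the statistics are correlated it shifts the expected edge count too, which is one small reminder that the parameters describe a joint system.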

Note that this is a “holistic” view of the network as a jointly interacting system — if you are familiar with the Lagrangian or Hamiltonian formulations of classical mechanics (in which the behavior of a classical system is described by reference to a potential function), this may prove a useful analogy. In any event, it’s very different from the “elementwise” view of networks that many folks are used to.

For the latter, it can be helpful to think in terms of the full conditionals of the model — the probability of a given edge being present, given the rest of the structure. This is a “local/elementwise” view, perhaps loosely analogous to the way that classical mechanics is taught to beginners (with “forces” acting on particles, the conditional behavior of which is defined in terms of the other particle states). This comports better with the intuition of most social scientists, but IMHO has many underappreciated pitfalls. Chief among these is the fact that one can easily forget that the elements of the system are indeed interacting, and the conditional behavior of one variable is only meaningful within the context of the rest of the network. (For an example of where this intuition led the field astray, one need only look at the rise and fall of pseudo-likelihood techniques.) One can also easily confuse conditional edge probabilities with propensities of _individuals_ to create edges (e.g., in a choice context), with which they are _not_ always synonymous. I am thus less sanguine about the “local” view of ERGMs than some of my colleagues, although there is no question that this perspective _can_ be very helpful.
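The full conditional described above can be sketched for the same kind of toy model: the log-odds of one edge, given everything else, is the parameter vector times the change in the statistics when that edge is toggled on. The code below is a minimal sketch under assumed names (`stats`, `conditional_edge_prob`) and an assumed four-node graph with edge and triangle terms; it is not taken from any package.

```python
from itertools import combinations
from math import exp

N = 4  # toy graph size (assumption)


def stats(edges):
    """Return (edge count, triangle count) for a set of dyads (i, j) with i < j."""
    n_tri = sum(
        1
        for a, b, c in combinations(range(N), 3)
        if (a, b) in edges and (a, c) in edges and (b, c) in edges
    )
    return len(edges), n_tri


def conditional_edge_prob(dyad, rest, theta_edges, theta_tri):
    """P(dyad is tied | all other dyads as in `rest`) under the toy ERGM."""
    with_edge = stats(rest | {dyad})
    without_edge = stats(rest - {dyad})
    # Change statistics: how much each statistic moves when the edge is added.
    log_odds = (
        theta_edges * (with_edge[0] - without_edge[0])
        + theta_tri * (with_edge[1] - without_edge[1])
    )
    return 1.0 / (1.0 + exp(-log_odds))


# Same dyad, same parameters, different surroundings.
p_empty = conditional_edge_prob((0, 1), set(), -1.0, 1.0)
p_closing = conditional_edge_prob((0, 1), {(0, 2), (1, 2)}, -1.0, 1.0)
```

Here `p_closing` is larger than `p_empty` because adding the edge would complete a triangle. The conditional probability belongs to the dyad in the context of a particular configuration of the rest of the graph, which is the pitfall the paragraph above warns about: it is not a fixed propensity of the two individuals.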

I want to stress that, properly understood, the holistic and elementwise views of the ERGM are both correct — they are two equivalent ways of describing the same thing. Nor are these the only ways in which the formalism can be validly interpreted. (For instance, one can also understand ERGMs in terms of the equilibrium behavior of certain types of Markov chains, an interpretation that has value in dynamic applications.) But what all valid interpretations of the ERGM framework have in common is that they recognize that one is modeling a system of components, and those components can potentially interact with one another. As such, my sense is that regression-based thinking can be as much a hindrance as a help when learning about ERGMs, at least for most social scientists. (Those whose familiarity with statistical exponential families allows them to understand ERGMs within the broader context of “regression-like” models are exempted, but I daresay that this is not standard training for most social scientists.) That said, I often teach it that way, because the initial approachability of ERGMs as “logistic network regression with independence relaxations” is higher than “potential functions on graphs” for my students. YMMV.

For an example of the use of the “local” view of ERGMs for model parametrization, I suggest the Goodreau et al. 2009 _Demography_ paper. For a different, “semi-local” view based on dependence among edges, one can find many good examples in the work of Pattison and Robins; their 2002 _Sociological Methodology_ article comes to mind. (And, of course, there is the classic Frank and Strauss _JASA_ paper, among others.) The “holistic” view per se is less well-explored, although there are many precursors in structuralist theory. (The Harary wing of structural balance, for instance, operates entirely from a holistic standpoint. This is also where I insert my obligatory plug for Bruce Mayhew.) Mark Handcock’s work on degeneracy (see, e.g., his 2003 chapter) has in my opinion very deep substantive implications from the holistic side, and there are some other recent technical contributions (e.g., a recent arXiv paper by Chatterjee and Diaconis) that are likewise promising. One could accumulate many more examples, of course.

In other words, what we want to achieve in these models is to gain a better understanding of the types of processes that might have gone into generating a network similar to the observed one;

That is one use of the framework, and certainly an important one.

hypotheses are about this or that configuration or node/edge attribute and their effect on the pattern of social interactions represented in the network.

This is where we must be careful: a configuration does not “have” an effect on the network. One could say, however, that we hypothesize that our process of interest will result in networks in which certain configurations are over- or underrepresented.

If so, how would I go about testing hypotheses concerning the effects of social interactions on nodal attributes (I’m particularly thinking of different types of opinions individuals might tend to have based on their interactions)?

That depends on whether the attributes affect the network. If the network structure is not affected by the attributes in question (e.g., because the former evolves over much longer time scales than the latter), then one need not model the network at all: one can condition on the network, and model the attributes (or, if one just wants to test a marginal hypothesis, employ an even simpler method such as a permutation test).
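One possible form of the permutation test mentioned above is sketched here: hold the network fixed, shuffle the attribute labels across nodes, and see how often the shuffled data show at least as many same-opinion ties as the observed data. The test statistic, the toy edge list, the opinion labels, and the function names are all assumptions chosen for illustration; a real analysis would pick a statistic matched to the hypothesis.

```python
import random


def same_opinion_ties(edges, opinion):
    """Count ties whose two endpoints hold the same opinion."""
    return sum(1 for i, j in edges if opinion[i] == opinion[j])


def permutation_test(edges, opinion, n_perm=2000, seed=0):
    """One-sided Monte Carlo permutation test, conditioning on the network."""
    rng = random.Random(seed)
    observed = same_opinion_ties(edges, opinion)
    nodes = sorted(opinion)
    labels = [opinion[v] for v in nodes]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                       # reassign opinions to nodes
        shuffled = dict(zip(nodes, labels))
        if same_opinion_ties(edges, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical toy data: two triangles joined by a single bridging tie.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
opinion = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

observed, p_value = permutation_test(edges, opinion)
```

In this toy case 6 of the 7 ties join like-minded nodes, and only 2 of the 20 distinct labelings do as well, so the estimated p-value should land near 0.1. Note that this only addresses marginal association; it says nothing about whether ties shape opinions or opinions shape ties.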

If, on the time scale of interest, the network and the attributes both coevolve, then the ERGM language is not expressive enough: you need a joint attribute/network model. Such extensions do exist, although my sense is that it is difficult to identify such models without longitudinal data.

What I’ve done so far is kind of arguing backwards (or at least that is how it feels to me, maybe because I’m coming from a traditional linear/logit regression background, and probably part of my problem is thinking in terms of DV and IV). If the coefficients are significant for some node covariate of my interest in the ERGM estimation, then I’ve been interpreting this as evidence that it is not unreasonable to argue that the pattern of social interactions influences the type of opinion individuals have (I know that if I had longitudinal data this issue of reverse causality would not be so much of an issue, but so far all I have is cross-sectional data). How wrong am I on this (the backward-arguing)?

There may be some settings in which this could give you the right answer, but I would discourage it unless you can prove that you are in one of those settings. If you are dealing with an attribute and a network that are jointly endogenous, then conditioning on the latter and not the former is not going to do what you want. If you just want to establish marginal association (e.g., those who are in certain positions tend to have certain kinds of opinions), I would lean towards a permutation test or similar procedure. If you really want to pull things apart, estimate parameters, test more complex hypotheses, etc., my opinion is that you should look for a joint attribute/network model. (We don’t have one in statnet right now, but RSiena or pnet might.)

Finally, part of my worry is that I’ll be sending the research to a consumer behavior/marketing journal where the familiarity with social network stuff, let alone ERGMs, is limited. Any suggestions on how to best explain these issues to a non-specialized audience?

Well, my _personal_ advice in your case would be to look closely at your problem, and try to figure out how much of it you can address without using complex network models — it’s going to be much easier to explain, say, permutation or CUG tests to such a readership than ERGMs. If you really must model the system as a whole, first determine whether you can reasonably approximate one portion (the network or the attribute) as being fixed relative to the other. (Some basic knowledge of your case is important here: do you have access to ethnographic, observational, or other information that might help you evaluate the plausibility of a one-way effect?) If you _can_ approximate one portion as fixed, model accordingly (e.g., if the network is fixed, you may want something like a network autocorrelation model, while fixed attributes suggest an ERGM). If you are stuck with having to model both the network and individual attributes jointly, you have my sympathy — this is an exciting area of technical research, which generally implies a painful experience for the practitioner. Getting your hands on a reasonable cross-sectional network/attribute model is then a must, as is learning to use it properly (bearing in mind that, if the model is new, its creators may have only limited experience in that regard).

This email has become both long and polemical, but these are important and frequently encountered questions. Perhaps these comments will prove helpful to you, and to others here who are struggling with the same issues.

Good luck!

-Carter

Hello Jorge. One of the things Carter said reminded me of something: In contrasting the “holistic” and “elementwise” conceptions as he explains them above, it’s important to remember that every holistically-defined model leads to a corresponding set of full conditionals (the elementwise model), but the converse is not true! In other words, one cannot necessarily define a (holistic) probability distribution on networks merely by specifying a full set of conditional distributions of each edge given the rest of the network. I point this out because I *think* it’s at some level another reason that the usual dependent / independent variable interpretation is not applicable to the ERGM situation.

There is some further discussion of the dependent / independent question and a comparison with standard generalized linear models in Section 3 of http://www.stat.psu.edu/~dhunter/papers/ergmuserterms.pdf

Best,

Dave

