Eigenvector Measures of Centrality

April 25, 2011

I’m working with the social networks in the Longitudinal Study of Adolescent Health.  Students are asked to name their five best male friends and five best female friends.  I’m interested in something like a measure of popularity.  In-degree, the number of times others nominate you as their friend, is a simple measure, but I think I can do a bit better if I can capture the intuition that people with popular friends are themselves more popular.  This is one potential use of eigenvector measures of centrality.  In working with such measures, I’m learning a thing or two.  For example, the weight parameter can matter a lot.

I used the igraph package (see p. 8) in R to calculate Bonacich’s alpha measure of centrality for directed networks.  The default weight (variable: acent) is 1.  I compared this measure to in-degree (idgx2) and out-degree (odgx2) with a matrix of scatterplots and was a bit surprised not to see a clear positive relationship with in-degree.
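For readers who want to see the idea in miniature, here is a rough sketch of alpha centrality in Python with NumPy.  It is not the igraph call I used.  It just solves x = alpha * t(A) x + e directly, which is my understanding of the definition.  The four-person network, the function name alpha_centrality, and the weight of 0.5 are all made up for illustration.

```python
import numpy as np

def alpha_centrality(adj, alpha, exo=None):
    """Solve x = alpha * A^T x + e for x (hypothetical helper, not igraph's API).

    adj[i, j] = 1 means person i nominates person j as a friend.
    exo is the exogenous score e; it defaults to 1 for everyone.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    e = np.ones(n) if exo is None else np.asarray(exo, dtype=float)
    # Rearranged: (I - alpha * A^T) x = e
    return np.linalg.solve(np.eye(n) - alpha * adj.T, e)

# Toy nominations: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 3
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

in_degree = adj.sum(axis=0)               # nominations received by each person
scores = alpha_centrality(adj, alpha=0.5)

for person, (deg, score) in enumerate(zip(in_degree, scores)):
    print(person, deg, score)
```

Persons 1 and 3 each receive one nomination, but person 3 is nominated by the most-nominated person in the toy network and so ends up with the higher score.  That is the "popular friends" intuition at work.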

When I tried alpha weights of 0.2 and 0.4, I found fairly strong non-linear relationships due to a handful of outliers.  While I think different alpha weights are worth exploring empirically, I’m inclined to emphasize ones which have a positive monotonic relationship with in-degree.  The reason is that, to me, in-degree itself seems like a fairly good measure of popularity or social prominence.  I feel that moving to a measure quite different from in-degree requires justification in the form of strong theory or empirics.  I lack both.

In other contexts though, higher or lower (including negative) alpha weights might be justified.  For more on applying these measures to social networks, I recommend the work of Phillip Bonacich.

Ngrams of Social Science Disciplines

January 24, 2011

An “ngram” is an n-word phrase.  So, “human science” is a 2-gram.  If you’ve been living under a rock, you may not have heard about the latest gift from Google: having scanned most published books in a number of major languages, they recently gave us the data, and a tool for easy visualization, on the relative popularity of words and phrases over time.  I thought I’d explore some terms of broad interest to sociologists with no particular idea about what I’d find.  Please take a look and help me interpret them.

Below you’ll find the relative frequency with which the major social scientific disciplines (plus psychology) are mentioned in books.  Let me explain the numbers on the Y-axis.  “Psychology” is the most common of these words.  In 1950, it accounted for about 0.0034% of all words published.  In other words, Google takes all the books published in a given year and counts how many occurrences there are of each word.  Then it divides that number by the total number of words published that year.  There are many methodological considerations; for example, each book only counts once, regardless of how many copies are sold.
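To make the arithmetic concrete, here is a small Python sketch of the calculation as I’ve described it.  The tiny “corpus” and the function name yearly_relative_frequency are invented for illustration, it only handles single words (1-grams), and it is surely much cruder than whatever Google actually does.

```python
from collections import Counter

def yearly_relative_frequency(books_by_year, term):
    """Share of all words in each year's books that equal `term`.

    books_by_year maps a year to a list of book texts; each book
    appears once, no matter how many copies it sold.
    """
    result = {}
    for year, books in books_by_year.items():
        # Naive tokenization: lowercase and split on whitespace
        words = [w for text in books for w in text.lower().split()]
        total = len(words)
        result[year] = Counter(words)[term] / total if total else 0.0
    return result

# Made-up miniature corpus, two "books" per year
books_by_year = {
    1950: ["psychology is a science", "the study of mind"],
    1980: ["economics and psychology", "psychology of markets today"],
}

freq = yearly_relative_frequency(books_by_year, "psychology")
for year, share in freq.items():
    print(year, f"{share * 100:.2f}%")
```

In this toy corpus “psychology” is 1 of 8 words in 1950 and 2 of 7 words in 1980.  The real shares are far smaller, on the order of the 0.0034% mentioned above.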

So what do we see?  Well, the rank order doesn’t really change over time.  Psychology gets the most mentions, then economics, sociology, anthropology and finally political science.  It’s tempting to interpret this as measuring the prominence of each discipline, but this isn’t a great test.  For starters, authors aren’t generally referring to the academic discipline when they use the word “psychology,” but they are when they use the phrase “political science.”  Sociology is probably between the two in terms of the share of word-mentions which actually refer to the academic discipline.

I feel a bit more comfortable making inferences based on how each of these terms changes over time.  For example, in 1950, sociology received almost twice as many mentions as anthropology.  The situation was similar in 1980.  But in 1999, anthropology achieved parity with sociology, and they have been close to even in the decade since.  This appears to be evidence that anthropology gained prominence, relative to sociology, in the last half of the twentieth century.  Naturally, I don’t think we should put too much stock in this single measure of prominence.  We might want to look at trends in the number of students and people working in each discipline.  We could count mentions in periodicals or citations to academic articles.  We could look to see how each word is used, and how much its usage changes over time.  Do these other measures corroborate, counter, or otherwise contextualize these trends?

I can’t give you easy access to all that data, but you can explore ngrams for yourself!

So readers, what do you see in this graph?  Care to nominate and discuss plausible/potentially useful and/or plainly dangerous assumptions that help us interpret these ngrams or lead us astray?