The Promising Future of Mathematical Sociology

May 11, 2012

I strongly believe sociology, especially mathematical sociology, has an extremely promising future. The current trends in information technology clearly indicate a growth in quantitative modeling. Among other things, we are currently witnessing a tsunami of data from a globally-connected world (in fact, big data is the techno-geek buzzword), exponentially faster computing power (Markov Chain Monte Carlo simulations of complex models are now increasingly commonplace), and a rapid uptick in the volume and range of high-quality statistical programs (a great deal of which are open-source).

However, why would I think quantitatively-oriented sociologists are especially well-placed to gain from these structural developments?

The primary reason is that the underlying epistemology of modern quantitative sociology — grounded in complex predictive models, relational and nested data structures, and a folk-Bayesian approach to research design — represents the cutting edge and future direction of modeling in a shockingly vast array of fields. For example, multidimensional scaling, social network analysis, log-linear modeling, and finite mixture models (i.e., latent class analysis) are now at the forefront of disciplines ranging from machine learning to computational genetics (for example, see here, here, here, and here). However, most promising is the growing popularity of Bayesian multilevel models, which sociologists have in effect been using for several decades now. For instance, Bayesian multilevel models are now used by physicists to measure the mysterious properties of dark energy, geneticists to unlock the basic patterns of genomic population differentiation, and neuroscientists to describe the deepest structures of the brain. It is no exaggeration to claim that a human-level form of artificial intelligence, if it is ever developed, will probably be based on multilevel models of the type currently familiar to most quantitatively-oriented sociologists.

A secondary reason why the future looks so promising for mathematical sociology is that a vacuum has been created in the social sciences due to the rise of an alternative approach to quantitative modeling, frequently promoted by mainstream economists. According to this approach, the main goal of quantitative research is to estimate population-averaged causal effects, either by setting up a randomized (controlled) experiment or applying a small suite of techniques to observational data, such as instrumental variables regression, so-called “fixed” effects (rather than “random” effects) regression, difference-in-differences design, and so forth.

This approach is appealing because it promises the extraction of causal estimates with minimal theoretical insight, but it comes at enormous costs. For example, the assumptions of causality are rarely, if ever, satisfied for any particular model fit to observational data (as painfully but clearly outlined by the counterfactual model of causality, and evinced by the growing ranks of not-really-exogenous-but-we’ll-use-it-anyway instrumental variables). Furthermore, although it’s well known that randomized experiments are inferior to controlled experiments, the latter require strong theory that is often absent (and even then experiments in the social sciences often lack generalizability to other populations). Finally, an enormous amount of substantively-rich information is usually discarded when observational data are used primarily for extracting causal estimates, so if we don’t believe our causal estimates then we’re left with a rather meager description of the data at hand (the worst offender is the so-called “fixed” effects technique, which can be viewed as a special case of a Bayesian multilevel model in which the groups are assumed, rather unreasonably, to have infinite variance between them).
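
To make the point about “fixed” effects concrete, here is a minimal sketch of the textbook shrinkage formula for a group intercept in a normal random-intercept model, assuming the variances are known. The function name and the numbers are hypothetical, chosen only for illustration; this is not a full multilevel fit.

```python
def partial_pooling_estimate(group_mean, n, grand_mean, sigma2, tau2):
    """Posterior mean of one group's intercept under a normal
    random-intercept model with known variances.

    sigma2: within-group (residual) variance
    tau2:   between-group variance
    """
    data_precision = n / sigma2   # information from the group's own data
    prior_precision = 1.0 / tau2  # information borrowed from the other groups
    weighted_sum = data_precision * group_mean + prior_precision * grand_mean
    return weighted_sum / (data_precision + prior_precision)


# Hypothetical group: 5 observations, group mean 10, grand mean 0.
for tau2 in (0.01, 1.0, 1e12):
    estimate = partial_pooling_estimate(10.0, 5, 0.0, sigma2=1.0, tau2=tau2)
    print(f"tau2={tau2:g}: estimate={estimate:.3f}")
```

As the between-group variance grows, the estimate stops borrowing strength from the other groups and converges to the raw group mean, which is the behavior of the “fixed” effects estimate.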

Of course, both economics and sociology are large fields, encompassing a wide range of viewpoints, so I caution that my comments embody ideal types. Yet the dominant trends of a globally-networked world, combined with the rise of a distinctive approach to quantitative modeling popularized by economists, have created the conditions for a promising future for sociology more generally, and mathematical sociology in particular.


The Success of Stack Exchange: Crowdsourcing + Reputation Systems

May 3, 2012

You’ve heard me say it before… Crowdsourced websites like StackOverflow and Wikipedia are changing the world.  Everyone is familiar with Wikipedia, but most people still haven’t heard about the StackExchange brand question and answer sites.  If you look into their success, I think you’ll begin to see how the combination of crowdsourcing and online reputation systems is going to revolutionize academic publishing and peer-review.

Do you know what’s happened to computer programming since the founding of StackOverflow, the first StackExchange question and answer site?  It has become a key part of every programmer’s continuing education, and for many it is such an essential tool that they can’t imagine working a single day without it.

StackOverflow began in 2008, and since then more than 1 million people have created accounts, more than 3 million questions have been asked, and more than 6 million answers provided (see Wikipedia entry).  Capitalizing on that success, StackExchange, the company which started StackOverflow, has begun a rapid expansion into other fields where people have questions.  Since most of my readers do more statistics than programming, you might especially appreciate the Stack Exchange for statistics (aka CrossValidated).  You can start exploring at my profile on the site or check out this interesting discussion of machine learning and statistics.

How do the Stack Exchange sites work?

The four most common forms of participation are question asking, question answering, commenting, and voting/scoring.  Experts are motivated to answer questions because they enjoy helping, and because good answers increase their prominently advertised reputation score.  Indeed, each question, answer, and comment someone makes can be voted up or down by anyone with a certain minimum reputation score.  Questions/answers/comments each have a score next to them, corresponding to their net-positive votes.  Users have an overall reputation score.  Answers earn their author 10 points per up-vote, questions earn 5, and comments earn 2.  As users gain reputation, they earn administrative privileges, and more importantly, respect in the community.  Administrative privileges include the ability to edit, tag, or even delete other people’s responses.  These and other administrative contributions also earn reputation, but most reputation is earned through questions and answers.  Users also earn badges, which focus attention on the different types of contributions.
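
The scoring rules above can be sketched in a few lines of Python. This is a rough illustration, not the site's actual implementation: the point values are simply the ones quoted in this post, the names `Post`, `reputation`, and `can_vote` are hypothetical, and the voting threshold is an assumed number. Down-vote penalties, administrative contributions, and badges are ignored.

```python
from dataclasses import dataclass

# Points per up-vote, as quoted in the post (not checked against the live site).
POINTS_PER_UPVOTE = {"answer": 10, "question": 5, "comment": 2}


@dataclass
class Post:
    author: str
    kind: str  # "question", "answer", or "comment"
    upvotes: int = 0
    downvotes: int = 0

    @property
    def score(self) -> int:
        # The net-positive vote count displayed next to each post.
        return self.upvotes - self.downvotes


def reputation(posts, user):
    """Sum the up-vote points earned by all of one user's posts."""
    return sum(
        POINTS_PER_UPVOTE[post.kind] * post.upvotes
        for post in posts
        if post.author == user
    )


def can_vote(posts, user, min_reputation=15):
    # min_reputation is an assumed threshold, used only for illustration.
    return reputation(posts, user) >= min_reputation


posts = [
    Post("alice", "answer", upvotes=3, downvotes=1),
    Post("alice", "question", upvotes=2),
    Post("bob", "comment", upvotes=4),
]
print(posts[0].score)              # net votes on alice's answer
print(reputation(posts, "alice"))  # 3 * 10 + 2 * 5
print(can_vote(posts, "bob"))      # bob has 4 * 2 = 8 points
```

The design point is that a post's displayed score and its author's reputation are separate quantities: the score is a net vote count, while reputation weights up-votes by the type of contribution.
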
Crowdsourcing is based on the idea that knowledge is diffuse, but web technology makes it much easier to harvest distributed knowledge.  A voting and reputation system isn’t necessary for all forms of crowdsourcing, but as the web matures, we’re seeing voting and reputation systems being applied in more and more places with amazing results.

To name a handful off the top of my head:
  • A couple of my friends are involved in a startup called ScholasticaHQ which is facilitating peer-review for academic journals, and also offers social networking and question and answer features.
  • stats.stackexchange.com has an open-source competitor in http://metaoptimize.com/qa/ which works quite similarly.  Their open-source software can be, and is being, applied to other topics.
  • http://www.reddit.com is a popular news story sharing and discussion site where users vote on stories and comments.
  • http://www.quora.com/ is another general-purpose question and answer site.

It isn’t quite as explicit, but internet giants like Google and Facebook are also based on the idea of rating and reputation.

A growing number of academics blog, and people have been discussing how academics could get credit for blogging.  People like John Ioannidis are calling attention to how difficult it is to interpret the scientific literature because of publication bias and other problems.  Of course, thoughtful individuals have other concerns about academic publishing.  Many of these concerns will be addressed soon, with the rise of crowdsourcing and online reputation systems.