a case for single-blind review

January 23, 2013

(Cross posted from here)
When i was in grad school, at one of the academic meetings i regularly participate in, it became regular fare for 2 particular folks in my circles to engage in a prolonged debate about how we should overhaul the academic publishing system. This was so regular (i recall them having portions of this debate for 3 consecutive years over dinner) that the grad students in the bunch thought of this as a grenade in our back pockets we could toss into the fray if ever conversations took an unwelcome turn to the boring. I bring this up because there are lots of aspects of this process that i have quite a few thoughts on, but have never really formalized them too much more than is required for such elongated dinner conversations. And one particular aspect of that was raised on Facebook yesterday by a colleague – asking about the merits of single blind review. I started my answer there, but wanted to engage this a little more fully. So, i’m going to start a series of posts (not sure how many there will be at this point) on the publication/review process here, that i think could be interesting discussions. I hope others will chime in with opinions, questions, etc. These posts will likely be slightly longer than typical fare around here. I expect that some of my thoughts on these will be much more formulated than others.

So, let’s start with a case for single blind review. I think there are quite a few merits to single blind review (for a few other takes, see here and here). I won’t presume to cover them all here, but i will get a start. Feel free to add others, or tell me i’m completely off my rocker in the comments.


The Success of Stack Exchange: Crowdsourcing + Reputation Systems

May 3, 2012

You’ve heard me say it before… Crowdsourced websites like StackOverflow and Wikipedia are changing the world.  Everyone is familiar with Wikipedia, but most people still haven’t heard about the StackExchange brand question and answer sites.  If you look into their success, I think you’ll begin to see how the combination of crowdsourcing and online reputation systems is going to revolutionize academic publishing and peer-review.

Do you know what’s happened to computer programming since the founding of StackOverflow, the first StackExchange question and answer site?  It has become a key part of every programmer’s continuing education, and for many it is such an essential tool that they can’t imagine working a single day without it.

StackOverflow began in 2008, and since then more than 1 million people have created accounts, more than 3 million questions have been asked, and more than 6 million answers provided (see Wikipedia entry).  Capitalizing on that success, StackExchange, the company which started StackOverflow, has begun a rapid expansion into other fields where people have questions.  Since most of my readers do more statistics than programming, you might especially appreciate the Stack Exchange for statistics (aka CrossValidated).  You can start exploring at my profile on the site or check out this interesting discussion of machine learning and statistics.

How do the Stack Exchange sites work?

The four most common forms of participation are question asking, question answering, commenting, and voting/scoring.  Experts are motivated to answer questions because they enjoy helping, and because good answers increase their prominently advertised reputation score.  Indeed, each question, answer, and comment someone makes can be voted up or down by anyone with a certain minimum reputation score.  Questions/answers/comments each have a score next to them, corresponding to their net-positive votes.  Users have an overall reputation score.  Answers earn their author 10 points per up-vote, questions earn 5, and comments earn 2.  As users gain reputation, they earn administrative privileges, and more importantly, respect in the community.  Administrative privileges include the ability to edit, tag, or even delete other people’s responses.  These and other administrative contributions also earn reputation, but most reputation is earned through questions and answers.  Users also earn badges, which focus attention on the different types of contributions.
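To make the scoring concrete, here is a minimal sketch of how a reputation tally could be computed from up-votes, using only the point values quoted above. The function name, the data layout, and the sample users are hypothetical; down-votes, badges, and reputation earned through administrative contributions are left out, so this is an illustration and not the sites' actual algorithm.

```python
from collections import Counter

# Points per up-vote as quoted in the post. Down-vote penalties and
# other reputation sources are deliberately omitted (assumption).
POINTS_PER_UPVOTE = {"answer": 10, "question": 5, "comment": 2}


def reputation(upvotes):
    """Tally reputation from (user, post_kind) pairs, one pair per up-vote."""
    scores = Counter()
    for user, kind in upvotes:
        scores[user] += POINTS_PER_UPVOTE[kind]
    return dict(scores)


# Hypothetical sample data: three up-votes for alice, one for bob.
votes = [
    ("alice", "answer"),
    ("alice", "answer"),
    ("bob", "question"),
    ("alice", "comment"),
]
print(reputation(votes))
```

Keeping the point values in one table makes it easy to see how heavily the scheme favors answers over questions and comments.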
Crowdsourcing is based on the idea that knowledge is diffuse, but web technology makes it much easier to harvest distributed knowledge.  A voting and reputation system isn’t necessary for all forms of crowdsourcing, but as the web matures, we’re seeing voting and reputation systems being applied in more and more places with amazing results.
To name a handful off the top of my head:
  • A couple of my friends are involved in a startup called ScholasticaHQ which is facilitating peer-review for academic journals, and also offers social networking and question and answer features.
  • stats.stackexchange.com has an open-source competitor in http://metaoptimize.com/qa/ which works quite similarly.  Their open-source software can be, and is being, applied to other topics.
  • http://www.reddit.com is a popular news story sharing and discussion site where users vote on stories and comments.
  • http://www.quora.com/ is another general-purpose question and answer site.

It isn’t quite as explicit, but internet giants like google and facebook are also based on the idea of rating and reputation.

A growing number of academics blog, and people have been discussing how people could get academic credit for blogging.  People like John Ioannidis are calling attention to how difficult it is to interpret the scientific literature because of publication bias and other problems.  Of course, thoughtful individuals have other concerns about academic publishing.  Many of these concerns will be addressed soon, with the rise of crowdsourcing and online reputation systems.


summer reading bleg

June 27, 2011

So, i haven’t posted here in seemingly forever (and it seems that each of my last few posts start off with a similar preamble), but I have a query, and figured why not send it out to the ole world wide web to see if i get any nibbles.

The following is a list of a chunk of the books i was aiming to (re-)read this summer  (those i brought with me on my first stint away from home anyway). You’ll likely notice a strong theme given my full(er) investment in an ongoing sociology of science project on the structuring of problem-based interdisciplinary fields (but there are a smattering of others that I’ve merely wanted to read and/or revisit for a while).

The request is simple – if there’s one book (or even better, more) on this list that you happen to be tackling this summer as well and you’d be up for throwing a few ideas back and forth via email/phone/skype/whatever, give me a shout.


Ngrams of Social Science Disciplines

January 24, 2011

An “ngram” is an n-word phrase.  So, “human science” is a 2-gram.  If you’ve been living under a rock, you may not have heard about the latest gift from google – having scanned most published books in a number of major languages, they recently provided us the data, and a tool for easy visualization, of the relative popularity of words and phrases over time.  I thought I’d explore some terms of broad interest to sociologists with no particular idea about what I’d find.  Please take a look and help me interpret them.

Below you’ll find the relative frequency with which the major social scientific disciplines (plus psychology) are mentioned in books.  Let me explain the numbers on the Y-axis.  “psychology” is the most common word.  In 1950, it accounted for about 0.0034% of all words published.  In other words, google takes all the books published in a given year, and counts how many occurrences there are for each word.  Then it divides that number by the total number of words published.  There are many methodological considerations… for example, each book only counts once, regardless of how many copies are sold.
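The calculation described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: it handles single words (1-grams), splits text on whitespace, and lowercases everything. The function name and the tiny corpus are made up, and google's actual processing is certainly more involved.

```python
from collections import Counter


def relative_frequency(books_by_year, term):
    """Return {year: share of all words in that year's books equal to term}.

    books_by_year maps a year to a list of book texts; each book appears
    once, regardless of how many copies were sold.
    """
    shares = {}
    for year, texts in books_by_year.items():
        counts = Counter()
        for text in texts:
            counts.update(text.lower().split())
        total = sum(counts.values())
        shares[year] = counts[term] / total if total else 0.0
    return shares


# Hypothetical toy corpus, just to show the arithmetic.
corpus = {
    1950: ["psychology is a science", "the study of economics"],
    1980: ["sociology and psychology and psychology"],
}
print(relative_frequency(corpus, "psychology"))
```

In the toy corpus, "psychology" is 1 of 8 words in 1950 and 2 of 5 words in 1980, so its share rises even though only a few books are involved. The same division is what puts numbers like 0.0034% on the Y-axis.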

So what do we see?  Well, the rank order doesn’t really change over time.  Psychology gets the most mentions, then economics, sociology, anthropology and finally political science.  It’s tempting to interpret this as measuring the prominence of each discipline, but this isn’t a great test.  For starters, authors aren’t generally referring to the academic discipline when they use the word “psychology,” but they are when they use the phrase “political science.”  Sociology is probably between the two in terms of, “the share of word-mentions which actually refer to the academic discipline.”

I feel a bit more comfortable making inferences based on how each of these terms changes over time.  For example, in 1950, sociology received almost twice as many mentions as anthropology.  The situation was similar in 1980.  But in 1999, anthropology achieved parity with sociology, and they have been close to even in the decade since.  This appears to be evidence that anthropology gained prominence, relative to sociology, in the last half of the twentieth century.  Naturally, I don’t think we should put too much stock in this single measure of prominence.  We might want to look at trends in the number of students, and people working in each discipline.  We could count mentions in periodicals or citations to academic articles.  We could look to see how each word is used, and how much their usage changes over time.  Do these other measures corroborate, counter, or otherwise contextualize these trends?

I can’t give you easy access to all that data, but you can explore ngrams for yourself!

So readers, what do you see in this graph?  Care to nominate and discuss plausible/potentially useful and/or plainly dangerous assumptions that help us interpret these ngrams or lead us astray?


I discovered an example of Stigler’s Law

May 30, 2010

from Wikipedia:

Stigler’s Law of Eponymy is a process proposed by University of Chicago statistics professor Stephen Stigler in his 1980 publication “Stigler’s law of eponymy”.[1] In its simplest and strongest form it says: “No scientific discovery is named after its original discoverer.” Stigler attributes its discovery to sociologist Robert K. Merton (which makes the law self-referencing).

So what is the example I discovered?*  I’ve often heard someone mention one, but never more than one, of the following: Campbell’s Law (1976), Goodhart’s Law (1975), and the Lucas Critique (1976).  Just recently I came across a reference to “Steve Kerr’s classic 1975 article, [On the Folly of Rewarding A While Hoping for B]”.

OK, so they aren’t identical (the Lucas critique probably has the most distinctive content), but they are all closely related.  I looked over each original article (except the one for Goodhart’s Law, which doesn’t appear to be online) and none of them cites any of the others.  I’m sure that with some research one would find that any one of these alone would be an example of Stigler’s Law, that is, we could find very similar insights from someone like Adam Smith or J.S. Mill.

A couple questions:

1. Had the authors been exposed to the others’ similar ideas at the time they published theirs?  Did they make a conscious decision not to cite?  (Note that while the publications have dates, we have no idea when each author first had their idea.)

2. Let’s assume that none of them got their idea directly from one of the others.  What does it say that they each published their idea in 1975-76?  Is it unusual that multiple people had similar ideas, later recognized to be important, at the same time?  Or is it, as I suspect, merely unusual that all four different people are recognized?  I wish I could be chatting with Robert Merton about this.

* While I’m a little proud of the fact I noticed these similarities, and that I edited Wikipedia to link the first three ideas, I hope it’s obvious that my claim to having “discovered” this example is intentionally ironic.


assessing peer review

November 13, 2009

Here’s an interesting paper (may require login) from the Journal of the Royal Society of Medicine. From the abstract:

Design 607 peer reviewers at the BMJ were randomized to two intervention groups receiving different types of training (face-to-face training or a self-taught package) and a control group. Each reviewer was sent the same three test papers over the study period, each of which had nine major and five minor methodological errors inserted.
Results The number of major errors detected varied over the three papers. The interventions had small effects. At baseline (Paper 1) reviewers found an average of 2.58 of the nine major errors, with no notable difference between the groups. The mean number of errors reported was similar for the second and third papers, 2.71 and 3.0, respectively. Biased randomization was the error detected most frequently in all three papers, with over 60% of reviewers rejecting the papers identifying this error. Reviewers who did not reject the papers found fewer errors and the proportion finding biased randomization was less than 40% for each paper.

The thing is, i am having a relatively difficult time convincing myself that the comparison they made is the interesting one. When reviewing a paper, are we really ever looking for all of the errors in the piece or just enough to sufficiently determine whether to accept/reject the article? So, how interesting is the difference in the “number of errors found” among those who rejected the paper? To me, not very. This doesn’t undermine their conclusion:

Conclusions Editors should not assume that reviewers will detect most major errors, particularly those concerned with the context of study. Short training packages have only a slight impact on improving error detection.

My question is: do you find the question interesting, or would you have sliced the data a different way?

(HT: Michelle Poulin)