a case for single-blind review

January 23, 2013

(Cross posted from here)
When i was in grad school, at one of the academic meetings i regularly participated in, it became regular fare for 2 particular folks in my circles to engage in a prolonged debate about how we should overhaul the academic publishing system. This was so regular (i recall them having portions of this debate for 3 consecutive years over dinner) that the grad students in the bunch thought of it as a grenade in our back pockets we could toss into the fray if ever the conversation took an unwelcome turn to the boring. I bring this up because there are lots of aspects of this process that i have quite a few thoughts on, but have never really formalized them much beyond what such elongated dinner conversations require. One particular aspect was raised on Facebook yesterday by a colleague – asking about the merits of single-blind review. I started my answer there, but wanted to engage this a little more fully. So, i’m going to start a series of posts (not sure how many there will be at this point) on the publication/review process here that i think could make for interesting discussions. I hope others will chime in with opinions, questions, etc. These posts will likely be slightly longer than typical fare around here. I expect that some of my thoughts will be much more fully formed than others.

So, let’s start with a case for single-blind review. I think there are quite a few merits to single-blind review (for a few other takes, see here and here). I won’t presume to cover them all here, but i will get a start. Feel free to add others, or tell me i’m completely off my rocker in the comments.


The Success of Stack Exchange: Crowdsourcing + Reputation Systems

May 3, 2012

You’ve heard me say it before… Crowdsourced websites like StackOverflow and Wikipedia are changing the world. Everyone is familiar with Wikipedia, but most people still haven’t heard of the StackExchange family of question-and-answer sites. If you look into their success, I think you’ll begin to see how the combination of crowdsourcing and online reputation systems is going to revolutionize academic publishing and peer review.

Do you know what’s happened to computer programming since the founding of StackOverflow, the first StackExchange question and answer site?  It has become a key part of every programmer’s continuing education, and for many it is such an essential tool that they can’t imagine working a single day without it.

StackOverflow began in 2008, and since then more than 1 million people have created accounts, more than 3 million questions have been asked, and more than 6 million answers have been provided (see the Wikipedia entry). Capitalizing on that success, StackExchange, the company that started StackOverflow, has begun a rapid expansion into other fields where people have questions. Since most of my readers do more statistics than programming, you might especially appreciate the Stack Exchange site for statistics (aka CrossValidated). You can start exploring at my profile on the site or check out this interesting discussion of machine learning and statistics.

How do the Stack Exchange sites work?

The four most common forms of participation are question asking, question answering, commenting, and voting/scoring. Experts are motivated to answer questions because they enjoy helping, and because good answers increase their prominently advertised reputation score. Indeed, each question, answer, and comment someone makes can be voted up or down by anyone with a certain minimum reputation score. Questions, answers, and comments each have a score next to them, corresponding to their net-positive votes. Users have an overall reputation score. Answers earn their author 10 points per up-vote, questions earn 5, and comments earn 2. As users gain reputation, they earn administrative privileges and, more importantly, respect in the community. Administrative privileges include the ability to edit, tag, or even delete other people’s responses. These and other administrative contributions also earn reputation, but most reputation is earned through questions and answers. Users also earn badges, which focus attention on the different types of contributions.
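To make that scoring arithmetic concrete, here is a minimal sketch in Python of how a reputation tally along these lines might be computed. The point values come from the description above; the data structures and function are my own illustration, not Stack Exchange’s actual implementation (which, among other things, also subtracts points for down-votes).

```python
# Toy reputation tally using the point values described above:
# answers earn 10 points per up-vote, questions 5, comments 2.
# Illustration only; not Stack Exchange's actual code or rules.
POINTS_PER_UPVOTE = {"answer": 10, "question": 5, "comment": 2}

def reputation(contributions):
    """contributions: list of (kind, net_upvotes) pairs for one user."""
    return sum(
        POINTS_PER_UPVOTE[kind] * max(net_upvotes, 0)
        for kind, net_upvotes in contributions
    )

# Example: one well-received answer, one question, and one comment.
user = [("answer", 12), ("question", 3), ("comment", 5)]
print(reputation(user))  # 12*10 + 3*5 + 5*2 = 145
```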
Crowdsourcing is based on the idea that knowledge is diffuse, but web technology makes it much easier to harvest distributed knowledge.  A voting and reputation system isn’t necessary for all forms of crowdsourcing, but as the web matures, we’re seeing voting and reputation systems being applied in more and more places with amazing results.
To name a handful off the top of my head:
  • A couple of my friends are involved in a startup called ScholasticaHQ, which facilitates peer review for academic journals and also offers social networking and question-and-answer features.
  • stats.stackexchange.com has an open-source competitor, http://metaoptimize.com/qa/, which works quite similarly. Their open-source software can be, and is being, applied to other topics.
  • http://www.reddit.com is a popular news story sharing and discussion site where users vote on stories and comments.
  • http://www.quora.com/ is another general-purpose question and answer site.

It isn’t quite as explicit, but internet giants like Google and Facebook are also based on the idea of rating and reputation.

A growing number of academics blog, and people have been discussing how academics could get credit for blogging. People like John Ioannidis are calling attention to how difficult it is to interpret the scientific literature because of publication bias and other problems. Of course, thoughtful individuals have other concerns about academic publishing. Many of these concerns will be addressed soon, with the rise of crowdsourcing and online reputation systems.


Join the Section on Altruism, Morality and Solidarity

July 5, 2011

The section needs only 6 new members to reach the critical threshold of 300. I think current members of the section are more open to mathematical sociology than members of most other sections. Furthermore, the topics are interesting and important. See the section mission statement here and a newsletter here.

Speaking of ASA sections, Jeremy posted some interesting data the other day: the percentage of each section’s members who are students. In mathematical sociology it is one quarter, which is lower than average.

Philip Cohen also had an interesting post a while ago about gender differences among sociologists where he graphs section membership by gender.


Ngrams of Social Science Disciplines

January 24, 2011

An “ngram” is an n-word phrase. So, “human science” is a 2-gram. If you’ve been living under a rock, you may not have heard about the latest gift from Google: having scanned most published books in a number of major languages, they recently provided us the data, and a tool for easy visualization, of the relative popularity of words and phrases over time. I thought I’d explore some terms of broad interest to sociologists with no particular idea about what I’d find. Please take a look and help me interpret them.

Below you’ll find the relative frequency with which the major social scientific disciplines (plus psychology) are mentioned in books. Let me explain the numbers on the Y-axis. “Psychology” is the most common word; in 1950, it accounted for about 0.0034% of all words published. In other words, Google takes all the books published in a given year and counts how many occurrences there are of each word, then divides that number by the total number of words published. There are many methodological considerations… for example, each book counts only once, regardless of how many copies are sold.
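As a rough illustration of that calculation (my own toy sketch in Python, not Google’s actual pipeline), the relative frequency of a word in a given year is just its occurrence count across that year’s books divided by the total number of words published that year:

```python
from collections import Counter

def relative_frequency(word, books):
    """books: a list of token lists, one per book published in a given year.
    Each book counts once, regardless of how many copies were sold."""
    total_words = sum(len(tokens) for tokens in books)
    occurrences = sum(Counter(tokens)[word] for tokens in books)
    return occurrences / total_words

# Toy "corpus" of two books from the same year.
books = [
    ["psychology", "of", "learning", "and", "psychology"],
    ["a", "short", "history", "of", "economics"],
]
print(relative_frequency("psychology", books))  # 2 / 10 = 0.2
```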

So what do we see? Well, the rank order doesn’t really change over time. Psychology gets the most mentions, then economics, sociology, anthropology, and finally political science. It’s tempting to interpret this as measuring the prominence of each discipline, but this isn’t a great test. For starters, authors aren’t generally referring to the academic discipline when they use the word “psychology,” but they are when they use the phrase “political science.” Sociology is probably between the two in terms of the share of word-mentions that actually refer to the academic discipline.

I feel a bit more comfortable making inferences based on how each of these terms changes over time. For example, in 1950, sociology received almost twice as many mentions as anthropology. The situation was similar in 1980. But in 1999, anthropology achieved parity with sociology, and they have been close to even in the decade since. This appears to be evidence that anthropology gained prominence, relative to sociology, in the last half of the twentieth century. Naturally, I don’t think we should put too much stock in this single measure of prominence. We might want to look at trends in the number of students and people working in each discipline. We could count mentions in periodicals, or citations to academic articles. We could look to see how each word is used, and how much its usage changes over time. Do these other measures corroborate, counter, or otherwise contextualize these trends?

I can’t give you easy access to all that data, but you can explore ngrams for yourself!

So readers, what do you see in this graph?  Care to nominate and discuss plausible/potentially useful and/or plainly dangerous assumptions that help us interpret these ngrams or lead us astray?


Teaching “The Wire”

October 26, 2010

We’ve had a number of posts here before about teaching. Here’s a question I’ve seen debated a fair bit: does HBO’s The Wire have educational value? In particular, can it be a productive part of a college course in sociology?

I’ve just begun watching the series myself, and yes, I am enjoying it. But I didn’t need to watch it, or like it, to know that it could help teach. In fact, it strikes me as curious that there would be public debate about whether The Wire could be part of a curriculum.

Ishmael Reed thinks that the show reinforces stereotypes about blacks. I’m not sure about that yet; one could argue exactly the opposite. But even if it is true, what better way to address the stereotypes prevalent in popular culture than to critique them in a class which also requires reading rigorous social science?

Some point out that one can learn a lot more facts in an hour of reading about urban social problems than one can by watching one episode of a fictional television program. I certainly agree, but there are a couple of obvious responses. First, popular and critically acclaimed television may have an emotional impact that is a valuable part of education, and in some ways cannot be matched by other content. Second, watching The Wire need not replace reading peer-reviewed studies or other academic approaches. I think it is quite plausible that one could expect students to do a typical reading load and have them watch some episodes of The Wire on top of that. So for me there is no question about whether The Wire can be used in education; the only issue, but a very real one, is how it should be used.

Here’s what I find odd about this whole debate… how many unimportant and/or poorly taught classes are being taught this semester in colleges all across the country? Answer: lots. How many articles do you see about important issues in higher education published in outlets like The Huffington Post, The Boston Globe, and The Washington Post? Answer: few. The reason this debate is prominent is that people want to read about The Wire while pretending to read about important issues in education. OK, that shouldn’t be a revelation for most people. But it is worthwhile to have pointy-headed types pointing out that that is what’s going on.

If, like me and millions of others, you can’t help but be interested in a somewhat silly debate about The Wire, William Julius Wilson and Anmol Chaddha defend their class in an op-ed for The Washington Post.


Professor Quality and Professor Evaluation

June 11, 2010

If you wanted to be more objective about student and professor evaluation, you would have standardized measures of student performance across professors.  In the rare case in which this is done, we learn all sorts of fascinating things, including things which raise questions about the unintended consequences of our evaluation systems.

Tyler Cowen points me to a paper in the Journal of Political Economy, by Scott E. Carrell and James E. West [ungated version].

At the U.S. Air Force Academy, students are randomly assigned to professors but all take the same final exam. What makes the data really interesting is that there are mandatory follow-up courses, so you can see the relationship between which Calculus I professor you had and your performance in Calculus II! Here’s the summary sentence that Tyler quotes:

The overall pattern of the results shows that students of less experienced and less qualified professors perform significantly better in the contemporaneous course being taught.  In contrast, the students of more experienced and more highly qualified introductory professors perform significantly better in the follow-on courses.

Here’s a nice graph from the paper:

Student evaluations, unsurprisingly, laud the professors who raise performance in the initial course. The surprising thing is that this is negatively correlated with later performance. In my post on Babcock and Marks’s research, I touched on the possible unintended consequences of student evaluations of professors. This paper gives new reasons for concern (not to mention much additional evidence, e.g., that physical attractiveness strongly boosts student evaluations).

That said, the scary thing is that even with random assignment, rich data, and careful analysis there are multiple, quite different, explanations.

The obvious first possibility is that inexperienced professors (perhaps under pressure to get good teaching evaluations) focus strictly on teaching students what they need to know for good grades. More experienced professors teach a broader curriculum, the benefits of which you might take on faith, but needn’t, because their students do better in the follow-up course!

But the authors mention a couple other possibilities:

For example, introductory professors who “teach to the test” may induce students to exert less study effort in follow-on related courses.  This may occur due to a false signal of one’s own ability or from an erroneous expectation of how follow-on courses will be taught by other professors.  A final, more cynical, explanation could also relate to student effort.  Students of low value added professors in the introductory course may increase effort in follow-on courses to help “erase” their lower than expected grade in the introductory course.

Indeed, I think there is a broader phenomenon. Professors who are “good” by almost any objective measure will have induced their students to put more time and effort into their course. How much this takes away from students’ efforts in other courses is an essential question I have never seen addressed. Perhaps additional analysis of the data could shed some light on this.

Carrell, S., & West, J. (2010). Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors. Journal of Political Economy, 118(3), 409–432. DOI: 10.1086/653808

Added: Jeff Ely has an interesting take: In Defense of Teacher Evaluations.

Added 6/17: Another interesting take from Forest Hinton.


I discovered an example of Stigler’s Law

May 30, 2010

from Wikipedia:

Stigler’s Law of Eponymy is a process proposed by University of Chicago statistics professor Stephen Stigler in his 1980 publication “Stigler’s law of eponymy”.[1] In its simplest and strongest form it says: “No scientific discovery is named after its original discoverer.” Stigler attributes its discovery to sociologist Robert K. Merton (which makes the law self-referencing).

So what is the example I discovered?*  I’ve often heard someone mention one, but never more than one, of the following: Campbell’s Law (1976), Goodhart’s Law (1975), and the Lucas Critique (1976). Just recently I came across a reference to “Steve Kerr’s classic 1975 article, [On the Folly of Rewarding A While Hoping for B]”.

OK, so they aren’t identical (the Lucas critique probably has the most distinctive content), but they are all closely related. I looked over each original article (except the one for Goodhart’s Law, which doesn’t appear to be online) and none of them cites the others. I’m sure that with some research one would find that any one of these alone would be an example of Stigler’s Law; that is, we could find very similar insights from someone like Adam Smith or J.S. Mill.

A couple questions:

1. Had the authors been exposed to the others’ similar ideas at the time they published theirs?  Did they make a conscious decision not to cite?  (Note that while the publications have dates, we have no idea when each author first had their idea.)

2. Let’s assume that none of them got their idea directly from one of the others.  What does it say that they each published their idea in 1975-76?  Is it unusual that multiple people had similar ideas, later recognized to be important, at the same time?  Or is it, as I suspect, merely unusual that all four different people are recognized?  I wish I could be chatting with Robert Merton about this.

* While I’m a little proud of the fact that I noticed these similarities, and that I edited Wikipedia to link the first three ideas, I hope it’s obvious that my claim to having “discovered” this example is intentionally ironic.


Declining Standards in Higher Education

May 4, 2010

In a paper entitled “Leisure College, USA,” Philip Babcock and Mindy Marks have documented dramatic declines in study effort since 1961, from 24 down to 14 hours per week. This decline occurred at all different sorts of colleges and is not a result of students working for pay.

At the same time, colleges are handing out better grades. In other work, Babcock presents strongly suggestive evidence that the two phenomena are related. That is, lower grading standards lead to less studying. They also lead students to give better course evaluations.

To me this looks like evidence of big problems in higher education, though I’d love someone to convince me otherwise.

Andrew Perrin has been a leader in developing an institutional response to concerns about grading. See his original scatterplot post on the topic, “grades: inflation, compression, and systematic inequalities,” as well as the more recent scatterplot discussion.

ADDED 5/4:

Fabio at Orgtheory considers four possible explanations.  I’ll quote him:

  1. Student body composition – there are more colleges than before and even the most elite ones have larger class sizes.
  2. Technology – the Internet + word processing makes assignments much easier to do.
  3. Vocationalism – If the only reason you are in college is for a job, and this has been true for the modal freshman for decades now, you do the minimum.
  4. Grade inflation – ’nuff said.

To address them in reverse order: Fabio thinks he can rule out grade inflation because even students in hard majors report studying less… I gather he’s arguing that if students in disciplines with really tough (uninflated?) grading are studying less, then it seems arbitrary to posit one unnamed cause in those disciplines and a separate cause (grade inflation) in the other disciplines. I’m not sure that argument, with that data, is strong enough to convince me. I’m not saying that grade inflation explains 100% of the change. My guess is that it explains some of it, but that both phenomena have common and distinct causes.

Fabio’s favored explanations are vocationalism and technology. I don’t really like either of them. First, I don’t know that it’s true that those seeking a more career-oriented education do the minimum. Second, as Fabio mentioned, the authors claim the dropoff is similar across courses of study (though I’m not sure how fine-grained that data is). As for the idea that technology makes studying more efficient, most of the decline in studying had already occurred by the mid-eighties, before email and the web.

A priori I would have predicted the effect was mostly explained by change in the composition of colleges and college students, but the authors claim that the trend was similar among highly competitive colleges.

Any other theories?

ADDED 5/5:

I should have mentioned this before.  The authors are analyzing different surveys with somewhat different methodologies and then attempting to make them comparable.  They lean pretty heavily on the 1961 Project Talent survey.  If that is, for whatever reason, an overestimate, the decline might be far less dramatic.  Ungated version of the paper here.

ADDED 5/6:

After a closer look at the paper, I don’t think the data is fine-grained enough to show that today’s students who are similar to those who attended in 1961 (i.e., privileged students at top schools) are studying less, or at least not much less. Therefore one cannot rule out the theory that much or most of the decline is due to compositional change. I wish the authors had made their agreement or disagreement with my assessment clearer, because I think it is of fundamental importance in interpreting the trend.

Philip Babcock & Mindy Marks (2010). The Falling Time Cost of College: Evidence from Half a Century of Time Use Data. NBER Working Paper No. 15954 (April).


Exploratory vs. Confirmatory Data Analysis

February 17, 2010

Seth Roberts vs. Andrew Gelman

In most respects they are actually in agreement (e.g., both think that visualizing data deserves more attention than it often gets), but Andrew focuses on points of disagreement (e.g., whether statisticians drastically overvalue difficult but less useful research). Highly recommended.


Assorted Links

February 2, 2010

Don’t trust the public use micro sample of the census for research on older Americans.

New standards for hiring and tenure?

Tiny Sketch of French Sociology.

Gently regulating academics’ speech with the bully pulpit?