forecasting poorly

March 23, 2013

(moderately tweaked excerpt from here)

How hard would it be to get ALL of the first round games in the NCAA men’s basketball tournament wrong? I mean, that would be pretty tough, right? Given that among the multiple millions of brackets submitted to ESPN this year, none got all the first round games right, it would seem hard to do the inverse too, right? So I’m thinking that next year I organize the “anti-confidence” NCAA pool. Instead of gaining points for every game you correctly predict, you’ll lose points for every game you get right. I.e., your aim will be to incorrectly pick as many games as possible. It would seem easy to incorrectly pick the champ, the final four, and even the elite eight. But my hunch is that people would struggle to get even all the Sweet 16 teams wrong (see, e.g., this year’s Kansas State, Wisconsin, La Salle, Ole Miss “pod”), and missing every team making the round of 32 would be almost impossible.

I think we’re going to have to put this to the test. Something like -1 point for every first round game you get right, -2 for round 2, -4 for the Sweet 16, -8 for the Elite 8, -16 for Final 4 picks, -32 for the finalists, and -64 for getting the champ right. Highest score (closest to zero) wins. How poorly do you think you could do?
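For concreteness, the proposed scoring could be sketched like this (the function name and the bracket in the example are just illustrative stand-ins; the penalty values are the ones above):

```python
# Penalties for the "anti-confidence" pool: you LOSE points for picks you
# get right, with penalties doubling each round.
ROUND_PENALTIES = {
    "round_of_64": -1,
    "round_of_32": -2,
    "sweet_16": -4,
    "elite_8": -8,
    "final_4": -16,
    "finalists": -32,
    "champion": -64,
}

def anti_confidence_score(correct_picks):
    """correct_picks maps round name -> number of picks you got RIGHT.

    A perfect (all-wrong) bracket scores 0; every correct pick costs you.
    """
    return sum(ROUND_PENALTIES[rnd] * n for rnd, n in correct_picks.items())

# A bracket that accidentally picked 3 first-round games and 1 Sweet 16
# team correctly:
print(anti_confidence_score({"round_of_64": 3, "sweet_16": 1}))  # -7
```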


Statistical Teaching (bleg)

November 20, 2012

Ok, in my research methods class, we are hitting an overview of statistics in the closing weeks of the semester. As such, I would prefer to include some empirical examples to visualize the things we’re going to talk about that are fun / outside my typical wheelhouse. So, do you have any favorite (read: typical, atypical, surprising, bizarre, differentially distributed, etc.) examples of univariate distributions and/or bivariate associations that may “stick” in students’ memories when they see them presented visually? I have plenty of “standard” examples I could draw from, but by this point in the term they’re likely bored with the ones I think of first. So, what are yours? It’s fine if you just have the numbers; I can convert them to visualizations, but if you have visual pointers, all the better.

(cross posted)

how many, indeed?

October 23, 2012

From class to news to research question. This morning in class I taught an article using the network scale-up method. It’s a great technique that’s been used to explore a number of interesting questions (e.g., war casualties and HIV/AIDS prevalence).

I came back from that class to this article pointing to a debate on voter ID laws, and I couldn’t help but think that there has to be a meaningful way to throw this method at the question to estimate plausible bounds for the potential impact of these laws. Furthermore, it seems especially important because people without IDs are likely quite hard to enumerate accurately on their own (as are those who’ve engaged in voter fraud).

So, has this study already been published and I just missed it? If not, does someone have the data we’d need for it? I’m hoping it’s a solved question, as I assume it’s something that would be better to have known a few months ago than a few weeks from now. Anywho, just puzzling over a salient question that linked together some events from my day.
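The basic scale-up arithmetic is simple: if respondents collectively report knowing m people in the hidden group out of personal networks totaling d people, scale that fraction up to the whole population. A minimal sketch, with entirely made-up survey numbers for illustration:

```python
def scale_up_estimate(m_hidden, degrees, population_size):
    """Basic network scale-up estimator.

    N_hidden ≈ N * (total alters respondents report in the hidden group)
                 / (total size of respondents' personal networks).
    """
    return population_size * sum(m_hidden) / sum(degrees)

# Hypothetical survey: each respondent reports how many people they know
# without a photo ID (m_i) and their estimated personal network size (d_i).
m = [1, 0, 2, 0, 1]            # alters without ID, per respondent
d = [150, 300, 250, 200, 100]  # personal network sizes, per respondent
print(scale_up_estimate(m, d, population_size=1_000_000))  # 4000.0
```

In practice the hard parts are estimating the d_i themselves (usually via questions about groups of known size) and correcting for transmission and barrier effects, but this is the core ratio.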


The Success of Stack Exchange: Crowdsourcing + Reputation Systems

May 3, 2012

You’ve heard me say it before… Crowdsourced websites like StackOverflow and Wikipedia are changing the world.  Everyone is familiar with Wikipedia, but most people still haven’t heard about the StackExchange brand question and answer sites.  If you look into their success, I think you’ll begin to see how the combination of crowdsourcing and online reputation systems is going to revolutionize academic publishing and peer-review.

Do you know what’s happened to computer programming since the founding of StackOverflow, the first StackExchange question and answer site?  It has become a key part of every programmer’s continuing education, and for many it is such an essential tool that they can’t imagine working a single day without it.

StackOverflow began in 2008, and since then more than 1 million people have created accounts, more than 3 million questions have been asked, and more than 6 million answers provided (see Wikipedia entry).  Capitalizing on that success, StackExchange, the company which started StackOverflow, has begun a rapid expansion into other fields where people have questions.  Since most of my readers do more statistics than programming, you might especially appreciate the Stack Exchange for statistics (aka CrossValidated).  You can start exploring at my profile on the site or check out this interesting discussion of machine learning and statistics.

How do the Stack Exchange sites work?

The four most common forms of participation are question asking, question answering, commenting, and voting/scoring.  Experts are motivated to answer questions because they enjoy helping, and because good answers increase their prominently advertised reputation score.  Indeed, each question, answer, and comment someone makes can be voted up or down by anyone with a certain minimum reputation score.  Questions, answers, and comments each have a score next to them, corresponding to their net-positive votes, and users have an overall reputation score.  Answers earn their author 10 points per up-vote, questions earn 5, and comments earn 2.  As users gain reputation, they earn administrative privileges and, more importantly, respect in the community.  Administrative privileges include the ability to edit, tag, or even delete other people’s responses.  These and other administrative contributions also earn reputation, but most reputation is earned through questions and answers.  Users also earn badges, which focus attention on the different types of contributions.
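As a toy version of the reputation arithmetic just described (ignoring down-votes, caps, bounties, and the other complications of the real site):

```python
# Points earned per up-vote, by post type, as described above.
UPVOTE_VALUE = {"answer": 10, "question": 5, "comment": 2}

def reputation(posts):
    """posts: list of (post_type, net_upvotes) tuples for one user."""
    return sum(UPVOTE_VALUE[kind] * votes for kind, votes in posts)

# A user with 3 answer up-votes, 2 question up-votes, 5 comment up-votes:
print(reputation([("answer", 3), ("question", 2), ("comment", 5)]))  # 50
```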
Crowdsourcing is based on the idea that knowledge is diffuse, but web technology makes it much easier to harvest distributed knowledge.  A voting and reputation system isn’t necessary for all forms of crowdsourcing, but as the web matures, we’re seeing voting and reputation systems being applied in more and more places with amazing results.
To name a handful off the top of my head:
  • A couple of my friends are involved in a startup called ScholasticaHQ which is facilitating peer-review for academic journals, and also offers social networking and question and answer features.
  • Stack Exchange has an open-source competitor which works quite similarly.  Its open-source software can be and is being applied to other topics.
  • A popular news story sharing and discussion site has users vote on stories and comments.
  • Another general-purpose question and answer site uses similar voting features.

It isn’t quite as explicit, but internet giants like Google and Facebook are also based on the idea of rating and reputation.

A growing number of academics blog, and people have been discussing how they could get academic credit for blogging.  People like John Ioannidis are calling attention to how difficult it is to interpret the scientific literature because of publication bias and other problems.  Of course, thoughtful individuals have other concerns about academic publishing as well.  Many of these concerns will be addressed soon, with the rise of crowdsourcing and online reputation systems.

The Credibility Revolution in Econometrics

May 13, 2010

Angrist and Pischke are on a tear.  They’re bringing econometrics to the masses with their new book, and the editors of the Journal of Economic Perspectives have seen fit to publish a debate around their article assessing the state of econometrics.  A&P claim, and I more or less agree, that microeconometrics has undergone an inspiring “credibility revolution.”

The best summary I’ve found of their article is by Austin Frakt, here.  Arnold Kling comments here.  Andrew Gelman reviewed their textbook positively and constructively here.

Angrist’s website gave ungated links to most of the comments on his paper:

Michael Keane, Edward Leamer, Aviv Nevo and Michael Whinston, Christopher Sims, and James Stock

Added 6/3:

Austin Frakt reviews Mostly Harmless Econometrics.

Mostly Harmless Econometrics has a blog!

Revision Control Statistics Bleg

April 21, 2010

A revision control system, for those with even less programming experience than I have, manages “changes to documents, programs, and other information stored as computer files.”  The most advanced ones are used by teams of programmers who simultaneously edit the same code.  Simpler revision control is built into things like wikis and word processors.

I’m wondering whether a revision control system would be helpful for me now, or in the future, even if all I’m doing is statistics.

I’m working with a big dataset (ok Scott, not that big) and I’ve written a fair bit of code.  Nothing too complicated; it is half data preparation, and half analysis and graphics.  Every so often I save my code under a new name; that way, if I accidentally save bad changes, I can always revert to a previous state.  I do the same thing with the dataset itself and, in R, with my workspace.  In fact, I have an extra reason to do this with the data and my R workspace: memory management.  R often complains that it’s running out of memory, so I respond by deleting variables that I probably won’t need or could recreate without too much trouble.

It is sometimes annoying to find code that I wrote simply because there is a lot of text to go through.  I can only organize it one way: e.g., I could put all the code that makes graphs together, but then that code wouldn’t be placed next to the code that creates the data the graphs are based on.

Is a revision control system overkill for what I’m doing?  Any other thoughts?

Exploratory vs. Confirmatory Data Analysis

February 17, 2010

Seth Roberts vs. Andrew Gelman

In most respects they are actually in agreement (e.g., both think that visualizing data deserves more attention than it often gets), but Andrew focuses on points of disagreement (e.g., whether statisticians drastically overvalue difficult but less useful research).  Highly recommended.

Influential Statisticians

January 17, 2010

Most quantitative social scientists, myself included, master particular statistical techniques, but have limited understanding of the breadth or history of statistical practice.  Academic specialization is necessary, but sometimes we could learn a lot by taking a broader view.  I found it interesting to learn a little more about some of the most influential statisticians and their contributions in this article by Daniel Wright: “Ten Statisticians and their Impacts for Psychologists.”

Though I enjoyed Wright’s piece, one thing I felt was missing was a connection to philosophy and sociology of science.  What are the goals of empirical research in the social sciences?  How have the methods these statisticians invented changed social science, and science more generally?

I recommend Seth Roberts’ comments here and Andrew Gelman’s comments here.

Wright, D. (2009). Ten Statisticians and Their Impacts for Psychologists. Perspectives on Psychological Science, 4(6), 587–597. DOI: 10.1111/j.1745-6924.2009.01167.x

The Gendered “Last-Naming-Gap”

December 16, 2009

Are men more likely to be given the privilege of being identified by last name only?  Thorkelson writes another thoughtful post with a literary quality I dare not try to emulate.  You should really read his post because I’m not summarizing it.  I’m just replying to one little piece of it, and then explaining how one would approach this question using regression.  The rest of my post also appears as a comment at Eli’s blog, and I encourage readers to make comments on both sites:

Thorkelson, having read your post, I find the basic claim about men being privileged by the practice of identifying them using solely their last name to be very plausible, e.g. “Sartre” vs. “Simone de Beauvoir.”  This is not something I had thought about much before either, and I agree that we should prefer that we had already observed and considered this issue without someone else bringing it to our attention.

That said, I don’t think you should chastise yourself for some skepticism.  When someone makes a claim, we should ALWAYS be skeptical.  Yes, we should be on the lookout for motivated skepticism.  But I’d prefer that we increase skepticism of claims that justify our high status rather than decrease skepticism of claims that we benefit from unearned privilege.

As a quantitative sociologist, I think we could learn a great deal with a statistical approach.  I can think of a lot of things I’d want to consider, but one of them would be to account for the prominence of the professor.  More prominent professors are probably more likely to be referred to by last name only, and more prominent professors are more likely to be men.  What happens to the pattern when Google hits and citations are accounted for?  Perhaps naming is also related to behavior/personality, and behavior/personality differs across gender.

At the risk of being pedantic, I’ll give a very simple explanation of how we’d approach this problem using regression.  First we identify our outcome of interest, e.g., the percentage of the time someone is referred to by last name only, first name only, or full name.  Then we identify things that might predict this outcome, e.g., gender, citations, field, popularity of name, personality, etc.  We run our data through the machinery of regression (important details omitted), and this gives us an equation for predicting the outcome.  The “last-naming-gap” is equal to the coefficient on the gender predictor.  This will vary depending on which other predictors we include.  If we include the right variables, the coefficient may go to zero, in which case we might say we “explained” the “last-naming-gap.”  Note this would NOT disprove discrimination.  For example, it is quite possible that the gender differences in citations that we are using to “explain” the last-naming-gap are themselves partially the result of gender bias.  And a bias that results in citation differences would be more obviously consequential than a bias that results in a different use of language.  Anyway, thanks for the interesting post, Eli.
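To make the “explained by a confounder” point concrete, here is a toy simulation (all numbers fabricated for illustration): a world where citations alone drive last-name-only usage, but men have more citations, so a raw gender gap appears and then shrinks toward zero once citations are controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
male = rng.integers(0, 2, n)
# Men get more citations in this simulated world...
log_citations = 3.0 + 0.8 * male + rng.normal(size=n)
# ...and citations (not gender directly) drive last-name-only usage.
pct_last_name_only = 10 * log_citations + rng.normal(scale=5, size=n)

def ols(y, *predictors):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

gap_raw = ols(pct_last_name_only, male)[1]                  # no controls
gap_adj = ols(pct_last_name_only, male, log_citations)[1]   # citations controlled
print(gap_raw, gap_adj)  # raw gap is large (~8); adjusted gap is near zero
```

As the post notes, a vanishing adjusted coefficient would not disprove discrimination: the citation gap doing the “explaining” could itself be a product of bias.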

The Frog Pond Effect in Schools

December 1, 2009

Appearing in the same issue of the ASR as the Condron article I previously discussed, Robert Crosnoe publishes evidence that lower income students suffer some negative academic and psychosocial consequences from attending higher income schools.  He uses propensity score weighting (no silver bullet, but probably the best methodology you could ask for with this data) in an attempt to reduce possible confounding due to different students selecting into different schools.  Putting that issue aside, my question is: how are students’ academic and psychosocial outcomes changing over time?

Most of the outcomes Crosnoe uses (GPA, negative self-image, social isolation, depression) are measured more than once.  He is predicting the later measure, which is appropriate, but why not run models with the lagged dependent variable?
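For readers unfamiliar with the term, a lagged-dependent-variable model just means including the earlier measurement of the outcome as a predictor of the later one. A toy sketch with simulated data (variable names and numbers entirely invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
gpa_t1 = rng.normal(3.0, 0.5, n)   # earlier GPA measurement
treated = rng.integers(0, 2, n)    # e.g., attends a higher-income school
# Simulated later GPA: persistence from t1, plus a small negative effect.
gpa_t2 = 0.7 * gpa_t1 + 0.9 - 0.1 * treated + rng.normal(0, 0.3, n)

# Lagged-DV model: predict the later outcome controlling for its
# earlier measurement, so the treatment coefficient reflects change.
X = np.column_stack([np.ones(n), treated, gpa_t1])
coef = np.linalg.lstsq(X, gpa_t2, rcond=None)[0]
print(coef[1])  # estimated effect net of prior GPA, close to -0.1
```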