facebook has a soul?*

i genuinely hope to get back to more (semi-)regular blogging here soon. But, in the meantime, in case you haven’t seen this one yet – here’s a wild potential data release that may interest some of you. (ht BW)

*it’s highly possible i saw this same title in someone else’s mention of this elsewhere today. but if so, i can’t for the life of me recall where.


5 Responses to facebook has a soul?*

  1. Scott Golder says:

    I would be very wary of using this data for anything of serious academic interest.

    The first issue is sampling. It’s not clear how the data were collected, and we don’t know how representative a sample it is. There seems to be what I call a “bigness” axiom among some people doing research on the internet: that as N increases, it matters less and less how you sampled. Unfortunately, this is only true when N is the entire population; short of that, non-sampling or undersampling of some subgroups is a real problem.

    The second is the ethics. There are no “public” Facebook profiles (or there were not until very recently). As such, you have to authenticate as _someone_ to see anyone. It seems like the collector might have employed some security exploit to get this data.

    This dataset seems pretty toxic. Wouldn’t touch it.
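A rough sketch of the sampling point above, under invented assumptions: a hypothetical subgroup B makes up 30% of the population, but the collector only reaches its members half the time. The function name `estimate_share` and every number here are made up for illustration and have nothing to do with the actual dataset. The idea is only that the estimate settles on the wrong value as N grows, rather than drifting toward the true one.

```python
import random

def estimate_share(n, true_share=0.3, reach_b=0.5, seed=0):
    """Estimate the share of subgroup B from n sampled people.

    Each candidate belongs to subgroup B with probability true_share.
    Candidates in B are kept only with probability reach_b (undersampling);
    everyone else is always kept. Sampling continues until n people are kept.
    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    kept = 0
    kept_b = 0
    while kept < n:
        in_b = rng.random() < true_share
        if in_b and rng.random() >= reach_b:
            continue  # this subgroup-B member was missed by the collector
        kept += 1
        kept_b += in_b
    return kept_b / n

# The estimate stabilises as n grows, but around the biased value
# 0.15 / 0.85 (about 0.18), not around the true share of 0.3.
for n in (100, 10_000, 200_000):
    print(n, round(estimate_share(n), 3))
```

A bigger N shrinks the noise around the estimate but does nothing about the gap between the biased value and the true one.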

  2. Every Facebook profile has a “public listing” version that is accessible by anyone, whether or not they are logged into Facebook. This is the publicly available version of the Facebook profile you get if you reach the profile through Google without being logged in. This public listing usually has the person’s picture, friend list, and a listing of the pages they are a “fan” of. Some people choose to reveal more in their public listings; it may be possible to reveal less, but very few do. From the description of this data set, it sounds like it most likely was assembled from exactly this publicly available information. That would make it both ethical and representative.

    As to the “bigness axiom”–while it is certainly not true in general, it might have some merit. Simply the size of a sample is of course not enough to make it representative on its own. However, I’ve seen a few comp sci papers that try to demonstrate that very large crawls of online social networks converge towards being representative, though the most highly connected nodes continue to be somewhat overrepresented (and obviously the sample doesn’t pick up unconnected ones). The papers weren’t entirely persuasive, largely because they used quite poor techniques to measure representativeness, but I would love to see something like this tested by sociologists. Since small world networks have very small diameters, it seems plausible that just crawling them for long enough would produce something like representative coverage.

    (Please excuse the lack of references–my computer is out of commission at the moment).
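One way to poke at the crawl argument above is a toy simulation. This is only a sketch under loud assumptions: the network is a synthetic preferential-attachment graph (not Facebook data), the crawl is a plain breadth-first search from one random seed node, and "representativeness" is reduced to a single number, the mean degree of the crawled nodes versus the whole graph. All function names are made up for this example.

```python
import random
from collections import deque

def preferential_attachment_graph(n, m, rng):
    """Undirected graph in which each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # every node appears once per edge end it has
    # Start from a small clique of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj

def bfs_crawl(adj, start, limit):
    """Return the first `limit` nodes visited by a breadth-first crawl."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue and len(order) < limit:
        node = queue.popleft()
        order.append(node)
        for nb in sorted(adj[node]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order

def mean_degree(adj, nodes):
    nodes = list(nodes)
    return sum(len(adj[v]) for v in nodes) / len(nodes)

rng = random.Random(0)
graph = preferential_attachment_graph(5000, 3, rng)
start = rng.randrange(5000)
overall = mean_degree(graph, graph)
# Compare crawled mean degree with the whole-graph mean at growing crawl sizes.
for limit in (200, 1000, 5000):
    crawled = bfs_crawl(graph, start, limit)
    print(limit, round(mean_degree(graph, crawled), 2), round(overall, 2))
```

In this toy setup the partial crawls lean toward well-connected nodes and only match the overall figure once the crawl has covered the whole (connected) graph. It says nothing about nodes with no connections, which a crawl never reaches, and a real network need not behave like this synthetic one.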

  3. Michael Bishop says:

    Facebook users aren’t representative of the U.S., and Americans aren’t representative of the world. That doesn’t mean that it’s not worthwhile to study Facebook users or Americans… that said, it is a major aid to interpretation when you can clearly identify the population you’re sampling from. Research questions vary in how sensitive they are to such uncertainty.

  4. Mike, you are right, I wasn’t being clear: I meant that large crawls converge towards being representative of the network being crawled, not towards being representative of the United States in general.

    Also, to make your assertion stronger, I would go as far as to say that being representative of the United States is not a theoretically justified goal in a lot of research. If what is being tested is a general theory, and not a factual assertion about some feature or another of the population of the United States, why focus the findings on the United States? Even if the findings could be representative of the whole planet, that would only be a justified goal if the theory asserted something about all currently living people (and not, e.g., people in general). Truly general theory would require a representative sample from a population of everyone who has ever been alive, or will ever be alive.

    Since this is impossible, generality cannot be statistically verified. Instead, two approaches are possible: generality can either be theoretically imputed, or it can be shifted to a secondary component of a theory. The latter approach is the one taken by social psychology to justify conducting research largely on a body of undergraduates: the results of successful findings are said to demonstrate the existence of certain social psychological *mechanisms*. If later studies cannot reproduce the same findings on non-undergraduate populations, then the mechanism is simply considered less widespread. The original study then becomes less interesting, not wrong. This approach allows social psychology to (in my view) be more successful in constructing a body of theory than much of sociology.

    I am not advocating that sociology as a whole should go the soc psych route. On the contrary, a lot of phenomena that sociologists are interested in would not be found in such a fixed population–diversity and cultural difference are too important for sociological theory. But diversity is just that: it is not the same goal as representativeness. A large and diverse study population should be enough. This brings me full circle to the “bigness axiom” again. Sure, Facebook studies are not representative of the US or the World. But I don’t think that makes them any less appropriate for testing sociological theory.

  5. Michael Bishop says:

    good points Andrei
