Data driven decision making: Netflix or Blockbuster?
Nov 25th, 2008 by Luke Shepard

My wife and I have been Netflix subscribers for years, during which we have rented hundreds of movies. We are considering a switch to Blockbuster, but one of the holdups has been that Blockbuster supposedly only has best sellers, while Netflix has lots of niche and foreign movies that make it more attractive. Then I realized it doesn’t really matter what the selection is in the abstract; what matters is, are the movies we want available? So I wrote a quick Perl script to help answer that question. It was fun so I thought I’d share my methodology and results.

  1. First download my Netflix account history to an HTML file: https://www.netflix.com/RentalActivity?all=true
  2. Extract the movie titles from the HTML file:
grep "http://www.netflix.com/Movie" RentalActivity.html \
  | sed "s|.*Movie/||" | sed "s|/.*||" \
  | sed "s/_/ /g" > netflix_history
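For anyone who would rather not chain sed calls, here is a rough Python sketch of the same extraction. The sample HTML and the numeric IDs are made up for illustration. I'm assuming each link looks like http://www.netflix.com/Movie/Some_Title/<id>, which is what the sed pipeline implies. Unlike sed, this picks up every link on a line rather than only the last one.

```python
import re

# Assumed link shape: http://www.netflix.com/Movie/<Title_With_Underscores>/<id>
MOVIE_LINK = re.compile(r'http://www\.netflix\.com/Movie/([^/"]+)')

def extract_titles(html):
    """Return the movie titles found in the rental-history HTML, in order."""
    return [slug.replace("_", " ") for slug in MOVIE_LINK.findall(html)]

# Made-up sample standing in for RentalActivity.html.
sample = (
    '<a href="http://www.netflix.com/Movie/The_Sopranos_Season_2_Disc_2/12345">x</a>\n'
    '<a href="http://www.netflix.com/Movie/Proof/67890">y</a>\n'
)
titles = extract_titles(sample)
print("\n".join(titles))
```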

Then, go to the Blockbuster search page, and figure out their search endpoint. Blockbuster doesn’t have an API like Netflix does, so we have to scrape their page. Since we’re looking for a relatively simple answer, this is not bad. Playing around, I can get the answer with this command:

curl "http://www.blockbuster.com/search/movie/movies?keyword=The+Sopranos+Season+2+Disc+2" \
  | grep "results containing"

I figured out the regular expressions and whipped up a quick Perl script that pulls the number of results and the title of the first result.


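#!/usr/bin/perl
use strict;
use warnings;

# Read one title per line from STDIN and look each one up on Blockbuster.
while (my $title = <STDIN>) {
  chomp $title;

  # Log progress to STDERR so STDOUT stays clean tab-separated data.
  chomp(my $date = `date`);
  print STDERR $date . "   " . $title . "\n";
  my $url = "http://www.blockbuster.com/search/movie/movies?keyword=" . $title;
  my $result = `curl "$url"`;

  # -1 means the result count was not found on the page.
  my $num = -1;
  if ($result =~ m/(\d+)&nbsp;results containing/) {
    $num = $1;
  }
  # Title of the first search result, if any.
  my $new_title = '';
  if ($result =~ m|<dt class="titleInfo">.*?<a href="/catalog/movieDetails/\d+" title="(.*?)"|) {
    $new_title = $1;
  }
  print $title . "\t" . $num . "\t" . $new_title . "\n";
  sleep 15;  # seconds to wait between queries
}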

In an attempt not to get shut down by any rate limiters, I only did one query every 15 seconds.
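One thing the script glosses over is that titles contain spaces and punctuation, which should be URL-encoded before going into the query string. Here is a small Python sketch of the lookup logic with that added. It reuses the two regular expressions from the script. The sample page fragment is made up, and the DOTALL flag is my own assumption in case the markup spans several lines.

```python
import re
from urllib.parse import quote_plus

SEARCH_URL = "http://www.blockbuster.com/search/movie/movies?keyword="

def build_search_url(title):
    """URL-encode the title so spaces and punctuation survive the query string."""
    return SEARCH_URL + quote_plus(title)

def parse_results(html):
    """Return (result_count, first_title); count is -1 if no count was found."""
    count = -1
    m = re.search(r"(\d+)&nbsp;results containing", html)
    if m:
        count = int(m.group(1))
    first = ""
    m = re.search(
        r'<dt class="titleInfo">.*?<a href="/catalog/movieDetails/\d+" title="(.*?)"',
        html,
        re.DOTALL,  # assumption: let .*? cross line breaks in the markup
    )
    if m:
        first = m.group(1)
    return count, first

# Made-up page fragment shaped like what the regexes expect.
sample_page = (
    '3&nbsp;results containing "proof"\n'
    '<dt class="titleInfo">\n'
    '  <a href="/catalog/movieDetails/1234" title="Proof">Proof</a></dt>\n'
)
print(build_search_url("The Sopranos Season 2 Disc 2"))
print(parse_results(sample_page))
```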

After getting the data this morning, I loaded it into Excel and did some manual scrubbing. Sometimes it was wrong; occasionally I’d get back 0 results even though such an entry did exist. So I manually ran about 20 or 30 searches on the few remaining items, just to make sure everything was accurate.

The net result: only eight of our 327 movies were not available on Blockbuster. Most of those were from the Up Series, an old British documentary series dating from the 60s, so I’m not terribly surprised. The remaining few missing movies were:

Besides those, Blockbuster had them all. They had all our seasons of Freaks and Geeks, Buffy, Sopranos, Angel, Gilmore Girls, Sex and the City, and Six Feet Under. They had The Clan of the Cave Bear, The End of Suburbia, “sex lies and videotape”, Yo Soy Boricua Pa Que Tu Lo Sepas, and UChicago’s own Proof.
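To put a number on it, here is the arithmetic as a tiny Python sketch. The only inputs are the two counts from this post.

```python
total = 327    # movies in our rental history
missing = 8    # not found on Blockbuster, even after the manual checks

available = total - missing
share = f"{available / total:.1%}"
print(f"{available} of {total} available ({share})")
```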

So in short, I think I’m switching to Blockbuster. Here’s to data-driven decision making.


An Open Stack glossary for Facebook developers
Nov 21st, 2008 by Luke Shepard

When I was in college, I remember learning how to use Linux. It confused me horribly that different computers’ window managers would behave in different ways. First off, coming from Windows, I didn’t even know that the window manager and the operating system were distinct concepts; I assumed they just belonged together. Linux allowed greater modularity and flexibility, but it was also quite daunting for a first-time user who had to learn all these new abstraction layers that I didn’t even know existed.

The “Open Stack” in social software sometimes feels the same way. There are a lot of terms bandied about that can be quite confusing. OpenID, OAuth, XFN, PortableContacts, OpenSocial, XRDS-Simple, Yadis, … what are these and how do they fit together? Isn’t OpenID just a URL thing for geeks?

Facebook application development is similar to the all-included package. You still have the different concepts of identity, authorization, contacts, activities, but they are unified together in a cohesive interface that is easy to understand.

But the Facebook stack and the “Open Stack” are really analogous, just with different names for things. So here is your Open Stack Glossary for the Facebook Developer.

Your identity is your user ID

Identity is what lets an application say “yep, I’ve seen you before.” It’s some sort of identifier that tells them who you are. On Facebook, it’s your user ID. In the “open stack”, it’s your OpenID. Now, since Facebook controls the identity space, user IDs are simple 32-bit ints. Because there is no central authority for OpenID, they needed a more creative way to avoid conflicts, so the OpenID community borrowed from DNS to ensure uniqueness. But whether I say “I am Facebook user 2901279” or “I am Yahoo user http://openid.yahoo.com/id/XRXRXRXR”, it’s ultimately the same thing. It provides a key into a database on the application side. (well, relying party)
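As a minimal sketch of that "key into a database" idea, here is a toy Python example. The in-memory dict and the helper names are hypothetical stand-ins for whatever storage an application really uses. The point is only that a numeric user ID and an OpenID URL play the same role.

```python
# Hypothetical in-memory "users table" on the application (relying party) side.
users = {}

def remember(identifier, profile):
    """Store a profile under whatever identifier the platform hands us."""
    users[str(identifier)] = profile

def seen_before(identifier):
    """True if this identifier already has a record."""
    return str(identifier) in users

remember(2901279, {"source": "facebook"})
remember("http://openid.yahoo.com/id/XRXRXRXR", {"source": "openid"})
print(seen_before(2901279), seen_before("http://example.com/someone-else"))
```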

OpenID Translations

OpenID identifier = Facebook user ID
OpenID relying party = Facebook application
OpenID identity provider = Facebook

A Facebook session key is your authorization

Identity isn’t all you get with Facebook. People develop apps for Facebook because they want data and distribution, both of which require user permission. Once you have a session key, you can make plenty of API calls. Getting that session key in a secure way is sort of a pain, though: you have to redirect the user back and forth a few times, make sure that everything is signed, and handle failure appropriately. The “Facebook Auth” mechanism was developed in 2006, after studying how other websites did it. Eventually, after Flickr, Amazon, Facebook, and others had all reproduced the same series of redirects and signatures, OAuth came along to standardize the means of obtaining a token.

OAuth Translations

OAuth consumer key = Facebook api key
OAuth consumer secret = Facebook api secret
OAuth request token = Facebook auth token
OAuth access token = Facebook session key
OAuth token secret = Facebook session secret
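To make "everything is signed" a bit more concrete, here is a Python sketch of a Facebook-style request signature as I understand the scheme: sort the parameters by name, concatenate them as key=value with no separator, append the secret, and take the MD5. Treat this as an illustration rather than a reference implementation. The function name and every value below are placeholders. OAuth uses different signature methods (typically HMAC-SHA1), but the idea of proving you hold the secret is the same.

```python
import hashlib

def facebook_style_sig(params, secret):
    """MD5 over sorted 'key=value' pairs followed by the secret."""
    base = "".join(f"{k}={params[k]}" for k in sorted(params))
    return hashlib.md5((base + secret).encode("utf-8")).hexdigest()

# All values below are placeholders, not real credentials.
params = {
    "method": "friends.get",
    "api_key": "PLACEHOLDER_API_KEY",
    "session_key": "PLACEHOLDER_SESSION_KEY",
    "call_id": "1",
    "v": "1.0",
}
sig = facebook_style_sig(params, "PLACEHOLDER_SECRET")
print(sig)
```

Because the parameters are sorted before hashing, the order in which the caller builds the dict doesn't affect the signature.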

friends.get() gets all your friends.

Okay, so we have an identifier to key off, and a session key. What can you do with the session key? The current state of the world is, it’s up to the service provider. Google lets you use your OAuth token to make service calls in their own format. Flickr does the same thing. But now we’re seeing some layering on top of that, starting with friends.get().

PortableContacts is a brand new evolving standard that allows a developer, armed with an OAuth token, to fetch information about a bunch of a user’s contacts. In Facebook, this is equivalent to a friends.get() call, followed by users.getInfo() - or better, an FQL call like this:

select uid, name, ... from user where uid in (select uid2 from friend where uid1 = %d)

PortableContacts returns like this:
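Roughly speaking, a PortableContacts response is a document with a few paging fields and a list of contact entries. The Python sketch below parses a made-up JSON response. The field names follow my reading of the PortableContacts draft, and the contacts are invented, so check the spec before relying on any of it.

```python
import json

# Made-up response; field names follow my reading of the PortableContacts draft.
raw = """
{
  "startIndex": 0,
  "itemsPerPage": 2,
  "totalResults": 2,
  "entry": [
    {"id": "1", "displayName": "Example Friend One"},
    {"id": "2", "displayName": "Example Friend Two",
     "emails": [{"value": "two@example.com", "type": "home"}]}
  ]
}
"""

def contact_names(response_text):
    """Return (id, displayName) pairs from a PortableContacts-style response."""
    data = json.loads(response_text)
    return [(c["id"], c.get("displayName", "")) for c in data.get("entry", [])]

print(contact_names(raw))
```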

Of course, the quality of the contacts will vary from service to service. A large part of the reason Facebook is so strong in this area is that the friends returned tend to represent real-world friends with influence. Most services out there will have a hard time achieving the same level of quality.

The Developer Application lets you specify where your application lives and what it can do.

In any communication between two web applications, they need a way to find out how to talk to each other and what language to speak. For Facebook, this is relatively easy: the developer just comes to the site and tells Facebook everything about the application. But in the “open stack” it’s much harder. After all, there is only one Facebook, but there are an unbounded number of “open” web services, which can each offer some variation on OpenID, OAuth, PortableContacts, and so on. How does a relying party know who offers what, who is telling the truth, and what’s going on?

On Facebook, the developer visits the Developer application and fills out a few forms. They can specify who they are, what they offer (iframe or canvas application, on-facebook or off, etc), and most importantly, where to find it (the callback URL).

Specify an application's callback URL in the Facebook developer app

XRDS is a service discovery protocol that lets a service advertise what it offers, so that all necessary information can be discovered by software without needing human intervention. It doesn’t cover quite the same options as the Facebook Developer application, but it is the same idea. The main difference is that XRDS is automatic (in-band) discovery, whereas the Developer app is done manually (out-of-band).
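Here is a rough Python sketch of what "discovered by software" can look like: parse an XRDS document and map each advertised service type to its endpoint. The sample document and its example.com endpoints are made up. The two Type URIs are the ones I believe OpenID 2.0 and PortableContacts use, but treat them as assumptions too.

```python
import xml.etree.ElementTree as ET

# Namespace used by XRD elements inside an XRDS document.
NS = {"xrd": "xri://$xrd*($v*2.0)"}

# Made-up XRDS document; the endpoints are placeholders.
SAMPLE = """<XRDS xmlns="xri://$xrds">
  <XRD xmlns="xri://$xrd*($v*2.0)" version="2.0">
    <Service>
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>https://example.com/openid</URI>
    </Service>
    <Service>
      <Type>http://portablecontacts.net/spec/1.0</Type>
      <URI>https://example.com/contacts</URI>
    </Service>
  </XRD>
</XRDS>
"""

def discover_services(xrds_text):
    """Map each advertised service Type to its endpoint URI."""
    root = ET.fromstring(xrds_text)
    services = {}
    for service in root.findall(".//xrd:Service", NS):
        uri = service.findtext("xrd:URI", default="", namespaces=NS)
        for type_el in service.findall("xrd:Type", NS):
            services[type_el.text] = uri
    return services

found = discover_services(SAMPLE)
for service_type, uri in found.items():
    print(service_type, "->", uri)
```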

For instance, here is Plaxo’s XRDS service definition file. Anyone who hits plaxo.com can figure out that Plaxo supports OpenID, OAuth, and PortableContacts, and they know where to find those services.

<XRD version="2.0">
<Service xmlns="xri://$xrd*($v*2.0)">

And then there’s the REST

Once we move beyond the core technologies and into the specifics of the APIs, the analogy starts to break down. The OpenSocial REST API offers a set of APIs analogous to fbCode, but the two are different enough that neither quite amounts to a standard. Both the open community and Facebook also have features that the other lacks. For instance, Facebook’s implementation of News Feed publishing, FBML, and XFBML are quite different from the OpenSocial specifications for activity streams and templates, although they do share similarities. I think that Facebook, MySpace, Google, and other containers will continue to experiment with these features until they converge on a standard, at which point we can add them to this list.
