(Originally written in July 2016. See bottom for latest update)
The insanity started with a worthwhile question, I swear.
As a movie reviewer, I gradually became obsessed with the idea of importance. As in: what makes a film important, while others just wither away in irrelevance? As a reviewer, I am in the business of highlighting what should be seen and what shouldn’t: what informs that choice, and how can I validate my picks against the popular consensus, if there’s such a thing?
(Before going any further, let’s dispense with the idea of looking for “best” movies. This is not what I’m trying to measure. The following voluntarily abstracts the idea of quality, albeit with a surprise twist midway through.)
Finding a Good Metric
I briefly toyed with the idea of relying on Google’s search results. After all, wouldn’t it make sense that a more important film leaves a bigger Google footprint than more forgettable ones? Taking two 1977 science-fiction films as an example, “Star Wars” gets you more than 200 million results, while “Planet of the Dinosaurs” doesn’t crack 200,000 hits. The problem with that idea is the English language, or rather the lack of a clear identifying mechanism between what is a movie title and what isn’t. “Star Wars” can refer to any of its sequels, spin-offs, franchise material, merchandizing or the SDI controversy of the 1980s. Add to that the impossibility of measuring the true natural-language cultural impact of a film via quotes, in-jokes and other references that don’t contain the title of the film, and it’s enough to shelve the idea.
The obvious solution is to rely on a structured database of movie information. This first takes us, irresistibly, to box-office results. After all, the amount of money that a film has made at the box office at first seems to be a pretty good approximation for significance. If ten million people saw a film in theatres, it’s likely to be more culturally relevant than something barely seen in art-house theatres. (Again: I’m not talking about quality here.) Unfortunately, while box-office results do offer an approximation of cultural relevance during the film’s theatrical run, they become increasingly irrelevant two, five or twenty-five years later. Recent film history is rife with movies that performed modestly in theatres, but found a significant audience later, either on home video media or on cable TV. Blade Runner, Austin Powers, The Shawshank Redemption and Zoolander are only four examples of films whose disappointing box-office results in no way measure their lasting importance in pop culture. Conversely, plenty of theatrical blockbusters disappear from collective memory barely five years later, while smaller releases go on to earn devotion from a significant number of fans. Finally, the changing nature of movie distribution in this digital era is making theatrical box-office results increasingly irrelevant in the long run: In December 2014, The Interview’s remarkable release on VOD channels after being pulled from chain theatres clearly signalled a bold new era.
So, what to do? While it would be nice to combine box-office results with home video sales and VOD rental numbers, those latter numbers are not released by studios, are fragmented across too many platforms and don’t exactly lend themselves to quick bulk analysis. That last factor can seem unbearably nerdy, but it’s actually significant: I want to be able to obtain regularly updated numbers without killing myself with manual processing.
That’s when I started looking at the Internet Movie DataBase (IMDB) again. I have long been a fan of IMDB (I’ve been using it long enough that I still remember the outcry when it moved to its .com domain), and there are no comparable sites anywhere else. It has the most complete listing of films in existence, offers plenty of ways to access its information and has a massive user base. Best of all, in an analytical sense, is that it makes a subset of its database available for download so that we can crunch numbers at home.
But here’s my big flash of insight: Once you dispense with the notion of quality, the measure of popularity is right there in the IMDB results: the number of votes every film receives. No matter what ratings are given to the film.
Think about it: Once the film’s theatrical run is over and box-office numbers stop, the films are still available on home media, still being shown on TV. And people who see the film vote on it. Heck, IMDB even captures popularity from illegal downloads, as (some) torrenters do vote on films they’ve seen.
Just for fun, let’s compare the box-office results from 1994 with the IMDB total number of votes for movies released that year:
| Box-Office | Number of votes |
|---|---|
| The Lion King | The Shawshank Redemption |
| Forrest Gump | Pulp Fiction |
| True Lies | Forrest Gump |
| The Santa Clause | Léon: The Professional |
| The Flintstones | The Lion King |
| Dumb & Dumber | Dumb & Dumber |
| Clear and Present Danger | Speed |
| The Mask | Interview with the Vampire |
| Pulp Fiction | Ace Ventura: Pet Detective |
Focusing on the differences, True Lies, The Santa Clause, The Flintstones and Clear and Present Danger don’t make the top-10 votes list, which instead adds The Shawshank Redemption, Léon, Interview with the Vampire and Ace Ventura. Does that sound right? I think it does. The effect gets more pronounced the farther back in pre-internet time you go (take a look at 1982 for a stunning example of how the box-office results in no way reflect what we remember from that year), or the lower down the list you look, where foreign releases and box-office bombs have a chance to shine on the basis of quality or of their stars’ subsequent careers.
As silly as it sounds, total number of IMDB votes is a pretty good approximation for cultural relevance as I define it.
(Of course, the elephant in the room is the acknowledgement that the usual IMDB voter happens to look and behave a lot like me. Looking at the statistics, the average IMDB voter is male, aged 18–29 [with a smaller male cohort 30–44] and quite a bit of a geek-movie fan. There’s a bit of analysis at http://www.quora.com/Why-do-you-think-a-score-on-IMDb-does-not-reflect-performance-at-the-box-office showing the remarkable differences between the IMDB voter and the U.S. moviegoer, although we’ll see in a moment how movies escaping that demographic are still represented fairly well. In fact, comparing the male versus the female top movies doesn’t show much of a difference.)
Having thus determined a useful metric, how can we use it?
Interlude: Does popularity mean quality? (Don’t answer too fast)
I’m going to dispense with the reams of analytics that I ended up poring over once I realized that I had a useful source for “cultural importance”, and I will focus on what happened when I realized that I could match my list of movies seen against IMDB’s list of the most popular films.
But before I do, I can’t help but leave you with this graph:
That’s a graph of the 500 most voted-upon movies, with the horizontal axis being the popularity rank (1 to 500) and the vertical axis being the rating given by IMDB users on a ten-point scale. The red line is a polynomial trend line. There are two things to note here:
- There is, at least at first, a correlation between the popularity rank and the rating given: There are only two movies rated lower than 8.0 in the top 40. This suggests that in some cases, contrary to sophisticated opinion, there is such a thing as correlation between quality and popularity. Or, specifically, that movies good enough to appeal to a lot of people are seen by a lot of people and rated accordingly.
- Most of those 500 most-voted films actually have fairly decent ratings by IMDB standards: the vast majority are solidly in the 7.0 to 8.0 range. The two outliers, rated at 4.3 and 5.2, are the first two films of the Twilight series. (Yes, you can smirk at that.)
But OK, back to my insane gamification project.
From metric to gamification
As I was playing around with the data over a period of weeks, there was at some point an audible click in my head when I realized that I also had another data source to play with: The grand and definitive list of all the movies I had seen in the past eighteen years.
Since you’re on this site, you may have noticed that I have thousands of movie reviews available. Every single film I’ve seen since May 1997 is reviewed (not always very well, I’ll admit) and getting a list of those movie titles is trivially easy.
So you’d think that it would be a piece of cake to match one with the other, right? Just tally them up next to each other, and I’d be able to find out how many of IMDB’s top-voted films I have actually seen.
The days following my first attempts to automate the match between both lists were painful. In trying to play with IMDB data, you see, you first have to match their titling conventions, and it turns out that the movie universe is often weird, and IMDB occasionally has moments of necessary madness in its attempts to properly organize all films ever made.
For instance: Movie titles include periods (such as Adaptation.), ellipses (such as Waiting… or In a World…), multiple subtitles separated by colons, or don’t follow any sort of naming convention over the course of a single series (such as the Fast and Furious series).
Then there’s IMDB’s decision to include articles as part of the sorting order of movie titles (i.e., “The Godfather” rather than “Godfather, The”), or their understandable decision to label a movie year by the date of first public showing, even if wide release may come much later, such as Paranormal Activity, first publicly screened at a film festival in 2007 but mass released in 2009.
In my own reviews, I had usually attempted proper titling sort order and often used the year of mass theatrical release as the year of the film. That introduced a few errors to fix. In practical terms, it meant that I had to modify a substantial (roughly 25–30%) proportion of my review titles to match IMDB. That meant multiple passes through my entries, matching against IMDB, rinse and repeat. It took weeks. And whenever I wanted to rebel against the official title of a film, I included it as an also-known-as and adjusted my data-cleansing scripts accordingly.
By the end, however, I had a nice match between IMDB listings and my own.
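The title clean-up described above can be sketched in a few lines of Python. This is only an illustration, not the scripts I actually used: the function name `imdb_style_title`, the also-known-as dictionary and its sample entry are all hypothetical, and the only rule implemented is moving a trailing English article back to the front.

```python
import re

# Matches review-style titles that end in a sorted article, e.g. "Godfather, The".
_TRAILING_ARTICLE = re.compile(r"^(?P<rest>.+), (?P<article>The|A|An)$")


def imdb_style_title(title, aka=None):
    """Return a title rewritten to the article-first convention.

    `aka` is an optional, hypothetical mapping from a reviewer's preferred
    title to the official listing title; it is checked before any rewriting.
    """
    aka = aka or {}
    title = title.strip()
    if title in aka:
        return aka[title]
    match = _TRAILING_ARTICLE.match(title)
    if match:
        return f"{match.group('article')} {match.group('rest')}"
    return title


# Hypothetical also-known-as entry, for illustration only.
my_aka = {"Leon": "Léon: The Professional"}

print(imdb_style_title("Godfather, The"))   # article moved to the front
print(imdb_style_title("Adaptation."))      # left untouched
print(imdb_style_title("Leon", my_aka))     # resolved through the aka table
```

Year mismatches (festival year versus wide-release year) are not handled here; those would still need a manual pass or a lookup table.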
It was shortly thereafter that I started looking into the possibility of automating my IMDB data extraction. Fortunately, IMDB is very accommodating of those needs: it provides dumps of its database freely accessible by FTP so as not to burden its site with thousands of hits. So it’s easy to get a file, updated weekly, of voting numbers for most of the titles in their collection. (IMDB only extracts results for titles with more than five votes, which doesn’t quite cover the entirety of their database but certainly covers anything you’re likely to have seen.) After a few days of trial and error (have I mentioned I’m not much of a coder?), I was able to develop an Excel VBA script to extract data from their ratings.list file and place it in an Excel workbook, with each year a worksheet.
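For readers who would rather not touch VBA, here is a rough Python equivalent of that extraction step. It is a sketch under a loud assumption: that each data line of ratings.list holds a ten-character distribution string, a vote count, a rating and a "Title (Year)" label, in that order. The regular expression, the function name `parse_ratings` and the sample lines (with placeholder titles and invented numbers) are mine, not IMDB's.

```python
import re
from collections import defaultdict

# Assumed layout of one data line (an assumption, not a documented format):
#   <distribution> <votes> <rating> <Title> (<Year>)
LINE_RE = re.compile(
    r"^\s*([0-9.*]{10})\s+(\d+)\s+(\d+\.\d)\s+(.+?) \((\d{4})(?:/[IVX]+)?\)\s*$"
)


def parse_ratings(lines):
    """Group (title, votes, rating) tuples by year, most-voted first."""
    by_year = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # headers, blank lines and anything not shaped like a film
        _dist, votes, rating, title, year = match.groups()
        by_year[int(year)].append((title, int(votes), float(rating)))
    for films in by_year.values():
        films.sort(key=lambda film: film[1], reverse=True)
    return dict(by_year)


# Placeholder lines with invented numbers, only to show the shape of the data.
sample = [
    "New  Distribution  Votes  Rank  Title",
    "      0000001222    1000   7.5  Film A (1994)",
    "      0000000125    2500   8.1  Film B (1994)",
    "      0000012210     300   6.0  Film C (1995)",
]

ratings = parse_ratings(sample)
for year in sorted(ratings):
    print(year, [title for title, _votes, _rating in ratings[year]])
```

Each year's list plays the role of one worksheet in the workbook: position in the list is the popularity rank for that year.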
From there, I was able to extract my own yearly movie lists and match them with IMDB’s lists. In other words, I was able to see which of IMDB’s most-voted films I had seen … and which ones I hadn’t.
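The matching step itself is little more than a set lookup. The sketch below assumes the titles have already been normalized to the same convention; the function name `unseen_in_top` is hypothetical, the ranked list reuses the first five entries of the 1994 vote column shown earlier, and the "seen" set is invented for the example.

```python
def unseen_in_top(ranked_titles, seen_titles, n):
    """Return (rank, title) pairs for the top-n titles missing from seen_titles."""
    seen = set(seen_titles)
    return [
        (rank, title)
        for rank, title in enumerate(ranked_titles[:n], start=1)
        if title not in seen
    ]


# First five entries of the 1994 vote column from the table earlier in the post.
ranked_1994 = [
    "The Shawshank Redemption",
    "Pulp Fiction",
    "Forrest Gump",
    "Léon: The Professional",
    "The Lion King",
]

# Invented viewing history, for illustration only.
seen = {"Pulp Fiction", "Forrest Gump", "The Lion King"}

missing = unseen_in_top(ranked_1994, seen, n=5)
print(missing)
print("Top unseen rank:", missing[0][0] if missing else None)
```

The first pair in the result gives the rank of the top unseen film, which turns out to matter a great deal once scoring enters the picture.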
The gamification began.
Interlude: What’s so important about seeing the popular films?
Before delving deep into gamification, though, it’s not a bad idea to ask why anyone would even think about quantifying whether they’ve seen popular movies.
For most people, it’s an admittedly moot point. Most people see movies when they want, according to their fancies of the moment or what’s playing at the moment. Most people don’t keep track of the films they’ve seen. Most people don’t care to form a deep understanding of The Big Picture when it comes to movies. (Good for them; movies are entertainment and it’s not always helpful to overthink it.)
But I’m a movie reviewer who is actually paid for genre film columns. I have an encyclopedic knowledge of movies in my head. It’s important (for my reviews, for my self-esteem, for my enjoyment of that chosen hobby) that I am able to speak knowledgeably of movies in general, and that includes films that I may not want to see.
In highlighting which popular films I hadn’t seen, for instance, I quickly identified certain genres that I had either avoided or not cared about. Adam Sandler films. Romantic Dramas. Oscar-nominated foreign movies. Torture horror. Mainstream drama without genre elements. A good chunk of Jim Carrey’s later career. Disney animated movies before Bolt. Teenage comedies, including the entire American Pie tetralogy. While most of those have no relevance to the areas of specialization for which I’m paid (crime/thriller and science fiction/fantasy), they are part of a conversation going on in the wider universe of Hollywood films. To take an example, I have reviewed genre films starring Seth Rogen and James Franco such as This is the End and Your Highness. But to understand those films, you have to understand where they are coming from, and that quickly takes you to the Judd Apatow films (only half of which I had seen) and, earlier, to the American Pie series that clearly influenced them. And so on, going back in time: Hollywood and the wider filmmaking community are constantly looking over each other’s shoulders, taking inspiration from the past and commenting on ongoing movie trends. Ryan Gosling in Drive is awesome, but Ryan Gosling as the guy who also starred in Blue Valentine and The Notebook is even more awesome.
All of which is to say that for a reviewer or serious film fan, once a film reaches a level of popularity, it becomes a worthwhile viewing experience no matter if it’s aimed at us or not. Seeing what everyone else has seen can give us a clue as to what works and what doesn’t. I firmly believe that reviewers should see far more movies outside their comfort zone, because it informs what they do see in their comfort zone.
So: At some point, it’s useful to measure what you haven’t seen, and try to correct that.
The first step toward gamification of the movie-viewing experience is to identify goals.
In my case, it didn’t take much more than an Excel function identifying how many of the top-100 films I hadn’t seen. That got refined to top-100 per year since 1997, and then to the top-50 and then the top-ranking film of the year I hadn’t seen.
Inevitably, you end up with the following formula, per year:
Number of films seen
MULTIPLIED by the percentage of the top-100 films seen
MULTIPLIED by the percentage of the top-50 films seen
MULTIPLIED by the ranking of the top unseen film.
So, for 2005, I have seen 104 films of 2005, MULTIPLIED by 65% of the top-100 films, multiplied by 92% of the top 50 films, multiplied by 36 which is the ranking of Hostel, the top-ranked film I hadn’t yet seen. That gives me a score of 2238.9.
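For anyone who wants to check the arithmetic, the per-year formula fits in a one-line function. This is a minimal sketch: the name `year_score` and its parameter names are mine, and the percentages are passed as fractions.

```python
def year_score(films_seen, top100_fraction, top50_fraction, top_unseen_rank):
    """Per-year score: films seen x top-100 share x top-50 share x top unseen rank."""
    return films_seen * top100_fraction * top50_fraction * top_unseen_rank


# The 2005 numbers quoted above: 104 films, 65% of the top 100,
# 92% of the top 50, and Hostel at rank 36 as the top unseen film.
score_2005 = year_score(104, 0.65, 0.92, 36)
print(round(score_2005, 1))  # 2238.9
```

Because the last factor is a rank rather than a percentage, knocking off the top unseen film moves the score far more than any other single viewing.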
And once you have a score, friends, you have a game.
Assign a score to all years between 1998 (my first full year of reviewing) and 2014, get the average and you have a master score. See a top-100 film and get one more point on the master score. See a top-50 film and get three points. Knock down that top-ranked unseen film value and gain a bonus of 15, 30, 60 points!
You can see the appeal, even if it’s a single-player game.
January rolled over, which meant that I added another year. Oh no, the average score went down! How quickly could I bring it back?
That kind of game can quickly become addictive. Of course, I ended up developing increasingly sophisticated tracking mechanisms: Thanks to some Excel trial-and-error, I was able to build a list of unseen movies, and set objectives accordingly.
You do have to set limits, though, otherwise infinite madness awaits. I eventually set out the following rules for myself:
- I want to have seen all top 50 movies in a given year since I started keeping track of what I see. (This isn’t all that excessive for me, given that over the past 15+ years, I naturally saw between 40 and 45 of all top-50 movies.)
- I want to have seen at least 85 of the top 100 films. That’s a bit of a stretch, but not too much given that my average already was around 70. 85 out of 100 also allows me to pick and choose, because there are films in that 50–100 range that I either really don’t want to see or that aren’t likely to be available in a reasonable manner.
- Speaking of which, I have to see movies legally. This means no illegal torrenting, no piracy, no bootlegging. Fortunately, I am subscribed to an extensive number of cable channels, online viewing options are getting better and it’s possible to get older DVDs really cheap these days. (Not to mention bargain bin rummaging.)
- Speaking of bargain bin rummaging, I haven’t really put a cap on the price I am willing to pay for the films on my new must-see list, but I’m famously cheap and $5 seems like a workable maximum price point, at least at first. (We’ll see later if some films prove harder to find.)
- There is to be no excessive quibbling on the quality experience of the movies seen. By which I mean that, even if I’d rather see any film in its subtitled original soundtrack, I wasn’t going to stop myself from watching a film dubbed in French, edited for length or bleeped for broadcast on family-friendly standard TV. Also: no high-definition snootiness if it’s available in good old standard definition.
- Finally: I won’t fast-forward through a film and pretend that I’ve seen it … but I allow myself to do something else while I watch it. This became known as the “I Am Sam” clause for reasons obvious to anyone who has seen the film.
Given the above, the last month of 2014 and much of 2015 became a movie catch-up extravaganza. I’d pore over the TV listings on Mondays to program the DVR for the week, and record anything near the top of my should-watch lists. Then I’d watch the equivalent of a film every evening or two. During that time, I accumulated some delays in writing up my reviews but somehow managed to keep up with the rest of my obligations. And it worked. In the first three months, I managed to get my Top-50 average from 87% to 91% and my Top-100 average from 73% to 75%. Most significantly, my top unseen film average went from 16 to 30.
All of which didn’t cost all that much. On any given week, I could find 2–4 targeted films broadcast somewhere in my cable TV subscription. Rummaging through bargain-DVD bins netted me a few more titles. Amazon helped a bit: I got a few movies for $5, including the four-movie American Pie series compilation for $20. Then I subscribed to Netflix, and got access to an entirely new constellation of movies.
Trying to “catch” the films by recording them as they played somewhere in my cable channels took me to very strange places. I ended up recording from channels I’d never watched before, from BET (Will Smith’s movies), to GameTV (Shallow Hal) to YTV (17 Again, The Adventures of Tintin) to Teletoon (Puss in Boots, Kung Fu Panda 2) to French-language CinePop (which I usually avoid given that they broadcast French-dubbed versions of English movies) to plain broadcast TV. Two of the most valuable channels in filling holes in my lists were Slice and the W-movie channel, both of which are aimed at female demographic segments and so featured many movies I hadn’t yet seen. Other more predictable stops included Action (teen-oriented films) as well as the general-audience ShowCase and MovieTime channels.
Some films are harder to get than others. After four months of watching, I’m making nearly no progress on small independent films that somehow ended up popular, and foreign-language movies that struck it big on IMDB. Disney animated movies, I’m finding out, are practically never shown on TV, which is a pretty good excuse for building a preschooler’s Blu-ray library. More extreme horror/thriller movies, unplayable on mainstream TV, are also proving a challenge to get: I ended up feeling as if I was breaking an ethical rule of my life by purchasing a previewed copy of Hostel (for a mere $4, but still…) and I abandoned hope of seeing Hard Candy or Battle Royale on TV. (Netflix eventually provided.) I anticipate an endgame in which the last few remaining Top-50 films are seen via more expensive Amazon purchases, especially for some foreign critical darlings.
Have I learned anything from those films? Maybe a bit. While watching Grown Ups, I realized that Adam Sandler films are built around the notion of comfort in stereotypical gender roles and social conventions. I realized that I like romantic comedies far more than romantic dramas because comedies have better failure modes than dramas if the central concept doesn’t work. I’m perhaps a bit better at distinguishing a good, competently-made film from one that I personally enjoy. I have learned to rediscover the merits of commercial breaks in getting a snack. I catch a few more culture references than I did. I filled the holes in quite a few filmographies of well-known actors (most notably the aforementioned Adam Sandler, Jim Carrey and Will Smith, but also Joseph Gordon-Levitt). Perhaps surprisingly, I haven’t (yet) found a film that truly madly struck me as an absolute personal favourite. Sure, I liked films such as Intouchables, The Perks of Being a Wallflower, Love & Other Drugs or (500) Days of Summer, but I haven’t found anything that makes me feel as if I had missed much by watching just the movies in which I was interested.
On the other hand, I have now sat through an increasing number of films that I have found either insipid or insufferable. Romantic or mainstream dramas are usually the worst offenders here, as they don’t offer the same kind of genial atmosphere that comedies or action movies do, and tend to irritate me when they get overly manipulative. The worst so far has to be I Am Sam, the 2001 Sean Penn vehicle in which he plays a mentally disabled father trying to keep custody of his daughter: It’s a competently-made film with an Oscar-calibre performance from Penn, but it’s so profoundly irritating that I started to resent the time I was wasting watching the film. (That led to a rule change explained above in which I reserve the right not to give my complete undivided attention to some movies.) Other particularly interminable viewing experiences so far include The Vow, Sweet November, Changeling and Seven Pounds. At least I can see why they are popular, even though they may not be aimed at me.
Other than dull movies, has The Game had any detrimental effect? While I feel that I’ve been reasonably successful at confining the game within the bounds of my free time (i.e., no encroachment over family time, work time or chores time), it’s clear that most of my other hobbies have been waylaid by this momentary obsession. Reading, writing and emails to friends have all been sharply reduced. (Heck, I had trouble keeping up the pace of my movie reviews because I was watching so many movies!) I’m actually OK with this (if not overjoyed): I know that I have an obsessive personality, and the surest way to get over a momentary obsession is to burn through it as thoroughly as I can. Once The Game gets under control (I’m thinking once I have fewer than 25 of the Top-50 movies to see), I’ll be able to return to a more balanced blend of hobbies.
Within the bounds of movie-watching itself, I also believe that my willingness to take a chance on smaller recent movies that “look interesting” has suffered from The Game: My movie-viewing time being limited for the foreseeable future, I haven’t recently pored over the movie channel schedules in the hope of finding something interesting that I haven’t yet seen. Nearly all of my viewing is now dictated by a list. Given that smaller but interesting films are the lifeblood of genre B-movies, I do feel as if I’m missing out a bit. Again, this will get back to normal once The Game is won.
Considering this, I think that the future of The Game is bright, as long as some moderation is used. Trying to cover 85% of the Top-100 for the years since 1997 is tough but doable; trying to expand the same game to earlier years using the same rules seems insane and possibly impossible depending on how far I’m willing to go. On the other hand, modified rules for years prior to 1997 (say, seeing all the Top-10s and at least 75% of the Top-30) should lead to a heck of a film self-education. It’s clear that my pacing will have to slow down once I get most of the easy pickings out of the way: It’s not sustainable to attempt seeing a film every night or two for more than a few weeks.
March 2017 update:
The Game has changed. At some point, enthusiasm flags and goals change to measure the remaining distance rather than the one already travelled. So it is that by mid-2016, the most important metric in my game had shifted from the ridiculously convoluted point system to a countdown of the number of films left before I could accomplish my goals.
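I never wrote the countdown down as a formula, so the following is one possible reading of it based on the two goals stated earlier (every Top-50 film, and at least 85 of the Top-100). Since each unseen Top-50 film also counts toward the Top-100, the number of films left for a year is simply the larger of the two shortfalls. The function name `films_left` and its parameters are hypothetical.

```python
def films_left(unseen_top50, seen_top100, top100_target=85):
    """Fewest additional films needed to meet both goals for one year.

    Watching an unseen Top-50 film also adds one to the Top-100 count,
    so the two shortfalls overlap and the larger one is what remains.
    """
    top100_shortfall = max(0, top100_target - seen_top100)
    return max(unseen_top50, top100_shortfall)


# 2005 as described earlier: 92% of the Top-50 seen (4 films missing)
# and 65 of the Top-100 seen.
print(films_left(unseen_top50=4, seen_top100=65))  # 20
```

Summing that value over every tracked year would give a single countdown number; a year is "done" when it reaches zero.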
I eventually reached my objective of seeing all Top-50 movies of 1997–2016 late in 2016, also reaching my Top-85% objective at the same time. I expanded a bit, setting a goal of all Top-10 movies and Top-20% for 1975–1996. While I kept going beyond my Top-85% goal, I also ended up putting a ceiling on what I’m expecting to see. I will not go above Top-95% for any given year—it helps a lot in keeping things under control as I have already reached that limit for 2012–2015 and am steadily expanding these “done!” years. (It also frees me from absolute completism, which can be frustrating in trying to gain access to those foreign movies barely available in Canada.)
While I’m still having fun with The Game (going back to the eighties is terrific, and I’m [re] watching some great movies along the way), I’m also dialling back down on some aspects of it. I miss seeing unknown low-budget genre features, and so in January 2017 gave myself permission to watch a dozen low-budget science-fiction films that had been accumulating in my Netflix queue and DVD stack.
Left unchecked, The Game also means a DVR filled with movies to see quickly in order to make place for other ones, and that eventually feels a lot like a treadmill. So it’s good to take some time off for a while, and consciously go off reading or doing something else rather than Another Movie Evening. (I miss reading books.) Even though I’m digging deeper in 1975–1996 (and eventually 1950–1975), The Game has already served its purpose: I’m fluent in a greater variety and selection of movies than before, seeing links that weren’t obvious before and polishing off my cinephile credentials.