The methodology used here borrows from the one used by DP07 and Xiayun over at the now-defunct World of KJ Yahoo Review thread. In a nutshell, they theorized and proved that the number of reviews on the Yahoo website on opening day could be used to predict the opening day Box Office number. My work here is similar, but looks at tweets from the days, weeks, months and sometimes even years prior to release.
The methodology is as follows:
Record all tweets for a film from the day its release is announced to opening day.
Filter out garbage posts not related to the film or spam.
Obtain ratios of total tweets to Friday/weekend box office.
Simple enough, right? The only problems are the sheer volume of tweets, the number of garbage tweets not related to the film (especially with a very generic movie title like "Fame"), and difficult-to-spell titles that lead to common typos. To get around this I have had to get creative with search filters and do a lot of manual eyeballing of data to fine-tune them.
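To give a feel for what such a filter might look like, here is a minimal sketch in Python. It is not the actual filter used here; the rule (a title or known typo, plus a movie-related context word, minus spam phrases) and every name and word list in it are assumptions made up for illustration.

```python
import re


def contains_any(text, phrases):
    """Return True if any phrase appears in text as a whole word or phrase."""
    return any(
        re.search(r"\b" + re.escape(p) + r"\b", text, re.IGNORECASE)
        for p in phrases
    )


def is_relevant(tweet, title_variants, context_words, spam_phrases):
    """Hypothetical relevance rule for a generic title like "Fame".

    Keep a tweet only if it mentions the title (or a known typo),
    also contains a movie-related context word, and has no spam phrase.
    """
    if not contains_any(tweet, title_variants):
        return False
    if contains_any(tweet, spam_phrases):
        return False
    return contains_any(tweet, context_words)


# Illustrative word lists only; real lists need manual tuning per film.
TITLE_VARIANTS = ["fame", "fmae"]
CONTEXT_WORDS = ["see", "movie", "film", "trailer", "tickets"]
SPAM_PHRASES = ["free download"]

sample = [
    "Going to see Fame tonight!",
    "Fame and fortune mean nothing",
    "Watch the Fame movie free download here",
    "Can't wait for the Fmae movie",
]
kept = [t for t in sample
        if is_relevant(t, TITLE_VARIANTS, CONTEXT_WORDS, SPAM_PHRASES)]
```

A rule this strict will throw away some genuine tweets, which is why the manual eyeballing still matters: the word lists have to be adjusted title by title.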
Is it foolproof? No. Obviously some people might be atrocious spellers and I might not search for all possible typos, or some people might reference the movie by the actors involved ("going to see the new Bale flick tonight."). But my hypothesis is that it will all even out per film. Twitter is by no means a representative sample of the population, BUT it is a pretty consistent sample, and by following it over many weeks and months clear patterns should emerge once I keep in mind genre, appeal and the wider environment (i.e. holidays, midnight screenings etc. that will affect tweet totals).
Since I first came up with the idea and began gathering data back in September of 2009, we have refined our tools here at Box Office. I now have access to positive and negative tweets by title and tweet data for every day of the week, and a movie's tweets are tracked from the day it is announced for wide release. As time goes on the formula expands and I have been able to incorporate more sophisticated methods of analysis, all of which afford me greater accuracy and insight into buzz.
In general, over the last four and a half years the number of tweets by title has increased dramatically as Twitter's popularity for movie discussion has taken off amongst its users and as marketing teams have embraced the power of social media in general and focused advertising campaigns on it.
Two concepts which are core to our work are Ratios and how Twitter is used by different demographics:
The ratio is the number of tweets per $1 million of Friday Box Office gross over a defined period. A film with 1,000 tweets from Monday to Thursday and a $10 million Friday would therefore have a ratio of 100. In general, films that appeal to very young or older audiences have lower ratios since those audiences are not big users of Twitter. By comparison, films that appeal to younger audiences (18-35) have much higher ratios since those audiences are much more active users of Twitter. The tweets used for these ratio calculations are always all tweets from Monday to Thursday of the release week, or for Wednesday openers I use Monday to Tuesday.
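As a quick illustration of the arithmetic, here is a minimal Python sketch of the ratio and its counting window. The function names and the dictionary of daily counts are assumptions for the example, and the daily split of 250 tweets per day is made up so that the total matches the 1,000-tweet, $10 million example above.

```python
from datetime import date, timedelta


def window_tweets(daily_counts, opening_day):
    """Sum tweets from the Monday of release week up to the day before opening.

    For a Friday opener this covers Monday to Thursday; for a Wednesday
    opener it covers Monday to Tuesday.
    """
    monday = opening_day - timedelta(days=opening_day.weekday())
    total = 0
    day = monday
    while day < opening_day:
        total += daily_counts.get(day, 0)
        day += timedelta(days=1)
    return total


def tweet_ratio(tweet_count, friday_gross_dollars):
    """Tweets per $1 million of Friday Box Office gross."""
    if friday_gross_dollars <= 0:
        raise ValueError("Friday gross must be positive")
    return tweet_count / (friday_gross_dollars / 1_000_000)


def implied_friday_gross(tweet_count, ratio):
    """Invert the ratio: the Friday gross implied by a tweet count."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return tweet_count / ratio * 1_000_000


# Illustrative daily counts for a film opening on Friday, September 25, 2009.
opening = date(2009, 9, 25)
counts = {opening - timedelta(days=n): 250 for n in range(1, 5)}

tweets = window_tweets(counts, opening)      # Monday to Thursday total
ratio = tweet_ratio(tweets, 10_000_000)      # tweets per $1 million
```

Run the other way, a ratio taken from comparable films turns a pre-release tweet count into an implied Friday gross, which is the whole point of collecting ratios by genre and audience.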
The main goal is to come up with a solid tweets to box office ratio by genre and audience for films. Why on earth is this important? Well, because I'm a numbers geek and it interests me to combine my social media and Box Office passions. But it will also predict flops from further away and help to mine diamonds in the rough that will overperform. I have learned a lot since 2009 about the Twitter landscape as it pertains to movies, and about how it has shifted and continues to shift as usage changes for all parties involved. The perpetual goal is to be a viable source of advance tracking outside of traditional methods.