
DATA PREMIER LEAGUE

Submission Details: Aravindan Ingersol, IMI Delhi, aravindaningersol.p12@imi.edu

BLOCKBUSTER MOVIES
Predicting how upcoming movies will perform from the performance of past movies is challenging, because performance depends on many complex factors, most of which are difficult to quantify meaningfully. A movie's performance can mean either its financial performance or its reception by audiences (for example, its IMDb rating). Financial performance implicitly reflects audience approval and is of direct interest to makers, producers and investors. This submission therefore sets out to predict the financial performance of upcoming movies.

METHODOLOGY
The financial performance of a movie can be assessed using the metric Box office / Production budget. For brevity, this ratio is referred to below as ROI, although ROI in conventional finance is defined slightly differently. Box office alone also indicates how a movie performed, but ROI is the better metric because it puts all movies, whatever the size of their budget, on a common footing.

DATA PREPARATION: In the given data, genres were recoded into the following common and distinct genres: drama, comedy, crime, horror, thriller, adventure, action, sci-fi, animation and others. All remaining genres were subsumed into one of these. Genre combinations were then encoded with a binary-to-decimal scheme. Each genre is treated as one binary digit and the resulting binary number is converted to decimal, so every combination of genres gets a unique code. For analyzing trends, the following were recoded into discrete levels to make the data easier to interpret: movie duration, IMDb ratings, Tomatometer and Tomatousermeter ratings, star ratings (actor + writer + director movie credits) and production budget. An independent-samples median test was used to confirm the trends observed visually (refer to the infographics PDF).

Production budget data for the movies was obtained from the following sites: http://www.the-numbers.com/movie/budgets/all and http://www.opusdata.com/. A total of 453 movies were distilled for final analysis using MS Excel and SPSS, because production budget data was not available for the other movies. Actor, director and writer index scores were obtained from each person's total acting or technical experience to date (data source: http://boxofficemojo.com/people/). The rationale is that greater experience brings greater popularity and a better sense of what the market demands. That in turn leads to more movie credits, creating a virtuous cycle.

To predict the ROI of upcoming movies, a nearest-neighbor approach was used. It predicts the performance of an upcoming movie from how similar movies (its neighbors) performed in the past.
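The two data-preparation formulas described above, the ROI ratio and the binary-to-decimal genre code, can be sketched in a few lines of Python. This is an illustrative sketch only. The genre order (which genre occupies which bit) is an assumption, because the submission does not state it. The function names `roi`, `encode_genres` and `decode_genres` are hypothetical.

```python
# Hypothetical genre-to-bit order; the submission does not say which bit each genre occupies.
GENRES = ["drama", "comedy", "crime", "horror", "thriller",
          "adventure", "action", "sci-fi", "animation", "others"]


def roi(box_office: float, production_budget: float) -> float:
    """Box office divided by production budget (the submission's 'ROI')."""
    if production_budget <= 0:
        raise ValueError("production budget must be positive")
    return box_office / production_budget


def encode_genres(genres: set[str]) -> int:
    """Set one bit per genre present and return the bit pattern as a decimal integer."""
    unknown = genres - set(GENRES)
    if unknown:
        raise ValueError(f"unrecognised genres: {sorted(unknown)}")
    code = 0
    for bit, name in enumerate(GENRES):
        if name in genres:
            code |= 1 << bit  # the genre at position `bit` contributes 2**bit
    return code


def decode_genres(code: int) -> set[str]:
    """Invert encode_genres: recover the genre combination from its decimal code."""
    return {name for bit, name in enumerate(GENRES) if code & (1 << bit)}


if __name__ == "__main__":
    print(encode_genres({"drama", "comedy"}))   # 3   (binary 0000000011)
    print(encode_genres({"action", "sci-fi"}))  # 192 (binary 0011000000)
    print(roi(300.0, 100.0))                    # 3.0
```

With ten genres, every combination maps to a distinct integer between 0 and 1023. `decode_genres` recovers the combination, so the single code loses no information.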
The factors considered for identifying neighbors and predicting performance are:
1) Genre type: People choose movies according to their tastes, which are most often defined by genre. Since most movies combine several genres, the model identifies each movie by its full genre combination rather than by its primary genre alone.
2) Cast (actors): A movie's cast is a significant factor in generating buzz around it.
3) Directors and writers: Both influence a movie's performance by shaping its quality. Like actors, popular directors and writers also act as agents of approval and buzz.
4) Week of release: Certain weeks or periods of the year are more favorable for releasing a movie, such as the summer break or the year-end holidays (for example, the winter break in the US).

The model identified the optimum number of neighbors (K) for prediction as 9 (note: the model was built in SPSS Statistics). The following table shows the sum of squares error (SSE) of the ROI for different values of K:

K    Sum of squares error
3    154,175,686.2
5    140,816,274.3
9    137,859,628.3
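The submission built its nearest-neighbor model in SPSS Statistics and does not state the distance metric, feature scaling or error calculation it used. The following minimal Python sketch shows the general technique under assumed choices. It is not a reproduction of the SPSS procedure, and all function names are hypothetical.
- Features are min-max scaled to [0, 1].
- Distance is Euclidean.
- A prediction is the mean ROI of the K nearest past movies.
- K is chosen by the leave-one-out sum of squared errors.

```python
import math


def min_max_scale(rows):
    """Rescale each feature column to [0, 1] so that no single feature
    (e.g. a large actor index) dominates the Euclidean distance.
    Constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Predict a target (here ROI) as the mean target of the k training rows
    closest to `query` in Euclidean distance. Ties in distance are broken
    in favour of the smaller target value."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_sse(x, y, k):
    """Leave-one-out sum of squared errors: predict each movie from all the others."""
    sse = 0.0
    for i in range(len(x)):
        pred = knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], k)
        sse += (y[i] - pred) ** 2
    return sse


def choose_k(x, y, candidates):
    """Return the candidate k with the lowest leave-one-out SSE, plus all scores."""
    scores = {k: loo_sse(x, y, k) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy data: one already-scaled feature per movie, ROI as target (illustrative only).
    x = [[0.0], [1.0], [2.0], [3.0]]
    y = [1.0, 2.0, 3.0, 4.0]
    best_k, scores = choose_k(x, y, [1, 2])
    print(best_k, scores)  # 1 {1: 4.0, 2: 4.5}
```

Each movie's feature row could hold the following, passed through `min_max_scale` before calling `choose_k(rows, roi_values, [3, 5, 9])`:
- its ten genre bits, unpacked from the decimal genre code so that distances do not depend on the arbitrary bit order;
- the actor, director and writer index scores;
- the release week.

This sketch simplifies in two places. It scales over the whole table at once, and it treats the release week as a plain number, which ignores that week 52 and week 1 are adjacent.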
