Exploring IMDb Top 250 with Tableau
In this blog, we will build a Dashboard and a Story to explore the IMDb (Internet Movie Database) Top 250 movies. We will build an interactive dashboard with the help of global filters and dashboard actions so that users can explore the data and answer their own questions. With the help of a Story, we will try to answer a few common questions that anyone might ask. Another takeaway from this blog is how to use/embed Web Page objects in a Dashboard to link to additional web-based information outside the data source, depending on the data and the user’s selection.
For the purpose of this blog, we will source the data from the official IMDb website. The data is available here and here. One might have to spend some time collecting the data and the metrics required for the analysis. For example, data sourced from the first link above has the IMDb rating for each movie but no user vote information,
whereas data sourced from the second link has most of the information, including the number of votes and the user rating for each movie along with its release date.
Make sure to collect the URL of each movie, which will be required later when embedding Web Page objects in the dashboard. One can retrieve the URL of the movies as shown below. The method of collecting the data is left to the readers of this blog.
One can also refer to alternative interfaces, where plenty of information is available. Please read the copyright information for permitted usage.
Before we start, let us quickly go over the structure of the data I have used and the dimensions and measures it contains.
The data has been sourced from imdb.com and formatted appropriately for Tableau. It covers the top 250 movies released between 1921 and 2015, with information such as user votes and rating for each movie. The table below gives a quick overview of the important dimensions and measures in the dataset.
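The field table itself is not reproduced here, so the following Python sketch only illustrates one plausible shape for the formatted data before it is handed to Tableau. The column headers (Rank, Movie Title, Release Year, Rating, Votes, Movie URL) and the `Movie`/`load_movies` names are assumptions for illustration, not the exact layout of the author's file.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """One row of an assumed IMDb Top 250 extract (field names are hypothetical)."""
    rank: int          # position in the Top 250 list
    title: str         # dimension: Movie Title
    release_year: int  # dimension: Release Year
    rating: float      # measure: IMDb user rating
    votes: int         # measure: number of user votes
    url: str           # dimension: Movie URL, used later by the URL action


def load_movies(csv_text: str) -> list[Movie]:
    """Parse CSV text with the assumed headers into Movie records.

    Vote counts may contain thousands separators (e.g. "1,234"), which are
    stripped before conversion to int.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Movie(
            rank=int(row["Rank"]),
            title=row["Movie Title"],
            release_year=int(row["Release Year"]),
            rating=float(row["Rating"]),
            votes=int(row["Votes"].replace(",", "")),
            url=row["Movie URL"],
        )
        for row in reader
    ]
```

Keeping Movie URL as its own column matters more than the exact format, since the dashboard action later reads it directly.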
Rather than asking questions upfront, this time we will start by building an interactive dashboard and answer our questions later.
Step 1 – Connect to the data
Connect to the Excel or text file that contains the data and open the Data worksheet. Note that the file IMDb250_RatingsAndVotes, opened below, contains all the dimensions and measures discussed above.
Step 2 – Go to Sheet 1 and analyse/review the loaded data
Step 3 – Create a sheet with each dimension and measure configured as filter
The idea in this blog is to build a self-explorable dashboard, for which we need to put control in the hands of the user to play around with the possible values of dimensions and measures.
For every dimension and measure except Movie URL, we are going to create an individual global filter and show it as a quick filter with an appropriate style.
To create a global filter, right-click the filter on the Filters shelf and choose Apply to Worksheets > All Using This Data Source, as shown below. The database (cylinder) icon shown beside the dimension or measure on the Filters shelf indicates that it is a global filter.
We now have all the dimensions and measures configured as global filters in Sheet 1, as shown below. Note that Movie Title is configured as a wildcard match filter and Release Year as a multiple values (custom list) filter. One can rename Sheet 1 to Filters.
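Filters on different fields are applied together, so a movie stays in the view only if it passes all of them. As a rough mental model (not Tableau's implementation), the Python sketch below combines a wildcard match on the title, a multiple-values list for the release year, and range filters for rating and votes. The case-insensitive matching, the `Movie` fields and the `apply_filters` name are all assumptions for illustration.

```python
from collections.abc import Iterable
from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class Movie(NamedTuple):
    """Minimal hypothetical record; field names are assumptions."""
    title: str
    release_year: int
    rating: float
    votes: int


def wildcard_match(title: str, pattern: str) -> bool:
    """Approximate a wildcard-match filter where '*' matches any run of characters.

    Case-insensitive by assumption: both sides are lowercased before matching.
    """
    return fnmatchcase(title.lower(), pattern.lower())


def apply_filters(
    movies: Iterable[Movie],
    title_pattern: str = "*",
    years: Optional[set[int]] = None,
    rating_range: tuple[float, float] = (0.0, 10.0),
    votes_range: tuple[int, int] = (0, 10**9),
) -> list[Movie]:
    """Keep only movies that pass every filter (the filters are combined with AND)."""
    lo_r, hi_r = rating_range
    lo_v, hi_v = votes_range
    return [
        m
        for m in movies
        if wildcard_match(m.title, title_pattern)
        and (years is None or m.release_year in years)  # multiple-values list
        and lo_r <= m.rating <= hi_r                     # inclusive range
        and lo_v <= m.votes <= hi_v                      # inclusive range
    ]
```

Note that a pattern such as `*The*` is a plain substring test, so it also matches titles like "Another" that contain "the" inside a word.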
Step 4 – Create a simple tabular view of the data
Step 5 – Create a Dashboard with Filters and Table sheets
We will create a simple dashboard with the Filters and Table sheets and use dashboard actions, web page embedding and a little formatting to make it interactive.
Step 6 – Add a Web Page to Dashboard
Double-click Web Page in the Dashboard section of the left pane; this pops up the Edit URL dialog box. One can leave it blank and click OK.
This embeds a blank web page object between Table and Filters, as shown below.
Step 7 – Create a URL action for the dashboard
We will add interactivity to the dashboard by creating an action so that whenever we click on a movie in the table, its corresponding URL (the movie's page on IMDb) opens in the embedded Web Page object on the dashboard.
Below are the steps to add that action to the dashboard. Before doing this, don’t forget to add Movie URL to the Table sheet as a tooltip or a separate column so that the action can reference it.
In the Add URL Action dialog, give the action an appropriate name and choose Table as the source sheet, so that selecting a movie in it opens the URL captured in the Movie URL attribute.
Once created, the dashboard action should look like the one shown below.
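A URL action builds its target by substituting values from the selected mark into a URL template; here the template can simply be the Movie URL field, written as `<Movie URL>`. The Python sketch below models that substitution under simplifying assumptions: placeholders look like `<Field Name>`, values are inserted verbatim without URL encoding, and the selected mark is a plain dictionary. `build_action_url` is a hypothetical helper, not a Tableau API.

```python
import re
from collections.abc import Mapping

# Matches a field placeholder such as <Movie URL> and captures the field name.
FIELD_PLACEHOLDER = re.compile(r"<([^<>]+)>")


def build_action_url(template: str, selected_mark: Mapping[str, str]) -> str:
    """Replace each <Field Name> in the template with the selected mark's value.

    Simplified model: values are inserted as-is (no URL encoding), and a
    field missing from selected_mark raises KeyError.
    """
    return FIELD_PLACEHOLDER.sub(lambda m: selected_mark[m.group(1)], template)


# Usage: the whole template is just the Movie URL field (the URL is a placeholder).
mark = {"Movie Title": "Example Title", "Movie URL": "https://example.com/title/1/"}
print(build_action_url("<Movie URL>", mark))  # https://example.com/title/1/
```

Because the template is the full URL stored per movie, each selection yields that movie's own page without any string assembly.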
Step 8 – See the dashboard action live
Click on any movie in the table and it should open the IMDb page for that movie, with details such as photos, trailer and description.
One can argue that the layout of the dashboard is not optimized for viewing, but the idea here is to demonstrate the capability of dashboards and embedded web objects. As a suggestion, one can arrange the filters in a row at the top to make more space for the web page object and the table.
This is a slightly optimized version of the dashboard in terms of layout.
Step 9 – Create a story with various dashboards
Using the above steps, I have created various dashboards from which a Story can be built. A story is simply a sequence of dashboards or sheets. It is like a book with pages, where each page tells part of the story. One can give a title to each page in the story and simply drag and drop a dashboard or sheet onto the page. Below is a simple story with three tiles/pages, each built for a different view and analysis of the data.
Let us do some analysis now.
Is there any movie with a rating greater than or equal to 9 but fewer than 700K votes?
Using our explorable dashboard, one can adjust the filters accordingly to see the results. Only one movie has these characteristics: The Godfather: Part II (1974), with a rating of 9 and votes close to 700K.
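The same question can be written as a simple predicate over the records. This Python sketch assumes each movie has `rating` and `votes` fields (hypothetical names) and mirrors the thresholds in the question: a rating of at least 9 and fewer than 700,000 votes.

```python
from typing import NamedTuple


class Movie(NamedTuple):
    """Minimal hypothetical record for this query."""
    title: str
    release_year: int
    rating: float
    votes: int


def high_rating_low_votes(
    movies: list[Movie], min_rating: float = 9.0, max_votes: int = 700_000
) -> list[Movie]:
    """Return movies rated at least min_rating with strictly fewer than max_votes votes."""
    return [m for m in movies if m.rating >= min_rating and m.votes < max_votes]
```

In the dashboard, the equivalent is setting the Rating range filter's lower bound to 9 and the Votes range filter's upper bound just below 700K.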
What do the ratings of the top 50 movies look like?
On the second page of the story, named Ratings and Movies, filter the ranks to retrieve the top 50 movies.
It seems that the average rating for the top 50 movies is around 8.5 to 8.6, with 9 movies in the 8.5 bucket. By the way, Gladiator is my favourite movie.
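To reproduce this kind of summary outside Tableau, one can keep ranks 1 through 50, take the mean rating, and count movies per rating value, with each value (such as 8.5) acting as a bucket. This Python sketch assumes hypothetical `rank` and `rating` fields; it raises `statistics.StatisticsError` if no movie falls within the top N.

```python
from collections import Counter
from statistics import mean
from typing import NamedTuple


class Movie(NamedTuple):
    """Minimal hypothetical record for this summary."""
    rank: int
    rating: float


def top_n_rating_summary(movies: list[Movie], n: int = 50) -> tuple[float, dict[float, int]]:
    """Return (average rating, count per rating value) for movies ranked 1..n."""
    top = [m for m in movies if m.rank <= n]
    # Round to one decimal so each one-decimal rating (e.g. 8.5) forms one bucket.
    buckets = Counter(round(m.rating, 1) for m in top)
    return mean(m.rating for m in top), dict(sorted(buckets.items()))
```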
What do the votes look like for movies whose title contains “The”?
In the third dashboard, Votes & Movies, filter Movie Title with the value “*The*”. There seems to be no clear pattern between movie titles containing “The” and the number of votes.
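Eyeballing a chart is one way to look for a pattern; a rough numeric check is to compare the median votes of titles containing "the" against all other titles. This Python sketch assumes hypothetical `title` and `votes` fields and uses a case-insensitive substring match, like the `*The*` filter, so words such as "Another" also count. Similar medians in both groups would be consistent with, but not proof of, the absence of a pattern.

```python
from statistics import median
from typing import NamedTuple, Optional


class Movie(NamedTuple):
    """Minimal hypothetical record for this comparison."""
    title: str
    votes: int


def median_votes_by_substring(
    movies: list[Movie], substring: str = "the"
) -> tuple[Optional[float], Optional[float]]:
    """Median votes for titles containing substring vs. all other titles.

    Matching is a case-insensitive substring test, so "Another" contains "the".
    A group with no movies yields None.
    """
    needle = substring.lower()
    with_sub = [m.votes for m in movies if needle in m.title.lower()]
    without = [m.votes for m in movies if needle not in m.title.lower()]
    return (
        median(with_sub) if with_sub else None,
        median(without) if without else None,
    )
```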
Stay tuned for more learning through visualization with Tableau.
Tableau (NYSE: DATA), headquartered in Seattle, Washington, has a mission to help people see and understand data. It offers a portfolio of data visualization products focused on business intelligence.
One can visit the official Tableau website for more details about Tableau, its product offerings and features.