This is the third and final part in a series of 3 blogs focussed on Level of Detail Expressions (LOD) introduced in Tableau 9.0 version. In the first and second part we looked at the two variants of LOD expressions that is using FIXED and EXCLUDE scoping keyword in LOD expressions. In this part we are going to create LOD expression using INCLUDE scoping keyword. Before we proceed ahead with the details we will understand the mechanics of INCLUDE LOD expression.
The below graphic explains what happens in terms of Aggregation and Granularity for INCLUDE LOD expression. Tableau first adds the specified dimensions to the Viz LOD and performs the calculation i.e. INCLUDE keyword creates an expression that is less aggregated than the Viz LOD.
To understand how INCLUDE LOD expression works so that everyone can understand its mechanics, we will use ODI Cricket batting data that we have already looked at before in earlier blogs.
For the benefit of new readers we will quickly understand the structure of the data and dimensions and measures therein.
The data has been sourced from howstat.com and formatted appropriately for Tableau’s consumption. This is the first important and often time-consuming step before data visualization and exploration can happen. We have batting data for One Day International (ODI) matches played between years 1971 to 2011 with close to 60,000 data points. The below table gives you a quick overview of important dimensions and measures present in the dataset.
|Player name||Score Rate (runs per 100 balls faced)|
As usual we will start with asking the question that we want to answer using LOD expression.
Let us say we want to analyse what is the average total runs scored by a player for each country. This means finding out how many players played for the country for the time duration that we have in the data and finding out the total runs scored by all the players and then taking the average. Let us see how we can accomplish this easily with INCLUDE LOD expression.
Data Exploration and Analysis
Step 1 – Connect to the data.
Open the Data worksheet from text file that has the data.
Step 2 – Go to Sheet 1 and analyse/review the loaded data
Step 3 – Place Country on Row shelf
Step 4 – Place Runs on Column shelf and choose Avg. as aggregation method
Step 5- Convert the visualization into data table
Converting the visualization into data table will help us understand how INCLUDE LOD expression works by comparing it with normal Average of Runs. As seen in the table below the average of runs depicts the average of runs scored by player and by each match date as the granularity of the data is by player and match date. Using INCLUDE LOD expression we will come up with the average by Player as shown in next step.
Step 6 – Create a calculated field for INCLUDE LOD Expression
Step 7 – Put INCLUDE LOD Expression calculated field on the data table columns shelf
By default the SUM aggregation method is chosen for the calculated field and it represents the total runs scored by the country for the entire time duration and against all opponents. Note that we would have got the same values if we had directly put the runs on column shelf and applied the sum aggregation method as done in step 4. But as soon as we switch the aggregation to Average one can notice the difference between average computed for runs and average computed for the LOD expression.
Step 8 – Choose Average as the aggregation method for calculated field
Step 9 – Verification of INCLUDE LOD expression calculation
Let us now verify how the simple INCLUDE LOD expression was evaluated. Put sum of Runs and distinct count of Players on column shelf.
Step 10 – Create a simple calculated field without using LOD expression
Step 11 – Put Calculated field equivalent to LOD expression on column shelf
Observe the last two columns of the table in above screenshot, they are same as expected. We have created a calculated field without using the LOD expression in order to understand how LOD expression works.
Let us do some analysis now.
Which team has highest average runs by player?
South Africa it is. My guess is they had good all-rounders like Shaun Pollock, Lance Klusener, and Jacques Kalis who could bat extremely well hence pushing the average by player higher as compared to rest of the team.
Which team has lowest average runs by player?
England has average runs by player lower than even Kenya and Zimbabwe partly because the number of distinct players that have played for England (220 it is) is highest among the entire lot hence pushing the average down.
Stay tuned for more learning through visualization with Tableau.
One can visit the official Tableau website to find more details about Tableau and its product offering and features.