May 19, 2015
In this blog, we will learn how to connect to MS Access Database from Tableau, how to perform joins on multiple tables (a.k.a Data Blending) across different data sources and how to import data from tables selectively.
For this exercise, you can choose to open any MS Access database that you may have or use sample datasets that are bundled with Tableau installation package. One can find the sample MS Access database file under “\Documents\My Tableau Repository\Datasources” folder if Tableau is already installed on the machine.
Let us spend some time understanding the data itself that we are going to use for the purpose of this blog.
The screenshot below shows the relationship between the tables that exists in the database opened in MS Access. This is a typical example of a star schema where we have a Fact table referencing one or more Dimension tables.
If one looks closely at DimProduct table, which refers to DimProductSubcategory one will realize that it is generally observed in snowflake schema where dimensions are normalized into multiple related tables.
In nutshell, we have Facts table, which has facts such as sales quantity, sales amount and corresponding metrics for goods returned. We also have data for discount amount and quantity.
The dimensions table indicate Sales of Products happening through different Channels over different Time with one or more Promotions. There is also a dimension missing from the table that is Geography represented by Store Key in FactSales table. We will worry about it a bit later.
Now that we understand the schema of the database, let us go ahead and connect to it from Tableau.
Connecting to MS Access database is no different from connecting to any other data source. If you are using Tableau Public then you will still be able to connect to MS Access database. Choose Access under Connect option browse to the location where MS Access database file exists.
Once connected to MS Access database one should see underlying tables in the left pane.
We will choose to open FactSales table as it has the numeric measure data, which are of interest to us.
In the data preview window one can choose to add filters based on which only selected data will be imported from the data source. One can specify filter criteria using categorical data or measure data. E.g. filter criteria could be to import data for certain years or it could be that import where sales $ or profit percentage is less than or greater than user defined values.
To add filters click on “Add” link on top right corner of the window as seen in screenshot below which will pop up a Edit Data Source Filters dialog.
One can then add one or more filter criteria to be applied as shown below. We will choose to add a filter on Time (Date Key) to import data for only one year.
One can also choose to apply condition on top of member level filtering. Feel free to explore Condition and Top tab as shown in above screenshot.
Applied filters are shown as below once added.
We will now go to worksheet and perform the next exercise of data blending.
Let us say we want to find out how different channels are performing in terms of Sales. We can use Channel Key and Sales Amount.
If one observes the data label for channel (1, 2, 3 etc.) it does not convey meaningful information. It will be the case if one tries to use any other metadata key such as Product, Time or Promotion. This is because we have normalized database following star schema. There are two ways to resolve the Channel key to its name. Let us look at both of them.
Go to start page or press Ctrl + 2 and once again connect to Access database. This time select DimChannel table and connect to it and you should see two data sources in data pane as shown below.
Select the DimChannel table if it already not and observe an orange link icon near Channel Key. Click on the link icon to establish the relationship.
Tableau detected that both the tables are part of the same data source and warns that it is better join the table rather than linking them through a common key in order to optimize the performance.
One can choose to ignore the warning and go ahead with linking the two tables. As soon as a common link is established, FactSales becomes a primary data source with blue right icon and DimChannel becomes secondary data source with orange icon as shown below in top left corner. One can freely use Channel Name, Label or Description seamlessly and Tableau will understand the link between FactSales and DimChannel.
While linking of tables belonging to same data source may affect performance as warned by Tableau but it does not alter the normalized form of the database. This approach is better suited for connecting tables belonging to different data source e.g. FactSales table in MS Access to Geography table maintained in Excel or CSV file. Remember we made a comment that we will worry about Geography at a later point in time. We will not go through the exercise of linking Geography table to FactSales though now that we understand how to create relationship between two tables through a common key.
Now let us look at how to achieve the same thing but with the help of table joins. We will break the relationship between DimChannel and FactSales by breaking the link (orange icon). Double click on FactSales, which will open the data preview window. Drag and drop DimChannel alongside FactSales and observe.
Tableau interpreted the two tables and created an inner join between the two. Click on two overlapping circles to explore and understand this better.
There are four types of joins possible on database tables. In this case, inner join makes perfect sense. One can choose to perform the left, right or full outer join as the case may be. Once the type of join is chosen, Tableau de-normalizes the left table to add columns from the right table for matched records. Channel Label, Name and description are now available in FactSales table as dimensions as shown below. Tableau uses common channel key to perform the join on the two tables.
This way we have learnt how to connect to different data sources, how to connect tables from the same data sources and how to bring only selective required data for analysis and visualization purpose in Tableau.
Stay tuned for more learning and visualization with Tableau.
One can visit the official Tableau website to find more details about Tableau and its product offering and features.