A network diagram is a common visualization technique used for many types of analysis. For example, we might have migration data that captures people moving from one country to another, or transportation data that captures flights between airports across the world. Network diagrams are also popular for social media analysis, such as depicting tweets for a particular hashtag over some period to capture who retweeted a particular tweet, who replied to it, and so on.
Below is a snapshot of a network diagram that analyses the distance between pairs of cities in North America and the airlines operating on each route.
The diagrams below show the networks of Richard and Susan respectively, and an article by Harvard analyses the network strength of the two.
To create a network diagram in Tableau, I chose data on the suburban local trains of India’s Western Railway. I have spent 31 years of my life in Mumbai, and local trains are part and parcel of everyone’s life there. They are the cheapest and fastest means of commuting in the city. One day, while coming back from the office (by local train, of course), I thought of doing some analysis of local trains. Finding easily usable data is the tough part of data analysis and data visualization. After deciding which website to use, I wrote a scraper in Python to automate the data fetch. The data used for this visualization is scraped from a website with details about Western Railway local trains (the suburban trains, also known as Mumbai’s lifeline). The website provided data that can be scraped easily and claimed to be reasonably accurate and up to date. The purpose here is not to have accurate data but to have data that can be visualized as a network diagram, so I don’t claim the data is 100% accurate.
After scraping the raw data, I had to do a lot of data cleaning and preparation to make it suitable for building a network diagram. I will not go into every detail, but here is a quick summary of the steps needed to build the data for a network diagram in Tableau.
The raw data had the origin (e.g. Churchgate) and destination (e.g. Borivali) information of a train along with start time and end time of a journey. It had information about the number of coaches in a train (e.g. 12 or 15) and speed of the train (e.g. Fast or Slow).
The raw data was transformed into the format below. Count of Train represents the number of trains operating on a particular route (between From and To) in a day. I ignored the time information, as this blog is not about analysis along the time dimension.
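The aggregation described above can be sketched in a few lines of Python. This is only an illustration: the sample rows, the tuple layout and the `count_trains` helper are assumptions made for the example, not the scraper output or the code actually used for this post.

```python
from collections import Counter

# Hypothetical raw rows: (origin, destination, start, end, coaches, speed).
# The values are made up to mirror the fields described in the text.
raw_trips = [
    ("Churchgate", "Borivali", "05:10", "06:15", 12, "Slow"),
    ("Churchgate", "Borivali", "05:40", "06:30", 12, "Fast"),
    ("Churchgate", "Virar", "06:00", "07:25", 15, "Fast"),
    ("Borivali", "Churchgate", "06:20", "07:25", 12, "Slow"),
]


def count_trains(trips):
    """Count trains per (From, To) pair, ignoring the time columns."""
    counts = Counter((trip[0], trip[1]) for trip in trips)
    return [
        {"From": origin, "To": destination, "Count of Train": n}
        for (origin, destination), n in sorted(counts.items())
    ]


route_counts = count_trains(raw_trips)
for row in route_counts:
    print(row)
```

Each direction is counted separately, so Churchgate to Borivali and Borivali to Churchgate end up as two different routes.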
To create a network diagram, the data needs a “From-To” column that identifies the route and a Path order column. In the table below, the Borivali-Churchgate route appears twice: once with Path order = 1 and once with Path order = 2. Count of Train is the same in both rows. I have also added Latitude and Longitude columns, which hold the coordinates of the “From” station when Path order = 1 and of the “To” station when Path order = 2. I used Google’s geocode API to find the latitude and longitude of the “From” and “To” stations present in the data.
Now that our data is ready, we can perform the steps to create a network diagram in Tableau.
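The two-rows-per-route layout can be sketched as follows. The `to_path_rows` helper, the count of 3 and the coordinates are assumptions for illustration; the coordinates are rough approximations, not the values returned by the geocode API for this post.

```python
# Approximate, illustrative coordinates (latitude, longitude).
station_coords = {
    "Borivali": (19.229, 72.857),
    "Churchgate": (18.935, 72.827),
}

# One aggregated route; the count is a made-up example value.
routes = [{"From": "Borivali", "To": "Churchgate", "Count of Train": 3}]


def to_path_rows(routes, coords):
    """Emit two rows per route: Path order 1 at From, Path order 2 at To."""
    rows = []
    for route in routes:
        route_id = f"{route['From']}-{route['To']}"
        for order, station in ((1, route["From"]), (2, route["To"])):
            latitude, longitude = coords[station]
            rows.append({
                "From": route["From"],
                "To": route["To"],
                "From-To": route_id,
                "Path order": order,
                "Count of Train": route["Count of Train"],
                "Latitude": latitude,
                "Longitude": longitude,
            })
    return rows


path_rows = to_path_rows(routes, station_coords)
for row in path_rows:
    print(row)
```

Tableau later joins the two rows that share a From-To value into a single line, using Path order to decide which end comes first.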
Data Exploration and Analysis
Step 1 – Connect to data source
Here is how I connected to the data.
Note that the data also contains rows where the “From” station is the same as the “To” station, with Count of Train equal to zero.
Step 2 – Double click on Latitude and Longitude to create a map
Once the map is generated, convert Longitude and Latitude to Dimensions as shown below.
You will start seeing the stations on the map, from Virar to Churchgate.
Step 3 – Put From-To on Details shelf
Next, drag the “From-To” (route) dimension onto the Details shelf as shown below.
Then change the mark type from Automatic to Line as shown below.
As soon as the mark type is changed, lines appear depicting all the local train routes in Western Railway’s suburban area.
Step 4 – Place Path order on Path shelf
Put the Path order measure on the Path shelf and change it from Measure to Dimension as shown below.
Our network diagram is ready, but some cosmetic changes are still pending.
Step 5 – Add From & To as Filters
Add From and To to the Filter shelf and show them as quick filters as shown below.
Your basic visualization should now look like the one below.
Step 6 – Remove unwanted elements from tooltip
Remove Longitude, Latitude, From-To and Path order from the tooltip.
Step 7 – Remove routes with count of train equal to zero
Remember that our data has routes with a Count of Train of zero, as mentioned in step 1. Put a filter on Count of Train to exclude the zero values.
Step 8 – Add relevant details in tooltip
Add From, To and Count of Train to the tooltip.
Step 9 – Create a dashboard
For meaningful analysis, we will create a simple dashboard with two parts: a table listing the routes and the count of trains on each route, and a map showing the network of local trains running on those routes. The network diagram map is already done, so we create the simple table shown below on a separate sheet.
We then create a dashboard with these two sheets as shown below.
To add some interactivity to the dashboard, we make the “From” and “To” filters apply to both worksheets.
Now it is time to do some analysis with the simple dashboard that we have created.
1. Which is the busiest station on the western suburban line?
For someone who knows the western suburban local trains inside out, it is obvious that Churchgate is the main hub.
Let us choose Churchgate as “From” and All as “To” to see the trains originating from Churchgate through the day.
A total of 445 trains originate from Churchgate throughout the day, with Borivali, Virar and Andheri being the top destinations.
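The same question can be answered outside Tableau with a small aggregation. The sketch below uses made-up counts and a hypothetical `departures_from` helper, so its numbers are not the dashboard figures quoted above.

```python
# Made-up counts for illustration only; not the figures from the dashboard.
routes = [
    {"From": "Churchgate", "To": "Borivali", "Count of Train": 5},
    {"From": "Churchgate", "To": "Virar", "Count of Train": 3},
    {"From": "Churchgate", "To": "Andheri", "Count of Train": 2},
    {"From": "Borivali", "To": "Churchgate", "Count of Train": 4},
    {"From": "Churchgate", "To": "Churchgate", "Count of Train": 0},
]


def departures_from(routes, origin):
    """Return total departures from origin and its destinations, busiest first."""
    selected = [
        r for r in routes
        if r["From"] == origin and r["Count of Train"] > 0
    ]
    total = sum(r["Count of Train"] for r in selected)
    ranked = sorted(selected, key=lambda r: r["Count of Train"], reverse=True)
    return total, [(r["To"], r["Count of Train"]) for r in ranked]


total, top_destinations = departures_from(routes, "Churchgate")
print(total, top_destinations)
```

This mirrors choosing Churchgate as “From” and All as “To” in the dashboard, with the zero-count rows excluded.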
2. Which route has the least number of trains running in a day?
I made the Count of Train filter visible to add further interactivity, then filtered the routes where the train count is between 1 and 5. There are a lot of routes where only 1 train runs in the whole day, such as Churchgate to Nalla Sopara, Lower Parel to Virar and Mumbai Central to Borivali. I never knew that there was a train that terminated in Jogeshwari!
A lot of questions can be answered with this visualization, and interesting insights can be generated.
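The range filter can be expressed as a simple predicate. In this sketch the route counts are invented and `routes_in_range` is a hypothetical helper; it only illustrates what the Count of Train filter does.

```python
# Invented counts for illustration; they are not taken from the scraped data.
routes = [
    {"From": "Churchgate", "To": "Borivali", "Count of Train": 40},
    {"From": "Lower Parel", "To": "Virar", "Count of Train": 1},
    {"From": "Mumbai Central", "To": "Borivali", "Count of Train": 1},
    {"From": "Andheri", "To": "Virar", "Count of Train": 5},
    {"From": "Virar", "To": "Virar", "Count of Train": 0},
]


def routes_in_range(routes, low, high):
    """Keep routes whose daily train count lies in [low, high], inclusive."""
    return [r for r in routes if low <= r["Count of Train"] <= high]


rare_routes = routes_in_range(routes, 1, 5)
for row in rare_routes:
    print(row["From"], "to", row["To"], row["Count of Train"])
```

Setting the lower bound to 1 also drops the zero-count rows where “From” equals “To”.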
Stay tuned for more learning through visualization with Tableau.
Visit the official Tableau website for more details about Tableau, its product offerings and its features.