January 10, 2018
In this blog post, we will look at how to create a word cloud step by step with the help of Tableau. Before getting into the how-to part, it helps to understand what a word cloud is and when it is typically used.
The Wikipedia definition of a word cloud (a.k.a. tag cloud) states that “word cloud is a visual representation for text data typically used to depict keyword metadata (tags) on websites, or to visualize free form text.” One can refer to that article (and various others on the Internet) for more details about word clouds.
The image below shows a sample word cloud of the 100 most used passwords. One can easily see that “123456” is the most used password, as indicated by its size, followed by “password”, then “12345678”, and so on.
This article on bbc.com analyses Mr. Narendra Modi’s speech as a PM candidate and as PM. The image below, sourced from the same article, depicts Mr. Modi’s words as Prime Minister.
The data has been sourced from howstat.com and formatted appropriately for Tableau’s consumption. This preparation is the first, most important and often most time-consuming step before data visualization and exploration can happen. We have batting data for One Day International (ODI) matches played between 1971 and 2011, with close to 60,000 data points. The table below gives a quick overview of the important dimensions and measures present in the dataset.
|Player name||Score Rate (runs per 100 balls faced)|
As always, we will start with a question. Let us begin.
Who has scored more than 1000 runs against India?
Let us first conceptualize what we are trying to visualize and construct a series of steps to achieve the same.
We need to create a word cloud of the Player names, from various Countries, who have scored 1000 or more Runs versus India.
Note: The words in bold correspond to dimensions or measures we already have in our data.
Step 1: Connect to Data
Step 2: Go to Worksheet
Step 3: Set up a filter. In our case, the filter would be Versus = India
Step 4: Drag Player on to Label
Step 5: Drag Runs (by default Sum is chosen as aggregation method) on to Size
Step 6: Put a filter on Runs with the criterion “at least 1000”
Step 7: Choose Marks as Text instead of Automatic. This is the key to creating a Word Cloud in any example that you build.
Step 8: Drag Country on to Color.
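For readers who want to see the data logic behind Steps 3 to 8 outside Tableau, here is a minimal Python sketch. It assumes the filter in Step 6 applies to the summed Runs per player, uses the Player, Country, Versus and Runs fields named above, and runs on a handful of made-up rows; the function name word_cloud_marks and the sample numbers are hypothetical and are not taken from the howstat.com dataset.

```python
from collections import defaultdict

# Hypothetical innings records: (Player, Country, Versus, Runs).
# The numbers are invented for illustration and are not from the dataset.
innings = [
    ("Player A", "Sri Lanka", "India", 700),
    ("Player A", "Sri Lanka", "India", 450),
    ("Player A", "Sri Lanka", "Pakistan", 900),
    ("Player B", "Australia", "India", 600),
    ("Player B", "Australia", "India", 400),
    ("Player C", "Pakistan", "India", 300),
]


def word_cloud_marks(rows, versus="India", min_runs=1000):
    """Return (player, country, total_runs) marks, largest total first."""
    totals = defaultdict(int)
    for player, country, opponent, runs in rows:
        if opponent == versus:                 # Step 3: Versus = India
            totals[(player, country)] += runs  # Step 5: sum of Runs per Player
    marks = [
        (player, country, total)
        for (player, country), total in totals.items()
        if total >= min_runs                   # Step 6: at least 1000
    ]
    # The total drives the text size (Step 5); the country drives the color (Step 8).
    return sorted(marks, key=lambda mark: mark[2], reverse=True)


marks = word_cloud_marks(innings)
for player, country, total in marks:
    print(f"{player} ({country}): {total}")
```

Each tuple corresponds to one text mark in the word cloud: the player name is the label, the total sets the size, and the country sets the color.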
The word cloud is ready. One can observe that Sanath Jayasuriya has scored the most runs against India, followed by Inzamam and Ricky Ponting. In general, Sri Lankan, Australian and Pakistani batsmen have scored heavily against India. The likely reason is that these three countries and India are the four that have played the most ODI matches and have played very frequently against each other.
Surprisingly, none of the England batsmen feature in the visualization, while three Zimbabwean batsmen appear in the list.
Here is the count of matches played by these countries against India.
Using a word cloud for the above analysis is certainly not the right choice; a tree map or bar chart is a better fit, because one would still need to know how many runs were scored, or how many matches were played, by those players against India. The takeaway from this blog is how to create a word cloud with Tableau. The best scenario for using a word cloud is analysing textual data and the frequency with which words occur. That said, one should be cautious, as a word cloud emphasizes the frequency of a word, not necessarily its importance. In addition, word clouds do not provide the context in which those words are used, so they are best treated as a way to do quick exploratory analysis of text.
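To make the point about frequency concrete, here is a small Python sketch of the text case: it counts how often each word occurs and maps the counts to font sizes. The linear scaling, the point-size range, the function names and the sample sentence are all assumptions chosen for illustration; this is not a description of how Tableau computes mark sizes.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count occurrences of each lower-cased word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def font_sizes(freqs, min_pt=10, max_pt=48):
    """Scale counts linearly into the range [min_pt, max_pt]."""
    lowest, highest = min(freqs.values()), max(freqs.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts match
    return {
        word: min_pt + (count - lowest) * (max_pt - min_pt) / span
        for word, count in freqs.items()
    }


# Hypothetical sample sentence, used only to exercise the functions.
sample = "data tells a story and good data tells the story well"
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
```

Note that a word such as "a" or "the" gets a size purely from its count, which illustrates why frequency alone says nothing about importance or context.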
Stay tuned for more exciting visualizations and learning with Tableau.
Tableau (NYSE: DATA), headquartered in Seattle, Washington, has a mission to help people see and understand data. It offers a product portfolio for data visualization focused on business intelligence.
One can visit the official Tableau website to find more details about Tableau and its product offering and features.