
Addressing Multicollinearity with Real Life Examples

July 8, 2015

In previous blogs, we covered the basics of multicollinearity and how to detect it. In this blog, I shall walk you through four practical, real-life examples of multicollinearity and show how each one can be addressed.

First, if there is multicollinearity in a data set, we need to understand why it is there. A solid grasp of the data and the logical relationships between the variables is the first step in understanding how multicollinearity affects our results, and thereby in determining how it should be addressed.

Example 1

You may find that your model contains a predictor variable that has a direct causal relationship with another predictor variable.

For example, you may be looking at contributions to town charity organizations using a model that includes the population of the town and the total gross income of the town. You identify that these variables are highly correlated because the population of the town is a direct contributor to the total gross income of the town.

In a case like this, you should restructure your model to avoid regressing on two variables that are causally related. You could do this by either omitting one of these variables or by combining them into a single ratio variable such as per capita income.
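As a quick sketch of the ratio approach, here is a numpy-only example with made-up town figures (the numbers are purely illustrative). Total gross income is built directly from population, so the two raw predictors are nearly collinear, while the per capita ratio is not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical town-level data (illustrative only): total gross income
# is driven directly by population, so the two predictors are collinear.
population = rng.uniform(10_000, 200_000, size=50)
gross_income = population * rng.normal(40_000, 2_000, size=50)

r_before = np.corrcoef(population, gross_income)[0, 1]

# Restructure: replace the pair with a single ratio variable.
per_capita_income = gross_income / population

r_after = np.corrcoef(population, per_capita_income)[0, 1]
print(f"corr(population, gross income):     {r_before:.2f}")
print(f"corr(population, per capita income): {r_after:.2f}")
```

The first correlation comes out close to 1, while the ratio variable is essentially uncorrelated with population, so regressing contributions on population and per capita income avoids the collinearity.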

Example 2

You may find that your model contains two predictor variables that are manifestations of a common, underlying latent variable or construct. This is often referred to as the halo effect.

For example, you may be looking at customer loyalty to a shop using a model that includes several different measures of satisfaction. You identify that two of these measures of satisfaction (satisfaction with quality of product and satisfaction with the network) are highly correlated, and determine that it is because customers don't tend to separate their satisfaction in that way. Rather, both measures are really reflections of the same overall satisfaction.

In this case, you could simply use overall satisfaction as a predictor variable instead of the separate measures of satisfaction.
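As a rough illustration, with made-up survey scores and a plain average standing in for a formal factor score, the two collinear measures can be collapsed into one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical survey data: a latent "overall satisfaction" construct
# drives both observed measures, so they end up highly correlated.
overall = rng.normal(7.0, 1.5, size=200)              # latent construct
sat_quality = overall + rng.normal(0, 0.5, size=200)  # satisfaction with product quality
sat_network = overall + rng.normal(0, 0.5, size=200)  # satisfaction with the network

r = np.corrcoef(sat_quality, sat_network)[0, 1]
print(f"corr between the two measures: {r:.2f}")      # high by construction

# Replace the two collinear measures with a single combined score
# (a simple average stands in for a proper factor score here).
overall_score = (sat_quality + sat_network) / 2
```

In practice you might derive the combined score from factor analysis rather than a simple average, but the idea is the same: one predictor for one underlying construct.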

Example 3

You may find that the multicollinearity is a function of the design of the experiment.

For example, in the cloth manufacturer case, we saw that advertising and volume were correlated predictor variables, resulting in major swings in the impact of advertising when volume was and was not included in the model. On further examination, you may discover that the cloth manufacturer inadvertently introduced multicollinearity between volume and advertising as part of the experimental design, by assigning a high ad budget to cities with smaller stores and a low ad budget to cities with larger stores.

If you were able to re-do the market test, you could address this issue by restructuring the experiment to ensure a good mix of high ad/low volume, high ad/high volume, low ad/high volume and low ad/low volume stores. This would allow you to eliminate the multicollinearity in the data set.
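A balanced assignment like this can be sketched in a few lines (the level names and city counts are made up for illustration):

```python
from itertools import product

import numpy as np

# A balanced 2x2 design: every combination of ad budget and store
# volume gets the same number of test cities, so the two factors
# carry no correlation into the data set.
levels_ad = ["low ad", "high ad"]
levels_volume = ["low volume", "high volume"]
cities_per_cell = 5  # hypothetical count

assignments = [(ad, vol)
               for ad, vol in product(levels_ad, levels_volume)
               for _ in range(cities_per_cell)]

# Code the levels numerically and confirm the factors are uncorrelated.
ad = np.array([1 if a == "high ad" else 0 for a, _ in assignments])
vol = np.array([1 if v == "high volume" else 0 for _, v in assignments])
print(np.corrcoef(ad, vol)[0, 1])  # exactly 0 for a balanced design
```

Because every cell of the design is filled equally, ad budget and volume are orthogonal by construction, and their individual effects can be estimated cleanly.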

It is often not feasible, though, to re-do an experiment. This is why it is important to analyze the design of a controlled experiment very carefully before beginning, so that you can avoid accidentally causing such problems. If you have found multicollinearity as a result of the experimental design and you cannot re-do the experiment, you can address the multicollinearity by including controls. In the case of the cloth manufacturer, it will be important to include volume in the model as a control in order to obtain a more accurate estimate of the impact of advertising. Other approaches in cases like this include shrinkage and dimension-reduction methods such as principal components regression or partial least squares regression.
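As a rough numpy-only sketch of principal components regression (the data mimic the cloth manufacturer situation but are invented; a real analysis would typically use a statistics package), the idea is to rotate the collinear predictors into uncorrelated components, drop the weakest, and regress on the rest:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up data mimicking the cloth-manufacturer problem: advertising and
# volume are collinear because of the flawed design, and both drive sales.
n = 100
volume = rng.normal(0, 1, n)
advertising = -0.9 * volume + rng.normal(0, 0.3, n)   # induced collinearity
sales = 2.0 * advertising + 1.5 * volume + rng.normal(0, 1, n)

X = np.column_stack([advertising, volume])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize predictors

# Principal components regression: rotate to uncorrelated components,
# keep only the leading k, and regress on those component scores.
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
k = 1
scores = Xs @ Vt[:k].T
gamma, *_ = np.linalg.lstsq(scores, sales - sales.mean(), rcond=None)

# Map the component coefficients back to the original predictor scale.
beta_pcr = Vt[:k].T @ gamma
print(dict(zip(["advertising", "volume"], np.round(beta_pcr, 3))))
```

Discarding the minor component stabilizes the coefficient estimates at the cost of some bias, which is the trade-off all of these shrinkage-style methods make.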

Example 4

Sometimes, you will find that multicollinearity is inevitable.

In the real world, you are often not working with completely controlled experiments where you can ensure that there is no relationship between your predictor variables through your experimental design. Predictor variables may be closely related to one another, but may not have a direct causal relationship with each other or with a latent variable. In this case, you cannot just remove or replace one of the variables.

The Radison Medical case is an example of this situation: both sales and reps, though correlated, are important predictor variables and should be included in the model. It would not be appropriate to analyze either one without controlling for the other. For example, adding hundreds of new ads without also increasing the number of reps will not have the same effect on sales as increasing both together would. In cases like this, you need to recognize the multicollinearity, accept it as part of your model, and ensure your analysis and recommendations account for the relationship between the variables.
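One practical way to recognize and quantify the multicollinearity you are choosing to keep is the variance inflation factor (VIF). Here is a small numpy-only sketch with made-up data loosely inspired by this situation (the variable names are illustrative, not the actual Radison Medical data):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X: 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r_squared = 1.0 - resid.var() / target.var()
        out[j] = 1.0 / (1.0 - r_squared)
    return out

rng = np.random.default_rng(3)
# Hypothetical data: the two predictors move together, as in the case above.
reps = rng.normal(50, 10, size=200)
ads = 0.8 * reps + rng.normal(0, 3, size=200)
X = np.column_stack([reps, ads])
print(vif(X))  # both values well above 1, flagging the shared variance
```

A common rule of thumb treats VIF values above 5 or 10 as a warning sign; here the point is not to drop a predictor but to document the inflation so that coefficient standard errors and recommendations are interpreted with the relationship in mind.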

I hope this blog has given you useful examples of how to deal with multicollinearity. If you have any queries or doubts, feel free to mention them in the comments box below and I shall get back to you at the earliest.

Other Articles by the same author:

Dealing with Multicollinearity

Detecting Multicollinearity



About the Author

Mohammad Arshad Ahmad is an Analytics Advisor Specialist at Accenture. He has over 8.5 years of experience in analytics covering the web, social, marketing, talent & HR, and retail domains. Prior to Accenture, he was part of Absolutdata and Hewlett-Packard. He holds an MBA from the Asian Institute of Management and an engineering degree from BIT, Mesra.

