Predictive Modeling: Using R to Understand Heteroskedasticity and Fix it

July 23, 2013

In the previous article I discussed heteroskedasticity and used Excel to detect and fix it. That exercise was a good way to learn some useful Excel tricks, but doing everything manually is error-prone, and relying on it completely can be burdensome. A better way to deal with heteroskedasticity is R, simply because of its built-in capabilities and greater reliability.

We are already familiar with R and how it works, since it was covered in the webinar Learn Linear Regression in R. If you missed it, you can get the recording of our webinar from here.

Heteroskedasticity and R

The effect of heteroskedasticity on our model can be corrected quickly and easily using R! Let’s see how.

However, before we begin, we need to install some additional packages into R, which contain the functions we need to deal with heteroskedasticity.

Packages are additional files containing R functions, data, and compiled code in a well-defined format. They are all available for free and can be added as and when required. There are a lot of packages available in R, and you can browse through them and read about them.

The packages that we want for our purpose of removing Heteroskedasticity are called sandwich and lmtest.

Installing these packages is a simple matter of typing a command at the R console.


Image 1: Installing sandwich Package

R automatically downloads and installs the package!

Similarly, we download and install the other package, lmtest.
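The commands shown in the screenshots are not reproduced in the text, so here is a minimal sketch of what the installation step could look like. The CRAN mirror URL and the check for already installed packages are my own assumptions; any mirror works.

```r
# Packages named in the article
pkgs <- c("sandwich", "lmtest")

# Find the ones that are not installed yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only what is missing (mirror URL is an assumption; pick any CRAN mirror)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

Checking first means the script can be re-run without downloading the packages again.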


Removing Heteroskedasticity

Before we call the functions that deal with heteroskedasticity, we load both installed packages into the R session using the library() command.
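The loading step would typically look like the sketch below, assuming both packages were installed successfully.

```r
library(sandwich)  # provides vcovHC()
library(lmtest)    # provides coeftest()
```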



Now you may be wondering, like I wondered, what exactly the role of the sandwich package is. It is easy to find out the details of the package.

Use the following command:

>vignette("sandwich", package = "sandwich")

This opens the associated manual containing all the details about the package. You should read it in detail to get a better understanding of how R estimates the covariance matrix of a regression model.

Image 2: Package Manual


Image 3: Estimating variance-covariance matrix

Now, to see how heteroskedasticity affects our model, we will construct a heteroskedasticity-consistent variance-covariance matrix of the coefficient estimates.

>vcovHC(FitLinReg, omega = NULL, type = "HC4")

The vcovHC() function creates this variance-covariance matrix for you. The first parameter is the regression model we have made. Keep omega as NULL, which is its default. The type parameter selects which heteroskedasticity-consistent estimator is used. The commonly cited variants run from HC0 to HC4, and the package offers a few more, such as HC4m and HC5. If you leave type out, vcovHC() uses HC3; here we use HC4. You can read more about the variants in the sandwich package documentation.
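To make the call concrete, here is a small sketch on simulated data. The data, the seed, and the model name fit are hypothetical stand-ins for the article's FitLinReg, which is not available here. It puts the classical standard errors next to the HC4 ones.

```r
library(sandwich)

set.seed(42)                               # reproducible simulated data
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)    # error spread grows with x
fit <- lm(y ~ x)                           # hypothetical stand-in for FitLinReg

classical <- vcov(fit)                     # assumes constant error variance
robust    <- vcovHC(fit, type = "HC4")     # heteroskedasticity-consistent

# Standard errors are the square roots of the diagonal elements
se <- cbind(classical = sqrt(diag(classical)), robust = sqrt(diag(robust)))
print(se)
```

The two columns of se can be compared row by row, one row per coefficient.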

Image 4: Variance-Covariance Matrix

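This is our variance-covariance matrix. The model's coefficients are listed in both the rows and the columns. Each diagonal element is the estimated variance of one coefficient, and its square root is that coefficient's robust standard error. The off-diagonal elements are the covariances between coefficients. The diagonal values are not expected to be equal to each other. To judge whether heteroskedasticity is present, compare this matrix with the ordinary one from vcov(FitLinReg): if the two differ markedly, the constant-variance assumption is doubtful.

Finally, to correct for this heteroskedasticity, we use the coeftest() function from the lmtest package, passing it the robust matrix.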
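A formal test is a more direct way to check for heteroskedasticity than reading the matrix by eye. The sketch below uses the Breusch-Pagan test, bptest() from lmtest, which the article itself does not use, so treat it as an optional addition. The simulated data and the model name fit are hypothetical.

```r
library(lmtest)

set.seed(42)                               # reproducible simulated data
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)    # error spread grows with x
fit <- lm(y ~ x)                           # hypothetical stand-in for FitLinReg

bp <- bptest(fit)   # null hypothesis: constant error variance
print(bp)
```

A small p-value is evidence against constant variance, that is, evidence of heteroskedasticity.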

>coeftest(FitLinReg, df = Inf, vcovHC(FitLinReg, omega = NULL, type = "HC4"))

‘df’ stands for degrees of freedom and we take it as infinite because of the large number of variables.

Image 5: Variance-Covariance Matrix with Heteroskedasticity errors removed

This has fixed the standard errors in my regression! In case of any doubts or queries, or if you want more tips on how to move forward when you get stuck, let us know in the comment box below and we will get back to you at the earliest.

About Author

EduPristine

Trusted by Fortune 500 Companies and 10,000 Students from 40+ countries across the globe, it is one of the leading International Training providers for Finance Certifications like FRM®, CFA®, PRM®, Business Analytics, HR Analytics, Financial Modeling, and Operational Risk Modeling. EduPristine has conducted more than 500,000 man-hours of quality training in finance.

