In the previous article I discussed heteroskedasticity and used Excel to detect and fix it. The process was helpful for learning some important Excel tricks, but doing it manually has its flaws, and relying on it entirely can be burdensome. A better way to deal with heteroskedasticity is R, for the simple reason of its built-in capabilities and higher credibility.
We are already familiar with R and how it works, as discussed in the webinar Learn Linear Regression in R. If you missed it, you can get the recording of our webinar here.
Heteroskedasticity and R
The effect of heteroskedasticity on our model's standard errors can be corrected quickly and easily using R! Let's see how.
Before we begin, however, we need to install some additional packages that contain functions for dealing with heteroskedasticity.
Packages are additional files containing R functions, data, and compiled code in a well-defined format. They are all freely available and can be added whenever required. There are many packages available for R, and you can browse through them and read about them.
The packages we need for dealing with heteroskedasticity are called sandwich and lmtest.
Installing these packages is a simple matter of typing a command at the R command prompt:
>install.packages("sandwich")
R automatically downloads and installs the package!
Similarly, we download and install the other package:
>install.packages("lmtest")
Before we call the functions that deal with heteroskedasticity, we load both downloaded packages into R's memory using the commands
>library(sandwich)
>library(lmtest)
Now you may be wondering, as I did, what exactly the role of the sandwich package is. It is easy to find out the details of a package. Use the following command:
>vignette("sandwich", package = "sandwich")
This opens the package vignette, a detailed manual describing what the package does. Read it carefully to better understand how robust variance-covariance matrices for regression models are computed in R.
Now, to account for heteroskedasticity in our regression, we construct a heteroskedasticity-consistent variance-covariance matrix of the coefficient estimates.
>vcovHC(FitLinReg, omega = NULL, type = "HC4")
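The vcovHC() function creates this variance-covariance matrix for you. The first argument is the regression model we have made. Keep omega as NULL (its default), so that the estimator is chosen through the type argument. The type argument selects which heteroskedasticity-consistent estimator is used: the commonly cited ones are HC0 to HC4, and sandwich also offers newer variants such as HC4m and HC5, with HC3 as its default. We use HC4, which adjusts more strongly for observations with high leverage. You can read more about these options in the sandwich documentation.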
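To see what the type argument changes, here is a base-R sketch of the HC4 estimator. It rescales each squared residual e_i^2 by (1 - h_i)^delta_i, where h_i is the observation's leverage (hat value), p is the number of coefficients and delta_i = min(4, n h_i / p). The simulated data and the names fit, hc4_manual and V_hc4 are hypothetical stand-ins for FitLinReg; no packages are needed to run it.

```r
# Hypothetical simulated data: the error spread grows with x
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)
fit <- lm(y ~ x)

# HC4 in sandwich form: (X'X)^-1 X' diag(omega) X (X'X)^-1
hc4_manual <- function(model) {
  X <- model.matrix(model)
  e <- residuals(model)
  h <- hatvalues(model)               # leverage h_i of each observation
  n <- nrow(X)
  p <- ncol(X)                        # number of coefficients
  delta <- pmin(4, n * h / p)         # HC4 exponent, capped at 4
  omega <- e^2 / (1 - h)^delta        # leverage-adjusted squared residuals
  bread <- solve(crossprod(X))        # (X'X)^-1
  meat <- crossprod(X * sqrt(omega))  # X' diag(omega) X
  bread %*% meat %*% bread
}

V_hc4 <- hc4_manual(fit)
print(V_hc4)
print(sqrt(diag(V_hc4)))  # HC4 robust standard errors
```

With sandwich loaded, all.equal(V_hc4, vcovHC(fit, type = "HC4"), check.attributes = FALSE) should confirm that the two matrices agree up to numerical tolerance.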
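This is our variance-covariance matrix of the estimated coefficients. The intercept and the independent variables are listed in both the columns and the rows. The diagonal elements are the estimated variances of the individual coefficients, and their square roots are the heteroskedasticity-robust standard errors. Note that the diagonal values differ from one coefficient to another even when the errors are homoskedastic, so varying diagonal values alone do not prove heteroskedasticity; a formal test such as the Breusch-Pagan test, bptest() from lmtest, is a more reliable check.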
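A minimal sketch of such a check, on hypothetical simulated data standing in for FitLinReg: it compares the classical standard errors (which assume a constant error variance) with the HC4 ones and then runs bptest(), whose null hypothesis is homoskedasticity. A small p-value is evidence of heteroskedasticity. The names fit, se_classical, se_hc4 and bp are illustrative only.

```r
library(sandwich)
library(lmtest)

# Hypothetical data: error standard deviation grows with x
set.seed(42)
n <- 300
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)
fit <- lm(y ~ x)

# Classical standard errors assume a constant error variance
se_classical <- sqrt(diag(vcov(fit)))
# HC4 standard errors remain valid under heteroskedasticity
se_hc4 <- sqrt(diag(vcovHC(fit, type = "HC4")))
print(cbind(classical = se_classical, HC4 = se_hc4))

# Breusch-Pagan test: H0 = constant error variance
bp <- bptest(fit)
print(bp)
```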
Finally, to obtain coefficient tests that account for this heteroskedasticity, we use the coeftest() function from the lmtest package.
>coeftest(FitLinReg, vcov. = vcovHC(FitLinReg, omega = NULL, type = "HC4"), df = Inf)
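'df' stands for degrees of freedom. Setting it to Inf makes coeftest() use the normal distribution (z tests) instead of the t distribution, which is a reasonable approximation when the number of observations is large.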
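A sketch of the full coeftest() call on hypothetical simulated data (standing in for FitLinReg), comparing df = Inf (z tests) with the default, which uses the model's residual degrees of freedom (t tests). The coefficient estimates are the same as in the ordinary fit; only the standard errors, test statistics and p-values change. All object names are illustrative.

```r
library(sandwich)
library(lmtest)

# Hypothetical heteroskedastic data: error spread depends on x1
set.seed(7)
n <- 500
x1 <- rnorm(n)
x2 <- runif(n)
y <- 3 + 1.5 * x1 - 2 * x2 + rnorm(n, sd = exp(0.5 * x1))
fit <- lm(y ~ x1 + x2)

V <- vcovHC(fit, type = "HC4")

# z tests: normal reference distribution
ct_z <- coeftest(fit, vcov. = V, df = Inf)
# t tests: residual degrees of freedom of the model (n - 3 here)
ct_t <- coeftest(fit, vcov. = V)

print(ct_z)
print(ct_t)
```

Because the t distribution has heavier tails than the normal, the z-based p-values are never larger than the t-based ones; with hundreds of observations the difference is negligible.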
This gives heteroskedasticity-robust standard errors for my regression; the coefficient estimates themselves are unchanged. In case of any doubts or queries, or if you get stuck and need tips on how to move forward, let us know in the comment box below and we will get back to you at the earliest.