Heteroskedasticity might sound like a hard word to pronounce, but it isn't that difficult a concept to understand. Simply put, heteroskedasticity refers to the scenario in which the variability of a variable is unequal across the range of values of a second variable that predicts it. The examples below make this clearer:
For example, annual income can be a heteroskedastic variable when predicted by age, because most teenagers aren't driving BMWs bought with their own income. Teen workers generally earn close to the minimum wage, so there isn't much variability among them during the teenage years. As they enter their 20s and 30s, however, some will shoot up the tax brackets, while the income of others will grow more gradually (or perhaps not at all, unfortunately). Put simply, the gap between the "haves" and the "have-nots" is likely to widen with age. Hence, annual income is a heteroskedastic variable: it displays non-constant variance as age increases.
As another example, suppose we're modeling household expenditure on leisure activities (movies, skiing, vacations, etc.) on weekends. At lower levels of household income, there will be less household-to-household variation in leisure spending than at higher levels, because poorer households have less room in their budgets for such variation. That is, the variance of the error will be proportional to household income.
So, in linear regression, whenever the condition of constant variance of the residuals is violated, the result is heteroskedasticity.
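The leisure-spending example above can be sketched with simulated data. This is a hypothetical illustration (the spending model, coefficients, and income range are assumptions, not real data): the noise standard deviation is made proportional to income, so the residual spread widens as income grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: leisure spending whose noise grows with income.
income = rng.uniform(20_000, 200_000, size=500)

# Error standard deviation proportional to income -> heteroskedastic errors.
noise = rng.normal(0, 0.05 * income)
spending = 0.1 * income + noise

# Compare residual spread in the bottom and top income quartiles.
resid = spending - 0.1 * income  # residuals from the true model, for illustration
low = resid[income < np.quantile(income, 0.25)]
high = resid[income > np.quantile(income, 0.75)]
print(f"std of residuals, low-income quartile:  {low.std():.0f}")
print(f"std of residuals, high-income quartile: {high.std():.0f}")
```

The printed standard deviations differ by roughly a factor of four, which is exactly the non-constant residual variance the definition describes.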
Issues with the Presence of Heteroskedasticity
• The presence of heteroskedasticity poses difficulties for regression analysis because it invalidates the statistical tests of significance, which assume that the errors have constant variance.
• Heteroskedasticity biases the estimated variances (and hence the standard errors) of the coefficients that the model returns.
Although heteroskedasticity can cause problems for statistical inference in linear regression analysis, not all types of heteroskedasticity affect inference, as explained below:
Conditional & Unconditional Heteroskedasticity
• Unconditional heteroskedasticity
It occurs when the heteroskedasticity is NOT correlated with the independent variables in the multiple regression. Though it violates one assumption of the linear regression model, it creates no major problems for statistical inference.
• Conditional heteroskedasticity
Variables whose variance changes with their level are conditionally heteroskedastic. For example, you can predict the weight of an object by holding it in your hand; your prediction will be accurate to within a few pounds or kilograms. However, if you are asked to estimate the weight of a building, your estimate might be off by thousands of pounds or kilograms. So, under conditional heteroskedasticity, the variance of the prediction error increases with the level of the variable (here, the weight of the object).
• This type of heteroskedasticity is problematic and needs to be addressed seriously.
• You should examine scatter plots to see whether the residual variance increases as the independent variable increases.
• You can use the Breusch–Pagan chi-square test to detect the presence of heteroskedasticity. (The BP test is a very popular test for conditional heteroskedasticity.)
Fortunately, correcting for conditional heteroskedasticity is relatively simple, and many statistical software packages can detect and correct for it automatically. There are two common methods to correct for conditional heteroskedasticity:
• Robust Standard Errors
• Generalized Least Squares
EduPristine has many more theories and examples to help you understand the concept in detail. Write to us at email@example.com to know more!