347 647 9001+1 714 797 8196Request a Call
Call Me

Regression Analysis: Learn the art of Selecting Variables and Creating Customized Bands

June 24, 2013
, , , ,

Doing Regression Analysis has always been a tough task as there are many complicated terms involved in it like Heteroskedasticity, Multicollinearity, Adjusted R2 and many more.

The lack of practical knowledge on Regression Analysis or reluctance on taking training makes a job really tough.

The task of doing Variable selection and creating customized bands is tough because firs you need understand and then come to a conclusion. I want to help you guys so that you don't end up landing in the same situation like me.

Task 1 of Regression Analysis: Selection of Variables

In Regression there involves Selecting Variable which will prove relation with the dependent variable. With the Insurance data assigned to me previously I had the age, age group, Vehicle age, Vehicle age group, years of driving experience and so on.

How to select variables in Regression Analysis

In Regression Analysis, selecting right variables is an important point as the final decision is taken depending on this relation. The first task I had with these Independent variables was to check for losses (dependent variable) due to these variables. Multicollinearity means the correlation between independent variables, the lower this correlation the better it is for us.

Also Banding of variables is also an important criterion we have to look in Regression. This involves selecting the Age or Age band or Vehicle age or Vehicle age band. Once we are done with checking for Multicollinearity and banding of variables, a statistical significance of variables is tested.

Task 2 of Regression Analysis:Checking the Correlation

With the data in hand, I first started taking out the correlation between age and years of driving experience. Getting Correlation with so much data in hand wasn't an easy task, but it got simplified with the help of MS Excel. I used the function CORREL and selected the column of age and Years of driving experience and got the result within a span of few seconds.


Correlation Check in Regression Analysis

Now the result I got showed a high correlation between the two variables, thus failing the multicollineary test. As I knew I can use either of the variables but the question was which variable I should choose. I searched on the internet and looked in some books and came out with an answer that variable with a better fit should be used.

How to select a variable that best fits the model can be done on the basis of running a regression between dependent and independent variables, that is capped losses and age after that capped losses and years of driving experience. This is by fitting two models independently. I used excel and came to know the result. Let me show you how.

Step1: Go to Data -> Data Analysis from the ribbon menu.

Regression Analysis-  Data Analysis

Step2: Select the Regression function.


Regression Analysis- Regression Function

Step 3: Put the respective inputs, in this case capped losses and age.


Regression Analysis: Inputs

Step 4: In the new sheet we have the result of the Regression


Regression Analysis: Result

Similarly with the same steps, capped losses and Years of driving experience Regression were run.

Regression Analysis

The final decision was based on R2 which shows that how much an independent variable explains the dependent variable. Since age had a R2, it was selected over years of driving experience.

Task 3 of Regression Analysis: Customization of Bands

Now once I checked for Multicollinearity and which variables to be used, I had to check whether to use the age band or age. To do this again we have to run a Regression using age and age band. The problem arises as running the regression will be in Excel and Excel doesn't give results for non-numerical values.

For this the age band has to be converted to average age so that it becomes numerical variable for a categorical variable.


Regression Analysis- Customize Bands For each Age band Average Age is provided.


Since for each policy there should be Average Age, we use V-look up function

Regression Analysis- V-look up

Now with the Age and Average age and which of the variable is to be selected out of it, I ran a Regression in the similar way as I did for Years of driving experience and Average age.


Running Regression Analysis on data

Now that I had the result and a comparison after running the regression, I took the average age or to be clearer the age band over age for the simple reason being the R-square. Thus in selecting the variables I would use age band over age for the age factor.

In the same way I performed the regression for Vehicle age and age band. I got the similar result of Average Vehicle age or Age band over Vehicle age.

Task 4 of Regression Analysis: Dummy Variables

As I discussed before, excel doesn’t take any other values apart from Numerical it was necessary to convert Binary variables present in the data to Numerical one.

The Binary variables such as Gender, fuel type etc needs to be converted. This can be done by giving values to the variables like in the case of Married status.

Use of binary variables for variables

Task 5 of Regression Analysis: Regression Inputs

Finally after selecting the variables based on R2, taking out the band and then creating dummy variables we have the data on which we can run the Multivariate Regression.


Data on which Multivariate Regression will be run

Snapshot of the final data on which we will run the Multivariate Regression

In this way after hours of search and learning on different Excel functions I was able to create the data needed to run the Regression Analysis.

For indepth study of Analytics , have a look at our Business Analytics Program


About the Author

Trusted by Fortune 500 Companies and 10,000 Students from 40+ countries across the globe, it is one of the leading International Training providers for Finance Certifications like FRM®, CFA®, PRM®, Business Analytics, HR Analytics, Financial Modeling, and Operational Risk Modeling. EduPristine has conducted more than 500,000 man-hours of quality training in finance.


Global Association of Risk Professionals, Inc. (GARP®) does not endorse, promote, review or warrant the accuracy of the products or services offered by EduPristine for FRM® related information, nor does it endorse any pass rates claimed by the provider. Further, GARP® is not responsible for any fees or costs paid by the user to EduPristine nor is GARP® responsible for any fees or costs of any person or entity providing any services to EduPristine Study Program. FRM®, GARP® and Global Association of Risk Professionals®, are trademarks owned by the Global Association of Risk Professionals, Inc

CFA Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by EduPristine. CFA Institute, CFA®, Claritas® and Chartered Financial Analyst® are trademarks owned by CFA Institute.

Utmost care has been taken to ensure that there is no copyright violation or infringement in any of our content. Still, in case you feel that there is any copyright violation of any kind please send a mail to and we will rectify it.

Popular Blogs: Whatsapp Revenue Model | CFA vs CPA | CMA vs CPA | ACCA vs CPA | CFA vs FRM

Post ID = 30295