course bg
EduPristine>Blog>Regression Analysis: Learn the art of Selecting Variables and Creating Customized Bands

Regression Analysis: Learn the art of Selecting Variables and Creating Customized Bands

June 24, 2013

Doing Regression Analysis has always been a tough task as there are many complicated terms involved in it like Heteroskedasticity, Multicollinearity, Adjusted R2 and many more.

The lack of practical knowledge on Regression Analysis or reluctance on taking training makes a job really tough.

The task of doing Variable selection and creating customized bands is tough because firs you need understand and then come to a conclusion. I want to help you guys so that you don’t end up landing in the same situation like me.

Task 1 of Regression Analysis: Selection of Variables

In Regression there involves Selecting Variable which will prove relation with the dependent variable. With the Insurance data assigned to me previously I had the age, age group, Vehicle age, Vehicle age group, years of driving experience and so on.

How to select variables in Regression Analysis

In Regression Analysis, selecting right variables is an important point as the final decision is taken depending on this relation. The first task I had with these Independent variables was to check for losses (dependent variable) due to these variables. Multicollinearity means the correlation between independent variables, the lower this correlation the better it is for us.

Also Banding of variables is also an important criterion we have to look in Regression. This involves selecting the Age or Age band or Vehicle age or Vehicle age band. Once we are done with checking for Multicollinearity and banding of variables, a statistical significance of variables is tested.

Task 2 of Regression Analysis:Checking the Correlation

With the data in hand, I first started taking out the correlation between age and years of driving experience. Getting Correlation with so much data in hand wasn’t an easy task, but it got simplified with the help of MS Excel. I used the function CORREL and selected the column of age and Years of driving experience and got the result within a span of few seconds.


Correlation Check in Regression Analysis

Now the result I got showed a high correlation between the two variables, thus failing the multicollineary test. As I knew I can use either of the variables but the question was which variable I should choose. I searched on the internet and looked in some books and came out with an answer that variable with a better fit should be used.

How to select a variable that best fits the model can be done on the basis of running a regression between dependent and independent variables, that is capped losses and age after that capped losses and years of driving experience. This is by fitting two models independently. I used excel and came to know the result. Let me show you how.

Step1: Go to Data -> Data Analysis from the ribbon menu.

Regression Analysis-  Data Analysis

Step2: Select the Regression function.


Regression Analysis- Regression Function

Step 3: Put the respective inputs, in this case capped losses and age.


Regression Analysis: Inputs

Step 4: In the new sheet we have the result of the Regression


Regression Analysis: Result

Similarly with the same steps, capped losses and Years of driving experience Regression were run.

Regression Analysis

The final decision was based on R2 which shows that how much an independent variable explains the dependent variable. Since age had a R2, it was selected over years of driving experience.

Task 3 of Regression Analysis: Customization of Bands

Now once I checked for Multicollinearity and which variables to be used, I had to check whether to use the age band or age. To do this again we have to run a Regression using age and age band. The problem arises as running the regression will be in Excel and Excel doesn’t give results for non-numerical values.

For this the age band has to be converted to average age so that it becomes numerical variable for a categorical variable.


Regression Analysis- Customize Bands For each Age band Average Age is provided.


Since for each policy there should be Average Age, we use V-look up function

Regression Analysis- V-look up

Now with the Age and Average age and which of the variable is to be selected out of it, I ran a Regression in the similar way as I did for Years of driving experience and Average age.


Running Regression Analysis on data

Now that I had the result and a comparison after running the regression, I took the average age or to be clearer the age band over age for the simple reason being the R-square. Thus in selecting the variables I would use age band over age for the age factor.

In the same way I performed the regression for Vehicle age and age band. I got the similar result of Average Vehicle age or Age band over Vehicle age.

Task 4 of Regression Analysis: Dummy Variables

As I discussed before, excel doesn’t take any other values apart from Numerical it was necessary to convert Binary variables present in the data to Numerical one.

The Binary variables such as Gender, fuel type etc needs to be converted. This can be done by giving values to the variables like in the case of Married status.

Use of binary variables for variables

Task 5 of Regression Analysis: Regression Inputs

Finally after selecting the variables based on R2, taking out the band and then creating dummy variables we have the data on which we can run the Multivariate Regression.


Data on which Multivariate Regression will be run

Snapshot of the final data on which we will run the Multivariate Regression

In this way after hours of search and learning on different Excel functions I was able to create the data needed to run the Regression Analysis.

For indepth study of Analytics , have a look at our Business Analytics Program

About Author

avatar EduPristine

EduPristine is a member of Adtalem Global Education (NYSE: ATGE), a global education provider headquartered in the United States. Adtalem is a 3 billion dollars (20,000 crores) company that has about 9 institutions and companies with more than 16,000 employees spread across 145 locations. Adtalem takes pride in training 142,000 degree-seeking students all over the world.The organization's purpose is to empower students to achieve their goals, find success and make inspiring contributions to our global community. EduPristine is one of India's leading training providers in Analytics, Accounting, Finance, Healthcare, and Marketing. Founded in 2008, EduPristine has a strong online platform and network of classrooms across India and caters to self-paced learning and online learning, in addition to classroom learning


Interested in this topic?

Our counsellors will get in touch with you with more information about this topic.

* Mandatory Field

Post ID = 30295