course bg
EduPristine>Blog>Multivariate Linear Regression: Learn to Handle Different Kinds of Data Variables

Multivariate Linear Regression: Learn to Handle Different Kinds of Data Variables

July 31, 2013

I remember those early days of my career when I was a part of Risk Analytics team at a bank. My Boss had given me an assignment where I had to work on the data from the Banks General Insurance unit. I went clueless seeing that huge data.

The data was related to Auto Insurance and it was not about identifying and defining variable types, rather it was all about creating a data dictionary and doing a Bivariate Regression Analysis by a way of multivariate linear regression. The basic objective was to identify and quantify losses in the insurance portfolio.

Although I did have theoretical understanding of regression, yet it took me 2 sleepless nights applying it practically.

Task 1 of Multivariate Linear Regression: Get the Data Dictionary

  • The moment I received the data from my Boss along with the scary deadlines, my initial job was creating a data dictionary.

  • A data dictionary contains the descriptions of data variables in a data model. The IBM Dictionary of Computing defines Data Dictionary as a "centralized repository of information about data such as meaning, relationships to other data, origin, usage, and format.

  • I noted down each variable from the data and gave description to each of them.

Multivariate Linear Regression


Step 1 of Data Dictionary:

Find unique values for each individual variable from the data. In the above table, since policy number itself is a unique identifier I was left with the task of next eight variables. There was indeed a lot of data.But as we know Excel can handle however large the data is, whether we talk about 3000 rows to 15000 odd rows or increasing columns.

I found that the unique values was done by going through the Ribbon menu and then selecting data and using the Advanced filter option

Data Dictionary

Step 2 of Data Dictionary:

Select the data range you want to extract unique values from and then select the unique values option.

Multivariate Linear Regression

Step 3 of Data Dictionary:

Select the cells where the unique values are to pasted and sort them. After doing this for each you finally create the data dictionary having all the unique values for each variable.

Data Dictionary with Unique Values

Data dictionary with unique values and the type of Variables

Task 2 of Multivariate Linear Regression: Doing a Loss Analysis

Step 1 of Loss Analysis:

Since the task was identifying and quantifying the factors for losses (the dependent variable) a Distribution Analysis was done. Initially I started by creating a scatter plot for the loss data to check for the outliers and created a frequency distribution.

Multivariate Linear Regression

Step 2 of Loss Analysis:

Re-create a scatter plot and frequency distribution: Scatter plot clearly indicates a lot of outliers in the original data which creates problem in forming the relationship between dependent and Independent variable, I analyzed that I should give a cap to the loss data and then re-create a scatter plot and frequency distribution.

Capped Losses- Scatter Plot

Task 3 of Multivariate Linear Regression: Doing Bivariate Profiling

Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the concept of relationship between two variables, whether there exists an association and the strength of this association, or whether there are differences between two variables and the significance of these differences.

For the Insurance portfolio the losses had to be analyzed using different factors and look for the relationship so that a conclusion can be formed for each factor.

Step 1 of Bivariate Profiling:

I started by doing Loss profiling using the age factor. With the help of pivot table in excel I was able to perform this task efficiently. The first step was selecting the data and using the shortcut key Alt D then P use Next function and then select a new worksheet after that finishes.

Bivariate Profiling

At every age factor there is an associated sum of losses, capped losses to avoid outliers and the number of policies.

Step 2 of Bivariate Profiling:

Select the fields of age, losses, count and capped losses. In the row labels select the age since we are concerned with the associated losses at each age.

Multivariate Linear Regression

Step 3 of Bivariate Profiling: 

From this we calculate average losses, average capped losses and percentage of observations. The percentage of observations is calculated as:

Multivariate Regression

Bivariate Profiling

Step 4 of Bivariate Profiling:

Once we have calculated the above numbers we can plot the data to analyze the relationship between losses, age and percentage of observations.

Bivariate Profiling

This way I carried forward the Bivariate Analysis under Multivariate Linear Regression for other variables too and made my boss happy for the second time in the week. Thus in this way I successfully learned to represent data and create an easily readable chart.

Feel free to ask any doubt in the comments section below

About Author

avatar EduPristine

EduPristine is a member of Adtalem Global Education (NYSE: ATGE), a global education provider headquartered in the United States. Adtalem is a 3 billion dollars (20,000 crores) company that has about 9 institutions and companies with more than 16,000 employees spread across 145 locations. Adtalem takes pride in training 142,000 degree-seeking students all over the world.The organization's purpose is to empower students to achieve their goals, find success and make inspiring contributions to our global community. EduPristine is one of India's leading training providers in Analytics, Accounting, Finance, Healthcare, and Marketing. Founded in 2008, EduPristine has a strong online platform and network of classrooms across India and caters to self-paced learning and online learning, in addition to classroom learning


Interested in this topic?

Our counsellors will get in touch with you with more information about this topic.

* Mandatory Field

Post ID = 29430