Predictive Modeling Using Excel
Consider the following table carefully.
It shows the marks obtained in a test:
If I study for five more hours, will my marks actually increase? If this question has been on your mind, the answer lies in predictive modeling using linear regression. Predictive modeling is the statistical analysis of past behaviour to predict future results.
Uses of Linear Regression
1) It is widely used for prediction and forecasting.
2) It is also used to explain the impact of changes in an independent variable on the dependent variable.
For instance, in the above example, we can estimate the impact of studying for more hours (independent variable) on the total marks obtained (dependent variable).
How do you get started?
1) What do you need to know? – We want to know whether increasing the number of hours of study will really help increase the total marks obtained.

2) What do you know? – We have a small sample dataset, as shown in the table above.

3) What do you wish to know? – We need to know the population regression function, i.e. how the marks obtained and the number of hours of study are related to each other.

4) What is predictable? – We need to estimate the population regression function from the sample regression function, as shown below:
5) What predicts? – In this session, you will learn how to use the “Ordinary Least Squares” (OLS) method to estimate the population regression function. We will also see how to measure the errors in a regression using sums of squares (total sum of squares, TSS; residual sum of squares, RSS; etc.).
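To make the steps above concrete, here is a minimal Python sketch of an Ordinary Least Squares fit for one predictor, together with TSS and RSS. The `hours` and `marks` values are hypothetical numbers made up for illustration (they are not the table from this article), and the function names are our own. In Excel, the SLOPE, INTERCEPT and RSQ worksheet functions report the same slope, intercept and R².

```python
# Hypothetical sample, for illustration only (not the article's table).
hours = [1, 2, 3, 4, 5, 6]
marks = [35, 42, 50, 55, 63, 70]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def sums_of_squares(x, y, intercept, slope):
    """Return (TSS, RSS) for the fitted line."""
    y_mean = sum(y) / len(y)
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - y_mean) ** 2 for yi in y)              # total variation in y
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) # variation left unexplained
    return tss, rss


intercept, slope = ols_fit(hours, marks)
tss, rss = sums_of_squares(hours, marks, intercept, slope)
r_squared = 1 - rss / tss   # share of the variation explained by the line
extra_marks = slope * 5     # estimated change in marks for five more hours

print(f"marks = {intercept:.2f} + {slope:.2f} * hours")
print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, R^2 = {r_squared:.4f}")
print(f"Estimated gain from five more hours: {extra_marks:.1f} marks")
```

The slope is the estimated change in marks per extra hour of study, so multiplying it by five gives a rough answer to the opening question, valid only within the range of hours seen in the sample.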

In daily life, we often encounter problems in which the outcome of our actions is influenced by more than one factor. In such cases, we move from the simple linear regression model with one predictor to the multiple linear regression model with two or more predictors.
For a better understanding, let’s look at a real-life example where a multiple linear regression model is used.
Mini Case Example:
Are an individual’s brain size and body size (height, weight etc.) predictive of his/her intelligence?

Now to answer this research question, we first need to identify the predictor and response variables.
• Response variable (y): Performance IQ Score (PIQ) of an individual
• Potential Predictor Variable (x1): Brain size, obtained from an MRI scan
• Potential Predictor Variable (x2): Height in inches
• Potential Predictor Variable (x3): Weight in kg

A common way of investigating the relationships among all of the variables is a "scatter plot matrix", which contains a scatter plot of each pair of variables in an orderly arrangement. Here's what a scatter plot matrix looks like for our brain and body size case study:


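Below we can see the multiple linear regression model with three quantitative predictors (brain size, height and weight):
y_i = β0 + β1·x_i1 + β2·x_i2 + β3·x_i3 + ε_i
Here y_i is the PIQ of individual i, x_i1, x_i2 and x_i3 are that individual's brain size, height and weight, and the independent error terms ε_i follow a normal distribution with mean 0 and equal variance σ².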
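As a rough illustration of how such a three-predictor model can be estimated, the sketch below fits the coefficients by solving the normal equations (X'X)b = X'y in plain Python. Everything in it is an assumption made for illustration: the predictor values are invented, the helper names are our own, and the PIQ values are generated without noise from coefficients we chose ourselves, so that the fit can be checked against a known answer. It is not the case-study data.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bk] minimising the residual sum of squares."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical (brain size, height in inches, weight in kg) for six people.
predictors = [
    (81, 64, 54), (89, 72, 80), (95, 68, 65),
    (100, 70, 90), (92, 66, 60), (86, 74, 75),
]
# Assumed "true" coefficients, chosen only so the result can be verified.
true_coefficients = [10.0, 1.0, 0.5, -0.2]
piq = [
    true_coefficients[0] + sum(c * v for c, v in zip(true_coefficients[1:], row))
    for row in predictors
]

coefficients = fit_multiple_regression(predictors, piq)
new_person = (90, 68, 70)
predicted = coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], new_person))

print("estimated coefficients:", [round(c, 4) for c in coefficients])
print("predicted PIQ for", new_person, "=", round(predicted, 2))
```

With real, noisy data the estimates would not match any "true" values exactly, and the hypothesis tests and intervals discussed next would be needed to judge them.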

The multiple regression model formulated above can help answer the following research questions:
• Which predictors (if any) - brain size, height, or weight - explain some of the variation in intelligence scores? (Conduct hypothesis tests of whether the slope parameters are 0.)
• What is the effect of brain size on PIQ, after taking into account height and weight? (Calculate and interpret a confidence interval for the brain size slope parameter.)
• What is the PIQ of an individual with a given brain size, height, and weight? (Calculate and interpret a prediction interval for the response.)

However, one point to check is whether the individual predictor variables (brain size, height, weight etc.) are correlated with each other.

• If they are, this creates a problem when running a linear regression, referred to as multicollinearity. It is not a mistake in model specification; it arises from the nature of the data at hand.
• A simple test for detecting multicollinearity is to run auxiliary regressions of each independent variable (treated as the “dependent” variable) on the remaining independent variables. A high R² in any of these regressions means that variable is largely explained by the others.
• A high R² and a highly significant F-test, combined with few or no statistically significant t-tests, are symptoms of multicollinearity in the model.
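The auxiliary-regression check described above can be sketched in Python as follows. For each predictor we regress it on the other predictors and turn the resulting R² into a variance inflation factor, VIF = 1 / (1 − R²). The VIF is a commonly used summary that this article does not name, so treat it as our addition; the data are hypothetical, with height and weight deliberately made to move together, and the helper names are our own.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(rows, y):
    """R^2 of an OLS regression of y on the predictor rows (with intercept)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    y_mean = sum(y) / len(y)
    tss = sum((yi - y_mean) ** 2 for yi in y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - rss / tss


def variance_inflation_factors(columns):
    """Regress each predictor on all the others and return its VIF."""
    vifs = {}
    for name, target in columns.items():
        others = [values for other, values in columns.items() if other != name]
        rows = list(zip(*others))
        vifs[name] = 1.0 / (1.0 - r_squared(rows, target))
    return vifs


# Hypothetical predictors: weight rises almost in step with height.
columns = {
    "brain":  [82, 95, 88, 101, 91, 85, 97, 90],
    "height": [62, 64, 66, 68, 70, 72, 74, 76],
    "weight": [52, 55, 59, 61, 66, 68, 73, 75],
}

vifs = variance_inflation_factors(columns)
for name, value in vifs.items():
    print(f"{name}: VIF = {value:.1f}")
```

A VIF near 1 means the predictor is close to uncorrelated with the rest. A common rule of thumb, which is a convention rather than a strict test, treats values above about 10 as a sign of serious multicollinearity.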
Addressing Multicollinearity
• Manual variable selection - Exclude the independent variables that appear to be causing the problem.
• If possible, obtain more data for the sample considered. This is the preferred solution, because more data can produce more precise parameter estimates (with lower standard errors).
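Why more data helps can be seen from the textbook expression for the sampling variance of an OLS slope estimate. This is a standard result, not something derived in this article, and the notation is our own: b_j is the estimated slope of predictor j, R_j² is the R² from regressing predictor j on the other predictors, and n is the sample size.

```latex
\operatorname{Var}(b_j) \;=\; \frac{\sigma^2}{\left(1 - R_j^2\right)\,\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
```

Adding observations increases the sum in the denominator, which lowers the standard error, while an R_j² close to 1 (strong multicollinearity) shrinks the denominator and inflates it.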




2015 © Edupristine. ALL Rights Reserved.
