• With Linear Regression, if you move far enough along the X-axis, the predicted values will rise above one or fall below zero, which is not meaningful for a probability. This could pose problems for subsequent analysis.
• One of the assumptions of Linear Regression is homoscedasticity (i.e., the variance of Y is constant across different values of X). This does not hold for a binary variable, whose variance p(1 − p) changes with the probability p and therefore with X.
• The error terms are not normally distributed: for a given X, the error of a binary response can take only two values.
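The first point above can be checked with a small sketch. The data below are made up purely for illustration, and the helper names (fit_line, linear_prediction, sigmoid) are hypothetical. The sketch fits a straight line to a binary outcome by ordinary least squares and evaluates it far from the observed X values, then contrasts this with the logistic function, which maps any real number into (0,1).

```python
import math

# Made-up illustrative data: x is a predictor, y is a binary outcome (0 or 1).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]


def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def linear_prediction(x, intercept, slope):
    """Prediction of the straight-line model; it is unbounded in x."""
    return intercept + slope * x


def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


intercept, slope = fit_line(xs, ys)

for x in (-5, 4.5, 15):
    linear = linear_prediction(x, intercept, slope)
    # The linear prediction leaves [0, 1] for x = -5 and x = 15;
    # the logistic transform of the same number stays inside (0, 1).
    print(x, round(linear, 3), round(sigmoid(linear), 3))
```

Note that applying the logistic function to a fitted straight line is not the same as fitting a Logistic Regression; it is used here only to show that the transform keeps values inside (0,1).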
Sample Case Study on Logistic Regression
Suppose a researcher wants to know which factors/variables influence admission into an MBA programme at Cambridge University.
After detailed research, the researcher came up with the following three most influential factors:
1) GRE (Graduate Record Examination) Scores (GRE)
2) GPA (Grade Point Average) obtained in graduation (GPA)
3) Prestige of the Graduate Institution/ Ranking of the institution (RANK)
The above three variables are the predictor variables in the regression, and the response variable (admit/don’t admit) is a binary variable.
Now you can run Logistic Regression on the data to find the logistic regression coefficients, each of which gives the change in the log odds of the outcome for a unit increase in the corresponding predictor variable (GRE, GPA, RANK).
Please note that we cannot use Linear Regression here because the response variable is binary, whereas Linear Regression requires a continuous response variable.
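To make the "change in log odds" interpretation concrete, here is a minimal sketch in Python. The coefficient values are hypothetical, chosen only for illustration and not fitted to any admissions data, and RANK is treated as a plain number for simplicity. The sketch shows that raising GPA by one unit, with the other predictors held fixed, adds the GPA coefficient to the log odds and multiplies the odds by exp(coefficient).

```python
import math

# Hypothetical coefficients for illustration only (not estimated from real data).
coefficients = {"intercept": -3.5, "GRE": 0.002, "GPA": 0.8, "RANK": -0.6}


def log_odds(gre, gpa, rank, coef=coefficients):
    """Linear predictor of the logistic model: the log odds of admission."""
    return (coef["intercept"]
            + coef["GRE"] * gre
            + coef["GPA"] * gpa
            + coef["RANK"] * rank)


def probability(gre, gpa, rank, coef=coefficients):
    """Convert log odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(gre, gpa, rank, coef)))


# Two applicants who differ only by one unit of GPA.
base = log_odds(gre=600, gpa=3.0, rank=2)
plus_one_gpa = log_odds(gre=600, gpa=4.0, rank=2)

# The difference in log odds equals the GPA coefficient...
change_in_log_odds = plus_one_gpa - base
# ...so the odds are multiplied by exp(coefficient), the odds ratio.
odds_ratio = math.exp(plus_one_gpa) / math.exp(base)

print(round(change_in_log_odds, 6))
print(round(odds_ratio, 6))
print(round(probability(600, 3.0, 2), 4), round(probability(600, 4.0, 2), 4))
```

The change in probability, unlike the change in log odds, depends on where the applicant starts, which is why the coefficients are usually read on the log-odds or odds-ratio scale.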
Running Logistic Regression Using R
Excel is not well suited to Logistic Regression, and hence we use the R statistical package/software to run Logistic Regressions. This is due to the following drawbacks in Excel:
• Excel does not handle categorical predictors.
• Output in Excel may be incomplete or may not be properly labelled, thus increasing the possibility of misidentifying output.
• We need to repeat requests for some analyses multiple times in order to run them for multiple variables, or to request multiple options.
Below we can see a snapshot while running Logistic Regression using R:
• So, from the above discussion, we now know that the output of Logistic Regression is a probability in the range (0,1). These results are then used by risk managers to select or reject customers, or by management to take strategic decisions.
• However, before the Logistic Regression output is used for further analysis, it has to be mapped to a more readable format.
• Thus, the output is transformed into a set of scores, also referred to as the Scorecard.
• A properly transformed scorecard produces the same results as the underlying model output.
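The text does not say which transformation is used. One common convention, assumed here, is "points to double the odds" (PDO) scaling, where score = offset + factor × ln(odds). The sketch below uses hypothetical settings (a score of 600 at odds of 50:1, and 20 points to double the odds) and hypothetical function names. It shows that the mapping can be inverted and preserves rank order, so the score carries the same information as the model probability.

```python
import math


def scaling_constants(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Offset and factor for score = offset + factor * ln(odds).

    All three defaults are hypothetical choices for illustration:
    base_score is the score assigned at base_odds, and pdo is the
    number of points that doubles the odds.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset, factor


def probability_to_score(p, offset, factor):
    """Map a model probability p (strictly between 0 and 1) to a score."""
    odds = p / (1.0 - p)
    return offset + factor * math.log(odds)


def score_to_probability(score, offset, factor):
    """Invert the mapping: recover the probability from a score."""
    log_odds = (score - offset) / factor
    return 1.0 / (1.0 + math.exp(-log_odds))


offset, factor = scaling_constants()

# Example model outputs (made up) and their scores.
probabilities = [0.20, 0.50, 0.80, 0.95]
scores = [probability_to_score(p, offset, factor) for p in probabilities]

for p, s in zip(probabilities, scores):
    # Round-tripping a score returns the original probability.
    print(p, round(s, 1), round(score_to_probability(s, offset, factor), 4))
```

Because the transformation is strictly increasing, any accept/reject cut-off on the probability corresponds to exactly one cut-off on the score.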