Analytics Tutorial: Learn Linear Regression in R
August 8th, 2013
There is often a gap between what we are taught in college and the knowledge we need to succeed in our professional lives. This is exactly what happened to me when I joined a consultancy firm as a business analyst. At the time I was a fresher straight out of the cool college atmosphere, newly exposed to the corporate heat.
One day my boss called me to his office and told me that one of their clients, a big insurance company, was facing significant losses on auto insurance. They had hired us to identify and quantify the factors responsible for it. My boss emailed me the data that the company had provided and asked me to do a multivariate linear regression analysis on it. My boss told me to use R and make a presentation of the summary.
Now as a statistics student I was quite aware of the principles of a multivariate linear regression, but I had never used R. For those of you who are not aware, R is a statistical programming language. It is a very powerful tool and widely used across the world in analyzing data. Of course, I did not know this at that time.
Anyway, it took me a lot of surfing the internet and reading books to learn how to fit my model in R, and now I want to help you save that time!
Getting Started with R
R is an open-source tool freely available on the internet. I'll assume you have it installed on your computer. If not, you can download and install it from www.r-project.org/
I have already converted the raw data file from the client into a clean .csv (comma-separated values) file. Click here to download the file.
I've saved this on the D drive of my computer in a folder called Linear_Reg_Sample. You can save it anywhere, but remember to change the path wherever a file path is mentioned.
Open the R software that you've installed. It's time to get started!
Let's Start Regression in R
The first thing to do is read all our data into R. This can be done using the command:
> LinRegData <- read.csv(file = "D:\\Linear Reg using R\\Linear_Reg_Sample_Data.csv")
Here we read all the data into an object called LinRegData, using the read.csv() function.
NOTE: If you look closely, you'll see that we have used \\ instead of a single \. This is because the backslash is the escape character in R strings, so a literal backslash has to be written as \\. Whenever you enter a Windows path, use \\ or a forward slash / as the separator.
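The path handling described in the note can be sketched as follows. This is only an illustration: the folder and file names are copied from the command above, and the file.exists() check is an extra safeguard I have added, not part of the original steps.

```r
# Three ways to write the same Windows path in R.
p1 <- "D:\\Linear Reg using R\\Linear_Reg_Sample_Data.csv"  # escaped backslashes
p2 <- "D:/Linear Reg using R/Linear_Reg_Sample_Data.csv"    # forward slashes also work on Windows
p3 <- file.path("D:", "Linear Reg using R", "Linear_Reg_Sample_Data.csv")  # built from parts

cat(p1, "\n")      # prints the path with single backslashes
identical(p2, p3)  # file.path() joins its parts with "/"

# Check that the file is really there before reading it,
# so that a wrong path produces a clear message.
if (file.exists(p2)) {
  LinRegData <- read.csv(file = p2)
} else {
  message("File not found at ", p2, "; change the path to match your computer.")
}
```

If you saved the file somewhere else, only the path strings need to change.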
Let's check that R has read our data. Use the following command to get a summary of the data:
> summary(LinRegData)
This produces the following output:
Image 1: Summary of input data
In the output you can see the distribution of the data. The minimum, first quartile, median, mean, third quartile and maximum are shown for each numeric variable.
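Besides summary(), a few other base R functions are handy for checking that a file was read correctly. The sketch below uses a small made-up data frame called toy (not the client data) that borrows three of the column names used in this tutorial.

```r
# Made-up stand-in for the client data, for illustration only.
set.seed(1)
toy <- data.frame(
  Capped_Losses   = round(runif(10, 0, 1000), 2),
  Number_Vehicles = sample(1:4, 10, replace = TRUE),
  Gender_Dummy    = sample(0:1, 10, replace = TRUE)
)

str(toy)                 # column names and types
head(toy, 3)             # first three rows
dim(toy)                 # number of rows and columns
colSums(is.na(toy))      # count of missing values per column
summary(toy)             # the same kind of summary as shown above
```

Running the same calls on LinRegData should tell you quickly whether any column was read as text instead of numbers, or contains missing values.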
Performing the Regression Analysis
Now that the data has been loaded, we need to fit a regression model to it.
We will use the following command in R to fit the model:
> FitLinReg <- lm(Capped_Losses ~ Number_Vehicles + Average_Age + Gender_Dummy + Married_Dummy + Avg_Veh_Age + Fuel_Type_Dummy, LinRegData)
In this command, we create an object FitLinReg and store the results of our regression model in it. The lm() function is used to fit the model. Inside the formula, Capped_Losses is our dependent variable, which we are trying to explain using the other variables, separated by + signs. The last argument of lm() is the source of the data.
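To see how the formula and the data argument work together, here is a minimal sketch on simulated data. The data frame sim and the "true" coefficients (500, -4 and -50) are invented for this illustration; only the column names are borrowed from the tutorial.

```r
# Simulated data with a known relationship, for illustration only.
set.seed(42)
n <- 200
sim <- data.frame(
  Average_Age   = runif(n, 18, 70),
  Married_Dummy = rbinom(n, 1, 0.5)
)
sim$Capped_Losses <- 500 - 4 * sim$Average_Age - 50 * sim$Married_Dummy +
  rnorm(n, sd = 20)

# Naming the data argument makes the call easier to read.
fit_named <- lm(Capped_Losses ~ Average_Age + Married_Dummy, data = sim)

# "." on the right-hand side means "all other columns in the data".
fit_dot <- lm(Capped_Losses ~ ., data = sim)

coef(fit_named)                               # estimates should be close to 500, -4, -50
all.equal(coef(fit_named), coef(fit_dot))     # both calls fit the same model
```

Because the data were generated from a known equation, you can check that lm() recovers coefficients close to the ones that were put in.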
If no error is displayed, it means our regression is done and the results are stored in FitLinReg. We can see the results using two commands:
The summary command gives us the estimated coefficient of each variable (along with the intercept), its standard error, t value and significance.
The output also tells us the significance level of each variable. For example, a variable marked *** is highly significant (at the 99.9% level), a variable marked ** is significant at the 99% level, and a blank next to the variable indicates that it is not significant.
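The sketch below shows one plausible pair of commands for viewing a fitted model: printing the object and calling summary() on it. That these are the two commands meant above is my assumption. The data frame sim is simulated for illustration, and the star labels are rebuilt by hand using the cutoffs that R prints in its "Signif. codes" legend.

```r
# Simulated data, for illustration only. Number_Vehicles is generated
# independently of the losses; Average_Age drives them.
set.seed(7)
n <- 300
sim <- data.frame(
  Number_Vehicles = sample(1:4, n, replace = TRUE),
  Average_Age     = runif(n, 18, 70)
)
sim$Capped_Losses <- 600 - 5 * sim$Average_Age + rnorm(n, sd = 40)
fit <- lm(Capped_Losses ~ Number_Vehicles + Average_Age, data = sim)

print(fit)      # short view: the call and the coefficient estimates
summary(fit)    # full view: estimates, standard errors, t values, p-values, stars

# The coefficient table can also be pulled out as a matrix.
coef_table <- coef(summary(fit))
colnames(coef_table)

# Rebuild the star labels from the p-values.
p <- coef_table[, "Pr(>|t|)"]
stars <- cut(p, breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "), include.lowest = TRUE)
data.frame(estimate = coef_table[, "Estimate"], p_value = p, signif = stars)
```

Working from the coefficient matrix is convenient when you want to copy the numbers into a presentation rather than paste the console output.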
We can easily see that the Number_Vehicles variable is not significant, so there is no evidence that it helps explain the losses. We can remove this variable from the model.
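Removing a variable and refitting can be sketched with update(), again on simulated data (the data frame sim and its coefficients are invented for this illustration). Comparing the two fits with anova() is an additional check I am suggesting, not a step from the original analysis.

```r
# Simulated data, for illustration only. Number_Vehicles is generated
# independently of the losses.
set.seed(7)
n <- 300
sim <- data.frame(
  Number_Vehicles = sample(1:4, n, replace = TRUE),
  Average_Age     = runif(n, 18, 70)
)
sim$Capped_Losses <- 600 - 5 * sim$Average_Age + rnorm(n, sd = 40)

full <- lm(Capped_Losses ~ Number_Vehicles + Average_Age, data = sim)

# Refit the same model without Number_Vehicles.
reduced <- update(full, . ~ . - Number_Vehicles)
formula(reduced)

# F-test comparing the reduced model with the full one.
anova(reduced, full)

# Adjusted R-squared of both models, side by side.
c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared)
```

If the F-test p-value is large and the adjusted R-squared barely changes, dropping the variable has cost the model very little.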
If you go through what we've done till now, you will realize that it took us just two commands to fit a multivariate model in R. See how simple life has become!!!
In this way I learnt how to fit a regression model using R. I made a summary of my findings and made a presentation to the clients.
My boss was rather happy with me and I received a hefty bonus that year.
This is but a small example of the power of data analytics and R.
About the Author
EduPristine is trusted by Fortune 500 companies and 10,000 students from 40+ countries across the globe. It is one of the leading international training providers for finance certifications like FRM®, CFA® and PRM®, as well as Business Analytics, HR Analytics, Financial Modeling, Operational Risk Modeling, etc. It was founded by industry professionals who have worked in investment banking and private equity at organizations such as Goldman Sachs, Crisil (a Standard & Poor's company), Standard Chartered and Accenture. EduPristine has conducted corporate training for various leading corporations and colleges like JP Morgan, Bank of America, Ernst & Young, Accenture, HSBC, IIM C, NUS Singapore, etc. EduPristine has conducted more than 500,000 man-hours of quality training in finance.