When it comes to data science, one of the most common points of debate is R vs SAS. Both are among the most widely used languages for data analysis.
Introduction to the languages
R is often called the lingua franca of statistics. It is an open source programming language, free and open to all for data analysis tasks. It is supported by the R Foundation for Statistical Computing. The R language is widely used among data miners for developing statistical software and for data analysis. The source code for the R software environment is written primarily in C, Fortran, and R. R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems. While R has a command line interface, several graphical front-ends are also available.
The SAS language is a programming language used for statistical analysis, which originated in a project at North Carolina State University. It can read data from common spreadsheets and several databases, and output the results of statistical analysis in the form of tables and graphs, and as RTF, HTML, and PDF documents.
SAS is commercial software. It is expensive and still beyond the reach of most individual professionals. So, unless you work in an organization that has already invested in SAS, it may be difficult to get access to it.
R and Python, on the other hand, are free and can be downloaded by anyone from anywhere. This in itself is one of the biggest advantages of R, especially for hobbyists, academics and newcomers who want to get started with data science.
Let us try to address a few myths about R
1. R is not easy to learn
People think SAS is easier to read than R for someone who doesn’t already know the language, and is certainly easier to pick up. This was partly true a while back. With packages like dplyr, data.table, ggplot2 and related resources, R has in fact become a lot easier to program, especially for data wrangling, with the added advantage of flexibility.
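To give a feel for the data wrangling style that dplyr enables, here is a minimal sketch. The `sales` data frame and its column names are made up for illustration, and the example assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, invented for this example
sales <- data.frame(
  region = c("North", "South", "North", "East", "South"),
  amount = c(120, 80, 200, 50, 150)
)

# Keep orders of at least 80, then total them per region,
# largest total first
summary_by_region <- sales %>%
  filter(amount >= 80) %>%
  group_by(region) %>%
  summarise(total = sum(amount), orders = n()) %>%
  arrange(desc(total))

print(summary_by_region)
# North: total 320 from 2 orders; South: total 230 from 2 orders
```

Each step reads top to bottom in the order it is applied, which is a large part of why this style is considered approachable for newcomers.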
2. Private Companies/Large Organizations are not using R
The usage of R has been growing in both academia and industry. A few large corporations still insist on using SAS, but SMEs and start-ups are increasingly opting for R, given that it’s free. Current job trends seem to show that while SAS is losing momentum, R is gaining ground. Since R is on an upward path, it will probably account for more jobs in the future, including in large corporations.
A few examples of usage:
- Facebook – For behaviour analysis related to status updates and profile pictures.
- Google – For advertising effectiveness and economic forecasting.
- Twitter – For data visualization and semantic clustering.
- Microsoft – Acquired Revolution Analytics, the company behind Revolution R, and uses R for a variety of purposes.
- Uber – For statistical analysis.
- Airbnb – To scale data science.
- IBM – Joined the R Consortium.
- ANZ – For credit risk modeling.
3. Distrust of freeware/open source
Some people say they aren’t willing to accept results from R because no for-profit company vets the code to ensure it gives correct results before it reaches customers, and they fear losing business over errors. Accepting open source is largely a mindset shift: the most widely used and well-established packages have been tested and used by so many people that this is rarely an issue.
4. Not enough statistical packages
R has more than 10,000 packages, covering everything from statistical analysis to deep learning to visualization. We are not aware of any statistical analysis that can be done in SAS but not in R. The opposite is more likely, if anything!
5. R cannot handle big data
It’s true that R, by default, holds data in physical memory and cannot process datasets larger than the memory available. However, it can run parallel computations and supports integration with Hadoop, Spark, Cloudera and Apache Pig, among others. Also, the availability of machines with more RAM, especially on cloud infrastructure, offsets much of this disadvantage.
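As a small illustration of the parallel computation mentioned above, the sketch below uses the `parallel` package that ships with R. The worker count of 2 and the toy `slow_square` function are assumptions made for the example; a real workload would replace both.

```r
library(parallel)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.1)  # pretend this is real work
  x^2
}

# Start a small cluster of worker processes (2 is an arbitrary choice)
cl <- makeCluster(2)

# Apply the function to each input, spreading the work across the workers
squares <- parLapply(cl, 1:8, slow_square)

# Release the workers when done
stopCluster(cl)

print(unlist(squares))
```

Note that this kind of parallelism shortens compute time; it does not by itself lift the memory limit, which is where the integrations listed above come in.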
6. No support is available
Since R is free open-source software, expecting dedicated customer support is hard to justify. However, it has a vast online community that can help you with almost everything. There is a lot of information available in the form of blogs, Stack Overflow answers, CRAN documentation and so on.
Having addressed the myths about R, it is quite clear that R is hands down the best tool for learning data science. Let’s back that up with some data 🙂
The chart below (source: Stack Overflow) shows the popularity of R over the years. It helps us understand how fast R is growing and how vibrant its community on Stack Overflow is.
In addition to addressing the myths, let’s look at specific advantages that R has over other tools
RStudio is arguably the easiest IDE for a newcomer to install, run and maintain. You don’t need to set environment variables, and you don’t need to configure the IDE to find your R distribution. You just install RStudio and you’re ready to go, with a package manager built in.
1. RStudio
2. Advancements in the tool
R is open source software with many contributors throughout the world. With a growing community and CRAN, new features and improvements are added continuously.
3. It’s Free 🙂
R is open source software and is completely free to use. Anyone can begin using it right away without having to spend a penny. So, regarding availability and cost, R is hands down the better tool.
4. Visualization and Graphic Properties
Data visualization is one of R’s strongest points. It has access to packages like ggplot2, RGIS, lattice, and ggvis, among others, which provide strong graphical capabilities.
With the above advantages, R has become indispensable for any data analyst when compared to other tools such as SAS and SPSS. The world is moving away from commercial software towards open source. R has something to offer everyone, whether an experienced data scientist, a newcomer or an academic. Especially for people who want to get started in data science, R should be part of their arsenal.
For these reasons, EduPristine offers a Predictive Business Analytics course in R, which caters to different aspects of data science.
It is an intensive, 100+ hour program curated by industry experts for high-performing individuals who wish to master the tools of predictive analytics and give a boost to their careers.