
What is Hadoop?

December 11, 2014

Hadoop is a free, Java-based programming framework that enables the processing of large data sets in a distributed computing environment. It is part of the Apache open source project sponsored by the Apache Software Foundation.

Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of data. Its distributed file system, HDFS, facilitates rapid data transfer among the nodes and allows the system to keep operating even when a node fails. This design lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
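To make the replication story concrete, here is a minimal Java sketch of writing a file through the HDFS FileSystem API. It is illustrative only: the namenode address hdfs://namenode:9000 and the path /user/demo/hello.txt are hypothetical placeholders for your own cluster's values. HDFS splits the file into blocks and keeps multiple copies of each block on different nodes, which is what lets the system shrug off a node failure.

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical namenode address; substitute your cluster's URI.
            conf.set("fs.defaultFS", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/demo/hello.txt");

            // HDFS splits the file into blocks and replicates each block
            // across nodes (3 copies by default), so losing one node
            // loses no data.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
            }

            // The replication factor can also be pinned per file;
            // 3 is simply the usual default.
            fs.setReplication(path, (short) 3);

            fs.close();
        }
    }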

The inspiration for Hadoop was Google’s MapReduce, a framework in which an application is broken down into numerous small parts, also known as fragments or blocks, any of which can be run on any node in the cluster. The name Hadoop was given by its creator, Doug Cutting, after his child’s stuffed toy elephant. Currently, the Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, HDFS (Hadoop Distributed File System) and a few related projects such as Apache Hive, HBase and ZooKeeper.
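As an illustration of the MapReduce model, the sketch below is the classic word-count example in Java: each map task independently processes one fragment of the input, on whichever node happens to hold it, and the reduce tasks then combine the partial results. This is a standard textbook example rather than a complete application; the driver that submits the job is sketched later in this post.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Each map task processes one fragment of the input (an HDFS
        // block / input split) and can run on whichever node holds it.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);   // emit (word, 1)
                }
            }
        }

        // After the shuffle phase, each reduce task sums the counts
        // for the words assigned to it.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    }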

The Hadoop framework was initially used by major players such as Yahoo! and IBM for applications involving search engines and advertising. Linux is the preferred operating system for Hadoop, but it can also run on Windows, OS X and BSD.

So what’s the fuss all about?

Hadoop is so popular because it enables a computing solution that is:

  • Scalable – New nodes can be added as needed, without changing data formats, how data is loaded, how jobs are written or the applications on top. The same job code runs unchanged whether the cluster has three nodes or three thousand (see the driver sketch after this list).

  • Cost effective – Hadoop brings massively parallel computing to commodity servers, sharply reducing the cost per terabyte of storage. This makes it affordable to model all of your data.

  • Flexible – Hadoop is schema-less and can absorb any type of structured or unstructured data from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analysis than any single system can provide.

  • Fault tolerant – When a node is lost, the system redirects work to another copy of the data and continues processing without missing a beat.
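To tie the scalability point to code, here is a sketch of a driver that submits the word-count job from earlier. It is a standard-issue example, not production code: the input and output paths come from the command line (args[0] and args[1]), and nothing in it refers to cluster size, because the framework decides how many map tasks to launch from how the input is split across HDFS.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);

            // Reuses the mapper and reducer sketched earlier.
            job.setMapperClass(WordCount.TokenizerMapper.class);
            job.setCombinerClass(WordCount.IntSumReducer.class);
            job.setReducerClass(WordCount.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Input and output live in HDFS; the number of map tasks is
            // derived from the input splits, not hard-coded here.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The same jar submitted this way runs identically on a single machine or a large cluster, which is exactly the scalability property described above.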



