
Hadoop Clusters

December 5, 2014

A Hadoop cluster is a special type of computational cluster designed to store and analyse huge amounts of unstructured data in a distributed computing environment.


Such clusters run Hadoop's open-source distributed processing software on low-cost commodity computers.

Hadoop Cluster Architecture:

A Hadoop cluster has three components:

  1. Client
  2. Master
  3. Slave


Client: A client is neither a master nor a slave. Its job is to submit MapReduce jobs describing how the data should be processed, and then to retrieve the output once the job has completed.
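
To make this concrete, here is a minimal sketch of a client-side driver using Hadoop's Java MapReduce API. It submits a word-count job built from the TokenCounterMapper and IntSumReducer classes that ship with Hadoop; the class name WordCountDriver and the command-line input/output paths are this example's assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

// Hypothetical example class: a client-side driver that describes and submits a job.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenCounterMapper.class);   // stock mapper: emits (word, 1) per token
        job.setCombinerClass(IntSumReducer.class);      // local aggregation before the shuffle
        job.setReducerClass(IntSumReducer.class);       // sums the counts for each word
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input path in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // must not already exist
        // Submit and block until completion; the client polls the cluster for status.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```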


Master: The master consists of three components, namely the NameNode, the Secondary NameNode, and the JobTracker.


a. NameNode: The NameNode does not store the actual files; it stores the metadata of the files. The NameNode also oversees the health of the DataNodes and coordinates access to the data.
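
As a hedged illustration of this division of labour, the sketch below asks the NameNode for file metadata through the standard FileSystem API; the DataNodes holding the actual blocks are not involved until file contents are read. The class name ListMetadata and the /user/data path are this example's assumptions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example class: queries the NameNode's metadata for a directory.
public class ListMetadata {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);
        // listStatus is answered from the NameNode's in-memory metadata;
        // no DataNode is contacted here.
        for (FileStatus st : fs.listStatus(new Path("/user/data"))) {
            System.out.printf("%s  size=%d  replication=%d  blockSize=%d%n",
                    st.getPath(), st.getLen(), st.getReplication(), st.getBlockSize());
        }
    }
}
```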

b. JobTracker: The JobTracker coordinates the parallel processing of data using MapReduce. To know more about the JobTracker, please read the article All You Want to Know about MapReduce (The Heart of Hadoop).

c. Secondary NameNode: The job of the Secondary NameNode is to contact the NameNode periodically, pull the filesystem metadata from it, merge it into a clean checkpoint file, and send the result back to the NameNode. Essentially, the Secondary NameNode does housekeeping. In case of NameNode failure, the metadata that was held in the NameNode's RAM can be rebuilt from the checkpoint kept by the Secondary NameNode.
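
How often this housekeeping runs is configurable. As a sketch under Hadoop 1.x (the JobTracker-era release this article describes), the properties below set the checkpoint interval, the edit-log size that forces an early checkpoint, and where the checkpoint is written; the values and the directory path are illustrative assumptions, and Hadoop 2 renames these properties to dfs.namenode.checkpoint.*.

```xml
<!-- core-site.xml: Secondary NameNode checkpoint settings (Hadoop 1.x names; values illustrative) -->
<configuration>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>   <!-- checkpoint at least once per hour (seconds) -->
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>   <!-- ...or sooner, once the edit log reaches 64 MB -->
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/var/hadoop/namesecondary</value>   <!-- hypothetical checkpoint directory -->
  </property>
</configuration>
```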



Slave: Slave nodes make up the majority of the machines in a Hadoop cluster and are responsible for storing the data and running the computations, as in the sketch below.
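
As an illustration of the computation that runs on the slaves, here is a hedged sketch of a map task; the JobTracker schedules such tasks onto slave nodes, preferably ones already holding the input block. The class name LineLengthMapper and the 80-byte threshold are inventions of this example.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical example class: classifies each input line as "long" or "short".
// Instances run inside map tasks on the slave nodes, close to the data.
public class LineLengthMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text bucket = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        bucket.set(line.getLength() > 80 ? "long" : "short");  // getLength() is the byte length
        context.write(bucket, ONE);  // emit (bucket, 1); reducers sum these counts
    }
}
```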


Why use Hadoop Clusters:

Hadoop clusters are particularly known for boosting the speed of data analysis applications and for their scalability. If at any point a cluster's processing power comes under strain from growing volumes of data, this can be addressed by adding cluster nodes to increase throughput. Hadoop clusters are also highly resistant to failure: each block of data is replicated onto other nodes, ensuring that data is not lost if a single node fails. The sketch below shows one way to inspect that replication.
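
To see the replication at work, here is a hedged sketch that prints, for each block of a file, the DataNodes holding a replica. It uses the standard HDFS FileSystem API; the class name ShowReplicas is this example's invention, and the file path is taken from the command line.

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical example class: lists which DataNodes hold each block of a file.
public class ShowReplicas {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]);          // e.g. a file path in HDFS
        FileStatus status = fs.getFileStatus(file);
        // Each block is stored on several DataNodes (3 by default, dfs.replication),
        // which is why losing a single slave node does not lose any data.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block at offset " + block.getOffset()
                    + " -> replicas on " + Arrays.toString(block.getHosts()));
        }
    }
}
```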

