
Hadoop Clusters

December 5, 2014

A Hadoop cluster is a special type of computational cluster designed to store and analyse huge amounts of unstructured data in a distributed computing environment.


Such clusters run Hadoop’s open source distributed processing software on low-cost commodity computers.

Hadoop Cluster Architecture:

A Hadoop cluster has three components:

  1. Client
  2. Master
  3. Slave

Client:

A client is neither a master nor a slave. Its job is to submit MapReduce jobs that describe how the data should be processed, and then to retrieve the job’s output once it completes.
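
As a rough illustration, here is a minimal driver a client machine might run to submit a job. This is a sketch assuming the classic Hadoop 1.x mapred API (the one that talks to the JobTracker described below) and assuming the cluster’s configuration files are on the classpath; the class name SubmitJob and the path arguments are placeholders. It uses the built-in identity mapper and reducer, so records simply pass through unchanged.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SubmitJob {
        public static void main(String[] args) throws Exception {
            // Picks up the cluster addresses from core-site.xml / mapred-site.xml
            JobConf conf = new JobConf(SubmitJob.class);
            conf.setJobName("pass-through example");
            // Identity mapper and reducer: records pass through unchanged
            conf.setMapperClass(IdentityMapper.class);
            conf.setReducerClass(IdentityReducer.class);
            // The default TextInputFormat yields (LongWritable offset, Text line) pairs
            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));  // HDFS input dir
            FileOutputFormat.setOutputPath(conf, new Path(args[1])); // must not exist yet
            // JobClient hands the job to the JobTracker and blocks until it finishes
            JobClient.runJob(conf);
        }
    }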

Masters:

The master consists of three components: the NameNode, the Secondary NameNode, and the JobTracker.


a. NameNode: The NameNode does not store the actual files; it stores the metadata of the files. It also oversees the health of the DataNodes and coordinates access to the data.
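
To make this concrete, here is a small sketch using the standard HDFS client API: every call below is answered by the NameNode from its metadata, without ever reading file contents. The class name ShowMetadata and the path argument are placeholders.

    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowMetadata {
        public static void main(String[] args) throws Exception {
            // FileSystem.get() contacts the NameNode named in core-site.xml
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            System.out.println("size = " + status.getLen()
                    + ", replication = " + status.getReplication());
            // The NameNode also knows which DataNodes hold each block of the file
            for (BlockLocation b :
                    fs.getFileBlockLocations(status, 0, status.getLen())) {
                System.out.println("block at offset " + b.getOffset()
                        + " on " + Arrays.toString(b.getHosts()));
            }
        }
    }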

b. JobTracker: The JobTracker coordinates the parallel processing of data using MapReduce. To know more about the JobTracker, please read the article All You Want to Know about MapReduce (The Heart of Hadoop).

c. Secondary NameNode: The Secondary NameNode periodically contacts the NameNode, retrieves a copy of the filesystem metadata, compacts it into a clean checkpoint file, and sends the checkpoint back to the NameNode. Essentially, the Secondary NameNode does the housekeeping. If the NameNode fails, the metadata it held in RAM can be rebuilt from the checkpoint saved by the Secondary NameNode.
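
How often this checkpointing happens is configurable. As a small sketch, assuming Hadoop 1.x property names (later versions renamed the setting to dfs.namenode.checkpoint.period) and a placeholder class name, this reads the interval at which the Secondary NameNode wakes up:

    import org.apache.hadoop.conf.Configuration;

    public class CheckpointPeriod {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // fs.checkpoint.period (Hadoop 1.x) sets how often, in seconds, the
            // Secondary NameNode fetches and merges the NameNode's metadata;
            // the shipped default is one hour
            long period = conf.getLong("fs.checkpoint.period", 3600);
            System.out.println("checkpoint every " + period + " seconds");
        }
    }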


Slaves:

Slave nodes make up the majority of the machines in a Hadoop cluster; they are responsible for storing the data and running the computations.
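
For a sense of what actually executes on the slave nodes, here is a sketch of the classic word-count tasks in the same Hadoop 1.x mapred API as the driver above; the class names are placeholders. The framework tries to schedule each map task on a slave that already stores the input block it will read.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCount {
        // Map tasks run on slave nodes, close to the data blocks they read
        public static class TokenMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable offset, Text line,
                            OutputCollector<Text, IntWritable> out,
                            Reporter reporter) throws IOException {
                StringTokenizer tokens = new StringTokenizer(line.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    out.collect(word, ONE); // emit (word, 1) per occurrence
                }
            }
        }

        // Reduce tasks also run on slave nodes, summing the counts per word
        public static class SumReducer extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text word, Iterator<IntWritable> counts,
                               OutputCollector<Text, IntWritable> out,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (counts.hasNext()) {
                    sum += counts.next().get();
                }
                out.collect(word, new IntWritable(sum));
            }
        }
    }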


Why use Hadoop Clusters:

Hadoop clusters are particularly known for boosting the speed of data analysis applications and for their scalability. If at any point a cluster’s processing power is strained by growing volumes of data, additional nodes can be added to increase throughput. Hadoop clusters are also highly resistant to failure, because each block of data is copied onto other nodes, ensuring that no data is lost when a single node fails.
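
As an illustration of that replication, the HDFS client API lets you inspect and change a file’s replication factor; the default factor of 3 comes from the dfs.replication setting. The class name and path below are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class Replication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path(args[0]);
            // Each block of this file is currently stored on this many DataNodes
            short current = fs.getFileStatus(file).getReplication();
            System.out.println("replication factor = " + current);
            // Ask the NameNode to keep three copies of every block of the file
            fs.setReplication(file, (short) 3);
        }
    }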

