course bg
EduPristine>Blog>MapReduce Programming

MapReduce Programming

June 12, 2015

What is Hadoop Mapreduce?

Hadoop Map-Reduce is a software framework for developing applications which can process huge amounts of data typically running into terabytes in size. This programming model was initially developed at Google to power their search engine. Hadoop has later adopted this model under apache open source license.

MapReduce has two main functions at its core namely: map() and reduce(). These two operations are inspired from functional programming language Lisp.

Map Processing:

• Given an input file to process, it is divided into smaller chunks (input splits). MapReduce framework will create a new map task for each input split.

• Map task reads each record from the input and maps input key-value pairs to intermediate key-value pairs.

• map(k1,v1)  list(k2,v2) where (k2,v2) is an intermediate key/value pair.

What is the right number of map tasks to create?

Depends on the total size of the input. Typically number of map tasks is equal to number of blocks in input file.

• Mapper outputs are sorted and fed to a partitioner which will partition the intermediate key/value pairs among the reducers (all the intermediate key/value pairs with same keys get partitioned to common reducer).

• MapReduce framework then takes all the intermediate values for a given output key and then combines them together into a list.

Reduce Processing:

• Each reduce task receives the output produced after Map Processing (which is key/list of values pairs) and then performs operation on the list of values against each key. It then emits output key-value pairs.

• (k2,[v2]) -> (k2,v3)

How may reduce tasks should be created?

0.95 * no_of_nodes. This is to accommodate for the failed reduce tasks which needs to be restarted.

Mapreduce example

Let us take an example to see how map-reduce works. Consider an application which takes an input file and counts the number of occurrences of each word in the file. This can be modelled as a map-reduce application:

how map reduce works

wordcount example

Figure showing the wordcount example in execution:

how map reduce works

MapReduce Applications used at:


  • To create index which is used by google search engine for retrieving search results.
  • To compute the page rank of the web pages.
  • Statistical machine translation for translating between different languages.


  • Data Mining.
  • Ad optimization.


  • Spam detection for Yahoo! mail
  • Yahoo! search

In this post, we learnt about MapReduce programming. In the next few posts, I will present you how to model various real world applications using this framework and see their implementation.

About Author

avatar EduPristine

EduPristine is a member of Adtalem Global Education (NYSE: ATGE), a global education provider headquartered in the United States. Adtalem is a 3 billion dollars (20,000 crores) company that has about 9 institutions and companies with more than 16,000 employees spread across 145 locations. Adtalem takes pride in training 142,000 degree-seeking students all over the world.The organization's purpose is to empower students to achieve their goals, find success and make inspiring contributions to our global community. EduPristine is one of India's leading training providers in Analytics, Accounting, Finance, Healthcare, and Marketing. Founded in 2008, EduPristine has a strong online platform and network of classrooms across India and caters to self-paced learning and online learning, in addition to classroom learning


Interested in this topic?

Our counsellors will get in touch with you with more information about this topic.

* Mandatory Field

Post ID = 77022
- -