MapReduce Programming

June 12, 2015
, , , ,

What is Hadoop Mapreduce?

Hadoop Map-Reduce is a software framework for developing applications which can process huge amounts of data typically running into terabytes in size. This programming model was initially developed at Google to power their search engine. Hadoop has later adopted this model under apache open source license.

MapReduce has two main functions at its core namely: map() and reduce(). These two operations are inspired from functional programming language Lisp.

Map Processing:

• Given an input file to process, it is divided into smaller chunks (input splits). MapReduce framework will create a new map task for each input split.

• Map task reads each record from the input and maps input key-value pairs to intermediate key-value pairs.

• map(k1,v1)  list(k2,v2) where (k2,v2) is an intermediate key/value pair.

What is the right number of map tasks to create?

Depends on the total size of the input. Typically number of map tasks is equal to number of blocks in input file.

• Mapper outputs are sorted and fed to a partitioner which will partition the intermediate key/value pairs among the reducers (all the intermediate key/value pairs with same keys get partitioned to common reducer).

• MapReduce framework then takes all the intermediate values for a given output key and then combines them together into a list.

Reduce Processing:

• Each reduce task receives the output produced after Map Processing (which is key/list of values pairs) and then performs operation on the list of values against each key. It then emits output key-value pairs.

• (k2,[v2]) -> (k2,v3)

How may reduce tasks should be created?

0.95 * no_of_nodes. This is to accommodate for the failed reduce tasks which needs to be restarted.

Mapreduce example

Let us take an example to see how map-reduce works. Consider an application which takes an input file and counts the number of occurrences of each word in the file. This can be modelled as a map-reduce application:

how map reduce works

wordcount example

Figure showing the wordcount example in execution:

how map reduce works

MapReduce Applications used at:


  • To create index which is used by google search engine for retrieving search results.
  • To compute the page rank of the web pages.
  • Statistical machine translation for translating between different languages.


  • Data Mining.
  • Ad optimization.


  • Spam detection for Yahoo! mail
  • Yahoo! search

In this post, we learnt about MapReduce programming. In the next few posts, I will present you how to model various real world applications using this framework and see their implementation.


About the Author

Kalyan Nelluri is an open-source enthusiast and technologist who enjoys working on research areas including Machine Learning, Data Mining and Big Data Processing. He holds a master's degree in computer science from IIIT Hyderabad. He is currently working as a Software Engineer in Amazon India

Popular Courses

Global Association of Risk Professionals, Inc. (GARP®) does not endorse, promote, review or warrant the accuracy of the products or services offered by EduPristine for FRM® related information, nor does it endorse any pass rates claimed by the provider. Further, GARP® is not responsible for any fees or costs paid by the user to EduPristine nor is GARP® responsible for any fees or costs of any person or entity providing any services to EduPristine Study Program. FRM®, GARP® and Global Association of Risk Professionals®, are trademarks owned by the Global Association of Risk Professionals, Inc

CFA® Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by EduPristine. CFA® Institute, CFA® Program, CFA® Institute Investment Foundations and Chartered Financial Analyst® are trademarks owned by CFA® Institute.

Utmost care has been taken to ensure that there is no copyright violation or infringement in any of our content. Still, in case you feel that there is any copyright violation of any kind please send a mail to and we will rectify it.

Post ID = 77022