Hadoop Map-Reduce is a software framework for developing applications which can process huge amounts of data typically running into terabytes in size. This programming model was initially developed at Google to power their search engine. Hadoop has later adopted this model under apache open source license.
MapReduce has two main functions at its core namely: map() and reduce(). These two operations are inspired from functional programming language Lisp.
• Given an input file to process, it is divided into smaller chunks (input splits). MapReduce framework will create a new map task for each input split.
• Map task reads each record from the input and maps input key-value pairs to intermediate key-value pairs.
• map(k1,v1) list(k2,v2) where (k2,v2) is an intermediate key/value pair.
Depends on the total size of the input. Typically number of map tasks is equal to number of blocks in input file.
• Mapper outputs are sorted and fed to a partitioner which will partition the intermediate key/value pairs among the reducers (all the intermediate key/value pairs with same keys get partitioned to common reducer).
• MapReduce framework then takes all the intermediate values for a given output key and then combines them together into a list.
• Each reduce task receives the output produced after Map Processing (which is key/list of values pairs) and then performs operation on the list of values against each key. It then emits output key-value pairs.
• (k2,[v2]) -> (k2,v3)
0.95 * no_of_nodes. This is to accommodate for the failed reduce tasks which needs to be restarted.
Let us take an example to see how map-reduce works. Consider an application which takes an input file and counts the number of occurrences of each word in the file. This can be modelled as a map-reduce application:
In this post, we learnt about MapReduce programming. In the next few posts, I will present you how to model various real world applications using this framework and see their implementation.
Global Association of Risk Professionals, Inc. (GARP®) does not endorse, promote, review or warrant the accuracy of the products or services offered by EduPristine for FRM® related information, nor does it endorse any pass rates claimed by the provider. Further, GARP® is not responsible for any fees or costs paid by the user to EduPristine nor is GARP® responsible for any fees or costs of any person or entity providing any services to EduPristine Study Program. FRM®, GARP® and Global Association of Risk Professionals®, are trademarks owned by the Global Association of Risk Professionals, Inc
CFA Institute does not endorse, promote, or warrant the accuracy or quality of the products or services offered by EduPristine. CFA Institute, CFA®, Claritas® and Chartered Financial Analyst® are trademarks owned by CFA Institute.
Utmost care has been taken to ensure that there is no copyright violation or infringement in any of our content. Still, in case you feel that there is any copyright violation of any kind please send a mail to email@example.com and we will rectify it.
2015 © Edupristine. ALL Rights Reserved.