Spark Programming

Postby anantajb » Wed Sep 13, 2017 4:54 am

I ran a script for the following problem, based on transactions_practice.csv.

In each row, the 6th value is the brand and the 11th value is the purchaseamount.

The script finds the 5 top-selling brands.

Code:

from pyspark.sql.types import StringType
from pyspark import SQLContext
sqlContext = SQLContext(sc)
# Split each CSV line into a list of fields.
transRDD = sc.textFile('transactions_practice.csv').map(lambda line: line.split(","))
# Build (brand, purchaseamount) pairs: brand is field 6, purchaseamount is field 11.
transRDD1 = transRDD.map(lambda x: (x[5], float(x[10])))
# Group the amounts by brand and sum them.
transRDD2 = transRDD1.groupByKey()
transRDD3 = transRDD2.map(lambda kv: (kv[0], sum(kv[1])))
# Step that takes the top 5 brands by total value (this is the sorting line in question).
transRDD4 = transRDD3.map(lambda kv: (kv[1], kv[0])).sortByKey(False).map(lambda kv: (kv[1], kv[0])).take(5)
transRDD5 = sc.parallelize(transRDD4)
transRDD5.saveAsTextFile("sparkProject/transRDD5")


I found something on Google about sorting RDDs and tried it.

The script gives the desired result, but I do not understand the sorting mechanism used to sort the records in descending order.

Can you please help me understand it?
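
For reference, the sorting line can be unpacked step by step. The following is only a minimal plain-Python sketch, not Spark code: the brand names and totals are made up for illustration, and `sorted(..., reverse=True)` stands in for `sortByKey(False)`. The idea is that `sortByKey` sorts on the first element of each pair, so the pairs are swapped to put the total first, sorted in descending order, and then swapped back.

```python
# Hypothetical (brand, total) pairs, standing in for the contents of transRDD3.
totals = [("A", 120.0), ("B", 300.5), ("C", 75.25),
          ("D", 410.0), ("E", 50.0), ("F", 220.0)]

# Step 1: swap each pair to (total, brand) so the total becomes the key.
swapped = [(total, brand) for brand, total in totals]

# Step 2: sort by key in descending order; this plays the role of sortByKey(False).
ordered = sorted(swapped, key=lambda kv: kv[0], reverse=True)

# Step 3: swap back to (brand, total) and keep the first 5, like take(5).
top5 = [(brand, total) for total, brand in ordered][:5]

print(top5)
```

In PySpark itself, `transRDD3.takeOrdered(5, key=lambda kv: -kv[1])` should return the same five pairs without the two swaps, assuming the totals are numeric.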
