Apache Spark reduceByKey Function

by Online Tutorials Library July 14, 2022

Spark reduceByKey Function

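In Spark, the reduceByKey function is a frequently used transformation that aggregates data by key. It is called on a dataset of key-value pairs (K, V) and takes a reduce function of type (V, V) => V. The values that share a key are merged with that function, and the output is a dataset of (K, V) pairs with one pair per distinct key. The reduce function should be associative and commutative, because Spark merges values locally within each partition before combining the partial results across partitions.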
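To make the aggregation concrete, here is a minimal sketch of what reduceByKey computes, written with plain Scala collections so that it runs without a Spark installation. The object ReduceByKeySketch and its reduceByKey helper are hypothetical names chosen for illustration and are not part of the Spark API. The sketch only mirrors the result, not Spark's distributed execution.

```scala
object ReduceByKeySketch {
  // Hypothetical helper (not Spark): merges the values of each key with the
  // binary function f, mirroring the result that reduceByKey produces.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { case (acc, (k, v)) =>
      // The first value seen for a key is stored as-is;
      // every later value is merged into the running result with f.
      acc.updated(k, acc.get(k).map(f(_, v)).getOrElse(v))
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(("C", 3), ("A", 1), ("B", 4), ("A", 2), ("B", 5))
    val totals = reduceByKey(data)(_ + _)
    // Sort by key so the printed order is stable.
    totals.toSeq.sortBy(_._1).foreach(println)
    // prints (A,3), (B,9) and (C,3), one per line
  }
}
```

Because addition is associative and commutative, the totals do not depend on the order in which the pairs are merged, which is the property Spark relies on when it combines partial results from different partitions.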

Example of reduceByKey Function

In this example, we sum the values that share the same key.

To open Spark in Scala mode, start the Spark shell with the spark-shell command.

Create an RDD using the parallelized collection.

  scala> val data = sc.parallelize(Array(("C",3),("A",1),("B",4),("A",2),("B",5)))

Now, we can read the contents of the RDD by calling collect on it, for example data.collect.

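Apply the reduceByKey() function to aggregate the values. It takes a function that merges two values of the same key; for a sum, that function is addition, for example data.reduceByKey(_ + _).

Now, we can read the aggregated result by calling collect on the RDD that reduceByKey returns.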

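Here, we got the desired output: one pair per key, with A mapped to 3 (1 + 2), B mapped to 9 (4 + 5), and C mapped to 3. The pairs returned by collect are not guaranteed to be sorted by key.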
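The steps of this example can also be put together as a small standalone program. The following is a sketch under some assumptions: it expects the Spark SQL library on the classpath, it runs Spark locally with the local[*] master, and the names ReduceByKeyExample, sumByKey and totals are hypothetical ones chosen for illustration. Inside spark-shell the SparkContext already exists as sc, so only the parallelize, reduceByKey and collect calls are needed there.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ReduceByKeyExample {
  // Hypothetical helper: sums the values of each key with reduceByKey
  // and returns the result to the driver as a Map.
  def sumByKey(sc: SparkContext, pairs: Seq[(String, Int)]): Map[String, Int] =
    sc.parallelize(pairs)   // create an RDD from a local collection
      .reduceByKey(_ + _)   // merge the values of each key by addition
      .collect()            // bring the aggregated pairs back to the driver
      .toMap

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, all cores. In spark-shell, `sc` is predefined.
    val spark = SparkSession.builder()
      .appName("ReduceByKeyExample")
      .master("local[*]")
      .getOrCreate()

    val data = Seq(("C", 3), ("A", 1), ("B", 4), ("A", 2), ("B", 5))
    val totals = sumByKey(spark.sparkContext, data)

    // Sort by key before printing, because collect does not guarantee an order.
    totals.toSeq.sortBy(_._1).foreach(println)

    spark.stop()
  }
}
```

Sorting before printing is a deliberate choice in this sketch: the aggregated values are deterministic, but the order in which the pairs come back from collect depends on how the keys are partitioned.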
