Home » HiveQL – GROUP BY and HAVING Clause

HiveQL – GROUP BY and HAVING Clause

by Online Tutorials Library

HiveQL – GROUP BY and HAVING Clause

The Hive Query Language provides GROUP BY and HAVING clauses that facilitate similar functionalities as in SQL. Here, we are going to execute these clauses on the records of the below table:

HiveQL - GROUP BY and HAVING Clause

GROUP BY Clause

The HQL Group By clause is used to group the data from the multiple records based on one or more column. It is generally used in conjunction with the aggregate functions (like SUM, COUNT, MIN, MAX and AVG) to perform an aggregation over each group.

Example of GROUP BY Clause in Hive

Let’s see an example to sum the salary of employees based on department.

  • Select the database in which we want to create a table.

HiveQL - GROUP BY and HAVING Clause

  • Now, create a table by using the following command:

HiveQL - GROUP BY and HAVING Clause

  • Load the data into the table.

HiveQL - GROUP BY and HAVING Clause

  • Now, fetch the sum of employee salaries department wise by using the following command:

HiveQL - GROUP BY and HAVING Clause
HiveQL - GROUP BY and HAVING Clause
HiveQL - GROUP BY and HAVING Clause

Here, we got the desired output.

HAVING CLAUSE

The HQL HAVING clause is used with GROUP BY clause. Its purpose is to apply constraints on the group of data produced by GROUP BY clause. Thus, it always returns the data where the condition is TRUE.

Example of Having Clause in Hive

In this example, we fetch the sum of employee’s salary based on department and apply the required constraints on that sum by using HAVING clause.

  • Let’s fetch the sum of employee’s salary based on department having sum >= 35000 by using the following command:

HiveQL - GROUP BY and HAVING Clause
HiveQL - GROUP BY and HAVING Clause
HiveQL - GROUP BY and HAVING Clause

Here, we got the desired output.

You may also like