HiveQL – GROUP BY and HAVING Clause
The Hive Query Language provides GROUP BY and HAVING clauses that facilitate similar functionalities as in SQL. Here, we are going to execute these clauses on the records of the below table:
GROUP BY Clause
The HQL Group By clause is used to group the data from the multiple records based on one or more column. It is generally used in conjunction with the aggregate functions (like SUM, COUNT, MIN, MAX and AVG) to perform an aggregation over each group.
Example of GROUP BY Clause in Hive
Let’s see an example to sum the salary of employees based on department.
- Select the database in which we want to create a table.
- Now, create a table by using the following command:
- Load the data into the table.
- Now, fetch the sum of employee salaries department wise by using the following command:
Here, we got the desired output.
HAVING CLAUSE
The HQL HAVING clause is used with GROUP BY clause. Its purpose is to apply constraints on the group of data produced by GROUP BY clause. Thus, it always returns the data where the condition is TRUE.
Example of Having Clause in Hive
In this example, we fetch the sum of employee’s salary based on department and apply the required constraints on that sum by using HAVING clause.
- Let’s fetch the sum of employee’s salary based on department having sum >= 35000 by using the following command:
Here, we got the desired output.