Bucketing in Hive

by Online Tutorials Library July 14, 2022

Bucketing in Hive is a data-organizing technique. Like partitioning, it divides a large dataset into more manageable parts, which are called buckets. Bucketing is useful when partitioning is difficult to implement, for example when the candidate partition column has too many distinct values. Partitions can also be divided further into buckets.

Working of Bucketing in Hive

Bucketing is based on hashing.
Hive applies a hash function to the value of the bucketing column and takes the result modulo the required number of buckets (for example, F(x) % 3 for three buckets).
Each row is then stored in the bucket that corresponds to the resulting value.
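
The modulo step can be sketched with Hive's built-in hash() and pmod() functions. This is only an illustration, assuming the emp_demo table created in the example that follows and three buckets; the column aliases hash_value and bucket_no are made up for this sketch. Hive's internal bucket assignment may use a different hash function depending on the release, so treat the output as a picture of the F(x) % 3 idea rather than the exact file placement.

```sql
-- Illustrative only: shows a hash of Id and that hash modulo 3.
-- pmod() returns a non-negative remainder, so bucket_no is 0, 1, or 2.
SELECT Id,
       hash(Id)          AS hash_value,
       pmod(hash(Id), 3) AS bucket_no
FROM emp_demo;
```

Rows that share a bucket_no value would be grouped together under this scheme.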

Example of Bucketing in Hive

First, select the database in which we want to create a table.
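
A minimal sketch of this step, assuming a database named hive_demo (a hypothetical name used only for illustration; substitute the database you actually work in):

```sql
-- List the available databases, then switch to the one to use.
-- hive_demo is a placeholder name, not taken from the original example.
SHOW DATABASES;
USE hive_demo;
```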

Create a dummy table to store the data.

  hive> create table emp_demo (Id int, Name string, Salary float) row format delimited fields terminated by ',';

Now, load the data into the table.

  hive> load data local inpath '/home/codegyani/hive/emp_details' into table emp_demo;

Enable bucketing by using the following command:
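
The setting usually meant by this step is hive.enforce.bucketing, which applies to Hive releases before 2.0; that this is the intended command is an assumption. From Hive 2.0 onward the property was removed and bucketing is always enforced, so the line can be skipped there.

```sql
-- Assumed setting for Hive 1.x and earlier: makes inserts honor the
-- table's declared number of buckets. Not needed on Hive 2.0 and later.
SET hive.enforce.bucketing = true;
```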

Create a bucketed table by using the following command:

  hive> create table emp_bucket (Id int, Name string, Salary float) clustered by (Id) into 3 buckets row format delimited fields terminated by ',';
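
As a hedged sketch of how the bucketed table might then be checked and used, assuming emp_demo has already been loaded as above: the statements below inspect the table definition, copy the rows from emp_demo so that Hive distributes them across the three buckets, and read back a single bucket.

```sql
-- Inspect the table metadata; the output should include the bucket
-- count and the bucketing column declared in the CREATE TABLE statement.
DESCRIBE FORMATTED emp_bucket;

-- Populate the bucketed table from the staging table. Hive hashes Id
-- to decide which of the 3 buckets each row goes to.
INSERT OVERWRITE TABLE emp_bucket SELECT * FROM emp_demo;

-- Read only the first of the three buckets.
SELECT * FROM emp_bucket TABLESAMPLE (BUCKET 1 OUT OF 3 ON Id);
```

Loading through INSERT rather than LOAD DATA matters here, because LOAD DATA only moves files and does not redistribute rows by hash.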
