course bg
EduPristine>Blog>Hive Partitions – Explained with Example

Hive Partitions – Explained with Example

June 6, 2015

The previous post had all the concepts covered related to partitions. In this post, we will practically design and implement a Hive table with partitions. Let us take in consideration the same data. Geographical hierarchy of India. We will consider the table name as India – country, then columns and one of those columns state we will make this as partitioned column. Pay attention to the syntax while defining the partitioned table.

Let me first share the sample data we have for this exercise. We have a dataset from Govt. Of India website, related to the postal department. These are the post office, type, pin code, city, state and so on. Following is the schema of the data we have.

OFFICE NAME

OFFICE STATUS

PINCODE

TELEPHONE

TALUK

DISTRICT

STATE.

POSTAL DIVISION

POSTAL REGION

POSTAL CIRCLE

There are two ways to load data to a partitioned table, today we will look at the first one. First we will create a temporary table, without partitions. Then load the data into this temporary non-partitioned table. Next, we create the actual table with partitions and load data from temporary table into partitioned table.

1. Create a database for this exercise

CREATE DATABASE HIVE_PARTITION;

USE HIVE_PARTITION;

2. Create a temporary table

CREATE TABLE temp_India (

OFFICE_NAME STRING,

OFFICE_STATUS     STRING,

PINCODE           INT,

TELEPHONE   BIGINT,

TALUK       STRING,

DISTRICT    STRING,

STATE       STRING,

POSTAL_DIVISION   STRING,

POSTAL_REGION     STRING,

POSTAL_CIRCLE     STRING

)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS TEXTFILE;

3. Load data into temporary table

LOAD DATA LOCAL INPATH '/home/cloudera/All_States_PinCode.csv' INTO

TABLE temp_India;

4. Create partitioned table

CREATE TABLE India (

OFFICE_NAME STRING,

OFFICE_STATUS     STRING,

PINCODE           INT,

TELEPHONE   BIGINT,

TALUK       STRING,

DISTRICT    STRING,

POSTAL_DIVISION   STRING,

POSTAL_REGION     STRING,

POSTAL_CIRCLE     STRING

)

PARTITIONED BY (STATE   STRING)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ','

STORED AS TEXTFILE;

5. Instruct hive to dynamically load partitions

SET hive.exec.dynamic.partition = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

6. Import data into partitioned table from temporary table

INSERT OVERWRITE TABLE India PARTITION (STATE)

SELECT

OFFICE_NAME       ,

OFFICE_STATUS     ,

PINCODE           ,

TELEPHONE         ,

TALUK             ,

DISTRICT          ,

STATE             ,

POSTAL_DIVISION   ,

POSTAL_REGION     ,

POSTAL_CIRCLE

FROM temp_India;

We have successfully loaded data into our partitioned table. You can verify this by the following screenshots. One, we check the HDFS folder under the hive warehouse for our table and verify there are folders present for each partition. Also, we can see the schema of the partitioned table using the following command:

desc formatted india;

importing  data into partitioned table

importing data into partitioned table

To view the partitions for a particular table, use the following command inside Hive:

show partitions india;

Output would be similar to the following screenshot.

How to view  table partitions in Hive

One of the observations we can make is the name of the partitions. The partitions will be named along with column name. Meaning, here we have the column name as state and value of column name are the various state names. Both column + form combined together are the folder names. And if you go inside the folder and open the data files, you will not see the state column. This is the same we discussed in the previous post, where records themselves will not have column value stored.

Let us now try and run a query on top of the partitioned table. You will observe that there isn’t any difference between the query you issue on a partitioned table and the one on a non-partitioned table.

select count(*) from india where state=’Punjab’;

naming the  partition in Hive

If you run the same query on non-partitioned table, as follows you will see the same result.

select count(*) from temp_india where state=’Punjab’;

In this section we learnt about creating, loading and querying data on partitioned table using a temporary staging table. In the next post we will learn on how to load data directly into Hive partitioned without using a temporary staging hive table.

The dataset for this exercise is available here.

About Author

avatar EduPristine

Trusted by Fortune 500 Companies and 10,000 Students from 40+ countries across the globe, it is one of the leading International Training providers for Finance Certifications like FRM®, CFA®, PRM®, Business Analytics, HR Analytics, Financial Modeling, and Operational Risk Modeling. EduPristine has conducted more than 500,000 man-hours of quality training in finance.

Comments

Interested in this topic?

Our counsellors will get in touch with you with more information about this topic.

* Mandatory Field

`````````````````````````````````````````````````` Post ID = 76747