April 27, 2015
In the last post we discussed about Hive and how it can be used as a data warehouse over the data available in HDFS. In today’s post we will be talking about writing Hive queries. We will look at creating databases, tables and querying on the data. We will also look at loading data into Hive tables.
Hive provides a logical view of HDFS data files in the form of tables. To view the data in tables, we will first create a table in Hive. Assuming you already have hive installed on your machines, let us begin by starting hive. When you type ‘hive’ in your shell and press the enter key, hive shell will be loaded. By default, the database named ‘default’ will be started. If you want to create a new database, the command is following:
hive> create database my_first_db;
Remember to end all the statements or commands in Hive with a semicolon (;). This is the syntax of Hive. The Hive syntax is most similar to the MySQL syntax, but not exactly same. After creating your first database, let us switch to the newly created database. Use the following command:
hive> use my_first_db;
Now you will be switched to the database ‘my_first_db’. Let us create a table in the new database. The command to create a simple table is as follows:
hive> CREATE TABLE emp_details ( id INT, name STRING )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ‘,’
> LINES TERMINATED BY ‘\n’
> STORED AS TEXTFILE;
Let us go through the above syntax line by line. The words in ALLCAPS are keywords. Make sure you do not use the keywords as table or column names; else you will not be able to successfully execute the commands. Here we are creating a table with the name ’emp_details’, having two columns. The column names are ‘id’ and ‘name’. The data type of column ‘id’ is integer represented by INT, and for ‘name’ the data type is text represented by STRING. The columns and their corresponding data types should be places in braces ‘( )’. The next 3 lines are for telling Hive the delimiters for fields and lines in the file. Here our fields are separated using a ‘,’ and records are separated by new lines. Check the sample file in the next section and you will be able to relate to the delimiters. The last line of our create table command tells Hive the file type. Here it is stored as textfile.
Create a sample file in your favorite editor as following:
Save this file on your desktop as ‘data1’. Let us load the data into Hive table created above. Use the following command to load the data in file ‘data1′ into table ’emp_details’ in database ‘my_first_db’:
hive> LOAD DATA LOCAL INPATH ‘~/Desktop/data1’ INTO TABLE my_first_db.emp_details;
The syntax is keyword LOCAL, which says the data is in your local file system. The path to file is inside single quotes. It is always a best practice to specify the absolute path. Then, we specify the table name. Here we have included the database name along with table name. Use a period ‘.’ to separate database and table. If everything went good, you will see a ‘OK’ message, else you will see relevant error message.
Time to write your first query on the data you just loaded. We will first list all of the data in this table. Then, we will query with a condition.
hive> SELECT * FROM my_first_db.emp_details;
This command will display all the rows in the table ’emp_details’. The next command will query and include a specific condition.
hive> SELECT * FROM my_first_db.emp_details WHERE id=3;
To view all the databases in your Hive environment use the following command:
hive> show databases;
To view all the tables in a database, first switch to that database and then use the follow command:
hive> show tables;
The command to view the schema/metadata of a particular table use the command:
hive> desc emp_details;
Today we saw how to create a database, table and inserting data into table. We loaded data from local files. In the next post we will see how to load data from HDFS into HIVE and also different types of tables.