Tag Archives: Hive

Create a new table with Apache Hive

We are going to create a new table with Apache Hive from a previous one, populate it and then perform a UNION ALL of both tables. Below is the script that will create the new table.

-- Below script creates a new table
USE testdb;
-- show current tables
SHOW tables;
-- describe mytable2, table we will use to create mytable4
DESCRIBE mytable2;
-- create new table copying format from mytable2
CREATE TABLE mytable4 LIKE mytable2 ;

SHOW tables;
-- describe newly created table
DESCRIBE mytable4;
-- select content from newly created table
SELECT * FROM mytable4;

We proceed executing via hive in a linux shell.

hive    -f  create-new-table.hql

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
OK
Time taken: 0.898 seconds
OK
mytable
mytable2
newtable3
Time taken: 0.206 seconds, Fetched: 3 row(s)
OK
id                  	int                 	                    
lname               	string              	                    
fname               	string              	                    
Time taken: 0.263 seconds, Fetched: 3 row(s)
OK
Time taken: 0.272 seconds
OK
mytable
mytable2
mytable4
newtable3
Time taken: 0.043 seconds, Fetched: 4 row(s)
OK
id                  	int                 	                    
lname               	string              	                    
fname               	string              	                    
Time taken: 0.166 seconds, Fetched: 3 row(s)
OK
Time taken: 0.666 seconds

Continue reading

Playing with functions in hive

Apache Hive has built in functions which can be listed with

SHOW FUNCTIONS;

to play with concat we will run the following script.

-- Use testdb
use testdb;
-- describe concat function
DESC FUNCTION concat;
-- describe table mytable2
DESC mytable2;
-- Perform select query uniting fname and lname
SELECT CONCAT(fname,' ',lname) FROM mytable2;

We can execute with beeline or hive. We will use beeline.

beeline -u jdbc:hive2://localhost:10000 -f Documents/concat.hql --verbose=false --showWarnings=false
scan complete in 8ms
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> use testdb;
No rows affected (0.104 seconds)
0: jdbc:hive2://localhost:10000> -- Describe concat function
0: jdbc:hive2://localhost:10000> DESC FUNCTION concat;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
|                                                                                          tab_name                                                                                           |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
| concat(str1, str2, ... strN) - returns the concatenation of str1, str2, ... strN or concat(bin1, bin2, ... binN) - returns the concatenation of bytes in binary data  bin1, bin2, ... binN  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
1 row selected (0.162 seconds)
0: jdbc:hive2://localhost:10000> 
0: jdbc:hive2://localhost:10000> DESC mytable2;
+-----------+------------+----------+--+
| col_name  | data_type  | comment  |
+-----------+------------+----------+--+
| id        | int        |          |
| lname     | string     |          |
| fname     | string     |          |
+-----------+------------+----------+--+
3 rows selected (0.133 seconds)
0: jdbc:hive2://localhost:10000> 
0: jdbc:hive2://localhost:10000> SELECT CONCAT(fname,' ',lname) FROM mytable2;
+--------------------+--+
|        _c0         |
+--------------------+--+
| John Doe           |
| William Lancaster  |
| Burp Gentoo        |
+--------------------+--+
3 rows selected (18.848 seconds)
0: jdbc:hive2://localhost:10000> 
Closing: 0: jdbc:hive2://localhost:10000

We can also play with functions from inside hive cli as shown below with the sqrt function.

hive> DESC function sqrt;
OK
sqrt(x) - returns the square root of x
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> SELECT SQRT(64);        
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201710131004_0373, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201710131004_0373
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201710131004_0373
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-12-06 12:33:16,450 Stage-1 map = 0%,  reduce = 0%
2017-12-06 12:33:23,476 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.58 sec
2017-12-06 12:33:28,497 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1.58 sec
MapReduce Total cumulative CPU time: 1 seconds 580 msec
Ended Job = job_201710131004_0373
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1   Cumulative CPU: 1.58 sec   HDFS Read: 273 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 580 msec
OK
8.0
Time taken: 18.894 seconds, Fetched: 1 row(s)
hive> 

More info about Hive functions here.