Sunday, June 17, 2018

How to Work with Apache Hive

1. Creating external table in hive

first step let us check if there is any table at all in hive warehouse
because after configuration is really on the user to either create or put data on hive warehouse to be used  and analyzed  or both
 obviously this newly configured hive don't contain a single table so let create a table then
for demonstration i will put together table and call it  human

wow we have our table then  moving on to the next level

1.2 loading data table  to hive from HDFS without removing the source file
in earlier publication we have ingested data into Hadoop there were multiple of tables however, i will go for  table called productsubcategory  because it has least variables in it and there two ways to do that

i- load from hdfs to hive data warehouse without removing the source table from hdfs
first we have to survey the table structure in mysql  and then create table

hd@ubuntu:~$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 12
Server version: 5.7.22-0ubuntu0.16.04.1 (Ubuntu)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| adventureworks     |
| menagerie          |
| metastore          |
| mysql              |
| performance_schema |
| sys                |
| world              |
| world_x            |
| worldx             |
+--------------------+
10 rows in set (0.00 sec)

mysql> use adventureworks;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------------------------------------+
| Tables_in_adventureworks                      |
+-----------------------------------------------+
| adventureworksdwbuildversion                  |
| databaselog                                   |
| dimaccount                                    |
| dimcurrency                                   |
| dimcustomer                                   |
| dimdate                                       |
| dimdepartmentgroup                            |
| dimemployee                                   |
| dimgeography                                  |
| dimorganization                               |
| dimproduct                                    |
| dimproductcategory                            |
| dimproductsubcategory                         |
| dimpromotion                                  |
| dimreseller                                   |
| dimsalesreason                                |
| dimsalesterritory                             |
| dimscenario                                   |
| factadditionalinternationalproductdescription |
| factcallcenter                                |
| factcurrencyrate                              |
| factfinance                                   |
| factinternetsales                             |
| factinternetsalesreason                       |
| factproductinventory                          |
| factresellersales                             |
| factsalesquota                                |
| factsurveyresponse                            |
| newfactcurrencyrate                           |
| prospectivebuyer                              |
| sysdiagrams                                   |
+-----------------------------------------------+
31 rows in set (0.00 sec)

mysql> describe dimproductsubcategory;
+--------------------------------+-------------+------+-----+---------+----------------+
| Field                          | Type        | Null | Key | Default | Extra          |
+--------------------------------+-------------+------+-----+---------+----------------+
| ProductSubcategoryKey          | int(11)     | NO   | PRI | NULL    | auto_increment |
| ProductSubcategoryAlternateKey | int(11)     | YES  | UNI | NULL    |                |
| EnglishProductSubcategoryName  | varchar(50) | NO   |     | NULL    |                |
| SpanishProductSubcategoryName  | varchar(50) | NO   |     | NULL    |                |
| FrenchProductSubcategoryName   | varchar(50) | NO   |     | NULL    |                |
| ProductCategoryKey             | int(11)     | YES  | MUL | NULL    |                |
+--------------------------------+-------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

mysql>

after careful examination and learning the variables then create table productsubcategory
here is how

ii -  load table from mysql to hive data warehouse in case your table is in mysql the tool is famous SQOOP
load  table country from database world to hive DATA WAREHOUSE

sqoop import --connect jdbc:mysql://localhost/world --username root --password (mysql password) --table country --hive-import --hive-table country -m 1

observe the table in hive

how many  country name in the table take glimpse of hive inner working
i gave the count the name instruction and came up with below number

 next stop will be either data analysis in hive or move to apache spark


No comments:

Post a Comment

How to connect R with Apache spark

R interface  Step1. Install R-Base we begin with installation of R base programming language by simply dropping few line into terminal a...