Monday, May 28, 2018

Apache Hive

This post covers how to configure and work with the Apache Hive data warehouse.

1:. Download and extract the Hive binary
hd@ubuntu:~$ wget https://www-eu.apache.org/dist/hive/hive-2.1.1/apache-hive-2.1.1-bin.tar.gz
hd@ubuntu:~$ tar xvf apache-hive-2.1.1-bin.tar.gz
hd@ubuntu:~$ sudo mv apache-hive-2.1.1-bin /usr/local/hive
hd@ubuntu:~$ cd /usr/local
hd@ubuntu:~$ sudo chown -R hd:hadoop hive
2:. Create a symbolic link between the MySQL connector library and Apache Hive

hduser@ubuntu:~$ cd /usr/local/hive/lib
hduser@ubuntu:/usr/local/hive/lib$ ln -s /usr/share/java/mysql-connector-java.jar mysql-connector-java.jar

3:. Create the metastore database and a user for Hive. In this instance I used the same Hadoop user for Hive; however, Hive could have a separate user.
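The exact statements depend on your MySQL setup; a minimal sketch, assuming the hduser/elephant credentials and the metastore database name used in hive-site.xml later in this post:

```shell
hd@ubuntu:~$ mysql -u root -p
mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hduser'@'localhost' IDENTIFIED BY 'elephant';
mysql> GRANT ALL PRIVILEGES ON metastore.* TO 'hduser'@'localhost';
mysql> FLUSH PRIVILEGES;
```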


4:. Apache Hive configuration

Configure $HIVE_HOME/conf/hive-env.sh
Add or update HADOOP_HOME in this file

hduser@ubuntu:~$ cd /usr/local/hive/conf
hduser@ubuntu:/usr/local/hive/conf$ cp hive-env.sh.template hive-env.sh
hduser@ubuntu:/usr/local/hive/conf$ sudo vi hive-env.sh
--------------------------------------------------------------------
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop
--------------------------------------------------------------------
 Configure $HIVE_HOME/conf/hive-log4j2.properties
Update the log location; the default is /tmp.
--------------------------------------------------------------------
property.hive.log.dir = /usr/local/hive/logs/${sys:user.name}
--------------------------------------------------------------------

 Configure $HIVE_HOME/conf/hive-site.xml

    The first four properties are the metastore connection settings.
    The next two properties ensure the metastore schema is not updated after initialization.
    Next we set the metastore Thrift port; this is where HiveServer2 (and other clients) connect to the metastore for information.
    The next two properties enable concurrency.
    The rest are explained in the comments.
--------------------------------------------------------------------------

<configuration>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hduser</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>elephant</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
<name>hive.metastore.schema.verification</name>
<value>true</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>


<property>
  <name>hive.support.concurrency</name>
  <description>Enable Hive's Table Lock Manager Service</description>
  <value>true</value>
</property>

<property>
  <name>datanucleus.autoStartMechanism</name>
  <value>SchemaTable</value>
</property>

<property>
    <name>hive.security.authorization.createtable.owner.grants</name>
    <value>ALL</value>
    <description>
      The privileges automatically granted to the owner whenever a table gets created.
      An example like "select,drop" will grant select and drop privilege to the owner
      of the table. Note that the default gives the creator of a table no access to the
      table (but see HIVE-8067).
    </description>
</property>

<property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>false</value>
    <description>
      Set this to false if the table directories should be created
      with the permissions derived from dfs umask instead of
      inheriting the permission of the warehouse or database directory.
    </description>
</property>

<property>
    <name>hive.security.authorization.enabled</name>
    <value>true</value>
    <description>enable or disable the Hive client authorization</description>
</property>

<property>
    <name>hive.users.in.admin.role</name>
    <value>hd,hduser</value>
    <description>
      Comma separated list of users who are in admin role for bootstrapping.
      More users can be added in ADMIN role later.
    </description>
</property>

<property>
  <name>hive.zookeeper.quorum</name>
  <description>Zookeeper quorum used by Hive's Table Lock Manager</description>
  <value>localhost:2181,localhost:2182</value>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <value>10001</value>
  <description>TCP port number to listen on, default 10000</description>
</property>
</configuration>

5:. Initialize the metastore schema



hduser@ubuntu:~$ schematool -dbType mysql -initSchema
/usr/local/hive/conf/hive-env.sh: line 51: property.hive.log.dir: command not found
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:     jdbc:mysql://localhost/metastore
Metastore Connection Driver :     com.mysql.jdbc.Driver
Metastore connection User:     hduser
Mon May 28 11:24:16 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Mon May 28 11:24:17 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Initialization script completed
Mon May 28 11:24:19 PDT 2018 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
schemaTool completed
hduser@ubuntu:~$
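To double-check the initialization, schematool can also report the schema version it finds in the metastore database:

```shell
hduser@ubuntu:~$ schematool -dbType mysql -info
```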

6:. Create HDFS directories for Hive



hduser@ubuntu:~$  hdfs dfs -mkdir /user/hive
hduser@ubuntu:~$  hdfs dfs -chmod 755 /user/hive
hduser@ubuntu:~$  hdfs dfs -mkdir /user/hive/warehouse
hduser@ubuntu:~$  hdfs dfs -chmod 1777 /user/hive/warehouse
hduser@ubuntu:~$  hdfs dfs -chown -R hduser:hadoop  /user/hive 


7:. Run HiveServer2 and the metastore
hduser@ubuntu:~$ $HIVE_HOME/bin/hive --service metastore  & $HIVE_HOME/bin/hive --service hiveserver2
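Optionally confirm both services are listening, assuming the ports configured above (9083 for the metastore, 10001 for HiveServer2):

```shell
hduser@ubuntu:~$ ss -ltn | grep -E '9083|10001'
```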

8:. Run Beeline to verify your installation
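For example, connecting with the HiveServer2 port (10001) and user (hduser) set in hive-site.xml above; adjust these if yours differ:

```shell
hduser@ubuntu:~$ beeline -u jdbc:hive2://localhost:10001 -n hduser
0: jdbc:hive2://localhost:10001> SHOW DATABASES;
```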

 
Once Beeline connects and can run a query, your Hive data warehouse is all set to roll.

Sunday, May 20, 2018

 Preparing for APACHE HADOOP 2.0

I:. Install Java and Secure Shell (SSH). Secure Shell is a UNIX-based command interface and protocol for securely logging onto a remote computer.

sudo apt-get update
sudo apt-get install default-jdk
update-alternatives --config java
sudo apt-get install ssh
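You can confirm both installs before moving on (version numbers will vary with your distribution):

```shell
java -version
ssh -V
```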




II:. Update OS limits by appending the following lines to /etc/security/limits.conf:

   *        hard    nofile    50000
   *        soft    nofile    50000
   *        hard    nproc     10000
   *        soft    nproc     10000


III:. Workaround for the "kill" command: prevent systemd from killing background user processes at logout by setting the following in /etc/systemd/logind.conf:

hd@ubuntu:~$ cd /etc/systemd
hd@ubuntu:/etc/systemd$ sudo vi logind.conf

KillUserProcesses=no

IV:. Add a group and two users (others will be added later): one will run the Hadoop Distributed File System (HDFS) and the second will run Yet Another Resource Negotiator (YARN).
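The exact names are up to you; a minimal sketch using the hadoop group and the hduser/yarnuser accounts referenced elsewhere in this post:

```shell
hd@ubuntu:~$ sudo addgroup hadoop
hd@ubuntu:~$ sudo adduser --ingroup hadoop hduser
hd@ubuntu:~$ sudo adduser --ingroup hadoop yarnuser
```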


V:. Set up SSH for both hduser and yarnuser
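A common passwordless-SSH setup, shown here for hduser and repeated the same way for yarnuser (this sketch assumes RSA keys and localhost access):

```shell
hduser@ubuntu:~$ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
hduser@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
hduser@ubuntu:~$ chmod 600 ~/.ssh/authorized_keys
hduser@ubuntu:~$ ssh localhost
```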



INSTALL LINUX OPERATION SYSTEM

1. To install a Linux OS on your PC, first install virtual machine software; in this case I will use VMware Workstation.
2. Create a new virtual machine using the New Virtual Machine wizard and fill in the requirements.



3. When it's done, you are ready to go to the next step.

4. Log in to Ubuntu.


5. Use the terminal to install and configure.



6. Move on.





