Spark work
Starting from where we left off last time, and assuming both the Hive and Spark configurations are ready to go, let's show all the tables in the Hive data warehouse.
Step 1: Create a Spark SQL context to read the Hive metastore
import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext

// spark-shell already provides sc (the SparkContext)
val conf = new SparkConf().setAppName("Test").setMaster("yarn-client")
// HiveContext picks up the Hive metastore configured in hive-site.xml
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
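With the context in place, listing everything in the warehouse is a single SQL call; a minimal sketch:

// List all tables registered in the Hive metastore
sqlContext.sql("show tables").show()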
The result will look something like the image below.
Step 2: Create a DataFrame from a Hive external table
The Hive tables are now in front of us, so let's create a Spark DataFrame by querying the country table. I want to query only the countries whose population exceeds 200 million.
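Here is a minimal sketch, assuming the country table carries name and population columns; adjust the identifiers to your own schema:

// Run SQL against the Hive table; the result comes back as a DataFrame
val bigCountries = sqlContext.sql(
  "select name, population from country where population > 200000000")
bigCountries.show()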
Step 3: Create your own table
We will create three DataFrames and use them for multiple purposes (a sketch of all four steps follows this list):
DataFrame fruit1
DataFrame fruit2
Join fruit1 and fruit2
Union the three tables
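Here is one way those steps could look, assuming Spark 2.x (where union and the createOrReplaceTempView used in Step 4 are available); the sample rows and the name fruit3 for the third DataFrame are illustrative assumptions, not the original data:

// Hypothetical sample data with the id/name/diet columns that Step 4 selects
val fruit1 = Seq((1, "apple", "low sugar"), (2, "banana", "high sugar")).toDF("id", "name", "diet")
val fruit2 = Seq((2, "banana", "high sugar"), (3, "mango", "high sugar")).toDF("id", "name", "diet")
val fruit3 = Seq((4, "lemon", "low sugar")).toDF("id", "name", "diet")

// Join fruit1 and fruit2 on id to see which rows they share;
// fruit2's other columns are renamed so the joined columns stay unambiguous
val joined = fruit1.join(
  fruit2.withColumnRenamed("name", "name2").withColumnRenamed("diet", "diet2"),
  Seq("id"))
joined.show()

// Union the three DataFrames into one table (the schemas must match)
val fruits = fruit1.union(fruit2).union(fruit3).distinct()
fruits.show()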
Step 4: Save the last table
i- Create a temporary table:
fruits.createOrReplaceTempView("fruitsTable")
ii- Use a Hive statement to create the table and dump the data from your temp table:
sqlContext.sql("create table fruitsDiet as select * from fruitsTable")
This saves the table directly in the Hive metastore. To confirm, let's check back in Hive:
The table fruitsDiet is there.
iii- Save to HDFS:
fruits.select("id", "name", "diet").write.save("/user/hduser/fruittable.parquet")
Take a look at the HDFS web interface.
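As a final check, the parquet file can be read straight back into a DataFrame; a quick sketch using the same path:

// Read the saved parquet back from HDFS to verify the write
val saved = sqlContext.read.parquet("/user/hduser/fruittable.parquet")
saved.show()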