Saturday, June 9, 2018

Using Sqoop to Ingest Data Into Hadoop

To follow along, the Hadoop ecosystem should already be configured with the following:
i- Hadoop with YARN
ii- Apache Sqoop
iii- Apache Hive
iv- MySQL with a database
For the last one, use your preferred method to put some data into MySQL: either load a sample database with MySQL Workbench, or create a database and a table or two by hand.
If any of these is missing, please take a look at the earlier configuration posts to complete the setup.
When everything is ready, let's Sqoop some data. For this example I used the AdventureWorks sample database, because it has multiple tables and is available for download.
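Before moving on, it can help to confirm that the tools are reachable from the terminal. This is only a rough sketch: the function name check_tools is my own, and it assumes the usual launcher names (hadoop, yarn, sqoop, hive, mysql) are on the PATH, which may differ on your install.

```shell
# Report which of the given command-line tools are on the PATH.
# check_tools is a hypothetical helper, not part of Hadoop or Sqoop.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # exit status is the number of missing tools (0 means all were found)
  return "$missing"
}

# Assumed launcher names; adjust them to match your installation
check_tools hadoop yarn sqoop hive mysql || echo "install or configure the missing tools first"
```

A tool being on the PATH does not prove the service behind it is running, so treat this as a first check only.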

1. Create a dedicated directory in HDFS (Hadoop Distributed File System) and name it mydata.
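Creating the directory could look roughly like this. I am assuming the HDFS user is hduser, so the full path is /user/hduser/mydata (the same path used by the import command later in this post). The function name make_hdfs_dir is just for illustration.

```shell
# Assumption: the HDFS home of the current user is /user/hduser
HDFS_DIR="/user/hduser/mydata"

# make_hdfs_dir is a hypothetical helper wrapping two standard hdfs commands
make_hdfs_dir() {
  # -p creates missing parent directories and does not fail if the directory exists
  hdfs dfs -mkdir -p "$1" || return 1
  # list the parent directory to confirm the new folder is there
  hdfs dfs -ls "$(dirname "$1")"
}

# Run only when the hdfs client is installed on this machine
if command -v hdfs >/dev/null 2>&1; then
  make_hdfs_dir "$HDFS_DIR"
fi
```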



2. The mydata folder now sits in HDFS, and we want to ingest data into it with Sqoop, either one table at a time or all the tables of a database at once. We will try both to see how each works.
i- one table instance

sqoop import --connect "jdbc:mysql://localhost/(databasename)" --username root --password (mysql password) --table (tablename) --warehouse-dir /user/hduser/mydata -m 1
Let's take a look at the databases in MySQL first, shall we?

We will be using two databases: the country table from world for the single-table command, and adventureworks for the all-tables command.
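With the placeholders filled in for the world database and its country table, the single-table command can be put together as in the sketch below. The assumptions here are mine: MySQL runs on localhost, the user is root, and the HDFS target is /user/hduser/mydata. I also swapped --password for -P, which makes Sqoop prompt for the password so it does not end up in the shell history. The helper only prints the command so you can review it before running it.

```shell
# Assumptions: MySQL on localhost, user root, target folder /user/hduser/mydata
DB="world"
TABLE="country"
TARGET="/user/hduser/mydata"

# single_table_cmd is a hypothetical helper: it prints the sqoop command
# for one table instead of executing it
single_table_cmd() {
  echo sqoop import \
    --connect "jdbc:mysql://localhost/$1" \
    --username root -P \
    --table "$2" \
    --warehouse-dir "$3" \
    -m 1
}

single_table_cmd "$DB" "$TABLE" "$TARGET"
```

Copy the printed line into the terminal to run the import. Because --warehouse-dir is used, Sqoop should place the data in a subdirectory named after the table, here /user/hduser/mydata/country.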

After executing the above command, let us take a look at the contents of mydata.


Yes, the country table was ingested into HDFS.
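If you prefer the terminal for this check, something like the sketch below lists the table folders inside mydata. The helper name list_tables is made up for this example, and it assumes the usual hdfs dfs -ls output, where directory lines start with "d" and the path is the last field.

```shell
# list_tables is a hypothetical helper: it reads an "hdfs dfs -ls" listing
# on stdin and prints the name of each directory found in it
list_tables() {
  # keep directory lines (they start with "d"), then print the last path component
  grep '^d' | awk '{ n = split($NF, parts, "/"); print parts[n] }'
}

# Run only when the hdfs client is installed; the path is assumed from the steps above
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /user/hduser/mydata | list_tables
fi
```

After the single-table import, country should be among the printed names.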

ii- all tables instance

sqoop import-all-tables --connect jdbc:mysql://localhost/adventureworks --username root --password (mysqlpassword) --as-textfile -m 1 --warehouse-dir /user/hduser/mydata
With the database name and password filled in, run the above command, then let's take a look inside our mydata folder.

This is what we got:
It ingested the 31 tables of adventureworks into our poor folder, alongside the country table we imported earlier. If your result looks like the one above, roll up your sleeves: the work has just started.
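To double-check the number of ingested tables without counting by eye, a small sketch like this counts the folders in mydata. The helper name count_table_dirs is my own, and it relies on the assumption that directory lines in the hdfs dfs -ls output start with "d".

```shell
# count_table_dirs is a hypothetical helper: it reads an "hdfs dfs -ls"
# listing on stdin and prints how many directory entries it contains
count_table_dirs() {
  grep -c '^d'
}

# Run only when the hdfs client is installed; the path is assumed from the steps above
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls /user/hduser/mydata | count_table_dirs
fi
```

If your run matches mine and nothing else was placed in the folder, the count should come to 32: the 31 adventureworks tables plus country.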
