程式扎記

Preface
In this exercise you will import data from a relational database using Sqoop. The data you load here will be used subsequent exercises.

Lab Experiment
Consider the MySQL databases movielens, derived from the MovieLens project from University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.

Review the Database Tables
First, review the database tables to be loaded into Hadoop:
1. Log on to MySQL:

$ mysql --user=training --password=training movielens
...
mysql> # Now we are in MySQL interactive console

2. Review the structure and contents of the movie table:

3. Note the column names for the table
4. Review the structure and contents of the movierating table:

5. Note these column names
6. Exit mysql

mysql> quit

Import with Sqoop
You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and if required - your username and password.

1. Show the commands available in Sqoop

$ sqoop help

2. List the databases (schemas) in your database server:

$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training
14/11/30 00:06:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/30 00:06:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
dualcore
hue
metastore
movielens
mysql
test
training

Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.

3. List the tables in the movielens database:

$ sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training
...
genre
movie
moviegenre
movierating
occupation
user

4. Import the movie table into Hadoop:

$ sqoop import \
--connect jdbc:mysql://localhost/movielens \
--username training --password training \
--fields-terminated-by '\t' --table movie
...
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Transferred 99.6602 KB in 18.9969 seconds (5.2461 KB/sec)
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Retrieved 3881 records.

5. Verify that the command has worked:

$ hadoop fs -ls movie
$ hadoop fs -tail movie/part-m-00000

6. Import the movierating table into Hadoop
- Step6.sh

view plaincopy to clipboardprint?
#!/bin/sh  
Table="movierating"  
echo "Start importing table movierating..."  
sqoop import \  
--connect jdbc:mysql://localhost/movielens \  
--username training --password training \  
--fields-terminated-by '\t' --table "$Table"  
  
echo "Check importing result..."  
Result=`hadoop fs -ls $Table 2>&1`  
if [[ $Result =~ "No such file or directory" ]]; then  
    echo "Importing Table=$Table fail!"  
else  
    echo "Check import content:"  
    hadoop fs -tail movierating/part-m-00000  
fi  

Supplement
* Apache Sqoop Home Page

Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

* Sqoop 1.4.3 User Guide

程式扎記

標籤

2014年11月29日星期六

[CCDH] Exercise17 - Importing Data With Sqoop (P60)

沒有留言:

張貼留言

[Git 常見問題] error: The following untracked working tree files would be overwritten by merge

檢舉濫用情形

學習筆記

標籤

2014年11月29日 星期六