PrefaceIn this exercise you will import data from a relational database using Sqoop. The data you load here will be used subsequent exercises.Lab ExperimentConsider the MySQL databases
movielens, derived from the MovieLens project from University of Minnesota. (
See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these:
movie, which contains about 3,900 movies; and
movierating, which has about 1,000,000 ratings of those movies.
Review the Database TablesFirst, review the database tables to be loaded into Hadoop:
1. Log on to MySQL:
$ mysql --user=training --password=training movielens
...
mysql> # Now we are in MySQL interactive console
2. Review the structure and contents of the movie table:3. Note the column names for the table
4. Review the structure and contents of the movierating table:5. Note these column names
6. Exit mysql
mysql> quit
Import with SqoopYou invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (
schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and if required - your username and password.
1. Show the commands available in Sqoop
$ sqoop help
2. List the databases (schemas) in your database server:
$ sqoop list-databases \
--connect jdbc:mysql://localhost \
--username training --password training
14/11/30 00:06:54 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/30 00:06:54 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
dualcore
hue
metastore
movielens
mysql
test
training
Note: Instead of entering --password training on your command line, you may prefer to enter
-P, and let Sqoop prompt you for the password, which is then not visible when you type it.
3. List the tables in the movielens database:
$ sqoop list-tables \
--connect jdbc:mysql://localhost/movielens \
--username training --password training
...
genre
movie
moviegenre
movierating
occupation
user
4. Import the movie table into Hadoop:
$ sqoop import \
--connect jdbc:mysql://localhost/movielens \
--username training --password training \
--fields-terminated-by '\t' --table movie
...
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Transferred 99.6602 KB in 18.9969 seconds (5.2461 KB/sec)
14/11/30 00:23:11 INFO mapreduce.ImportJobBase: Retrieved 3881 records.
5. Verify that the command has worked:
$ hadoop fs -ls movie
$ hadoop fs -tail movie/part-m-00000
6. Import the movierating table into Hadoop- Step6.sh
- #!/bin/sh
- Table="movierating"
- echo "Start importing table movierating..."
- sqoop import \
- --connect jdbc:mysql:
- --username training --password training \
- --fields-terminated-by '\t' --table "$Table"
-
- echo "Check importing result..."
- Result=`hadoop fs -ls $Table 2>&1`
- if [[ $Result =~ "No such file or directory" ]]; then
- echo "Importing Table=$Table fail!"
- else
- echo "Check import content:"
- hadoop fs -tail movierating/part-m-00000
- fi
Supplement*
Apache Sqoop Home Page
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
*
Sqoop 1.4.3 User Guide