Preface
In this exercise you will import data from a relational database using Sqoop. The data you load here will be used subsequent exercises.
Lab Experiment
Consider the MySQL databases movielens, derived from the MovieLens project from University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.
Review the Database Tables
First, review the database tables to be loaded into Hadoop:
1. Log on to MySQL:
2. Review the structure and contents of the movie table:
3. Note the column names for the table
4. Review the structure and contents of the movierating table:
5. Note these column names
6. Exit mysql
Import with Sqoop
You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and if required - your username and password.
1. Show the commands available in Sqoop
2. List the databases (schemas) in your database server:
Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.
3. List the tables in the movielens database:
4. Import the movie table into Hadoop:
5. Verify that the command has worked:
6. Import the movierating table into Hadoop
- Step6.sh
Supplement
* Apache Sqoop Home Page
* Sqoop 1.4.3 User Guide
In this exercise you will import data from a relational database using Sqoop. The data you load here will be used subsequent exercises.
Lab Experiment
Consider the MySQL databases movielens, derived from the MovieLens project from University of Minnesota. (See note at the end of this exercise.) The database consists of several related tables, but we will import only two of these: movie, which contains about 3,900 movies; and movierating, which has about 1,000,000 ratings of those movies.
Review the Database Tables
First, review the database tables to be loaded into Hadoop:
1. Log on to MySQL:
2. Review the structure and contents of the movie table:
3. Note the column names for the table
4. Review the structure and contents of the movierating table:
5. Note these column names
6. Exit mysql
Import with Sqoop
You invoke Sqoop on the command line to perform several commands. With it you can connect to your database server to list the databases (schemas) to which you have access, and list the tables available for loading. For database access, you provide a connect string to identify the server, and if required - your username and password.
1. Show the commands available in Sqoop
2. List the databases (schemas) in your database server:
Note: Instead of entering --password training on your command line, you may prefer to enter -P, and let Sqoop prompt you for the password, which is then not visible when you type it.
3. List the tables in the movielens database:
4. Import the movie table into Hadoop:
5. Verify that the command has worked:
6. Import the movierating table into Hadoop
- Step6.sh
- #!/bin/sh
- Table="movierating"
- echo "Start importing table movierating..."
- sqoop import \
- --connect jdbc:mysql://localhost/movielens \
- --username training --password training \
- --fields-terminated-by '\t' --table "$Table"
- echo "Check importing result..."
- Result=`hadoop fs -ls $Table 2>&1`
- if [[ $Result =~ "No such file or directory" ]]; then
- echo "Importing Table=$Table fail!"
- else
- echo "Check import content:"
- hadoop fs -tail movierating/part-m-00000
- fi
* Apache Sqoop Home Page
* Sqoop 1.4.3 User Guide
沒有留言:
張貼留言