Sqoop Installation and Configuration


Sqoop is an open-source tool used mainly to transfer data between Hadoop and traditional relational databases. Here is a description excerpted from the Sqoop user guide:

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

 

Here I will mainly describe the installation process.

1. Download the required software

The Hadoop version I used was the official Apache release 0.20.2, but I later hit errors while running Sqoop against it. After reading a few articles I found that Sqoop does not support this version, and CDH3 is generally recommended instead. Even so, after copying the required jars into sqoop-1.2.0-CDH3B4/lib, it still worked. Of course, you can simply choose to use CDH3 directly.

 

Download links for CDH3 and Sqoop 1.2.0:

http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz

http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz

Note that sqoop-1.2.0-CDH3B4 depends on hadoop-core-0.20.2-CDH3B4.jar, so you need to download hadoop-0.20.2-CDH3B4.tar.gz, extract it, and copy hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar into sqoop-1.2.0-CDH3B4/lib.

In addition, importing data from MySQL with Sqoop depends on mysql-connector-java-*.jar, so you also need to download mysql-connector-java-*.jar and copy it into sqoop-1.2.0-CDH3B4/lib. The whole step can be scripted as sketched below.
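A minimal shell sketch of this step, assuming everything is unpacked in the current directory (the MySQL connector jar name is left as a wildcard, since the exact version depends on what you downloaded):

wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-CDH3B4.tar.gz
wget http://archive.cloudera.com/cdh/3/sqoop-1.2.0-CDH3B4.tar.gz
tar -xzf hadoop-0.20.2-CDH3B4.tar.gz
tar -xzf sqoop-1.2.0-CDH3B4.tar.gz

# Copy the Hadoop core jar that Sqoop depends on
cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib/

# Copy the MySQL JDBC driver (download it from the MySQL site first)
cp mysql-connector-java-*.jar sqoop-1.2.0-CDH3B4/lib/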

2. Edit Sqoop's configure-sqoop script and comment out the HBase and ZooKeeper checks (unless you plan to use HBase or other such components on top of Hadoop):

#if [ ! -d "${HBASE_HOME}" ]; then
#  echo "Error: $HBASE_HOME does not exist!"
#  echo 'Please set $HBASE_HOME to the root of your HBase installation.'
#  exit 1
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
#  echo "Error: $ZOOKEEPER_HOME does not exist!"
#  echo 'Please set $ZOOKEEPER_HOME to the root of your ZooKeeper installation.'
#  exit 1
#fi

3. Start Hadoop and set the relevant environment variables (for example $HADOOP_HOME), and you can then use Sqoop.
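A minimal sketch of this step, assuming Hadoop is installed under ~/hadoop as in the log output below (adjust the paths to your own layout):

export HADOOP_HOME=~/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
$HADOOP_HOME/bin/start-all.sh    # starts the HDFS and MapReduce daemons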

Below is an example of importing a table's data from the database into a file on HDFS:

[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ bin/sqoop import --connect jdbc:mysql://XXXX:XX/crm --username crm --password 123456 --table company -m 1
11/09/21 15:45:25 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
11/09/21 15:45:26 INFO tool.CodeGenTool: Beginning code generation
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:26 INFO orm.CompilationManager: HADOOP_HOME is /home/wanghai01/hadoop/hadoop-0.20.2/bin/..
11/09/21 15:45:26 INFO orm.CompilationManager: Found hadoop core jar at: /home/wanghai01/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
11/09/21 15:45:26 ERROR orm.CompilationManager: Could not rename /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.java to /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/./company.java
11/09/21 15:45:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-wanghai01/compile/2bd70cf2b712a9c7cdb0860722ea7c18/company.jar
11/09/21 15:45:26 WARN manager.MySQLManager: It looks like you are importing from mysql.
11/09/21 15:45:26 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
11/09/21 15:45:26 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
11/09/21 15:45:26 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
11/09/21 15:45:26 INFO mapreduce.ImportJobBase: Beginning import of company
11/09/21 15:45:27 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `company` AS t LIMIT 1
11/09/21 15:45:28 INFO mapred.JobClient: Running job: job_201109211521_0001
11/09/21 15:45:29 INFO mapred.JobClient:  map 0% reduce 0%
11/09/21 15:45:40 INFO mapred.JobClient:  map 100% reduce 0%
11/09/21 15:45:42 INFO mapred.JobClient: Job complete: job_201109211521_0001
11/09/21 15:45:42 INFO mapred.JobClient: Counters: 5
11/09/21 15:45:42 INFO mapred.JobClient:   Job Counters
11/09/21 15:45:42 INFO mapred.JobClient:     Launched map tasks=1
11/09/21 15:45:42 INFO mapred.JobClient:   FileSystemCounters
11/09/21 15:45:42 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=44
11/09/21 15:45:42 INFO mapred.JobClient:   Map-Reduce Framework
11/09/21 15:45:42 INFO mapred.JobClient:     Map input records=8
11/09/21 15:45:42 INFO mapred.JobClient:     Spilled Records=0
11/09/21 15:45:42 INFO mapred.JobClient:     Map output records=8
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Transferred 44 bytes in 15.0061 seconds (2.9321 bytes/sec)
11/09/21 15:45:42 INFO mapreduce.ImportJobBase: Retrieved 8 records.
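The warnings in the log above are worth acting on. A variant of the same command (the XXXX:XX placeholders are kept from the original) that prompts for the password with -P and enables the MySQL-specific fast path with --direct might look like this:

bin/sqoop import --connect jdbc:mysql://XXXX:XX/crm \
    --username crm -P \
    --table company --direct -m 1

Note that --direct relies on the mysqldump tool, so it needs to be installed on the nodes running the map tasks.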

Take a look at the data:

[wanghai01@tc-crm-rd01.tc sqoop-1.2.0-CDH3B4]$ hadoop fs -cat /user/wanghai01/company/part-m-00000
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
1,xx
2,eee
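To cross-check against the job counters above (8 records retrieved, 44 bytes written), a quick sanity check:

hadoop fs -cat /user/wanghai01/company/part-m-00000 | wc -l    # expect 8
hadoop fs -du /user/wanghai01/company                          # the part file should total 44 bytes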

Query the database to verify:

mysql> select * from company;
+------+------+
| id   | name |
+------+------+
|    1 | xx   |
|    2 | eee  |
|    1 | xx   |
|    2 | eee  |
|    1 | xx   |
|    2 | eee  |
|    1 | xx   |
|    2 | eee  |
+------+------+
8 rows in set (0.00 sec)

OK, everything matches. If you look closely at the messages printed while the command ran, you will notice an ERROR. That happened because an earlier run of the command had failed, and its temporary data was not cleaned up before the rerun.
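The ERROR is harmless here, but to make it go away you can delete the leftover generated file from the failed run (the path is taken from the ERROR line in the log above):

rm -f /home/wanghai01/cloudera/sqoop-1.2.0-CDH3B4/company.java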

 


