Quickly Setting Up a Hadoop Environment

火星人 @ 2014-03-08


Hadoop has two main parts: the distributed file system HDFS and the MapReduce computing model. This post walks through how I set up a Hadoop environment.

Hadoop test environment

  1. Four test machines in total: 1 namenode and 3 datanodes
  2. OS version: RHEL 5.5 x86_64
  3. Hadoop: 0.20.203.0
  4. JDK: jdk1.7.0
  5. Roles and IP addresses:

     Role        IP address
     namenode    192.168.57.75
     datanode1   192.168.57.76
     datanode2   192.168.57.78
     datanode3   192.168.57.79


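I. Preparation before deploying Hadoop

1 Hadoop depends on Java and SSH

Java 1.6.x or later must be installed.
ssh must be installed and sshd must always be running, so that the Hadoop scripts can manage the remote Hadoop daemons.

2 Create a common Hadoop account

All nodes should have the same user name. It can be added with the following commands:

    useradd hadoop
    passwd hadoop

3 Configure the host names in /etc/hosts

    tail -n 4 /etc/hosts
    192.168.57.75 namenode
    192.168.57.76 datanode1
    192.168.57.78 datanode2
    192.168.57.79 datanode3

4 All of the above must be configured identically on every node (namenode and datanodes)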
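Because every node needs the same /etc/hosts entries, a quick check can catch a missing name before Hadoop is started. The following is a minimal sketch and not part of the original setup: the function name `missing_hosts` is hypothetical, and it assumes the usual hosts-file layout of an IP address followed by one or more names.

```shell
#!/bin/sh
# Print every expected host name that has no entry in the given hosts file.
# Usage: missing_hosts HOSTS_FILE NAME...
# (missing_hosts is a hypothetical helper, not a Hadoop or system tool.)
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Skip comment lines, then look for the name in any column after the IP.
        if ! awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$hosts_file"; then
            echo "$name"
        fi
    done
}

# Example call on a node (prints nothing when all four names are present):
#   missing_hosts /etc/hosts namenode datanode1 datanode2 datanode3
```

Running the same call on each node is one way to confirm that the host configuration really is identical everywhere.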


II. SSH configuration

1 Generate the private key id_rsa and the public key id_rsa.pub

    [hadoop@hadoop1 ~]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    d6:63:76:43:e2:5b:8e:85:ab:67:a2:7c:a6:8f:23:f9 hadoop@hadoop1.test.com

2 Check the private key id_rsa and public key id_rsa.pub files

    [hadoop@hadoop1 ~]$ ls .ssh/
    authorized_keys id_rsa id_rsa.pub known_hosts
3 Upload the public key to the datanode servers (and to localhost)

    [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode1
    28
    hadoop@datanode1's password:
    Now try logging into the machine, with "ssh 'hadoop@datanode1'", and check in:
    .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
    [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode2
    28
    hadoop@datanode2's password:
    Now try logging into the machine, with "ssh 'hadoop@datanode2'", and check in:
    .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
    [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode3
    28
    hadoop@datanode3's password:
    Now try logging into the machine, with "ssh 'hadoop@datanode3'", and check in:
    .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
    [hadoop@hadoop1 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@localhost
    28
    hadoop@localhost's password:
    Now try logging into the machine, with "ssh 'hadoop@localhost'", and check in:
    .ssh/authorized_keys
    to make sure we haven't added extra keys that you weren't expecting.
4 Verify

    [hadoop@hadoop1 ~]$ ssh datanode1
    Last login: Thu Feb 2 09:01:16 2012 from 192.168.57.71
    [hadoop@hadoop2 ~]$ exit
    logout
    [hadoop@hadoop1 ~]$ ssh datanode2
    Last login: Thu Feb 2 09:01:18 2012 from 192.168.57.71
    [hadoop@hadoop3 ~]$ exit
    logout
    [hadoop@hadoop1 ~]$ ssh datanode3
    Last login: Thu Feb 2 09:01:20 2012 from 192.168.57.71
    [hadoop@hadoop4 ~]$ exit
    logout
    [hadoop@hadoop1 ~]$ ssh localhost
    Last login: Thu Feb 2 09:01:24 2012 from 192.168.57.71
    [hadoop@hadoop1 ~]$ exit
    logout
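The manual logins above can also be checked in one pass. The sketch below is an illustration only: `check_passwordless` is a hypothetical helper name, and it relies on the OpenSSH client options `BatchMode` and `ConnectTimeout`, so it assumes an OpenSSH client on the namenode.

```shell
#!/bin/sh
# Print each host that cannot be reached over SSH without a password prompt.
# Usage: check_passwordless USER HOST...
# (check_passwordless is a hypothetical helper, not part of Hadoop or OpenSSH.)
check_passwordless() {
    user=$1
    shift
    for host in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password;
        # ConnectTimeout keeps an unreachable host from blocking the loop.
        if ! ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true \
                </dev/null >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}

# Example call from the namenode as the hadoop user:
#   check_passwordless hadoop datanode1 datanode2 datanode3 localhost
```

An empty result suggests key-based login works for every listed host; any printed host needs its `ssh-copy-id` step repeated or its sshd checked.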

III. Java environment configuration

1 Download a suitable JDK

This file is the RPM package for 64-bit Linux systems:

    wget http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.rpm

2 Install the JDK

    rpm -ivh jdk-7-linux-x64.rpm

3 Verify Java

    [root@hadoop1 ~]# java -version
    java version "1.7.0"
    Java(TM) SE Runtime Environment (build 1.7.0-b147)
    Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)
    [root@hadoop1 ~]# ls /usr/java/
    default jdk1.7.0 latest

4 Configure the Java environment variables

Edit /etc/profile (for example with vim /etc/profile) and add the following:

    #add for hadoop
    export JAVA_HOME=/usr/java/jdk1.7.0
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/
    export PATH=$PATH:$JAVA_HOME/bin

Then make the environment variables take effect:

    source /etc/profile
5 Copy /etc/profile to the datanodes

    [root@hadoop1 src]# scp /etc/profile root@datanode1:/etc/
    The authenticity of host 'datanode1 (192.168.57.86)' can't be established.
    RSA key fingerprint is b5:00:d1:df:73:4c:94:f1:ea:1f:b5:cd:ed:3a:cc:e1.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'datanode1,192.168.57.86' (RSA) to the list of known hosts.
    root@datanode1's password:
    profile 100% 1624 1.6KB/s 00:00
    [root@hadoop1 src]# scp /etc/profile root@datanode2:/etc/
    The authenticity of host 'datanode2 (192.168.57.87)' can't be established.
    RSA key fingerprint is 57:cf:96:15:78:a3:94:93:30:16:8e:66:47:cd:f9:cd.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'datanode2,192.168.57.87' (RSA) to the list of known hosts.
    root@datanode2's password:
    profile 100% 1624 1.6KB/s 00:00
    [root@hadoop1 src]# scp /etc/profile root@datanode3:/etc/
    The authenticity of host 'datanode3 (192.168.57.88)' can't be established.
    RSA key fingerprint is 31:73:e8:3c:20:0c:1e:b2:59:5c:d1:01:4b:26:41:70.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'datanode3,192.168.57.88' (RSA) to the list of known hosts.
    root@datanode3's password:
    profile 100% 1624 1.6KB/s 00:00
6 Copy the JDK package to every datanode and install it on each one

    [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode1:/home/hadoop/
    hadoop@datanode1's password:
    hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
    jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
    [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode2:/home/hadoop/
    hadoop@datanode2's password:
    hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
    jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
    [root@hadoop1 ~]# scp -r /home/hadoop/src/ hadoop@datanode3:/home/hadoop/
    hadoop@datanode3's password:
    hadoop-0.20.203.0rc1.tar.gz 100% 58MB 57.8MB/s 00:01
    jdk-7-linux-x64.rpm 100% 78MB 77.9MB/s 00:01
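Editing /etc/profile by hand on several machines is easy to get wrong, so the Java variables could instead be appended by a small function that is safe to run more than once. This is only a sketch under assumptions: `add_java_env` is a hypothetical name, the JDK path is the one used in this post, and the CLASSPATH line is left out because its value is not fully shown above.

```shell
#!/bin/sh
# Append the Java settings to a profile file unless they are already there.
# Usage: add_java_env PROFILE_FILE JAVA_HOME_DIR
# (add_java_env is a hypothetical helper written for this example.)
add_java_env() {
    profile=$1
    java_home=$2
    # Do nothing if a JAVA_HOME export is already present.
    if grep -q '^export JAVA_HOME=' "$profile" 2>/dev/null; then
        return 0
    fi
    {
        echo '#add for hadoop'
        echo "export JAVA_HOME=$java_home"
        # Single quotes keep $PATH and $JAVA_HOME literal in the profile.
        echo 'export PATH=$PATH:$JAVA_HOME/bin'
    } >> "$profile"
}

# Example call as root on each node:
#   add_java_env /etc/profile /usr/java/jdk1.7.0
```

Because the function checks for an existing `export JAVA_HOME=` line first, running it again after a partial rollout should not duplicate the entries.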

IV. Hadoop configuration

Note: perform these steps as the hadoop user.

1 Directory layout

    [hadoop@hadoop1 ~]$ pwd
    /home/hadoop
    [hadoop@hadoop1 ~]$ ll
    total 59220
    lrwxrwxrwx 1 hadoop hadoop 17 Feb 1 16:59 hadoop -> hadoop-0.20.203.0
    drwxr-xr-x 12 hadoop hadoop 4096 Feb 1 17:31 hadoop-0.20.203.0
    -rw-r--r-- 1 hadoop hadoop 60569605 Feb 1 14:24 hadoop-0.20.203.0rc1.tar.gz

2 Configure hadoop-env.sh to point to the Java installation

    vim hadoop/conf/hadoop-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0
3 Configure core-site.xml, which locates the file system's namenode

    [hadoop@hadoop1 ~]$ cat hadoop/conf/core-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000</value>
    </property>
    </configuration>
4 Configure mapred-site.xml, which locates the master node running the jobtracker

    [hadoop@hadoop1 ~]$ cat hadoop/conf/mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>mapred.job.tracker</name>
    <value>namenode:9001</value>
    </property>
    </configuration>
5 Configure hdfs-site.xml, which sets the number of HDFS replicas

    [hadoop@hadoop1 ~]$ cat hadoop/conf/hdfs-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>
    </configuration>
6 Configure the masters and slaves files

    [hadoop@hadoop1 ~]$ cat hadoop/conf/masters
    namenode
    [hadoop@hadoop1 ~]$ cat hadoop/conf/slaves
    datanode1
    datanode2
    datanode3

7 Copy the hadoop directory to all datanodes

    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode1:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode2:/home/hadoop/
    [hadoop@hadoop1 ~]$ scp -r hadoop hadoop@datanode3:/home/hadoop/
8 Format HDFS

When asked whether to re-format the filesystem, enter Y.

    [hadoop@hadoop1 hadoop]$ bin/hadoop namenode -format
    12/02/02 11:31:15 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = hadoop1.test.com/127.0.0.1
    STARTUP_MSG: args = [-format]
    STARTUP_MSG: version = 0.20.203.0
    STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
    12/02/02 11:31:17 INFO util.GSet: VM type = 64-bit
    12/02/02 11:31:17 INFO util.GSet: 2% max memory = 19.33375 MB
    12/02/02 11:31:17 INFO util.GSet: capacity = 2^21 = 2097152 entries
    12/02/02 11:31:17 INFO util.GSet: recommended=2097152, actual=2097152
    12/02/02 11:31:17 INFO namenode.FSNamesystem: fsOwner=hadoop
    12/02/02 11:31:18 INFO namenode.FSNamesystem: supergroup=supergroup
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isPermissionEnabled=true
    12/02/02 11:31:18 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    12/02/02 11:31:18 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    12/02/02 11:31:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
    12/02/02 11:31:18 INFO common.Storage: Image file of size 112 saved in 0 seconds.
    12/02/02 11:31:18 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
    12/02/02 11:31:18 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at hadoop1.test.com/127.0.0.1
    ************************************************************/
    [hadoop@hadoop1 hadoop]$
9 Start the Hadoop daemons

    [hadoop@hadoop1 hadoop]$ bin/start-all.sh
    starting namenode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-namenode-hadoop1.test.com.out
    datanode1: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop2.test.com.out
    datanode2: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop3.test.com.out
    datanode3: starting datanode, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-hadoop4.test.com.out
    starting jobtracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-jobtracker-hadoop1.test.com.out
    datanode1: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop2.test.com.out
    datanode2: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop3.test.com.out
    datanode3: starting tasktracker, logging to /home/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-hadoop4.test.com.out
10 Verify

On the namenode:

    [hadoop@hadoop1 logs]$ jps
    2883 JobTracker
    3002 Jps
    2769 NameNode

On the datanodes:

    [hadoop@hadoop2 ~]$ jps
    2743 TaskTracker
    2670 DataNode
    2857 Jps
    [hadoop@hadoop3 ~]$ jps
    2742 TaskTracker
    2856 Jps
    2669 DataNode
    [hadoop@hadoop4 ~]$ jps
    2742 TaskTracker
    2852 Jps
    2659 DataNode

Hadoop monitoring web page:

    http://192.168.57.75:50070/dfshealth.jsp
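Reading the jps listings by eye works for four machines, but the same check can be scripted. The sketch below is one possible approach rather than anything Hadoop ships: `missing_daemons` is a hypothetical helper that reads jps output on standard input and prints each expected daemon name that does not appear in it.

```shell
#!/bin/sh
# Read `jps` output on stdin and print each expected daemon that is absent.
# Usage: jps | missing_daemons NAME...
# (missing_daemons is a hypothetical helper written for this example.)
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <main class>", so compare against the second field.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit found ? 0 : 1 }'; then
            echo "$daemon"
        fi
    done
}

# Example calls:
#   on the namenode:   jps | missing_daemons NameNode JobTracker
#   on each datanode:  jps | missing_daemons DataNode TaskTracker
```

No output means every named daemon was found; a printed name points at the log file for that daemon under the hadoop logs directory.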



V. A simple HDFS check

The Hadoop file commands take the following form:

    hadoop fs -cmd <args>

Create a directory:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -mkdir /test-hadoop

List a directory:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -ls /
    Found 2 items
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp

List a directory including its subdirectories:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:32 /test-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
    drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
    -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

Add a file:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -put /home/hadoop/hadoop-0.20.203.0rc1.tar.gz /test-hadoop
    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:34 /test-hadoop
    -rw-r--r-- 2 hadoop supergroup 60569605 2012-02-02 13:34 /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
    drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
    -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info

Fetch a file:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -get /test-hadoop/hadoop-0.20.203.0rc1.tar.gz /tmp/
    [hadoop@hadoop1 hadoop]$ ls /tmp/*.tar.gz
    /tmp/1.tar.gz /tmp/hadoop-0.20.203.0rc1.tar.gz
Delete a file:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -rm /test-hadoop/hadoop-0.20.203.0rc1.tar.gz
    Deleted hdfs://namenode:9000/test-hadoop/hadoop-0.20.203.0rc1.tar.gz
    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:57 /test-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
    drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
    -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
    -rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop

Delete a directory:

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -rmr /test-hadoop
    Deleted hdfs://namenode:9000/test-hadoop
    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -lsr /
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred
    drwx------ - hadoop supergroup 0 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system
    -rw------- 2 hadoop supergroup 4 2012-02-02 11:32 /tmp/hadoop-hadoop/mapred/system/jobtracker.info
    drwxr-xr-x - hadoop supergroup 0 2012-02-02 13:36 /user
    -rw-r--r-- 2 hadoop supergroup 321 2012-02-02 13:36 /user/hadoop
hadoop fs help (partial):

    [hadoop@hadoop1 hadoop]$ bin/hadoop fs -help
    hadoop fs is the command to execute fs commands. The full syntax is:
    hadoop fs [-fs <local | file system URI>] [-conf <configuration file>]
    [-D <property=value>] [-ls <path>] [-lsr <path>] [-du <path>]
    [-dus <path>] [-mv <src> <dst>] [-cp <src> <dst>] [-rm [-skipTrash] <src>]
    [-rmr [-skipTrash] <src>] [-put <localsrc> ... <dst>] [-copyFromLocal <localsrc> ... <dst>]
    [-moveFromLocal <localsrc> ... <dst>] [-get [-ignoreCrc] [-crc] <src> <localdst>]
    [-getmerge <src> <localdst> [addnl]] [-cat <src>]
    [-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>] [-moveToLocal <src> <localdst>]
    [-mkdir <path>] [-report] [-setrep [-R] [-w] <rep> <path/file>]
    [-touchz <path>] [-test -[ezd] <path>] [-stat [format] <path>]
    [-tail [-f] <path>] [-text <path>]
    [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    [-chown [-R] [OWNER][:[GROUP]] PATH...]
    [-chgrp [-R] GROUP PATH...]
    [-count[-q] <path>]
    [-help [cmd]]


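Conclusion
Setting up a Hadoop environment involves many tedious steps and requires some Linux system knowledge. Note that the environment built with the steps above only gives you a general feel for Hadoop. To use HDFS for production services, the Hadoop configuration files need further work. Follow-up articles will be published as blog posts, so stay tuned.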

This article comes from the "dongnan" blog. Please keep this source link: http://dngood.blog.51cto.com/446195/775368



