Hadoop 2.5.2 Distributed Environment Installation and Configuration
I am still at the beginner stage with Hadoop, but I believe distributed computing and distributed storage technologies like it have a very promising future, so I plan to put some real effort into learning them. Getting started naturally begins with installation. My environment is an IBM X3650 x86 server running RHEL 5.6_x64; on it I installed VirtualBox and carved out four virtual machines.
I. Environment Preparation
OS: OEL 6.6_x64
Hadoop version: hadoop-2.5.2, downloaded from http://hadoop.apache.org/.
JDK version: jdk-6u31-linux-x64.bin
1. A four-node test cluster
One master (hdm01) and three slaves (hds01, hds02, hds03). On every machine, append entries like the following to /etc/hosts:
192.168.56.91 hdm01
192.168.56.92 hds01
192.168.56.93 hds02
192.168.56.94 hds03
Some posts online say the localhost line has to be removed; I left it in and the setup still worked.
2. Create a hadoop user on each virtual machine
[root@hdm01 etc]# su - hadoop
[hadoop@hdm01 ~]$ id
uid=500(hadoop) gid=500(hadoop) groups=500(hadoop) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Many people simply run everything as root, which works and is in fact easier for beginners. I created a dedicated hadoop user to make my test environment closer to an eventual production environment; the price was that the SSH trust setup for this user took quite a while to get right.
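For reference, a minimal sketch of the user creation, run as root on every node (pinning the UID/GID to 500 merely matches the id output above and is optional):
# Run as root on hdm01, hds01, hds02 and hds03
groupadd -g 500 hadoop
useradd -u 500 -g hadoop -m hadoop
passwd hadoop    # set a password for the new account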
3. Set up passwordless SSH trust for the hadoop user
I won't walk through the full procedure here; a quick Baidu search turns up the method. What deserves attention:
1) SSH trust is very strict about file permissions; permissions that are too loose (and seemingly also too tight) will break it. Compare with my output:
[hadoop@hdm01 ~]$ ls -ld .ssh
drwx------. 2 hadoop hadoop 4096 Mar 12 16:14 .ssh
[hadoop@hdm01 ~]$ ls -l .ssh
total 16
-rw-------. 1 hadoop hadoop 1576 Mar 12 16:12 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Mar 12 16:09 id_rsa
-rw-r--r--. 1 hadoop hadoop 394 Mar 12 16:09 id_rsa.pub
-rw-r–r–. 1 hadoop hadoop 1993 Mar 13 21:21 known_hosts
2) First ssh to the local host itself, then ssh to every other member; after you type yes once to accept each host key, you will not be prompted again.
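A sketch of one common way to build the trust (any equivalent key-distribution method works; the permissions at the end must match the listing above):
# As the hadoop user on each node: generate a passphrase-less key pair
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
# Collect every node's ~/.ssh/id_rsa.pub into a single authorized_keys
# file, copy it to all four nodes, then tighten the permissions:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys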
4. Java environment configuration
A JDK installed from the OS media is generally OpenJDK. It works, but I was worried about compatibility issues, so I swapped in the commonly used Oracle JDK. The replacement steps:
1) Uninstall the existing OpenJDK packages. This is purely for tidiness, so skipping it is also fine.
2) Upload the installer to a directory on the server and unpack it:
./jdk-6u31-linux-x64.bin
3) Move the directory:
mv jdk1.6.0_31/ /usr/local/
4) Update the environment variables
Add the following to /etc/profile or the user's own .bash_profile:
export JAVA_HOME=/usr/local/jdk1.6.0_31
export JRE_HOME=/usr/local/jdk1.6.0_31/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
5) Run as root:
update-alternatives --install /usr/bin/java java /usr/local/jdk1.6.0_31/bin/java 300
update-alternatives --install /usr/bin/javac javac /usr/local/jdk1.6.0_31/bin/javac 300
update-alternatives --config java
II. Download and Extract hadoop-2.5.2.tar.gz
1. Download link:
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz
2. Extract and change ownership (a sketch of the commands follows the listing below)
[hadoop@hdm01 ~]$ ls -ld /hadoop*
drwxr-xr-x. 13 hadoop hadoop 4096 Mar 17 16:14 /hadoop-2.5.2
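The commands behind that listing, run as root (assuming the tarball sits in the current directory; extracting to / matches my layout, though a subdirectory works the same way):
tar -xzf hadoop-2.5.2.tar.gz -C /
chown -R hadoop:hadoop /hadoop-2.5.2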
III. Configure Environment Variables
1. [hadoop@hdm01 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
#qiuyb add
export JAVA_HOME=/usr/local/jdk1.6.0_31
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JRE_HOME/bin
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
#set hadoop_env
export HADOOP_HOME=/hadoop-2.5.2
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
2. You also need to set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh and yarn-env.sh:
export JAVA_HOME=/usr/local/jdk1.6.0_31
IV. Edit the Hadoop Configuration Files
1. Edit $HADOOP_HOME/etc/hadoop/core-site.xml
Add:
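A minimal core-site.xml consistent with this cluster (fs.defaultFS must point at the master; the port 9000 and the hadoop.tmp.dir path are assumptions on my part, so adjust them to your layout):
<configuration>
  <!-- NameNode address: the master node, hdm01 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hdm01:9000</value>
  </property>
  <!-- Base directory for Hadoop's working files (path is an assumption) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop-2.5.2/tmp</value>
  </property>
</configuration>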
2. Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add:
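A minimal yarn-site.xml for this layout (both property names are standard in 2.5.2; the hostname is the master above):
<configuration>
  <!-- ResourceManager runs on the master -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hdm01</value>
  </property>
  <!-- Needed so MapReduce jobs can shuffle under YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>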
More yarn-site.xml parameters are documented at:
http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
3. Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
There is no mapred-site.xml by default; just copy mapred-site.xml.template to mapred-site.xml:
#cp etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml
Add:
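The essential setting here tells MapReduce to run on the YARN framework:
<configuration>
  <!-- Submit MapReduce jobs to YARN rather than running them locally -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>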
4. Configure hdfs-site.xml
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add:
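A minimal hdfs-site.xml consistent with the three DataNodes (the two directory paths are assumptions; a replication factor of 3 matches the three slaves):
<configuration>
  <!-- One block replica per slave -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- NameNode metadata directory (path is an assumption) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop-2.5.2/dfs/name</value>
  </property>
  <!-- DataNode block storage directory (path is an assumption) -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop-2.5.2/dfs/data</value>
  </property>
</configuration>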
5. Configure slaves
This file tells Hadoop which nodes are slaves; with it in place, starting the cluster from the master automatically starts the DataNode, NodeManager, and other daemons on those machines.
Edit $HADOOP_HOME/etc/hadoop/slaves:
[hadoop@hdm01 hadoop]$ cat slaves
hds01
hds02
hds03
6. Sync the Hadoop folder to each of the slave hosts
Since we set up passwordless SSH, no password is needed.
For example:
[hadoop@hds01 /]$scp -r hdm01:/hadoop-2.5.2 ./
Note that because I put the Hadoop directory at the filesystem root, the command above does not actually work as shown: the hadoop user cannot write to /. In practice I did the scp as root and then fixed the ownership, which suggests that a second-level directory is the better home for Hadoop.
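A sketch of that workaround (the chown restores the ownership shown earlier):
# As root on each slave, since the hadoop user cannot write to /
scp -r hdm01:/hadoop-2.5.2 /
chown -R hadoop:hadoop /hadoop-2.5.2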
V. Format HDFS
1. Command
[hadoop@hdm01 /]$hdfs namenode -format
The command produces a lot of output, omitted here; any error is easy to spot in it.
2. Verification
Whether the format succeeded can also be verified with the following command (run once the daemons are up):
[hadoop@hdm01 ~]$ hdfs dfsadmin -report
Configured Capacity: 316262817792 (294.54 GB)
Present Capacity: 260433219584 (242.55 GB)
DFS Remaining: 260432723968 (242.55 GB)
DFS Used: 495616 (484 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
————————————————-
Live datanodes (3):
Name: 192.168.56.93:50010 (hds02)
Hostname: hds02
Decommission Status : Normal
Configured Capacity: 105420939264 (98.18 GB)
DFS Used: 221184 (216 KB)
Non DFS Used: 18611060736 (17.33 GB)
DFS Remaining: 86809657344 (80.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.35%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 10:12:44 CST 2015
Name: 192.168.56.92:50010 (hds01)
Hostname: hds01
Decommission Status : Normal
Configured Capacity: 105420939264 (98.18 GB)
DFS Used: 196608 (192 KB)
Non DFS Used: 18611609600 (17.33 GB)
DFS Remaining: 86809133056 (80.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.35%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 10:12:45 CST 2015
Name: 192.168.56.94:50010 (hds03)
Hostname: hds03
Decommission Status : Normal
Configured Capacity: 105420939264 (98.18 GB)
DFS Used: 77824 (76 KB)
Non DFS Used: 18606927872 (17.33 GB)
DFS Remaining: 86813933568 (80.85 GB)
DFS Used%: 0.00%
DFS Remaining%: 82.35%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Wed Mar 18 10:12:44 CST 2015
VI. Start the Hadoop Cluster
[hadoop@hdm01 hadoop-2.5.2]$./sbin/start-dfs.sh
[hadoop@hdm01 hadoop-2.5.2]$./sbin/start-yarn.sh
VII. Verify the Installation
1. Open the following in a browser; you should see the HDFS management page:
http://hdm01:50070/
2. Run the jps command on every machine and check the output; correct output looks like the following:
[hadoop@hdm01 ~]$ jps
5661 Jps
30786 NameNode
30986 SecondaryNameNode
31122 ResourceManager
[hadoop@hds01 ~]$ jps
342 NodeManager
32688 DataNode
4873 Jps
[hadoop@hds02 ~]$ jps
32443 NodeManager
4803 Jps
32321 DataNode
[hadoop@hds03 ~]$ jps
4115 Jps
32067 DataNode
32189 NodeManager
VIII. A Taste of MapReduce (WordCount Verification)
1. Create an input directory on HDFS
[hadoop@hdm01 hadoop-2.5.2]$bin/hadoop fs -mkdir -p input
2. Copy README.txt from the Hadoop directory into the newly created input directory on HDFS
[hadoop@hdm01 hadoop-2.5.2]$bin/hadoop fs -copyFromLocal README.txt input
3. Run WordCount
1) Command
[hadoop@hdm01 hadoop-2.5.2]$bin/hadoop jar share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.5.2-sources.jar org.apache.hadoop.examples.WordCount input output
2) Output
15/03/18 10:23:54 INFO client.RMProxy: Connecting to ResourceManager at hdm01/192.168.56.91:8032
15/03/18 10:23:56 INFO input.FileInputFormat: Total input paths to process : 1
15/03/18 10:23:56 INFO mapreduce.JobSubmitter: number of splits:1
15/03/18 10:23:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1426580122734_0002
15/03/18 10:23:57 INFO impl.YarnClientImpl: Submitted application application_1426580122734_0002
15/03/18 10:23:57 INFO mapreduce.Job: The url to track the job: http://hdm01:8088/proxy/application_1426580122734_0002/
15/03/18 10:23:57 INFO mapreduce.Job: Running job: job_1426580122734_0002
15/03/18 10:24:13 INFO mapreduce.Job: Job job_1426580122734_0002 running in uber mode : false
15/03/18 10:24:13 INFO mapreduce.Job: map 0% reduce 0%
15/03/18 10:24:24 INFO mapreduce.Job: map 100% reduce 0%
15/03/18 10:24:35 INFO mapreduce.Job: map 100% reduce 100%
15/03/18 10:24:35 INFO mapreduce.Job: Job job_1426580122734_0002 completed successfully
15/03/18 10:24:36 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=1836
FILE: Number of bytes written=197705
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1477
HDFS: Number of bytes written=1306
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8384
Total time spent by all reduces in occupied slots (ms)=8633
Total time spent by all map tasks (ms)=8384
Total time spent by all reduce tasks (ms)=8633
Total vcore-seconds taken by all map tasks=8384
Total vcore-seconds taken by all reduce tasks=8633
Total megabyte-seconds taken by all map tasks=8585216
Total megabyte-seconds taken by all reduce tasks=8840192
Map-Reduce Framework
Map input records=31
Map output records=179
Map output bytes=2055
Map output materialized bytes=1836
Input split bytes=111
Combine input records=179
Combine output records=131
Reduce input groups=131
Reduce shuffle bytes=1836
Reduce input records=131
Reduce output records=131
Spilled Records=262
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=356
CPU time spent (ms)=2380
Physical memory (bytes) snapshot=311283712
Virtual memory (bytes) snapshot=1663029248
Total committed heap usage (bytes)=136581120
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1366
File Output Format Counters
Bytes Written=1306
4. After the job finishes, view the word-count results
1) Command
[hadoop@hdm01 hadoop-2.5.2]$bin/hadoop fs -cat output/*
2) Output
(BIS), 1
(ECCN) 1
(TSU) 1
(see 1
5D002.C.1, 1
740.13) 1
Administration 1
Apache 1
BEFORE 1
BIS 1
Bureau 1
Commerce, 1
Commodity 1
Control 1
Core 1
Department 1
ENC 1
Exception 1
Export 2
For 1
Foundation 1
Government 1
Hadoop 1
Hadoop, 1
Industry 1
Jetty 1
License 1
Number 1
Regulations, 1
SSL 1
Section 1
Security 1
See 1
Software 2
Technology 1
The 4
This 1
U.S. 1
Unrestricted 1
about 1
algorithms. 1
and 6
and/or 1
another 1
any 1
as 1
asymmetric 1
at: 2
both 1
by 1
check 1
classified 1
code 1
code. 1
concerning 1
country 1
country's 1
country, 1
cryptographic 3
currently 1
details 1
distribution 2
eligible 1
encryption 3
exception 1
export 1
following 1
for 3
form 1
from 1
functions 1
has 1
have 1
http://hadoop.apache.org/core/ 1
http://wiki.apache.org/hadoop/ 1
if 1
import, 2
in 1
included 1
includes 2
information 2
information. 1
is 1
it 1
latest 1
laws, 1
libraries 1
makes 1
manner 1
may 1
more 2
mortbay.org. 1
object 1
of 5
on 2
or 2
our 2
performing 1
permitted. 1
please 2
policies 1
possession, 2
project 1
provides 1
re-export 2
regulations 1
reside 1
restrictions 1
security 1
see 1
software 2
software, 2
software. 2
software: 1
source 1
the 8
this 3
to 2
under 1
use, 2
uses 1
using 2
visit 1
website 1
which 2
wiki, 1
with 1
written 1
you 1
your 1
Note: the program's output path here is output; if that directory already exists, delete it first:
[hadoop@hdm01 hadoop-2.5.2]$bin/hadoop dfs -rmr output
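In Hadoop 2.x this hadoop dfs form is deprecated (it still works but prints a warning); the current equivalent is:
bin/hdfs dfs -rm -r output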