
The Complete Hadoop 3.4.2 Installation Guide

Note: this article was last updated on February 27, 2026.

1. Pre-installation Preparation

Prepare three servers; I used three virtual machines for this experiment:

192.168.110.11 k8s-master
192.168.110.12 k8s-node-1
192.168.110.13 k8s-node-2

These are machines left over from an earlier k8s install, 4 cores / 4 GB each, reused as-is.

Install JDK 1.8 and set the environment variables:

export JAVA_HOME=/usr/local/jdk1.8.0_471
export PATH=$JAVA_HOME/bin:$PATH

Note: why JDK 1.8? Because Hive 3.1.3 will run on this cluster, and it only works with JDK 1.8.

Edit the hosts file on every server: vim /etc/hosts

192.168.110.11 k8s-master
192.168.110.12 k8s-node-1
192.168.110.13 k8s-node-2

Set up passwordless SSH from k8s-master to k8s-node-1 and k8s-node-2.
Enter the .ssh directory and run the following commands in order:

$ cd ~/.ssh               # if the directory does not exist, run ssh localhost once first
$ rm ./id_rsa*            # remove any previously generated keys (if present)
$ ssh-keygen -t rsa       # just press Enter at every prompt
$ cat ./id_rsa.pub >> ./authorized_keys

Now running ssh k8s-master logs in without a password (run exit to leave). Copy the authorized_keys file to the ~/.ssh directory on k8s-node-1 and k8s-node-2, then verify the passwordless login: ssh k8s-node-1
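The copy-and-verify step above can be scripted. A minimal sketch, assuming root still has password access to the workers at this point (ssh-copy-id appends the key to the remote authorized_keys and fixes its permissions):

```shell
# Push the public key to each worker node.
for node in k8s-node-1 k8s-node-2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$node" \
        || echo "could not reach $node"
done

# Verify: each command should print the node's hostname with no password prompt.
for node in k8s-node-1 k8s-node-2; do
    ssh -o BatchMode=yes root@"$node" hostname \
        || echo "passwordless login to $node not working yet"
done
```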

Put the Hadoop tarball under /cloud/, extract it, and make sure the directory is named hadoop-3.4.2.
Set the Hadoop environment variables (in /etc/profile):

export JAVA_HOME=/usr/local/jdk1.8.0_471
export HADOOP_HOME=/cloud/hadoop-3.4.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

Then run: source /etc/profile

2. Edit the Configuration Files

Go to the directory /cloud/hadoop-3.4.2/etc/hadoop.

Edit core-site.xml:

<configuration>
    <!-- NameNode RPC address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://k8s-master:9000</value>
    </property>
    <!-- Base directory for files Hadoop generates at runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/cloud/hadoop-3.4.2/tmp</value>
    </property>
    <!-- Static user for the web UIs -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
    <!-- Allow the hdfs proxy user to impersonate from any host/group -->
    <property>
        <name>hadoop.proxyuser.hdfs.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hdfs.groups</name>
        <value>*</value>
    </property>
</configuration>

Edit hdfs-site.xml:

<configuration>
    <!-- HDFS replication factor -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Enable WebHDFS -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:9864</value>
    </property>
    <property>
        <name>dfs.datanode.use.datanode.hostname</name>
        <value>false</value>
    </property>
</configuration>
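Since dfs.webhdfs.enabled is true, HDFS can be exercised over plain HTTP on the NameNode's web port (9870 in Hadoop 3). A quick smoke test once the cluster is up, using this guide's hostname and static user (a sketch for verification, not required for the install):

```shell
# List the HDFS root directory through the WebHDFS REST API;
# LISTSTATUS is a read-only operation answered by the NameNode.
curl -s --max-time 5 \
    "http://k8s-master:9870/webhdfs/v1/?op=LISTSTATUS&user.name=root" \
    || echo "NameNode not reachable yet"
```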

Edit mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>k8s-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>k8s-master:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/cloud/hadoop-3.4.2</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/cloud/hadoop-3.4.2</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/cloud/hadoop-3.4.2</value>
    </property>
</configuration>

Edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>

  <!-- Key: bind to all interfaces -->
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>

  <property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
  </property>

  <!-- RPC addresses also use localhost -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>

  <!-- NodeManager configuration -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <!-- Web proxy -->
  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:8089</value>
  </property>
</configuration>
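One caveat: with yarn.resourcemanager.hostname set to localhost, NodeManagers can only register when they run on the same host as the ResourceManager. Since this cluster runs NodeManagers on k8s-node-1 and k8s-node-2, a multi-node setup likely needs the master's hostname instead. A sketch of just the affected properties (the other RM address properties then default from the hostname):

```xml
<!-- Multi-node variant: advertise the master's hostname so remote
     NodeManagers can find the ResourceManager; bind-host 0.0.0.0
     still makes the daemons listen on all interfaces. -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>k8s-master</value>
</property>
<property>
    <name>yarn.resourcemanager.bind-host</name>
    <value>0.0.0.0</value>
</property>
```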

Edit the workers file:

k8s-node-1
k8s-node-2
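Every node reads its own copy of these configuration files, so edits made on the master must be pushed out. A small sketch, assuming the same install path on all nodes:

```shell
# Sync the config directory from the master to each worker.
for node in k8s-node-1 k8s-node-2; do
    rsync -avz /cloud/hadoop-3.4.2/etc/hadoop/ \
        root@"$node":/cloud/hadoop-3.4.2/etc/hadoop/ \
        || echo "sync to $node failed"
done
```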

3. Start Hadoop

3.1 Format HDFS

Go to the directory /cloud/hadoop-3.4.2/etc/hadoop and run the following command:

hdfs namenode -format -force
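Formatting writes fresh NameNode metadata under dfs.namenode.name.dir, which defaults to ${hadoop.tmp.dir}/dfs/name, i.e. /cloud/hadoop-3.4.2/tmp/dfs/name with this guide's core-site.xml. A quick sanity check of the result:

```shell
# The VERSION file records the newly generated clusterID.
cat /cloud/hadoop-3.4.2/tmp/dfs/name/current/VERSION \
    || echo "no metadata found - check the format command's output"
```

Note that -force reformats without prompting; rerunning it on a cluster that already holds data gives the NameNode a new clusterID that no longer matches the DataNodes' data directories.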

3.2 Start HDFS and YARN

# Start HDFS first
sbin/start-dfs.sh

# Then start YARN
sbin/start-yarn.sh

For easier operations, wrap this in a start script, start-hadoop.sh:

#!/bin/bash
# Hadoop per-daemon user environment variables
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root

# Start HDFS
echo "Starting HDFS..."
sbin/start-dfs.sh

# Start YARN (if needed)
echo "Starting YARN..."
sbin/start-yarn.sh

# Start the History Server (if needed); in Hadoop 3 this script is
# deprecated in favor of: bin/mapred --daemon start historyserver
echo "Starting History Server..."
sbin/mr-jobhistory-daemon.sh start historyserver
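For symmetry with start-hadoop.sh, a matching stop script can be kept alongside it. A sketch following the same conventions (the distribution's sbin/stop-all.sh is an alternative); services stop in the reverse of the start order:

```shell
#!/bin/bash
# stop-hadoop.sh - stop the History Server first, then YARN, then HDFS
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root

echo "Stopping History Server..."
sbin/mr-jobhistory-daemon.sh stop historyserver 2>/dev/null || true

echo "Stopping YARN..."
sbin/stop-yarn.sh 2>/dev/null || true

echo "Stopping HDFS..."
sbin/stop-dfs.sh 2>/dev/null || true

echo "Done."
```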

And a restart script, restart-all.sh:

#!/bin/bash

echo "=== Hadoop cluster restart script ==="
echo "Start time: $(date)"
echo ""

# 1. Stop services
echo "1. Stopping services..."
echo "Stopping History Server..."
sbin/mr-jobhistory-daemon.sh stop historyserver 2>/dev/null

echo "Stopping YARN..."
sbin/stop-yarn.sh 2>/dev/null

echo "Stopping HDFS..."
sbin/stop-dfs.sh 2>/dev/null

# Wait for the daemons to stop
sleep 5

# Check whether anything is still running
echo "Checking shutdown status..."
jps_result=$(jps | grep -v Jps)
if [ -n "$jps_result" ]; then
    echo "Warning: the following processes are still running:"
    echo "$jps_result"
    echo "Trying to kill them..."
    for pid in $(jps | grep -v Jps | awk '{print $1}'); do
        kill -9 $pid 2>/dev/null
    done
    sleep 3
fi

echo ""

# 2. Set environment variables
echo "2. Setting environment variables..."
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export YARN_PROXYSERVER_USER=root

echo "Environment variables set"
echo ""

# 3. Start services
echo "3. Starting services..."
echo "Starting HDFS..."
sbin/start-dfs.sh

# Wait for HDFS to come up
echo "Waiting for HDFS..."
sleep 8

echo "Starting YARN..."
sbin/start-yarn.sh

# Wait for YARN to come up
echo "Waiting for YARN..."
sleep 5

echo "Starting History Server..."
sbin/mr-jobhistory-daemon.sh start historyserver

echo ""

# 4. Check cluster status
echo "4. Checking cluster status..."
sleep 3

echo "Java processes:"
jps

echo ""
echo "HDFS status:"
hdfs dfsadmin -report 2>/dev/null | grep -E "Live datanodes|Configured Capacity" || echo "  could not get HDFS status"

echo ""
echo "YARN status:"
yarn node -list 2>/dev/null | head -5 || echo "  could not get YARN status"

echo ""
echo "Web UIs:"
echo "  NameNode: http://$(hostname):9870"
echo "  ResourceManager: http://$(hostname):8088"

echo ""
echo "End time: $(date)"
echo "=== Restart complete ==="

3.3 Verify the Startup

Run jps. On k8s-master you should see the NameNode, SecondaryNameNode and ResourceManager processes (plus JobHistoryServer if it was started); on the worker nodes, DataNode and NodeManager.

4. View the Web UIs in a Browser

4.1 Hadoop 2.x vs 3.x Port Comparison

Service                    | Hadoop 2.x | Hadoop 3.x | Purpose
---------------------------|------------|------------|----------------------------------
NameNode HTTP              | 50070      | 9870       | Web management UI
NameNode HTTPS             | 50470      | 9871       | Web UI over TLS
NameNode RPC               | 8020/9000  | 8020/9000  | Client RPC
DataNode HTTP              | 50075      | 9864       | DataNode web UI
DataNode HTTPS             | 50475      | 9865       | DataNode web UI over TLS
DataNode IPC               | 50020      | 9867       | DataNode IPC
DataNode data transfer     | 50010      | 9866       | Block data transfer
SecondaryNameNode HTTP     | 50090      | 9868       | SecondaryNameNode web UI
SecondaryNameNode HTTPS    | 50091      | 9869       | SecondaryNameNode web UI over TLS

4.2 YARN-related Ports

Port      | Service                  | Purpose
----------|--------------------------|------------------------------------------
8088      | ResourceManager Web UI   | YARN cluster management UI
8030-8033 | ResourceManager RPC      | Application submission, resource requests
8040-8042 | NodeManager              | Node resource management (web UI on 8042)
19888     | JobHistory Server Web UI | MapReduce job history (RPC on 10020)

4.3 Management UIs

HDFS management UI: http://localhost:9870

YARN (ResourceManager) management UI: http://localhost:8088
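If a page does not load, it is worth checking reachability from a shell first. A sketch using two HTTP endpoints that back these UIs (hostname as configured in this guide):

```shell
# NameNode liveness via its JMX servlet on the web port.
curl -s --max-time 5 \
    "http://k8s-master:9870/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" \
    || echo "NameNode UI not reachable"

# YARN cluster metrics via the ResourceManager REST API.
curl -s --max-time 5 "http://k8s-master:8088/ws/v1/cluster/metrics" \
    || echo "ResourceManager UI not reachable"
```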

5. Troubleshooting Errors During Installation

[root@k8s-master hadoop-3.4.2]# sbin/start-dfs.sh
Starting namenodes on [locahost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [k8s-master]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

HDFS refuses to start as the root user here; switching to another user works. Switch user and continue:

[root@k8s-master hadoop-3.4.2]# su vagrant
[vagrant@k8s-master hadoop-3.4.2]$ sbin/start-dfs.sh
Starting namenodes on [locahost]
locahost: ssh: Could not resolve hostname locahost: Name or service not known
Starting datanodes
k8s-node-2: Warning: Permanently added 'k8s-node-2' (ECDSA) to the list of known hosts.
k8s-node-2: Permission denied (publickey,password).
k8s-node-1: Warning: Permanently added 'k8s-node-1,192.168.110.12' (ECDSA) to the list of known hosts.
k8s-node-1: Permission denied (publickey,password).
Starting secondary namenodes [k8s-master]
k8s-master: Warning: Permanently added 'k8s-master,192.168.110.11' (ECDSA) to the list of known hosts.
k8s-master: Permission denied (publickey,password).

First set the Hadoop user environment variables by editing etc/hadoop/hadoop-env.sh:

# Set the Hadoop per-daemon user variables:
# find export HDFS_NAMENODE_USER, set its value to root, and add the variables below
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

After the change, run the start command again:

[root@k8s-master hadoop-3.4.2]# sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
k8s-node-1: bash: /cloud/hadoop-3.4.2/bin/hdfs: No such file or directory
k8s-node-2: bash: /cloud/hadoop-3.4.2/bin/hdfs: No such file or directory

Hadoop must be installed on every node; simply sync the install directory from the master to each node:

rsync -avz /cloud/hadoop-3.4.2/ root@k8s-node-1:/cloud/hadoop-3.4.2/
rsync -avz /cloud/hadoop-3.4.2/ root@k8s-node-2:/cloud/hadoop-3.4.2/

Start again; this is what success looks like:

[root@k8s-master hadoop-3.4.2]# sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [k8s-master]
[root@k8s-master hadoop-3.4.2]# sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers