Fully Distributed Spark Setup on YARN


Because only a limited number of machines are available, this guide builds a 3-node cluster: one master node (HDFS NameNode and YARN ResourceManager) and three slave nodes (HDFS DataNode and YARN NodeManager), with the master also acting as a slave.

Kerberos authentication is also enabled for this setup.

Versions

Software   Version
JDK        JDK 11
Hadoop     hadoop-3.2.2.tar.gz
Spark      spark-3.1.1-bin-hadoop3.2.tgz
Ubuntu     Ubuntu 18.04

Environment preparation

Hostname   IP
spark1     192.168.0.2
spark2     192.168.0.3
spark3     192.168.0.4

Create the user

Create a spark user on each of the three servers:

adduser spark    # enter a password when prompted

Set up passwordless SSH access

Install SSH if it is not already installed:

apt install openssh-server

Generate a key pair:

ssh-keygen -t rsa

Distribute the public key to all three nodes:

ssh-copy-id  spark@192.168.0.2
ssh-copy-id  spark@192.168.0.3
ssh-copy-id  spark@192.168.0.4

Output like the following indicates success:

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/spark/.ssh/id_rsa.pub"
The authenticity of host '211.81.248.214 (211.81.248.214)' can't be established.
ECDSA key fingerprint is SHA256:Zhzr91b4jr0h4fYJtuU/S0PLiccjrrg/GH5LlGURPUM.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
spark@211.81.248.214's password: 

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'spark@192.168.0.2'"
and check to make sure that only the key(s) you wanted were added.
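
With the keys in place, passwordless login can be verified from every node; a quick check using the addresses above (not part of the original steps):

# None of these should prompt for a password any more
ssh spark@192.168.0.2 hostname
ssh spark@192.168.0.3 hostname
ssh spark@192.168.0.4 hostname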

Modify /etc/hosts on all three nodes

192.168.0.2 spark1 master
192.168.0.3 spark2 worker1
192.168.0.4 spark3 worker2

192.168.0.2 EDGE.HADOOP.COM
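
Name resolution can be checked from each node before moving on (a quick sanity check, not part of the original steps):

getent hosts spark1 spark2 spark3    # should list the three addresses above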

Install Java

# Download the JDK archive
wget https://download.oracle.com/otn/java/jdk/11.0.11%2B9/ab2da78f32ed489abb3ff52fd0a02b1c/jdk-11.0.11_linux-x64_bin.tar.gz
# Extract the archive
tar -zxvf jdk-11.0.11_linux-x64_bin.tar.gz
# As root: create /usr/local/spark-dev and hand it over to the spark user
mkdir /usr/local/spark-dev
chown -R spark:spark /usr/local/spark-dev
# Move the JDK into place
mv jdk-11.0.11 /usr/local/spark-dev/jdk11

Configure the Java environment variables

vim ~/.bashrc

Append the following at the end of the file (in vim: press G to jump to the last line, then A to append):

# Java environment variables
export JAVA_HOME=/usr/local/spark-dev/jdk11
export PATH=$JAVA_HOME/bin:$PATH

Apply the configuration:

source ~/.bashrc

After reloading, confirm that the shell now picks up the installed JDK:
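
java -version    # should report version 11.0.11
which java       # should resolve to /usr/local/spark-dev/jdk11/bin/java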

Install Hadoop

Upload the Hadoop package to the home directory, or download it with wget.
Extract Hadoop and move it into place:

tar -zxvf hadoop-3.2.2.tar.gz
mv hadoop-3.2.2 /usr/local/spark-dev/hadoop-3.2.2
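
Optionally (not part of the original steps), HADOOP_HOME and the Hadoop bin/sbin directories can also be added to ~/.bashrc so that the hdfs and yarn commands work without full paths:

# Optional convenience entries for ~/.bashrc
export HADOOP_HOME=/usr/local/spark-dev/hadoop-3.2.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH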

Configure the relevant files (under /usr/local/spark-dev/hadoop-3.2.2/etc/hadoop)

core-site.xml

<configuration>
          <property>
                  <name>fs.defaultFS</name>
                  <value>hdfs://spark1:9000</value>
          </property>
 </configuration>

hdfs-site.xml

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///usr/local/spark-dev/hadoop-3.2.2/hdfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///usr/local/spark-dev/hadoop-3.2.2/hdfs/data</value>
        </property>
</configuration>

yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>spark1</value>
</property>
</configuration>

mapred-site.xml

<configuration>

        <property>
                <name>mapreduce.jobtracker.address</name>
                <value>spark1:54311</value>
        </property>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>

</configuration>
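
On Hadoop 3.x, MapReduce jobs running on YARN also need to know where the MapReduce framework is installed. If MapReduce jobs later fail because the ApplicationMaster cannot locate the framework classes, the following properties (using the install path from this guide) can be added inside the <configuration> block of mapred-site.xml; this goes beyond the original steps and is not required for Spark itself:

        <property>
                <name>yarn.app.mapreduce.am.env</name>
                <value>HADOOP_MAPRED_HOME=/usr/local/spark-dev/hadoop-3.2.2</value>
        </property>
        <property>
                <name>mapreduce.map.env</name>
                <value>HADOOP_MAPRED_HOME=/usr/local/spark-dev/hadoop-3.2.2</value>
        </property>
        <property>
                <name>mapreduce.reduce.env</name>
                <value>HADOOP_MAPRED_HOME=/usr/local/spark-dev/hadoop-3.2.2</value>
        </property>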


hadoop-env.sh

export JAVA_HOME=/usr/local/spark-dev/jdk11
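
The start scripts also need to know which hosts run DataNodes and NodeManagers; in Hadoop 3.x this is declared in etc/hadoop/workers (a step the original does not spell out). With the hostnames used above, the file would contain all three nodes:

spark1
spark2
spark3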

Copy the configured files to the other two nodes.
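
A sketch of one way to distribute everything, assuming /usr/local/spark-dev has already been created and chowned to the spark user on spark2 and spark3 just as on spark1:

# Run on spark1 as the spark user
scp -r /usr/local/spark-dev/* spark@spark2:/usr/local/spark-dev/
scp -r /usr/local/spark-dev/* spark@spark3:/usr/local/spark-dev/
# The shell profile with JAVA_HOME is needed on every node as well
scp ~/.bashrc spark@spark2:~/
scp ~/.bashrc spark@spark3:~/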
Start HDFS and YARN
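
On the very first start, the NameNode has to be formatted once on spark1 (a one-time step that the original list of commands leaves implicit):

/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs namenode -format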

/usr/local/spark-dev/hadoop-3.2.2/sbin/start-dfs.sh
/usr/local/spark-dev/hadoop-3.2.2/sbin/start-yarn.sh

On the master node, jps shows:

3658 Jps
3420 SecondaryNameNode
2895 NameNode
3344 ResourceManager

On the worker nodes, jps shows:

spark@amax:/usr/local/spark-dev$ jps
13049 DataNode
1233 NodeManager
20267 Jps
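
The daemons can also be checked in a browser; with Hadoop 3.x default ports, the web UIs should show all three DataNodes and NodeManagers as live:

http://spark1:9870    # HDFS NameNode web UI
http://spark1:8088    # YARN ResourceManager web UI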

Install Spark

tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
mv spark-3.1.1-bin-hadoop3.2 /usr/local/spark-dev/spark-3.1.1

Configure spark-env.sh

export HADOOP_CONF_DIR=/usr/local/spark-dev/hadoop-3.2.2/etc/hadoop
export YARN_CONF_DIR=/usr/local/spark-dev/hadoop-3.2.2/etc/hadoop
export JAVA_HOME=/usr/local/spark-dev/jdk11

Configure spark-defaults.conf

spark.master yarn
spark.yarn.jars hdfs://spark1:9000/sparkjars/*
spark.driver.memory 2048m
spark.yarn.am.memory 2048m
spark.executor.memory 2048m
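
spark.yarn.jars points at a location in HDFS that has to be populated before any job is submitted. A sketch of uploading the jars and smoke-testing the cluster with the bundled SparkPi example (jar name as shipped with Spark 3.1.1 for Scala 2.12):

# Upload the Spark jars to the HDFS path referenced by spark.yarn.jars
/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs dfs -mkdir -p /sparkjars
/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs dfs -put /usr/local/spark-dev/spark-3.1.1/jars/* /sparkjars/

# Submit a test application to YARN
/usr/local/spark-dev/spark-3.1.1/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /usr/local/spark-dev/spark-3.1.1/examples/jars/spark-examples_2.12-3.1.1.jar 100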

    Original author: 大猛犸
    Original post: https://blog.csdn.net/zhaoyiwa/article/details/115872716