Limited by the number of machines available, this guide builds a 3-node cluster consisting of 1 master node (HDFS NameNode and YARN ResourceManager) and 3 worker nodes (HDFS DataNode and YARN NodeManager), with the master also acting as a worker.
Kerberos authentication is enabled for this deployment.
Versions
software | version |
---|---|
jdk | jdk11 |
hadoop | hadoop-3.2.2.tar.gz |
spark | spark-3.1.1-bin-hadoop3.2.tgz |
ubuntu | Ubuntu 18.04 |
Environment preparation
hostname | ip |
---|---|
spark1 | 192.168.0.2 |
spark2 | 192.168.0.3 |
spark3 | 192.168.0.4 |
Create the user
Create a spark user on each of the three servers:
adduser spark  # enter and confirm a password when prompted
Set up passwordless SSH access
Install the SSH server if it is not already installed:
apt install openssh-server
Generate a key pair:
ssh-keygen -t rsa
Distribute the public key to each node:
ssh-copy-id spark@192.168.0.2
ssh-copy-id spark@192.168.0.3
ssh-copy-id spark@192.168.0.4
Output like the following indicates success:
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/spark/.ssh/id_rsa.pub"
The authenticity of host '192.168.0.2 (192.168.0.2)' can't be established.
ECDSA key fingerprint is SHA256:Zhzr91b4jr0h4fYJtuU/S0PLiccjrrg/GH5LlGURPUM.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
spark@192.168.0.2's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'spark@192.168.0.2'"
and check to make sure that only the key(s) you wanted were added.
Edit the hosts file
Add the following entries to /etc/hosts on all three nodes; the hostnames must match those used in the Hadoop and Spark configuration below:
192.168.0.2 spark1
192.168.0.3 spark2
192.168.0.4 spark3
192.168.0.2 EDGE.HADOOP.COM
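With the public keys distributed and the hosts entries in place, passwordless login by hostname should now work from the master node; a quick check (using the hostnames defined above):
for h in spark1 spark2 spark3; do ssh spark@$h hostname; done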
Install Java
# download the JDK tarball
wget https://download.oracle.com/otn/java/jdk/11.0.11%2B9/ab2da78f32ed489abb3ff52fd0a02b1c/jdk-11.0.11_linux-x64_bin.tar.gz
# extract it
tar -zxvf jdk-11.0.11_linux-x64_bin.tar.gz
# switch to root, create /usr/local/spark-dev, and change its owner
mkdir /usr/local/spark-dev
chown -R spark:spark /usr/local/spark-dev
# move the JDK into place
mv jdk-11.0.11 /usr/local/spark-dev/jdk11
Configure Java environment variables
vim ~/.bashrc
In vim, press G and then A, and append the following:
# Java environment variables
export JAVA_HOME=/usr/local/spark-dev/jdk11
export PATH=$JAVA_HOME/bin:$PATH
# Note: JDK 11 no longer ships lib/dt.jar or lib/tools.jar, so the traditional CLASSPATH entry is not needed
Apply the configuration:
source ~/.bashrc
After the configuration takes effect, verify it as shown below.
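A quick check that the variables are picked up (paths as configured above):
java -version       # should report a JDK 11 build
echo $JAVA_HOME     # should print /usr/local/spark-dev/jdk11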
Install Hadoop
Upload the Hadoop tarball to the home directory, or download it with wget.
Extract Hadoop:
tar -zxvf hadoop-3.2.2.tar.gz
mv hadoop-3.2.2 /usr/local/spark-dev/hadoop-3.2.2
Configure the relevant files (all located under /usr/local/spark-dev/hadoop-3.2.2/etc/hadoop)
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://spark1:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/spark-dev/hadoop-3.2.2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/spark-dev/hadoop-3.2.2/hdfs/data</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>spark1</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>spark1:54311</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
hadoop-env.sh
export JAVA_HOME=/usr/local/spark-dev/jdk11
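The HDFS and YARN start scripts also need the list of worker hosts. In Hadoop 3 this is the etc/hadoop/workers file; since all three nodes run DataNode and NodeManager in this setup, it should contain one hostname per line:
spark1
spark2
spark3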
Copy the configured files to the other two nodes, for example as shown below.
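One way to do this with rsync (assuming /usr/local/spark-dev already exists and is owned by spark on spark2 and spark3, as was done on spark1):
# sync the JDK, Hadoop and shell configuration to the other nodes
for h in spark2 spark3; do
  rsync -a /usr/local/spark-dev/ spark@$h:/usr/local/spark-dev/
  rsync -a ~/.bashrc spark@$h:~/
done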
Start HDFS and YARN (on spark1)
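Before the very first start, format the NameNode once on spark1 (this initializes the dfs.namenode.name.dir directory configured above):
/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs namenode -format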
/usr/local/spark-dev/hadoop-3.2.2/sbin/start-dfs.sh
/usr/local/spark-dev/hadoop-3.2.2/sbin/start-yarn.sh
Running jps on the master node shows:
3658 Jps
3420 SecondaryNameNode
2895 NameNode
3344 ResourceManager
Running jps on the worker nodes shows:
spark@spark2:/usr/local/spark-dev$ jps
13049 DataNode
1233 NodeManager
20267 Jps
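The daemons can also be checked from a browser; with Hadoop 3 defaults the NameNode web UI listens on port 9870 and the ResourceManager UI on port 8088:
http://spark1:9870   # HDFS NameNode UI
http://spark1:8088   # YARN ResourceManager UI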
Install Spark
tar -zxvf spark-3.1.1-bin-hadoop3.2.tgz
mv spark-3.1.1-bin-hadoop3.2 /usr/local/spark-dev/spark-3.1.1
Configure conf/spark-env.sh (copy it from conf/spark-env.sh.template first)
export HADOOP_CONF_DIR=/usr/local/spark-dev/hadoop-3.2.2/etc/hadoop
export YARN_CONF_DIR=/usr/local/spark-dev/hadoop-3.2.2/etc/hadoop
export JAVA_HOME=/usr/local/spark-dev/jdk11
Configure conf/spark-defaults.conf (copy it from conf/spark-defaults.conf.template first)
spark.master yarn
spark.yarn.jars hdfs://spark1:9000/sparkjars/*
spark.driver.memory 2048m
spark.yarn.am.memory 2048m
spark.executor.memory 2048m
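Because spark.yarn.jars points at HDFS, the Spark runtime jars have to be uploaded there before the first job. A smoke test can then be run with the bundled SparkPi example (the examples jar name below assumes the stock spark-3.1.1-bin-hadoop3.2 build):
# upload the jars referenced by spark.yarn.jars
/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs dfs -mkdir -p /sparkjars
/usr/local/spark-dev/hadoop-3.2.2/bin/hdfs dfs -put /usr/local/spark-dev/spark-3.1.1/jars/* /sparkjars/
# submit a test application to YARN
/usr/local/spark-dev/spark-3.1.1/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  /usr/local/spark-dev/spark-3.1.1/examples/jars/spark-examples_2.12-3.1.1.jar 100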