Setup Multi-Node Hadoop Cluster Using Ambari

Planning to install a multi-node Hadoop cluster, but unsure which Hadoop platform to choose, how to install it, and which components you need?

Installing a multi-node Hadoop cluster for production can be overwhelming because of the number of services that the different Hadoop platforms use.

Three main flavors of Hadoop distribution are available on the market:

  1. Apache Hadoop.
  2. Hortonworks Data Platform (HDP).
  3. Cloudera Hadoop.

This article shows in detail how to build a production-grade multi-node Hadoop cluster from scratch on CentOS 7.

Before proceeding, make sure the following prerequisites are met on your hosts.

Memory Requirements

The Ambari host should have at least 1 GB RAM, with 500 MB free. To check the available memory on any host, run:

free -m
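
The same check can be scripted so that it runs identically on every host. The sketch below is only an illustration: the function names `mem_free_mb` and `check_free_memory` are made up for this article, and it assumes the column layout that `free -m` prints on CentOS 7, where the fourth field of the `Mem:` row is the free memory in MB.

```shell
# Print the "free" column (in MB) from `free -m` output supplied on stdin.
# Assumes the CentOS 7 layout: Mem: total used free shared buff/cache available
mem_free_mb() {
  awk '/^Mem:/ { print $4 }'
}

# Succeed if the host has at least the given amount of free memory (default 500 MB).
check_free_memory() {
  min_mb="${1:-500}"
  free_mb="$(free -m | mem_free_mb)"
  if [ "$free_mb" -ge "$min_mb" ]; then
    echo "OK: ${free_mb} MB free"
  else
    echo "WARN: only ${free_mb} MB free, ${min_mb} MB recommended"
    return 1
  fi
}
```

Calling `check_free_memory` with no argument uses the 500 MB threshold mentioned above.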

Maximum Open Files Requirements

The recommended maximum number of open file descriptors is 10000, or more. To check the current value set for the maximum number of open file descriptors, execute the following shell commands on each host:

ulimit -Sn
ulimit -Hn

If either value is below 10000, run the following command to set it to a suitable default:

ulimit -n 10000
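
As a rough sketch, the soft and hard limits can be compared against the recommended value in one go. The helper names `limit_ok` and `check_open_files` are hypothetical; the only commands relied on are the `ulimit` options already shown above.

```shell
# Succeed if a limit value is "unlimited" or at least the required minimum.
limit_ok() {
  value="$1"
  min="$2"
  [ "$value" = "unlimited" ] || [ "$value" -ge "$min" ]
}

# Report the soft (S) and hard (H) open-file limits of the current shell.
check_open_files() {
  min="${1:-10000}"
  status=0
  for kind in S H; do
    value="$(ulimit -"$kind" -n)"
    if limit_ok "$value" "$min"; then
      echo "OK: ulimit -${kind}n is ${value}"
    else
      echo "LOW: ulimit -${kind}n is ${value}, expected at least ${min}"
      status=1
    fi
  done
  return "$status"
}
```

Keep in mind that `ulimit -n` only changes the limit for the current shell session and the processes started from it, so the check is worth repeating after a fresh login.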

Check hostname and FQDN

hostname -f

Make sure the command returns a complete FQDN, and that this FQDN resolves correctly with both forward and reverse DNS lookups.
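
One way to verify forward and reverse resolution from the host itself is sketched below. It is an illustration under assumptions, not part of the Ambari tooling: the function names are invented here, and it uses `getent hosts`, which goes through the system resolver and therefore also sees entries in /etc/hosts, not only DNS.

```shell
# Succeed if the name looks fully qualified: it contains a dot
# and has no leading or trailing dot.
is_fqdn() {
  case "$1" in
    ""|.*|*.) return 1 ;;
    *.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check that the local FQDN resolves to an IP and that the IP resolves back to it.
check_fqdn_resolution() {
  fqdn="$(hostname -f)"
  if ! is_fqdn "$fqdn"; then
    echo "FAIL: '${fqdn}' is not a fully qualified name"
    return 1
  fi
  ip="$(getent hosts "$fqdn" | awk '{ print $1; exit }')"
  if [ -z "$ip" ]; then
    echo "FAIL: no forward lookup result for ${fqdn}"
    return 1
  fi
  reverse="$(getent hosts "$ip" | awk '{ print $2; exit }')"
  if [ "$reverse" = "$fqdn" ]; then
    echo "OK: ${fqdn} <-> ${ip}"
  else
    echo "FAIL: ${ip} resolves back to '${reverse}', expected ${fqdn}"
    return 1
  fi
}
```

Run `check_fqdn_resolution` on each host; a FAIL line points at the lookup direction that needs fixing.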

Setup Password-Less SSH

Passwordless SSH needs to be set up from the host where you are going to install the Ambari server to the target hosts, which will act as data nodes or secondary name nodes, or host other HDP services.

Note: Complete this process as the user you are going to use for the Hadoop installation and the Ambari server setup. If it is a non-root user, the process is somewhat longer: you have to adjust some commands and add configuration for the Ambari agents to the sudoers file.

1. Generate public and private SSH keys on the Ambari Server host.

ssh-keygen

2. Copy the SSH Public Key (id_rsa.pub) to the root account on your target hosts. By default, ssh-keygen writes the key pair to:

.ssh/id_rsa
.ssh/id_rsa.pub

3. Add the SSH Public Key to the authorized_keys file on your target hosts.

Note: Complete this step on the host that runs the Ambari server as well, not only on the other target hosts.

 cat id_rsa.pub >> authorized_keys

4. Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and the authorized_keys file in that directory (to 600) on the target hosts.

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

5. From the Ambari Server, make sure you can connect to each host in the cluster using SSH, without having to enter a password.

ssh root@<hostip>
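
With more than a handful of hosts, the final check is easier to run as a loop. The following is a minimal sketch: `check_passwordless_ssh` and the `SSH_BIN` variable are names invented for this example, and the host names in the usage line are placeholders.

```shell
# Hypothetical override so a different ssh client can be substituted; defaults to ssh.
SSH_BIN="${SSH_BIN:-ssh}"

# Try a key-based login as root on every host given as an argument.
check_passwordless_ssh() {
  failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if "$SSH_BIN" -n -o BatchMode=yes -o ConnectTimeout=5 "root@${host}" true; then
      echo "OK   ${host}"
    else
      echo "FAIL ${host}"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, `check_passwordless_ssh node1.example.com node2.example.com` prints one line per host and returns a non-zero status if any login failed. Where the `ssh-copy-id` utility is installed, `ssh-copy-id root@<hostip>` is a common shortcut for steps 2 to 4.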

Enable NTP on the Cluster and on the Browser Host

The clocks of all the nodes in your cluster and the machine that runs the browser through which you access the Ambari Web interface must be able to synchronize with each other.

To install the NTP service and ensure it’s started on boot, run the following commands on each host:

yum install -y ntp 
systemctl enable ntpd
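
Enabling the service only affects the next boot, so it usually makes sense to start it right away and confirm that the clock is in sync. The sketch below assumes that `timedatectl` reports the state on a line reading "NTP synchronized: yes" (or "System clock synchronized: yes" on newer systemd versions); the exact label may differ on your system, and the function names are hypothetical.

```shell
# Read `timedatectl` output on stdin and succeed if it reports a synchronized clock.
clock_synced() {
  grep -Eq '^[[:space:]]*(NTP synchronized|System clock synchronized): yes'
}

# Start ntpd immediately and report whether the clock is synchronized.
start_and_check_ntp() {
  systemctl start ntpd || return 1
  if timedatectl | clock_synced; then
    echo "OK: clock is synchronized"
  else
    echo "WAIT: clock is not synchronized yet"
    return 1
  fi
}
```

Synchronization can take a few minutes after ntpd starts, so a WAIT result right after the first start is not necessarily a problem.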

Check DNS and NSCD (Name Service Cache Daemon)

Add the IP address and FQDN of each host to the /etc/hosts file.

vi /etc/hosts

Add the following line:

1.2.3.4 <fully.qualified.domain.name>

Set Hostname

hostname <fully.qualified.domain.name>
hostname -f

The hostname -f command should return the FQDN you just set.

Edit the Network Configuration File

vi /etc/sysconfig/network

Modify the HOSTNAME property to set the fully qualified domain name.

NETWORKING=yes 
HOSTNAME=<fully.qualified.domain.name>
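
On CentOS 7 the hostname is managed by systemd, and `hostnamectl set-hostname` stores it in /etc/hostname so that it survives a reboot. The wrapper below is a hedged sketch rather than a required step: `set_fqdn` is a made-up name, and the validation is deliberately simple.

```shell
# Persist the hostname with hostnamectl after a basic sanity check.
# Must be run as root for the hostnamectl call to succeed.
set_fqdn() {
  fqdn="$1"
  case "$fqdn" in
    ""|.*|*.|*[!A-Za-z0-9.-]*)
      echo "refusing to set invalid FQDN: '${fqdn}'" >&2
      return 1 ;;
    *.*) ;;
    *)
      echo "refusing to set short hostname: '${fqdn}'" >&2
      return 1 ;;
  esac
  hostnamectl set-hostname "$fqdn" && hostname -f
}
```

Usage would look like `set_fqdn <fully.qualified.domain.name>`; on success it prints the name that `hostname -f` now returns.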

Configuring iptables

For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The simplest way to ensure this is to disable the firewall on each host during setup:

systemctl disable firewalld
systemctl stop firewalld

Disable SELinux and PackageKit and check the umask Value

You must disable SELinux for the Ambari setup to function. On each host in your cluster, run the following command, which switches SELinux to permissive mode until the next reboot:

setenforce 0
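
To keep the setting across reboots, the mode is normally changed in /etc/selinux/config as well. A minimal sketch, assuming GNU sed and the standard `SELINUX=` line in that file; the function name `set_selinux_mode` is invented for this example.

```shell
# Set the SELINUX= mode in an SELinux config file (default: /etc/selinux/config).
set_selinux_mode() {
  mode="$1"
  config="${2:-/etc/selinux/config}"
  case "$mode" in
    enforcing|permissive|disabled) ;;
    *) echo "unknown SELinux mode: ${mode}" >&2; return 1 ;;
  esac
  # Only the SELINUX= line is rewritten; SELINUXTYPE= is left untouched.
  sed -i "s/^SELINUX=.*/SELINUX=${mode}/" "$config"
}
```

Running `set_selinux_mode disabled` as root edits the default file; the new mode is picked up at the next boot.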

UMASK (User Mask or User file creation MASK) sets the default permissions or base permissions granted when a new file or folder is created on a Linux machine. Most Linux distros set 022 as the default umask value. A umask value of 022 results in permissions of 755 for new folders and 644 for new files.

A umask value of 027 results in permissions of 750 for new folders and 640 for new files. Ambari, HDP, and HDF support umask values of 022 (0022 is functionally equivalent) and 027 (0027 is functionally equivalent). These values must be set on all hosts.

Setting the umask for your current login session:

umask 0022 

Checking your current umask (the command prints the active value, for example 0022):

umask

Permanently changing the umask for all interactive users:

echo umask 0022 >> /etc/profile
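
To see what a given umask does in practice, the short experiment below creates a file and a folder in a temporary directory and prints their permissions. It is only a demonstration: `show_umask_effect` is a hypothetical helper, and it relies on GNU `stat -c %a` as shipped with CentOS.

```shell
# Create a file and a folder under the given umask and print their permissions.
show_umask_effect() (
  # The parentheses run the body in a subshell, so the caller's umask is unchanged.
  umask "$1"
  dir="$(mktemp -d)"
  touch "$dir/file"
  mkdir "$dir/folder"
  echo "file=$(stat -c %a "$dir/file") dir=$(stat -c %a "$dir/folder")"
  rm -rf "$dir"
)
```

Calling `show_umask_effect 022` and `show_umask_effect 027` side by side makes the difference between the two supported values visible.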

Installing Ambari

If you do not have internet access in your environment, you will have to set up a local repository first.

If you have internet access, follow the procedure below.

Downloading the Ambari Repository (RHEL/CentOS/Oracle Linux 7)

Steps

1. Log in to your host as root.

2. Download the Ambari repository file to a directory on your installation host.

wget -nv http://public-repo-1.hortonworks.com/ambari/centos7/2.x/updates/2.6.0.0/ambari.repo -O /etc/yum.repos.d/ambari.repo

Install the Ambari Server

Install the Ambari bits. This also installs the default PostgreSQL Ambari database.

Note: Run this command as root or with sudo.

yum install ambari-server

Set Up the Ambari Server

Before starting the Ambari Server, you must set up the Ambari Server. Setup configures Ambari to talk to the Ambari database, installs the JDK and allows you to customize the user account the Ambari Server daemon will run as.

Note: If you want the Ambari server to run as a non-root user, you have to add entries to /etc/sudoers that allow that user to run the Ambari server.

ambari-server setup

Note: By default, Ambari Server runs under root. Accept the default (n) at the Customize user account for ambari-server daemon prompt, to proceed as root.

For a non-root user, add the following configuration to the /etc/sudoers file on each target host.

# hadoopadmin Customizable Users
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /bin/su hdfs *,/bin/su ambari-qa *,/bin/su ranger *,/bin/su zookeeper *,/bin/su knox *,/bin/su falcon *,/bin/su ams *, /bin/su flume *,/bin/su hbase *,/bin/su spark *,/bin/su accumulo *,/bin/su hive *,/bin/su hcat *,/bin/su kafka *,/bin/su mapred *,/bin/su oozie *,/bin/su sqoop *,/bin/su storm *,/bin/su tez *,/bin/su atlas *,/bin/su yarn *,/bin/su kms *,/bin/su activity_analyzer *,/bin/su livy *,/bin/su zeppelin *,/bin/su infra-solr *,/bin/su logsearch *,/bin/su druid *,/bin/su superset *

# hadoopadmin: Core System Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/bin/yum,/usr/bin/zypper,/usr/bin/apt-get, /bin/mkdir, /usr/bin/test, /bin/ln, /bin/ls, /bin/chown, /bin/chmod, /bin/chgrp, /bin/cp, /usr/sbin/setenforce, /usr/bin/test, /usr/bin/stat, /bin/mv, /bin/sed, /bin/rm, /bin/kill, /bin/readlink, /usr/bin/pgrep, /bin/cat, /usr/bin/unzip, /bin/tar, /usr/bin/tee, /bin/touch, /usr/bin/mysql, /sbin/service mysqld *, /usr/bin/dpkg *, /bin/rpm *, /usr/sbin/hst *, /sbin/service rpcbind *, /sbin/service portmap *

# hadoopadmin: Hadoop and Configuration Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/bin/hdp-select, /usr/bin/conf-select, /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh, /usr/lib/hadoop/bin/hadoop-daemon.sh, /usr/lib/hadoop/sbin/hadoop-daemon.sh, /usr/bin/ambari-python-wrap *

# hadoopadmin: System User and Group Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/sbin/groupadd, /usr/sbin/groupmod, /usr/sbin/useradd, /usr/sbin/usermod

# hadoopadmin: Knox Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/bin/python2.6 /var/lib/ambari-agent/data/tmp/validateKnoxStatus.py *, /usr/hdp/current/knox-server/bin/knoxcli.sh

# hadoopadmin: Ranger Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/hdp/*/ranger-usersync/setup.sh, /usr/bin/ranger-usersync-stop, /usr/bin/ranger-usersync-start, /usr/hdp/*/ranger-admin/setup.sh *, /usr/hdp/*/ranger-knox-plugin/disable-knox-plugin.sh *, /usr/hdp/*/ranger-storm-plugin/disable-storm-plugin.sh *, /usr/hdp/*/ranger-hbase-plugin/disable-hbase-plugin.sh *, /usr/hdp/*/ranger-hdfs-plugin/disable-hdfs-plugin.sh *, /usr/hdp/current/ranger-admin/ranger_credential_helper.py, /usr/hdp/current/ranger-kms/ranger_credential_helper.py, /usr/hdp/*/ranger-*/ranger_credential_helper.py

# hadoopadmin Infra and LogSearch Commands
hadoopadmin ALL=(ALL) NOPASSWD:SETENV: /usr/lib/ambari-infra-solr/bin/solr *, /usr/lib/ambari-logsearch-logfeeder/run.sh *, /usr/sbin/ambari-metrics-grafana *, /usr/lib/ambari-infra-solr-client/solrCloudCli.sh *

#sudo defaults ambari agents
Defaults exempt_group = hadoopadmin
Defaults !env_reset,env_delete-=PATH
Defaults: hadoopadmin !requiretty

Start the Ambari Server

Note: Start the Ambari server as the user you configured in the step above.

• Run the following command on the Ambari Server host:

ambari-server start

• To check the Ambari Server processes:

ambari-server status

• To stop the Ambari Server:

ambari-server stop

Once setup completes and the Ambari server has started successfully, the next step is to log in to the Ambari console. The default username and password for the Ambari console are admin/admin. You can access the Ambari GUI at the following address:

http://<IP>:8080
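
Before opening a browser, it can help to confirm from the command line that the web interface answers. The sketch below assumes `curl` is installed and that the login page is served over plain HTTP on the port shown above; `ambari_url` and `ambari_is_up` are names made up for this example.

```shell
# Build the Ambari web URL from a host and an optional port (default 8080).
ambari_url() {
  host="$1"
  port="${2:-8080}"
  echo "http://${host}:${port}"
}

# Succeed if the Ambari web interface answers within five seconds.
ambari_is_up() {
  # -s silences progress output, -f fails on HTTP errors, -o discards the body.
  curl -sf -o /dev/null --max-time 5 "$(ambari_url "$@")"
}
```

For example, `ambari_is_up <IP> && echo "Ambari is reachable"` gives a quick yes or no without a browser.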

When you log in to the Ambari console for the first time, you will see an option called Launch Install Wizard. Click on it.

Name your cluster and select version

Choosing repository

Choose the single repository that matches your operating system and remove the other links. In our case, keep redhat7 and remove all the others.

Then check the “Skip Repository Base URL Validation” option.

Install Options

In the Install Options step, enter your hostnames, one per line, to set up a multi-node cluster.

In the private key field, provide the SSH private key you created on the Ambari host for the user that will install and connect to the target hosts.

Enter the username and port and click Next.

Ambari will try to install the agent on the remote target hosts, which succeeds if everything is configured correctly. Registration may fail with the following error:

ERROR 2017-07-21 14:33:56,892 NetUtil.py:84 - EOF occurred in violation of protocol (_ssl.c:765)
ERROR 2017-07-21 14:33:56,892 NetUtil.py:85 - SSLError: Failed to connect. Please check openssl library versions.

In that case, edit the file /etc/ambari-agent/conf/ambari-agent.ini on the affected host:

sudo vi /etc/ambari-agent/conf/ambari-agent.ini

Add the following setting under the [security] header:

force_https_protocol=PROTOCOL_TLSv1_2
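
If several agents need the same change, the edit can be scripted instead of done by hand in vi. This is a minimal sketch under assumptions: it relies on GNU sed, expects the [security] header to exist already, and `set_agent_tls` is a hypothetical helper name.

```shell
# Add force_https_protocol under [security] in an ambari-agent.ini file.
set_agent_tls() {
  ini="${1:-/etc/ambari-agent/conf/ambari-agent.ini}"
  setting="force_https_protocol=PROTOCOL_TLSv1_2"
  # Do nothing if the setting is already present, so repeated runs are safe.
  if grep -q '^force_https_protocol=' "$ini"; then
    return 0
  fi
  if ! grep -q '^\[security\]' "$ini"; then
    echo "no [security] section found in ${ini}" >&2
    return 1
  fi
  # GNU sed: append the setting on the line right after the [security] header.
  sed -i "/^\[security\]/a ${setting}" "$ini"
}
```

After the change, restart the agent on that host (for example with `ambari-agent restart`) and retry the registration from the wizard.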

Successfully registered and installed agents

After this step, select the Hadoop services you wish to install in your cluster, assign the nodes for the master services, and click Next.

Assign the nodes for slaves and clients and click Next.

In the Customize Services step you can define many properties, such as the username and password for the Hive metastore DB and the Oozie DB. You can also set the NameNode directories and the DataNode directory path.

Please note: when entering the directory path for DataNode machines, always use mount points that are not LVM-based. Click Next.

Under Review Section check the summary presented and click Deploy.

Components installation in progress for different target hosts

Log in to the Ambari server and check whether all Hadoop services are running successfully. If there are alerts, fix the individual components and restart the affected services.

Live HDP services monitored by Ambari

So, although the installation looks complex at first, Ambari makes it simpler.

Installing each component by hand on every target host would consume a lot of time and be very complex.

With the approach used here, where we install Ambari first and then install the HDP cluster through Ambari, Ambari takes care of all that hassle.
