In the past, I have written an article about setting up a multi-node Hadoop cluster from scratch.
After installing Ambari and completing a multi-node Hadoop cluster setup, you sometimes need to uninstall the Hadoop components and reinstall the cluster from scratch. Admittedly, this is rare, especially in production. Still, while you are learning, it is important to know how to remove services and components completely from a Hadoop cluster.
It looks easy if you already know the steps, but if you are a new learner you may find it unclear exactly what needs to be done to scratch the cluster completely.
Ambari is a cluster management tool that makes an administrator’s life much easier by providing GUI-based tools and automated wizards for common administrative tasks.
Still, completely scratching a Hadoop cluster takes more than stopping the services and deleting the components from Ambari.
So let’s see how we can delete the Hadoop cluster completely from the system.
To save some time, we will still use Ambari to stop services and delete components.
Delete a Service
1. Stop the Service
First of all, stop the services running on all hosts. You can do this easily from the Ambari console:
- Click on Services -> Service Actions -> Stop
This stops the selected service on all its hosts.
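If you prefer the command line, the same stop can be issued through Ambari’s REST API: setting a service’s desired state to INSTALLED means “installed but not running”, i.e. stopped. The host name, cluster name, and admin/admin credentials below are placeholders for your own values, and the `echo` keeps this a dry run.

```shell
# Stop every service via Ambari's REST API instead of clicking through the UI.
# Desired state INSTALLED = "installed but not running", i.e. stopped.
# AMBARI_HOST and CLUSTER_NAME are placeholders -- substitute your own values.
AMBARI_HOST="ambari.example.com"
CLUSTER_NAME="mycluster"
URL="http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER_NAME}/services"
BODY='{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}'
# Dry run: print the request. Remove the echo to actually send it.
echo curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d "$BODY" "$URL"
```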
2. Delete the Service
After the service has stopped successfully, follow the same menu as in step 1 and delete the stopped service.
The service should now be deleted. If for any reason the service will not delete, it is usually due to one of the following:
- The service did not stop successfully. In this case, log in to the hosts and stop the service manually on each of them using the CLI.
- The service is stopped but still cannot be deleted. This usually happens because of a dependent service: Ambari will show a message for such cases telling you to delete the dependent service first. For example, MapReduce depends on YARN, so you have to delete MapReduce before YARN.
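Deletion can also be done over the REST API, dependents first. As before, the host and cluster names are placeholders, and the `echo` keeps this a dry run; remove it to actually send the DELETE requests.

```shell
# Delete stopped services over Ambari's REST API, dependents first
# (MAPREDUCE2 before YARN). AMBARI_HOST and CLUSTER_NAME are placeholders.
AMBARI_HOST="ambari.example.com"
CLUSTER_NAME="mycluster"
for SERVICE in MAPREDUCE2 YARN; do
  # Dry run: remove the echo to actually issue the request.
  echo curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
    "http://${AMBARI_HOST}:8080/api/v1/clusters/${CLUSTER_NAME}/services/${SERVICE}"
done
```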
Cleaning the Host
The service is now deleted from the hosts, but certain directories and users that Ambari created during the installation wizard still persist on the system. You have to clean every host in the cluster before reinstalling the components.
Ambari provides an automated way of cleaning the hosts: the HostCleanup script deletes all home directories, config directories, log directories, and users in one go. If you wish to retain the users that Ambari created during the last installation, pass the `--skip=users` option.
Run the script below on each host of the Hadoop cluster:
python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users
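If you manage more than a couple of hosts, you can drive the script from one machine over SSH. The host names below are placeholders, passwordless root SSH is assumed, and the `echo` keeps it a dry run.

```shell
# Run HostCleanup.py on every host from a single machine.
# HOSTS is a placeholder list; passwordless root SSH is assumed.
HOSTS="node1.example.com node2.example.com node3.example.com"
for h in $HOSTS; do
  # Dry run: remove the echo to actually execute on each host.
  echo ssh "root@${h}" \
    "python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users"
done
```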
As mentioned, the script above retains the users but deletes packages and all directories. However, some directories may survive the script; those you can delete manually.
If some packages are also left behind, remove them manually using the package manager:
yum remove hive\*
yum remove oozie\*
yum remove pig\*
yum remove zookeeper\*
yum remove tez\*
yum remove hbase\*
yum remove ranger\*
yum remove knox\*
yum remove storm\*
yum remove accumulo\*
yum remove falcon\*
yum remove ambari-metrics-hadoop-sink
yum remove smartsense-hst
yum remove slider_2_4_2_0_258
yum remove ambari-metrics-monitor
yum remove spark2_2_5_3_0_37-yarn-shuffle
yum remove spark_2_5_3_0_37-yarn-shuffle
yum remove ambari-infra-solr-client
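Rather than typing each command, you can generate the removals for the wildcard packages in a loop. The package list mirrors the commands above, and the `echo` keeps it a dry run.

```shell
# Generate "yum remove" commands for the common HDP component packages.
# The \* glob matches versioned package names. Remove the echo to run them.
for pkg in hive oozie pig zookeeper tez hbase ranger knox storm accumulo falcon; do
  echo yum remove -y "${pkg}\*"
done
```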
Here we are not deleting Ambari itself, just resetting it. If you wish to delete Ambari completely, you can do that as well; it depends entirely on your requirement.
To erase Ambari:
ambari-server stop
ambari-agent stop
yum erase ambari-server
yum erase ambari-agent
To retain Ambari and erase only its database and data:
ambari-server stop
ambari-server reset
To set up a fresh Ambari DB:
ambari-server setup
ambari-server start
Now you can log in to the Ambari UI at http://Host-IP:8080 with the default username and password. You will see a fresh Ambari console, ready to launch a new installation wizard.
Remove log directories on all hosts
rm -rf /var/log/ambari-agent
rm -rf /var/log/ambari-metrics-grafana
rm -rf /var/log/ambari-metrics-monitor
rm -rf /var/log/ambari-server/
rm -rf /var/log/falcon
rm -rf /var/log/flume
rm -rf /var/log/hadoop
rm -rf /var/log/hadoop-mapreduce
rm -rf /var/log/hadoop-yarn
rm -rf /var/log/hive
rm -rf /var/log/hive-hcatalog
rm -rf /var/log/hive2
rm -rf /var/log/hst
rm -rf /var/log/knox
rm -rf /var/log/oozie
rm -rf /var/log/solr
rm -rf /var/log/zookeeper
Remove Hadoop directories including HDFS data on all hosts
rm -rf /hadoop/*
rm -rf /hdfs/hadoop
rm -rf /hdfs/lost+found
rm -rf /hdfs/var
rm -rf /local/opt/hadoop
rm -rf /tmp/hadoop
rm -rf /usr/bin/hadoop
rm -rf /usr/hdp
rm -rf /var/hadoop
Remove config directories on all hosts
rm -rf /etc/ambari-agent
rm -rf /etc/ambari-metrics-grafana
rm -rf /etc/ambari-server
rm -rf /etc/ams-hbase
rm -rf /etc/falcon
rm -rf /etc/flume
rm -rf /etc/hadoop
rm -rf /etc/hadoop-httpfs
rm -rf /etc/hbase
rm -rf /etc/hive
rm -rf /etc/hive-hcatalog
rm -rf /etc/hive-webhcat
rm -rf /etc/hive2
rm -rf /etc/hst
rm -rf /etc/knox
rm -rf /etc/livy
rm -rf /etc/mahout
rm -rf /etc/oozie
rm -rf /etc/phoenix
rm -rf /etc/pig
rm -rf /etc/ranger-admin
rm -rf /etc/ranger-usersync
rm -rf /etc/spark2
rm -rf /etc/tez
rm -rf /etc/tez_hive2
rm -rf /etc/zookeeper
Remove library folders on all cluster hosts
rm -rf /usr/lib/ambari-agent
rm -rf /usr/lib/ambari-infra-solr-client
rm -rf /usr/lib/ambari-metrics-hadoop-sink
rm -rf /usr/lib/ambari-metrics-kafka-sink
rm -rf /usr/lib/ambari-server-backups
rm -rf /usr/lib/ams-hbase
rm -rf /usr/lib/mysql
rm -rf /var/lib/ambari-agent
rm -rf /var/lib/ambari-metrics-grafana
rm -rf /var/lib/ambari-server
rm -rf /var/lib/flume
rm -rf /var/lib/hadoop-hdfs
rm -rf /var/lib/hadoop-mapreduce
rm -rf /var/lib/hadoop-yarn
rm -rf /var/lib/hive2
rm -rf /var/lib/knox
rm -rf /var/lib/smartsense
rm -rf /var/lib/storm
Remove PID directories on all hosts
rm -rf /var/run/ambari-agent
rm -rf /var/run/ambari-metrics-grafana
rm -rf /var/run/ambari-server
rm -rf /var/run/falcon
rm -rf /var/run/flume
rm -rf /var/run/hadoop
rm -rf /var/run/hadoop-mapreduce
rm -rf /var/run/hadoop-yarn
rm -rf /var/run/hbase
rm -rf /var/run/hive
rm -rf /var/run/hive-hcatalog
rm -rf /var/run/hive2
rm -rf /var/run/hst
rm -rf /var/run/knox
rm -rf /var/run/oozie
rm -rf /var/run/webhcat
rm -rf /var/run/zookeeper
Clean /var/tmp/* on all hosts
rm -rf /var/tmp/*
Delete the HST entries from the crontab on all hosts of the cluster. The entries look like this:
0 * * * * /usr/hdp/share/hst/bin/hst-scheduled-capture.sh sync
0 2 * * 0 /usr/hdp/share/hst/bin/hst-scheduled-capture.sh
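One way to strip these entries is to filter the crontab and load the result back. The sample below demonstrates the filter on a dump that also contains a hypothetical unrelated job (backup.sh), so you can verify that only the HST lines are removed.

```shell
# Filter the HST capture jobs out of a crontab dump.
# The backup.sh entry is a hypothetical unrelated job that must survive.
printf '%s\n' \
  '0 * * * * /usr/hdp/share/hst/bin/hst-scheduled-capture.sh sync' \
  '0 2 * * 0 /usr/hdp/share/hst/bin/hst-scheduled-capture.sh' \
  '0 3 * * * /usr/local/bin/backup.sh' \
  | grep -v 'hst-scheduled-capture' > /tmp/cron.cleaned
cat /tmp/cron.cleaned
# On a real host, apply it in place with:
#   crontab -l | grep -v 'hst-scheduled-capture' | crontab -
```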
Remove the databases (MySQL/PostgreSQL) and their data directories on all hosts
yum remove mysql mysql-server
yum erase postgresql
rm -rf /var/lib/pgsql
rm -rf /var/lib/mysql
Remove symlinks on all cluster hosts.
cd /usr/bin
rm -rf accumulo
rm -rf atlas-start
rm -rf atlas-stop
rm -rf beeline
rm -rf falcon
rm -rf flume-ng
rm -rf hbase
rm -rf hcat
rm -rf hdfs
rm -rf hive
rm -rf hiveserver2
rm -rf kafka
rm -rf mahout
rm -rf mapred
rm -rf oozie
rm -rf oozied.sh
rm -rf phoenix-psql
rm -rf phoenix-queryserver
rm -rf phoenix-sqlline
rm -rf phoenix-sqlline-thin
rm -rf pig
rm -rf python-wrap
rm -rf ranger-admin
rm -rf ranger-admin-start
rm -rf ranger-admin-stop
rm -rf ranger-kms
rm -rf ranger-usersync
rm -rf ranger-usersync-start
rm -rf ranger-usersync-stop
rm -rf slider
rm -rf sqoop
rm -rf sqoop-codegen
rm -rf sqoop-create-hive-table
rm -rf sqoop-eval
rm -rf sqoop-export
rm -rf sqoop-help
rm -rf sqoop-import
rm -rf sqoop-import-all-tables
rm -rf sqoop-job
rm -rf sqoop-list-databases
rm -rf sqoop-list-tables
rm -rf sqoop-merge
rm -rf sqoop-metastore
rm -rf sqoop-version
rm -rf storm
rm -rf storm-slider
rm -rf worker-lanucher
rm -rf yarn
rm -rf zookeeper-client
rm -rf zookeeper-server
rm -rf zookeeper-server-cleanup
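Since most of these /usr/bin entries are symlinks into the now-deleted /usr/hdp tree, an alternative is to delete every broken symlink in one pass using GNU find's `-xtype l` test, which matches symlinks whose target no longer exists. The scratch directory below is just a safe demonstration before running it against /usr/bin.

```shell
# Demonstrate removing dangling symlinks in a scratch directory first.
mkdir -p /tmp/symlink-demo
cd /tmp/symlink-demo
# Create a symlink whose target no longer exists (as after the cleanup).
ln -sf /usr/hdp/current/hadoop-client/bin/hadoop hadoop
find . -xtype l -delete   # -xtype l matches broken symlinks (GNU find)
ls
# On a real host, after removing the packages:
#   find /usr/bin -xtype l -delete
```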
If you wish to delete the users as well, either run the HostCleanup script without the `--skip=users` option or delete them manually:
userdel -r accumulo
userdel -r ambari-qa
userdel -r ams
userdel -r falcon
userdel -r flume
userdel -r hbase
userdel -r hcat
userdel -r hdfs
userdel -r hive
userdel -r kafka
userdel -r knox
userdel -r mapred
userdel -r oozie
userdel -r ranger
userdel -r spark
userdel -r sqoop
userdel -r storm
userdel -r tez
userdel -r yarn
userdel -r zeppelin
userdel -r zookeeper
Still, if more files and directories remain, you can locate them and delete them manually:
find / -name "*ambari*"
find / -name "*accumulo*"
find / -name "*atlas*"
find / -name "*beeline*"
find / -name "*falcon*"
find / -name "*flume*"
find / -name "*hadoop*"
find / -name "*hbase*"
find / -name "*hcat*"
find / -name "*hdfs*"
find / -name "*hdp*"
find / -name "*hive*"
find / -name "*hiveserver2*"
find / -name "*kafka*"
find / -name "*mahout*"
find / -name "*mapred*"
find / -name "*oozie*"
find / -name "*phoenix*"
find / -name "*pig*"
find / -name "*ranger*"
find / -name "*slider*"
find / -name "*sqoop*"
find / -name "*storm*"
find / -name "*yarn*"
find / -name "*zookeeper*"
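Note that the patterns should be quoted so the shell does not expand them before find sees them. The loop below runs the same sweep; it is demonstrated against a scratch directory for safety. On a real host set ROOT=/ and add `2>/dev/null` to hide permission errors.

```shell
# Sweep a directory tree for leftovers of each component. ROOT is set to a
# scratch directory for a safe demonstration; use ROOT=/ on a real host.
ROOT=/tmp/cleanup-demo
mkdir -p "$ROOT/etc"
touch "$ROOT/etc/hadoop-env.sh" "$ROOT/etc/unrelated.conf"
for name in ambari accumulo atlas falcon flume hadoop hbase hdfs hdp hive \
            kafka mahout oozie phoenix pig ranger sqoop storm yarn zookeeper; do
  # Quoting "*${name}*" keeps the shell from expanding the glob itself.
  find "$ROOT" -name "*${name}*"
done
```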
If you wish to delete the NameNode and DataNode directories as well, you can, but remember this will also delete all existing HDFS data on the DataNodes.
rm -rf /hadoop/hdfs/namenode
rm -rf /hadoop/hdfs/data
After performing all these steps, your Hadoop cluster is cleaned up and you are ready to launch the Ambari wizard to set up a fresh multi-node cluster.