
Completely Uninstall and Delete Hadoop From Hosts-Hortonworks

In the past, I wrote an article about setting up a multi-node Hadoop cluster from scratch.

After installing Ambari and completing a multi-node Hadoop cluster setup, you sometimes need to uninstall the Hadoop components and reinstall the cluster from scratch. These occasions are rare, especially in production, but while you are still learning it is important to know how to uninstall services and components completely from a Hadoop cluster.

This looks easy if you already know the process, but if you are a new learner you will probably find it unclear what exactly needs to be done to wipe a Hadoop cluster completely.

Ambari is a cluster management tool that makes an administrator's life much easier by providing GUI-based tools and automated wizards for common administrative tasks.

Still, there are many things you need to do to wipe a Hadoop cluster completely; it is not just a matter of stopping the services and deleting the Hadoop components from Ambari.

So let’s see how we can delete the Hadoop cluster completely from the system.

To save some time, we will still use Ambari to stop the services and delete the components.

Delete a Service

First of all, you have to stop the services running on all hosts; this can be done easily from the Ambari console.

  1. Click on Services -> go to Service Actions -> Stop

Stop Service Using Ambari

Doing this will stop the particular service on all hosts.
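
If you prefer the command line, the same stop action can be issued through Ambari's REST API. A minimal sketch, assuming a cluster named MyCluster, the default admin/admin credentials, and HDFS as the service being stopped (setting the state to INSTALLED is how Ambari represents a stopped service):

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://<ambari-host>:8080/api/v1/clusters/MyCluster/services/HDFS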

2. Delete the Service

After the service has stopped successfully, you can follow the same menu as in step 1 and delete it.

After this, the service should be deleted successfully. If the deletion fails, it is usually for one of the following reasons.

  1. The service did not stop successfully. In this case, log in to each host and stop the remaining processes manually from the CLI.
  2. The service is stopped but cannot be deleted because of a parent/dependent service. Ambari will show a message in such cases, telling you to delete the dependent service first and then the one you selected.

For example, YARN and MapReduce are dependent services; you have to delete MapReduce before YARN.
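
The same deletion can also be done from the command line once the service is stopped; a hedged example using Ambari's REST API, again assuming a cluster named MyCluster and the default credentials (replace MAPREDUCE2 with the service you want to drop):

curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  http://<ambari-host>:8080/api/v1/clusters/MyCluster/services/MAPREDUCE2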

Cleaning the Host

Now the service is deleted from the hosts, but certain directories and users that Ambari created during the installation wizard still persist on the system. You have to clean every host in the cluster before reinstalling the components.

Ambari provides an automated way of cleaning the hosts. Using its HostCleanup.py script you can delete home directories, config directories, log directories, and users in one go. If you wish to retain the users that Ambari created during the last installation, keep them with the "--skip=users" option.

Run the script below on each host of the Hadoop cluster:

python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users

The above script retains users, as mentioned earlier, but deletes packages and all directories. However, some directories may survive even the script; those you can delete manually.
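
Running the script host by host gets tedious on a larger cluster. Below is a small sketch that loops over a hypothetical hosts.txt file (one hostname per line, not part of the original setup) and runs the cleanup over SSH as root; adjust the Python path if your Ambari agent installs it elsewhere:

# hosts.txt: one cluster hostname per line (hypothetical helper file)
while read -r host; do
  ssh root@"$host" "python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users"
done < hosts.txt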

If some packages are also not removed by the script, remove them manually using the package manager:

yum remove hive\*
yum remove oozie\*
yum remove pig\*
yum remove zookeeper\*
yum remove tez\*
yum remove hbase\*
yum remove ranger\*
yum remove knox\*
yum remove storm\*
yum remove accumulo\*
yum remove falcon\*
yum remove ambari-metrics-hadoop-sink 
yum remove smartsense-hst
yum remove slider_2_4_2_0_258
yum remove ambari-metrics-monitor
yum remove spark2_2_5_3_0_37-yarn-shuffle
yum remove spark_2_5_3_0_37-yarn-shuffle
yum remove ambari-infra-solr-client

Here we are not deleting Ambari but just resetting it; if you wish to delete Ambari completely, you can do that as well, depending on your requirement.

To erase Ambari

ambari-server stop
ambari-agent stop
yum erase ambari-server
yum erase ambari-agent

Retain Ambari, Just Erase the DB and Its Data

ambari-server stop
ambari-server reset

Setup Fresh Ambari DB

ambari-server setup
ambari-server start

Now you can log in to Ambari at http://<ambari-host>:8080 with the default username and password (admin/admin), and you will see a fresh Ambari console from which to launch a new installation wizard.
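
Before opening the browser, you can confirm that the reset server is reachable by querying its REST API; a quick check, assuming the default admin/admin credentials are still in place:

curl -u admin:admin http://<ambari-host>:8080/api/v1/clusters

An empty items list in the response confirms that no cluster is registered yet.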

Remove log directories on all hosts

rm -rf /var/log/ambari-agent
rm -rf /var/log/ambari-metrics-grafana
rm -rf /var/log/ambari-metrics-monitor
rm -rf /var/log/ambari-server/
rm -rf /var/log/falcon
rm -rf /var/log/flume
rm -rf /var/log/hadoop
rm -rf /var/log/hadoop-mapreduce
rm -rf /var/log/hadoop-yarn
rm -rf /var/log/hive
rm -rf /var/log/hive-hcatalog
rm -rf /var/log/hive2
rm -rf /var/log/hst
rm -rf /var/log/knox
rm -rf /var/log/oozie
rm -rf /var/log/solr
rm -rf /var/log/zookeeper

Remove Hadoop directories including HDFS data on all hosts

rm -rf /hadoop/*
rm -rf /hdfs/hadoop
rm -rf /hdfs/lost+found
rm -rf /hdfs/var
rm -rf /local/opt/hadoop
rm -rf /tmp/hadoop
rm -rf /usr/bin/hadoop
rm -rf /usr/hdp
rm -rf /var/hadoop

Remove config directories on all hosts

rm -rf /etc/ambari-agent
rm -rf /etc/ambari-metrics-grafana
rm -rf /etc/ambari-server
rm -rf /etc/ams-hbase
rm -rf /etc/falcon
rm -rf /etc/flume
rm -rf /etc/hadoop
rm -rf /etc/hadoop-httpfs
rm -rf /etc/hbase
rm -rf /etc/hive 
rm -rf /etc/hive-hcatalog
rm -rf /etc/hive-webhcat
rm -rf /etc/hive2
rm -rf /etc/hst
rm -rf /etc/knox 
rm -rf /etc/livy
rm -rf /etc/mahout 
rm -rf /etc/oozie
rm -rf /etc/phoenix
rm -rf /etc/pig 
rm -rf /etc/ranger-admin
rm -rf /etc/ranger-usersync
rm -rf /etc/spark2
rm -rf /etc/tez
rm -rf /etc/tez_hive2
rm -rf /etc/zookeeper

Remove library folders on cluster Hosts

rm -rf /usr/lib/ambari-agent
rm -rf /usr/lib/ambari-infra-solr-client
rm -rf /usr/lib/ambari-metrics-hadoop-sink
rm -rf /usr/lib/ambari-metrics-kafka-sink
rm -rf /usr/lib/ambari-server-backups
rm -rf /usr/lib/ams-hbase
rm -rf /usr/lib/mysql
rm -rf /var/lib/ambari-agent
rm -rf /var/lib/ambari-metrics-grafana
rm -rf /var/lib/ambari-server
rm -rf /var/lib/flume
rm -rf /var/lib/hadoop-hdfs
rm -rf /var/lib/hadoop-mapreduce
rm -rf /var/lib/hadoop-yarn 
rm -rf /var/lib/hive2
rm -rf /var/lib/knox
rm -rf /var/lib/smartsense
rm -rf /var/lib/storm

Remove PID directories on all hosts

rm -rf /var/run/ambari-agent
rm -rf /var/run/ambari-metrics-grafana
rm -rf /var/run/ambari-server
rm -rf /var/run/falcon
rm -rf /var/run/flume
rm -rf /var/run/hadoop 
rm -rf /var/run/hadoop-mapreduce
rm -rf /var/run/hadoop-yarn
rm -rf /var/run/hbase
rm -rf /var/run/hive
rm -rf /var/run/hive-hcatalog
rm -rf /var/run/hive2
rm -rf /var/run/hst
rm -rf /var/run/knox
rm -rf /var/run/oozie 
rm -rf /var/run/webhcat
rm -rf /var/run/zookeeper

Clean /var/tmp/* on all hosts

rm -rf /var/tmp/*

Delete the HST entries from cron on all hosts of the cluster. The entries look like this:

0 * * * * /usr/hdp/share/hst/bin/hst-scheduled-capture.sh sync
0 2 * * 0 /usr/hdp/share/hst/bin/hst-scheduled-capture.sh
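
Assuming those entries live in root's crontab, one way to drop them is to filter them out and reload the crontab; a hedged one-liner:

crontab -l | grep -v 'hst-scheduled-capture.sh' | crontab -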

Remove DB

yum remove mysql mysql-server
yum erase postgresql
rm -rf /var/lib/pgsql
rm -rf /var/lib/mysql

Remove symlinks on all cluster hosts.

cd /usr/bin
rm -rf accumulo
rm -rf atlas-start
rm -rf atlas-stop
rm -rf beeline
rm -rf falcon
rm -rf flume-ng
rm -rf hbase
rm -rf hcat
rm -rf hdfs
rm -rf hive
rm -rf hiveserver2
rm -rf kafka
rm -rf mahout
rm -rf mapred
rm -rf oozie
rm -rf oozied.sh
rm -rf phoenix-psql
rm -rf phoenix-queryserver
rm -rf phoenix-sqlline
rm -rf phoenix-sqlline-thin
rm -rf pig
rm -rf python-wrap
rm -rf ranger-admin
rm -rf ranger-admin-start
rm -rf ranger-admin-stop
rm -rf ranger-kms
rm -rf ranger-usersync
rm -rf ranger-usersync-start
rm -rf ranger-usersync-stop
rm -rf slider
rm -rf sqoop
rm -rf sqoop-codegen
rm -rf sqoop-create-hive-table
rm -rf sqoop-eval
rm -rf sqoop-export
rm -rf sqoop-help
rm -rf sqoop-import
rm -rf sqoop-import-all-tables
rm -rf sqoop-job
rm -rf sqoop-list-databases
rm -rf sqoop-list-tables
rm -rf sqoop-merge
rm -rf sqoop-metastore
rm -rf sqoop-version
rm -rf storm
rm -rf storm-slider
rm -rf worker-lanucher
rm -rf yarn
rm -rf zookeeper-client
rm -rf zookeeper-server
rm -rf zookeeper-server-cleanup

If you wish to delete the users as well, either do not use the --skip=users option in the host cleanup script or delete them manually:

userdel -r accumulo
userdel -r ambari-qa
userdel -r ams
userdel -r falcon
userdel -r flume
userdel -r hbase
userdel -r hcat
userdel -r hdfs
userdel -r hive
userdel -r kafka
userdel -r knox
userdel -r mapred
userdel -r oozie
userdel -r ranger
userdel -r spark
userdel -r sqoop
userdel -r storm
userdel -r tez
userdel -r yarn
userdel -r zeppelin
userdel -r zookeeper
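
If you prefer not to run each userdel by hand, the same list can be handled in a loop; the id check simply skips accounts that were never created on a given host (a sketch, not part of the Ambari tooling):

for u in accumulo ambari-qa ams falcon flume hbase hcat hdfs hive kafka knox \
         mapred oozie ranger spark sqoop storm tez yarn zeppelin zookeeper; do
  id "$u" >/dev/null 2>&1 && userdel -r "$u"
done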

If files and directories still exist after all this, you can find them and delete them manually:

find / -name "*ambari*"
find / -name "*accumulo*"
find / -name "*atlas*"
find / -name "*beeline*"
find / -name "*falcon*"
find / -name "*flume*"
find / -name "*hadoop*"
find / -name "*hbase*"
find / -name "*hcat*"
find / -name "*hdfs*"
find / -name "*hdp*"
find / -name "*hive*"
find / -name "*hiveserver2*"
find / -name "*kafka*"
find / -name "*mahout*"
find / -name "*mapred*"
find / -name "*oozie*"
find / -name "*phoenix*"
find / -name "*pig*"
find / -name "*ranger*"
find / -name "*slider*"
find / -name "*sqoop*"
find / -name "*storm*"
find / -name "*yarn*"
find / -name "*zookeeper*"
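
Rather than typing each find separately, the component names can be looped over; this sketch only lists the leftovers (and silences permission errors) so you can review them before deleting anything:

for comp in ambari accumulo atlas beeline falcon flume hadoop hbase hcat hdfs \
            hdp hive kafka mahout mapred oozie phoenix pig ranger slider sqoop \
            storm yarn zookeeper; do
  find / -name "*${comp}*" 2>/dev/null
done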

If you wish to delete the NameNode and DataNode directories as well, you can do that too, but remember that this will also delete all of the existing data on your DataNodes.

rm -rf /hadoop/hdfs/namenode
rm -rf /hadoop/hdfs/data

After performing all these steps, your Hadoop cluster is cleaned up and you are good to go with the Ambari launch wizard to set up a fresh multi-node cluster.
