Hadoop FS Shell Expunge: Optimizing HDFS Storage with Ease

LabEx
2 min read · Jun 26, 2024

Introduction

Welcome to our exciting lab set in an interstellar base where you play the role of a skilled intergalactic communicator. In this scenario, you are tasked with managing the Hadoop HDFS using the FS Shell expunge command to maintain data integrity and optimize storage utilization. Your mission is to ensure the efficient cleanup of unnecessary files and directories to free up storage space and improve system performance.

Enabling and Configuring the HDFS Trash Feature

In this step, you enable the HDFS Trash feature, restart HDFS so the change takes effect, and confirm that a deleted file is moved to the trash instead of being removed immediately.

  1. Open the terminal and switch to the hadoop user:
  • su - hadoop
  2. Open /home/hadoop/hadoop/etc/hadoop/core-site.xml in a text editor to enable the Trash feature:
  • nano /home/hadoop/hadoop/etc/hadoop/core-site.xml
  3. Add the following properties between the <configuration> tags. Both values are in minutes, so 1440 keeps deleted files in the trash for 24 hours; fs.trash.checkpoint.interval should be less than or equal to fs.trash.interval:
  • <property>
      <name>fs.trash.interval</name>
      <value>1440</value>
    </property>
    <property>
      <name>fs.trash.checkpoint.interval</name>
      <value>1440</value>
    </property>
  4. Save the file and exit the text editor.
  5. Restart the HDFS service so the new settings are picked up. First stop it:
  • /home/hadoop/hadoop/sbin/stop-dfs.sh
     Then start it again:
  • /home/hadoop/hadoop/sbin/start-dfs.sh
  6. Create an empty file in HDFS:
  • hdfs dfs -touchz /user/hadoop/test.txt
  7. Delete the file:
  • hdfs dfs -rm /user/hadoop/test.txt
  8. Check that the deleted file was moved to the trash:
  • hdfs dfs -ls /user/hadoop/.Trash/Current/user/hadoop/
     If the Trash feature is enabled, test.txt is listed in this directory.
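
The steps above can also be double-checked from a script. The following is a minimal sketch, not part of the original lab: trash_path_for and minutes_to_hours are hypothetical helper names, and the path layout assumes the default trash location (outside any encryption zone). It relies on hdfs getconf -confKey, which prints the effective value of a configuration key as the client sees it, and on hdfs dfs -test -e, which succeeds when a path exists.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking the trash setup from this step.

# Print where `hdfs dfs -rm` moves a file when trash is enabled:
# /user/<user>/.Trash/Current<original absolute path>
# (assumes the default trash location, no encryption zones)
trash_path_for() {
  local user="$1" path="$2"
  printf '/user/%s/.Trash/Current%s\n' "$user" "$path"
}

# Convert a trash interval given in minutes to whole hours.
minutes_to_hours() {
  echo $(( ${1:-0} / 60 ))
}

# Only talk to the cluster if the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  interval="$(hdfs getconf -confKey fs.trash.interval)"
  echo "fs.trash.interval = ${interval} minutes ($(minutes_to_hours "$interval") hours)"

  trashed="$(trash_path_for hadoop /user/hadoop/test.txt)"
  if hdfs dfs -test -e "$trashed"; then
    echo "Found deleted file in trash: $trashed"
  else
    echo "Not in trash: $trashed"
  fi
fi
```

An interval of 0 reported by the first check would mean the Trash feature is disabled, in which case hdfs dfs -rm deletes files permanently.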

Expunge Unnecessary Files

Now, let’s proceed to expunge unnecessary files and directories using the FS Shell expunge command.

  1. Permanently delete everything in the trash. The -immediate option removes all trash contents for the current user right away, ignoring the fs.trash.interval setting:
  • hdfs dfs -expunge -immediate
  2. Verify that the trash has been emptied:
  • hdfs dfs -ls /user/hadoop/.Trash
     No files or directories should be listed.
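
For comparison, a plain hdfs dfs -expunge (without -immediate) rolls the Current directory into a timestamped checkpoint and deletes only checkpoints older than fs.trash.interval, so freshly deleted files survive it. The sketch below wraps the verification step in a hypothetical helper, trash_is_empty, assuming the default trash location under /user/<user>/.Trash. Older Hadoop releases may not accept -immediate; hdfs dfs -help expunge shows what the installed version supports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the user's trash has no entries.
trash_is_empty() {
  local user="$1" listing
  # `-ls` prints nothing on stdout for an empty directory; a missing
  # .Trash directory only produces an error on stderr, so it also counts as empty.
  listing="$(hdfs dfs -ls "/user/${user}/.Trash" 2>/dev/null)"
  [ -z "$listing" ]
}

# Only talk to the cluster if the hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  # Empty the current user's trash right away, ignoring fs.trash.interval.
  hdfs dfs -expunge -immediate

  if trash_is_empty hadoop; then
    echo "Trash is empty"
  else
    echo "Trash still has entries"
  fi
fi
```

Files that should never pass through the trash at all can be deleted with hdfs dfs -rm -skipTrash, which removes them permanently in one step.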

Summary

In this lab, we used the Hadoop FS Shell expunge command to reclaim storage in the Hadoop Distributed File System. You enabled the Trash feature in core-site.xml, confirmed that a deleted file is moved to the .Trash directory rather than removed outright, and then emptied the trash with hdfs dfs -expunge -immediate. Practicing these steps will help you recover from accidental deletions and keep trash from quietly consuming HDFS capacity.

Want to learn more?

Join our Discord or tweet us @WeAreLabEx ! 😄

Written by LabEx

LabEx is an AI-assisted, hands-on learning platform for tech enthusiasts, covering Programming, Data Science, Linux and other areas.
