Deleting Files in HDFS using Python Snakebite
Last Updated: 14 Oct, 2020
Prerequisite: Hadoop Installation, HDFS
Snakebite is a popular Python library for communicating with HDFS. Using the Python client library provided by the Snakebite package, we can easily write Python code that works on HDFS. It uses protobuf messages to communicate directly with the NameNode, so the client library works with HDFS without making a system call to hdfs dfs. Snakebite does not support Python 3, so run the examples with Python 2.
Deleting Files and Directories
Snakebite's client provides a method named delete() through which we can easily delete one or more files or directories in HDFS. We will use the Python client library to perform the deletion. So, let's start with the hands-on part.
All the Hadoop daemons should be running. You can start them with the commands below.
start-dfs.sh     # starts the NameNode, DataNode and Secondary NameNode
start-yarn.sh    # starts the ResourceManager and NodeManager
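Before continuing, it can help to confirm that the daemons are actually up. The sketch below is one way to check from Python, assuming the JDK's jps tool is on your PATH. The function names are hypothetical, and the expected names are the daemons the two scripts above start on a single-node setup.

```python
import subprocess

# Daemons started by start-dfs.sh and start-yarn.sh on a single-node cluster.
EXPECTED_DAEMONS = ("NameNode", "DataNode", "SecondaryNameNode",
                    "ResourceManager", "NodeManager")


def running_daemons(jps_output):
    """Return the set of Java main-class names listed in jps output."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line looks like "<pid> <MainClassName>"; skip anything shorter.
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_DAEMONS):
    """Return the expected daemons that do not appear in jps output, in order."""
    running = running_daemons(jps_output)
    return [name for name in expected if name not in running]


def check_local_daemons():
    """Run jps (bundled with the JDK) and return the Hadoop daemons not running."""
    output = subprocess.check_output(["jps"], universal_newlines=True)
    return missing_daemons(output)
```

Once both scripts have started successfully, check_local_daemons() should return an empty list.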

Task: Recursively delete files and directories on HDFS (in my case, I am removing the '/demo' directory, which contains '/demo/demo1', and the '/demo2' directory).
Step 1: Let's see the files and directories that are available in HDFS with the command below.
hdfs dfs -ls /
In the above command, hdfs dfs is used to communicate with the Hadoop Distributed File System, and '-ls /' lists the files and directories present in the root directory. We can also check the files available in HDFS manually.
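If you want to inspect the listing programmatically (for example, to compare it before and after deleting), the sketch below parses the text that hdfs dfs -ls prints. It assumes the default eight-column layout (permissions, replication, owner, group, size, date, time, path) and an hdfs command on your PATH; parse_ls and list_hdfs are hypothetical helper names.

```python
import subprocess


def parse_ls(output):
    """Turn hdfs dfs -ls text into a list of (is_dir, path) tuples."""
    entries = []
    for line in output.splitlines():
        # maxsplit=7 keeps paths that contain spaces in one piece.
        parts = line.split(None, 7)
        # Listing rows have 8 columns and start with a permission string such as
        # "drwxr-xr-x" or "-rw-r--r--"; this skips the "Found N items" header and log noise.
        if len(parts) != 8 or parts[0][:1] not in ("d", "-"):
            continue
        entries.append((parts[0].startswith("d"), parts[7]))
    return entries


def list_hdfs(path="/"):
    """Run hdfs dfs -ls <path> and parse its output (needs the hdfs CLI on PATH)."""
    output = subprocess.check_output(["hdfs", "dfs", "-ls", path],
                                     universal_newlines=True)
    return parse_ls(output)
```

Parsing text output is fragile across Hadoop versions, so treat this as a convenience for quick checks rather than a stable interface.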

Step 2: Create a file named remove_directory.py at the desired location in your local file system.
cd Documents/                # change to Documents (you can choose any directory you like)
touch remove_directory.py    # touch creates an empty file in a Linux environment

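Step 3: Write the code below in the remove_directory.py Python file.
from snakebite.client import Client
# 9000 must match the NameNode RPC port set in fs.defaultFS (core-site.xml)
client = Client('localhost', 9000)
# delete() takes a list of paths and yields one result dict per path
for p in client.delete(['/demo', '/demo2'], recurse=True):
    print(p)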
In the above program, recurse=True means the directory is deleted recursively: if the directory is not empty and contains subdirectories, those subdirectories are removed as well. Snakebite passes this flag to the NameNode, which removes /demo together with everything under it (here /demo/demo1) in a single request.
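To make the effect of recurse concrete, here is a toy in-memory model of a namespace. It is not HDFS and not Snakebite; ToyNamespace and its exception classes are hypothetical names used only to illustrate the rule that a directory can be removed only with recurse=True, and that doing so removes everything beneath it.

```python
class ToyNotFound(Exception):
    """Raised by the toy model when a path does not exist."""


class ToyIsDirectory(Exception):
    """Raised by the toy model when a directory is deleted without recurse."""


class ToyNamespace(object):
    """In-memory stand-in for a file-system namespace (illustration only)."""

    def __init__(self, entries):
        # Maps absolute path -> True if it is a directory, False if it is a file.
        self.entries = dict(entries)

    def delete(self, path, recurse=False):
        if path not in self.entries:
            raise ToyNotFound(path)
        if self.entries[path] and not recurse:
            # In this model a directory can only be removed with recurse=True.
            raise ToyIsDirectory(path)
        # The trailing slash keeps "/demo" from also matching "/demo2".
        prefix = path.rstrip("/") + "/"
        doomed = [p for p in self.entries if p == path or p.startswith(prefix)]
        for p in doomed:
            del self.entries[p]
        return {"path": path, "result": True}
```

With entries for /demo, /demo/demo1 and /demo2, calling delete("/demo", recurse=True) removes /demo and /demo/demo1 but leaves /demo2 untouched.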
Client() constructor explanation:
The Client() constructor accepts the arguments listed below (among others):
- host (string): hostname or IP address of the NameNode.
- port (int): RPC port of the NameNode.
- hadoop_version (int): Hadoop protocol version (default: 9).
- use_trash (boolean): move removed files to the trash instead of deleting them permanently.
- effective_user (string): effective user for the HDFS operations (default: the current user).
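The host and port passed to Client() have to match your cluster's configuration. The sketch below reads them from the fs.defaultFS property of core-site.xml instead of hard-coding them. It assumes a Hadoop 2+ style core-site.xml and uses 8020, Hadoop's default NameNode RPC port, when the URI has no port. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2, which Snakebite itself requires
    from urlparse import urlparse

DEFAULT_NAMENODE_PORT = 8020  # used when fs.defaultFS has no explicit port


def default_fs_from_core_site(xml_text):
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


def namenode_address(default_fs):
    """Split an hdfs:// URI such as 'hdfs://localhost:9000' into (host, port)."""
    parsed = urlparse(default_fs)
    if parsed.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got %r" % default_fs)
    return parsed.hostname, parsed.port or DEFAULT_NAMENODE_PORT
```

The resulting pair can then be passed as Client(host, port), optionally together with the use_trash or effective_user keyword arguments.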

If a path we specify is not found, the delete() method raises FileNotFoundException. If a path is a directory (for example, one that contains subdirectories) and recurse=True is not passed, delete() raises DirectoryException. Both exceptions are defined in snakebite.errors.
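Because delete() is a generator, these exceptions only surface while you iterate over it, and an exception from one path ends the whole call, so the remaining paths may not be processed. One way to handle this, sketched below, is to delete the paths one at a time and record failures. delete_each and FakeClient are hypothetical names; FakeClient only imitates the generator behaviour so the helper can be tried without a cluster.

```python
class FakeClient(object):
    """Hypothetical stand-in for snakebite's Client, for trying this offline."""

    def __init__(self, existing):
        self.existing = set(existing)

    def delete(self, paths, recurse=False):
        # A generator, like Snakebite's delete(): work happens only when iterated.
        for path in paths:
            if path not in self.existing:
                raise LookupError("%s: No such file or directory" % path)
            self.existing.discard(path)
            yield {"path": path, "result": True}


def delete_each(client, paths, recurse=True, ignore=()):
    """Delete paths one at a time so one failure does not stop the others.

    ignore is a tuple of exception classes to record instead of re-raising;
    with Snakebite you might pass (FileNotFoundException, DirectoryException)
    imported from snakebite.errors.
    """
    results = []
    for path in paths:
        try:
            # extend() consumes the lazy generator, so errors surface inside this try.
            results.extend(client.delete([path], recurse=recurse))
        except ignore as exc:
            results.append({"path": path, "result": False, "error": str(exc)})
    return results
```

With the default ignore=() nothing is caught, so errors still propagate unless you opt in.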
Step 4: Run the remove_directory.py file and observe the result.
python remove_directory.py    # removes the listed directories recursively, as requested in the delete() call

In the output, 'result': True indicates that the directory was removed successfully.
Step 5: We can check whether the directories were removed, either by checking manually or with the command below.
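Instead of reading the printed dictionaries, a script can check the 'result' flag of each item itself. The small sketch below assumes each item has the {'path': ..., 'result': ...} shape seen in the output; split_results is a hypothetical helper name.

```python
def split_results(results):
    """Split delete() result dicts into (removed, not_removed) lists of paths."""
    removed, not_removed = [], []
    for item in results:
        # Each item is expected to look like {'path': ..., 'result': True/False}.
        if item.get("result"):
            removed.append(item["path"])
        else:
            not_removed.append(item["path"])
    return removed, not_removed
```

For example, removed, failed = split_results(client.delete(['/demo', '/demo2'], recurse=True)) consumes the generator and leaves any path whose flag was False in failed.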
hdfs dfs -ls /

Now we can see that /demo and /demo2 are no longer available on HDFS.