Get a list of files from an HDFS (Hadoop) directory using a Python script

How do I get a list of files in an HDFS (Hadoop) directory using a Python script?

I have tried the following line:

dir = sc.textFile("hdfs://").collect()

The directory contains the files "file1,file2,file3....fileN". Using that line I only got the contents of the files, but I need the list of file names.

Can anyone please help me solve this problem?

Thanks in advance.


Use subprocess

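import subprocess

# List the directory and keep only the 8th column (the path) of each line
p = subprocess.Popen("hdfs dfs -ls <HDFS Location> | awk '{print $8}'",
                     shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

for line in p.stdout.readlines():
    print(line.decode().strip())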
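If you would rather not depend on awk, the listing can also be parsed in Python. This is a minimal sketch, assuming the usual eight-column hdfs dfs -ls layout (permissions, replication, owner, group, size, date, time, path) and Python 3.7+ for the capture_output and text arguments; the helper names parse_hdfs_ls and list_hdfs are hypothetical. Splitting each line at most seven times keeps paths that contain spaces intact, which awk's $8 would cut off.

```python
import subprocess


def parse_hdfs_ls(output):
    """Extract the path column from `hdfs dfs -ls` output (hypothetical helper).

    Assumes the usual 8-column layout:
    permissions replication owner group size date time path
    """
    paths = []
    for line in output.splitlines():
        # Split at most 7 times so a path containing spaces stays whole
        fields = line.split(None, 7)
        if len(fields) == 8:  # skips the "Found N items" header and blank lines
            paths.append(fields[7])
    return paths


def list_hdfs(path, recursive=False):
    """Run `hdfs dfs -ls [-R] path` without a shell and return the listed paths."""
    args = ["hdfs", "dfs", "-ls"]
    if recursive:
        args.append("-R")
    args.append(path)
    # check=True raises CalledProcessError if the hdfs command fails
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return parse_hdfs_ls(result.stdout)
```

For example, list_hdfs("/data") returns the paths directly under /data, and list_hdfs("/data", recursive=True) mirrors the -R option. Passing the arguments as a list avoids shell=True, so the path is never interpreted by a shell.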

EDIT: Answer without Python. The first command uses -R to list all sub-directories recursively as well. The trailing redirect (> output.txt) can be omitted or changed to suit your needs.

hdfs dfs -ls -R <HDFS LOCATION> | awk '{print $8}' > output.txt
hdfs dfs -ls <HDFS LOCATION> | awk '{print $8}' > output.txt

import subprocess

path = "/data"
# Keep only the 8th column (the path) of each line of the listing
args = "hdfs dfs -ls " + path + " | awk '{print $8}'"
proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)

s_output, s_err = proc.communicate()
all_dart_dirs = s_output.decode().split()  # list of files and sub-directories in 'path'
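The list above stores files and sub-directories together. If you need them separately, one option is to drop the awk filter and look at the first character of the permission string, which is "d" for directories. This is a minimal sketch, assuming the same eight-column hdfs dfs -ls layout; split_files_and_dirs is a hypothetical helper name.

```python
def split_files_and_dirs(ls_output):
    """Split full `hdfs dfs -ls` output into (files, directories).

    Assumes the usual 8-column layout; the permission string in the
    first column starts with 'd' for a directory.
    """
    files, dirs = [], []
    for line in ls_output.splitlines():
        # Split at most 7 times so a path containing spaces stays whole
        fields = line.split(None, 7)
        if len(fields) != 8:  # e.g. the "Found N items" header
            continue
        permissions, entry_path = fields[0], fields[7]
        if permissions.startswith("d"):
            dirs.append(entry_path)
        else:
            files.append(entry_path)
    return files, dirs
```

Feed it the raw, unfiltered output, for example by running "hdfs dfs -ls " + path without the awk pipe and passing s_output.decode() to split_files_and_dirs.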

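You can use the listdir function from the os module:

files = os.listdir(path)

Note that os.listdir only reads the local filesystem, so this works only if the HDFS directory is mounted locally (for example through an HDFS NFS gateway or a FUSE mount).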
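A variant that returns only file names and skips sub-directories could use os.scandir. This is a minimal sketch, assuming the HDFS directory is exposed through a local mount; list_file_names and the mount point /mnt/hdfs/data are hypothetical.

```python
import os


def list_file_names(local_dir):
    """Return sorted names of regular files in local_dir (not recursive).

    local_dir must be a path on the local filesystem, e.g. a hypothetical
    NFS-gateway or FUSE mount of HDFS such as /mnt/hdfs/data.
    """
    with os.scandir(local_dir) as entries:
        # is_file() is False for sub-directories, so they are skipped
        return sorted(entry.name for entry in entries if entry.is_file())
```

Usage: list_file_names("/mnt/hdfs/data"). Unlike os.listdir, os.scandir exposes each entry's type, so filtering out directories needs no extra os.path.isfile call per name.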
