HdfsCLI: API and command line interface for HDFS.
$ hdfscli --alias=dev
Welcome to the interactive HDFS python shell.
The HDFS client is available as `CLIENT`.
In [1]: CLIENT.list('models/')
Out[1]: ['1.json', '2.json']
In [2]: CLIENT.status('models/2.json')
Out[2]: {
'accessTime': 1439743128690,
'blockSize': 134217728,
'childrenNum': 0,
'fileId': 16389,
'group': 'supergroup',
'length': 48,
'modificationTime': 1439743129392,
'owner': 'drwho',
'pathSuffix': '',
'permission': '755',
'replication': 1,
'storagePolicy': 0,
'type': 'FILE'
}
In [3]: with CLIENT.read('models/2.json', encoding='utf-8') as reader:
...: from json import load
...: model = load(reader)
...:
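The CLIENT object above is an ordinary library client, so the same calls work from any Python program. A minimal sketch, assuming an insecure cluster reachable at http://localhost:50070, the drwho user from the listing above, and a hypothetical 'models/3.json' path:

from json import dumps
from hdfs import InsecureClient

# Connect to the namenode's WebHDFS endpoint (URL and user are assumptions).
client = InsecureClient('http://localhost:50070', user='drwho')

# Write a small JSON document, then list and inspect it, mirroring the shell session.
client.write('models/3.json', dumps({'accuracy': 0.97}), encoding='utf-8', overwrite=True)
print(client.list('models/'))
print(client.status('models/3.json'))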
- Python 3 bindings for the WebHDFS (and HttpFS) API, supporting both secure and insecure clusters.
- Command line interface to transfer files and start an interactive client shell, with aliases for convenient namenode URL caching.
- Additional functionality through optional extensions (see the sketch after this list):
  - avro, to read and write Avro files directly from HDFS.
  - dataframe, to load and save Pandas dataframes.
  - kerberos, to support Kerberos authenticated clusters.
See the documentation to learn more.
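Each extension ships as a pip extra. A minimal sketch of installing them and exercising the avro and dataframe helpers; the namenode URL, user, paths, and dataframe contents below are assumptions:

$ pip install 'hdfs[avro,dataframe,kerberos]'

import pandas as pd
from hdfs import InsecureClient
from hdfs.ext.avro import AvroReader
from hdfs.ext.dataframe import read_dataframe, write_dataframe

# Hypothetical insecure cluster; on a Kerberized cluster,
# hdfs.ext.kerberos.KerberosClient would be used instead.
client = InsecureClient('http://localhost:50070', user='drwho')

# Round-trip a Pandas dataframe through HDFS as an Avro file.
df = pd.DataFrame({'id': [1, 2], 'score': [0.5, 0.8]})
write_dataframe(client, 'data/scores.avro', df, overwrite=True)
restored = read_dataframe(client, 'data/scores.avro')

# Stream records straight back out of the Avro file.
with AvroReader(client, 'data/scores.avro') as reader:
    for record in reader:
        print(record)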
$ pip install hdfs
Then hop on over to the quickstart guide. A Conda feedstock is also available.
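Aliases such as the dev one used above are defined in HdfsCLI's configuration file, ~/.hdfscli.cfg by default. A minimal sketch; the URL and user are assumptions:

[global]
default.alias = dev

[dev.alias]
url = http://localhost:50070
user = drwho

With this in place, hdfscli --alias=dev (or plain hdfscli, thanks to the default alias) finds the namenode without the URL being retyped.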
HdfsCLI is tested against both WebHDFS and HttpFS. There are two ways of running tests (see scripts/ for helpers to set up a test HDFS cluster):
$ HDFSCLI_TEST_URL=http://localhost:50070 pytest # Using a namenode's URL.
$ HDFSCLI_TEST_ALIAS=dev pytest # Using an alias.
We'd love to hear what you think on the issues page. Pull requests are also most welcome!