Project: blosc2

Python wrapper for the C-Blosc2 library

Project Details

Latest version
2.4.0
Home Page
PyPI Page
https://pypi.org/project/blosc2/


=============
Python-Blosc2
=============

A Python wrapper for the extremely fast Blosc2 compression library

:Author: The Blosc development team
:Contact: blosc@blosc.org
:Github: https://github.com/Blosc/python-blosc2
:Actions: |actions|
:PyPi: |version|
:NumFOCUS: |numfocus|
:Code of Conduct: |Contributor Covenant|

.. |version| image:: https://img.shields.io/pypi/v/blosc2.png
        :target: https://pypi.python.org/pypi/blosc2
.. |Contributor Covenant| image:: https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg
        :target: https://github.com/Blosc/community/blob/master/code_of_conduct.md
.. |numfocus| image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
        :target: https://numfocus.org
.. |actions| image:: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml/badge.svg
        :target: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml

What it is

`C-Blosc2 <https://github.com/Blosc/c-blosc2>`_ is the new major version of `C-Blosc <https://github.com/Blosc/c-blosc>`_, and is backward compatible with both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package that wraps C-Blosc2, the newest version of the Blosc compressor.

Python-Blosc2 already reproduces the API of `Python-Blosc <https://github.com/Blosc/python-blosc>`_, so it can be used as a drop-in replacement. However, there are a `few exceptions to full compatibility <https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md#changes-from-python-blosc-to-python-blosc2>`_.

In addition, Python-Blosc2 aims to leverage the new C-Blosc2 API to support super-chunks, multi-dimensional arrays (`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_), serialization and other bells and whistles introduced in C-Blosc2. Although this is an endless process, we have already caught up with most of the C-Blosc2 API capabilities.

Note: Python-Blosc2 is meant to be backward compatible with Python-Blosc data. That means that it can read data generated with Python-Blosc, but the opposite is not true (i.e. there is no forward compatibility).

SChunk: a 64-bit compressed store

SChunk is a simple data container that handles setting, expanding and getting data and metadata. Unlike chunks, a super-chunk can update and resize the data that it contains, supports user metadata, and does not have the 2 GB storage limitation.

Additionally, you can convert a SChunk into a contiguous, serialized buffer (aka `cframe <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_) and vice-versa; as a bonus, the serialization/deserialization process also works with NumPy arrays and PyTorch/TensorFlow tensors at a blazing speed:

.. |compress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-compress.png?raw=true
  :width: 100%
  :alt: Compression speed for different codecs

.. |decompress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-decompress.png?raw=true
  :width: 100%
  :alt: Decompression speed for different codecs

+----------------+---------------+
| |compress|     | |decompress|  |
+----------------+---------------+

while reaching excellent compression ratios:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/pack-array-cratios.png?raw=true
  :width: 75%
  :align: center
  :alt: Compression ratio for different codecs

Also, if you own a Mac with an M1/M2 processor, do yourself a favor and use its native arm64 arch (yes, we are distributing Mac arm64 wheels too; you are welcome ;-):

.. |pack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-pack.png?raw=true
  :width: 100%
  :alt: Compression speed for different codecs on Apple M1

.. |unpack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-unpack.png?raw=true
  :width: 100%
  :alt: Decompression speed for different codecs on Apple M1

+------------+--------------+
| |pack_arm| | |unpack_arm| |
+------------+--------------+

Read more about SChunk features in our blog entry at: https://www.blosc.org/posts/python-blosc2-improvements

NDArray: an N-Dimensional store

One of the latest and most exciting additions to Python-Blosc2 is the `NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_ object. It can write and read n-dimensional datasets in an extremely efficient way thanks to an n-dim 2-level partitioning, which allows slicing and dicing arbitrarily large compressed data in a more fine-grained way:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
  :width: 75%

To whet your appetite, here is how the NDArray object performs when getting slices orthogonal to the different axes of a 4-dim dataset:

.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true
  :width: 75%

We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro

We also have a ~2 min explanatory video on `why slicing in a pineapple-style (aka double partition) is useful <https://www.youtube.com/watch?v=LvP9zxMGBng>`_:

.. image:: https://github.com/Blosc/blogsite/blob/master/files/images/slicing-pineapple-style.png?raw=true
  :width: 50%
  :alt: Slicing a dataset in pineapple-style
  :target: https://www.youtube.com/watch?v=LvP9zxMGBng

Installing

Blosc now offers Python wheels for the main operating systems (Windows, macOS and Linux) and platforms. You can install binary packages from PyPI using pip:

.. code-block:: console

    pip install blosc2

Documentation

The documentation is here:

https://blosc.org/python-blosc2/python-blosc2.html

Also, some examples are available on:

https://github.com/Blosc/python-blosc2/tree/main/examples

Building from sources

python-blosc2 ships with the C-Blosc2 sources and can be built in place:

.. code-block:: console

    git clone https://github.com/Blosc/python-blosc2/
    cd python-blosc2
    git submodule update --init --recursive
    python -m pip install -r requirements-build.txt
    python setup.py build_ext --inplace

That's all. You can now proceed to the testing section.

Testing

After compiling, you can quickly check that the package is sane by running the tests:

.. code-block:: console

    python -m pip install -r requirements-tests.txt
    python -m pytest  # add -v for verbose mode

Benchmarking

If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:

.. code-block:: console

 PYTHONPATH=. python bench/pack_compress.py

License

The software is licensed under a 3-Clause BSD license. A copy of the python-blosc2 license can be found in `LICENSE.txt <https://github.com/Blosc/python-blosc2/tree/main/LICENSE.txt>`_.

Mailing list

Discussion about this module is welcome in the Blosc list:

blosc@googlegroups.com

https://groups.google.es/group/blosc

Twitter

Please follow `@Blosc2 <https://twitter.com/Blosc2>`_ to get informed about the latest developments.

Citing Blosc

You can cite our work on the different libraries under the Blosc umbrella as:

.. code-block:: console

    @ONLINE{blosc,
      author = {{Blosc Development Team}},
      title = "{A fast, compressed and persistent data store library}",
      year = {2009-2023},
      note = {https://blosc.org}
    }


Enjoy!