Python wrapper for the C-Blosc2 library
=======================================

:Author: The Blosc development team
:Contact: blosc@blosc.org
:Github: https://github.com/Blosc/python-blosc2
:Actions: |actions|
:PyPi: |version|
:NumFOCUS: |numfocus|
:Code of Conduct: |Contributor Covenant|

.. |version| image:: https://img.shields.io/pypi/v/blosc2.png
  :target: https://pypi.python.org/pypi/blosc2

.. |Contributor Covenant| image:: https://img.shields.io/badge/Contributor%20Covenant-v2.0%20adopted-ff69b4.svg
  :target: https://github.com/Blosc/community/blob/master/code_of_conduct.md

.. |numfocus| image:: https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A
  :target: https://numfocus.org

.. |actions| image:: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml/badge.svg
  :target: https://github.com/Blosc/python-blosc2/actions/workflows/build.yml
`C-Blosc2 <https://github.com/Blosc/c-blosc2>`_ is the new major version of
`C-Blosc <https://github.com/Blosc/c-blosc>`_, and is backward compatible with
both the C-Blosc1 API and its in-memory format. Python-Blosc2 is a Python package
that wraps C-Blosc2, the newest version of the Blosc compressor.
Currently, Python-Blosc2 already reproduces the API of
`Python-Blosc <https://github.com/Blosc/python-blosc>`_, so it can be
used as a drop-in replacement. However, there are `a few exceptions to full
compatibility <https://github.com/Blosc/python-blosc2/blob/main/RELEASE_NOTES.md#changes-from-python-blosc-to-python-blosc2>`_.
In addition, Python-Blosc2 aims to leverage the new C-Blosc2 API so as to support
super-chunks, multi-dimensional arrays
(`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_),
serialization and other bells and whistles introduced in C-Blosc2. Although
this is an endless process, we have already caught up with most of the
C-Blosc2 API capabilities.
.. note:: Python-Blosc2 is meant to be backward compatible with Python-Blosc data.
  That means that it can read data generated with Python-Blosc, but the opposite
  is not true (i.e. there is no forward compatibility).
``SChunk`` is the simple data container that handles setting, expanding and getting
data and metadata. In contrast to chunks, a super-chunk can update and resize the data
that it contains, supports user metadata, and does not have the 2 GB storage limitation.
Additionally, you can convert an SChunk into a contiguous, serialized buffer (aka
`cframe <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_)
and vice versa; as a bonus, the serialization/deserialization process also works with NumPy
arrays and PyTorch/TensorFlow tensors at blazing speed:
.. |compress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-compress.png?raw=true
  :width: 100%
  :alt: Compression speed for different codecs

.. |decompress| image:: https://github.com/Blosc/python-blosc2/blob/main/images/linspace-decompress.png?raw=true
  :width: 100%
  :alt: Decompression speed for different codecs

+----------------+---------------+
| |compress|     | |decompress|  |
+----------------+---------------+
while reaching excellent compression ratios:
.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/pack-array-cratios.png?raw=true
  :width: 75%
  :align: center
  :alt: Compression ratio for different codecs
Also, if you are a Mac M1/M2 owner, do yourself a favor and use its native arm64 arch (yes, we are distributing Mac arm64 wheels too; you are welcome ;-):
.. |pack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-pack.png?raw=true
  :width: 100%
  :alt: Compression speed for different codecs on Apple M1

.. |unpack_arm| image:: https://github.com/Blosc/python-blosc2/blob/main/images/M1-i386-vs-arm64-unpack.png?raw=true
  :width: 100%
  :alt: Decompression speed for different codecs on Apple M1

+------------+--------------+
| |pack_arm| | |unpack_arm| |
+------------+--------------+
Read more about ``SChunk`` features in our blog entry at: https://www.blosc.org/posts/python-blosc2-improvements
One of the latest and most exciting additions to Python-Blosc2 is the
`NDArray <https://www.blosc.org/python-blosc2/reference/ndarray_api.html>`_ object.
It can write and read n-dimensional datasets in an extremely efficient way thanks
to an n-dim 2-level partitioning, allowing to slice and dice arbitrarily large,
compressed data in a fine-grained way:
.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
  :width: 75%
To whet your appetite, here is how the ``NDArray`` object performs on getting slices
orthogonal to the different axes of a 4-dim dataset:
.. image:: https://github.com/Blosc/python-blosc2/blob/main/images/Read-Partial-Slices-B2ND.png?raw=true
  :width: 75%
We have blogged about this: https://www.blosc.org/posts/blosc2-ndim-intro
We also have a ~2 min `explanatory video on why slicing in a pineapple-style (aka
double partition) is useful <https://www.youtube.com/watch?v=LvP9zxMGBng>`_:
.. image:: https://github.com/Blosc/blogsite/blob/master/files/images/slicing-pineapple-style.png?raw=true
  :width: 50%
  :alt: Slicing a dataset in pineapple-style
  :target: https://www.youtube.com/watch?v=LvP9zxMGBng
Blosc is now offering Python wheels for the major OSes (Windows, Mac and Linux) and platforms.
You can install the binary packages from PyPI using ``pip``:
.. code-block:: console

    pip install blosc2
The documentation is here:
https://blosc.org/python-blosc2/python-blosc2.html
Also, some examples are available at:
https://github.com/Blosc/python-blosc2/tree/main/examples
``python-blosc2`` ships with the C-Blosc2 sources and can be built in place:
.. code-block:: console

    git clone https://github.com/Blosc/python-blosc2/
    cd python-blosc2
    git submodule update --init --recursive
    python -m pip install -r requirements-build.txt
    python setup.py build_ext --inplace
That's all. You can proceed with the testing section now.
After compiling, you can quickly check that the package is sane by running the tests:
.. code-block:: console

    python -m pip install -r requirements-tests.txt
    python -m pytest  (add -v for verbose mode)
If curious, you may want to run a small benchmark that compares a plain NumPy array copy against compression through different compressors in your Blosc build:
.. code-block:: console

    PYTHONPATH=. python bench/pack_compress.py
The software is licensed under a 3-Clause BSD license. A copy of the
python-blosc2 license can be found in
`LICENSE.txt <https://github.com/Blosc/python-blosc2/tree/main/LICENSE.txt>`_.
Discussion about this module is welcome in the Blosc list:
blosc@googlegroups.com
https://groups.google.es/group/blosc
Please follow `@Blosc2 <https://twitter.com/Blosc2>`_ to get informed about the latest developments.
You can cite our work on the different libraries under the Blosc umbrella as:
.. code-block:: console

    @ONLINE{blosc,
      author = {{Blosc Development Team}},
      title = "{A fast, compressed and persistent data store library}",
      year = {2009-2023},
      note = {https://blosc.org}
    }
Enjoy!