Project: spacy-alignments

A spaCy package for the Rust tokenizations library

Project Details

Latest version
0.9.1
Home Page
https://github.com/explosion/spacy-alignments
PyPI Page
https://pypi.org/project/spacy-alignments/

Project Popularity

PageRank
0.001640265895163621
Number of downloads
126169

spacy-alignments: Align tokenizations for spaCy + transformers

A spaCy package for Yohei Tamura's Rust tokenizations library with Python bindings.

Installation

pip install -U pip setuptools wheel
pip install spacy-alignments

If no binary wheel is available for your platform, you need to install Rust to build spacy-alignments from source.

spacy-alignments vs. pytokenizations

The spacy_alignments module is a drop-in replacement for tokenizations:

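import spacy_alignments as tokenizations

# "å" and "BC" both align with the single token "abc": matching ignores accents and case.
a2b, b2a = tokenizations.get_alignments(["å", "BC"], ["abc"])
assert a2b == [[0], [0]]  # each token in the first list maps to token 0 of the second
assert b2a == [[0, 1]]  # token 0 of the second list maps to tokens 0 and 1 of the first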
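
The alignment lists can be used to carry per-token information from one tokenization to the other. The following is a minimal sketch in plain Python, not part of spacy-alignments: the helper name project_labels and the placeholder labels are hypothetical, and the alignment values are hard-coded from the example above so that the snippet runs without the package installed.

```python
from typing import List, Sequence


def project_labels(
    labels: Sequence[str], alignment: Sequence[Sequence[int]]
) -> List[List[str]]:
    """Hypothetical helper: for each target token, collect the labels of
    the source tokens it is aligned with."""
    return [[labels[i] for i in indices] for indices in alignment]


# Alignments copied from the example above:
# tokens_a = ["å", "BC"], tokens_b = ["abc"]
a2b = [[0], [0]]
b2a = [[0, 1]]

# Placeholder labels, one per token in tokens_a (assumed for illustration).
labels_a = ["X", "Y"]

# b2a[j] lists the indices in tokens_a aligned with token j of tokens_b,
# so indexing labels_a with it yields the labels for each token in tokens_b.
labels_for_b = project_labels(labels_a, b2a)

# The same helper works in the other direction with a2b.
labels_for_a = project_labels(["Z"], a2b)
```

A token with no counterpart would have an empty index list in the alignment, in which case the helper returns an empty list for that position.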

The only difference from the original pytokenizations is the build system: this package uses setuptools-rust, which makes it easier for us at Explosion to build source and binary packages for a wider range of platforms.

Bug reports and other issues

Please use spaCy's issue tracker to report a bug, or open a new thread on the discussion board for any other issue.