Project: mapply

Sensible multi-core apply function for Pandas

Project Details

Latest version: 0.1.23
PyPI Page: https://pypi.org/project/mapply/

Project Popularity

PageRank: 0.003722307758906079
Number of downloads: 57151

mapply

mapply provides a sensible multi-core apply function for Pandas.

mapply vs. pandarallel vs. swifter

pandarallel relies on in-house multiprocessing and progress bars, and hard-codes one chunk per worker, which leaves CPUs idle whenever one chunk happens to be more expensive than the others. By contrast, swifter relies on the heavy Dask framework for multiprocessing, converting to Dask DataFrames and back. In an attempt to find the golden mean, mapply is highly customizable and remains lightweight: it uses tqdm for progress bars and the pathos framework, which shadows Python's built-in multiprocessing module and uses dill for universal pickling.
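To illustrate why one chunk per worker can leave CPUs idle, here is a small scheduling simulation. It is only a sketch and does not use mapply, pandarallel, or swifter: the workload (80 rows, the first 20 being ten times as expensive), the worker count of 4, and the helper names makespan and split_costs are all assumptions made for illustration.

```python
import heapq


def makespan(chunk_costs, n_workers):
    """Total wall time when each idle worker picks up the next chunk in order."""
    free_at = [0.0] * n_workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    for cost in chunk_costs:
        earliest = heapq.heappop(free_at)  # the worker that frees up first
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


def split_costs(row_costs, n_chunks):
    """Sum per-row costs over n_chunks contiguous chunks of near-equal length."""
    n = len(row_costs)
    bounds = [i * n // n_chunks for i in range(n_chunks + 1)]
    return [sum(row_costs[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical workload: the first 20 rows cost 10 units each, the rest 1 unit.
row_costs = [10.0] * 20 + [1.0] * 60
n_workers = 4

# One chunk per worker: a single worker receives all the expensive rows.
one_per_worker = makespan(split_costs(row_costs, n_workers), n_workers)

# Eight chunks per worker: expensive rows are spread over several small chunks.
eight_per_worker = makespan(split_costs(row_costs, 8 * n_workers), n_workers)

print(f"1 chunk per worker:  {one_per_worker} time units")
print(f"8 chunks per worker: {eight_per_worker} time units")
```

With one chunk per worker, three workers finish early and wait for the fourth. With more, smaller chunks, the total comes much closer to the ideal of total cost divided by the number of workers.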

Installation

This pure-Python, OS-independent package is available on PyPI:

$ pip install mapply

Usage

For documentation, see mapply.readthedocs.io.

import pandas as pd
import mapply

mapply.init(
    n_workers=-1,
    chunk_size=100,
    max_chunks_per_worker=8,
    progressbar=False
)

df = pd.DataFrame({"A": list(range(100))})

# avoid unnecessary multiprocessing:
# due to chunk_size=100, this will act as regular apply.
# set chunk_size=1 to skip this check and let max_chunks_per_worker decide.
df["squared"] = df.A.mapply(lambda x: x ** 2)
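The interplay between chunk_size and max_chunks_per_worker described in the comments above can be sketched as follows. This is a hypothetical helper written for illustration, not mapply's actual source: the name plan_n_chunks, the exact rounding, and the worker count of 4 are assumptions.

```python
def plan_n_chunks(n_rows, n_workers, chunk_size, max_chunks_per_worker):
    """Hypothetical sketch: decide how many chunks to split n_rows into.

    A result of 1 means multiprocessing is not worthwhile, so a regular
    apply would be used instead.
    """
    # Never create chunks smaller than chunk_size rows.
    n_chunks = n_rows // chunk_size
    # Cap the number of chunks handed to each worker (0 is assumed to mean no cap).
    if max_chunks_per_worker > 0:
        n_chunks = min(n_chunks, n_workers * max_chunks_per_worker)
    return max(n_chunks, 1)


# 100 rows with chunk_size=100 fit in one chunk: behaves like a regular apply.
print(plan_n_chunks(n_rows=100, n_workers=4, chunk_size=100, max_chunks_per_worker=8))

# chunk_size=1 effectively disables the size check; the per-worker cap decides.
print(plan_n_chunks(n_rows=100, n_workers=4, chunk_size=1, max_chunks_per_worker=8))
```

Under these assumptions, the first call yields a single chunk, and the second is limited by the cap of 8 chunks for each of the 4 workers.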

Development

Run make help for options like installing for development, linting, testing, and building docs.