Project: django-pandas

Tools for working with pydata.pandas in your Django projects

Project Details

Latest version
0.6.6
Home Page
https://github.com/chrisdev/django-pandas/
PyPI Page
https://pypi.org/project/django-pandas/

Project Popularity

PageRank
0.002200133394749538
Number of downloads
63339

============== Django Pandas

.. image:: https://github.com/chrisdev/django-pandas/actions/workflows/test.yml/badge.svg :target: https://github.com/chrisdev/django-pandas/actions/workflows/test.yml

.. image:: https://coveralls.io/repos/chrisdev/django-pandas/badge.png?branch=master :target: https://coveralls.io/r/chrisdev/django-pandas

Tools for working with pandas <http://pandas.pydata.org>_ in your Django projects

Contributors

  • Christopher Clarke <https://github.com/chrisdev>_
  • Bertrand Bordage <https://github.com/BertrandBordage>_
  • Guillaume Thomas <https://github.com/gtnx>_
  • Parbhat Puri <https://parbhatpuri.com/>_
  • Fredrik Burman (coachHIPPO) <https://www.coachhippo.com>_
  • Safe Hammad <http://safehammad.com>_
  • Jeff Sternber <https://www.linkedin.com/in/jeffsternberg>_
  • @MiddleFork <https://github.com/MiddleFork>_
  • Daniel Andrlik <https://github.com/andrlik>_
  • Kevin Abbot <https://github.com/kgabbott>_
  • Yousuf Jawwad <https://github.com/ysfjwd>_
  • @henhuy <https://github.com/henhuy>_
  • Hélio Meira Lins <https://github.com/meiralins>_
  • @utpyngo <https://github.com/utpyngo>_
  • Anthony Monthe <https://github.com/ZuluPro>_
  • Vincent Toupet <https://github.com/vtoupet>_
  • Anton Ian Sipos <https://github.com/aisipos>_
  • Thomas Grainger <https://github.com/graingert/>_

What's New

Version 0.6.5 added support for Pandas >= 1.3 and fixes a number of other issues.

Dependencies

django-pandas supports Django_ (>=1.4.5) or later and requires django-model-utils_ (>= 1.4.0) and Pandas_ (>= 0.12.0). Note because of problems with the requires directive of setuptools you probably need to install numpy in your virtualenv before you install this package or if you want to run the test suite ::

pip install numpy
pip install -e .[test]
python runtests.py

Some pandas functionality requires parts of the Scipy stack. You may wish to consult http://www.scipy.org/install.html for more information on installing the Scipy stack.

You need to install your preferred version of Django. as that Django 2 does not support Python 2.

.. _Django: http://djangoproject.com/ .. _django-model-utils: http://pypi.python.org/pypi/django-model-utils .. _Pandas: http://pandas.pydata.org

Contributing

Please file bugs and send pull requests to the GitHub repository_ and issue tracker_.

.. _GitHub repository: https://github.com/chrisdev/django-pandas/ .. _issue tracker: https://github.com/chrisdev/django-pandas/issues

Installation

Start by creating a new virtualenv for your project ::

mkvirtualenv myproject

Next install numpy and pandas and optionally scipy ::

pip install numpy
pip install pandas

You may want to consult the scipy documentation_ for more information on installing the Scipy stack.

.. _scipy documentation: http://www.scipy.org/install.html

Finally, install django-pandas using pip::

pip install django-pandas

or install the development version from github ::

pip install https://github.com/chrisdev/django-pandas/tarball/master

Usage

IO Module

The django-pandas.io module provides some convenience methods to facilitate the creation of DataFrames from Django QuerySets.

read_frame ^^^^^^^^^^^

Parameters

- qs: A Django QuerySet.

- fieldnames: A list of model field names to use in creating the ``DataFrame``.
              You can span a relationship in the usual Django way
              by using  double underscores to specify a related field
              in another model

- index_col: Use specify the field name to use  for the ``DataFrame`` index.
             If the index
             field is not in the field list it will be appended

- coerce_float : Boolean, defaults to True
                 Attempt to convert values to non-string,
                 non-numeric objects (like decimal.Decimal)
                 to floating point.

- verbose:  If  this is ``True`` then populate the DataFrame with the
            human readable versions of any foreign key or choice fields
            else use the actual values set in the model.

- column_names: If not None, use to override the column names in the
                DateFrame

Examples ^^^^^^^^^ Assume that this is your model::

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

First create a query set::

from django_pandas.io import read_frame
qs = MyModel.objects.all()

To create a dataframe using all the fields in the underlying model ::

df = read_frame(qs)

The df will contain human readable column values for foreign key and choice fields. The DataFrame will include all the fields in the underlying model including the primary key. To create a DataFrame using specified field names::

 df = read_frame(qs, fieldnames=['age', 'wage', 'full_name'])

To set full_name as the DataFrame index ::

qs.to_dataframe(['age', 'wage'], index_col='full_name'])

You can use filters and excludes ::

qs.filter(age__gt=20, department='IT').to_dataframe(index_col='full_name')

DataFrameManager

django-pandas provides a custom manager to use with models that you want to render as Pandas Dataframes. The DataFrameManager manager provides the to_dataframe method that returns your models queryset as a Pandas DataFrame. To use the DataFrameManager, first override the default manager (objects) in your model's definition as shown in the example below ::

#models.py

from django_pandas.managers import DataFrameManager

class MyModel(models.Model):

    full_name = models.CharField(max_length=25)
    age = models.IntegerField()
    department = models.CharField(max_length=3)
    wage = models.FloatField()

    objects = DataFrameManager()

This will give you access to the following QuerySet methods:

- ``to_dataframe``
- ``to_timeseries``
- ``to_pivot_table``

to_dataframe ^^^^^^^^^^^^^

Returns a DataFrame from the QuerySet

Parameters

- fieldnames:  The model field names to utilise in creating the frame.
            to span a relationship, use the field name of related
            fields across models, separated by double underscores,


- index: specify the field to use  for the index. If the index
            field is not in the field list it will be appended

- coerce_float: Attempt to convert the numeric non-string data
                like object, decimal etc. to float if possible

- verbose:  If  this is ``True`` then populate the DataFrame with the
            human readable versions of any foreign key or choice fields
            else use the actual value set in the model.

Examples ^^^^^^^^^

Create a dataframe using all the fields in your model as follows ::

qs = MyModel.objects.all()

df = qs.to_dataframe()

This will include your primary key. To create a DataFrame using specified field names::

 df = qs.to_dataframe(fieldnames=['age', 'department', 'wage'])

To set full_name as the index ::

qs.to_dataframe(['age', 'department', 'wage'], index='full_name'])

You can use filters and excludes ::

qs.filter(age__gt=20, department='IT').to_dataframe(index='full_name')

to_timeseries

A convenience method for creating a time series i.e the DataFrame index is instance of a DateTime or PeriodIndex

Parameters

- fieldnames:  The model field names to utilise in creating the frame.
    to span a relationship, just use the field name of related
    fields across models, separated by double underscores,

- index: specify the field to use  for the index. If the index
    field is not in the field list it will be appended. This
    is mandatory.

- storage:  Specify if the queryset uses the `wide` or `long` format
    for data.

-  pivot_columns: Required once the you specify `long` format
    storage. This could either be a list or string identifying
    the field name or combination of field. If the pivot_column
    is a single column then the unique values in this column become
    a new columns in the DataFrame
    If the pivot column is a list the values in these columns are
    concatenated (using the '-' as a separator)
    and these values are used for the new timeseries columns

- values: Also required if you utilize the `long` storage the
    values column name is use for populating new frame values

- freq: the offset string or object representing a target conversion

- rs_kwargs: Arguments based on pandas.DataFrame.resample

- verbose:  If  this is ``True`` then populate the DataFrame with the
            human readable versions of any foreign key or choice fields
            else use the actual value set in the model.

Examples ^^^^^^^^^

Using a long storage format ::

#models.py

class LongTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    series_name = models.CharField(max_length=100)
    value = models.FloatField()

    objects = DataFrameManager()

Some sample data:::

========   =====       =====
date_ix    series_name value
========   =====       ======
2010-01-01  gdp        204699

2010-01-01  inflation  2.0

2010-01-01  wages      100.7

2010-02-01  gdp        204704

2010-02-01  inflation  2.4

2010-03-01  wages      100.4

2010-02-01  gdp        205966

2010-02-01  inflation  2.5

2010-03-01  wages      100.5
==========  ========== ======

Create a QuerySet ::

qs = LongTimeSeries.objects.filter(date_ix__year__gte=2010)

Create a timeseries dataframe ::

df = qs.to_timeseries(index='date_ix',
                      pivot_columns='series_name',
                      values='value',
                      storage='long')
df.head()

date_ix      gdp     inflation     wages

2010-01-01   204966     2.0       100.7

2010-02-01   204704      2.4       100.4

2010-03-01   205966      2.5       100.5

Using a wide storage format ::

class WideTimeSeries(models.Model):
    date_ix = models.DateTimeField()
    col1 = models.FloatField()
    col2 = models.FloatField()
    col3 = models.FloatField()
    col4 = models.FloatField()

    objects = DataFrameManager()

qs = WideTimeSeries.objects.all()

rs_kwargs = {'how': 'sum', 'kind': 'period'}
df = qs.to_timeseries(index='date_ix', pivot_columns='series_name',
                      values='value', storage='long',
                      freq='M', rs_kwargs=rs_kwargs)

to_pivot_table

A convenience method for creating a pivot table from a QuerySet

Parameters

  • fieldnames: The model field names to utilise in creating the frame. to span a relationship, just use the field name of related fields across models, separated by double underscores,
  • values : column to aggregate, optional
  • rows : list of column names or arrays to group on Keys to group on the x-axis of the pivot table
  • cols : list of column names or arrays to group on Keys to group on the y-axis of the pivot table
  • aggfunc : function, default numpy.mean, or list of functions If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)
  • fill_value : scalar, default None Value to replace missing values with
  • margins : boolean, default False Add all row / columns (e.g. for subtotal / grand totals)
  • dropna : boolean, default True

Example ::

# models.py
class PivotData(models.Model):
    row_col_a = models.CharField(max_length=15)
    row_col_b = models.CharField(max_length=15)
    row_col_c = models.CharField(max_length=15)
    value_col_d = models.FloatField()
    value_col_e = models.FloatField()
    value_col_f = models.FloatField()

    objects = DataFrameManager()

Usage ::

    rows = ['row_col_a', 'row_col_b']
    cols = ['row_col_c']

    pt = qs.to_pivot_table(values='value_col_d', rows=rows, cols=cols)

.. end-here

CHANGES

0.6.6 (2012-10-27)

The main feature of this is release in the use of a GHA to automate the publishing of the package to PYPI as per PR #146_ (again much thanks @graingert). Several other minor issues have also been addressed.

.. _#146: https://github.com/chrisdev/django-pandas/pull/146

0.6.5 (2021-10-15)

This version added support for Pandas >=1.3 (thanks to @graingert)

Other Changes:

  • Migrated from Travis to Github Actions for CI (also @graingert)

  • Avoid the use of deprecated methods #139_ and #142_ (again much thanks @graingert)

  • Fix for issue #135_ (Thanks @Yonimdo)

  • Silence Django 3.2 errors on testing on etc. #133_ thanks @whyscream.

.. _#139: https://github.com/chrisdev/django-pandas/issues/135 .. _#142: https://github.com/chrisdev/django-pandas/issues/142 .. _#135: https://github.com/chrisdev/django-pandas/issues/135 .. _#133: https://github.com/chrisdev/django-pandas/issues/133

0.6.4 (2021-02-08)

Bumped version number as the previous release was incorrectly uploaded to pypi

0.6.1 (2020-05-26)

Supports the latest release of Pandas 1.0.3

0.6.0 (2019-01-11)

Removes compatibility with Django versions < 1.8

0.5.2 (2019-01-3)

This is the last version that supports Django < 1.8

  • Improved coerce_float option (thanks @ZuluPro )
  • Ensure compatibility with legacy versions of Django ( < 1.8)
  • Test pass with Django 2+ and python 3.7

0.5.1 (2018-01-26)

  • Address Unicode decode error when installing with pip3 on docker (Thanks @utapyngo)

0.5.0 (2018-01-20)

  • Django 2.0 compatibility (Thanks @meirains)

0.4.5 (2017-10-4)

  • A Fix for fieldname deduplication bug thanks to @kgabbott

0.4.4 (2017-07-16)

  • The verbose argument now handles more use cases (Thanks to @henhuy and Kevin Abbott)
  • Corece float argument add to to_timeseries() method (Thanks Yousuf Jawwad)

0.4.3 (2017-06-02)

  • Fix doc typos and formatting
  • Prevent column duplication in read_frame (Thanks Kevin Abbott)

0.4.2 (2017-05-22)

  • Compatibility with pandas 0.20.1
  • Support for Python 2.7 and 3.5 with Django versions 1.8+
  • Suport for Python 3.6 and Django 1.11
  • We still support legacy versions (Python 2.7 and Django 1.4)

0.4.1 (2016-02-05)

  • Address the incompatibility with Django 1.9 due to the removal of specialized query sets like the ValuesQuerySet <https://code.djangoproject.com/ticket/24211>_
  • Address the removal of the PassThrougManager from django-model-utils version 2.4. We've removed the dependency on django-model-utils and included the PassThroughManger (which was always a standalone tool distributed a part of django-model-utils) for compatibility with earlier versions of Django (<= 1.8). For more recent versions of Django we're using Django's built in QuerySet.as_manager().
  • Now supports Pandas 0.14.1 and above
  • The fall in Coverage in this release largely reflects the integration of the PassThroughManager into the code base. We'll add the required test coverage for the PassThroughManager in subsequent releases.

0.3.1 (2015-10-25)

  • Extends the ability to span a ForeignKey relationship with double underscores to OneToOneField too thanks to Safe Hammad
  • Provide better support for ManyToMany and OneToMany relations thanks to Jeff Sternberg and @MiddleFork

0.3.0 (2015-06-16)

  • This version supports Django 1.8
  • Support for Pandas 0.16

0.2.2 (2015-03-02)

  • Added Support for Django 1.7

0.2.1 (2015-01-28)

  • Added Support for Values QuerySets
  • Support for Python 2.6
  • Note we still have limited support for Django 1.7 but this will be coming in the next release

0.2.0 (2014-06-15)

  • Added the io module so that DataFrames can be created from any queryset so you don't need to to add a DataFrame manager to your models. This is good for working with legacy projects.
  • added a Boolean verbose argument to all methods (which defaults to True) This populate the DataFrame columns with the human readable versions of foreign key or choice fields.
  • Improved the performance DataFrame creation by removing dependency on np.core.records.fromrecords
  • Loads of bug fixes, more tests and improved coverage and better documentation