A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
.. image:: https://img.shields.io/pypi/v/gspread-pandas.svg :target: https://pypi.python.org/pypi/gspread-pandas :alt: PyPI Version
.. image:: https://readthedocs.org/projects/gspread-pandas/badge/?version=latest :target: https://gspread-pandas.readthedocs.io/en/latest/?badge=latest :alt: Documentation Status
author: Diego Fernandez
Links:
Documentation <http://gspread-pandas.readthedocs.io/>
_Source code <https://github.com/aiguofer/gspread-pandas>
_Short video tutorial <https://youtu.be/2yIcNYzfzPw>
_.. attention:: Upgrading from < 2.0
If you are upgrading, the ``user`` is now an optional param that
uses ``default`` as the default. If you're a single user, you might
want to re-name your credentials to ``default`` so you can stop
specifying it:
.. code-block:: console
mv ~/.config/gspread_pandas/creds{<old_name>,default}
A package to easily open an instance of a Google spreadsheet and
interact with worksheets through Pandas DataFrames. It enables you to
easily pull data from Google spreadsheets into DataFrames as well as
push data into spreadsheets from DataFrames. It leverages
gspread <https://github.com/burnash/gspread/>
__ in the backend for
most of the heavylifting, but it has a lot of added functionality
to handle things specific to working with DataFrames as well as
some extra nice to have features.
The target audience are Data Analysts and Data Scientists, but it can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.
Some key goals/features:
ValueInputOption
and ValueRenderOption
for specific columnsTo install use pip:
.. code-block:: console
$ pip install gspread-pandas
Or clone the repo:
.. code-block:: console
$ git clone https://github.com/aiguofer/gspread-pandas.git
$ python setup.py install
Before using, you will need to download Google client credentials for your app.
To allow a script to use Google Drive API we need to authenticate our
self towards Google. To do so, we need to create a project, describing
the tool and generate credentials. Please use your web browser and go to
Google console <https://console.developers.google.com/>
__ and :
Choose Create Project in popup menu on the top.
A dialog box appears, so give your project a name and click on Create button.
On the left-side menu click on API Manager.
A table of available APIs is shown. Switch Drive API and click on Enable API button. Do the same for Sheets API. Other APIs might be switched off, for our purpose.
On the left-side menu click on Credentials.
In section OAuth consent screen select your email address and give your product a name. Then click on Save button.
In section Credentials click on Add credentials and switch OAuth client ID (if you want to use your own account or enable the use of multiple accounts) or Service account key (if you prefer to have a service account interacting with spreadsheets).
If you select OAuth client ID:
If you select Service account key
Please be aware, the file contains your private credentials, so take
care of the file in the same way you care of your private SSH key;
Move the downloaded JSON to ~/.config/gspread_pandas/google_secret.json
(or you can configure the directory and file name by directly calling
gspread_pandas.conf.get_config
Thanks to similar project
df2gspread <https://github.com/maybelinot/df2gspread>
__ for this great
description of how to get the client credentials.
You can read more about it in the configuration docs <https://gspread-pandas.readthedocs.io/en/latest/configuration.html>
__
including how to change the default behavior.
.. code:: python
import pandas as pd
from gspread_pandas import Spread, Client
file_name = "http://stats.idre.ucla.edu/stat/data/binary.csv"
df = pd.read_csv(file_name)
# 'Example Spreadsheet' needs to already exist and your user must have access to it
spread = Spread('Example Spreadsheet')
# This will ask to authenticate if you haven't done so before
# Display available worksheets
spread.sheets
# Save DataFrame to worksheet 'New Test Sheet', create it first if it doesn't exist
spread.df_to_sheet(df, index=False, sheet='New Test Sheet', start='A2', replace=True)
spread.update_cells('A1', 'B1', ['Created by:', spread.email])
print(spread)
# <gspread_pandas.client.Spread - User: '<example_user>@gmail.com', Spread: 'Example Spreadsheet', Sheet: 'New Test Sheet'>
# You can now first instanciate a Client separately and query folders and
# instanciate other Spread objects by passing in the Client
client = Client()
# Assumming you have a dir called 'example dir' with sheets in it
available_sheets = client.find_spreadsheet_files_in_folders('example dir')
spreads = []
for sheet in available_sheets.get('example dir', []):
spreads.append(Spread(sheet['id'], client=client))
If you're trying to use gspread_pandas
from within
Rodeo <https://www.yhat.com/products/rodeo>
_ you might get an
EOFError: EOF when reading a line
error when trying to pass in the verification
code. The workaround for this is to first verify your account in a regular shell.
Since you're just doing this to get your Oauth token, the spreadsheet doesn't need
to be valid. Just run this in shell:
.. code:: python
python -c "from gspread_pandas import Spread; Spread('<user_key>','')"
Then follow the instructions to create and store the OAuth creds.
IMO, Google sheets is not the right tool for large datasets. However, there's probably good reaons you might have to use it in such cases. When uploading a large DataFrame, you might run into this error.
By default, Spread.df_to_sheet
will add rows and/or columns needed to accomodate the DataFrame.
Since a new sheet contains a fairly large number of columns, if you're uploading a DF with lots of
rows you might exceed the max number of cells in a worksheet even if your data does not. In order
to fix this you have 2 options:
replace=True
, which will first resize the worksheet and clear out all values.Spread.sheet.resize(1, 1)
and then do df_to_sheet
There's a strange caveat with resizing, so going to 1x1 first is recommended (replace=True
already does this). To read more see this issue <https://issuetracker.google.com/issues/213126648>
_