RDBMS access via IPython
:Author: Catherine Devlin, http://catherinedevlin.blogspot.com
Introduces a %sql (or %%sql) magic.
Connect to a database, using SQLAlchemy URL
_ connect strings, then issue SQL
commands within IPython or IPython Notebook.
.. image:: https://raw.github.com/catherinedevlin/ipython-sql/master/examples/writers.png :width: 600px :alt: screenshot of ipython-sql in the Notebook
.. code-block:: python
In [1]: %load_ext sql
In [2]: %%sql postgresql://will:longliveliz@localhost/shakes
...: select * from character
...: where abbrev = 'ALICE'
...:
Out[2]: [(u'Alice', u'Alice', u'ALICE', u'a lady attending on Princess Katherine', 22)]
In [3]: result = _
In [4]: print(result)
charid charname abbrev description speechcount
=================================================================================
Alice Alice ALICE a lady attending on Princess Katherine 22
In [4]: result.keys
Out[5]: [u'charid', u'charname', u'abbrev', u'description', u'speechcount']
In [6]: result[0][0]
Out[6]: u'Alice'
In [7]: result[0].description
Out[7]: u'a lady attending on Princess Katherine'
After the first connection, connect info can be omitted::
In [8]: %sql select count(*) from work
Out[8]: [(43L,)]
Connections to multiple databases can be maintained. You can refer to an existing connection by username@database
.. code-block:: python
In [9]: %%sql will@shakes
...: select charname, speechcount from character
...: where speechcount = (select max(speechcount)
...: from character);
...:
Out[9]: [(u'Poet', 733)]
In [10]: print(_)
charname speechcount
======================
Poet 733
If no connect string is supplied, %sql
will provide a list of existing connections;
however, if no connections have yet been made and the environment variable DATABASE_URL
is available, that will be used.
For secure access, you may dynamically access your credentials (e.g. from your system environment or getpass.getpass
) to avoid storing your password in the notebook itself. Use the $
before any variable to access it in your %sql
command.
.. code-block:: python
In [11]: user = os.getenv('SOME_USER')
....: password = os.getenv('SOME_PASSWORD')
....: connection_string = "postgresql://{user}:{password}@localhost/some_database".format(user=user, password=password)
....: %sql $connection_string
Out[11]: u'Connected: some_user@some_database'
You may use multiple SQL statements inside a single cell, but you will only see any query results from the last of them, so this really only makes sense for statements with no output
.. code-block:: python
In [11]: %%sql sqlite://
....: CREATE TABLE writer (first_name, last_name, year_of_death);
....: INSERT INTO writer VALUES ('William', 'Shakespeare', 1616);
....: INSERT INTO writer VALUES ('Bertold', 'Brecht', 1956);
....:
Out[11]: []
As a convenience, dict-style access for result sets is supported, with the leftmost column serving as key, for unique values.
.. code-block:: python
In [12]: result = %sql select * from work
43 rows affected.
In [13]: result['richard2']
Out[14]: (u'richard2', u'Richard II', u'History of Richard II', 1595, u'h', None, u'Moby', 22411, 628)
Results can also be retrieved as an iterator of dictionaries (result.dicts()
)
or a single dictionary with a tuple of scalar values per key (result.dict()
)
Bind variables (bind parameters) can be used in the "named" (:x) style. The variable names used should be defined in the local namespace.
.. code-block:: python
In [15]: name = 'Countess'
In [16]: %sql select description from character where charname = :name
Out[16]: [(u'mother to Bertram',)]
Alternately, $variable_name
or {variable_name}
can be
used to inject variables from the local namespace into the SQL
statement before it is formed and passed to the SQL engine.
(Using $
and {}
together, as in ${variable_name}
,
is not supported.)
In [17]: %sql select description from character where charname = '{name}'
Out[17]: [(u'mother to Bertram',)]
Bind variables are passed through to the SQL engine and can only
be used to replace strings passed to SQL. $
and {}
are
substituted before passing to SQL and can be used to form SQL
statements dynamically.
Ordinary IPython assignment works for single-line %sql
queries:
.. code-block:: python
In [18]: works = %sql SELECT title, year FROM work
43 rows affected.
The <<
operator captures query results in a local variable, and
can be used in multi-line %%sql
:
.. code-block:: python
In [19]: %%sql works << SELECT title, year
...: FROM work
...:
43 rows affected.
Returning data to local variable works
Connection strings are SQLAlchemy URL
_ standard.
Some example connection strings::
mysql+pymysql://scott:tiger@localhost/foo
oracle://scott:tiger@127.0.0.1:1521/sidname
sqlite://
sqlite:///foo.db
mssql+pyodbc://username:password@host/database?driver=SQL+Server+Native+Client+11.0
.. _SQLAlchemy URL
: http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
Note that mysql
and mysql+pymysql
connections (and perhaps others)
don't read your client character set information from .my.cnf. You need
to specify it in the connection string::
mysql+pymysql://scott:tiger@localhost/foo?charset=utf8
Note that an impala
connection with impyla
_ for HiveServer2 requires disabling autocommit::
%config SqlMagic.autocommit=False
%sql impala://hserverhost:port/default?kerberos_service_name=hive&auth_mechanism=GSSAPI
.. _impyla: https://github.com/cloudera/impyla
Connection arguments not whitelisted by SQLALchemy can be provided as
a flag with (-a|--connection_arguments)the connection string as a JSON string.
See SQLAlchemy Args
_.
| %sql --connection_arguments {"timeout":10,"mode":"ro"} sqlite:// SELECT * FROM work;
| %sql -a '{"timeout":10, "mode":"ro"}' sqlite:// SELECT * from work;
.. _SQLAlchemy Args
: https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
DSN connections
Alternately, you can store connection info in a
configuration file, under a section name chosen to
refer to your database.
For example, if dsn.ini contains
| [DB_CONFIG_1]
| drivername=postgres
| host=my.remote.host
| port=5433
| database=mydatabase
| username=myuser
| password=1234
then you can
| %config SqlMagic.dsn_filename='./dsn.ini'
| %sql --section DB_CONFIG_1
Configuration
-------------
Query results are loaded as lists, so very large result sets may use up
your system's memory and/or hang your browser. There is no autolimit
by default. However, `autolimit` (if set) limits the size of the result
set (usually with a `LIMIT` clause in the SQL). `displaylimit` is similar,
but the entire result set is still pulled into memory (for later analysis);
only the screen display is truncated.
.. code-block:: python
In [2]: %config SqlMagic
SqlMagic options
--------------
SqlMagic.autocommit=<Bool>
Current: True
Set autocommit mode
SqlMagic.autolimit=<Int>
Current: 0
Automatically limit the size of the returned result sets
SqlMagic.autopandas=<Bool>
Current: False
Return Pandas DataFrames instead of regular result sets
SqlMagic.column_local_vars=<Bool>
Current: False
Return data into local variables from column names
SqlMagic.displaycon=<Bool>
Current: False
Show connection string after execute
SqlMagic.displaylimit=<Int>
Current: None
Automatically limit the number of rows displayed (full result set is still
stored)
SqlMagic.dsn_filename=<Unicode>
Current: 'odbc.ini'
Path to DSN file. When the first argument is of the form [section], a
sqlalchemy connection string is formed from the matching section in the DSN
file.
SqlMagic.feedback=<Bool>
Current: False
Print number of rows affected by DML
SqlMagic.short_errors=<Bool>
Current: True
Don't display the full traceback on SQL Programming Error
SqlMagic.style=<Unicode>
Current: 'DEFAULT'
Set the table printing style to any of prettytable's defined styles
(currently DEFAULT, MSWORD_FRIENDLY, PLAIN_COLUMNS, RANDOM)
In[3]: %config SqlMagic.feedback = False
Please note: if you have autopandas set to true, the displaylimit option will not apply. You can set the pandas display limit by using the pandas ``max_rows`` option as described in the `pandas documentation <http://pandas.pydata.org/pandas-docs/version/0.18.1/options.html#frequently-used-options>`_.
Pandas
------
If you have installed ``pandas``, you can use a result set's
``.DataFrame()`` method
.. code-block:: python
In [3]: result = %sql SELECT * FROM character WHERE speechcount > 25
In [4]: dataframe = result.DataFrame()
The ``--persist`` argument, with the name of a
DataFrame object in memory,
will create a table name
in the database from the named DataFrame.
Or use ``--append`` to add rows to an existing
table by that name.
.. code-block:: python
In [5]: %sql --persist dataframe
In [6]: %sql SELECT * FROM dataframe;
.. _Pandas: http://pandas.pydata.org/
Graphing
--------
If you have installed ``matplotlib``, you can use a result set's
``.plot()``, ``.pie()``, and ``.bar()`` methods for quick plotting
.. code-block:: python
In[5]: result = %sql SELECT title, totalwords FROM work WHERE genretype = 'c'
In[6]: %matplotlib inline
In[7]: result.pie()
.. image:: https://raw.github.com/catherinedevlin/ipython-sql/master/examples/wordcount.png
:alt: pie chart of word count of Shakespeare's comedies
Dumping
-------
Result sets come with a ``.csv(filename=None)`` method. This generates
comma-separated text either as a return value (if ``filename`` is not
specified) or in a file of the given name.
.. code-block:: python
In[8]: result = %sql SELECT title, totalwords FROM work WHERE genretype = 'c'
In[9]: result.csv(filename='work.csv')
PostgreSQL features
-------------------
``psql``-style "backslash" `meta-commands`_ commands (``\d``, ``\dt``, etc.)
are provided by `PGSpecial`_. Example:
.. code-block:: python
In[9]: %sql \d
.. _PGSpecial: https://pypi.python.org/pypi/pgspecial
.. _meta-commands: https://www.postgresql.org/docs/9.6/static/app-psql.html#APP-PSQL-META-COMMANDS
Options
-------
``-l`` / ``--connections``
List all active connections
``-x`` / ``--close <session-name>``
Close named connection
``-c`` / ``--creator <creator-function>``
Specify creator function for new connection
``-s`` / ``--section <section-name>``
Section of dsn_file to be used for generating a connection string
``-p`` / ``--persist``
Create a table name in the database from the named DataFrame
``--append``
Like ``--persist``, but appends to the table if it already exists
``-a`` / ``--connection_arguments <"{connection arguments}">``
Specify dictionary of connection arguments to pass to SQL driver
``-f`` / ``--file <path>``
Run SQL from file at this path
Caution
-------
Comments
~~~~~~~~
Because ipyton-sql accepts ``--``-delimited options like ``--persist``, but ``--``
is also the syntax to denote a SQL comment, the parser needs to make some assumptions.
- If you try to pass an unsupported argument, like ``--lutefisk``, it will
be interpreted as a SQL comment and will not throw an unsupported argument
exception.
- If the SQL statement begins with a first-line comment that looks like one
of the accepted arguments - like ``%sql --persist is great!`` - it will be
parsed like an argument, not a comment. Moving the comment to the second
line or later will avoid this.
Installing
----------
Install the latest release with::
pip install ipython-sql
or download from https://github.com/catherinedevlin/ipython-sql and::
cd ipython-sql
sudo python setup.py install
Development
-----------
https://github.com/catherinedevlin/ipython-sql
Credits
-------
- Matthias Bussonnier for help with configuration
- Olivier Le Thanh Duong for ``%config`` fixes and improvements
- Distribute_
- Buildout_
- modern-package-template_
- Mike Wilson for bind variable code
- Thomas Kluyver and Steve Holden for debugging help
- Berton Earnshaw for DSN connection syntax
- Bruno Harbulot for DSN example
- Andrés Celis for SQL Server bugfix
- Michael Erasmus for DataFrame truth bugfix
- Noam Finkelstein for README clarification
- Xiaochuan Yu for `<<` operator, syntax colorization
- Amjith Ramanujam for PGSpecial and incorporating it here
- Alexander Maznev for better arg parsing, connections accepting specified creator
- Jonathan Larkin for configurable displaycon
- Jared Moore for ``connection-arguments`` support
- Gilbert Brault for ``--append``
- Lucas Zeer for multi-line bugfixes for var substitution, ``<<``
- vkk800 for ``--file``
- Jens Albrecht for MySQL DatabaseError bugfix
- meihkv for connection-closing bugfix
- Abhinav C for SQLAlchemy 2.0 compatibility
.. _Distribute: http://pypi.python.org/pypi/distribute
.. _Buildout: http://www.buildout.org/
.. _modern-package-template: http://pypi.python.org/pypi/modern-package-template
News
----
0.1
~~~
*Release date: 21-Mar-2013*
* Initial release
0.1.1
~~~~~
*Release date: 29-Mar-2013*
* Release to PyPI
* Results returned as lists
* print(_) to get table form in text console
* set autolimit and text wrap in configuration
0.1.2
~~~~~
*Release date: 29-Mar-2013*
* Python 3 compatibility
* use prettyprint package
* allow multiple SQL per cell
0.2.0
~~~~~
*Release date: 30-May-2013*
* Accept bind variables (Thanks Mike Wilson!)
0.2.1
~~~~~
*Release date: 15-June-2013*
* Recognize socket connection strings
* Bugfix - issue 4 (remember existing connections by case)
0.2.2
~~~~~
*Release date: 30-July-2013*
Converted from an IPython Plugin to an Extension for 1.0 compatibility
0.2.2.1
~~~~~~~
*Release date: 01-Aug-2013*
Deleted Plugin import left behind in 0.2.2
0.2.3
~~~~~
*Release date: 20-Sep-2013*
* Contributions from Olivier Le Thanh Duong:
- SQL errors reported without internal IPython error stack
- Proper handling of configuration
* Added .DataFrame(), .pie(), .plot(), and .bar() methods to
result sets
0.3.0
~~~~~
*Release date: 13-Oct-2013*
* displaylimit config parameter
* reports number of rows affected by each query
* test suite working again
* dict-style access for result sets by primary key
0.3.1
~~~~~
* Reporting of number of rows affected configurable with ``feedback``
* Local variables usable as SQL bind variables
0.3.2
~~~~~
* ``.csv(filename=None)`` method added to result sets
0.3.3
~~~~~
* Python 3 compatibility restored
* DSN access supported (thanks Berton Earnshaw)
0.3.4
~~~~~
* PERSIST pseudo-SQL command added
0.3.5
~~~~~
* Indentations visible in HTML cells
* COMMIT each SQL statement immediately - prevent locks
0.3.6
~~~~~
* Fixed issue #30, commit failures for sqlite (thanks stonebig, jandot)
0.3.7
~~~~~
* New `column_local_vars` config option submitted by darikg
* Avoid contaminating user namespace from locals (thanks alope107)
0.3.7.1
~~~~~~~
* Avoid "connection busy" error for SQL Server (thanks Andrés Celis)
0.3.8
~~~~~
* Stop warnings for deprecated use of IPython 3 traitlets in IPython 4 (thanks graphaelli; also stonebig, aebrahim, mccahill)
* README update for keeping connection info private, from eshilts
0.3.9
~~~~~
* Fix truth value of DataFrame error (thanks michael-erasmus)
* `<<` operator (thanks xiaochuanyu)
* added README example (thanks tanhuil)
* bugfix in executing column_local_vars (thanks tebeka)
* pgspecial installation optional (thanks jstoebel and arjoe)
* conceal passwords in connection strings (thanks jstoebel)
0.3.9
~~~~~
* Restored Python 2 compatibility (thanks tokenmathguy)
0.4.0
~~~~~
* Changed most non-SQL commands to argparse arguments (thanks pik)
* User can specify a creator for connections (thanks pik)
* Bogus pseudo-SQL command `PERSIST` removed, replaced with `--persist` arg
* Turn off echo of connection information with `displaycon` in config
* Consistent support for {} variables (thanks Lucas)
0.4.1
~~~~~
* Fixed .rst file location in MANIFEST.in
* Parse SQL comments in first line
* Bugfixes for DSN, `--close`, others
0.5.0
~~~~~
* Use SQLAlchemy 2.0
* Drop undocumented support for dict-style access to raw row instances