Project: price-parser

Extract price and currency from a raw string

Project Details

Latest version
0.3.4
Home Page
https://github.com/scrapinghub/price-parser
PyPI Page
https://pypi.org/project/price-parser/

Project Popularity

PageRank
0.0015528850023319469
Number of downloads
46950

============ price-parser

.. image:: https://img.shields.io/pypi/v/price-parser.svg :target: https://pypi.python.org/pypi/price-parser :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/price-parser.svg :target: https://pypi.python.org/pypi/price-parser :alt: Supported Python Versions

.. image:: https://travis-ci.org/scrapinghub/price-parser.svg?branch=master :target: https://travis-ci.org/scrapinghub/price-parser :alt: Build Status

.. image:: https://codecov.io/github/scrapinghub/price-parser/coverage.svg?branch=master :target: https://codecov.io/gh/scrapinghub/price-parser :alt: Coverage report

price-parser is a small library for extracting price and currency from raw text strings.

Features:

  • robust price amount and currency symbol extraction
  • zero-effort handling of thousand and decimal separators

The main use case is parsing prices extracted from web pages. For example, you can write a CSS/XPath selector which targets an element with a price, and then use this library for cleaning it up, instead of writing custom site-specific regex or Python code.

License is BSD 3-clause.

Installation

::

pip install price-parser

price-parser requires Python 3.6+.

Usage

Basic usage

from price_parser import Price price = Price.fromstring("22,90 €") price Price(amount=Decimal('22.90'), currency='€') price.amount # numeric price amount Decimal('22.90') price.currency # currency symbol, as appears in the string '€' price.amount_text # price amount, as appears in the string '22,90' price.amount_float # price amount as float, not Decimal 22.9

If you prefer, Price.fromstring has an alias price_parser.parse_price, they do the same:

from price_parser import parse_price parse_price("22,90 €") Price(amount=Decimal('22.90'), currency='€')

The library has extensive tests (900+ real-world examples of price strings). Some of the supported cases are described below.

Supported cases

Unclean price strings with various currencies are supported; thousand separators and decimal separators are handled:

Price.fromstring("Price: $119.00") Price(amount=Decimal('119.00'), currency='$')

Price.fromstring("15 130 Р") Price(amount=Decimal('15130'), currency='Р')

Price.fromstring("151,200 تومان") Price(amount=Decimal('151200'), currency='تومان')

Price.fromstring("Rp 1.550.000") Price(amount=Decimal('1550000'), currency='Rp')

Price.fromstring("Běžná cena 75 990,00 Kč") Price(amount=Decimal('75990.00'), currency='Kč')

Euro sign is used as a decimal separator in a wild:

Price.fromstring("1,235€ 99") Price(amount=Decimal('1235.99'), currency='€')

Price.fromstring("99 € 95 €") Price(amount=Decimal('99'), currency='€')

Price.fromstring("35€ 999") Price(amount=Decimal('35'), currency='€')

Some special cases are handled:

Price.fromstring("Free") Price(amount=Decimal('0'), currency=None)

When price or currency can't be extracted, corresponding attribute values are set to None:

Price.fromstring("") Price(amount=None, currency=None)

Price.fromstring("Foo") Price(amount=None, currency=None)

Price.fromstring("50% OFF") Price(amount=None, currency=None)

Price.fromstring("50") Price(amount=Decimal('50'), currency=None)

Price.fromstring("R$") Price(amount=None, currency='R$')

Currency hints

currency_hint argument allows to pass a text string which may (or may not) contain currency information. This feature is most useful for automated price extraction.

Price.fromstring("34.99", currency_hint="руб. (шт)") Price(amount=Decimal('34.99'), currency='руб.')

Note that currency mentioned in the main price string may be preferred over currency specified in currency_hint argument; it depends on currency symbols found there. If you know the correct currency, you can set it directly:

price = Price.fromstring("1 000") price.currency = 'EUR' price Price(amount=Decimal('1000'), currency='EUR')

Decimal separator

If you know which symbol is used as a decimal separator in the input string, pass that symbol in the decimal_separator argument to prevent price-parser from guessing the wrong decimal separator symbol.

Price.fromstring("Price: $140.600", decimal_separator=".") Price(amount=Decimal('140.600'), currency='$')

Price.fromstring("Price: $140.600", decimal_separator=",") Price(amount=Decimal('140600'), currency='$')

Contributing

  • Source code: https://github.com/scrapinghub/price-parser
  • Issue tracker: https://github.com/scrapinghub/price-parser/issues

Use tox_ to run tests with different Python versions::

tox

The command above also runs type checks; we use mypy.

.. _tox: https://tox.readthedocs.io

Changes

0.3.4 (2020-11-25)

  • Improved parsing of prices without digits before a decimal point ('.75'), https://github.com/scrapinghub/price-parser/pull/42
  • Fix parsing of price with non-breaking spaces https://github.com/scrapinghub/price-parser/pull/43

0.3.3 (2020-02-05)

  • Fixed installation issue on some Windows machines.

0.3.2 (2020-01-28)

  • Improved Korean and Japanese currency detection.
  • Declare Python 3.8 support.

0.3.1 (2019-10-21)

  • Redundant $ signs are no longer returned as a part of currency, e.g. for SGD$ 100 currency would be SGD, not SGD$.

0.3.0 (2019-10-19)

  • New Price.fromstring argument decimal_separator allows to override decimal separator for the cases where it is known (i.e. disable decimal separator detection);
  • NTD and RBM unofficial currency names are added;
  • quantifiers in regular expressions are made non-greedy, which provides a small speedup;
  • test improvements.

0.2.4 (2019-07-03)

  • Declare price-parser as providing type annotations (pep-561). This enables better type checking for projects using price-parser.
  • improved test coverage

0.2.3 (2019-06-18)

  • Follow-up for 0.2.2 release: improved parsing of prices with 4+ digits after a decimal separator.

0.2.2 (2019-06-18)

  • Fixed parsing of prices with 4+ digits after a decimal separator.

0.2.1 (2019-04-19)

  • 23 additional currency symbols are added;
  • A$ alias for Australian Dollar is added.

0.2 (2019-04-12)

Added support for currencies replaced by euro.

0.1.1 (2019-04-12)

Minor packaging fixes.

0.1 (2019-04-12)

Initial release.