Project: textparser

Text parser.

Project Details

Latest version
0.24.0
Home Page
https://github.com/eerimoq/textparser
PyPI Page
https://pypi.org/project/textparser/

Project Popularity

PageRank
0.0021227134842644125
Number of downloads
217580

About

A text parser written in the Python language.

The project has one goal: speed! See the benchmark below for more details.

Project homepage: https://github.com/eerimoq/textparser

Documentation: http://textparser.readthedocs.org/en/latest

Credits

  • Thanks to PyParsing_ for its user-friendly interface. Many of textparser's class names are taken from that project.

Installation

.. code-block:: text

   pip install textparser

Example usage

The `Hello World`_ example parses the string ``Hello, World!`` and outputs its parse tree ``['Hello', ',', 'World', '!']``.

The script:

.. code-block:: python

   import textparser
   from textparser import Sequence


   class Parser(textparser.Parser):

       def token_specs(self):
           # Each spec is (kind, regex) or (kind, name, regex).
           return [
               ('SKIP',          r'[ \r\n\t]+'),
               ('WORD',          r'\w+'),
               ('EMARK',    '!', r'!'),
               ('COMMA',    ',', r','),
               ('MISMATCH',      r'.')
           ]

       def grammar(self):
           # A word, a comma, another word and an exclamation mark.
           return Sequence('WORD', ',', 'WORD', '!')


   tree = Parser().parse('Hello, World!')

   print('Tree:', tree)

Script execution:

.. code-block:: text

   $ env PYTHONPATH=. python3 examples/hello_world.py
   Tree: ['Hello', ',', 'World', '!']
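To make the role of the token specs more concrete, the sketch below tokenizes the same string using only the standard library. It is an illustration, not textparser's own code: the ``Token`` tuple, the ``tokenize`` function and the ``ValueError`` are names assumed for this example. It combines the specs into one regular expression with a named group per token kind, drops ``SKIP`` matches, treats ``MISMATCH`` as an error, and renames kinds for three-element specs so that they line up with the names used in ``Sequence('WORD', ',', 'WORD', '!')``.

```python
import re
from collections import namedtuple

# Hypothetical token type for this sketch only.
Token = namedtuple('Token', ['kind', 'value', 'offset'])

TOKEN_SPECS = [
    ('SKIP',          r'[ \r\n\t]+'),
    ('WORD',          r'\w+'),
    ('EMARK',    '!', r'!'),
    ('COMMA',    ',', r','),
    ('MISMATCH',      r'.')
]


def tokenize(text, token_specs=TOKEN_SPECS):
    """Split text into tokens using one combined regular expression."""
    # Three-element specs rename the token kind, for example 'EMARK' to '!'.
    names = {spec[0]: spec[1] for spec in token_specs if len(spec) == 3}
    # One named group per spec; earlier specs win when several could match.
    pattern = '|'.join(
        '(?P<{}>{})'.format(spec[0], spec[-1]) for spec in token_specs)
    tokens = []

    for mo in re.finditer(pattern, text, re.DOTALL):
        kind = mo.lastgroup

        if kind == 'SKIP':
            continue

        if kind == 'MISMATCH':
            raise ValueError('Invalid syntax at offset {}.'.format(mo.start()))

        tokens.append(Token(names.get(kind, kind), mo.group(kind), mo.start()))

    return tokens


tokens = tokenize('Hello, World!')
print([token.kind for token in tokens])   # ['WORD', ',', 'WORD', '!']
print([token.value for token in tokens])  # ['Hello', ',', 'World', '!']
```

The order of the specs matters in this sketch: ``MISMATCH`` comes last so that it only catches characters that no other spec accepts.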

Benchmark

The benchmark_ below compares the speed of 10 JSON parsers, each parsing a `276 kb file`_.

.. code-block:: text

   $ env PYTHONPATH=. python3 examples/benchmarks/json/speed.py

   Parsed 'examples/benchmarks/json/data.json' 1 time(s) in:

   PACKAGE         SECONDS   RATIO  VERSION
   textparser         0.10    100%  0.21.1
   parsimonious       0.17    169%  unknown
   lark (LALR)        0.27    267%  0.7.0
   funcparserlib      0.34    340%  unknown
   textx              0.54    546%  1.8.0
   pyparsing          0.68    684%  2.4.0
   pyleri             0.88    886%  1.2.2
   parsy              0.92    925%  1.2.0
   parsita            2.28   2286%  unknown
   lark (Earley)      2.34   2348%  0.7.0

NOTE 1: The parsers are not necessarily optimized for speed. Optimizing them will likely affect the measurements.

NOTE 2: The structure of the resulting parse trees varies and additional processing may be required to make them fit the user application.

NOTE 3: Only JSON parsers are compared. Parsing other languages may give vastly different results.
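The RATIO column reads as each parser's time relative to the fastest one. The sketch below shows one way such figures could be produced. It is an assumption about the arithmetic, not the benchmark script itself: ``measure`` and ``ratios`` are hypothetical helpers, and the standard library ``json`` module stands in for the parsers under test. If the published ratios come from unrounded timings, that would explain small differences such as 169% for 0.17 s against 0.10 s.

```python
import json
import time


def measure(parse, text, iterations=1):
    """Return the wall-clock seconds taken to call parse(text) repeatedly."""
    start = time.perf_counter()

    for _ in range(iterations):
        parse(text)

    return time.perf_counter() - start


def ratios(seconds_by_package):
    """Map each package to its time as a percentage of the fastest time."""
    fastest = min(seconds_by_package.values())

    return {
        package: round(100 * seconds / fastest)
        for package, seconds in seconds_by_package.items()
    }


# Hypothetical input: a small JSON document parsed by the standard library.
text = json.dumps({'numbers': list(range(1000))})
timings = {'json (stdlib)': measure(json.loads, text, iterations=10)}
print(ratios(timings))
```

With a single entry the ratio is always 100; adding more parsers to ``timings`` yields percentages relative to the fastest of them.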

Contributing

#. Fork the repository.

#. Implement the new feature or bug fix.

#. Implement test case(s) to ensure that future changes do not break existing behavior.

#. Run the tests.

   .. code-block:: text

      python3 -m unittest

#. Create a pull request.

.. _PyParsing: https://github.com/pyparsing/pyparsing

.. _Hello World: https://github.com/eerimoq/textparser/blob/master/examples/hello_world.py

.. _benchmark: https://github.com/eerimoq/textparser/blob/master/examples/benchmarks/json/speed.py

.. _276 kb file: https://github.com/eerimoq/textparser/blob/master/examples/benchmarks/json/data.json