A wrapper around the stdlib `tokenize` which roundtrips.
The stdlib `tokenize` module does not properly roundtrip. This wrapper
around the stdlib provides two additional tokens `ESCAPED_NL` and
`UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and
`tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on the python tokenization.
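A quick roundtrip looks like this (a minimal sketch):

```python
from tokenize_rt import src_to_tokens, tokens_to_src

src = 'x = 5  # a comment\n'
tokens = src_to_tokens(src)
assert tokens_to_src(tokens) == src  # lossless roundtrip
```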
## Installation

```bash
pip install tokenize-rt
```
## Usage

### datastructures

#### `tokenize_rt.Offset(line=None, utf8_byte_offset=None)`

A token offset, useful as a key when cross-referencing the `ast` and the
tokenized source.
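For example, a rewriting tool can match `ast` node positions against token offsets (a minimal sketch; the lookup built here is illustrative, not part of the library):

```python
import ast

from tokenize_rt import Offset, src_to_tokens

src = 'x = 1\ny = 2\n'

# collect the offsets of every ast.Name node
# (ast's col_offset is a utf8 byte offset, matching Offset)
tree = ast.parse(src)
name_offsets = {
    Offset(node.lineno, node.col_offset)
    for node in ast.walk(tree)
    if isinstance(node, ast.Name)
}

# find the tokens at those positions
for token in src_to_tokens(src):
    if token.offset in name_offsets:
        print(token)
```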
#### `tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)`

Construct a token.

- `name`: one of the token names listed in `token.tok_name`, or `ESCAPED_NL`, or `UNIMPORTANT_WS`
- `src`: the token's source as text
- `line`: the line number that this token appears on
- `utf8_byte_offset`: the utf8 byte offset at which this token appears in the line

#### `tokenize_rt.Token.offset`

Retrieves an `Offset` for this token.
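A quick look at the data type (a minimal sketch, assuming the constructor signature documented above):

```python
from tokenize_rt import Token

token = Token('NAME', 'foo', line=1, utf8_byte_offset=0)
print(token.name, repr(token.src))  # NAME 'foo'
print(token.offset)                 # Offset(line=1, utf8_byte_offset=0)
```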
### converting to and from `Token` representations

- `tokenize_rt.src_to_tokens(text: str) -> List[Token]`
- `tokenize_rt.tokens_to_src(Iterable[Token]) -> str`

### additional tokens added by `tokenize-rt`

- `tokenize_rt.ESCAPED_NL`
- `tokenize_rt.UNIMPORTANT_WS`

### helpers

#### `tokenize_rt.NON_CODING_TOKENS`

A `frozenset` containing tokens which may appear between others while not
affecting control flow or code:

- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
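A common use is skipping formatting-only tokens while scanning, for example walking left to the previous "real" token (a minimal sketch; `prev_coding_token` is a hypothetical helper, not part of the library):

```python
from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

def prev_coding_token(tokens, i):
    """hypothetical helper: walk left past comments / NL / whitespace"""
    j = i - 1
    while tokens[j].name in NON_CODING_TOKENS:
        j -= 1
    return tokens[j]

tokens = src_to_tokens('x = (1 +  # comment\n     2)\n')
i = next(i for i, tok in enumerate(tokens) if tok.src == '2')
print(prev_coding_token(tokens, i).src)  # prints: +
```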
#### `tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]`

parse a string literal into its prefix and string content

```python
>>> parse_string_literal('f"foo"')
('f', '"foo"')
```
#### `tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]`

yields `(index, token)` pairs.  Useful for rewriting source.
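Iterating in reverse means insertions and deletions don't invalidate the indices you haven't visited yet. A minimal sketch that negates every numeric literal (assuming the `Token` constructor documented above):

```python
from tokenize_rt import Token, reversed_enumerate, src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 1\ny = 2\n')

# rewrite in reverse so earlier indices stay valid after each insertion
for i, token in reversed_enumerate(tokens):
    if token.name == 'NUMBER':
        tokens.insert(i, Token('OP', '-'))

print(tokens_to_src(tokens))
# x = -1
# y = -2
```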
#### `tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]`

find the indices of the string parts of a (joined) string literal

- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals

```python
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
```
## Differences from `tokenize`

- `tokenize-rt` adds `ESCAPED_NL` for a backslash-escaped newline "token" (see the example after this list)
- `tokenize-rt` adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- `tokenize-rt` normalizes string prefixes, even if they are not parsed -- for
  instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in
  python 2.
- `tokenize-rt` normalizes python 2 long literals (`4l` / `4L`) and octal
  literals (`0755`) in python 3 (for easier rewriting of python 2 code while
  running python 3).
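To see the extra tokens, tokenize a line continuation (a minimal sketch; the exact token stream may vary slightly between versions):

```python
from tokenize_rt import src_to_tokens

src = 'x = \\\n    5\n'
for token in src_to_tokens(src):
    print(token.name, repr(token.src))

# among the printed tokens you should see an ESCAPED_NL for the '\\\n' and
# UNIMPORTANT_WS entries for the spaces -- the stdlib tokenize discards both
```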