Converting Pydantic classes to Spark schemas
This library can convert a Pydantic class to a Spark schema or generate Python code from a Spark schema.
pip install pydantic-spark
import json
from typing import Optional
from pydantic_spark.base import SparkBase
class TestModel(SparkBase):
    key1: str
    key2: int
    key3: Optional[str]
schema_dict: dict = TestModel.spark_schema()
print(json.dumps(schema_dict))
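The resulting dictionary follows Spark's StructType JSON layout. As an illustration only (the exact type name chosen for int and the nullability flags may differ between library versions), the output for the model above looks roughly like this:

{"type": "struct", "fields": [
  {"name": "key1", "type": "string", "nullable": false, "metadata": {}},
  {"name": "key2", "type": "long", "nullable": false, "metadata": {}},
  {"name": "key3", "type": "string", "nullable": true, "metadata": {}}]}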
Pydantic-spark provides a coerce_type option that allows type coercion.
When applied to a field, pydantic-spark converts the column's data type to the specified coercion type.
import json
from pydantic import Field
from pydantic_spark.base import SparkBase, CoerceType
class TestModel(SparkBase):
    key1: str = Field(extra_json_schema={"coerce_type": CoerceType.integer})
schema_dict: dict = TestModel.spark_schema()
print(json.dumps(schema_dict))
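Because the schema dict uses Spark's StructType JSON representation, it can typically be rebuilt into a StructType and passed to a reader. The snippet below is a sketch, assuming a running SparkSession and that the dict matches Spark's StructType JSON format; "data.json" is a placeholder path, not part of this library:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

# Rebuild a StructType from the dict produced by spark_schema()
struct_schema = StructType.fromJson(TestModel.spark_schema())

# Use the generated schema when reading data; "data.json" is a placeholder
df = spark.read.schema(struct_schema).json("data.json")
df.printSchema()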
For development, install the project and its dependencies with Poetry:

poetry install

Run the unit tests:

pytest
coverage run -m pytest # with coverage
# or (depends on your local env)
poetry run pytest
poetry run coverage run -m pytest # with coverage
Linting is checked in the GitHub workflow. To fix and review issues, run:
black . # Auto fix all issues
isort . # Auto fix all issues
pflake8 . # Only display issues, fixing is manual