
I think it would be useful to differentiate more clearly between what is offered by Python's type system, and what is offered by Pydantic.

That is, you can approximate Rust's enum (sum type) in pure Python with some combination of Literal, Enum, Union and dataclasses. For example (more here[1]):

  from dataclasses import dataclass

  @dataclass
  class Foo: ...

  @dataclass
  class Bar: ...

  Frobulated = Foo | Bar  # the | union syntax needs Python 3.10+

Pydantic adds de/ser, but if you're not doing that then you can get very far without it. (And even if you are, there are lighter-weight options that play well with dataclasses, like cattrs, pyserde and dataclasses-json.)

[1] https://threeofwands.com/algebraic-data-types-in-python/



Yep, someone brought this up on another discussion forum. The post was intended to be explicitly about accomplishing the ser/de half as well, hence the emphasis on Pydantic :-)

(Python’s annotated types are very powerful, and you can do this and more with them if you don’t immediately need ser/de! But they also have limitations, e.g. I believe Union wasn’t allowed in isinstance checks or matching until a recent version.)


If you're looking for serialization/deserialization, you might consider confactory [1]. I created it to be a factory for objects defined in configs. It builds the Python objects with little effort from the user, relying only on type annotations (though you can define your own serializers and deserializers).

It also supports complex structures like union types, lists, etc. I used it to create cresset [2], a package that allows building PyTorch models directly from config files.

[1]: https://pypi.org/project/confactory/

[2]: https://pypi.org/project/cresset/


I think it’s quite useful to separate ser/de, structural validation, and semantic validation. This is where I struggle with a library like ruamel.yaml, which runs deserialization and structural validation together, or Pydantic, which runs structural and semantic validation together.

It’s not hard to write a Python type annotation for what you get from json.loads, and it’s also not hard to write a recursive function with a 200-line match statement that reflects on type annotations to convert that to TypedDicts, dataclasses, and so forth. But semantic validation is a whole other problem, one that tends to be so domain-specific that it’s better deferred. Not that you shouldn’t do it, but it belongs in its own data processing layer.

This also lets you be specific about what’s wrong with a piece of input. Bad JSON? A list where a dictionary was expected? An end timestamp that’s before the start? Sure, check each of these, and in context make invalid state unrepresentable, but invalid state after json.loads is very different from invalid state after validating your timestamps.
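
A rough sketch of that layering, using only the standard library. Everything here (`Interval`, `StructuralError`, `SemanticError`, `to_interval`, `check_interval`, `load_interval`) is a hypothetical name chosen for the example; the idea is just that each layer fails with its own error type.

```python
import json
from dataclasses import dataclass


class StructuralError(ValueError):
    """The JSON parsed, but its shape is wrong."""


class SemanticError(ValueError):
    """The shape is right, but the values make no sense in the domain."""


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


def to_interval(raw: object) -> Interval:
    # Structural layer: checks shape and primitive types only.
    if not isinstance(raw, dict):
        raise StructuralError(f"expected an object, got {type(raw).__name__}")
    try:
        start, end = raw["start"], raw["end"]
    except KeyError as exc:
        raise StructuralError(f"missing key: {exc.args[0]}") from exc
    for name, value in (("start", start), ("end", end)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise StructuralError(f"{name} must be a number")
    return Interval(float(start), float(end))


def check_interval(interval: Interval) -> Interval:
    # Semantic layer: domain rules, kept apart from shape checks.
    if interval.end < interval.start:
        raise SemanticError("end timestamp is before start")
    return interval


def load_interval(text: str) -> Interval:
    raw = json.loads(text)  # raises json.JSONDecodeError on bad JSON
    return check_interval(to_interval(raw))
```

A caller can then tell the three failure modes apart by exception type (json.JSONDecodeError, StructuralError, SemanticError) instead of getting one generic validation error.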


Yeah and in practise most people will probably be using Pydantic anyway. Just wanted to point out it's not strictly necessary. :)


Pydantic offers runtime checks.

Also I’d add msgspec to your list at the end. Lightweight and fast, handles validation during decoding.


Good point, but that's not always desirable. If you have strict type-checking and _aren't_ doing ser/de, it's likely not necessary (e.g. Rust doesn't do runtime type checks).


The situation we found where it's still useful is if your app supports extension-type functionality. The 3rd parties writing extensions would ideally be type-checking during development... but they might not bother. Runtime validation becomes useful at the interfaces.
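
As a sketch of what "validation at the interfaces" can look like without any library: the host checks what an extension hands back before trusting it. `Result`, `call_extension` and the sample extensions are hypothetical names, not taken from any real plugin system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    score: float


def call_extension(extension: Callable[[str], object], payload: str) -> Result:
    # The extension author may not have run a type checker, so verify here.
    out = extension(payload)
    if not isinstance(out, Result):
        raise TypeError(f"extension returned {type(out).__name__}, expected Result")
    if isinstance(out.score, bool) or not isinstance(out.score, (int, float)):
        raise TypeError("Result.score must be a number")
    return out


def good_extension(payload: str) -> Result:
    return Result(score=float(len(payload)))


def sloppy_extension(payload: str):
    return "oops"  # wrong type; only caught at runtime


print(call_extension(good_extension, "abc"))
```

Inside the host app, static checking is enough; the runtime check sits only on the boundary where untyped third-party code comes in.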


Typeguard too. The @typechecked decorator on any function or method will blow up with an error at runtime if the argument or return types do not match the annotations.


I needed to reflect Rust enums and went a bit further with that approach. All variants are wrapped in a decorated class, where the decorator automatically computes the union type and adds de/serialization hooks for `cattrs`.

    @enumclass
    class MyEnum:
        class UnitLikeVariant(Variant0Arg): ...
    
        class TupleLikeVariant(Variant2Arg[int, str]): ...
    
        @dataclass
        class StructLikeVariant:
            foo: float
            bar: int

        # The following class variable is automatically generated:
        #
        # type = UnitLikeVariant | TupleLikeVariant | StructLikeVariant

where the `VariantXArg` classes are predefined.


Fascinating, how did you get the type hint on the `type` class variable to be correct? (Or is this not visible to mypy?)


Does mypy properly validate the use of these types?

Do you have anything public that elaborates on this?


Everyone is offering their suggestions, but no one has posted about marshmallow which handles everything out of the box including serialization and de-serialization. It's the perfect balance of dataclasses, (de)serialization, and lack of useless features and umpteen hacks that libraries like Pydantic and FastAPI have.


But marshmallow doesn’t do any of the (compile-time) typing stuff.

If you don’t care about types and just want ser/de that’s great, but I think it’s clearly on topic here to care about types.


Last time I tried dataclasses-json it had no type safety whatsoever and relied on the data being correct without checking it.

It was also an order of magnitude slower than other libraries, and at the time all these libraries were much slower.


There is also https://github.com/zifeo/dataconf that relies heavily on dataclasses to represent configs.



