I think it would be useful to differentiate more clearly between what is offered...

woodruffw · on Aug 14, 2024

Yep, someone brought this up on another discussion forum. The post was intended to be explicitly about accomplishing the ser/de half as well, hence the emphasis on Pydantic :-)

(Python’s annotated types are very powerful, and you can do this and more with them if you don’t immediately need ser/de! But they also have limitations, e.g. I believe Union wasn’t allowed in isinstance checks or matching until a recent version.)

nsagent · on Aug 14, 2024

If you're looking for serialization/deserialization, you might consider confactory [1]. I created it to be a factory for objects defined in configs. It actually builds the Python objects without much effort from the user. It simply makes use of type annotations (though you can define your own serializers and deserializers).

It also supports complex structures like union types, lists, etc. I used it to create cresset [2], a package that allows building PyTorch models directly from config files.

[1]: https://pypi.org/project/confactory/
[2]: https://pypi.org/project/cresset/
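You can sketch the general idea of building objects from config data by reflecting on type annotations using only the standard library. The `build` helper below is hypothetical and is not confactory's API. It handles only nested dataclasses and `list[T]` (Python 3.9+), and it passes leaf values through without validating them.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, get_args, get_origin, get_type_hints


def build(cls: Any, data: Any) -> Any:
    """Hypothetical sketch: recursively turn plain config data into typed objects."""
    if get_origin(cls) is list:
        # list[T]: build each element as a T.
        (item_type,) = get_args(cls)
        return [build(item_type, item) for item in data]
    if is_dataclass(cls):
        # Dataclass: resolve field annotations and build each field from the matching key.
        hints = get_type_hints(cls)
        kwargs = {
            f.name: build(hints[f.name], data[f.name])
            for f in fields(cls)
            if f.init and f.name in data
        }
        return cls(**kwargs)
    # Leaf values (int, str, float, ...) are passed through unchanged, without validation.
    return data


@dataclass
class Layer:
    kind: str
    size: int


@dataclass
class Model:
    name: str
    layers: list[Layer]


config = {
    "name": "mlp",
    "layers": [{"kind": "linear", "size": 128}, {"kind": "relu", "size": 128}],
}
model = build(Model, config)
print(model.layers[0])  # Layer(kind='linear', size=128)
```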

sevensor · on Aug 17, 2024

I think it’s quite useful to separate ser/de, structural validation, and semantic validation. This is where I struggle with a library like ruamel.yaml, which runs deserialization and structural validation together, or Pydantic, which runs structural and semantic validation together. It’s not hard to write a Python type annotation for what you get from json.loads, and it’s also not hard to write a recursive function with a 200-line match statement that reflects on type annotations to convert that into TypedDicts, dataclasses, and so forth.

But semantic validation is a whole other problem, one that tends to be so domain-specific it’s better deferred. Not that you shouldn’t do it, but it belongs in its own data processing layer. This also lets you be specific about what’s wrong with a piece of input. Bad JSON? A list where a dictionary was expected? An end timestamp that’s before the start? Sure, check each of these, and in context make invalid state unrepresentable, but invalid state after json.loads is very different from invalid state after validating your timestamps.
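The three layers can be kept apart using nothing but the standard library. This is a minimal sketch: `Interval`, the two exception classes, and the three functions are hypothetical names chosen for illustration. Each layer fails with its own error, so the caller can tell bad JSON, a wrong shape, and an end-before-start timestamp apart.

```python
import json
from dataclasses import dataclass
from datetime import datetime


class StructureError(Exception):
    """Input parsed, but does not have the expected shape (hypothetical name)."""


class SemanticError(Exception):
    """Input has the right shape but violates a domain rule (hypothetical name)."""


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: datetime


def parse(text: str) -> object:
    # Layer 1, ser/de: malformed JSON fails here with json.JSONDecodeError.
    return json.loads(text)


def to_interval(raw: object) -> Interval:
    # Layer 2, structural: checks shape and types, says nothing about meaning.
    if not isinstance(raw, dict):
        raise StructureError(f"expected a JSON object, got {type(raw).__name__}")
    try:
        start = datetime.fromisoformat(raw["start"])
        end = datetime.fromisoformat(raw["end"])
    except (KeyError, TypeError, ValueError) as exc:
        raise StructureError(f"bad or missing timestamp: {exc!r}") from exc
    return Interval(start, end)


def check_interval(interval: Interval) -> Interval:
    # Layer 3, semantic: domain rules, kept in their own step.
    if interval.end < interval.start:
        raise SemanticError("end timestamp is before start timestamp")
    return interval


samples = ['{"start": ', "[1, 2]", '{"start": "2024-08-17T10:00", "end": "2024-08-17T09:00"}']
for text in samples:
    try:
        check_interval(to_interval(parse(text)))
    except (json.JSONDecodeError, StructureError, SemanticError) as exc:
        # Prints JSONDecodeError, StructureError, SemanticError on separate lines.
        print(type(exc).__name__)
```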

carderne · on Aug 14, 2024

Yeah, and in practice most people will probably be using Pydantic anyway. Just wanted to point out that it's not strictly necessary. :)

intalentive · on Aug 14, 2024

Pydantic offers runtime checks.

Also I’d add msgspec to your list at the end. Lightweight and fast, handles validation during decoding.

carderne · on Aug 14, 2024

Good point, but that's not always desirable. If you have strict type-checking and _aren't_ doing ser/de, it's likely not necessary (e.g. Rust doesn't do runtime type checks).

nsteel · on Aug 15, 2024

One situation where we found it still useful is when your app supports extensions. The third parties writing extensions would ideally be type-checking during development... but they might not bother. Runtime validation becomes useful at the interfaces.
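Here is a minimal sketch of validating at such an interface. It assumes a hypothetical extension point where plugins return a list of tag strings. `TagExtension` and `run_extension` are illustrative names, not any particular framework's API.

```python
from typing import Callable

# Hypothetical extension point: third-party code supplies a function that
# is supposed to return a list of tag strings for a piece of text.
TagExtension = Callable[[str], list[str]]


def run_extension(ext: TagExtension, text: str) -> list[str]:
    result = ext(text)
    # The annotation is only a promise: the extension's author may never have
    # run a type checker, so verify the value where it enters our code.
    if not isinstance(result, list) or not all(isinstance(tag, str) for tag in result):
        name = getattr(ext, "__name__", repr(ext))
        raise TypeError(f"extension {name} returned {result!r}, expected list[str]")
    return result


print(run_extension(str.split, "runtime checks at the boundary"))
# ['runtime', 'checks', 'at', 'the', 'boundary']
```

A misbehaving extension such as `len` (which returns an `int`) is rejected at the boundary with a `TypeError` that names it, rather than failing somewhere deeper in the application.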

mejutoco · on Aug 14, 2024

Typeguard too. Its @typechecked decorator, applied to any function or method, will raise an error at runtime if the argument or return types do not match the annotations.
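To make the idea concrete, here is a much-simplified, hypothetical decorator that performs the same kind of check using `inspect` and `typing.get_type_hints`. It is not typeguard's implementation; the real library covers far more annotation forms, such as unions and generics. This sketch only checks parameters and return values annotated with plain classes.

```python
import functools
import inspect
from typing import get_origin, get_type_hints


def typechecked_sketch(func):
    """Hypothetical, much-simplified stand-in for typeguard's @typechecked.

    Only annotations that are plain classes (int, str, a dataclass, ...) are
    checked; generics like list[int], unions, and *args/**kwargs are skipped.
    """
    hints = get_type_hints(func)
    sig = inspect.signature(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    def check(name, value):
        expected = hints.get(name)
        # get_origin() filters out parameterized generics such as list[int].
        if isinstance(expected, type) and get_origin(expected) is None:
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if sig.parameters[name].kind not in variadic:
                check(name, value)
        result = func(*args, **kwargs)
        check("return", result)
        return result

    return wrapper


@typechecked_sketch
def repeat(word: str, times: int) -> str:
    return word * times


print(repeat("ab", 2))  # abab
# repeat("ab", "2") raises TypeError: times must be int, got str
```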

binarycoffee · on Aug 15, 2024

I needed to reflect Rust enums and went a bit further with that approach. All variants are wrapped in a decorated class, where the decorator automatically computes the union type and adds de/serialization hooks for `cattrs`.

    from dataclasses import dataclass
    
    @enumclass
    class MyEnum:
        class UnitLikeVariant(Variant0Arg): ...
    
        class TupleLikeVariant(Variant2Arg[int, str]): ...
    
        @dataclass
        class StructLikeVariant:
            foo: float
            bar: int

        # The following class variable is automatically generated:
        #
        # type = UnitLikeVariant | TupleLikeVariant | StructLikeVariant

where the `VariantXArg` classes are predefined.

Austizzle · on Aug 15, 2024

Fascinating! How did you get the type hint on the `type` class variable to be correct? (Or is this not visible to mypy?)

bmitc · on Aug 15, 2024

Does mypy properly validate the use of these types?

Do you have anything public that elaborates on this?

zo1 · on Aug 15, 2024

Everyone is offering their suggestions, but no one has posted about marshmallow, which handles everything out of the box, including serialization and deserialization. It's the perfect balance of dataclasses, (de)serialization, and a lack of the useless features and umpteen hacks that libraries like Pydantic and FastAPI have.

carderne · on Aug 16, 2024

But marshmallow doesn’t do any of the (compile-time) typing stuff.

If you don’t care about types and just want ser/de that’s great, but I think it’s clearly on topic here to care about types.

LtWorf · on Aug 15, 2024

Last time I tried dataclasses-json, it had no type safety whatsoever and relied on the data being correct without checking it.

It was also an order of magnitude slower than other libraries, and that was back when all of these libraries were much slower than they are now.
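Plain standard-library dataclasses show why a library built on them can end up unchecked: annotations are not enforced at runtime, so wrongly typed data passes straight through unless something validates it. Below is a minimal sketch (not dataclasses-json itself; `Point` and `CheckedPoint` are illustrative). It uses a hand-written `__post_init__` check as one way to close the gap.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


# Annotations are not enforced at runtime, so this constructs without complaint.
unchecked = Point(x="1", y=None)
print(unchecked)  # Point(x='1', y=None)


@dataclass
class CheckedPoint:
    x: int
    y: int

    def __post_init__(self) -> None:
        # Hand-written runtime check: the step a validating library would automate.
        for name in ("x", "y"):
            value = getattr(self, name)
            if not isinstance(value, int):
                raise TypeError(f"{name} must be int, got {type(value).__name__}")


print(CheckedPoint(1, 2))  # CheckedPoint(x=1, y=2)
# CheckedPoint("1", 2) raises TypeError: x must be int, got str
```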

d0mine · on Aug 15, 2024

There is also dataconf (https://github.com/zifeo/dataconf), which relies heavily on dataclasses to represent configs.