I think it would be useful to differentiate more clearly between what is offered by Python's type system, and what is offered by Pydantic.
That is, you can approximate Rust's enum (a sum type) in pure Python using some combination of Literal, Enum, Union and dataclasses. For example (more here[1]):
from dataclasses import dataclass

@dataclass
class Foo: ...

@dataclass
class Bar: ...

Frobulated = Foo | Bar  # sum type: a Frobulated is either a Foo or a Bar
Pydantic adds de/ser, but if you're not doing that, you can get very far without it. (And even if you are, there are lighter-weight options that work with dataclasses, such as cattrs, pyserde, and dataclasses-json.)
Yep, someone brought this up on another discussion forum. The post was intended to be explicitly about accomplishing the ser/de half as well, hence the emphasis on Pydantic :-)
(Python’s annotated types are very powerful, and you can do this and more with them if you don’t immediately need ser/de! But they also have limitations, e.g. I believe Union wasn’t allowed in isinstance checks or matching until a recent version.)
If you're looking for serialization/deserialization, you might consider confactory [1]. I created it as a factory for objects defined in configs. It builds the Python objects with little effort from the user, relying simply on type annotations (though you can define your own serializers and deserializers).
It also supports complex structures like union types, lists, etc. I used it to create cresset [2], a package that allows building PyTorch models directly from config files.
I think it’s quite useful to separate ser/de, structural validation, and semantic validation. This is where I struggle with a library like ruamel.yaml, which runs deserialization and structural validation together, or Pydantic, which runs structural and semantic validation together.

It’s not hard to write a Python type annotation for what you get from json.loads, and it’s also not hard to write a recursive function with a 200-line match statement that reflects on type annotations to convert that into TypedDicts, dataclasses, and so forth. But semantic validation is a whole other problem, one that tends to be so domain-specific it’s better deferred. Not that you shouldn’t do it, but it belongs in its own data-processing layer.

This also lets you be specific about what’s wrong with a piece of input. Bad JSON? A list where a dictionary was expected? An end timestamp that’s before the start? Sure, check each of these, and in context make invalid state unrepresentable, but invalid state after json.loads is very different from invalid state after validating your timestamps.
Good point, but that's not always desirable. If you have strict type-checking and _aren't_ doing ser/de, it's likely not necessary (e.g., Rust doesn't do runtime type checks).
The situation where we found it's still useful is when your app supports extensions. The third parties writing extensions would ideally be type-checking during development... but they might not bother. Runtime validation becomes useful at the interfaces.
I needed to reflect Rust enums and went a bit further with that approach. All variants are wrapped in a decorated class, where the decorator automatically computes the union type and adds de/serialization hooks for `cattrs`.
@enumclass
class MyEnum:
    class UnitLikeVariant(Variant0Arg): ...
    class TupleLikeVariant(Variant2Arg[int, str]): ...

    @dataclass
    class StructLikeVariant:
        foo: float
        bar: int

    # The following class variable is automatically generated:
    #
    # type = UnitLikeVariant | TupleLikeVariant | StructLikeVariant
Everyone is offering their suggestions, but no one has mentioned marshmallow, which handles everything out of the box, including serialization and deserialization. It's the perfect balance of dataclasses, (de)serialization, and lack of the useless features and umpteen hacks that libraries like Pydantic and FastAPI have.
[1] https://threeofwands.com/algebraic-data-types-in-python/