Video containers (or the stream itself) should really carry a "center of frame" coordinate for each frame and let playback devices crop based on that center. Maybe this already exists in some standards and I'm unaware of it.
For digital files, ideally, the file contains only active pixels, so the frame size of the file is the frame size of the image. However, for video formats with a clearly defined standard (TV/Broadcast/DVD/Blu-ray/etc.), the video frame size must be what the spec defines. This is why mattes exist: they fill in the unused area so the content fits the required frame size.
If some people want to crop video to fit different screen aspect ratios, it would be nice for the frame to advertise its "center" so the playback device can apply the appropriate cropping and matting.
Maybe I have my player set to "3:2 full screen, up to 30% crop". It's not something I would do personally, but it would be better than pan and scan being baked into the encode itself.
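To make the idea concrete, here is a rough Python sketch of what a player could do with such a setting. Everything in it is hypothetical: `Crop`, `crop_for_display`, and the parameters are names made up for illustration, the per-frame center is assumed to arrive as pixel coordinates in some metadata field, and "up to 30% crop" is interpreted as "discard at most 30% of the width or height". It is not taken from any existing standard or player.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Crop:
    x: int       # left edge of the crop window, in source pixels
    y: int       # top edge of the crop window, in source pixels
    width: int
    height: int


def crop_for_display(frame_w, frame_h, center_x, center_y,
                     target_aspect, max_crop):
    """Pick a crop window around the advertised center (hypothetical metadata).

    target_aspect and max_crop are Fractions, e.g. Fraction(3, 2) and
    Fraction(3, 10). Only one dimension is ever cropped. If the crop limit
    is hit before the target aspect is reached, the window stays wider or
    taller than the screen and the player mattes the remainder.
    """
    source_aspect = Fraction(frame_w, frame_h)
    crop_w, crop_h = Fraction(frame_w), Fraction(frame_h)

    if source_aspect > target_aspect:
        # Source is wider than the screen: trim the sides, within the limit.
        crop_w = max(frame_h * target_aspect, frame_w * (1 - max_crop))
    elif source_aspect < target_aspect:
        # Source is taller than the screen: trim top and bottom.
        crop_h = max(frame_w / target_aspect, frame_h * (1 - max_crop))

    crop_w, crop_h = round(crop_w), round(crop_h)

    # Center the window on the advertised point, then clamp it to the frame.
    x = min(max(round(center_x - Fraction(crop_w, 2)), 0), frame_w - crop_w)
    y = min(max(round(center_y - Fraction(crop_h, 2)), 0), frame_h - crop_h)
    return Crop(x, y, crop_w, crop_h)


# 16:9 frame on a 3:2 screen, center of interest pushed to the right.
print(crop_for_display(1920, 1080, 1200, 540, Fraction(3, 2), Fraction(3, 10)))
# Crop(x=300, y=0, width=1620, height=1080)
```

With a 1920x800 scope frame and the same settings, the 30% limit kicks in first: the window comes out 1344 pixels wide instead of the 1200 needed for 3:2, and the player would matte the rest. The point is that the encode stays untouched and the decision lives entirely in the player.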