>...because it implies that the neural network has to understand object identity and movement, which requires understanding physics, and what a train or a person is to begin with.
I agree with most of what you said, but disagree with the need to know underlying fundamentals: most 5-year-olds can throw and catch a ball without needing to understand calculus or Newton's laws.
They don't need to know the formulas or mathematical logic behind physics, but 5-year-olds do have a working model of physics in their heads. An AI wouldn't need to know the actual formulas either, but it would need some working model that would allow it to differentiate between those.
That's right. It has to understand how objects move, and the boundaries between objects, what's part of a given object and what isn't, as well as how a given object will be affected by light when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.
> It has to understand how objects move[1], and the boundaries between objects[2], what's part of a given object and what isn't[3], as well as how a given object will be affected by light[4] when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.
Each one of the items you've listed has been accomplished individually by ML, with varying levels of success (the baseline being "workable"), and more progress is being made. I probably would have found better references had I spent longer than 5 minutes searching.
3. Same as 2 :-). AR occlusion requires both. Additionally, many recent phones use ML models to fake depth-of-field; the boundary between foreground objects and the background has to be detected.
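To make the depth-of-field point concrete: once some model has produced a foreground mask, the "fake" part is just compositing a sharp foreground over a blurred copy of the image. Below is a minimal sketch that assumes the mask already exists (the ML segmentation step is not shown). The names `box_blur` and `fake_depth_of_field` are hypothetical, and the uniform box blur is a simplification chosen for illustration, not how any particular phone does it.

```python
def box_blur(img, radius):
    """Blur a 2D grayscale image (list of rows) with a (2r+1)x(2r+1) box filter.

    Near the edges, only the in-bounds neighbours are averaged.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += img[yy][xx]
                        count += 1
            out[y][x] = total / count
    return out


def fake_depth_of_field(img, fg_mask, radius=2):
    """Keep foreground pixels sharp and replace the background with a blurred copy.

    fg_mask has the same shape as img, with values in [0, 1]: 1 = foreground,
    0 = background. Fractional values blend the two along the object boundary,
    which is why an accurate boundary matters so much for the final look.
    """
    blurred = box_blur(img, radius)
    return [
        [m * p + (1 - m) * b for p, b, m in zip(row, brow, mrow)]
        for row, brow, mrow in zip(img, blurred, fg_mask)
    ]
```

For example, with a single bright pixel marked as foreground, that pixel stays at its original value while its background neighbours pick up some of its brightness through the blur. Any error in the mask shows up as exactly the kind of halo or smeared edge that makes these effects look fake.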
I work in deep learning research. In my opinion, pose estimation with one human centered in a frame is nowhere near the level of difficulty of constructing an accurate model of a complex scene with multiple people and moving objects of various sizes. For instance, it wouldn't be so hard to detect that there is a train in this video, but to do upscaling correctly, you need a 3D model of this specific train. Yes, there exist deep learning models that try to do 3D reconstruction, but so far I haven't seen anything that works robustly. Combining this information with an upscaling model is another challenge. It's not as easy as just plugging components together.