>...because it implies that the neural network has to understand object identity...

mnowicki · on Feb 5, 2020

They don't need to know the formulas or mathematical logic behind physics, but 5-year-olds do have a working model of physics in their head. And AI wouldn't need to know the actual formulas either, but it would need some working model that would allow it to differentiate between those.

tachyonbeam · on Feb 5, 2020

That's right. It has to understand how objects move, and the boundaries between objects, what's part of a given object and what isn't, as well as how a given object will be affected by light when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.

sangnoir · on Feb 5, 2020

> It has to understand how objects move[1], and the boundaries between objects[2], what's part of a given object and what isn't[3], as well as how a given object will be affected by light[4] when moving, shadows, etc. This is difficult to learn from data, even if you have a lot of it.

Each one of the items you've listed has been severally accomplished by ML to varying levels of success (baseline being "workable", and more progress is being made). I probably would have found better references had I spent longer than 5 minutes searching.

1. https://www.youtube.com/watch?v=AGm3hF_BlYM

2. https://www.theverge.com/2019/12/9/20999646/google-arcore-au...

3. Same as 2 :-). AR occlusion requires both. Additionally, many recent phones use ML models to fake depth-of-field; the boundary between foreground objects has to be detected.

4. Also see 2; Augmented Reality is hard. bonus: https://www.awn.com/animationworld/how-klaus-uniquely-combin...

tachyonbeam · on Feb 5, 2020

I work in deep learning research. In my opinion, pose estimation with one human centered in a frame is nowhere near the level of difficulty of constructing an accurate model of a complex scene with multiple people and moving objects of various sizes. For instance, it wouldn't be so hard to detect that there is a train in this video, but to do upscaling correctly, you need a 3D model of this specific train. Yes, there exist deep learning models that try to do 3D reconstruction, but so far, I haven't seen anything that works robustly. Combining this information with an upscaling model is another challenge. It's not as easy as just plugging components together.