The Human Visual System (HVS) contains a separate pathway for processing motion information. This pathway is formed of millions of neurons, each sensitive to the motion of local features and each giving a bandpass response, tuned to particular speeds and directions of that motion.
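To make the idea of a bandpass, direction-tuned unit concrete, here is a minimal Python sketch of one such unit and a small bank of them. It assumes a log-Gaussian speed tuning curve and a von Mises direction tuning curve. These are common modelling choices, not necessarily the neuron models used in this work, and `motion_unit_response`, `best_unit` and all parameter values are hypothetical.

```python
import math


def motion_unit_response(speed, direction, pref_speed=4.0, pref_dir=0.0,
                         speed_bw=0.8, dir_kappa=2.0, offset=0.3):
    """Response in [0, 1] of one hypothetical motion-sensitive unit.

    Speed tuning is log-Gaussian, so the response is bandpass: it peaks at
    pref_speed and falls off for both slower and faster motion.
    Direction tuning is a von Mises curve peaked at pref_dir (radians).
    All parameter values are illustrative assumptions.
    """
    if speed <= 0:
        return 0.0  # a static feature drives no response
    q = math.log((speed + offset) / (pref_speed + offset))
    speed_gain = math.exp(-q * q / (2 * speed_bw ** 2))
    # Normalised so the gain is exactly 1 at the preferred direction.
    dir_gain = math.exp(dir_kappa * (math.cos(direction - pref_dir) - 1))
    return speed_gain * dir_gain


def best_unit(speed, direction, speeds=(1, 2, 4, 8, 16), n_dirs=8):
    """Return (pref_speed, pref_dir) of the most active unit in a small bank."""
    best, best_r = None, -1.0
    for s in speeds:
        for k in range(n_dirs):
            d = 2 * math.pi * k / n_dirs
            r = motion_unit_response(speed, direction, pref_speed=s, pref_dir=d)
            if r > best_r:
                best, best_r = (s, d), r
    return best
```

Reading out which unit in the bank responds most strongly gives a coarse estimate of the local speed and direction, which is one simple way a population of narrowly tuned units can encode motion.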
Watch this short clip of a typical traffic scene. What stands out most in this video?
Immediately, your eye picks out the moving cars and the pedestrians. More than this, you can instantly tell these two types of moving object apart, not just from their appearance but from their motion alone. If we process this video using mathematical models of typical neuron responses, we get the same 'jumping-out' effect. Notice that the motion profiles of the vehicles and the pedestrians are very different. This is because cars are rigid objects, while people deform and change shape in order to move. We can use this difference to let computers distinguish between different types of object quickly and cheaply.
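The rigid-versus-deforming distinction can be illustrated with a simple statistic computed over the motion vectors inside one object. The sketch below is a swapped-in heuristic, not the method described here. It assumes a rigid object such as a car mostly translates across the image, so its motion vectors are nearly identical, whereas a walking person's torso, arms and legs move at different velocities. `flow_spread`, `classify_motion` and the 0.3 threshold are hypothetical.

```python
import math


def flow_spread(vectors):
    """RMS deviation of (dx, dy) motion vectors from their mean, divided by
    the mean vector magnitude.

    Returns 0.0 for a pure translation (all vectors identical) and grows as
    parts of the object move with different velocities.
    """
    n = len(vectors)
    mx = sum(dx for dx, _ in vectors) / n
    my = sum(dy for _, dy in vectors) / n
    mean_mag = sum(math.hypot(dx, dy) for dx, dy in vectors) / n
    if mean_mag == 0.0:
        return 0.0  # nothing in the region is moving
    rms_dev = math.sqrt(
        sum((dx - mx) ** 2 + (dy - my) ** 2 for dx, dy in vectors) / n)
    return rms_dev / mean_mag


def classify_motion(vectors, threshold=0.3):
    # threshold is a hypothetical tuning value, not taken from the source
    return "rigid" if flow_spread(vectors) < threshold else "deforming"
```

For example, ten identical vectors `(3.0, 0.0)` sampled from a car give a spread of 0 and are classed as rigid, while a mix such as torso `(1, 0)`, a swinging leg `(3, 0)`, a planted leg `(-1, 0)` and arms `(2, 0.5)` and `(0, -0.5)` gives a spread of roughly 0.93 and is classed as deforming. A rotating rigid object would also show spread, so this heuristic relies on the translation assumption.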
We can also use this motion field to detect and track objects. Using a logical scheme, arbitrary objects (people, cars, or anything else) can be tracked in the presence of noise and occlusions. No motion models, background modelling, Kalman filtering, particle filtering, or other complex techniques are required.
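The logical tracking scheme itself is not described here, so the following is only an illustrative stand-in and not the author's method. It thresholds a motion-magnitude map into connected blobs (detection), then links blobs across frames by bounding-box overlap, letting a track coast for a few frames when its object vanishes, as during a brief occlusion. Like the approach described above, it uses no motion model, background model or filtering, but `detect_blobs`, `Tracker`, `max_missed` and the threshold are all hypothetical.

```python
from collections import deque


def detect_blobs(mag, thresh):
    """Inclusive bounding boxes (r0, c0, r1, c1) of 4-connected regions
    where the motion magnitude exceeds thresh."""
    rows, cols = len(mag), len(mag[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if mag[r][c] > thresh and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                r0 = r1 = r
                c0 = c1 = c
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    r0, r1 = min(r0, y), max(r1, y)
                    c0, c1 = min(c0, x), max(c1, x)
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and mag[ny][nx] > thresh):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes


def overlaps(a, b):
    """True if two inclusive bounding boxes share at least one cell."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


class Tracker:
    """Greedy overlap tracker; a track survives up to max_missed unmatched frames."""

    def __init__(self, max_missed=3):
        self.max_missed = max_missed
        self.tracks = {}  # track id -> [last box, consecutive missed frames]
        self.next_id = 0

    def update(self, boxes):
        unmatched = list(boxes)
        for tid, (box, missed) in list(self.tracks.items()):
            match = next((b for b in unmatched if overlaps(box, b)), None)
            if match is not None:
                unmatched.remove(match)
                self.tracks[tid] = [match, 0]
            elif missed + 1 > self.max_missed:
                del self.tracks[tid]  # gone too long: drop the track
            else:
                self.tracks[tid] = [box, missed + 1]  # coast through occlusion
        for b in unmatched:  # unexplained blobs start new tracks
            self.tracks[self.next_id] = [b, 0]
            self.next_id += 1
        return {tid: box for tid, (box, _) in self.tracks.items()}
```

Keeping a coasting track at its last box is the simplest way to bridge short occlusions; it only works if the object reappears overlapping the place where it vanished.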