Royal Holloway, University of London

Department of Physics

Mark Sugrue Research Page

 
 

 

 

 

  A History of Motion


Motion as a topic

First, a question. What is 'video' and is it different from a sequence of frames? I would suggest that this question can be understood by beginning with a slide show, where each frame is 100% different from the next (obviously not a video) and progressively reducing the inter-frame difference bit by bit until you have a collection of identical frames.

I would suggest that a collection of frames becomes a video when there is a high degree of inter-frame redundancy, i.e. when it becomes advantageous to treat it like a video.
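The thought experiment can be made concrete with a rough measure of inter-frame redundancy. The sketch below is only an illustration under stated assumptions: frames are small greyscale images held as nested lists, and the function names and the `tolerance` parameter are hypothetical choices, not taken from any of the cited work.

```python
def interframe_redundancy(frame_a, frame_b, tolerance=0):
    """Return the fraction of pixels that differ by at most `tolerance`."""
    total = 0
    unchanged = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) <= tolerance:
                unchanged += 1
    return unchanged / total if total else 0.0


def sequence_redundancy(frames, tolerance=0):
    """Average the redundancy over every pair of neighbouring frames."""
    pairs = list(zip(frames, frames[1:]))
    if not pairs:
        return 0.0
    scores = [interframe_redundancy(a, b, tolerance) for a, b in pairs]
    return sum(scores) / len(scores)
```

On this measure a slide show scores near 0 and a run of identical frames scores 1; a video in the sense used here sits close to, but below, 1.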

Computer scientists have been dealing with issues of temporal redundancy since the beginning of Computer Vision in the 1960s. However, it was not until the late 1970s that these issues began to split off into a coherent and separate topic of research. One of the key figures in this early work is J. K. Aggarwal of the University of Texas at Austin. In 1979, Aggarwal, with N. I. Badler, held a workshop on "Computer Analysis of Time-Varying Imagery". This was followed in 1980 by a Special Issue on Motion and Time-Varying Imagery of PAMI and, in 1988, by a book called Motion Understanding: Robot and Human Vision.

Any new topic goes through frequent changes of name in its early years. In his 1984 summary of the workshop, Aggarwal begins with a rundown of the multiple aliases of the field: "time-varying imagery", "image sequence processing" and "dynamic scene analysis", to which we may add more recent terms such as "spatio-temporal video analysis" (or "spatial temporal") and the illogical "image motion analysis". Two of the key sub-topics, which have remained issues to this day, are the broad and related problems of motion segmentation and dealing with object occlusion. Segmentation was defined as

  • "the process of determining features of interest together with distinguishing interesting changes from uninteresting changes, and establishing correspondences between the features and components in one image to those of the succeeding images".

This area has since been subdivided into the process of Motion Detection, or motion segmentation from static (quasi-static or noisy) backgrounds, and the Tracking or correspondence problem. A frame sequence becomes video when it contains both an interesting moving foreground and an uninteresting background. This foreground should be trackable using frame-to-frame correspondence (i.e. the frame rate is high enough). The background should be distinguishable from the foreground based on motion. Using this definition of the task at hand, we can now better describe the nature of 'video' for our purpose.

  • Video is a sequence of two-dimensional visual samples over time of a real-life event, where the sample rate is sufficiently high for the correspondence problem to be solved (e.g. the Shannon sampling theorem applies to video).

The topic developed slowly through the 1980s and early 1990s and also split into a number of application-oriented sub-topics. These include a spectrum of camera-motion scenarios, from ego-motion, through Pan-Tilt-Zoom CCTV cameras and static CCTV with shake, to an assumption of perfect stability. The topic of motion detection and tracking with a quasi-stable camera (the scenario of Visual Surveillance) developed in the 1990s. As late as 1991, Pattern Recognition ran a "Glossary of computer vision terms" which contained only three motion-related terms out of a total of 308 ("time varying imagery", "optical flow", and "structure from motion").

Motion processing in the 1980s was dominated by optical flow calculations using spatial feature detection and iterative refinement for noise and ambiguity reduction. As Aggarwal saw it in 1988, the primary paradigms at that time were what he called "feature based approaches" and "optical flow based approaches". Both methods are foreground based (i.e. they are frame-by-frame operations using instantaneous appearance information only) and differ mainly in terms of scale. By "feature based", Aggarwal means "a set of relatively sparse, but highly discriminatory, two dimensional features" which may be extracted and matched frame to frame (solving the correspondence problem) using constraints such as a rigid body assumption.

"Optical flow" relies on a very large number of less discriminatory features, usually point features, i.e. single pixel values. To counter the extreme difficulties and ambiguities of this approach, a number of constraints and assumptions are made. The most important and most primal assumption in all visual tracking is the 'object assumption'. These assumptions can be turned into a set of equations describing the allowable motion of each point. However, the simultaneous solution of these equations is costly, and so an iterative approach is usually employed. Optical flow's strength, the simplicity of its feature points, is also its fundamental weakness. As the features are based solely on point brightness, the method is extremely vulnerable to lighting change. A shift in the direction or intensity of lighting will cause apparent optical flow even when no true motion is present (Nguyen 1996).

  • "The optical flow is a velocity field in the image that transforms one image into the next image in a sequence. As such it is not uniquely determined." - Horn, 1993
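The vulnerability of brightness-based flow to lighting change can be seen in a deliberately simplified one-dimensional sketch. It assumes the brightness constancy constraint I_x * u + I_t = 0 and solves it directly at each sample; this is not the iterative two-dimensional scheme used in practice, and the function name `flow_1d` and the `min_gradient` parameter are hypothetical.

```python
def flow_1d(prev, curr, min_gradient=1e-6):
    """Estimate a velocity per interior sample from I_x * u + I_t = 0.

    Returns None where the spatial gradient is too small for the
    velocity to be determined.
    """
    flow = []
    for x in range(1, len(prev) - 1):
        # Temporal derivative: brightness change at this position.
        i_t = curr[x] - prev[x]
        # Spatial derivative: central difference averaged over both frames.
        i_x = ((prev[x + 1] - prev[x - 1]) + (curr[x + 1] - curr[x - 1])) / 4.0
        if abs(i_x) < min_gradient:
            flow.append(None)
        else:
            flow.append(-i_t / i_x)
    return flow


ramp = [2 * x for x in range(8)]            # a brightness ramp
shifted = [2 * (x - 1) for x in range(8)]   # the ramp moved one sample right
brighter = [v + 2 for v in ramp]            # no motion, lighting increased

true_motion = flow_1d(ramp, shifted)        # every estimate is 1.0
false_motion = flow_1d(ramp, brighter)      # every estimate is -1.0
```

The second call reports a velocity of one sample per frame although nothing has moved: a uniform change in intensity is indistinguishable from motion under this constraint.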

    Also during the 1980s a simple alternative to optical flow, Frame Differencing, was being explored. The technique is based on the simple and cheap method of subtracting the current frame from its neighbour and thresholding the result to produce a binary motion detection mask. Research focused on innovative and adaptive thresholding methods to counter the great difficulties of noise and partial object segmentation. Other work-arounds for noise problems included taking the difference of thresholded edge maps, as these are more robust to illumination changes; however, this made the task of determining the noise level, and thus calculating the threshold level, more difficult. Applications included visual motion alarms and change detection for video compression (still used to this day in the MPEG-4 codec).
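Frame differencing as described above can be sketched in a few lines. The code assumes greyscale frames stored as nested lists and a fixed, hand-picked threshold; the adaptive thresholding schemes from the literature are not reproduced, and the function name is hypothetical.

```python
def frame_difference_mask(prev, curr, threshold):
    """Return a binary mask: 1 where |curr - prev| exceeds the threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev, curr)
    ]


# A uniform bright object three pixels wide moves one pixel to the right.
before = [[0, 9, 9, 9, 0, 0]]
after = [[0, 0, 9, 9, 9, 0]]
mask = frame_difference_mask(before, after, threshold=4)
# mask == [[0, 1, 0, 0, 1, 0]]
```

Only the trailing and leading edges of the object are flagged, because its interior has the same brightness in both frames. This is the partial object segmentation problem mentioned above.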

    Long & Yang's influential 1990 paper set out a number of methods involving the computation of a running average of pixel values in order to achieve 'Stationary Background Generation'. The paper also addresses the issue of an imperfect background (and mentions the effect later known as the 'transient background problem') and proposes a morphological solution. They tested the method on both indoor and outdoor scenes; however, the videos were quite short (54-71 frames), and so it is likely that they did not experience many problems using the 'mean' as their statistic. Prior to this paper a number of 'frame differencing' approaches were reported, including Lee & Hsieh and Anderson. It is easy to understand why the seemingly obvious step of extending differencing to true statistical background modelling arose only in the 1990s by noting the comment from the final page of Long & Yang's paper:

    • "complete run [of 54 frames] required approximately six hours on a 3 Mbyte SUN 2/170"
    On a modern computer this process would be completed in under one second.
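A running average background can be sketched as an incremental per-pixel mean followed by a thresholded comparison. This is a minimal illustration of the general idea, not a reconstruction of Long & Yang's methods: the class name, the nested-list frame format and the fixed threshold are all assumptions made for the example.

```python
class RunningMeanBackground:
    """Per-pixel running mean of all frames seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, frame):
        """Fold one frame into the mean and return the current background."""
        self.count += 1
        if self.mean is None:
            self.mean = [[float(v) for v in row] for row in frame]
        else:
            for mean_row, row in zip(self.mean, frame):
                for x, value in enumerate(row):
                    # Incremental mean: m_n = m_(n-1) + (v - m_(n-1)) / n
                    mean_row[x] += (value - mean_row[x]) / self.count
        return self.mean

    def foreground_mask(self, frame, threshold):
        """Mark pixels that differ from the background by more than threshold."""
        if self.mean is None:
            raise ValueError("update() must be called at least once first")
        return [
            [1 if abs(v - m) > threshold else 0 for v, m in zip(row, mean_row)]
            for row, mean_row in zip(frame, self.mean)
        ]


# A bright object (200) crosses a dark background (10), one pixel per frame.
background = RunningMeanBackground()
for frame in ([[200, 10, 10, 10]], [[10, 200, 10, 10]],
              [[10, 10, 200, 10]], [[10, 10, 10, 200]]):
    background.update(frame)
```

After these four frames every background pixel sits at 57.5 rather than 10, because the passing object has contaminated the mean. With short sequences and a generous threshold this may go unnoticed, which is consistent with the remark above about the 'mean' statistic.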

    References

    Early motion work

    • J. K. Aggarwal, N. I. Badler (eds), Abstracts for the Workshop on Computer Analysis of Time-Varying Imagery, University of Pennsylvania, Moore School of Electrical Engineering, Philadelphia PA, April 1979
    • S. Ullman, "The Interpretation of Visual Motion", Cambridge MA, MIT Press, 1979
    • "Special issue on motion and time-varying imagery", IEEE Trans. Pattern Anal. Machine Intell., Vol PAMI-2, no 6, Nov 1980
    • "Image Sequence Analysis", TS Huang, New York Springer-Verlag, 1981
    • E C Hildreth "Computations underlying the measurement of visual motion" Artificial Intell. Vol 23, pp 309-334, 1984
    • Motion Understanding: Robot and Human Vision, Eds W.N. Martin, J.K Aggarwal, Kluwer, 1988
    • Aggarwal, J. K, Nandhakumar, N. "On the Computation of Motion from Sequences of Images - A Review", Invited Paper, Proc of IEEE, Vol 76, No. 8, 1988
    • Robert M. Haralick, Linda G. Shapiro, "Glossary of computer vision terms", Pattern Recognition, Volume 24 , Issue 1 (1991), Pages: 69 - 93.

    Optical flow

    • BKP Horn, BG Schunck "Determining Optical Flow" Artificial Intelligence, 1981
    • D J Fleet A D Jepson "Velocity extraction without form interpretation" proc 3rd workshop on CV Bellaire MI pp 179-185 Oct 1985
    • D J Heeger "Optical flow from spatiotemporal filters" in proc 1st int conf CV London 1987, pp 181-190 June 1987
    • B. K. P. Horn and B. G. Schunck, "Determining Optical Flow - a Retrospective", Artificial Intelligence 59 (1993), pp. 81-87
    • Hoa G. Nguyen, "Obtaining Range from Visual Motion Using Local Image Derivatives", TD 2918 July 1996, Naval Command, USA

    Frame differencing

    • R. Jain and H.-H. Nagel. On the analysis of accumulative difference pictures from image sequences of real world scenes. IEEE Trans. Pattern Analysis and Machine Intelligence, 1:206–214, 1979.
    • R. Jain. Extraction of motion information from peripheral processes. IEEE Trans. Pattern Analysis and Machine Intelligence, 3:489–504, 1981.
    • Y.Z. Hsu, H.H. Nagel, and G. Rekers. New likelihood test methods for change detection in image sequences. Computer Vision, Graphics, and Image Processing, 26:73–106, 1984.
    • C. Anderson, Peter Burt, and G. van der Wal. Change detection and tracking using pyramid transformation techniques. In Proceedings of SPIE - Intelligent Robots and Computer Vision, volume 579, pages 72–78, 1985.
    • Extraction and Matching of Multiple Moving Objects via Frame Differencing HJ Lee, CC Hsieh - Journal of Information Science and Engineering, 1988
    • S.C. Brofferio. An object-background image model for predictive video coding. IEEE Trans. Communications, 37:1391–1394, 1989.
    • I. Dinstein. A new technique for visual motion alarm. Pattern Recognition Letters, 8:347–351, 1989.
    • K. Skifstad and R. Jain. Illumination independent change detection for real world image sequences. Computer Vision, Graphics, and Image Processing, 46:387–399, 1989.
    • P.L. Rosin. Thresholding for change detection. Technical Report ISTR-97-01, Brunel University, 1997.

    Background modelling

    • Long, W., YH Yang, "Stationary Background Generation: An Alternative to the Difference of two images", Pattern Recognition, Vol 23, No. 12, pp1351-1359, 1990
    • I. Haritaoglu, Larry S. Davis, and D. Harwood. W4: Who? When? Where? What? A real time system for detecting and tracking people. In FGR98, 1998.
    • C. Wren, A. Azarbayejani, T. Darrell, and Alex Pentland. Pfinder: Real-time tracking of the human body. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):780–785, 1997.
    • D. Koller, J. Weber, T. Huang, J. Malik, G. Ogasawara, B. Rao, and S. Russell. Towards robust automatic traffic scene analysis in real-time. In Int. Conf. Pattern Recognition, pages 126-131, Jerusalem, Israel, October 1994.
    • Kentaro Toyama, John Krumm, Barry Brumitt, Brian Meyers, "Wallflower: Principles and Practice of Background Maintenance", Seventh International Conference on Computer Vision (ICCV'99), Volume 1, p. 255, 1999.

    The Prehistory of Motion

    All creatures with the ability to see rely on motion acuity to a greater or lesser extent. In humans the visual task for the organism can be described as the determination of a figure-ground relationship. It has been shown that birds have the ability to define and distinguish patterns and objects using only motion information. Although most birds seem to have high visual acuity, hawks, penguins and insectivorous birds are strictly dependent on motion cues for detecting prey at more or less great distances. Hawks can spot their small prey from distances of more than a few hundred meters. Similarly, one is always surprised to see a bird that sits on a branch of a tree suddenly fly to a particular point over a lake or pond, pick up an insect, and return. In neither case would a motionless object at such distances have been discriminated from the background. And the fact that many species that are hunted as potential prey have developed the behavior of 'freezing' as a very successful anti-predator strategy seems evidence in itself for the importance of motion information in figure-ground discrimination.

    Neurophysiological evidence tends in the same direction: it has long been known that there are cells in the pigeon visual system that respond to relative motion between objects and background (e.g., Frost & Nakayama, 1983). The ability to detect and react to motion was one of the earliest parts of biological vision to evolve. Research suggests that it actually evolved in primitive creatures before the ability to recognise objects based on appearance. How is this possible, if the correspondence problem first needs to be solved before motion can be detected?

    Further to this, a great deal of information can be gleaned from the curious symptoms of neurological diseases. Damage to the striate cortex (V1) in humans can lead to a condition known as blindsight, in which the patient is not consciously aware of a moving object in the affected visual hemifield, but can nevertheless locate it. This spared ability in the "blind" area is presumably the result of the processing ability of the collothalamic visual system. In addition, the extrastriate cortex is the site of numerous important visually-responsive areas involved in color (V4), form (IT), and motion perception (MT).

    In humans the answer lies in the dorsal region of the visual cortex. There lie several million neurons, each of which stands watch over its individual sector of the visual field (its receptive field). Each neuron behaves as a band pass filter, responding to particular speeds and directions. The filter profile is spatio-temporal.

    [Figures: biphasic Gaussian response; diagram of a neuron]

    Work on profiling the response of visual neurons, started by Hubel and Wiesel in the 1960s, has been carried out by Young and others over the past 25 years. Initially, only the spatial responses of feature detector neurons were studied, but recent work has moved on to decoding the response of motion detector neurons as well.
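A biphasic, band pass profile of the kind described above can be illustrated with the first derivative of a Gaussian. This is only a one-dimensional sketch in the spirit of the Gaussian derivative models listed under Young's publications; the sampling, the parameter values and the function names are assumptions, and no measured neural data is reproduced. The same profile could in principle be applied along the time axis as well as along a spatial axis.

```python
import math


def gaussian_derivative_kernel(sigma, radius):
    """Sample d/dx of exp(-x^2 / (2 sigma^2)) at offsets -radius..radius."""
    kernel = []
    for x in range(-radius, radius + 1):
        gaussian = math.exp(-(x * x) / (2.0 * sigma * sigma))
        kernel.append(-x / (sigma * sigma) * gaussian)
    return kernel


def filter_response(signal, kernel):
    """Convolve signal with kernel, keeping only fully overlapping positions."""
    radius = len(kernel) // 2
    response = []
    for centre in range(radius, len(signal) - radius):
        total = 0.0
        for k in range(-radius, radius + 1):
            total += kernel[k + radius] * signal[centre - k]
        response.append(total)
    return response


kernel = gaussian_derivative_kernel(sigma=1.0, radius=3)
flat = filter_response([7] * 20, kernel)             # uniform input
edge = filter_response([0] * 10 + [1] * 10, kernel)  # a step change
```

The kernel has one positive and one negative lobe and sums to approximately zero, so a uniform input produces no response while a step change produces a localised peak. That is the band pass behaviour in miniature.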

    • "Just as local measurement of wavelength provides ambiguous information about object color, so local measurement of motion in early visual areas provides ambiguous information about object motion." - Farah, 2000
    As Farah points out, the solution to both is the incorporation of global image information. It is a reformulation of the classic aperture problem.

    Young's publications

    • Young, R. A. (1978). Orthogonal basis functions for form vision derived from eigenvector analysis, in: ARVO Abstracts, p. 22. Sarasota, FL, Association for Research in Vision and Ophthalmology, Abstract.
    • Young, R. A. (1985a). The Gaussian derivative model for machine and biological image processing, in: Proceedings of the Conference of the Society of Photographic Scientists and Engineers, pp. 64–70. Springfield, VA. SPSE.
    • Young, R. A. (1985b). Gaussian derivative model for machine vision, J. Opt. Soc. Amer. A 2, 39. Abstract.
    • Young, R. A. (1985c). Gaussian derivative model of spatial vision, J. Opt. Soc. Amer. A 2, 102. Abstract.
    • Young, R. A. (1985d). The Gaussian derivative theory of spatial vision: Analysis of cortical cell receptive field line-weighting profiles. Technical Report GMR-4920, General Motors Research Laboratories, Computer Science Dept., Warren, MI 48090.
    • Young, R. A. (1986a). The Gaussian derivative model for machine vision: Visual cortex simulation. Technical Report GMR-5323, General Motors Research Laboratories, Warren, MI 48090-9055.
    • Young, R. A. (1986b). Simulation of human retinal function with the Gaussian derivative model, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 564–569, Miami, FL.
    • Young, R. A. (1987). The Gaussian derivative model for spatial vision: I. Retinal mechanisms, Spatial Vision 2 (4), 273–293.
    • Young, R. A. (1989). Quantitative tests of cortical visual receptive field models, in: Abstract, OSA Meetings, Orlando, Florida. Optical Society of America.
    • Young, R. A. (1991). Oh say can you see? The physiology of vision, in: Human Vision, Visual Processing, and Digital Display II, Rogowitz, B. E., Brill, M. H. and Allebach, J. P. (Eds), number 1453 in Proc. SPIE, pp. 92–123, San Jose, California. SPIE—The International Society for Optical Engineering. Invited address.
    • Young, R. A. and Lesperance, R. M. (1993). A physiological model of motion analysis for machine vision, in: Human Vision, Visual Processing, and Digital Display, Rogowitz, B. E. and Allebach, J. P. (Eds), number 1913 in Proc. SPIE, pp. 48–123, San Jose, CA. SPIE—The International Society for Optical Engineering. Invited address.
    • Young, R. A., Lesperance, R. M. and Meyer, W. W. (2001). The Gaussian derivative model for spatial-temporal vision: I. Cortical model, Spatial Vision 14 (3,4), 261–319.
    • Young, R. A., Lesperance, R. M. and Meyer, W. W. (2001). The Gaussian derivative model for spatial-temporal vision: II. Cortical data, Spatial Vision 14 (3,4), 321–389.

    Other References

    • Frost, B. J., & Nakayama, K. (1983). Single visual neurons code opposing motion independent of direction. Science, 220, 744-745.
    • Hubel, DH, Wiesel, TN, J. Neurophysiol, 26, 994-1002 (1963)

    Books

    • M J Farah, The Cognitive Neuroscience of Vision, Blackwell, 2000
    • Robert G. Cook, Avian Visual Cognition, Comparative Cognition Press, 2001

    Motion Detection in Computers

    There are two primary and opposite approaches to the motion detection problem which are commonly used in the literature: the statistical background modelling approach (the Background approach) and the wide class of model-based feature and object tracking techniques (the Foreground approach).

    Background Modelling