Qualitative Scene Descriptions from Images for Integrated Speech and Image Understanding (1999)
Abstract
Human-computer interaction through means of communication that are natural to humans, such as spoken instructions or gestures, has always been a challenging task. In this thesis, we address the subproblem of fusing the understanding of spoken instructions with the visual perception of the environment. We describe the design and implementation of a high-level computer vision component for the integrated speech and image understanding system QUASI-ACE. QUASI-ACE is a prototype of a 'situated artificial communicator', a system that aims to interact with humans in a natural way within a specific scenario or situation. Our domain is a toy assembly scenario. Based on results from the image understanding component, which visually observes the scene, QUASI-ACE is able to identify the objects referred to in spoken instructions given by a human instructor. High-level image understanding is accomplished by first reconstructing the 3D scene from uncalibrated stereo images. We use a model-based ...