Abstract:

William L. Martens. Chat Space with Spatial Audio, MMC Video on Demand System, The University of Aizu Multimedia Center, July, 2000.

Teleconverse is a sound processing technology providing enhanced spatial cueing for the audio component of a teleconferencing system. The potential benefits of this technology for teleconferencing applications are improved speech intelligibility, increased spatial awareness, and more effective segregation of simultaneous spoken messages. The research question we address is, "How useful is spatial sound cueing for teleconferencing applications?"

The objective of this project is to deploy an audio signal processing technology that controls the apparent direction and distance of a virtual sound source positioned at close range (within 1 meter of the listener's head), and to determine the benefits of these close-range spatial sound cues in typical teleconferencing applications.
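To give a rough sense of why sources within 1 meter carry distinctive cues, the following minimal sketch compares the level difference between the two ears for a source at different distances. It is an illustration under simplifying assumptions, not the Teleconverse processing itself: the ears are treated as two points 9 cm either side of the head center (`HEAD_RADIUS` is an assumed value), only 1/r spreading is modeled, and head shadowing is ignored. The function names are hypothetical.

```python
import math

HEAD_RADIUS = 0.09  # meters; assumed value, ears modeled as two points on the x axis


def ear_distances(azimuth_deg, distance):
    """Return (left, right) distances in meters from a source to the two ear points.

    Azimuth is measured from straight ahead (+y), positive toward the right ear (+x).
    Distance is measured from the head center and should exceed HEAD_RADIUS.
    """
    az = math.radians(azimuth_deg)
    sx, sy = distance * math.sin(az), distance * math.cos(az)
    left = math.hypot(sx + HEAD_RADIUS, sy)
    right = math.hypot(sx - HEAD_RADIUS, sy)
    return left, right


def level_difference_db(azimuth_deg, distance):
    """Right-minus-left level difference in dB from 1/r spreading alone (no head shadow)."""
    left, right = ear_distances(azimuth_deg, distance)
    return 20.0 * math.log10(left / right)


if __name__ == "__main__":
    # Source directly to the listener's right, at several distances.
    for d in (0.2, 0.5, 1.0):
        print(f"{d:.1f} m: {level_difference_db(90.0, d):.2f} dB")
```

Even in this simplified model, the level difference for a source directly to one side is several times larger at 0.2 m than at 1 m, which is one reason close-range rendering needs more than a simple pan and gain.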

By combining a 3D model of the receiver with a 3D model of the source and its interaction with its local environment, a very realistic simulation of sound transmission can be achieved. For example, to simulate the sound of someone whispering in the listener's ear, both the way in which sound at close range is collected by the listener's ear, and the way in which sound is emitted from the whisperer's nearby mouth, must be simulated. Furthermore, the indirect sound arriving from nearby objects, such as the desktop between source and receiver, provides additional auditory information for the listener that can serve to externalize the spatial image of the talker's voice.
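The contribution of a desktop reflection can be sketched with the standard image-source method: the source is mirrored through the desk plane, and the reflected path is treated as a straight path from that mirror image. The sketch below is only an illustration under stated assumptions, not the model used in this project: source and receiver are points, the desk is an infinite horizontal plane, spreading loss is 1/r, and `reflection_coeff` is a hypothetical frequency-independent value.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def path(source, receiver):
    """Return (delay in seconds, 1/r gain) for a straight path between two 3D points."""
    r = math.dist(source, receiver)
    return r / SPEED_OF_SOUND, 1.0 / r


def direct_and_desk_reflection(source, receiver, desk_height=0.0, reflection_coeff=0.7):
    """Direct path plus one reflection from a horizontal desk plane at z = desk_height.

    Returns ((direct_delay, direct_gain), (reflected_delay, reflected_gain)).
    """
    # Mirror the source through the desk plane to obtain the image source.
    image = (source[0], source[1], 2.0 * desk_height - source[2])
    direct = path(source, receiver)
    refl_delay, refl_gain = path(image, receiver)
    return direct, (refl_delay, reflection_coeff * refl_gain)


if __name__ == "__main__":
    # Talker's mouth and listener's ear both 0.3 m above the desk, 0.3 m apart.
    direct, reflected = direct_and_desk_reflection((0.3, 0.0, 0.3), (0.0, 0.0, 0.3))
    print("direct:   ", direct)
    print("reflected:", reflected)
```

The reflected copy always arrives later and weaker than the direct sound, and its delay relative to the direct sound changes as the source moves. This is the kind of additional information the paragraph above attributes to indirect sound.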

The rationale for incorporating such computationally intensive sound processing into an audio teleconferencing solution is that the meaningful variations in the sound field that naturally occur when people confer in a physically shared space should be made available to users of a system that renders virtual acoustical objects and events. The teleconferencing system should also augment the natural cues typically available to conference participants by capitalizing upon perceptual capacities not typically exploited during a face-to-face conference, so that the user of the teleconferencing system can enjoy advantages that might be difficult to realize in the live situation.

Application Field and Potential User

Auditory component of teleconferencing, or audio-only teleconferencing.

The potential user could be the consumer (home use) or the professional (office and conference use). Virtually all teleconferencing applications, including headphone-based and loudspeaker-based systems, could be targeted. It is assumed that the teleconferencing system of the future will be built upon networked general-purpose computers, rather than stand-alone hardware systems. This assumption is based on the observation that the dominant distribution system for the digital data streams used in modern teleconferencing applications will be the internet or some other ubiquitous network.

What will make Teleconverse technology so effective in such modern teleconferencing applications? Here is an example of the sort of naturalistic feature that can be supported using the proposed spatial sound cueing system:

If one talker chooses to confide in one listener only, then that talker can select the 'confide' function for that listener. What that listener hears when the talker's confidential message begins is a speech sound that suddenly arrives from a position near to the listener's ear. When speech is delivered at such close range, it is most immediately noticed by the listener since it originates from within the listener's personal space. Thus the contextual meaning of the message is automatically indicated by the apparent location of the speech sound source.
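The routing logic behind such a 'confide' function might look like the following minimal sketch. Everything here is hypothetical and for illustration only: the `Session` class, its method names, and the `NEAR_EAR` position are assumptions, as is the reading that other listeners do not hear the talker while the talker is confiding.

```python
from dataclasses import dataclass, field

NEAR_EAR = (0.1, 0.0, 0.0)  # hypothetical position: 10 cm to the listener's right, in meters


@dataclass
class Session:
    # positions[listener][talker] is the talker's normal virtual position for that listener.
    positions: dict
    # confiding[talker] is the single listener that talker is currently confiding in.
    confiding: dict = field(default_factory=dict)

    def confide(self, talker, listener):
        """Select the 'confide' function: talker now speaks to this listener only."""
        self.confiding[talker] = listener

    def release(self, talker):
        """Return the talker to normal conferencing."""
        self.confiding.pop(talker, None)

    def render_position(self, talker, listener):
        """Position at which to render the talker for this listener, or None to mute."""
        target = self.confiding.get(talker)
        if target is None:
            return self.positions[listener][talker]
        return NEAR_EAR if target == listener else None


if __name__ == "__main__":
    session = Session(positions={
        "bob": {"alice": (0.0, 1.0, 0.0)},
        "carol": {"alice": (0.5, 1.0, 0.0)},
    })
    session.confide("alice", "bob")
    print(session.render_position("alice", "bob"))    # near-ear position
    print(session.render_position("alice", "carol"))  # None: carol does not hear alice
```

Keeping the decision in one `render_position` lookup means the spatial renderer only needs a position (or a mute) per talker and listener pair, and the change of apparent location itself carries the message that the speech is confidential.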

Keywords:

Sound Spatialization
