Teleconverse
is a sound processing technology providing enhanced
spatial cueing for the audio component of a teleconferencing system.
The potential benefits of this technology for teleconferencing applications
are improved speech intelligibility, increased spatial awareness, and
more effective segregation of simultaneous spoken messages.
The research question we address is, "How useful is spatial sound
cueing for teleconferencing applications?"
The objective of this project is to effectively deploy an audio signal
processing technology that controls the apparent direction and
distance of a virtual sound source positioned at close range
(within 1 meter of the listener's head), and to determine the benefits of
these close-range spatial sound cues in typical teleconferencing applications.
By combining a 3D model of the receiver with a 3D model of the source and its
interaction with its local environment, a
very realistic simulation of sound transmission can be achieved. For
example, to simulate the sound of someone whispering in the listener's ear,
both the way in which sound at close range is collected by the listener's ear,
and the way in which sound is emitted from the whisperer's nearby mouth, must
be simulated. Furthermore, the indirect sound arriving from nearby objects,
such as the desktop between source and receiver, provides additional
auditory information for the listener that can serve to externalize the spatial image
of the talker's voice.
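The contribution of a desktop reflection can be illustrated with a small geometric sketch. The following is not the Teleconverse implementation; it is a minimal image-source calculation that assumes the desk is a flat, perfectly reflecting plane and that level falls off as 1/r. The function name desk_reflection and its parameters are hypothetical, chosen only for this illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def desk_reflection(source, receiver, desk_height=0.0):
    """Return (extra_delay_s, relative_gain) for one desk reflection.

    source and receiver are (x, y, z) positions in meters, both above
    the desk plane z = desk_height. The desk is modeled as an ideal
    mirror (an assumption), so the reflected path length equals the
    straight-line distance from the mirrored source to the receiver.
    """
    sx, sy, sz = source
    if sz < desk_height or receiver[2] < desk_height:
        raise ValueError("source and receiver must be above the desk")

    # Mirror the source through the desk plane to get the image source.
    image = (sx, sy, 2.0 * desk_height - sz)

    direct = math.dist(source, receiver)
    reflected = math.dist(image, receiver)
    if direct == 0.0:
        raise ValueError("source and receiver must not coincide")

    # Delay of the reflection relative to the direct sound.
    extra_delay = (reflected - direct) / SPEED_OF_SOUND
    # Level of the reflection relative to the direct sound (1/r spreading).
    relative_gain = direct / reflected
    return extra_delay, relative_gain
```

For a talker's mouth and a listener's ear both 0.3 m above the desk and 0.4 m apart, desk_reflection((0.0, 0.0, 0.3), (0.4, 0.0, 0.3)) gives the delay and level of the reflection relative to the direct sound. A real renderer would also need the frequency-dependent absorption of the desk surface, which this sketch ignores.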
The rationale for incorporating such computationally intensive sound
processing into an audio teleconferencing solution is that the meaningful
variations in the sound field that naturally occur when people confer in
a physically shared space should be made available to users of a system
that renders virtual acoustical objects and events.
The teleconferencing system should also augment the natural cues typically
available to conference participants by capitalizing upon perceptual
capacities not typically available during a face-to-face conference,
so that the user of the teleconferencing system can enjoy advantages
that might be difficult to realize in the live situation.
Application Field and Potential User
The application field is the auditory component of teleconferencing, or
audio-only teleconferencing.
The potential user could be the consumer (home use) or the professional
(office and conference use). Virtually all teleconferencing applications,
including headphone-based and loudspeaker-based systems, could be targeted.
It is assumed that the teleconferencing system of the future will be built
upon networked general-purpose computers, rather than stand-alone hardware
systems. This assumption is based on the observation that the dominant
distribution system for the digital data streams used in modern teleconferencing
applications will be the internet or some other ubiquitous network.
What will make Teleconverse technology so effective in such modern
teleconferencing applications? Here is an example of the sort of naturalistic
feature that can be supported using the proposed spatial sound cueing system:
If one talker chooses to confide in one listener only, then that talker
can select the 'confide' function for that listener. What that listener
hears when the talker's confidential message begins is a speech sound
that suddenly arrives from a position near to the listener's ear.
When speech is delivered at such close range, it is most immediately
noticed by the listener since it originates from within the listener's
personal space. Thus the contextual meaning of the message is automatically
indicated by the apparent location of the speech sound source.
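The 'confide' behavior described above can be sketched as a small routing table. This is an illustrative sketch only, not the Teleconverse design: the names Placement, Conference, and CONFIDE_PLACEMENT are hypothetical, the close-range position (0.15 m, beside one ear) is an assumed value within the 1 meter range mentioned earlier, and the rule that other listeners receive nothing from a confiding talker is an assumption drawn from the phrase "one listener only".

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Placement:
    """Apparent position of a talker relative to a listener's head."""
    azimuth_deg: float
    distance_m: float


# Assumed close-range position used while confiding (hypothetical value).
CONFIDE_PLACEMENT = Placement(azimuth_deg=90.0, distance_m=0.15)


@dataclass
class Conference:
    # Default placement at which each talker is rendered for listeners.
    seats: Dict[str, Placement]
    # Active confidences: talker name -> the single chosen listener.
    confiding: Dict[str, str] = field(default_factory=dict)

    def confide(self, talker: str, listener: str) -> None:
        """Route the talker's speech to one listener at close range."""
        if talker == listener:
            raise ValueError("a talker cannot confide in themselves")
        if talker not in self.seats or listener not in self.seats:
            raise KeyError("unknown participant")
        self.confiding[talker] = listener

    def release(self, talker: str) -> None:
        """Return the talker to normal, shared rendering."""
        self.confiding.pop(talker, None)

    def placement_for(self, talker: str, listener: str) -> Optional[Placement]:
        """Where the listener hears the talker, or None if not audible."""
        if talker == listener:
            return None  # participants do not hear themselves
        target = self.confiding.get(talker)
        if target is None:
            return self.seats[talker]   # normal conference rendering
        if target == listener:
            return CONFIDE_PLACEMENT    # sudden close-range arrival
        return None                     # assumed: others hear nothing
```

A renderer would call placement_for once per talker and listener pair each time the routing changes, then pass the resulting direction and distance to whatever spatialization stage is in use.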
Sound Spatialization