Most spatial sound authoring systems are closed and do not allow users to develop content for different backend configurations. A multiple-audio-window system [Cohen93] gives each user a visual, egocentric view of a scene and allows real-time interaction and sound-object editing based on direct manipulation and cut-and-paste metaphors.
In some shared virtual environments (e.g., AlphaWorld) [Waters-Barrus97], users not only explore and meet; they also extend and build the space they inhabit. Building such a space, which is part of a larger system, includes sound. The constraints in such groupware applications are even tighter: the social infrastructure must not be damaged by the creation of areas that become inaccessible because of resource load on either the server or the client side.
Another important topic for authoring auditory scenes is the temporal relationship between sounds. This is addressed in [Darvishi97], which employs a 2D graph-based or textual interface.
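To make the idea of authoring temporal relationships concrete, the following sketch derives absolute start times from pairwise "starts after the end of" relations between sounds. The sound names, durations, and relation format are illustrative assumptions, not taken from [Darvishi97].

```python
# Hypothetical sketch of temporal relations between sounds.
# Durations are in seconds; all names are invented for illustration.
sounds = {"door": 1.5, "steps": 3.0, "creak": 0.8}

# Each rule (later, anchor, gap) means:
# "later starts `gap` seconds after `anchor` ends".
relations = [
    ("steps", "door", 0.5),
    ("creak", "steps", 0.0),
]

def schedule(sounds, relations, root, start=0.0):
    """Resolve relations into absolute start times, beginning at `root`."""
    times = {root: start}
    pending = list(relations)
    while pending:
        for rule in pending:
            later, anchor, gap = rule
            if anchor in times:  # anchor already scheduled: place `later`
                times[later] = times[anchor] + sounds[anchor] + gap
                pending.remove(rule)
                break
        else:
            raise ValueError("unresolvable relation(s)")
    return times

print(schedule(sounds, relations, root="door"))
# → {'door': 0.0, 'steps': 2.0, 'creak': 5.0}
```

A graph-based editor like the one described above would let the author draw such relations as edges between sound nodes; the scheduler then reduces them to a concrete timeline.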