Microsoft released spatial audio for Teams to improve communication and reduce meeting fatigue in audio and video conferences. Spatial audio aims to mimic an in-person conversation by spatially separating the voices of individual meeting participants, which results in a more natural listening experience.
Creating realistic and engaging audio and video experiences that simulate dynamic real-world scenarios comes with challenges. For example, we rely on binaural hearing (that is, we use both ears) to help identify and distinguish the sources of sounds in the physical world. However, most audio and video communication applications today provide monophonic audio where speech signals from different participants are transmitted in a single audio channel, thus stripping away valuable spatial context our minds may be anticipating.
Teams Spatial Audio aligns the perceived audio location of each participant with their video representation to make it easier for users to track who is speaking, to understand better when multiple speakers are speaking at the same time, and to lower meeting fatigue and cognitive load. The following demo showcases the difference spatial audio can make (a stereo headset provides the best experience):
If you’re curious about the science behind the benefits of binaural hearing, dive into the well-known study of the “Cocktail Party Effect” (Cherry, 1953), which outlines the brain’s ability to focus auditory attention on a single speech stimulus while filtering out other sources of sound. The TL;DR version is that two ears (and separate channels) help the listener process speech significantly more efficiently than a single channel does. Subsequent studies have shown that separating participants spatially can reduce the effect of sounds “masking” each other and may ultimately improve listener comprehension and memory (Litovsky, 2012). Recent research on video conferences shows that participants listening via loudspeakers and participants listening through headsets prefer different levels of spatial separation, with loudspeaker listeners preferring greater “horizontal separation” than headset listeners (Hyrkas et al., 2023).
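To make the idea of placing each voice where its video tile appears more concrete, here is a minimal sketch of constant-power stereo panning. It is an illustration only and not how Teams renders spatial audio: the function names, the normalized position `x`, and the `spread` parameter (a stand-in for the amount of horizontal separation discussed above) are all assumptions made for this example.

```python
import math

def pan_gains(x, spread=1.0):
    """Return (left, right) gains for a voice at horizontal position x.

    x is a hypothetical normalized tile position: 0.0 is the left edge of
    the video grid, 1.0 the right edge. spread in [0, 1] scales how far
    voices are pushed away from the center (0.0 collapses to mono-like
    center placement, 1.0 uses the full left-to-right range).
    """
    x = min(max(x, 0.0), 1.0)
    spread = min(max(spread, 0.0), 1.0)
    pan = 0.5 + (x - 0.5) * spread      # pull the position toward the center
    angle = pan * math.pi / 2           # map [0, 1] onto a quarter circle
    # cos^2 + sin^2 = 1, so perceived loudness stays roughly constant
    return math.cos(angle), math.sin(angle)

def spatialize(mono, x, spread=1.0):
    """Turn a list of mono samples into (left, right) sample pairs."""
    left, right = pan_gains(x, spread)
    return [(s * left, s * right) for s in mono]
```

With this sketch, a participant on the far left (`x = 0.0`) is heard only in the left channel, one in the middle is heard equally in both, and lowering `spread` narrows the separation, which is the kind of knob that loudspeaker and headset listeners might set differently.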
Teams Spatial Audio is generally available on desktop applications and can be enabled under Settings -> Devices. Please note that you will need a stereo-capable device such as a wired headset or a stereo-capable laptop. Bluetooth devices are currently not supported due to protocol limitations. Next-generation LE Audio devices with stereo-enabled Bluetooth will be supported (check details here for known limitations).
Try it out during your next meeting (or cocktail party) and discover how much easier it is to track who is speaking and to follow the flow of the conversation. While we cannot guarantee it will turn your next meeting into a posh event with fancy beverages and riveting small talk, we think you will find that Spatial Audio feels more natural and engaging.
Supported devices and known limitations:
- Currently we support wired headsets for spatial audio. They can be wired USB headsets or headsets connected to the computer audio jack. Some wireless headsets that connect to the computer via a USB dongle and are known to support stereo playback during a call are also supported.
- We also support stereo open speakers (built-in or external speakers).
- Native Bluetooth devices do not support stereo during a call; therefore, spatial audio is not available on them. Devices that support the new Bluetooth standard, LE Audio, may support stereo in calls. For these devices, spatial audio will be supported.
- When a conference call has more than 100 users, some users who are typically in listening mode are moved to a satellite server. Currently, spatial audio is not supported for users on a satellite server. When such users speak, they are typically moved back to the central media server, and spatial audio may become available. In future releases, spatial audio will be supported for all users.
- Users can turn on music mode while receiving spatial audio. In this case, they send audio in music mode (32 kHz sampling and 128 kbps); however, they cannot receive music mode while spatial audio is enabled. To receive music mode, the user needs to turn off spatial audio. Future releases will support receiving music mode with spatial audio.
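The device and server rules in the list above can be summarized as a small decision function. This is only a sketch of the rules as stated in this post, not a Teams API: the `Output` enum, the `spatial_audio_available` function, and the `on_satellite_server` flag are hypothetical names introduced for illustration.

```python
from enum import Enum, auto

class Output(Enum):
    """Hypothetical categories for the playback device in use."""
    WIRED_HEADSET = auto()        # USB or audio-jack headset
    USB_DONGLE_STEREO = auto()    # wireless headset known to play stereo in calls
    STEREO_SPEAKERS = auto()      # built-in or external open speakers
    BLUETOOTH_CLASSIC = auto()    # native Bluetooth: no stereo during a call
    BLUETOOTH_LE_AUDIO = auto()   # assumed here to support stereo in calls

# Devices that can play stereo during a call, per the list above.
STEREO_IN_CALL = {
    Output.WIRED_HEADSET,
    Output.USB_DONGLE_STEREO,
    Output.STEREO_SPEAKERS,
    Output.BLUETOOTH_LE_AUDIO,
}

def spatial_audio_available(output, on_satellite_server=False):
    """Return True if spatial audio can be used under the stated rules."""
    if on_satellite_server:
        # Listeners moved to a satellite server in large calls
        # currently do not get spatial audio.
        return False
    return output in STEREO_IN_CALL
```

For example, `spatial_audio_available(Output.BLUETOOTH_CLASSIC)` is false, while a wired headset qualifies unless the user has been moved to a satellite server.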
Impact to Live Interpretation users:
- In Live Interpretation mode, when spatial audio is turned on, main floor audio and interpreter audio will be heard at the same volume from different directions depending on the video location of the main floor speaker and the interpreter. To go back to traditional main floor audio ducking, simply disable spatial audio. Future releases will enable spatial audio for live interpretation where volume control for main floor and interpreter audio will be available.
References:
- Cherry, E. Colin. “Some experiments on the recognition of speech, with one and with two ears.” The Journal of the Acoustical Society of America 25.5 (1953): 975-979.
- Litovsky, Ruth Y. “Spatial release from masking.” Acoust. Today 8.2 (2012): 18-25.
- Hyrkas, Jeremy, et al. “Spatialized Audio and Hybrid Video Conferencing: Where Should Voices be Positioned for People in the Room and Remote Headset Users?” CHI ’23, April 23–28, 2023, Hamburg, Germany.