YouTube Spatial Audio Inverse Filter

It’s been a little while since my last Ambisonics on YouTube post, so I thought I’d share a filter I’ve made to help make YouTube Ambisonics content sound better!  As you may have noticed, the audio that comes off YouTube once your spatial, Ambisonic, audio is uploaded is quite coloured compared to the original.  This is due to the Head Related Transfer Functions used in the modelling of the system.  If the HRTFs exactly modelled your own hearing system, you wouldn’t notice this colouration, but as they won’t, you will!

In order to equalise the system, the same EQ curve just needs applying to all the Ambisonic channels equally before uploading to YouTube.  So, first, we need to find the average response of the system.  There are a few possible methods for this, but the simple approach is to pan an impulse around the listener, storing the frequency response each time.  These responses are then summed and averaged in some way (I used an RMS-type approach), giving an ‘average’ response of the system.  I then invert this response (adding delay, as it’s non-minimum phase) and decompose the resulting filter into its minimum-phase-only response for the EQ (as that’s all we’re really interested in).
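As a rough sketch of that process (not my actual scripts: the measured responses here are synthetic stand-ins, and the FFT size and regularisation floor are assumptions), the RMS averaging, inversion and minimum-phase decomposition might look like:

```python
import numpy as np

N_FFT = 1024

def rms_average(mags):
    """RMS-average a set of magnitude responses (one per pan position)."""
    return np.sqrt(np.mean(np.asarray(mags) ** 2, axis=0))

def min_phase_fir(target_mag):
    """Build a minimum-phase FIR with the given magnitude (half-spectrum),
    using the real-cepstrum folding method."""
    full = np.concatenate([target_mag, target_mag[-2:0:-1]])  # Hermitian full spectrum
    ceps = np.real(np.fft.ifft(np.log(np.maximum(full, 1e-8))))
    n = len(full)
    ceps[1:n // 2] *= 2.0       # fold the anti-causal part onto the causal part
    ceps[n // 2 + 1:] = 0.0
    return np.real(np.fft.ifft(np.exp(np.fft.fft(ceps))))

# Toy stand-in for the measured responses: a few gently coloured curves.
freqs = np.linspace(0.0, 1.0, N_FFT // 2 + 1)
measured = [1.0 + 0.3 * np.sin(2 * np.pi * (k + 1) * freqs) for k in range(8)]

avg = rms_average(measured)                  # 'average' response of the system
inverse_mag = 1.0 / np.maximum(avg, 1e-3)    # regularised inversion
eq_fir = min_phase_fir(inverse_mag)          # minimum-phase EQ filter
```

The cepstral fold keeps the magnitude untouched while discarding the excess phase, which is exactly the “minimum phase only” step described above.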

To use this in your Reaper project, load the filter into ReaVerb (as an impulse file) and apply it to all four channels of your ambiX B-format before adding it to your video and uploading to YouTube.  As long as you do the same thing to each of the four B-format channels, it won’t affect the spatial aspects of the recording, just the frequency response.  The plot below shows the frequency response of YouTube and the generated inverse filter.  Note that I’ve used my measured YouTube filters for this, so there was no data above 16kHz.  To stop this sending the inversion a little crazy, I’ve assumed the response is flat above that point (YouTube filters everything out above it anyway!)
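The key point (that identical filtering on every channel leaves the spatial decode untouched) can be sketched like this, assuming the B-format is held as a (4, n) array:

```python
import numpy as np

def eq_bformat(bformat, eq_fir):
    """Convolve all four ambiX channels with the same EQ FIR.
    Because every channel sees an identical filter, only the overall
    frequency response changes, never the inter-channel relationships
    that carry the spatial information."""
    return np.stack([np.convolve(ch, eq_fir) for ch in bformat])
```

In Reaper this is simply ReaVerb with the same impulse on each channel; the sketch above is only to show the principle.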

YouTube Frequency Response vs Inverse Filter

You can download the filter, if you’d like to try it, below.

Inverse Filter for YouTube 1st Order Spatial Audio

EDIT : Below is an inverse filter I’ve calculated from the Google Thrive impulse responses, rather than from my measured IRs.  It’s very similar, frequency-response wise, but should be a slightly better match to the algorithm YouTube is using.  The previous one also had my DAC and ADC in the chain!

Inverse Filter for YouTube Spatial Audio Created Directly from Google Thrive IRs

The differences between the two inverse filters can be seen below (again, I’m leveling off the response to account for YouTube rolling off after around 16kHz).

Measured IRs vs Thrive IRs Inverse Filters

For completeness, below are the inverse filters calculated right up to around 21.8kHz (where the anti-aliasing filters kick in in the Google Thrive data).  I can’t tell any difference, but then I can’t hear much above 16kHz, so I wouldn’t, would I!

Inverse Filter for YouTube’s Spatial Audio using Google Thrive IRs up to 21.8kHz

Below is a graph of how the higher frequencies are boosted compared to the others:

Inverse Filter up to 21.8kHz

A demo of it working is below:


Sounds in Space – Audio for Virtual Reality Animations

I’ve had a few people ask for me to share the animations from my Surround Audio for VR presentation that I delivered at Sounds in Space this year.  I’ve made a video of the powerpoint (30 seconds per slide) so everything can be viewed in context (note there’s no audio, though!).  If you weren’t at the event, it goes through both the graphics and audio processing needed to create VR content and shows the limitations, with respect to the inter-aural level (ILD) and time (ITD) differences reproduced by the Ambisonics to Binaural process at varying orders.  8th order Ambisonics does a great job reproducing both the ILD and ITD up to 4kHz.



YouTube Binaural Reaper Project

So, here’s an example (but empty) Reaper project that contains the YouTube binaural filters I measured.  You’ll need to use your preferred Ambisonics plug-ins, and I’m assuming FuMa channel ordering etc.; the channels will be remapped by a plug-in.

There is also a bundle of JS effects in the folder that you’ll need to install (instructions at : ).  These allow for:

  • Ambisonic Format Remapping (FuMa -> ambiX)
  • Ambisonic Field Rotation
  • Multi-channel Meter

YouTube have now released the official ones they use (but in individual-speaker format… not the most efficient way of doing it!), so it’ll be interesting to compare!

As described in a previous post, the ReaVerb plug-in is filtering W, X, Y and Z with a pair of HRTFs which are then simply summed to create the Left and Right feeds.

YouTube Binaural Project Template



YouTube 360 VR Ambisonics Teardown!

UPDATE : 4th May 2016 – I’ve added a video using the measured filters. This will be useful for auditioning the mixes before uploading them to YouTube.

So, I’ve been experimenting with YouTube’s Ambisonic to Binaural VR videos.  They work, sound spacious and head tracking also functions (although there seems to be some lag, compared to the video – at least on my Sony Z3), but I thought I’d have a dig around and test how they’re implementing it to see what compromises they’ve made for mobile devices (as the localisation could be sharper…)

Cut to the chase – YouTube are using short, anechoic Head Related Transfer Functions that also assume that the head is symmetrical.  Doing this means you can boil down the Ambisonics to Binaural algorithm to just four short Finite Impulse Response Filters that need convolving in real-time with the B-Format channels (W, X, Y & Z in Furse Malham/SoundField notation – I know YouTube uses ambiX, but I’m sticking with this for now!).  These optimisations are likely needed to make the algorithm work on more mobile phones.

So, how do I know this?   I put a test signal (log sine wave sweep) on each of the B-Format channels and then recorded back the stereo signals, allowing me to measure a left and right response for each of the four channels individually.  I carried this out with the phone both facing front and at +90 degrees, to check the rotation algorithm was working.  Below are the Head Related Impulse Responses (HRIRs) I got back (click for higher res) – these will also contain any filtering etc. from my phone and computer, but they’ve turned out pretty well considering I had to hold the phone still in the correct position!
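For reference, the sweep-and-deconvolve side of a measurement like this can be sketched as follows (a Farina-style exponential sweep; the band limits and length here are just example values, not the ones I used):

```python
import numpy as np

FS = 48000  # example sample rate

def exp_sweep(f1, f2, duration, fs=FS):
    """Exponential (log) sine sweep from f1 to f2 Hz."""
    t = np.arange(int(duration * fs)) / fs
    r = np.log(f2 / f1)
    return np.sin(2 * np.pi * f1 * duration / r * (np.exp(t * r / duration) - 1.0))

def inverse_filter(sweep, f1, f2, fs=FS):
    """Time-reversed sweep with a +6 dB/octave tilt, so that
    sweep convolved with inverse approximates an impulse."""
    t = np.arange(len(sweep)) / fs
    r = np.log(f2 / f1)
    return sweep[::-1] * np.exp(-t * r / (len(sweep) / fs))

# Convolving a *recording* of the sweep with the inverse filter yields the
# impulse response of whatever the sweep passed through.  Here we deconvolve
# the sweep against itself, which should give (roughly) a delayed impulse.
sweep = exp_sweep(100.0, 8000.0, 1.0)
ir = np.convolve(sweep, inverse_filter(sweep, 100.0, 8000.0))
```

In the real measurement, `sweep` is played through the system under test and the recorded output takes its place in the final convolution.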

You Tube HRIRs 0 degrees You Tube HRIRs +90 degrees

The fact that the left and right HRIRs are identical (or polarity inverted) shows that they’ve used the symmetrical head assumption, and the X and Y channels swapping between facing 0 and 90 degrees shows the rotation being carried out on the Ambisonic channel signals.  Once you’ve got these HRIRs, generating the Left and Right headphone signals in the phone is simple (below, the multiply with a circle indicates convolution).  In practice, this is likely to be carried out in the frequency domain where it’s more efficient.

L = W \otimes W_{hrir} + X \otimes X_{hrir} + Y \otimes Y_{hrir} + Z \otimes Z_{hrir}

R = W \otimes W_{hrir} + X \otimes X_{hrir} - Y \otimes Y_{hrir} + Z \otimes Z_{hrir}
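A sketch of that decode in code (assuming straightforward time-domain convolution; YouTube will be doing something equivalent, most likely in the frequency domain):

```python
import numpy as np

def ambi_to_binaural(w, x, y, z, w_h, x_h, y_h, z_h):
    """Symmetric-head 1st-order Ambisonics to binaural: four convolutions,
    reused for both ears (only the Y term flips sign for the right ear)."""
    conv = lambda sig, h: np.convolve(sig, h)
    left = conv(w, w_h) + conv(x, x_h) + conv(y, y_h) + conv(z, z_h)
    right = conv(w, w_h) + conv(x, x_h) - conv(y, y_h) + conv(z, z_h)
    return left, right
```

Note that the left and right outputs differ only by the polarity of the Y contribution, which is exactly the symmetry seen in the measured HRIRs.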

Below is also the frequency response of the four Ambisonic HRTFs where you can see YouTube cutting off the response at around 16.4kHz (again, click for higher resolution).

WXYZ Frequency Response Plot

Once the W, X, Y & Z filters were obtained, a problem emerged: although I sent log sine sweeps every 12 seconds (for a 10 second sweep), I was recording in the analogue domain (no clock sync between phone and computer), so clock differences caused the filters to be slightly mis-aligned.  This is most easily seen if a source is simulated at 90 degrees with respect to the head of the listener, as this should exhibit the greatest amplitude difference between the ears once the Ambisonics to Binaural algorithm is implemented.  The best alignment was achieved when an extra 2 sample (2/48000 of a second) delay was added for each 12 second ‘jump’.  The frequency plots for a front-facing head with a source at +90 degrees, and a +90 degree facing head with a source at 0 degrees, are shown below (so in the first plot the left ear should be loudest, and in the second plot the right ear should be loudest).  I’ve also trimmed the HRIRs to 512 samples and windowed the responses using a Hanning window.  It is likely that the actual clock difference isn’t a multiple of 1 sample, so this method isn’t quite ideal!

ILD Head 0 and Source +90 degrees ILD Head +90 and Source 0 degrees

These plots should be identical, but measurement inconsistencies can be seen between the far-ear responses to the source (as these are lower in level, they’re more prone to error); by inspection, it seems the head at +90 degrees is the slightly better capture at this point.  Also, remember that the response above 16.4kHz is not worth worrying about, as YouTube filters out frequencies above this value.

More to follow…

Ambisonics To Stereo

An aside on stereo.  I’ve not put any plots up yet, but it seems like the Ambisonics to Stereo algorithm used (on non-Android YouTube) is simply a W/Y fold-down: L = W + Y, R = W - Y.


Google should really look into using UHJ for their Ambisonics to Stereo conversion… for an example of the difference, listen to the audio on these two videos.  The first is Ambisonics to UHJ; the second is YouTube’s Ambisonics to Stereo algorithm described above (to carry out this test, DO NOT use the Android YouTube app!)

W/Y Stereo :
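Assuming it really is a plain W/Y fold-down (my guess from the measurements, not confirmed code), the whole algorithm is trivial:

```python
import numpy as np

def wy_stereo(w, y):
    """Assumed YouTube fold-down: W as mid, Y as side.
    No phase shifts, so none of UHJ's width/mono-compatibility tricks."""
    return w + y, w - y
```

Compare this with UHJ, which phase-shifts and weights W, X and Y before forming the left/right pair; hence the audible difference between the two videos above.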

How do they Sound?

Here’s a video using the extracted filters in Reaper in order to convert the 1st order Ambisonic audio to binaural.

Wider Reading:

Wiggins, B., Paterson-Stephens, I., Schillebeeckx, P. (2001) The analysis of multi-channel sound reproduction algorithms using HRTF data. 19th International AES Surround Sound Convention, Germany, p. 111-123.

Wiggins, B. (2004) An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields. PhD thesis, University of Derby, Derby, UK. p. 103.

McKeag, A., McGrath, D. (1996) Sound Field Format to Binaural Decoder with Head-Tracking. 6th Australian Regional Convention of the AES, Melbourne, Australia. 10-12 September. Preprint 4302.

McKeag, A., McGrath, D.S. (1997) Using Auralisation Techniques to Render 5.1 Surround to Binaural and Playback. 102nd AES Convention, Munich, Germany, 22-25 March. Preprint 4458.

Noisternig, M. et al. (2003) A 3D Ambisonic Based Binaural Sound Reproduction System. Proceedings of the AES 24th International Conference on Multichannel Audio, Banff, Canada.

Leitner et al. (2000) Multi-Channel Sound Reproduction System for Binaural Signals - The Ambisonic Approach. Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-00), Verona, Italy, December, p. 277-280.


Multi-channel VU Meter JS Effect for Reaper

It’s always bugged me that the VU meters in Reaper are so small, which is particularly a problem if you’re working with large numbers of channels (which, when using Higher Order Ambisonics, is common!).  So, I’ve knocked up a flexible multi-channel meter that can be made as big as you like, so it should be useful for testing and monitoring when setting up etc.

The scaling is flexible (you can specify the minimum dB value to show) and so is the time window used for both the meter and the peak hold (which is individually held per channel).  I’ve commented the code so if you don’t like the colour scheme etc. it should be a doddle for you to alter it yourself!  The file can be downloaded below:
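The core of a meter like this (a per-channel peak reading with an individually-held peak) is simple; here’s a rough Python sketch of the same idea, with the dB floor and hold time as example parameters (the actual plug-in is a Reaper JS effect):

```python
import numpy as np

MIN_DB = -60.0  # example meter floor

def to_db(level):
    """Convert a linear peak level to dB, clamped at the meter floor."""
    return np.maximum(20.0 * np.log10(np.maximum(level, 1e-12)), MIN_DB)

class PeakHoldMeter:
    """Per-channel peak meter with an individually-held peak per channel."""

    def __init__(self, n_channels, hold_blocks=30):
        self.peaks = np.full(n_channels, MIN_DB)
        self.age = np.zeros(n_channels, dtype=int)
        self.hold_blocks = hold_blocks

    def process(self, block):
        """block: (n_channels, n_samples). Returns (levels_db, held_db)."""
        levels = to_db(np.max(np.abs(block), axis=1))
        self.age += 1
        expired = self.age > self.hold_blocks        # drop stale held peaks
        self.peaks[expired] = MIN_DB
        self.age[expired] = 0
        higher = levels > self.peaks                 # capture new peaks
        self.peaks[higher] = levels[higher]
        self.age[higher] = 0
        return levels, self.peaks.copy()
```

The JS version does the drawing as well, of course; this sketch only covers the level/hold bookkeeping.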

WigWare Multi-Channel VU Meter

Instructions on how to install a JS effect in Reaper can be found at :

Note : I know this isn’t really a VU meter, it’s a peak meter.  However, whenever anyone wants to search for one, they search for a VU meter!

WigMCVUMeter Animation



YouTube, Ambisonics and VR


So, last week Google enabled head (phone!) tracked positional audio on 360 degree videos.  Ambisonics is now one of the de facto standards for VR audio.  This is a big moment!  I’ve been playing a little with some of the command line tools needed to get this to work, and also with using Google PhotoSphere pics as the video as, currently, I don’t have access to a proper 360 degree camera.  You’ll end up with something like this:

So first, the details.

Make an Ambisonic Mix

Google are using Ambisonics as the way of storing and steering the audio.  Ambisonics is a useful format for VR/gaming as, once encoded, it can still be manipulated/rotated without needing access to all the audio sources/tracks individually.  This is great for applications that need to steer the audio render in real time.  The Ambisonic signals can be decoded to loudspeakers or, in this case, to head related transfer functions (HRTFs) instead, resulting in signals ready for headphones.  Now, instead of rotating the speakers (HRTFs) when the head moves, the Ambisonic scene is rotated instead, which is a much simpler transform.  One early paper on this technique is McKeag and McGrath (1996), and I also looked at it in my PhD thesis (page 103 of Wiggins 2004).

YouTube currently utilises 1st Order Ambisonics (4 channels) with the ACN channel sequencing and SN3D normalisation scheme.  WARNING: THIS ISN’T WHAT ANY MICs OR MOST PLUG-INS OUTPUT!  However, it is a more extendable format (channel-sequence wise).  A simple, but necessary, channel remapping and gain change MUST be done before saving the 4-channel B-Format bus.  I’ve made a JS (Reaper) plug-in that does this.  It can be downloaded below (it’ll work up to 3rd Order, 16 channels, but YouTube only accepts 1st order… for now 😉).  As always, other people’s tools are available to do this, but as a learning exercise (so I can teach it at Derby!) I’ve made my own. (How to install a JS Effect)

WigAmbiRemap JS Effect
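At 1st order, the remapping the plug-in performs boils down to a channel reorder plus a +3 dB gain on W; a per-frame sketch of that conversion:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def fuma_to_ambix_1st(frame):
    """1st-order FuMa (W, X, Y, Z) -> ambiX ACN/SN3D (W, Y, Z, X).
    FuMa W carries a -3 dB weighting, so it is scaled back up by sqrt(2);
    X, Y and Z have the same scaling in both conventions at 1st order."""
    w, x, y, z = frame
    return (w * SQRT2, y, z, x)
```

Higher orders need more gain changes as well as reordering, which is why the plug-in handles up to 16 channels rather than hard-coding this 4-channel case.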

So, at this point, let’s pretend you’ve made/recorded an Ambisonic mix (perhaps after watching these videos).  You then need to put the plug-in above on the Total B-Format Bus track (or one that is routed from it, ideally, so it doesn’t mess up your speaker decode!).  You can then set the plug-in as shown below and it’ll turn your FurseMalham/SoundField B-Format into AmbiX B-Format.


This should now be rendered/saved as a 4-channel WAVE file (again, see the videos on the teaching page if you’re not sure how to do this.  I’ll make a new video covering all of this soon!)

Generate a 360 Photo

Ok, so what about a stereoscopic, 360 degree video?  Well, I don’t have a camera for these, so instead, I’ve used a jpeg made by the excellent Google Camera app on Android (a PhotoSphere).  Note that, just as this feature has become really useful, Google has taken Google Camera off the Play Store!  It can still be found on APK mirror sites, though (see here, for example), but my phone is stuck on V2.5.052 – I think V3+ is only for Nexus devices, hence why I can no longer see it on the Play Store.

So, let’s assume we now have the following:

  1. An ambiX 4 channel B-Format Wave File (ambiX.wav)
  2. A high resolution photosphere jpg (PS.jpg)

The PhotoSphere jpeg is an equirectangular mapped 360 degree image.  This is handy, as it’s the exact format that YouTube wants.  However, for it to work, you may need to reduce the resolution of the image so that it’s a maximum of 3840 x 2160 (my images are higher res than that, but YouTube wouldn’t process them!)  Details on video formats/tips can be found at this google help page.

Now, you’re gonna need a few more tools:

  1. FFMPEG – we need this to glue our video (well, picture!) to our audio
  2. Python
  3. The Google Spatial Media Python tools to then tag this video as being spherical AND Ambisonic.

Note – I used homebrew to install both Python and FFMPEG (with a load of libraries enabled) on my Mac.

Make the Video

So, now we’ve got our jpeg (say PS.jpg) and our 4 channel Wave file (say ambiX.wav) in the same directory.  If I want to glue the jpg, as a video, and the wave file together in a .MOV container (the best combination I’ve found as yet!), then type this, replacing the filenames with yours (from the command line/terminal):

ffmpeg -loop 1 -i PS.jpg -i ambiX.wav -map 1:a -map 0:v -c:a copy -channel_layout 4.0 -c:v libx264 -b:v 40000k -bufsize 40000k -shortest output.mov

NOTE: the above used to say … -channel_layout quad … however, that’s not correct and is no longer accepted by YouTube.  Thanks Dillon for the tip…

This will (in order): loop the jpg input; take the wave file as the second input; map the audio from the second input and the video from the first; copy the audio stream unchanged, labelling it as 4.0 (4 channel); encode the video using the libx264 library, setting the bitrate and how often to calculate the average bitrate; and make the output the length of the shortest input (the jpg loops for ever!)  Phew!

Tag the Video

This video now needs tagging as a spherical/Ambisonic video.  Again, from the command line/terminal, change into your Google spatial media tools folder and start the GUI:

cd <location of your spatial media tools folder>

Then, select the options as shown below and save the new video file (it’ll automatically add _injected to your file name) as a separate file:

Google Spatial Media Tools GUI

Now, you should have a second video file with _injected added to its name.  Your video is now ready to be uploaded to YouTube.  Note that the audio won’t react to head movement straight away – it can take several hours for the Google servers to process the audio, and the video may also look lower res than it’ll eventually become while YouTube processes it too!

Happy VR’ing!


Special thanks to Dillon Cower and Paul Roberts over at Google and especially Albert Leusink and the Spatial Audio in VR facebook group for tips and info on, in particular, FFMPEG usage and video container formats.



64-bit WigWare Ambisonics Plugins Now Available.

I’ve recompiled all the plug-ins, so there are now 64-bit versions of WigWare for those who use 64-bit hosts.  All audio processing is unchanged.  Several issues with the Mac graphical user interface occurred when switching to 64-bit (nothing had to change on Windows!) which I had to fix, so please let me know if there are any problems!  Downloads are on the WigWare page, or below:

Below are 2nd and 3rd order horizontal decoders for Mac (for Dan)

2nd and 3rd Order Mac Decoders


Mac VST Fix for Mavericks (and Yosemite?)

I’ve just realised that the plug-ins on the site weren’t the versions that I have fixed for Mavericks.  I had fixed them almost as soon as Mavericks was released so my students could continue using them, so apologies for not sharing!  I’ve replaced all the Mac versions on the WigWare page with the updated graphical versions (the DSP code worked fine, it was the gfx that had issues).

If anyone has any problems with these, please let me know!


Sounds in Space 2014 – Video, Pics and Feedback

Sounds in Space happened on 30th June this year and was an excellent day (the programme and details of the day can be found here).  There are always things which could be done better, and hopefully we’ve got all these noted ready for next year (fingers crossed).  If you weren’t able to make the event, then the details below may give you a glimpse of what you missed and whether you’d like to come next time!

The 27 speaker 3D surround sound setup was the best we’ve ever had, made possible with the help of recent graduates from Sound, Light and Live Event Technology and Richard and Mark, technicians in Electronics and Sound.  Alex Wardle (graduate from Music Tech and Production) also created a video of the event which can be viewed at: 

Simon Lewis, Creative Technologies Research Group member took a few pics of the day which you can find at the bottom of this post.

Survey Results

We had nearly 40 people attend the event, and I’ve managed to get 25 of them to fill in a questionnaire!

Scale is 1 – very dissatisfied 3 – neutral 5 – very satisfied.

SinS Satisfaction 2014

Please tell us what you like best about this conference?

Many mentioned the inclusion of the live demonstrations, which were a deliberate feature of this event.  Many also commented on the relaxed and welcoming atmosphere and on the range, depth and quality of the content (largely delivered by CTRG members).  The excellent keynote by Chris Pike (BBC R&D) was also mentioned.

Please tell us what you’d change to improve this conference in the future?

The main issues raised here (if any!) were that time keeping slipped right at the very end of the conference (it had been pretty much ‘to the minute’ throughout the day), which meant a few people missed the opportunity to network at the end; a break in the afternoon session would also have been welcomed, and softer chairs would have been nice!

SinS Come Again 2014

One person answered ‘no’ to the question above… you can’t please everyone 😉

What was your favourite talk/topic today?

Answers to this question were more often things like ‘I enjoyed them all’ and ‘I couldn’t choose a favourite as each had such a unique topic to offer’ but notable mentions were given for:

  • Chris Pike’s (BBC) Keynote and demo.
  • Duncan Werner’s (Derby) GASP talk and demo.
  • Iain McKenzie’s (Derby) bone conduction headset talk.
  • John Crossley’s (Derby) TiMax talk and demo.
  • Rob Lawrence’s talk and demo.
  • Adam Hill’s (Derby) low frequency localisation work.

Overall a great day with some great feedback.  Let’s hope the 5th Sounds in Space leads on to the 6th…


Rosetta Surround Performance Binaural Stream – 7.30pm 7th June

The surround sound Rosetta performance by Sigma 7 (at Derby Theatre, 7.30pm, 7th June) will be streamed live with binaural audio (wear headphones for 3D audio) at .  Multi-channel videos will also be available after the show, and there’ll also be a Sound on Sound article about the event in the future!  If you can’t make the event, the stream will be the next best thing!


Sigma 7 Performance at Derby Theatre in 2012

