So, last week Google enabled head-tracked (well, phone-tracked!) positional audio on 360 degree videos, making Ambisonics one of the de facto standards for VR audio. This is a big moment! I’ve been playing a little with some of the command line tools needed to get this to work, and also with using Google PhotoSphere pics as the video since, currently, I don’t have access to a proper 360 degree camera. You’ll end up with something like this:
So first, the details.
Make an Ambisonic Mix
Google are using Ambisonics as the way of storing and steering the audio. Ambisonics is a useful format for VR/gaming as, once encoded, the scene can still be manipulated/rotated without needing access to all the original audio sources/tracks individually, which is great for applications that need to steer the audio render in real time. The Ambisonic signals can then be decoded to loudspeakers or, in this case, to head related transfer functions (HRTFs) instead, resulting in signals ready for headphones. Then, instead of rotating the speakers (HRTFs) when the head moves, the Ambisonic scene itself is rotated, which is a much simpler transform. One early paper on this technique is McKeag and McGrath (1996), and I also looked at it in my PhD thesis (page 103 of Wiggins 2004).

YouTube currently uses 1st Order Ambisonics (4 channels) with ACN channel sequencing and the SN3D normalisation scheme. WARNING: THIS ISN’T WHAT ANY MIC OR MOST PLUG-INS OUTPUT! However, it is a more extendable format (channel-sequence wise). A simple, but necessary, channel remapping and gain change MUST be done before saving the 4 channel B-Format bus. I’ve made a JS (Reaper) plug-in that does this, which can be downloaded below (it’ll work up to 3rd Order, 16 channels, but YouTube only accepts 1st Order…for now 😉). As always, other people’s tools are available to do this too, but as a learning exercise (so I can teach it at Derby!) I’ve made my own. (How to install a JS Effect)
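The “rotate the scene, not the speakers” trick is worth seeing in code: for first order, a yaw (head turn) is just a 2×2 rotation on the X and Y channels, with W and Z untouched. Here’s a minimal sketch in Python/numpy (my own illustration of the maths, not YouTube’s actual implementation):

```python
import numpy as np

def encode_fuma(azimuth_rad):
    """Encode a unit-gain source at the given azimuth into first-order
    B-Format (FuMa order: W, X, Y, Z), horizontal plane only."""
    return np.array([1 / np.sqrt(2),       # W (FuMa -3 dB convention)
                     np.cos(azimuth_rad),  # X (front/back)
                     np.sin(azimuth_rad),  # Y (left/right)
                     0.0])                 # Z (up/down)

def rotate_yaw(bformat, angle_rad):
    """Rotate the whole sound field about the vertical axis.
    Only X and Y mix; W and Z pass through unchanged."""
    w, x, y, z = bformat
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([w, c * x - s * y, s * x + c * y, z])

# A source dead ahead, scene rotated 90 degrees anticlockwise:
front = encode_fuma(0.0)
rotated = rotate_yaw(front, np.pi / 2)  # same as encoding at azimuth 90°
```

The same idea scales to higher orders, just with bigger (but still cheap) rotation matrices, which is why head tracking on an encoded scene is so much lighter than re-rendering every source.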
So, at this point, let’s pretend you’ve made/recorded an Ambisonic mix (perhaps after watching these videos). You then need to put the plug-in above on the total B-Format bus track (or, ideally, one that is routed from it, so it doesn’t mess up your speaker decode!). You can then set the plug-in as shown below and it’ll turn your FurseMalham/SoundField B-Format into AmbiX B-Format.
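For the curious, at first order the FuMa→AmbiX conversion the plug-in performs really is tiny: reorder FuMa’s W, X, Y, Z into ACN’s W, Y, Z, X, and scale W by √2 to undo FuMa’s −3 dB on the omni channel (the first-order X, Y, Z gains are the same in both schemes). A quick sketch in Python/numpy on an in-memory audio array (reading/writing the actual WAV file is left out):

```python
import numpy as np

def fuma_to_ambix_first_order(fuma):
    """Convert first-order FuMa B-Format (channels W, X, Y, Z) to
    AmbiX (ACN channel order W, Y, Z, X; SN3D normalisation).
    `fuma` is a (num_samples, 4) array."""
    w, x, y, z = fuma.T
    return np.column_stack([w * np.sqrt(2),  # undo FuMa's -3 dB on W
                            y, z, x])        # remap to ACN order

# One FuMa-encoded sample of a source dead ahead (azimuth 0):
fuma = np.array([[1 / np.sqrt(2), 1.0, 0.0, 0.0]])
ambix = fuma_to_ambix_first_order(fuma)  # -> [[1.0, 0.0, 0.0, 1.0]]
```

Above first order the gains differ per channel, which is why the plug-in handles up to 3rd Order for you.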
This should now be rendered/saved as a 4-channel WAVE file (again, see the videos on the teaching page if you’re not sure how to do this – I’ll make a new video covering all of this soon!)
Generate a 360 Photo
Ok, so what about a stereoscopic, 360 degree video? Well, I don’t have a camera for these, so instead I’ve used a jpeg made by the excellent Google Camera app on Android (a PhotoSphere). Note that, just as this feature has become really useful, Google has taken Google Camera off the Play Store! It can still be found on APK mirror sites, though (see here, for example – but my phone is stuck on v2.5.052; I think v3+ is only for Nexus devices, hence why I can no longer see it on the Play Store).
So, let’s assume we now have the following:
- An ambiX 4 channel B-Format Wave File (ambiX.wav)
- A high resolution photosphere jpg (PS.jpg)
The PhotoSphere jpeg is an equirectangular mapped 360 degree image. This is handy, as it’s the exact format that YouTube wants. However, for it to work, you may need to reduce the resolution of the image to a maximum of 3840 x 2160 (my images are higher res than that, and YouTube wouldn’t process them!). Details on video formats/tips can be found at this Google help page.
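If your PhotoSphere is bigger than that, the maths for the resize is just “fit inside 3840 × 2160 while keeping the aspect ratio”. Here’s a little helper of my own to work out the target size (any image editor, or ffmpeg’s scale filter, can do the actual resizing):

```python
def fit_within(width, height, max_w=3840, max_h=2160):
    """Return the largest (w, h) no bigger than (max_w, max_h)
    that preserves the input aspect ratio."""
    scale = min(max_w / width, max_h / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale)

# An 8192 x 4096 PhotoSphere (2:1 equirectangular) comes out as:
print(fit_within(8192, 4096))  # -> (3840, 1920)
```

Note that a 2:1 equirectangular image ends up width-limited (3840 x 1920), not at the full 3840 x 2160.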
Now, you’re gonna need a few more tools.
- FFMPEG – we need this to glue our video (well, picture!) to our audio
- The Google Spatial Media Python tools to then tag this video as being spherical AND Ambisonic.
- Note – I used homebrew to install both Python and FFMPEG with a load of libraries enabled on my Mac
Make the Video
So, now we’ve got our jpeg (say PS.jpg) and our 4 channel wave file (say ambiX.wav) in the same directory. To glue the jpg (as a video) and the wave file together in a .MOV container (the best combination I’ve found so far!), type this from the command line/terminal, replacing the filenames with yours:
ffmpeg -loop 1 -i PS.jpg -i ambiX.wav -map 1:a -map 0:v -c:a copy -channel_layout 4.0 -c:v libx264 -b:v 40000k -bufsize 40000k -shortest PSandAmbiX.mov
NOTE: the above used to say … -channel_layout quad … however, that’s not correct and is no longer accepted by YouTube. Thanks Dillon for the tip…
This will (in order): loop the jpg input; take the wave file as a second input; map the audio from the second input and the video from the first; copy the audio stream untouched, flagging it as a 4-channel layout; encode the video with the libx264 library, setting the bitrate and how often to calculate the average bitrate; and set the length of the video to the shortest of the two inputs (the jpg loops for ever!). Phew!
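If you end up doing this for a batch of photospheres, it can be handy to assemble the command in a script rather than retype it. A sketch in Python (the commented-out `subprocess.run` call assumes ffmpeg is on your PATH):

```python
import subprocess

def build_ffmpeg_cmd(image, audio, output, video_bitrate="40000k"):
    """Assemble the ffmpeg command used above: loop a still image as
    the video track, mux in the 4-channel AmbiX wave untouched, and
    stop at the shorter of the two inputs."""
    return ["ffmpeg",
            "-loop", "1", "-i", image,     # input 0: the looping still
            "-i", audio,                   # input 1: the 4-channel wave
            "-map", "1:a", "-map", "0:v",  # audio from 1, video from 0
            "-c:a", "copy",                # don't re-encode the audio
            "-channel_layout", "4.0",
            "-c:v", "libx264",
            "-b:v", video_bitrate, "-bufsize", video_bitrate,
            "-shortest", output]

cmd = build_ffmpeg_cmd("PS.jpg", "ambiX.wav", "PSandAmbiX.mov")
# subprocess.run(cmd, check=True)  # uncomment to actually run it
```

Passing the arguments as a list (rather than one big string) also avoids any shell-quoting headaches with filenames containing spaces.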
Tag the Video
This video now needs tagging as a spherical/Ambisonic video. Again, from the command line/terminal, start the google spatial tools:
cd <location of your spatial media tools folder>
python gui.py
Then, select the options as shown below and save the new video file (it’ll automatically add _injected to your file name) as a separate file:
Now you should have a video file called PSandAmbiX_injected.mov, ready to be uploaded to YouTube. Note that the audio won’t react to head movement straight away – it can take several hours for the Google servers to process the audio, and the video may also look lower res than it’ll eventually become while it’s being processed too!
Special thanks to Dillon Cower and Paul Roberts over at Google and especially Albert Leusink and the Spatial Audio in VR facebook group for tips and info on, in particular, FFMPEG usage and video container formats.
- McKeag, A. and McGrath, D. (1996). “Sound Field Format to Binaural Decoder with Head Tracking”, AES Convention Preprint 4302.
- Wiggins, B. (2004) “An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields.” PhD thesis, University of Derby, Derby, UK.
- Google Spatial Media Tools
- YouTube Upload 360-Video Help
- APK Mirror