We’ve worked on quite a lot of features so far: instantiating and moving units, setting up a camera, lighting up the scene with some FOV and fog of war… there is, however, one thing we haven’t talked about yet: sounds!
Since we need to prepare quite a lot of little things, I’ll split this tutorial in two parts – in this article we’ll focus more on why sounds are paramount in games, how Unity’s sound system works and how you can avoid consuming too much memory with your audio clips. Then in part 2 we’ll actually implement the system and play our audio files during the game.
Sounds in video games
Sounds are crucial for immersion – when you’re faced with multimedia content like a movie or a video game, they are about as important as images. Over the years, video games have improved their rendition of sound: from the 8-bit music of the 70s to the grandiose soundtracks of today’s big titles, we’ve certainly come a long way. As technology advanced, we’ve been able to include increasingly realistic sounds, and even some spatialization effects (in particular thanks to binaural recordings).
In most video games, you can differentiate between three types of sounds.
The background music
Ever since the beginning, video game designers have worked on finding a unique musical identity for their creation – think of Mario’s or Sonic’s themes, or more recently of the amazing soundtrack of The Last of Us series! The musical theme will be there all along the player’s experience and it will most likely have a big influence on it: just like in films, it can increase the emotional impact of an important scene, or it can help with creating an ambiance.
Video game composers can face various challenges. They may have to make coherent soundtracks for series of games, like Glenn Stafford‘s work at Blizzard (for the World of Warcraft and StarCraft games); or music can be an inherent part of the game, as in Crypt of the NecroDancer; or it can be licensed music that is left in the control of the player by providing him/her with radios and CDs to choose from in-game…
Overall, you have to remember that with video games gaining more and more renown, their soundtracks are becoming more and more spectacular, too – but even a lo-fi or retro 8-bit tune can work, as long as it conveys the right meaning for your game.
The world/ambient sounds
Similarly to the game music, ambient sounds are there to help with the player’s immersion and they are global to the scene; but they are usually a bit more content-specific, meaning that they are somewhat linked to the current state of the game. They are usually not absolutely essential, in the sense that you can disable them without losing crucial information, but they are a nice addition to the player experience.
For example, remember the day-and-night cycle we added a couple of tutorials ago: just like in Warcraft 3, we could also play a little rooster crow or a wolf howl whenever we switch between dusk and day, or dawn and night. This sound is not emitted from a specific point in the scene, but it still depends on the current context.
You can also think of even more subtle things: suppose you had workers mining in a quarry somewhere; then it could be a nice touch to play a little mining sound when the camera hovers over the quarry position.
So, we have two types of ambient sounds: on the one hand, the “global ambient sounds” that play over the entire map and are triggered by global game events; on the other hand, the “local ambient sounds” that rather depend on the current focus of the camera and simply emphasize some visual cues.
The contextual sounds
Finally, you have sounds that are directly related to what is happening on screen – those depend on the actions the player is taking, the orders he/she’s giving the units, if some spell has been cast… Contrary to the ambient sounds and music, those contextual sounds are often quite short and precise in time. The important thing is to give some feedback to the player, to show him/her that the action has been taken into account, and to further add to the info on screen.
A very famous example in RTS games is unit responses when they are selected. If you’ve ever played Warhammer 40K, then you’ve probably yelled “For the Emperor!” from time to time, just to feel like you’re a Space Marine, too. For those who are less familiar with these types of games, unit responses are short sentences that your characters say when you click on them; buildings have them too, even though of course they’re more a bunch of noises than an actual sentence. The words/noises depend on the unit type. The point of these responses is to acknowledge that the unit was indeed selected and to quickly tell the player which unit type it is.
Units also often have contextualized responses so that they answer differently if they are awaiting orders, if they are fulfilling a task, if they can’t get to the target point you’re giving them, etc.
Now that we understand the importance of sounds in a video game, let’s see how to actually implement a sound system in our RTS project!
How does Unity’s sound system work?
In Unity, playing sounds is pretty easy:
- first, you add one (and only one!) listener in your scene: this is done by adding an AudioListener component to an object that the player controls and uses to “follow the action”… in fact, Unity adds one by default on your main camera when you create a new scene!
- then, you turn various objects in your scene (and/or on your global manager objects) into sound emitters to actually produce noises: this time, you add one or more AudioSource components to the object
Each AudioSource can play one AudioClip at a time (you can have some overlaps, but you can only start one clip at a time), and the AudioListener gathers all the information to give you a final mix of the entire scene’s sounds. That’s why you only need one listener: if you had several, it would be like feeding the player’s computer the same sound stream multiple times and flooding the system with the exact same sounds, just repeated – so, not a great idea 😉
Note: by the way, Unity actually warns when you start the runtime if you happen to have more than one AudioListener in the scene.
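To make this more concrete, here is a minimal sketch of an emitter script. The `SimpleEmitter` class name and the idea of assigning the clip in the Inspector are my own assumptions for illustration, but the AudioSource calls are standard Unity API:

```csharp
using UnityEngine;

// Minimal emitter sketch: plays a clip through the AudioSource on the same object.
// The clip is assumed to be assigned in the Inspector.
[RequireComponent(typeof(AudioSource))]
public class SimpleEmitter : MonoBehaviour
{
    public AudioClip clip; // assign in the Inspector

    private AudioSource _source;

    void Awake()
    {
        _source = GetComponent<AudioSource>();
    }

    public void PlayOnce()
    {
        // PlayOneShot lets several short clips overlap on the same source
        _source.PlayOneShot(clip);
    }

    public void PlayLooping()
    {
        _source.clip = clip;
        _source.loop = true;
        _source.Play();
    }
}
```

Note how `PlayOneShot` is the one that allows the overlaps mentioned above: you can fire it several times in a row on the same source, but each call still starts just one clip.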
What’s really interesting with Unity’s sound system is that it can very easily handle 3D – what I mean by that is that you actually don’t hear emitters that are too far away and the closer the emitter, the louder the sound it emits is for the listener. Hence the reason for placing the listener on the camera, or on the player’s avatar if there is one: by having the listener “move with the player”, you create a better immersion with these volume falloffs whenever the source gets further away. There are also lots of effects you can apply to the sound like reverb; and those can even be restricted to a zone in your map so there is a dynamic evolution of the ambiance!
As you can imagine, the 3D simulation is essential for local ambient sounds: if you were to hear all of the miners on your map at once… it would be unbearable! So to get this “locality”, we simply use a 3D sound emitter for the quarry building, and the player will automatically hear the sound whenever the camera is close enough to it (and not hear it otherwise).
However, it is also possible to remove any distance falloff and to turn this 3D simulation off. This way, you can hear the background music or the global ambient sounds no matter the current position of the camera in the scene.
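In Unity, this 2D/3D switch is controlled by the AudioSource’s `spatialBlend` parameter (0 = pure 2D, 1 = pure 3D). Here is a small sketch of both configurations; the helper class and method names are just for illustration:

```csharp
using UnityEngine;

// Sketch: configuring a source as fully 2D (music, global ambient sounds)
// or fully 3D (local emitters like our quarry) via spatialBlend.
public static class AudioSourceSetup
{
    public static void MakeGlobal(AudioSource source)
    {
        source.spatialBlend = 0f; // pure 2D: no distance falloff, heard everywhere
    }

    public static void MakeLocal(AudioSource source, float maxDistance)
    {
        source.spatialBlend = 1f; // pure 3D: volume depends on listener distance
        source.rolloffMode = AudioRolloffMode.Linear; // simple, predictable falloff
        source.maxDistance = maxDistance; // inaudible beyond this range
    }
}
```

You can of course also set these values directly in the Inspector on each AudioSource; scripting them is just handy when you spawn emitters at runtime.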
At the moment, we have a listener on our main camera – but we’ll actually change this a bit to better handle the local and contextual sounds. And we do need to take care of the emitters! So as we’ll see in part 2 (next week), we are going to have:
- a “ground target”: an empty object that is simply placed at the 3D world point equivalent to the middle of the screen and has the AudioListener component, instead of the camera itself. This will allow us to better control the interaction between the 3D spatialized audio sources and the listener.
- several audio sources:
- on our “GAME” object, we’ll have two emitters – one for the background music and one for the global ambient sounds. Both these sources will be in 2D mode to avoid distance volume falloff and have those sounds play over the entire map regardless of the current camera position
- then, on our building units, we’ll add two audio sources: one for local ambient sounds and one for contextual sounds
- finally, on the character units, we’ll have just the one source for contextual sounds
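To give you an idea of what this “ground target” could look like, here is a hypothetical sketch (the class name, the `terrainMask` field and the 1000-unit ray length are all assumptions; the raycast APIs are standard Unity):

```csharp
using UnityEngine;

// Hypothetical "ground target" sketch: an empty object carrying the
// AudioListener, repositioned every frame to the world point under
// the center of the screen (assumes the ground is on a raycastable layer).
[RequireComponent(typeof(AudioListener))]
public class GroundTargetListener : MonoBehaviour
{
    public Camera mainCamera;     // assign in the Inspector
    public LayerMask terrainMask; // layers considered "ground"

    void Update()
    {
        // cast a ray from the middle of the viewport down onto the terrain
        Ray ray = mainCamera.ViewportPointToRay(new Vector3(0.5f, 0.5f, 0f));
        if (Physics.Raycast(ray, out RaycastHit hit, 1000f, terrainMask))
            transform.position = hit.point;
    }
}
```

The point of this indirection is that the listener then sits on the ground plane among the 3D emitters, instead of floating high up with the camera, which makes the distance falloffs feel much more natural.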
Optimizing our sounds
The thing is that even if you have a functioning sound system, by default it will not be optimized! Depending on your target device, this may or may not have a huge impact but anyway: it’s always better to reduce the memory/energy/processing consumption of your game 🙂
With a basic setup where you simply import your audio clips, as we start to add more and more sounds to the game, the total audio memory will increase continuously – if you plan on shipping your game for mobile, this will likely result in a seriously heavier package and a way longer startup time when players launch the game. That’s because, by default, Unity “decompresses on load” all the sound clips that are in your project… even if they are not used! This allows you to very rapidly fetch and play sounds whenever you want (i.e. when you tell the game to play a sound, it plays it right this frame), but of course it largely increases the amount of memory devoted to audio.
For example, after adding just a couple of SFX sounds (~0.5 MB per file) and three small musical themes (1 MB each), I already get to more than 20 MB at runtime… even though I’m actually only playing one musical theme for now!
To get this debug panel, you’ll need to go to the Window > Analysis > Profiler menu; this profiler shows you a lot of crucial information about your game and we will definitely need to take a look at it when we want to optimize other systems in our game.
In Unity, when you import your sound assets, you’ll have various parameters that you can tweak to optimize the audio memory usage:
Here, the main thing we want to change is the “Load Type” parameter. As explained in the docs, it has three possible values: “decompress on load”, “compressed in memory” and “streaming”. They are all different tradeoffs of memory consumption versus loading overhead:
- “decompress on load” loads everything before the game starts, so all sounds are ready to play when you’re actually in the game but it’s pretty heavy on the memory
- “compressed in memory” is an in-between where you prepare your files but don’t completely decompress them
- “streaming” doesn’t preload anything, it fetches (and decompresses) small chunks of the sound files as needed… but that obviously delays a little bit the moment the chunk is actually played
In this example, I’m working on my background music. I don’t need it to start at a highly accurate time, but it might be a long file, so it would be nice not to load everything upfront. “Streaming” is clearly the way to go:
As you can see, by setting this option we’ve cut the total audio memory down by a factor of 10! On the other hand, we also notice some new non-zero values in the streaming file and decode memories, which show us we are indeed streaming the file.
So – usually, a good rule of thumb is to:
- set the long and rare clips to load in “streaming” mode (or at least “compressed in memory”), especially if you’re not very concerned with them starting at a very exact point in time
- and set the short and frequently used clips to load with the “decompress on load” mode, in particular if the clip has to play exactly when called
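If you don’t want to set this by hand on every clip, you can automate the rule of thumb with an editor script. This is just a sketch – the “Music” and “SFX” folder names are an assumed project convention, not something from the tutorial – but `AssetPostprocessor` and `AudioImporter` are real Unity editor APIs:

```csharp
using UnityEngine;
using UnityEditor;

// Editor-only sketch: apply our load-type rule of thumb automatically on import.
// Assumes clips are organized in "Music" and "SFX" folders (hypothetical convention).
public class AudioImportRules : AssetPostprocessor
{
    void OnPreprocessAudio()
    {
        var importer = (AudioImporter)assetImporter;
        var settings = importer.defaultSampleSettings;

        if (assetPath.Contains("/Music/"))
            settings.loadType = AudioClipLoadType.Streaming;        // long, rarely started clips
        else if (assetPath.Contains("/SFX/"))
            settings.loadType = AudioClipLoadType.DecompressOnLoad; // short, frequent clips

        importer.defaultSampleSettings = settings;
    }
}
```

Place a script like this in an `Editor` folder and it will run whenever a new audio asset is imported, so every clip gets a sensible load type from the start.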
By playing around with these settings, you’ll be able to reduce the memory consumption and the loading time of your game!
We’ve already talked about quite a lot of things today! In particular, we’ve listed the various sound sources that we’ll need for our scene and we’ve seen how to avoid flooding the memory with audio clips. So we’re ready to actually add these emitters!
Next time, we’ll see how to really implement this sound system and have it interact with our other game systems, like the day-and-night cycler we did a couple of tutorials ago.