Here’s a quick blog post about machine learning integration in games, specifically covering system architecture, infrastructure, data collection, and working with Unity.
First up, a quick distinction as to what I mean by machine learning integration. Machine learning engineers predominantly work with Python and focus on running experiments to create models that can be used to control agents within the game. Meanwhile, the game team works within their game engine of choice using their programming language of choice.
Machine learning integration sits between these two groups. Integration developers serve a similar role to tools programmers: they provide ML engineers with a suitable interface to the game so that agents can be trained in an appropriate environment, while also providing the game team with an interface for spawning and running bots that use ML models.
While much of this blog post refers to an implementation specific to Unity using C#, these learnings can be applied to any game engine you’re working with.
While there are several different ways machine learning can be used in game development, this blog post will focus mostly on in-game bots that could play the game against or together with human players, set up using Unity’s ML-Agents package.

Unity also offers its Sentis package, a neural network inference library that opens up many uses of machine learning beyond bots that play the game. While I won’t be discussing those use cases here, it’s good to know that the option exists.
In order to understand the problem a bit better, I make reference to the robotics sense-think-act loop, which bears some similarities to how agents act in a game world.
While ML engineers are primarily responsible for developing algorithms that focus on the thinking part of the loop (in essence, building the ML model), I would say that machine learning integration developers are responsible for providing interfaces to the game world to allow the agent to perform the sensing and acting steps.
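That division of responsibilities can be sketched in code. This is a minimal, hypothetical example (the class and delegate names are mine, not from any particular library): the integration layer supplies sensing and acting, while the "think" step stands in for the ML model that the ML engineers provide.

```csharp
using System;

// Hypothetical sketch of the sense-think-act loop from the integration side.
public sealed class AgentLoop
{
    public Func<float[]> Sense { get; }       // integration: read observations from the game state
    public Func<float[], int> Think { get; }  // placeholder for the ML model's inference
    public Action<int> Act { get; }           // integration: apply the chosen action to the game

    public AgentLoop(Func<float[]> sense, Func<float[], int> think, Action<int> act)
    {
        Sense = sense;
        Think = think;
        Act = act;
    }

    // One tick of the loop: sense, think, act.
    public void Step()
    {
        var observation = Sense();
        var action = Think(observation);
        Act(action);
    }
}
```

The point of the sketch is the seam: the ML engineers only ever need to fill in `Think`, while everything else is the integration developer's interface to the game.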
In my experience as a machine learning integration developer, there are two approaches for training game agents that I’ve worked with: reinforcement learning and imitation learning.
With imitation learning, the game agent learns a set of actions based on pre-recorded data (typically recordings of how those actions were performed, for example by human players). In comparison, with reinforcement learning, the game agent learns by having access to the environment itself and performing actions, often learning through positive or negative rewards.
While these agents are trained very differently, in practice there is little difference, from a machine learning integration perspective, between integrating agents trained one way or the other.
Before we discuss the components needed for machine learning integration, we should think a little bit about the game’s architecture and how to make it best work for ML agents, since the game should be built in a way to make it easy for machine learning integration to take place. A lot of these requirements are also useful when thinking about making game systems testable, as well as when building with accessibility in mind; so really, it’s win-win all round!
For example, it should be very easy for the game state to be read. Working in a data-driven fashion (such as by structuring the project using Unity’s DOTS), or by separating logic from presentation (such as by following a Model-View-Controller approach) allows us to easily access the data we need in the game state. Furthermore, using pure C# objects in code is much cleaner than storing data in Unity MonoBehaviours, and avoids having to deal with Unity’s GameObject lifecycles.
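As a sketch of what "pure C# objects" means here (the class and fields are hypothetical examples, not from any specific game): the game state lives in a plain class with no MonoBehaviour, so both a rendering view and an ML agent can read it directly, and it can be constructed and tested without any GameObject lifecycle.

```csharp
using System;

// Hypothetical example of game state as a plain C# object (the "model").
// No MonoBehaviour, no GameObject lifecycle: an ML agent or a view can
// read it, and a unit test can construct it directly.
public sealed class PlayerState
{
    public float Health { get; private set; }
    public int Score { get; private set; }

    public PlayerState(float health) => Health = health;

    public void TakeDamage(float amount)
    {
        // Clamp so health never goes negative.
        Health = Math.Max(0f, Health - amount);
    }

    public void AddScore(int points) => Score += points;
}
```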
Any system that is not directly relevant to the game loop itself and that won’t be used by bots should have the ability to be interfaced or mocked away when not needed. The game agent shouldn’t need to know about online servers, user login systems or any other systems that aren’t strictly necessary to play the core game. This is particularly relevant when running in ML mode. An additional benefit to doing this is gaining the ability to write unit tests that only test those specific systems, as well as tests that can focus on the core gameplay without having to invoke any other systems.
Any reference to direct input (such as by referring to specific buttons on a keyboard, gamepad, or mouse) should be avoided. This is because such agents don’t know how to directly press physical buttons! Instead, some form of InputController class should be implemented to separate the action that will happen from the actual button being pressed (or whatever other input method is being used). A useful way of doing so is by using the Command pattern. An additional benefit here is that this allows keybindings to change during the development process without impacting the bot in any way, as well as opening up the possibility of implementing a system to rebind all input if players wish to do so.
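A minimal sketch of that Command-pattern setup, using a hypothetical `Player` model (all names here are illustrative): humans go through a key-to-command mapping, while bots issue the same commands directly, so neither keybindings nor input devices matter to them.

```csharp
using System.Collections.Generic;

// Hypothetical minimal game model for the example.
public sealed class Player
{
    public bool IsJumping { get; private set; }
    public void Jump() => IsJumping = true;
}

// The Command pattern: an action, decoupled from any physical input.
public interface ICommand
{
    void Execute(Player player);
}

public sealed class JumpCommand : ICommand
{
    public void Execute(Player player) => player.Jump();
}

// Maps physical input to commands for human players. Rebinding a key only
// changes this mapping, never the command or the game logic; bots skip
// this class entirely and execute ICommand instances directly.
public sealed class InputController
{
    private readonly Dictionary<string, ICommand> bindings = new();

    public void Bind(string key, ICommand command) => bindings[key] = command;

    public void HandleKey(string key, Player player)
    {
        if (bindings.TryGetValue(key, out var command))
            command.Execute(player);
    }
}
```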
It should also be possible to run the game in a headless manner, i.e. without rendering any graphics. This allows the game to be significantly sped up when running training simulations, such as by running at a higher timescale, and avoids using the GPU at all. However, there are a few caveats that need to be addressed. For example, animations tend to make the game logic wait for a fixed amount of time. Separating the two using the aforementioned Model-View-Controller approach means the game logic no longer needs to wait for the animation to finish; instead, the animation can play separately and inform the game state when it’s done. Animation should always be subordinate to game logic!
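One way to sketch that inversion (the class and event names here are hypothetical): the presentation layer raises an event when an animation finishes, and the game logic reacts to it instead of waiting a fixed duration. In a headless build, the "animation" can report completion immediately, so training never blocks on presentation.

```csharp
using System;

// Hypothetical seam between presentation and logic: the view raises this
// event when the attack animation finishes playing.
public sealed class AnimationEvents
{
    public event Action AttackAnimationFinished = delegate { };
    public void NotifyAttackAnimationFinished() => AttackAnimationFinished();
}

public sealed class CombatLogic
{
    public bool AttackResolved { get; private set; }

    public CombatLogic(AnimationEvents events)
    {
        // Logic reacts to completion rather than timing the animation itself.
        events.AttackAnimationFinished += () => AttackResolved = true;
    }
}
```

In a headless/ML build, `NotifyAttackAnimationFinished` can be called on the very next tick, so the same logic runs with zero animation delay.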
The game will also need to be predictable, acting as a forward model. This means that any use of randomness in the game needs to be seeded, and any calculations that could result in a non-deterministic outcome (such as floating point calculations, and multithreaded operations that could resolve in a different order each time) as well as any undefined behaviour that could potentially result in non-deterministic outcome (such as relying on order when using Dictionaries) all need to be rethought.
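For the seeding part, one common approach is to route all gameplay randomness through a single seeded source, so the same seed replays the same run. In Unity itself, `UnityEngine.Random.InitState(seed)` serves this purpose; the sketch below (class name is mine) uses `System.Random` so the example stands alone.

```csharp
using System;

// Sketch: all gameplay randomness flows through one seeded source,
// making runs reproducible for training and debugging.
public sealed class GameRandom
{
    private readonly Random rng;

    public GameRandom(int seed) => rng = new Random(seed);

    // e.g. pick one of N possible actions or outcomes deterministically.
    public int NextInt(int maxExclusive) => rng.Next(maxExclusive);
}
```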
One suggestion that could come up, in order to avoid implementing most of these changes, is to train on observable pixels. Unfortunately, this is incredibly expensive to deploy, and it still won’t solve all of the problems mentioned above. First of all, training on observable pixels shifts the problems to the rendering layer: is it possible to remove any post-processing, the in-game text popups, or the entirety of the HUD? We also still need to tell the model to wait while animations are playing. Furthermore, if your game is multi-platform, there is no guarantee that a model trained on pixels from one platform will transfer to other platforms (in particular, think of the difference between desktop and VR!).
Some form of ML mode will be needed that serves as a training environment when training bots using reinforcement learning, or an environment that can be used to evaluate bots (whether trained using reinforcement learning or imitation learning). This should be considered its own build target, separate from the actual game build. If you don’t have a continuous integration system that can automate builds for you, it is highly recommended to set one up so you don’t need to make manual builds every time they’re needed. This will also help ensure that the main game and ML mode builds stay in sync.
Training the game agent might involve the use of separate training scenarios that split up the problem into smaller chunks that are easier to learn. For example, if you wanted to teach a bot how to score a goal in a digital game of football, you might want to start teaching the bot how to kick the ball first, and then progress towards kicking the ball towards the goal, and end with teaching the bot how to score a goal by kicking the ball into the goalposts for a total of three separate training scenarios. Therefore, the ML mode build will need to accept some form of external input that allows it to launch into the correct training scenario (my suggestion here is to use the command line), as well as being able to change settings of the current scenario and switch between training scenarios while active. Finally, it’s important to make sure that the game agent is training in an environment that is as close to the actual game as possible; there's no point training in a completely separate environment after all!
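A sketch of the command-line approach, shown next to Unity's real `-batchmode` and `-nographics` flags; the `--scenario` flag, the scenario names, and the parsing helper are assumptions for illustration, not an existing API.

```csharp
// Hypothetical: launch the ML mode build into a specific training
// scenario from the command line, e.g.
//   Game.exe -batchmode -nographics --scenario kick_ball
// (-batchmode and -nographics are Unity's flags; --scenario is ours.)
public static class ScenarioArgs
{
    public static string Parse(string[] args, string defaultScenario)
    {
        // Scan for "--scenario <name>"; fall back to a default otherwise.
        for (int i = 0; i < args.Length - 1; i++)
        {
            if (args[i] == "--scenario")
                return args[i + 1];
        }
        return defaultScenario;
    }
}
```

The same mechanism extends naturally to per-scenario settings (ball position, number of defenders, and so on) passed as further flags.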
If you’ve decided to use imitation learning to train game agents, you will also need to store vast amounts of recorded data somewhere, probably on the cloud. This data will then be used to train your game agents.
Separate machines will also be needed that can run experiments to train models; these can either be local physical machines or hosted somewhere in the cloud. These machines will need to run the game in ML mode, which might not be a problem if they run Windows (though you will need to think about the cost of Windows licenses) or require some thought if they run Linux (no licensing cost, but your game also needs to run on Linux without issue). As mentioned earlier, while not a hard requirement, it is preferable if the game is configured to run in headless mode and at a higher timescale in order to speed up training.
Machine learning systems are only as good as the data they work with. Therefore, it is highly recommended that you bring in a machine learning engineer early to help you identify what sort of data you should be collecting.
Data used for machine learning needs to be clean, accessible, and consistent.
Note that collecting data that is clean, accessible, and consistent is especially important for imitation learning, since after all this is the data that will be used for training!
If imitation learning is being used to train game agents, one suggestion is to be able to load the replay data and play it back. This is because replaying the game with the same observations and actions of the data that you're giving the bot will allow you to test to see if your data is correct; after all, if your data is incorrect, then the bot's actions will also be incorrect! It is important that playback is done in the same way as the recording was: within the actual game environment itself rather than in a separate area.
It is very common to require different bots for different game difficulties (for example, bots on hard mode should be better at playing the game than bots on easy mode!).
While one way of approaching the problem is to train different ML models for each game difficulty, another way is to train the model to perform at the hardest difficulty and then add a system to nerf the bot on lower difficulties. The implementation of these nerfs tends to be game-specific, but some suggestions include decreasing the speed of the bot slightly, increasing its reaction times, or slightly messing with its observations.
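One of those nerfs, increased reaction time, can be sketched as a decorator around the full-strength agent (all names here are hypothetical): chosen actions are queued for a few frames before they take effect, so the bot "reacts" late without retraining anything.

```csharp
using System;
using System.Collections.Generic;

// Sketch: wrap the hardest-difficulty agent and degrade it by simulating
// a reaction delay. Actions sit in a queue for delayFrames frames before
// being played; until then, the bot plays an idle action.
public sealed class DelayedAgent
{
    private readonly Func<float[], int> agent;  // stands in for model inference
    private readonly Queue<int> pending = new();
    private readonly int delayFrames;

    public DelayedAgent(Func<float[], int> agent, int delayFrames)
    {
        this.agent = agent;
        this.delayFrames = delayFrames;
    }

    public int Act(float[] observation, int idleAction)
    {
        pending.Enqueue(agent(observation));
        return pending.Count > delayFrames ? pending.Dequeue() : idleAction;
    }
}
```

The same wrapper shape works for the other nerfs mentioned above, e.g. adding noise to observations before passing them to the model.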
While machine learning engineers try to account for as many situations as possible, there might be situations where the bot observes a state that it hasn’t seen before, and therefore does not know how to act. Such an observation is out of distribution. While it is impossible to account for every possible out-of-distribution observation, it is possible to take a step back and see whether we can add more scenarios to the training to reduce the chances of this happening. For reinforcement learning, dedicated training scenarios could be added to fill in these gaps, while with imitation learning, perhaps the existing data can be augmented with new data that addresses some of the identified out-of-distribution observations (such as presenting players with scenarios that can provide this data), or the existing data can be reused in different ways by jittering it (applying small random perturbations).
One useful heuristic is a score card for a bot across various scenarios. Once this is established, we can compare bots trained using different models against our evaluation criteria, allowing us to identify the strengths and weaknesses of each model.
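A score card can be as simple as a per-scenario score per bot; the sketch below (names and scenarios are illustrative) records scores and surfaces the scenarios where one bot beats another.

```csharp
using System.Collections.Generic;
using System.Linq;

// Sketch: one score per evaluation scenario per bot, so two bots can be
// compared scenario by scenario.
public sealed class ScoreCard
{
    private readonly Dictionary<string, float> scores = new();

    public void Record(string scenario, float score) => scores[scenario] = score;

    public float Get(string scenario) =>
        scores.TryGetValue(scenario, out var s) ? s : 0f;

    // Scenarios where this bot outperforms the other bot.
    public IEnumerable<string> BetterThan(ScoreCard other) =>
        scores.Keys.Where(k => Get(k) > other.Get(k));
}
```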
Finally, one suggestion is to write as many tests as possible involving the bot, such as tests for loading ML mode correctly, spawning and despawning bots, and successfully playing the game. Having these tests is important so that if the game team changes the game in a way that breaks the bots, the change is detected quickly through continuous integration and can be fixed.
I wrote this blog post in an attempt to collect all of my learnings about integrating machine learning into games in one place. Hopefully, it will be useful to you too!