Introduction to ML-Agents in Unity
Machine Learning is an application of AI that gives systems the ability to learn on their own and improve with experience.
Machine Learning in Unity is changing the way we get intelligent behavior from agents. ML-Agents is used to train game characters or opponents by letting them learn their behavior rather than scripting it by hand.
The typical scenario for training an agent in a virtual environment is a single environment and a single agent that are tightly coupled: the agent's actions change the state of the environment, and the environment provides the agent with rewards.
The ML-Agents SDK allows developers to transform games and simulations created with the Unity Editor into environments where intelligent agents can be trained using a variety of machine learning algorithms, such as deep reinforcement learning (PPO or SAC), evolution strategies, or other methods, through an easy-to-use Python API. In short, the package lets you convert any Unity scene into a learning environment and train character behaviors in it.
1. Introduction to the Learning Environment
- Agent:- Every Agent has its own state and observations and takes its own actions within the environment. An Agent is attached to a GameObject and handles generating observations, performing actions, and assigning rewards.
- Behavior:- It defines specific attributes of the agent, such as the number of observations or the number of actions the agent can take. A Behavior is like a function that receives observations and rewards from the agent and returns actions. Every Behavior is uniquely identified by a Behavior Name. A Behavior can be of three types: Default, Heuristic, or Inference.
- Default Behavior is one that is not yet defined; it is the mode used while the agent is being trained.
- Heuristic Behavior is one whose rules are hard-coded, so that, for example, a user can control the agent directly through player input.
- Inference Behavior is one that comes from a trained neural network (.nn) file; once the agent has finished training, its learned Behavior runs in inference mode.
- Communicator:- The External Communicator connects the Learning Environment with the Python API.
- Python API:- It contains the Python interface for manipulating a learning environment. The Python API is not part of Unity; it is provided by the mlagents-envs Python package and is used by the Python training process to communicate with and control the Academy during training.
- Python Trainer:- It contains the machine learning algorithms used to train agents. The algorithms are implemented in Python and are part of the mlagents Python package. mlagents-learn is the command-line utility that supports all of the training methods.
2. Setting up the Project with the Unity ML-Agents Package
Prerequisites
- Unity Version 2019 or above
- ML-Agents v1.0
- Python 3.7
To install the ML-Agents package, go to Window -> Package Manager, press the Advanced button, select Show preview packages, and install ML-Agents v1.0.
Installing the Unity ML-Agents Python Package:
Installing the ML-Agents Python package also installs the other Python packages that ML-Agents depends on. For further instructions on installing Python packages, go through this link:
To install the ML-Agents Python package, activate your virtual environment and run from the command line:
pip3 install mlagents
Setting up an Agent:
In the Inspector, add a Behavior Parameters component. Every Agent must have Behavior Parameters; the Behavior determines how the agent makes decisions.
- Vector Observation Space:- Every agent collects observations in order to make decisions. A Vector Observation is a vector of float numbers containing the information the agent needs to make a decision.
- Vector Action:- The instructions given to the agent, in the form of a float array.
- Model:- Where the trained brain (.nn file) is placed; the agent uses this model to perform actions in Inference mode.
- Behavior Type:- Where you can choose the type of Behavior (Default, Heuristic, Inference).
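The values set in Behavior Parameters must match what the Agent script does. The minimal sketch below (the class name SizeCheckAgent and the movement logic are placeholders for illustration, not part of this project) shows the correspondence: a Vector Observation Space Size of 1 means the agent adds exactly one float per step, and a continuous Vector Action of size 1 means it reads exactly one float back.
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class SizeCheckAgent : Agent // placeholder name for this sketch
{
    public override void CollectObservations(VectorSensor sensor)
    {
        // One AddObservation(float) call per step -> Vector Observation Space Size must be 1.
        sensor.AddObservation(transform.position.x);
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // One continuous action -> Vector Action size must be 1.
        float move = vectorAction[0];
        transform.Translate(move * Time.deltaTime, 0f, 0f);
    }
}
If these sizes do not match the component settings, training will not behave as expected.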
3. Scripting the Behavior of the Agents and Adding Their Functionalities
Create a child class of Agent (from Unity.MLAgents); we will call it "CarAgent". We will add the logic that lets our agent learn to move left or right to avoid colliding with oncoming vehicles, using deep reinforcement learning. More specifically, we will need to override four methods from the Agent base class (plus Heuristic, so you can drive the agent manually while testing):
- Initialize()
- OnEpisodeBegin()
- CollectObservations(VectorSensor sensor)
- OnActionReceived(float[] vectorAction)
//Code
using System;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

public class CarAgent : Agent
{
    private Rigidbody rb;

    public override void Initialize()
    {
        // Perform one-time initialization or setup of the Agent instance.
        rb = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Collect the agent's Vector Observations for this step.
        sensor.AddObservation(transform.position.x);
    }

    public override void Heuristic(float[] actionsOut)
    {
        // Choose an action for this agent using user input.
        actionsOut[0] = Input.GetAxis("Horizontal");
    }

    public override void OnActionReceived(float[] vectorAction)
    {
        // Set the agent's behavior at every step, based on the provided action.
        MoveAction(vectorAction); // user-defined helper (sketched below)
    }

    public override void OnEpisodeBegin()
    {
        // Set up the Agent instance at the beginning of an episode.
        transform.position = new Vector3(-1f, 0.1f, 0f);
        rb.velocity = Vector3.zero;
        Reset(); // user-defined helper, e.g. to respawn the oncoming traffic (sketched below)
    }
}
In the above code, the Initialize method initializes the Rigidbody of the Agent's GameObject (the car). The CollectObservations method observes the X position of the Agent, from which it can determine which side of the track the car is on. The Agent needs to be able to move along the X axis to avoid colliding with the other vehicles coming towards it. First, the Agent determines from which side of the track a vehicle is approaching (using a Ray Sensor); then it acts on that observation to either change lanes or stay in its lane. Reinforcement Learning requires rewards, so assign a reward whenever the Agent successfully passes another car; a sketch of the helper methods and reward logic is given below.
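The MoveAction and Reset methods used above are not part of the ML-Agents API; they are helpers you write yourself inside CarAgent. Below is a minimal sketch of how they and the reward logic might look; the moveSpeed field, the "OncomingCar" tag, the reward values, and the collision handling are assumptions for illustration, not the exact implementation used in the final project.
public float moveSpeed = 5f; // assumed steering speed, tune for your scene

private void MoveAction(float[] vectorAction)
{
    // vectorAction[0] is the single continuous action, roughly in [-1, 1];
    // use it to steer the car left or right along the X axis.
    float move = Mathf.Clamp(vectorAction[0], -1f, 1f);
    rb.MovePosition(transform.position + Vector3.right * move * moveSpeed * Time.fixedDeltaTime);

    // Small reward for every step the car survives, which encourages passing traffic.
    AddReward(0.01f);
}

private void OnCollisionEnter(Collision collision)
{
    // Penalize a crash and end the episode so OnEpisodeBegin() resets the scene.
    if (collision.gameObject.CompareTag("OncomingCar"))
    {
        AddReward(-1f);
        EndEpisode();
    }
}

private void Reset()
{
    // Respawn or reposition the oncoming traffic for the new episode
    // (how this is done depends on how the other cars are managed in your scene).
}
The exact reward values are a design choice: a small positive reward per step encourages the car to keep driving, while the larger negative reward on collision teaches it to avoid crashes.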
4. Training the Agent Using the Python API
Behavior Configurations:-
The Behavior config file sets the training configuration for each Behavior in the scene. Each behavior subsection of the trainer config file defines a Behavior Name and the trainer type (PPO or SAC) to use for it.
The hyperparameters for training are specified in this configuration file, which you pass to the mlagents-learn program.
Create a new YAML file like this:
behaviors:
  Drive:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 1000000
    time_horizon: 1000
    summary_freq: 2000
    threaded: true
    self_play:
      save_steps: 50000
      team_change: 100000
      swap_steps: 2000
      window: 10
      play_against_latest_model_ratio: 0.5
      initial_elo: 1200.0
Here is the link that explains all the parameters used in the YAML file configuration for training.
To train your agent, open a command window, navigate to the folder where the project is saved, and run the following command before pressing Play in the Editor:
mlagents-learn <trainer-config-file> --run-id=<run-identifier>
- <trainer-config-file>:- Path of the trainer's Behavior configuration file. This contains all the hyperparameter values for the training.
- --run-id=<run-identifier>:- A unique name you can use to locate the results of your training run.
5. Using TensorFlow and Analyzing TensorBoard
TensorFlow is an open-source library for performing computations using data-flow graphs. When we train the behavior of agents, the progress of the learning process is shown as graphs in TensorBoard.
The ML-Agents Toolkit saves statistics during the training session, which you can then view with a TensorFlow utility called TensorBoard. The statistics are saved to a folder named results, organized by the run-id you assigned to the training session.
To monitor the statistics of the Agent's performance during training, use the TensorBoard command given below:
tensorboard --logdir results
Trained ML Gameplay Video:
Here is the final video after training the Agent, in which the car controls itself based on what it has learned.
Conclusion
So, I hope you have learned some new things from this blog about the application of AI/ML in gaming and simulations. The ML-Agents Toolkit is a feature-rich SDK for applying machine learning concepts to various areas of Unity applications. This was just a basic implementation to help you get started with ML-Agents in Unity. There are many more features in the ML-Agents Toolkit, which we will go through in upcoming blogs.