Reinforcement learning is a type of machine learning where an agent (e.g. a robot) learns by trying different actions and observing their results. For example, a robot that is supposed to walk could try different motions of its motors based on its current sensor readings and be given a positive reward if it successfully moves forward. The robot would then tend to repeat the positively rewarded actions when it encounters the same circumstances.
Some of the challenges of reinforcement learning are deciding how to encode the rewards (positive for desired behaviour and negative for undesired behaviour) and adjusting the reinforcement algorithm to achieve desirable behaviours with a minimum amount of exploration of undesired behaviours. For example, a complicated physical robot might damage itself before learning how to walk. Reinforcement learning is often done first in simulation, but the learned behaviours frequently fail to transfer to the physical world because of small differences between the simulated and physical conditions (e.g. friction and the physical dynamics of the real motors).
In reinforcement learning, the agent observes its current state in the environment and takes an action, which leads to a new state and yields a reward. Reinforcement learning algorithms aim to maximize the cumulative reward rather than only the immediate one. For example, several motors may have to be moved in sequence before the reward for moving forward is received.
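The state, action, reward loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the text does not name a specific one). This is a minimal sketch under stated assumptions: the five-position corridor environment, the `step` function, and the constants `ALPHA`, `GAMMA` and `EPSILON` are all hypothetical and chosen only for illustration. The reward arrives only at the last position, so the agent has to learn that a sequence of forward steps pays off later.

```python
import random

N_STATES = 5      # hypothetical corridor: positions 0..4, position 4 is the goal
ACTIONS = [0, 1]  # 0 = step back, 1 = step forward
ALPHA = 0.5       # learning rate (assumed value)
GAMMA = 0.9       # discount factor for future rewards (assumed value)
EPSILON = 0.1     # probability of taking a random action (assumed value)


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    if action == 1:
        next_state = min(state + 1, N_STATES - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0  # reward only when the goal is reached
    return next_state, reward, done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates the cumulative discounted reward
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit the current estimates, breaking ties at random
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: immediate reward plus discounted future value
            future = 0.0 if done else GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (reward + future - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-goal position after training
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Because future rewards are discounted by `GAMMA`, positions further from the goal end up with smaller value estimates, which is how the delayed reward propagates back to the early steps.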
Reinforcement learning differs from supervised learning in not needing labelled input/output pairs be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). – Wikipedia
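One simple way to balance exploration and exploitation is an epsilon-greedy rule: with a small probability the agent tries a random action, otherwise it picks the action it currently believes is best. The sketch below applies this to a hypothetical three-option problem (a "multi-armed bandit"). The function name `epsilon_greedy_bandit`, the reward means, and the Gaussian noise are assumptions made for illustration, not something taken from the sources quoted here.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Estimate the value of each action while mostly choosing the best one."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # current knowledge: running average reward per action
    counts = [0] * n       # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # exploration: try any action
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploitation
        # Hypothetical noisy reward centred on the action's true mean
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running average
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.0, 0.5, 1.0])
print(counts)
```

Setting `epsilon` to 0 would make the agent purely exploitative, so it could lock onto a mediocre action it happened to try first. A larger `epsilon` explores more but spends more time on actions already known to be poor.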
Basics of Reinforcement Learning – Michael Littman
In-depth video lecture:
“In machine learning, the problem of reinforcement learning is concerned with using experience gained through interacting with the world and evaluative feedback to improve a system’s ability to make behavioral decisions. This tutorial will introduce the fundamental concepts and vocabulary that underlie this field of study. It will also review recent advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology.”
Soft Actor-Critic (SAC) is an off-policy reinforcement learning algorithm that is relatively sample-efficient, meaning it can learn useful behaviour from a relatively small amount of training experience.
Spinning Up in Deep RL – OpenAI
Tutorials from OpenAI on reinforcement learning combined with deep neural networks
SISYPHUS (Our Learning Robot) – Michael Ang
More recent, advanced techniques combine deep neural networks with reinforcement learning to form “deep reinforcement learning”. This RC car learns to navigate a hallway using a neural network and reinforcement learning. The car has a grayscale camera, a speed encoder, and a collision detector. The neural network takes the camera input (the last 4 frames) and the current action (turning, accelerator) and tries to predict what will happen over the next few time steps. The camera images pass through convolution layers, which are connected to an LSTM that also takes in the current action. The network is trained by comparing its predictions (e.g. collision probability) with the ground truth (e.g. whether a collision actually happened). The car requires a few hours of training to be able to navigate around a hallway. It can recover from collisions by reversing and then continuing the training. The onboard processing is done with a Jetson TX1 single-board computer, which transmits data to a laptop that runs the training. The laptop sends new model parameters back to the car while it is running.
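The step of comparing predicted collision probabilities with what actually happened can be illustrated with a binary cross-entropy loss. This is only a sketch of one common choice. The description above does not say which loss the RC car project uses, and the function name `collision_loss` and the example numbers are hypothetical.

```python
import math


def collision_loss(predicted_probs, collided, eps=1e-7):
    """Mean binary cross-entropy between predictions and ground truth.

    predicted_probs: predicted collision probability for each future time step
    collided: 1 if a collision actually happened at that step, else 0
    """
    total = 0.0
    for p, y in zip(predicted_probs, collided):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted_probs)


# Hypothetical example: a collision happened at the third future time step
ground_truth = [0, 0, 1]
good_prediction = [0.1, 0.2, 0.9]
bad_prediction = [0.9, 0.8, 0.1]

good_loss = collision_loss(good_prediction, ground_truth)
bad_loss = collision_loss(bad_prediction, ground_truth)
print(good_loss < bad_loss)
```

During training, the network parameters would be adjusted to reduce this loss, so that predictions move closer to the observed outcomes.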