Reinforcement learning is a type of machine learning where an agent (e.g. a robot) learns by trying different actions and observing the results. For example, a robot that is supposed to walk could try different motor motions based on its current sensor readings and be given a positive reward whenever it successfully moves forward. The robot would then tend to repeat the positively rewarded actions when it encounters the same circumstances.

Some of the challenges of reinforcement learning are deciding how to encode the rewards (positive for desired behaviour, negative for undesired behaviour) and tuning the learning algorithm so that it reaches the desired behaviour with a minimum of exploration of undesired behaviours. For example, a complicated physical robot might damage itself before learning how to walk. Reinforcement learning is therefore often done first in simulation, but the learned behaviours frequently don’t transfer to the physical world because of small differences between simulated and physical conditions (e.g. friction and the dynamics of the real motors).
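To make the idea of encoding rewards concrete, here is a minimal sketch of a reward function for the walking robot example. The function name, its inputs, and the numeric weights are all hypothetical choices for illustration; a real robot would need values tuned to its own sensors and goals.

```python
def walking_reward(forward_distance_m, fell_over):
    """Return a scalar reward for one time step of a walking robot.

    Both inputs and both weights are illustrative assumptions.
    """
    # Positive reward for the desired behaviour: moving forward.
    reward = 10.0 * forward_distance_m
    # Negative reward for undesired behaviour: falling over.
    if fell_over:
        reward -= 5.0
    return reward


step_forward = walking_reward(0.05, fell_over=False)
fall = walking_reward(0.0, fell_over=True)
print(step_forward > 0, fall < 0)
```

The relative size of the two terms matters: if the fall penalty is too small compared with the forward reward, the agent may learn to lunge forward and fall.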

In reinforcement learning, the agent observes its current state in the environment and takes an action, which leads to a new state and yields a reward. Reinforcement learning algorithms aim to maximize the cumulative reward rather than only the immediate one. For example, several motors may have to be moved in sequence before the reward for moving forward is received.
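The state, action, reward loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm (chosen here for illustration; the text above does not prescribe a specific one). The corridor environment, the `step` and `train` functions, and all parameter values are hypothetical. The agent is rewarded only on reaching the last cell, so it has to learn that a sequence of steps leads to the delayed reward.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells. The agent starts in
# cell 0 and receives a reward only when it reaches cell 4.
N_STATES = 5
ACTIONS = [-1, +1]  # step left, step right


def step(state, action):
    """Apply an action and return (new_state, reward, done)."""
    new_state = min(max(state + action, 0), N_STATES - 1)
    done = new_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return new_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values from randomly chosen actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(len(ACTIONS))  # act at random while learning
            new_state, reward, done = step(state, ACTIONS[a])
            # Target = immediate reward + discounted best future value.
            future = 0.0 if done else gamma * max(q[new_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = new_state
    return q


q_table = train()
# Greedy policy for the non-terminal cells: pick the higher-valued action.
policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(policy)
```

The discount factor `gamma` is what lets the reward at the end of the corridor propagate back to earlier cells, so that stepping right is preferred even where the immediate reward is zero.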

Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize some notion of cumulative reward.

Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead, the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
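One simple way to balance exploration and exploitation is an epsilon-greedy rule: with probability `epsilon` the agent tries a random action, and otherwise it picks the action it currently believes is best. The sketch below applies it to a hypothetical multi-armed bandit; the function names, the reward noise, and the parameter values are assumptions made for illustration.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # exploration
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploitation


def run_bandit(true_means, epsilon, steps=2000, seed=0):
    """Play a bandit whose arms pay noisy rewards around true_means."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # current knowledge of each arm
    counts = [0] * len(true_means)       # how often each arm was tried
    for _ in range(steps):
        a = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # Incremental average of the rewards seen for this arm.
        estimates[a] += (reward - estimates[a]) / counts[a]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.8, 0.5], epsilon=0.1)
print(counts)
```

With `epsilon=0` the agent never explores and can lock onto whichever arm looked good first; with `epsilon=1` it never uses what it has learned.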


Basics of Reinforcement Learning – Michael Littman

In-depth video lecture:

“In machine learning, the problem of reinforcement learning is concerned with using experience gained through interacting with the world and evaluative feedback to improve a system’s ability to make behavioral decisions. This tutorial will introduce the fundamental concepts and vocabulary that underlie this field of study. It will also review recent advances in the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology.”

Using Unity ML-Agents to balance a simulated ball with a real Xbox controller – Little French Kev

Soft Actor Critic—Deep Reinforcement Learning with Real-World Robots – Berkeley Artificial Intelligence Research

Soft Actor Critic is an algorithm for reinforcement learning that works with a relatively small amount of training.

Spinning Up in Deep RL – OpenAI

Tutorials on reinforcement learning combined with deep neural networks, from OpenAI

SISYPHUS (Our Learning Robot) – Michael Ang

Robot that learns to crawl using reinforcement learning implemented on Arduino UNO

More recent and advanced techniques combine deep neural networks with reinforcement learning, an approach known as “deep reinforcement learning”. In the project linked below, an RC car learns to navigate a hallway using a neural network and reinforcement learning. The car has a grayscale camera, a speed encoder, and a collision detector. The neural network takes the camera input (the last 4 frames) and the current action (turning, accelerator) and tries to predict what will happen over the next few time steps. The camera images pass through convolution layers, which feed an LSTM that also takes in the current action. The network is trained by comparing its predictions (e.g. collision probability) with the ground truth (e.g. whether a collision actually happened). The car needs a few hours of training to be able to navigate the hallway. It recovers from collisions by reversing and then continues training. Onboard processing is done with a Jetson TX1 single-board computer, which transmits data to a laptop that runs the training. The laptop sends new model parameters back to the car while it is running.
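The step of comparing predicted collision probabilities with what actually happened can be sketched as a loss function. The text above does not say which loss the project uses; binary cross-entropy is assumed here because it is a common choice for probability predictions, and the function name and sample values are hypothetical.

```python
import math


def binary_cross_entropy(predicted_probs, outcomes, eps=1e-7):
    """Average loss between predicted collision probabilities and outcomes.

    outcomes holds 1 where a collision happened and 0 where it did not.
    """
    total = 0.0
    for p, y in zip(predicted_probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


# Hypothetical predictions for three upcoming time steps.
outcomes = [0, 0, 1]  # the car collided on the third step
good = binary_cross_entropy([0.1, 0.2, 0.9], outcomes)
bad = binary_cross_entropy([0.9, 0.8, 0.1], outcomes)
print(good < bad)
```

Training adjusts the network parameters to reduce this kind of loss, so that predictions made from camera frames and candidate actions line up better with the collisions the car actually experiences.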

Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation – Berkeley DeepDrive


