The compositional nature of intelligence
Imagine if you had to learn how to chop, peel and stir all over again every time you wanted to learn a new recipe. In many machine learning systems, agents have to learn entirely from scratch when faced with new challenges. It’s clear, however, that people learn more efficiently than this: they can combine previously learned abilities. In the same way that a finite dictionary of words can be reassembled into sentences with nearly infinite meanings, people repurpose and recombine skills they already possess to tackle novel challenges.
In nature, learning arises as an animal explores and interacts with its environment in order to gather food and other rewards. This is the paradigm captured by reinforcement learning (RL): interactions with the environment reinforce or inhibit particular patterns of behavior depending on the resulting reward (or penalty). Recently, the combination of RL with deep learning has led to impressive results, such as agents that can learn to play board games like Go and chess, the full spectrum of Atari games, and more modern, difficult video games like Dota and StarCraft II.
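To make the reinforce-or-inhibit idea concrete, here is a minimal sketch of the RL interaction loop in Python. The tiny Environment and the simple value-update rule are illustrative placeholders of our own, not the agents or games mentioned above: the point is only that actions leading to reward see their estimated value grow, and are therefore chosen more often.

```python
import random

class Environment:
    """Toy environment: action 1 yields a higher reward than action 0, on average."""
    def step(self, action):
        return random.gauss(1.0 if action == 1 else 0.0, 0.1)  # reward

# Estimated value of each action, learned purely from experience.
q_values = [0.0, 0.0]
learning_rate = 0.1
epsilon = 0.1  # probability of exploring a random action

env = Environment()
for _ in range(1000):
    # Mostly exploit the current estimates, occasionally explore.
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: q_values[a])
    reward = env.step(action)
    # Rewarded behavior is reinforced: its value estimate moves toward the reward.
    q_values[action] += learning_rate * (reward - q_values[action])

print(q_values)  # action 1 ends up with the higher estimated value
```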
A major limitation in RL is that current methods require vast amounts of training experience. For example, in order to learn how to play a single Atari game, an RL agent typically consumes an amount of data corresponding to several weeks of uninterrupted playing. A study led by researchers at MIT and Harvard indicated that in some cases, humans are able to reach the same performance level in just fifteen minutes of play.
One possible reason for this discrepancy is that, unlike humans, RL agents usually learn a new task from scratch. We would like our agents to leverage knowledge acquired in previous tasks to learn a new task more quickly, in the same way that a cook will have an easier time learning a new recipe than someone who has never prepared a dish before. In an article recently published in the Proceedings of the National Academy of Sciences (PNAS), we describe a framework aimed at endowing our RL agents with this ability.
Two ways of representing the world
To illustrate our approach, we will explore an example of an activity that is (or at least used to be) an everyday routine: the commute to work. Imagine the following scenario: an agent must commute every day from its home to its office, and it always gets a coffee on the way. There are two cafes between the agent's house and the office: one has great coffee but is on a longer path, and the other has decent coffee but offers a shorter commute (Figure 1). Depending on how much the agent values the quality of the coffee versus how much of a rush it is in on a given day, it may choose one of two routes (the yellow and blue paths on the map shown in Figure 1).
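One way to read this scenario is that each route is summarised by a couple of features, such as coffee quality and commute time, and that the agent's preference on a given day weighs those features against each other. The short sketch below illustrates this reading; the feature values, weights and route names are made up for the example and are not taken from the paper.

```python
# Hypothetical features for the two routes in Figure 1.
routes = {
    "yellow": {"coffee_quality": 0.9, "commute_time": 25.0},  # great coffee, longer path
    "blue":   {"coffee_quality": 0.6, "commute_time": 15.0},  # decent coffee, shorter path
}

def route_value(features, w_coffee, w_time):
    """Value as a weighted combination: coffee quality is rewarded,
    time spent commuting is penalised."""
    return w_coffee * features["coffee_quality"] - w_time * features["commute_time"]

def best_route(w_coffee, w_time):
    return max(routes, key=lambda name: route_value(routes[name], w_coffee, w_time))

# A relaxed morning: coffee quality matters far more than time.
print(best_route(w_coffee=10.0, w_time=0.1))  # -> "yellow"

# A rushed morning: commute time dominates the preference.
print(best_route(w_coffee=1.0, w_time=1.0))   # -> "blue"
```

Changing the weights changes which route looks best, which is exactly the kind of day-to-day trade-off the agent in Figure 1 faces.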




