Robotic embodiment of a meta-learning neural model of human decision-making (MetaBot)

A project funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 795919 (2018-2020).

Hosted by the Laboratory of Computational Embodied Neuroscience (LOCEN), at ISTC-CNR. Head of the hosting lab: Gianluca Baldassarre

Decision-making is currently one of the most studied topics in cognitive neuroscience. Despite constant efforts, the neurobiology of decision-making still suffers from an excess of empiricism, with a proliferation of narrow hypotheses proposed to explain a wide (and often inconsistent) range of experimental results. Combining empirical testing with computational modeling is the most promising way out of this stalemate. Nonetheless, neuro-computational models of decision-making are still affected by two main limitations. First, the computer simulations typically used to test neural models represent the environment in a very simplistic way, ignoring what makes the real nervous system able to deal with the complexity of reality. Second, computational neuroscience often neglects that bodily processes do not simply “execute” what is decided centrally, but are part of cognitive processing itself. We call the first limitation the scalability problem and the second the embodiment problem. Failing to deal with these two problems could turn a promising approach into a new impasse. The main goal of this project is to overcome these two limitations and to open a new path in the cognitive and computational neuroscience of decision-making. To this end, we embodied in a humanoid robotic platform (the iCub) a novel neuro-computational model that represents the state of the art in modeling the neurobiology of decision-making. This neural model, the Reinforcement Meta-Learner (RML; Silvetti et al., 2018, PLoS Computational Biology), generates emergent (i.e., homunculus-free) cognitive control signals and supports learning to solve complex decision problems. In simplified computer simulations, the RML proved exceptionally successful in explaining many different experimental data sets (both neural and behavioural) from different domains.
Embodying the RML would be an important case of a neural model, born in the domain of cognitive neuroscience, being implemented in a humanoid robot. Previous work in cognitive and bio-inspired robotics has mostly relied on neural architectures designed ad hoc for robotics and able to solve only a limited set of tasks. Finally, we are also testing RML predictions on behavioural and neuroimaging data from human volunteers, to verify how well the model simulates the human brain functions related to decision-making.

The fusion of computational neuroscience and humanoid robotics will make it possible to investigate the role of embodiment in decision-making in real-world problems (contribution to neuroscience). It also represents a unique opportunity to test whether the RML offers a new way of developing genuinely autonomous decision-making in robots (contribution to robotics).

The Reinforcement Meta-Learner (RML)

One phenomenon that has remained particularly elusive is the way the human brain dynamically self-regulates its internal “variables” in order to interact optimally with the environment, a process known as meta-learning. For example, humans can control their learning speed, both to protect useful knowledge from random, non-informative events and to update it when relevant events occur. We have recently designed a new neural model of decision-making, the RML (Figure 1), to capture this critical aspect. The RML captures both human behavioural flexibility and the neurophysiology of decision-making by self-regulating its internal parameters. In the RML, meta-learning is modeled as the dynamic control of catecholamine release, which arises from the recurrent interaction between two midbrain nuclei, the ventral tegmental area (VTA) and the locus coeruleus (LC), and the medial prefrontal cortex (MPFC). Thanks to this cortical-subcortical dialogue, the RML autonomously generates adaptive behaviour and can solve complex decision problems. For example, it controls its own neural plasticity to learn faster when the environment changes and to protect useful knowledge from random environmental fluctuations, and it can decide optimally when to exert effort (i.e., to invest more energy) to execute a task. Due to these features, the RML captures critical empirical findings from an unprecedented range of domains, namely research on the stability/plasticity balance, on effort processing, on working memory, and on higher-order classical and instrumental conditioning.

Figure 1. a) Brain regions simulated by the RML: a macro-circuit involving the medial prefrontal cortex (MPFC) and the brainstem nuclei that release the neuromodulators dopamine (DA) and norepinephrine (NE). b) Functional schema of the RML architecture. The RML (black dashed box) consists of a cortical module (MPFC), which performs action-outcome comparisons and selects the actions that maximize expected value, and a subcortical module, which simulates the neuromodulatory functions of DA and NE. The two modules interact to maximize behavioural (and cognitive) efficiency: the MPFC influences the release of neuromodulators, while the latter modulate the functions of the MPFC. Finally, NE output can be directed toward independently developed external modules (e.g., one simulating working memory) in order to optimize their functions during task execution.
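The idea of a learner that regulates its own learning speed can be illustrated with a toy example. The sketch below is not the RML and does not reproduce its equations: it is a plain delta-rule learner whose learning rate is set by a generic heuristic, the ratio between the running mean of signed prediction errors and the running mean of their absolute values. The function name `track_value` and the parameters `meta_rate` and `min_lr` are assumptions made for illustration. When errors cancel out (random fluctuations) the ratio tends to be small, and when they consistently share a sign (a real change in the environment) it tends to grow.

```python
import random


def track_value(rewards, meta_rate=0.1, min_lr=0.05):
    """Delta-rule value tracking with a self-adjusted learning rate.

    Illustrative heuristic only (not the RML equations).
    """
    value = 0.5        # initial reward expectation
    signed_pe = 0.0    # running mean of prediction errors
    abs_pe = 1e-6      # running mean of absolute prediction errors
    values, rates = [], []
    for r in rewards:
        pe = r - value
        signed_pe += meta_rate * (pe - signed_pe)
        abs_pe += meta_rate * (abs(pe) - abs_pe)
        # Consistent errors push the ratio toward 1, cancelling errors toward 0.
        lr = min(1.0, max(min_lr, abs(signed_pe) / abs_pe))
        value += lr * pe
        values.append(value)
        rates.append(lr)
    return values, rates


# Hypothetical environment: reward probability drops from 0.8 to 0.2 halfway.
rng = random.Random(0)
rewards = [1.0 if rng.random() < 0.8 else 0.0 for _ in range(200)]
rewards += [1.0 if rng.random() < 0.2 else 0.0 for _ in range(200)]
values, rates = track_value(rewards)
print(f"estimate before change: {sum(values[150:200]) / 50:.2f}")
print(f"estimate after change:  {sum(values[350:400]) / 50:.2f}")
```

In the RML this kind of regulation emerges from the interaction between the MPFC and the neuromodulatory nuclei, whereas here it is hard-coded in a single line.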

The project is divided into two parts. Part 1 deals with the embodiment problem, while Part 2 tests the capability of the RML to simulate human brain activity during decision-making, in situations where humans are expected to optimize effort and expected rewards.

Part 1: robotic embodiment

The goal of this part is to implement the RML model in a state-of-the-art humanoid robot: the iCub. This humanoid robot reproduces the body shape of a toddler and is equipped with complex visual, auditory, tactile, proprioceptive and motor systems. Extensive research on perceptual and motor systems has already been conducted on the iCub, making the robot a suitable platform for the embodiment of high-level decision-making models like the RML. The strength of this proposal lies in studying the impact of embodiment on decision-making processes by integrating one of the most recent and promising neural models in the neuroscience of decision-making (the RML) into one of the most advanced humanoid robots. We chose to embody the RML, instead of other recent computational models, not only for continuity with our research line, but also because the RML is ready by design to face both the scalability and the embodiment problems. For instance, the RML works with a continuous-time representation, it is robust to noise, and it can easily be connected to the iCub cognitive modules, whereas most other models lack these properties and would have required a complete re-design (Figure 2).

Figure 2. Schema summarizing the interactions between the RML and the iCub. The RML provides both the choice to be selected (based on iCub visual information) and the physical effort of execution (vigor).

We also believe that the RML’s cognitive flexibility in learning and decision-making could represent a notable step forward for autonomous robotics itself, regardless of the progress this project would mean for cognitive neuroscience research. In the task we designed, the iCub was asked, on each trial, to touch (with a reaching movement) one of two boxes placed in front of it (Figure 3, panel a). When touched, a box could deliver a reward with a certain probability and of a certain magnitude, both of which could vary over trials (Figure 3, panel b). Reward delivery was signaled by the box flashing. The iCub had to discover and track over time which box was the most rewarding, making decisions through epochs in which the reward magnitudes and probabilities did not change (stationary epochs) and epochs in which they changed for one or both boxes (volatile epochs). The robot performs continuous decision-making: during the reaching movement toward a box, it can change its decision and select the other box. This process leads to specific arm trajectories that indicate the degree of decision uncertainty: low-uncertainty decisions result in straighter arm trajectories, while high-uncertainty decisions result in more curved ones (see Spivey et al., 2005 and Lepora et al., 2015 for experimental results on humans). Finally, the decision-making process must also take into account the cost (in terms of motor energy expenditure) of changing decision, which produces longer and less efficient reaching trajectories, so that the robot’s goal is to maximize reward while minimizing the motor effort required to execute the task.

Figure 3. iCub decision-making task.
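The structure of the two-box task can be sketched in a few lines of simulation. Everything numeric here is an assumption made for illustration (epoch lengths, reward probabilities and magnitudes, how often contingencies are redrawn in the volatile epoch), and the agent is a simple epsilon-greedy delta-rule learner standing in for the RML. The sketch does not model the reaching movement, changes of decision during the movement, or motor effort.

```python
import random


def make_schedule(rng):
    """Per-trial ((probability, magnitude), (probability, magnitude)) for the two boxes."""
    # Stationary epoch: contingencies stay fixed (values are hypothetical).
    stationary = [((0.8, 1.0), (0.3, 1.0))] * 60
    # Volatile epoch: contingencies are redrawn every 10 trials (an assumption).
    volatile = []
    for _ in range(6):
        boxes = tuple(
            (rng.choice([0.2, 0.8]), rng.choice([0.5, 1.0, 2.0])) for _ in range(2)
        )
        volatile += [boxes] * 10
    return stationary + volatile


def touch(box, rng):
    """Return the reward obtained by touching a box (0.0 if it does not flash)."""
    probability, magnitude = box
    return magnitude if rng.random() < probability else 0.0


def run(seed=0, lr=0.2, epsilon=0.1):
    """Run a stand-in agent through the task and return its history."""
    rng = random.Random(seed)
    schedule = make_schedule(rng)
    estimates = [0.0, 0.0]  # learned value of each box
    choices, total = [], 0.0
    for boxes in schedule:
        if rng.random() < epsilon:
            choice = rng.randrange(2)  # occasional exploration
        else:
            choice = 0 if estimates[0] >= estimates[1] else 1
        reward = touch(boxes[choice], rng)
        estimates[choice] += lr * (reward - estimates[choice])
        choices.append(choice)
        total += reward
    return schedule, choices, total


schedule, choices, total = run()
print(f"trials: {len(schedule)}, total reward: {total:.1f}")
```

A fixed learning rate such as `lr` is exactly what the task is designed to challenge: it cannot be both slow enough for the stationary epochs and fast enough for the volatile ones.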

Part 2: integration of computational and neuroimaging techniques

This second part of the project is aimed at testing the RML predictions against the brain activity of healthy human volunteers. Twenty-five healthy volunteers performed a decision-making task during fMRI scanning. The task consisted of a decision epoch, in which volunteers were asked to make a binary choice between an easy task (mental calculation) for a low reward and a harder task for a larger reward, across different reward-difficulty combinations, and a performance epoch, in which the volunteers actually executed the mental calculation (Figure 4).

Figure 4. Cognitive task administered to both the human volunteers and the RML model. It consisted of two sessions: one (a) in which subjects were asked to choose which kind of trials they would face in the second session (easy and not rewarding, or hard and rewarding), and a second (b) in which they actually performed the mental calculation task.

As in the iCub experiment, the decision-making process was aimed at maximizing reward while minimizing the cost of task execution. The method we are using in this part of the project is called “model-based fMRI” (Figure 5). It consists of administering the same cognitive task to human volunteers and to the RML model, whose internal parameters are adapted to each volunteer so as to maximize the similarity between the responses given by the model and those given by the volunteer (creating a digital twin for each volunteer). After that, the metabolic activity of each volunteer’s brain is compared with the neural activity simulated by his or her RML digital twin, to verify whether the model correctly predicted the human brain activity. This study will provide a very stringent statistical test of whether, and for which regions, the RML is able to simulate the human brain activity underlying decision-making processes in uncertain situations.

Figure 5. Model-based analysis pipeline. Human volunteers perform a cognitive task while being scanned with fMRI. The RML is exposed to the same task, and its internal parameters are adapted (computational phenotyping) to precisely simulate the behaviour of each individual volunteer (creating a digital twin for each volunteer). Afterwards, simulated and real brain activities are statistically compared, to show if and where the RML was able to predict the human brain responses to the cognitive task.
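The parameter-adaptation step of the pipeline (fitting a model to the choices of a single volunteer) can be illustrated with a deliberately simple stand-in. The sketch below does not use the RML: it assumes a two-parameter softmax choice rule in which the hard option is discounted by `effort_cost` times its difficulty, and it recovers the parameters by maximum likelihood over a coarse grid. The choice rule, the parameter names, the grid and the simulated "volunteer" are all hypothetical. The later steps of the pipeline, simulating neural activity and comparing it with fMRI data, are not shown.

```python
import math
import random


def p_hard(r_hard, r_easy, difficulty, effort_cost, temperature):
    """Probability of choosing the hard option under a softmax rule."""
    net = (r_hard - effort_cost * difficulty) - r_easy
    return 1.0 / (1.0 + math.exp(-net / temperature))


def neg_log_likelihood(params, trials, choices):
    """Negative log-likelihood of the observed choices given the parameters."""
    effort_cost, temperature = params
    nll = 0.0
    for (r_hard, r_easy, difficulty), chose_hard in zip(trials, choices):
        p = p_hard(r_hard, r_easy, difficulty, effort_cost, temperature)
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # avoid log(0)
        nll -= math.log(p if chose_hard else 1.0 - p)
    return nll


def fit(trials, choices):
    """Grid search for the parameters that best explain one volunteer's choices."""
    grid = [(c / 10, t / 10) for c in range(0, 31) for t in range(1, 21)]
    return min(grid, key=lambda params: neg_log_likelihood(params, trials, choices))


# Simulated volunteer with known (hypothetical) parameters.
rng = random.Random(0)
trials = [
    (rng.choice([2.0, 3.0, 4.0]), 1.0, rng.choice([1.0, 2.0, 3.0]))
    for _ in range(300)
]
true_params = (1.0, 0.5)  # effort_cost, temperature
choices = [rng.random() < p_hard(*trial, *true_params) for trial in trials]

best = fit(trials, choices)
print("fitted (effort_cost, temperature):", best)
```

Fitting simulated data with known parameters, as done here, is a common sanity check before applying the same procedure to real behavioural data.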