Bengio Team Proposes Flow Network-Based Generative Models That Learn a Stochastic Policy From a Sequence of Actions

For standard reinforcement learning (RL) algorithms, the maximization of expected return is achieved by selecting the single highest-reward sequence of actions. But for tasks in combinatorial domains such as drug molecule synthesis, where exploration is important, the goal is no longer to generate the single highest-reward sequence of actions, but rather to sample a diverse set of high-return solutions.

To address this machine learning problem, a research team from Mila, McGill University, Université de Montréal, DeepMind and Microsoft has proposed GFlowNet, a novel flow-network-based generative method that can turn a given positive reward function into a generative policy that samples with probability proportional to the return.


In graph theory, a flow network is a directed graph with sources and sinks, where each edge has a capacity and carries a flow. The motivating task for this work is iterative black-box optimization, where the agent must evaluate the reward of a large batch of candidates at each round. The idea behind the proposed GFlowNet is to view the probability assigned to an action given a state as the flow through a network whose nodes are states and whose outgoing edges are deterministic transitions driven by actions.
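To make this view concrete, here is a minimal sketch (the function name and the log-space parameterization are our assumptions, not the paper's code) of how estimated edge flows define a sampling policy: normalizing the flows out of a state, which is exactly a softmax over log flows.

```python
import torch

def policy_from_log_flows(log_flows: torch.Tensor) -> torch.Tensor:
    """Action distribution at a state from estimated log edge flows:
    pi(a | s) = F(s, a) / sum_a' F(s, a').
    A softmax over log F is exactly this normalization."""
    return torch.softmax(log_flows, dim=-1)

# Hypothetical state with three available actions.
log_f = torch.tensor([0.5, 1.2, -0.3])
probs = policy_from_log_flows(log_f)
action = torch.multinomial(probs, num_samples=1)  # sample one action
```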

The team defines a flow network with a single source whose sinks correspond to the terminal states. Given the graph structure and the outflow at the sinks, they compute a valid flow between nodes. Notably, such a construction corresponds to a generative model: the researchers rigorously prove that following the flow reaches a terminal state (a sink) with probability proportional to its reward.
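In rough paraphrase of the paper's conditions (with F(s, a) the flow on the edge taken by action a from state s, T the deterministic transition function, A(s) the actions available at s, and R the reward), the construction requires in-flow to match out-flow at every interior state, with boundary conditions at the source and sinks:

```latex
% Flow matching: for every interior state s' (neither source nor sink),
% total in-flow equals total out-flow:
\sum_{s,a:\,T(s,a)=s'} F(s,a) \;=\; \sum_{a'\in\mathcal{A}(s')} F(s',a')

% Boundary conditions: each sink x absorbs flow equal to its reward,
% and the source emits the total flow Z:
F(x,\mathrm{stop}) = R(x), \qquad Z = \sum_{x} R(x)

% Sampling each action with probability proportional to its edge flow
% then reaches terminal state x with probability
p(x) = \frac{R(x)}{Z} \;\propto\; R(x)
```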

Based on these theoretical results, the researchers construct a learning algorithm. They propose approximating the flows such that the flow conditions are satisfied at convergence, given enough capacity in the flow estimator. This yields an objective function for GFlowNet whose minimization achieves their desideratum: a generative policy that samples with probability proportional to the return.
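As an illustration, the paper's log-space flow-matching loss can be sketched for a single state as follows; the function name and argument layout are hypothetical, and a real implementation sums this term over every non-root state of the sampled trajectories.

```python
import torch

def flow_matching_loss(in_log_flows: torch.Tensor,
                       out_log_flows: torch.Tensor,
                       reward: float,
                       eps: float = 0.1) -> torch.Tensor:
    """Single-state sketch of the GFlowNet flow-matching objective:
    squared difference between log total in-flow and log of
    (reward + total out-flow).

    in_log_flows:  log F(s, a) for every parent edge (s, a) into s'
    out_log_flows: log F(s', a') for every action a' available at s'
    reward:        R(s') if s' is terminal, else 0.0
    """
    inflow = in_log_flows.exp().sum()
    outflow = out_log_flows.exp().sum()
    return (torch.log(eps + inflow) - torch.log(eps + reward + outflow)) ** 2
```

The eps term keeps both logarithms finite and down-weights errors on very small flows, so training focuses on the flows that matter most for sampling.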

The team conducted various experiments to evaluate the performance of the proposed GFlowNet. For example, in an experiment generating small molecules, the team reported the empirical distribution of rewards and the average reward of the top-k candidates over the course of training.

Compared to the baseline MARS (Xie et al., 2021), GFlowNet found more high-reward molecules. The results also show that for both GFlowNet and MARS, the quality of generated molecules improves as more molecules are visited, converging slowly towards the proxy's maximum reward.

Overall, GFlowNet achieves competitive results against baseline methods on the molecule synthesis task and performs well on a simple domain where the reward function has many modes. The research team believes GFlowNet can serve as an alternative approach for turning an energy function into a fast generative model.

The implementations are available on the project's GitHub. The paper Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang

We know you don't want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.
