Infinite Memory Transformer: Attending to Arbitrarily Long Contexts Without Increasing Computation Burden

When reading a novel, humans naturally remember relevant plot information even if it was presented many chapters earlier. Although today's transformer-based language models have made impressive progress in natural language processing, they struggle in this regard: the computation required to model long-range context grows quadratically with the length of the text, and longer texts eventually exceed a model's finite memory capacity.
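As a rough back-of-the-envelope illustration (our own toy example, not from the paper), the sketch below shows why vanilla self-attention becomes expensive: it forms an L × L score matrix over a sequence of length L, so doubling the context roughly quadruples the work.

```python
# Toy illustration of the quadratic cost of vanilla self-attention:
# the score matrix has L * L entries for a length-L sequence.
import numpy as np

def vanilla_attention(q, k, v):
    """q, k, v: (L, d) arrays for a single head over a length-L sequence."""
    scores = q @ k.T / np.sqrt(k.shape[-1])             # (L, L) -- quadratic in L
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)           # row-wise softmax
    return weights @ v                                   # (L, d)

d = 64
for L in (512, 1024, 2048):
    q = k = v = np.random.randn(L, d)
    vanilla_attention(q, k, v)
    print(f"L={L:5d}: score matrix holds {L * L:,} entries")
```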

To overcome this limitation, a research team from Instituto de Telecomunicações, DeepMind, the Institute of Systems and Robotics, Instituto Superior Técnico and Unbabel has proposed the ∞-former (infinite former), a transformer model equipped with an unbounded long-term memory (LTM) that enables it to attend to arbitrarily long contexts.

The team summarizes their study's contributions as follows:

The team extends the vanilla transformer with a continuous LTM to enable their proposed ∞-former to access long-range context. The novel approach employs a continuous-space attention framework to attend over the LTM signal, in which the size of the key matrix depends on the number of basis functions rather than on the length of the context being attended to. The model's computational complexity is thus rendered independent of context length, enabling it to attend to arbitrarily long contexts without increasing memory requirements or computation burden.
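The sketch below is our own simplified illustration of this idea, not the authors' implementation: an arbitrarily long history is compressed into the coefficients of N radial basis functions, and attention is then computed against those N coefficients using a Gaussian density over [0, 1] (in the actual model the density's mean and variance are predicted from the query; here they are fixed). Helper names such as `fit_continuous_memory` and `continuous_attention` are hypothetical.

```python
# Simplified sketch of attention over a continuous long-term memory.
# Assumptions: Gaussian RBF basis, ridge-regression fit, fixed attention
# density. Illustrative only -- not the paper's code.
import numpy as np

def rbf_basis(t, n_basis, width=0.05):
    """Evaluate N Gaussian radial basis functions with centres spread over [0, 1]."""
    centres = np.linspace(0.0, 1.0, n_basis)
    return np.exp(-((t[:, None] - centres[None, :]) ** 2) / (2 * width**2))  # (T, N)

def fit_continuous_memory(context, n_basis, ridge=1e-3):
    """Compress an (L, d) token history into an (N, d) coefficient matrix B."""
    L, _ = context.shape
    Psi = rbf_basis(np.linspace(0.0, 1.0, L), n_basis)        # (L, N)
    # Ridge regression so that B^T psi(t) approximates the token signal.
    return np.linalg.solve(Psi.T @ Psi + ridge * np.eye(n_basis), Psi.T @ context)

def continuous_attention(B, mu=0.5, sigma=0.1, n_points=200):
    """Expectation of the reconstructed signal under a Gaussian density on [0, 1]."""
    t = np.linspace(0.0, 1.0, n_points)
    density = np.exp(-0.5 * ((t - mu) / sigma) ** 2)
    density /= density.sum()
    Psi = rbf_basis(t, B.shape[0])                             # (n_points, N)
    return density @ (Psi @ B)                                 # (d,) context vector

long_history = np.random.randn(10_000, 64)      # stand-in for 10k past token states
B = fit_continuous_memory(long_history, n_basis=128)
print(B.shape)                                  # (128, 64): independent of the 10k length
print(continuous_attention(B).shape)            # (64,)
```

The point of the sketch is the shapes: B has N rows regardless of how long the history is, so the attention step only ever touches a fixed-size representation.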

To evaluate their proposed method, the researchers performed extensive experiments on a synthetic task and on language modelling tasks, using Transformer-XL and the compressive transformer as baselines.

In the synthetic task experiments, Transformer-XL achieved slightly better performance than the compressive transformer and the ∞-former for short memory lengths, but its accuracy degraded rapidly as the sequence length increased, while the accuracies of both the compressive transformer and the ∞-former remained relatively stable. In the language modelling experiments, the ∞-former slightly outperformed the compressive transformer.

The researchers also note the ∞-former's ability to reduce perplexity in a pretrained model such as GPT-2 by helping the model focus on relevant memories.

Overall, the study shows that the proposed ∞-former can scale to long sequences while maintaining high accuracy, and demonstrates the versatility and benefits of unbounded long-term memory, both when training models from scratch and when fine-tuning pretrained language models.

The paper ∞-former: Infinite Memory Transformer is on arXiv.

Author: Hecate He | Editor: Michael Sarazen, Chain Zhang

