The Enformer vs the Basenji – The AI Algorithms for gene expression predictions – Analytics India Magazine

DeepMind and Calico, an Alphabet company, introduced a neural network architecture called Enformer that substantially improved the accuracy of predicting gene expression from DNA sequence.

In the paper "Effective gene expression prediction from sequence by integrating long-range interactions", published in Nature Methods, the authors report that Enformer is more accurate than Basenji2.

Models that predict gene expression from sequence have typically been built from convolutional neural networks. However, each output of such a network can only see a limited stretch of the input sequence, which restricts its ability to model the effects of distal enhancers on gene expression.
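To make the limitation concrete, the receptive field of a stack of convolutions (the span of input bases that can influence a single output) can be computed layer by layer. The sketch below is generic convolution arithmetic, not Basenji's or Enformer's actual configuration; the layer lists used with it are hypothetical.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int, int]]) -> int:
    """Receptive field, in input bases, of a stack of conv/pooling layers.

    Each layer is (kernel_size, stride, dilation), listed from input to output.
    """
    rf = 1    # a single input base sees itself
    jump = 1  # spacing, in input bases, between adjacent units at the current layer
    for kernel, stride, dilation in layers:
        # Each layer widens the window by (kernel - 1) dilated steps of size `jump`.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


# Three plain kernel-3 convolutions see only 7 bases.
print(receptive_field([(3, 1, 1)] * 3))
# Doubling the dilation per layer (1, 2, 4) widens the same depth to 15 bases.
print(receptive_field([(3, 1, 1), (3, 1, 2), (3, 1, 4)]))
```

Plain convolutions grow the receptive field by only a few bases per layer, so reaching enhancers tens of kilobases away requires very deep stacks, pooling, or dilated convolutions (which Basenji uses), whose doubling dilation makes the span grow exponentially with depth.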

The baseline, Basenji2, is built on TensorFlow, which offers a variety of benefits, including distributed computing and a large, active developer community. Basenji is designed to predict quantitative signals using regression loss functions, rather than binary signals using classification loss functions.
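The difference between the two kinds of loss can be sketched in a few lines. Below, a Poisson negative log-likelihood (assumed here as a representative regression loss for count-like sequencing coverage) is contrasted with binary cross-entropy, which a classifier of present/absent peaks would use. The function names are illustrative and are not part of Basenji's API.

```python
import math
from typing import Sequence


def poisson_nll(y_true: Sequence[float], y_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Mean Poisson negative log-likelihood, dropping the constant log(y!) term.

    y_true: observed non-negative coverage values (a quantitative signal).
    y_pred: predicted rates; clipped to eps so the log stays finite.
    """
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        mu = max(mu, eps)
        total += mu - y * math.log(mu)  # minimized when mu == y
    return total / len(y_true)


def binary_cross_entropy(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


coverage = [0.0, 3.0, 12.0, 5.0]
# A prediction that tracks the coverage scores better than a flat one.
print(poisson_nll(coverage, [0.5, 3.0, 12.0, 5.0]) < poisson_nll(coverage, [5.0] * 4))
```

The regression loss rewards getting the magnitude of the signal right at every position, whereas the classification loss only asks whether a binary label was called correctly.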

A key strength of Basenji is that it can predict regulatory activity across 40,000-base-pair DNA sequences at a time.
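Sequence models such as Basenji and Enformer read DNA as one-hot-encoded input, with one channel per nucleotide. A minimal sketch, assuming an A, C, G, T channel order and all-zero rows for ambiguous bases such as N; the function name is hypothetical.

```python
from typing import List

# Assumed channel order: A, C, G, T. Any other character (e.g. 'N') encodes as all zeros.
_CHANNEL = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq: str) -> List[List[float]]:
    """Encode a DNA string as L rows of 4 values, one row per base."""
    encoded = []
    for base in seq.upper():
        row = [0.0, 0.0, 0.0, 0.0]
        idx = _CHANNEL.get(base)
        if idx is not None:
            row[idx] = 1.0
        encoded.append(row)
    return encoded


print(one_hot_encode("ACgtN"))
```

A 40,000-base-pair input therefore becomes a 40,000 x 4 matrix, which is what the convolutional layers scan.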

Enformer, by contrast, relies on Transformers, an architecture developed at Google and widely used in natural language processing, whose self-attention mechanism can integrate much more DNA context. Just as Transformers can read long passages of text, DeepMind adapted them to read DNA sequences of vastly extended length.
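The key property of self-attention is that every output position is a weighted mix of all input positions, so a single layer can relate sequence elements that are far apart. The sketch below is plain single-head scaled dot-product attention in pure Python; it deliberately omits the multiple heads, convolutional layers and relative positional encodings that Enformer combines with attention, and the projection matrices in the example are arbitrary.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    x: L x d input, one row per sequence position.
    wq, wk, wv: d x d_k projection matrices for queries, keys and values.
    Returns L x d_k; row i is a weighted average of the value rows of ALL positions.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = matmul(q, transpose(k))  # L x L: every position scores every other position
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


identity = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], identity, identity, identity))
```

Because the attention weights span the whole input, the distance that can be modeled is limited by the input length rather than by the depth of the network, which is the property that lets Enformer reach much further than a convolutional receptive field.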

Enformer outperformed the best team in the Critical Assessment of Genome Interpretation (CAGI5) challenge for noncoding variant interpretation, without any additional training. Enformer also learned to predict promoter-enhancer interactions directly from DNA sequence, performing competitively with methods that take experimental data directly as input.
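A common way to use a sequence model for variant interpretation is to run it on the reference and on the alternative allele and take the difference for each output track. The sketch below assumes that scoring scheme; `toy_predict` is a hypothetical stand-in for a trained model such as Enformer, which would return thousands of tracks rather than two.

```python
from typing import Callable, List


def apply_snv(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute a single-nucleotide variant at 0-based position `pos`."""
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    predict: Callable[[str], List[float]], seq: str, pos: int, ref: str, alt: str
) -> List[float]:
    """Per-track effect score: prediction(alt sequence) - prediction(ref sequence)."""
    ref_pred = predict(seq)
    alt_pred = predict(apply_snv(seq, pos, ref, alt))
    return [a - r for a, r in zip(alt_pred, ref_pred)]


def toy_predict(seq: str) -> List[float]:
    """Hypothetical model: track 0 counts G/C bases, track 1 counts A bases."""
    return [float(seq.count("G") + seq.count("C")), float(seq.count("A"))]


print(variant_effect(toy_predict, "AATTGC", 0, "A", "G"))  # → [1.0, -1.0]
```

No retraining is involved: the same trained model scores any variant, which is what makes this kind of evaluation possible without additional training.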

For the implementation, DeepMind used Sonnet, its library for building neural networks on top of TensorFlow. The model is defined in enformer.py.

DeepMind pre-computed variant effect scores for all frequent variants (MAF > 0.5% in any population) from the 1000 Genomes Project and stored them in per-chromosome HDF5 files for the HG19 reference genome. Additionally, they provide the top 20 principal components of the variant-effect scores per chromosome in a tabix-indexed TSV file (HG19 reference genome). These files have the following columns:

Hopefully, these advances will enable better mapping of the growing number of human disease associations to cell-type-specific gene regulatory mechanisms, and provide a framework for understanding how cis-regulatory evolution works.

