Category Archives: Deep Mind

Google AI heavyweight Jeff Dean talks about algorithmic breakthroughs and data center emissions – Fortune

Google sent a jolt of unease into the climate change debate this month when it disclosed that emissions from its data centers rose 13% in 2023, citing the AI transition in its annual environmental report. But according to Jeff Dean, Google's chief scientist, the report doesn't tell the full story and gives AI more than its fair share of blame.

Dean, who is chief scientist at both Google DeepMind and Google Research, said that Google is not backing off its commitment to be powered by 100% clean energy by the end of 2030. But, he said, that progress is "not necessarily a linear thing" because some of Google's work with clean energy providers will not come on line until several years from now.

"Those things will provide significant jumps in the percentage of our energy that is carbon-free energy, but we also want to focus on making our systems as efficient as possible," Dean said at Fortune's Brainstorm Tech conference on Tuesday, in an onstage interview with Fortune's AI editor Jeremy Kahn.

Dean went on to make the larger point that AI is not as responsible for increasing data center usage, and thus carbon emissions, as critics make it out to be.

"There's been a lot of focus on the increasing energy usage of AI, and from a very small base that usage is definitely increasing," Dean said. "But I think people often conflate that with overall data center usage, of which AI is a very small portion right now but growing fast, and then attribute the growth rate of AI-based computing to the overall data center usage."

Dean said that it's important to examine all the data and the true trends that underlie this, though he did not elaborate on what those trends were.

One of Google's earliest employees, Dean joined the company in 1999 and is credited with being one of the key people who transformed its early internet search engine into a powerful system capable of indexing the internet and reliably serving billions of users. Dean cofounded the Google Brain project in 2011, spearheading the company's efforts to become a leader in AI. Last year, Alphabet merged Google Brain with DeepMind, the AI company Google acquired in 2014, and made Dean chief scientist reporting directly to CEO Sundar Pichai.

By combining the two teams, Dean said that the company has a better set of ideas to build on, and can pool the compute so that we focus on training one large-scale effort like Gemini rather than multiple fragmented efforts.

Dean also responded to a question about the status of Google's Project Astra, a research project which DeepMind leader Demis Hassabis unveiled in May at Google I/O, the company's annual developer conference. Described by Hassabis as a universal AI agent that can understand the context of a user's environment, Astra was shown in a video demonstration in which users could point their phone camera at nearby objects and ask the AI agent relevant questions such as "What neighborhood am I in?" or "Did you see where I left my glasses?"

At the time, the company said the Astra technology will come to the Gemini app later this year. But Dean put it more conservatively: "We're hoping to have something out into the hands of test users by the end of the year," he said.

"The ability to combine Gemini models with models that actually have agency and can perceive the world around you in a multimodal way is going to be quite powerful," Dean said. "We're obviously approaching this responsibly, so we want to make sure that the technology is ready and that it doesn't have unforeseen consequences, which is why we'll roll it out first to a smaller set of initial test users."

As for the continued evolution of AI models, Dean noted that additional data and computing power alone will not suffice. A couple more generations of scaling will get us considerably farther, Dean said, but eventually there will be a need for some additional algorithmic breakthroughs.

Dean said his team has long focused on ways to combine scaling with algorithmic approaches in order to improve factuality and reasoning capabilities, so that the model can imagine plausible outputs and reason its way through which one makes the most sense.

"Those kind of advances," Dean said, "will be important to really make these models robust and more reliable than they already are."

Here is the original post:
Google AI heavyweight Jeff Dean talks about algorithmic breakthroughs and data center emissions - Fortune

Top Deep Learning Interview Questions and Answers for 2024 – Simplilearn

The demand for Deep Learning has grown over the years and its applications are being used in every business sector. Companies are now on the lookout for skilled professionals who can use deep learning and machine learning techniques to build models that can mimic human behavior. According to Indeed, the average salary for a deep learning engineer in the United States is $133,580 per annum. In this tutorial, you will learn the top 45 Deep Learning interview questions that are frequently asked.

Check out some of the frequently asked deep learning interview questions below:

If you are going for a deep learning interview, you definitely know what exactly deep learning is. However, with this question the interviewer expects you to give a detailed answer, with an example. Deep Learning involves taking large volumes of structured or unstructured data and using complex algorithms to train neural networks. It performs complex operations to extract hidden patterns and features (for instance, distinguishing the image of a cat from that of a dog).

Neural Networks replicate the way humans learn, inspired by how the neurons in our brains fire, only much simpler.

The most common Neural Networks consist of three network layers: an input layer, a hidden layer, and an output layer.

Each layer contains neurons called nodes, performing various operations. Neural Networks are used in deep learning algorithms like CNN, RNN, GAN, etc.

As in Neural Networks, MLPs have an input layer, a hidden layer, and an output layer. An MLP has the same structure as a single layer perceptron, with one or more hidden layers added. A single layer perceptron can classify only linearly separable classes with binary output (0,1), but an MLP can classify nonlinear classes.

Except for the input layer, each node in the other layers uses a nonlinear activation function. Each node takes the incoming data, adds up the weighted outputs of all nodes in the previous layer, and passes that sum through the activation function to produce its output. MLP uses a supervised learning method called backpropagation. In backpropagation, the neural network calculates the error with the help of a cost function. It propagates this error backward from where it came (adjusts the weights to train the model more accurately).
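To make the claim about nonlinear classes concrete, here is a minimal sketch of an MLP forward pass that computes XOR, a function no single layer perceptron can represent. The weights are hand-picked for illustration rather than learned by backpropagation, and the names mlp_forward, W1, b1, W2 and b2 are assumptions of this example.

```python
import numpy as np

def relu(z):
    # Nonlinear activation used by the hidden layer.
    return np.maximum(0.0, z)

# Hand-picked (not trained) weights for a 2-2-1 network that computes XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # input -> hidden weights
b1 = np.array([0.0, -1.0])       # hidden biases
W2 = np.array([1.0, -2.0])       # hidden -> output weights
b2 = 0.0                         # output bias

def mlp_forward(x):
    hidden = relu(x @ W1 + b1)   # weighted sum plus bias, then activation
    return hidden @ W2 + b2      # linear output node

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = mlp_forward(inputs)
print(outputs)
```

Removing the relu call turns the whole network into a single linear map, which is why the hidden nonlinearity is what lets the MLP separate these classes.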

The process of standardizing and reforming data is called Data Normalization. It's a pre-processing step to eliminate data redundancy. Often, data comes in, and you get the same information in different formats. In these cases, you should rescale values to fit into a particular range, achieving better convergence.
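As an illustration of rescaling values into a particular range, the sketch below shows two common choices: min-max scaling to [0, 1] and standardization to zero mean and unit standard deviation. The function names and the sample data are made up for this example, and both functions assume the input is not constant (otherwise they would divide by zero).

```python
import numpy as np

def min_max_scale(values):
    # Rescale so the smallest value maps to 0 and the largest to 1.
    x = np.asarray(values, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(values):
    # Rescale to zero mean and unit standard deviation (z-scores).
    x = np.asarray(values, dtype=float)
    return (x - x.mean()) / x.std()

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]   # illustrative data
scaled = min_max_scale(heights_cm)
zscores = standardize(heights_cm)
print(scaled)
print(zscores)
```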

One of the most basic Deep Learning models is a Boltzmann Machine, resembling a simplified version of the Multi-Layer Perceptron. In its restricted form, this model features a visible input layer and a hidden layer: just a two-layer neural net that makes stochastic decisions as to whether a neuron should be on or off. Nodes are connected across layers, but no two nodes of the same layer are connected.

At the most basic level, an activation function decides whether a neuron should be fired or not. It takes the weighted sum of the inputs plus the bias as its input. Step function, Sigmoid, ReLU, Tanh, and Softmax are examples of activation functions.

Also referred to as loss or error, the cost function is a measure to evaluate how good your model's performance is. It's used to compute the error of the output layer during backpropagation. We push that error backward through the neural network and use it to update the weights during training.

Gradient Descent is an optimization algorithm used to minimize the cost function, that is, to minimize an error. The aim is to find a local or global minimum of a function. The gradient determines the direction the model should take to reduce the error.
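A minimal sketch of gradient descent on a one-dimensional function may help here. The function f(x) = (x - 3)^2, the starting point, and the learning rate are all chosen for illustration; gradient_descent is a hypothetical helper, not part of any library.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    # Repeatedly step against the gradient to reduce the function value.
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

Because this function is convex, the local minimum found is also the global one; for the non-convex cost surfaces of real networks there is no such guarantee.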

This is one of the most frequently asked deep learning interview questions. Backpropagation is a technique to improve the performance of the network. It backpropagates the error and updates the weights to reduce the error.
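The idea can be sketched for the smallest possible case: a single sigmoid neuron with a squared-error cost. The example below applies the chain rule by hand, checks the result against a numerical derivative, and takes one weight update. All names and numbers are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    # Squared-error cost for one training example.
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def backprop(w, b, x, y):
    # Forward pass.
    a = sigmoid(w * x + b)
    # Backward pass: chain rule from the cost back to the parameters.
    dL_da = a - y
    da_dz = a * (1.0 - a)
    dL_dz = dL_da * da_dz
    return dL_dz * x, dL_dz          # dL/dw, dL/db

w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, y)

# Numerical check of dL/dw with a central difference.
eps = 1e-6
numeric_w = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)

# One gradient-descent update of the weights.
new_w, new_b = w - 0.1 * grad_w, b - 0.1 * grad_b
print(grad_w, numeric_w, loss(w, b, x, y), loss(new_w, new_b, x, y))
```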

In this deep learning interview question, the interviewer expects you to give a detailed answer.

In a Feedforward Neural Network, signals travel in one direction, from input to output. There are no feedback loops; the network considers only the current input. It cannot memorize previous inputs. A CNN is one example.

In a Recurrent Neural Network, signals also loop back, creating a looped network. It considers the current input together with the previously received inputs when generating the output of a layer, and it can memorize past data due to its internal memory.

The RNN can be used for sentiment analysis, text mining, and image captioning. Recurrent Neural Networks can also address time series problems such as predicting the prices of stocks in a month or quarter.

Softmax is an activation function that generates outputs between zero and one. It normalizes the outputs so that their total sum is equal to one. Softmax is often used for output layers.

ReLU (or Rectified Linear Unit) is the most widely used activation function. It gives an output of X if X is positive and zero otherwise. ReLU is often used for hidden layers.
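Both functions fit in a few lines of NumPy. This is a minimal sketch; subtracting the maximum inside softmax is a standard numerical-stability trick that does not change the result, and the input values are arbitrary.

```python
import numpy as np

def relu(x):
    # Passes positive values through unchanged and maps the rest to zero.
    return np.maximum(0.0, x)

def softmax(logits):
    shifted = logits - np.max(logits)   # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()            # outputs are positive and sum to 1

hidden = relu(np.array([-2.0, 0.0, 3.0]))
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(hidden, probs)
```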

This is another frequently asked deep learning interview question. With neural networks, you're usually working with hyperparameters once the data is formatted correctly. A hyperparameter is a parameter whose value is set before the learning process begins. It determines how a network is trained and the structure of the network (such as the number of hidden units, the learning rate, epochs, etc.).

When your learning rate is too low, training of the model will progress very slowly as we are making minimal updates to the weights. It will take many updates before reaching the minimum point.

If the learning rate is set too high, the drastic updates to the weights cause undesirable divergent behavior in the loss function. The model may fail to converge (it never settles on weights that give a good output) or may even diverge (the loss grows instead of shrinking).
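The effect of the learning rate can be seen on a toy problem. The sketch below minimizes f(x) = x^2 with three different learning rates and reports how far each run ends from the minimum at 0; the function run and the specific rates are assumptions chosen to make the three regimes visible.

```python
def run(learning_rate, steps=20, x0=1.0):
    # Gradient descent on f(x) = x^2, whose gradient is 2x.
    x = x0
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return abs(x)                # distance from the minimum at x = 0

too_low = run(0.001)      # tiny updates: barely moves in 20 steps
reasonable = run(0.1)     # steady progress toward the minimum
too_high = run(1.1)       # overshoots more on every step and diverges
print(too_low, reasonable, too_high)
```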

Dropout is a technique of randomly dropping out hidden and visible units of a network (typically 20 percent of the nodes) to prevent overfitting of data. It roughly doubles the number of iterations needed to converge the network.

Batch normalization is a technique to improve the performance and stability of neural networks by normalizing the inputs in every layer so that they have a mean output activation of zero and a standard deviation of one.
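Here is a minimal sketch of dropout applied to a vector of activations. It uses the "inverted dropout" variant, which rescales the surviving units during training so nothing needs to change at inference time; that variant, the function name dropout, and the 20 percent rate are choices made for this example.

```python
import numpy as np

def dropout(activations, drop_prob, rng, training=True):
    # At inference time dropout is disabled and activations pass through.
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob   # True = unit is kept
    # Inverted dropout: rescale kept units so the expected value is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
acts = np.ones(10_000)
dropped = dropout(acts, drop_prob=0.2, rng=rng)
print((dropped == 0).mean(), dropped.mean())
```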
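The normalization step itself can be sketched as follows. This shows only the training-time computation over one mini-batch; the running statistics used at inference and the learning of the scale and shift parameters (called gamma and beta here) are left out, and the sample batch is made up.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature (column) over the mini-batch.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # eps guards against division by zero
    return gamma * x_hat + beta               # optional learned scale and shift

batch = np.array([[1.0, 200.0],
                  [2.0, 400.0],
                  [3.0, 600.0]])
normalized = batch_norm(batch)
print(normalized)
```

After normalization the two features, which started on very different scales, have the same mean and spread.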

The next step on this top Deep Learning interview questions and answers blog will be to discuss intermediate questions.

Batch Gradient Descent: The batch gradient computes the gradient using the entire dataset. It takes time to converge because the volume of data is huge, and weights update slowly.

Stochastic Gradient Descent: The stochastic gradient computes the gradient using a single sample. It converges much faster than the batch gradient because it updates the weights more frequently.
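The difference in update frequency shows up even in a toy example. The sketch below fits a single weight w in y = w * x for one pass over four samples, once with a single batch update and once with one update per sample; the data, learning rate, and variable names are assumptions for illustration, and one toy run is not a general proof.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]        # the true weight is 2.0
lr = 0.05

def grad(w, x, y):
    # Derivative of the squared error (w * x - y)^2 with respect to w.
    return 2.0 * (w * x - y) * x

# Batch gradient descent: one update per epoch, using the mean gradient.
w_batch = 0.0
w_batch -= lr * sum(grad(w_batch, x, y) for x, y in zip(xs, ys)) / len(xs)

# Stochastic gradient descent: one update per sample, four per epoch.
w_sgd = 0.0
for x, y in zip(xs, ys):
    w_sgd -= lr * grad(w_sgd, x, y)

print(w_batch, w_sgd)
```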

Overfitting occurs when the model learns the details and noise in the training data to the degree that it adversely impacts the execution of the model on new information. It is more likely to occur with nonlinear models that have more flexibility when learning a target function. An example would be if a model is looking at cars and trucks, but only recognizes trucks that have a specific box shape. It might not be able to notice a flatbed truck because there's only a particular kind of truck it saw in training. The model performs well on training data, but not in the real world.

Underfitting refers to a model that is neither well-trained on the data nor able to generalize to new information. This usually happens when there is too little data, or incorrect data, to train a model. An underfit model has both poor performance and poor accuracy.

To combat overfitting and underfitting, you can resample the data to estimate the model accuracy (k-fold cross-validation) and hold out a validation dataset to evaluate the model.
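The resampling step of k-fold cross-validation can be sketched in plain Python. The helper k_fold_indices below is hypothetical: it only produces the index splits, and a real workflow would also shuffle the data first and train and score a model on each split.

```python
def k_fold_indices(n_samples, k):
    # Spread the samples over k folds; the first folds absorb any remainder.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        val = indices[start:start + size]                   # held-out fold
        train = indices[:start] + indices[start + size:]    # everything else
        folds.append((train, val))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train, val in folds:
    print(train, val)
```

Every sample is used for validation exactly once, and the k validation scores are averaged to estimate the model accuracy.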

There are two methods here: we can either initialize the weights to zero or assign them randomly.

Initializing all weights to 0: This makes your model similar to a linear model. All the neurons in every layer perform the same operation, giving the same output and making the deep net useless.

Initializing all weights randomly: Here, the weights are assigned randomly by initializing them very close to 0. It gives better accuracy to the model since every neuron performs different computations. This is the most commonly used method.
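The symmetry problem can be checked directly on a tiny network. The sketch below computes the gradients for a network with one tanh hidden layer and a linear output, once with all-zero weights and once with small distinct weights standing in for a random draw; the architecture, the function hidden_gradients, and all the numbers are assumptions for illustration.

```python
import numpy as np

def hidden_gradients(W1, W2, x, y):
    # Forward pass: tanh hidden layer, linear output, squared error.
    h = np.tanh(W1 @ x)
    err = W2 @ h - y
    # Backward pass.
    dW2 = err * h
    dh = err * W2
    dW1 = np.outer(dh * (1.0 - h ** 2), x)
    return dW1, dW2

x, y = np.array([1.0, 2.0]), 1.0

# All-zero initialization: every hidden neuron gets the same (zero) gradient.
dW1_zero, _ = hidden_gradients(np.zeros((3, 2)), np.zeros(3), x, y)

# Small distinct values, standing in for a random draw close to 0.
W1_small = np.array([[0.01, -0.02], [0.03, 0.01], [-0.02, 0.02]])
W2_small = np.array([0.01, -0.03, 0.02])
dW1_small, _ = hidden_gradients(W1_small, W2_small, x, y)

print(dW1_zero)
print(dW1_small)
```

With zero weights the hidden neurons receive identical updates and can never become different from one another, whereas distinct starting values give each neuron its own gradient.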

There are four layers in a CNN: the convolutional layer, the ReLU layer, the pooling layer, and the fully connected layer.

Pooling is used to reduce the spatial dimensions of a CNN. It performs down-sampling operations to reduce the dimensionality and creates a pooled feature map by sliding a filter matrix over the input matrix.
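As a concrete case, 2x2 max pooling with stride 2 halves each spatial dimension by keeping the largest value in every 2x2 window. The sketch below assumes a single-channel feature map with even height and width; the function name and the input values are made up.

```python
import numpy as np

def max_pool_2x2(feature_map):
    # Assumes even height and width; groups the map into 2x2 blocks.
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))      # largest value in each block

feature_map = np.array([[1, 3, 2, 0],
                        [4, 2, 1, 1],
                        [0, 1, 5, 6],
                        [2, 2, 7, 8]])
pooled = max_pool_2x2(feature_map)
print(pooled)
```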

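Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning long-term dependencies, remembering information for long periods as its default behavior. There are three steps in an LSTM network: the network decides what to forget and what to remember, it selectively updates the cell state values, and it decides what part of the current state makes it to the output.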

While training an RNN, your slope can become either too small or too large; this makes the training difficult. When the slope is too small, the problem is known as a Vanishing Gradient. When the slope tends to grow exponentially instead of decaying, it's referred to as an Exploding Gradient. Gradient problems lead to long training times, poor performance, and low accuracy.
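A rough way to see why this happens: backpropagating through time multiplies the gradient by roughly the same factor at every time step. The sketch below reduces that to a single scalar recurrent weight and ignores the activation derivative, which is a deliberate simplification; the function name and the weights are illustrative.

```python
def gradient_after(steps, recurrent_weight):
    # Each step back in time multiplies the gradient by the recurrent weight.
    grad = 1.0
    for _ in range(steps):
        grad *= recurrent_weight
    return grad

vanishing = gradient_after(50, 0.5)   # factor below 1: shrinks toward zero
exploding = gradient_after(50, 1.5)   # factor above 1: grows exponentially
print(vanishing, exploding)
```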

TensorFlow provides both C++ and Python APIs, which makes it easier to work with, and it has a faster compilation time than other Deep Learning libraries such as Keras and Torch. TensorFlow supports both CPU and GPU computing devices.

This is another frequently asked deep learning interview question. A tensor is a mathematical object represented as an array of higher dimensions. These arrays of data, with different dimensions and ranks, that are fed as input to the neural network are called Tensors.

Constants - Constants are parameters whose value does not change. To define a constant, we use the tf.constant() command. For example:

import tensorflow as tf
a = tf.constant(2.0, tf.float32)
b = tf.constant(3.0)
print(a, b)

Variables - Variables allow us to add new trainable parameters to a graph. To define a variable, we use the tf.Variable() command and initialize it before running the graph in a session. An example:

W = tf.Variable([.3], dtype=tf.float32)
b = tf.Variable([-.3], dtype=tf.float32)

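Placeholders - These allow us to feed data to a TensorFlow model from outside the model. A placeholder permits a value to be assigned later. To define a placeholder, we use the tf.placeholder() command. An example:

# tf.placeholder and tf.Session are TensorFlow 1.x APIs (tf.compat.v1 in TensorFlow 2.x)
a = tf.placeholder(tf.float32)
b = a * 2
with tf.Session() as sess:
    # Feed the value 3.0 into the placeholder and evaluate b
    result = sess.run(b, feed_dict={a: 3.0})
    print(result)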

Sessions - A session is run to evaluate the nodes. This is called the TensorFlow runtime. For example:

a = tf.constant(2.0)
b = tf.constant(4.0)
c = a + b
# Launch a session (TensorFlow 1.x API)
sess = tf.Session()
# Evaluate the tensor c
print(sess.run(c))

Everything in TensorFlow is based on creating a computational graph. It is a network of nodes in which the nodes represent mathematical operations and the edges represent tensors. Since data flows in the form of a graph, it is also called a DataFlow Graph.

Suppose there is a wine shop purchasing wine from dealers, which they resell later. But some dealers sell fake wine. In this case, the shop owner should be able to distinguish between fake and authentic wine.

The forger will try different techniques to sell fake wine and make sure specific techniques go past the shop owner's check. The shop owner would probably get some feedback from wine experts that some of the wine is not original. The owner would have to improve how he determines whether a wine is fake or authentic.

The forger's goal is to create wines that are indistinguishable from the authentic ones, while the shop owner intends to tell accurately whether the wine is real or not.

Let us map this example onto the parts of a GAN.

There is a noise vector coming into the forger who is generating fake wine.

Here the forger acts as a Generator.

The shop owner acts as a Discriminator.

The Discriminator gets two inputs; one is the fake wine, while the other is the real authentic wine. The shop owner has to figure out whether it is real or fake.

So, there are two primary components of a Generative Adversarial Network (GAN): the Generator and the Discriminator.

The generator is a CNN that keeps producing images that are closer and closer in appearance to the real images, while the discriminator tries to determine the difference between real and fake images. The ultimate aim is to make the discriminator learn to identify real and fake images.

An autoencoder is a Neural Network with three layers in which the number of input neurons is equal to the number of output neurons. The network's target output is the same as the input. It uses dimensionality reduction to restructure the input. It works by compressing the image input to a latent space representation and then reconstructing the output from this representation.

Bagging and Boosting are ensemble techniques that train multiple models using the same learning algorithm and then combine their outputs to make a final decision.

With Bagging, we take a dataset and split it into training data and test data. Then we randomly select data to place into the bags and train a model on each bag separately.

With Boosting, the emphasis is on selecting the data points that give the wrong output, in order to improve the accuracy.
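The bagging side of this can be sketched in a few lines: draw bootstrap samples (random selection with replacement) to fill each bag, train one model per bag, and combine the predictions by majority vote. Model training is omitted here, and the helper names, the toy dataset, and the example predictions are all assumptions.

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) items with replacement, so some repeat and some are left out.
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    # The most frequent prediction wins; ties are broken arbitrarily.
    return max(set(predictions), key=predictions.count)

rng = random.Random(0)
training_data = list(range(10))                       # stand-in for real samples
bags = [bootstrap_sample(training_data, rng) for _ in range(3)]

# Suppose three models, one trained per bag, predict these labels for one input.
vote = majority_vote(["cat", "dog", "cat"])
print(bags, vote)
```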

Read more here:
Top Deep Learning Interview Questions and Answers for 2024 - Simplilearn

Google’s AI robots are learning from watching movies just like the rest of us – TechRadar

Google DeepMind's robotics team is teaching robots to learn how a human intern would: by watching a video. The team has published a new paper demonstrating how Google's RT-2 robots embedded with the Gemini 1.5 Pro generative AI model can absorb information from videos to learn how to get around and even carry out requests at their destination.

Thanks to the Gemini 1.5 Pro model's long context window, training a robot like a new intern is possible. This window allows the AI to process extensive amounts of information simultaneously. The researchers would film a video tour of a designated area, such as a home or office. Then, the robot would watch the video and learn about the environment.

The details in the video tours let the robot complete tasks based on its learned knowledge, using both verbal and image outputs. It's an impressive way of showing how robots might interact with their environment in ways reminiscent of human behavior.

Those demonstrations aren't rare flukes, either. In practical tests, Gemini-powered robots operated within a 9,000-square-foot area and successfully followed over 50 different user instructions with a 90 percent success rate. This high level of accuracy opens up many potential real-world uses for AI-powered robots, helping out at home with chores or at work with menial or even more complex tasks.

That's because one of the more notable aspects of the Gemini 1.5 Pro model is its ability to complete multi-step tasks. DeepMind's research has found that the robots can work out how to answer questions like whether there's a specific drink available by navigating to a refrigerator, visually processing what's within, and then returning and answering the question.

The idea of planning and carrying out the entire sequence of actions demonstrates a level of understanding and execution that goes beyond the current standard of single-step orders for most robots.

Don't expect to see this robot for sale any time soon, though. For one thing, it takes up to 30 seconds to process each instruction, which is way slower than just doing something yourself in most cases. The chaos of real-world homes and offices will be much harder for a robot to navigate than a controlled environment, no matter how advanced the AI model is.

Still, integrating AI models like Gemini 1.5 Pro into robotics is part of a larger leap forward in the field. Robots equipped with models like Gemini or its rivals could transform healthcare, shipping, and even janitorial duties.

See the original post here:
Google's AI robots are learning from watching movies just like the rest of us - TechRadar

DeepMind's PEER scales language models with millions of tiny experts – VentureBeat

Mixture-of-Experts (MoE) has become a popular technique for scaling large language models (LLMs) without exploding computational costs. Instead of using the entire model capacity for every input, MoE architectures route the data to small but specialized expert modules. MoE enables LLMs to increase their parameter count while keeping inference costs low. MoE is used in several popular LLMs, including Mixtral, DBRX, Grok and reportedly GPT-4.

However, current MoE techniques have limitations that restrict them to a relatively small number of experts. In a new paper, Google DeepMind introduces Parameter Efficient Expert Retrieval (PEER), a novel architecture that can scale MoE models to millions of experts, further improving the performance-compute tradeoff of large language models.

The past few years have shown that scaling language models by increasing their parameter count leads to improved performance and new capabilities. However, there is a limit to how much you can scale a model before running into computational and memory bottlenecks.

Every transformer block used in LLMs has attention layers and feedforward (FFW) layers. The attention layer computes the relations between the sequence of tokens fed to the transformer block. The feedforward network is responsible for storing the model's knowledge. FFW layers account for two-thirds of the model's parameters and are one of the bottlenecks of scaling transformers. In the classic transformer architecture, all the parameters of the FFW are used in inference, which makes their computational footprint directly proportional to their size.

MoE tries to address this challenge by replacing the single dense FFW layer with sparsely activated expert modules. Each of the experts contains a fraction of the parameters of the full dense layer and specializes in certain areas. The MoE has a router that assigns each input to the few experts that are likely to provide the most accurate answer.

By increasing the number of experts, MoE can increase the capacity of the LLM without increasing the computational cost of running it.
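The routing idea can be sketched as follows. This is a simplified illustration rather than any particular model's implementation: the router is a single weight matrix, each expert is a small tanh layer, and only the top-k experts chosen by the router are actually evaluated. The names moe_forward, make_expert, and calls are hypothetical.

```python
import numpy as np

calls = []   # records which experts were actually evaluated

def make_expert(index, W):
    def expert(x):
        calls.append(index)
        return np.tanh(W @ x)
    return expert

def moe_forward(x, router_w, experts, top_k=2):
    logits = router_w @ x                      # one routing score per expert
    chosen = np.argsort(logits)[-top_k:]       # indices of the top-k experts
    weights = np.exp(logits[chosen] - logits[chosen].max())
    weights /= weights.sum()                   # softmax over the chosen experts
    # Only the chosen experts run; the others cost nothing for this input.
    out = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return out, chosen

rng = np.random.default_rng(0)
dim, n_experts = 4, 8
experts = [make_expert(i, rng.normal(size=(dim, dim))) for i in range(n_experts)]
router_w = rng.normal(size=(n_experts, dim))
out, chosen = moe_forward(rng.normal(size=dim), router_w, experts)
print(chosen, out)
```

Adding more experts to the list grows the parameter count, while the per-input work stays at top_k expert evaluations plus the routing scores.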

According to recent studies, the optimal number of experts for an MoE model is related to several factors, including the number of training tokens and the compute budget. When these variables are balanced, MoEs have consistently outperformed dense models for the same amount of compute resources.

Furthermore, researchers have found that increasing the granularity of an MoE model, which refers to the number of experts, can lead to performance gains, especially when accompanied by an increase in model size and training data.

High-granularity MoE can also enable models to learn new knowledge more efficiently. Some studies suggest that by adding new experts and regularizing them properly, MoE models can adapt to continuous data streams, which can help language models deal with continuously changing data in their deployment environments.

Current approaches to MoE are limited and unscalable. For example, they usually have fixed routers that are designed for a specific number of experts and need to be readjusted when new experts are added.

DeepMind's Parameter Efficient Expert Retrieval (PEER) architecture addresses the challenges of scaling MoE to millions of experts. PEER replaces the fixed router with a learned index to efficiently route input data to a vast pool of experts. For each given input, PEER first uses a fast initial computation to create a shortlist of potential candidates before choosing and activating the top experts. This mechanism enables the MoE to handle a very large number of experts without slowing down.

Unlike previous MoE architectures, where experts were often as large as the FFW layers they replaced, PEER uses tiny experts with a single neuron in the hidden layer. This design enables the model to share hidden neurons among experts, improving knowledge transfer and parameter efficiency. To compensate for the small size of the experts, PEER uses a multi-head retrieval approach, similar to the multi-head attention mechanism used in transformer models.
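To give a feel for what "tiny experts with a single neuron" means, here is a rough single-head sketch under stated assumptions. Each expert is one input weight vector and one output weight vector, experts are scored against the input with a key vector, and the top-k are combined with softmax weights. This sketch scores every expert by brute force, which is exactly what the learned index described above is meant to avoid, and it omits the multi-head retrieval; the function name, the ReLU nonlinearity, and all sizes are assumptions, not details taken from the paper.

```python
import numpy as np

def single_neuron_expert_layer(x, keys, down, up, top_k):
    scores = keys @ x                            # brute-force score per expert
    top = np.argsort(scores)[-top_k:]            # retrieve the top-k experts
    gates = np.exp(scores[top] - scores[top].max())
    gates /= gates.sum()                         # softmax over retrieved experts
    hidden = np.maximum(0.0, down[top] @ x)      # one hidden neuron per expert
    return (gates * hidden) @ up[top], top       # weighted sum of expert outputs

rng = np.random.default_rng(0)
dim, n_experts, top_k = 8, 1000, 4
keys = rng.normal(size=(n_experts, dim))   # retrieval key for each expert
down = rng.normal(size=(n_experts, dim))   # input weights of each expert's neuron
up = rng.normal(size=(n_experts, dim))     # output weights of each expert's neuron
x = rng.normal(size=dim)

out, top = single_neuron_expert_layer(x, keys, down, up, top_k)
print(top, out)
```

Only top_k of the 1,000 experts contribute to the output, so the active parameter count per input stays small even as the pool grows.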

A PEER layer can be added to an existing transformer model or used to replace an FFW layer. PEER is also related to parameter-efficient fine-tuning (PEFT) techniques. In PEFT techniques, parameter efficiency refers to the number of parameters that are modified to fine-tune a model for a new task. In PEER, parameter efficiency reduces the number of active parameters in the MoE layer, which directly affects computation and activation memory consumption during pre-training and inference.

According to the paper, PEER could potentially be adapted to select PEFT adapters at runtime, making it possible to dynamically add new knowledge and features to LLMs.

PEER might be used in DeepMind's Gemini 1.5 models, which according to the Google blog use "a new Mixture-of-Experts (MoE) architecture."

The researchers evaluated the performance of PEER on different benchmarks, comparing it against transformer models with dense feedforward layers and other MoE architectures. Their experiments show that PEER models achieve a better performance-compute tradeoff, reaching lower perplexity scores with the same computational budget as their counterparts.

The researchers also found that increasing the number of experts in a PEER model leads to further perplexity reduction.

"This design demonstrates a superior compute-performance trade-off in our experiments, positioning it as a competitive alternative to dense FFW layers for scaling foundation models," the researchers write.

The findings are interesting because they challenge the long-held belief that MoE models reach peak efficiency with a limited number of experts. PEER shows that by applying the right retrieval and routing mechanisms, it is possible to scale MoE to millions of experts. This approach can help further reduce the cost and complexity of training and serving very large language models.

See the rest here:
DeepMind's PEER scales language models with millions of tiny experts - VentureBeat

Google DeepMind Is Integrating Gemini 1.5 Pro in Robots That Can Navigate Real-World Environments – Gadgets 360

Google DeepMind shared new advancements made in the field of robotics and vision language models (VLMs) on Thursday. The artificial intelligence (AI) research division of the tech giant has been working with advanced vision models to develop new capabilities in robots. In a new study, DeepMind highlighted that using Gemini 1.5 Pro and its long context window has now enabled the division to make breakthroughs in navigation and real-world understanding of its robots. Earlier this year, Nvidia also unveiled new AI technology that powers advanced capabilities in humanoid robots.

In a post on X (formerly known as Twitter), Google DeepMind revealed that it has been training its robots using Gemini 1.5 Pro's 2 million token context window. Context windows can be understood as the window of knowledge visible to an AI model, using which it processes tangential information around the queried topic.

For instance, if a user asks an AI model about most popular ice cream flavours, the AI model will check the keyword ice cream and flavours to find information to that question. If this information window is too small, then the AI will only be able to respond with the names of different ice cream flavours. However, if it is larger, the AI will also be able to see the number of articles about each ice cream flavour to find which has been mentioned the most and deduce the popularity factor.

DeepMind is taking advantage of this long context window to train its robots in real-world environments. The division aims to see if the robot can remember the details of an environment and assist users when asked about the environment with contextual or vague terms. In a video shared on Instagram, the AI division showcased that a robot was able to guide a user to a whiteboard when he asked it for a place where he could draw.

"Powered with 1.5 Pro's 1 million token context length, our robots can use human instructions, video tours, and common sense reasoning to successfully find their way around a space," Google DeepMind stated in a post.

In a study published on arXiv (a preprint repository whose papers are not peer-reviewed), DeepMind explained the technology behind the breakthrough. In addition to Gemini, it is also using its own Robotic Transformer 2 (RT-2) model. It is a vision-language-action (VLA) model that learns from both web and robotics data. It utilises computer vision to process real-world environments and uses that information to create datasets. These datasets can later be processed by the generative AI to break down contextual commands and produce desired outcomes.

At present, Google DeepMind is using this architecture to train its robots on a broad category known as Multimodal Instruction Navigation (MIN) which includes environment exploration and instruction-guided navigation. If the demonstration shared by the division is legitimate, this technology might further advance robotics.

Go here to read the rest:
Google DeepMind Is Integrating Gemini 1.5 Pro in Robots That Can Navigate Real-World Environments - Gadgets 360

Google DeepMind Unveils JEST, a New AI Training Method That Slashes Energy Use – Maginative

It is simply not sustainable to keep training more advanced AI models using current energy technology. We need models to be trained faster, cheaper, and in more environmentally friendly ways. Google DeepMind has now shared new research on JEST (Joint Example Selection Training), a way of training AI models that is 13 times faster and 10 times more power-efficient than current techniques.

As the AI industry grows, so do concerns about the environmental impact of the data centers required to train these sophisticated models. The JEST method arrives just in time, addressing the escalating energy demands of AI training processes. By significantly reducing the computational overhead, JEST could help mitigate the carbon footprint associated with AI advancements.

Traditional AI training methods typically focus on individual data points, which can be time-consuming and computationally expensive. JEST innovates by shifting the focus to entire batches of data. In simplified terms, a smaller model is used to filter and select high-quality data, so that the larger model can be trained more effectively, leading to significant performance improvements.

JEST's efficiency stems from its ability to evaluate batches of data rather than individual examples. This method leverages multimodal contrastive learning, which looks at how different types of data (like text and images) interact with each other. By scoring entire batches and selecting the most learnable subsets, JEST accelerates the training process.
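
To make the idea of scoring whole batches concrete, here is a minimal sketch in Python. It rests on several assumptions that the article does not spell out: that "learnability" means the loss of the model being trained minus the loss of a smaller reference model, that the loss is a one-directional softmax contrastive loss over a matrix of image-text similarities, and that a brute-force search over sub-batches stands in for whatever selection procedure the researchers actually use. The function names are hypothetical.

```python
import math
from itertools import combinations


def contrastive_loss(sim, idx):
    """Mean softmax cross-entropy over the rows in idx.

    sim[i][j] is the similarity between image i and text j. Each row's true
    match sits on the diagonal; the other members of idx act as negatives,
    so the loss of an example depends on the rest of the sub-batch.
    """
    total = 0.0
    for i in idx:
        log_denominator = math.log(sum(math.exp(sim[i][j]) for j in idx))
        total += log_denominator - sim[i][i]
    return total / len(idx)


def most_learnable_subset(learner_sim, reference_sim, k):
    """Return the size-k sub-batch with the highest assumed learnability.

    Learnability here is (learner loss - reference loss): high when the model
    being trained still struggles with a sub-batch that the reference model
    finds easy. Brute-force enumeration is only practical for tiny batches.
    """
    n = len(learner_sim)

    def learnability(idx):
        return contrastive_loss(learner_sim, idx) - contrastive_loss(reference_sim, idx)

    return max(combinations(range(n), k), key=learnability)
```

In this sketch, a sub-batch scores well only if the reference model handles it confidently while the learner does not, which is one way to read "most learnable". Noisy examples that neither model can fit contribute little to the score.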

The method can be broken down into two main components:

DeepMind's experiments with JEST have shown remarkable results. The method achieves state-of-the-art performance with significantly fewer training iterations and lower computational costs. For instance, JEST matches the performance of existing models with up to 13 times fewer training iterations and ten times less energy consumption.

These improvements are not just incremental; they represent a substantial leap forward in making AI training more sustainable and scalable. By reducing the energy required for training, JEST not only cuts costs but also helps address the pressing issue of AI's environmental impact. According to an analysis by the Electric Power Research Institute, data centers could consume between 4.6% and 9.1% of US electricity by 2030.

However, the researchers note some limitations of their approach. For example, JEST still relies on having access to smaller, well-curated datasets to guide the selection process. Developing methods to automatically infer optimal reference distributions remains an open challenge.

Nevertheless, the dramatic efficiency improvements demonstrated by JEST point to significant headroom for optimizing AI training. As models grow ever larger and more energy-intensive, such innovations will likely prove crucial for sustainable scaling of artificial intelligence capabilities.

Chris McKay is the founder and chief editor of Maginative. His thought leadership in AI literacy and strategic AI adoption has been recognized by top academic institutions, media, and global brands.

The rest is here:
Google DeepMind Unveils JEST, a New AI Training Method That Slashes Energy Use - Maginative

Google DeepMind’s AI Rat Brains Could Make Robots Scurry Like the Real Thing – Singularity Hub

Rats are incredibly nimble creatures. They can climb up curtains, jump down tall ledges, and scurry across complex terrain (say, your basement stacked with odd-shaped stuff) at mind-blowing speed.

Robots, in contrast, are anything but nimble. Despite recent advances in AI to guide their movements, robots remain stiff and clumsy, especially when navigating new environments.

To make robots more agile, why not control them with algorithms distilled from biological brains? Our movements are rooted in the physical world and based on experience, two components that let us easily explore different surroundings.

There's one major obstacle. Despite decades of research, neuroscientists haven't yet pinpointed how brain circuits control and coordinate movement. Most studies have correlated neural activity with measurable motor responses, such as a twitch of a hand or the speed of lifting a leg. In other words, we know brain activation patterns that can describe a movement. But which neural circuits cause those movements in the first place?

We may find the answer by trying to recreate them in digital form. As the famous physicist Richard Feynman once said, "What I cannot create, I do not understand."

This month, Google DeepMind and Harvard University built a realistic virtual rat to home in on the neural circuits that control complex movement. The rat's digital brain, composed of artificial neural networks, was trained on tens of hours of neural recordings from actual rats running around in an open arena.

Comparing activation patterns of the artificial brain to signals from living, breathing animals, the team found the digital brain could predict the neural activation patterns of real rats and produce the same behavior, such as running or rearing up on hind legs.

"The collaboration was fantastic," said study author Dr. Bence Ölveczky at Harvard in a press release. "DeepMind had developed a pipeline to train biomechanical agents to move around complex environments. We simply didn't have the resources to run simulations like those, to train these networks."

The virtual rat's brain recapitulated two regions especially important for movement. Tweaking connections in those areas changed motor responses across a variety of behaviors, suggesting these neural signals are involved in walking, running, climbing, and other movements.

"Virtual animals trained to behave like their real counterparts could provide a platform for virtual neuroscience ... that would otherwise be difficult or impossible to experimentally deduce," the team wrote in their article.

Artificial intelligence lives in the digital world. To power robots, it needs to understand the physical world.

One way to teach it about the world is to record neural signals from rodents and use the recordings to engineer algorithms that can control biomechanically realistic models replicating natural behaviors. The goal is to distill the brain's computations into algorithms that can pilot robots and also give neuroscientists a deeper understanding of the brain's workings.

So far, the strategy has been successfully used to decipher the brain's computations for vision, smell, navigation, and recognizing faces, the authors explained in their paper. However, modeling movement has been a challenge. Individuals move differently, and noise from brain recordings can easily mess up the resulting AI's precision.

This study tackled the challenges head on with a cornucopia of data.

The team first placed multiple rats into a six-camera arena to capture their movement: running around, rearing up, or spinning in circles. Rats can be lazy bums. To encourage them to move, the team dangled Cheerios across the arena.

As the rats explored the arena, the team recorded 607 hours of video and also neural activity with a 128-channel array of electrodes implanted in their brains.

They used this data to train an artificial neural network (a virtual rat's brain) to control body movement. To do this, they first tracked how 23 joints moved in the videos and transferred them to a simulation of the rat's skeletal movements. Our joints only bend in certain ways, and this step filters out what's physically impossible (say, bending legs in the opposite direction).

The core of the virtual rat's brain is a type of AI algorithm called an inverse dynamics model. Basically, it knows where body positions are in space at any given time and, from there, predicts the next movements leading to a goal: say, grabbing that coffee cup without dropping it.
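
The general idea behind inverse dynamics can be sketched with a toy example: given the current state of a body and the state it should reach next, work backward to the force that produces that transition. The Python below is only an illustration under loud assumptions. It uses a hypothetical one-dimensional point mass with a simple integration rule, not the rat's 23-joint body or the neural network the team trained, and the function names are made up for this sketch.

```python
def step(position, velocity, force, mass, dt):
    """Forward dynamics: advance a 1D point mass by one time step.

    Uses semi-implicit Euler: update velocity first, then position.
    """
    velocity = velocity + (force / mass) * dt
    position = position + velocity * dt
    return position, velocity


def inverse_dynamics(position, velocity, target_position, mass, dt):
    """Inverse dynamics: the force that reaches target_position in one step.

    Solves the update rule in `step` backward for the force.
    """
    required_velocity = (target_position - position) / dt
    acceleration = (required_velocity - velocity) / dt
    return mass * acceleration


# Usage: ask for the force that moves a 2 kg mass 0.1 m in 0.1 s,
# then confirm the forward simulation lands on the target.
force = inverse_dynamics(0.0, 0.0, 0.1, mass=2.0, dt=0.1)
new_position, new_velocity = step(0.0, 0.0, force, mass=2.0, dt=0.1)
```

A learned inverse dynamics model plays the same role as `inverse_dynamics` here, except that the mapping from desired movement to required forces is learned from data rather than derived from a known equation.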

Through trial and error, the AI eventually came close to matching the movements of its biological counterparts. Surprisingly, the virtual rat could also easily generalize motor skills to unfamiliar places and scenarios, in part by learning the forces needed to navigate the new environments.

The similarities allowed the team to compare real rats to their digital doppelgangers when performing the same behavior.

In one test, the team analyzed activity in two brain regions known to guide motor skills. Compared to an older computational model used to decode brain networks, the AI could better simulate neural signals in the virtual rat across multiple physical tasks.

Because of this, the virtual rat offers a way to study movement digitally.

One long-standing question, for example, is how the brain and nerves command muscle movement depending on the task. Grabbing a cup of coffee in the morning, for example, requires a steady hand without any jerking action but enough strength to hold it steady.

The team tweaked the neural connections in the virtual rodent to see how changes in brain networks alter the final behavior: getting that cup of coffee. They found one network measure that could identify a behavior at any given time and guide it through.

Compared to lab studies, these insights can only be directly accessed through simulation, wrote the team.

The virtual rat bridges AI and neuroscience. The AI models here recreate the physicality and neural signals of living creatures, making them invaluable for probing brain functions. In this study, one aspect of the virtual rat's motor skills relied on two brain regions, pinpointing them as potential regions key to guiding complex, adaptable movement.

A similar strategy could provide more insight into the computations underlying vision, sensation, or perhaps even higher cognitive functions such as reasoning. But the virtual rat brain isn't a complete replication of a real one. It only captures snapshots of part of the brain. Still, it lets neuroscientists zoom in on their favorite brain region and test hypotheses quickly and easily compared to traditional lab experiments, which often take weeks to months.

On the robotics side, the method adds a physicality to AI.

"We've learned a huge amount from the challenge of building embodied agents: AI systems that not only have to think intelligently, but also have to translate that thinking into physical action in a complex environment," said study author Dr. Matthew Botvinick at DeepMind in a press release. "It seemed plausible that taking this same approach in a neuroscience context might be useful for providing insights in both behavior and brain function."

The team is next planning to test the virtual rat with more complex tasks, alongside its biological counterparts, to further peek inside the inner workings of the digital brain.

"From our experiments, we have a lot of ideas about how such tasks are solved," said Ölveczky to The Harvard Gazette. "We want to start using the virtual rats to test these ideas and help advance our understanding of how real brains generate complex behavior."

Continue reading here:
Google DeepMind's AI Rat Brains Could Make Robots Scurry Like the Real Thing - Singularity Hub

Google claims new AI training tech is 13 times faster and 10 times more power efficient DeepMind’s new JEST … – Tom’s Hardware

Google DeepMind, Google's AI research lab, has published new research on training AI models that claims to improve both training speed and energy efficiency by an order of magnitude, yielding 13 times more performance and ten times higher power efficiency than other methods. The new JEST training method arrives at a timely moment, as conversations about the environmental impact of AI data centers are heating up.

DeepMind's method, dubbed JEST or joint example selection, breaks apart from traditional AI model training techniques in a simple fashion. Typical training methods focus on individual data points for training and learning, while JEST trains based on entire batches. The JEST method first creates a smaller AI model that will grade data quality from extremely high-quality sources, ranking the batches by quality. Then it compares that grading to a larger, lower-quality set. The small JEST model determines the batches most fit for training, and a large model is then trained from the findings of the smaller model.

The paper itself provides a more thorough explanation of the processes used in the study and the future of the research.

DeepMind researchers make it clear in their paper that this "ability to steer the data selection process towards the distribution of smaller, well-curated datasets" is essential to the success of the JEST method. Success is the correct word for this research; DeepMind claims that "our approach surpasses state-of-the-art models with up to 13× fewer iterations and 10× less computation."

Of course, this system relies entirely on the quality of its training data, as the bootstrapping technique falls apart without a human-curated data set of the highest possible quality. Nowhere is the mantra "garbage in, garbage out" truer than this method, which attempts to "skip ahead" in its training process. This makes the JEST method much more difficult for hobbyists or amateur AI developers to match than most others, as expert-level research skills are likely required to curate the initial highest-grade training data.

The JEST research comes not a moment too soon, as the tech industry and world governments are beginning discussions on artificial intelligence's extreme power demands. AI workloads took up about 4.3 GW in 2023, almost matching the annual power consumption of the nation of Cyprus. And things are definitely not slowing down: a single ChatGPT request costs 10x more than a Google search in power, and Arm's CEO estimates that AI will take up a quarter of the United States' power grid by 2030.

If and how JEST methods are adopted by major players in the AI space remains to be seen. GPT-4o reportedly cost $100 million to train, and future larger models may soon hit the billion-dollar mark, so firms are likely hunting for ways to save their wallets in this department. Hopefuls think that JEST methods will be used to keep current training productivity rates at much lower power draws, easing the costs of AI and helping the planet. However, much more likely is that the machine of capital will keep the pedal to the metal, using JEST methods to keep power draw at maximum for hyper-fast training output. Cost savings versus output scale, who will win?

Read the original here:
Google claims new AI training tech is 13 times faster and 10 times more power efficient DeepMind's new JEST ... - Tom's Hardware

Vogue Business AI Luxury Summit with Google examines uses and opportunities – Vogue Business

Vogue Business and Google invited senior delegates from the Parisian fashion and luxury industries to discuss the challenges and opportunities of artificial intelligence. Hosted at Google France's headquarters, the half-day summit attracted executives from Louis Vuitton, Chanel, Balenciaga, Hugo Boss and Isabel Marant, among others.

Axel De Goursac, director of LVMH Group's AI Factory, and Joëlle Barral, senior director of research and engineering at Google DeepMind, each spoke to Vogue Business senior innovation editor Maghan McDowell about their work and their goals for AI in fashion. Additionally, Vogue Business Paris correspondent Laure Guilbault spoke one-on-one with the founders of four curated startups that use AI to provide new solutions, and Vogue Business head of advisory Anusha Couttigane shared highlights from a Vogue Business and Google study on the opportunity for AI in luxury.

Key themes of the day included the need to prioritise human intuition and creativity; the need for global consensus on AI regulation, use and standards; and AI for good, in terms of sustainability and reducing waste, more specifically.

Read the original here:
Vogue Business AI Luxury Summit with Google examines uses and opportunities - Vogue Business

With AI Tools, Scientists Can Crack the Code of Life – WIRED

In 2021, AI research lab DeepMind announced the development of its first digital biology neural network, AlphaFold. The model was capable of accurately predicting the 3D structure of proteins, which determines the functions that these molecules play. "We're just floating bags of water moving around," says Pushmeet Kohli, VP of research at DeepMind. "What makes us special are proteins, the building blocks of life. How they interact with each other is what makes the magic of life happen."

AlphaFold was named breakthrough of the year by the journal Science in 2021. In 2022, it was the most cited research paper in AI. "People have been on [protein structures] for many decades and were not able to make that much progress," Kohli says. "Then came AI." DeepMind also released the AlphaFold Protein Structure Database, which contained the protein structures of almost every organism whose genome has been sequenced, making it freely available to scientists worldwide.

More than 1.7 million researchers in 190 countries have used it for research ranging from the design of plastic-eating enzymes to the development of more effective malaria vaccines. A quarter of the research involving AlphaFold was dedicated to the understanding of cancer, Covid-19, and neurodegenerative diseases like Parkinson's and Alzheimer's. Last year, DeepMind released its next generation of AlphaFold, which extended its structure prediction algorithm to biomolecules like nucleic acids and ligands.

"It has democratized scientific research," Kohli says. "Scientists working in a developing country on a neglected tropical disease did not have access to the funds to get the structure of a protein computed. Now, at the click of a button, they can go to the AlphaFold database and get these predictions for free." For instance, one of DeepMind's early partners, the Drugs for Neglected Diseases Initiative, used AlphaFold to develop medicine for diseases that affect millions (such as sleeping sickness, Chagas disease, and leishmaniasis) yet receive comparatively little research.

DeepMind's latest breakthrough is called AlphaMissense. The model categorizes so-called missense mutations: genetic alterations that can result in different amino acids being produced at particular positions in proteins. Such mutations can alter the function of the protein itself, and AlphaMissense assigns each mutation a likelihood score of being either pathogenic or benign. "Understanding and predicting those effects is crucial for the discovery of rare genetic diseases," Kohli says. The algorithm, which was released last year, has classified around 89 percent of all possible human missense variants. Before, only 0.1 percent of all possible variants had been clinically classified by researchers.
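
As a rough illustration of how a likelihood score turns into a classification, the Python sketch below maps a score between 0 and 1 to a label and leaves the middle range unclassified. The cutoffs and label names are placeholders chosen for this example, not the values AlphaMissense uses, and `classify_variant` is a hypothetical helper.

```python
def classify_variant(score, benign_below=0.3, pathogenic_above=0.7):
    """Map a pathogenicity score in [0, 1] to a coarse label.

    The default cutoffs are illustrative placeholders only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    # Scores between the two cutoffs are left unclassified.
    return "ambiguous"
```

Leaving a middle band unclassified is one way a scoring model could end up confidently labeling most, but not all, possible variants.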

"This is just the beginning," Kohli says. Ultimately, he believes AI could lead to the creation of a virtual cell that could radically accelerate biomedical research, enabling biology to be explored in silico rather than in real-world laboratories. "With AI and machine learning we finally have the tools to comprehend this very sophisticated system that we call life."

This article appears in the July/August 2024 issue of WIRED UK magazine.

See the original post here:
With AI Tools, Scientists Can Crack the Code of Life - WIRED